Linguistic Annotation Survey

Generated 2020-12-10T06:22:29.944-05:00 from stylesheet at file:/U:/linguistic-annotation/apps/parse%20survey.xsl

Key

Some criteria below assign specific meaning to the four codes. Default values, however, are as follows:
+: criterion is fully supported (score 1)
(+): criterion is partly supported, or offers potential for full support via extensions (score 0.5)
(-): criterion is not supported, but might be extended to partial or full support (score -0.5)
-: criterion is not supported, and cannot be extended to develop partial or full support (score -1)
A code may be followed by a comment that explains the rating, for example:
(+)
comment
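As a minimal sketch of how the default values above add up to the per-row figure in the score column, the following Python snippet sums the codes of one row. It assumes that cells without a rating contribute nothing to the sum; this is consistent with row A.1 below (six `+`, four `-`, score 2), but the survey does not state it explicitly. The names `SCORES` and `row_score` are illustrative only.

```python
# Default score per code, taken from the key above.
SCORES = {"+": 1.0, "(+)": 0.5, "(-)": -0.5, "-": -1.0}


def row_score(codes):
    """Sum the default scores of all rated cells.

    Unrated cells are passed as None and are assumed to count as 0.
    """
    return sum(SCORES[code] for code in codes if code is not None)


# Row A.1 (RDF serialization): six '+', four '-', one unrated cell.
a1 = ["+"] * 6 + ["-"] * 4 + [None]
print(row_score(a1))

# Row A.2 (Extent of standardization), codes in column order.
a2 = ["(+)", "(+)", "(-)", "-", "+", "(+)", "+", "+", "+", "+", None]
print(row_score(a2))
```

Criteria that assign their own meaning to the four codes would need a separate mapping.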

Survey

A. LLOD compliancy

NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
A.1 RDF serialization
6
+
RDF
6
+
RDF
6
+
6
+
6
+
RDF, preference for JSON-LD
6
+
2
-
2
-
2
-
2
-
2
A.2 Extent of standardization
5
(+)
widely used community standard, and referred to in W3C standards
5
(+)
widely used community standard, and referred to in W3C standards
3
(-)
some usage, not standardized
2
-
6
+
regular W3C standard
5
(+)
this basically implements LAF
6
+
6
+
6
+
6
+
5
A.3 Documentation
6
+
for NIF 2.0
6
+
for NIF 2.0
6
+
3
(-)
partially documented only
6
+
5
(+)
on-line documentation dated, more up-to-date documentation in Cimiano et al. 2020, but this is not open access
5
(+)
via https://www.cs.vassar.edu/~ide/papers/ISO+24612-2012.pdf
6
+
link tbc.
2
-
unless link/information provided
2
-
unless link/information provided
3.5
A.4 IRI fragment identifiers for strings
6
+
6
+
2
-
provides token identifiers, not string identifiers. surface string lost in the underlying CoNLL format
2
-
5
(+)
3
(-)
not specified, to be provided by complementary vocabulary, e.g., NIF or WA
2
-
proprietary means of defining selectors
-1
A.5 Explicit selectors
6
+
6
+
2
-
2
-
6
+
5
(+)
recent publications recommend to rely on external vocabularies instead
1.5
A.6 Explicit context strings
6
+
6
+
2
-
2
-
2
-
2
-
-2
A.7 API specifications for web services
6
+
https://persistence.uni-leipzig.org/nlp2rdf/specification/api.html
6
+
https://persistence.uni-leipzig.org/nlp2rdf/specification/api.html
6
+
https://www.w3.org/TR/annotation-protocol/
3
A.8 Assign data categories
5
(+)
nif:oliaLink, pointing to OLiA
5
(+)
nif:oliaLink, pointing to OLiA
5
(+)
Publications and examples feature OLiA links, but only via rdf:type assignments; CoNLL-RDF is a NIF fragment, so, nif:oliaLink could be used
5
(+)
Not specified, but designed as a NIF fragment, so, nif:oliaLink could be used
3
(-)
No reference vocabulary defined, but annotations can be defined as subclasses of OLiA classes
5
(+)
LAF and other ISO/TC37 standards: pointing to ISOCat, no longer maintained, successor solutions are only emerging: CCR, DatCatInfo
5
(+)
5
(+)
5
(+)
3.5
A.9 Compatible with Web Annotation vocabulary
5
(+)
Hellmann et al. 2013 describe the use of NIF string URIs as Web Annotation targets
5
(+)
Hellmann et al. 2013 describe the use of NIF string URIs as Web Annotation targets
5
(+)
5
(+)
6
+
5
(+)
5
(+)
A partial reconstruction of LAF within WA has been described by Verspoor et al. 2012, but this does not seem to have been adopted in subsequent research nor evaluated by any third party.
5
(+)
not checked in detail, but cf. LAF
5
(+)
not checked in detail, but cf. LAF
5
(+)
not checked in detail, but cf. LAF
5.5
A.10 Compatible with NIF 2.0 core vocabulary
6
+
6
+
6
+
extends NIF vocabulary with data structures for one-word-per-line annotations, e.g., CoNLL, SketchEngine formats, see here; note the extension for the encoding of trees by means of POWLA
6
+
extends NIF vocabulary with specifications for morphology (interlinear glossed text); still under development, see here
3
(-)
conversion restricted to types of annotation supported by NIF, potential loss of information, e.g., distinction between target and annotation, resp., body and annotation is unclear
5
(+)
2
-
NIF systematically conflates LAF data structures, see A.11
3
A.11 Compatible with ISO standards
2
-
no generic data structures for linguistic annotation; conflates regions and nodes, see below
2
-
no generic data structures for linguistic annotation; conflates regions and nodes, see below
2
-
only a specific subset, possibly SynAF
2
-
only a specific subset, possibly MAF
5
(+)
regions ~ targets, nodes ~ annotation, annotation ~ body; but no linguistic data structures, combination has been explored by Verspoor et al. (2012)
6
+
6
+
6
+
6
+
0.5
score 4 4 0.5 -2.5 2.5 1 1 1.5 1.5 1.5 0

B. Expressiveness

About this category
This mostly refers to the capability of providing 'generic' data structures, i.e., generic linguistic data structures from which annotation-specific data structures can be derived. These are:
  1. pointers to primary data ("media")
    1. Anchor (pointer to a piece of primary data)
    2. Region (group of anchors that define a markable that can be annotated)
  2. units of annotation ("graph")
    1. node (annotatable element linked with a region)
    2. edge (relation from one or multiple nodes to one or multiple nodes)
    3. graph (collection of nodes and edges)
  3. annotations
    1. annotation (linguistic information attached to any node or edge)
    2. annotation space (groups together annotations of the same type, say, POS)
Note that it is not required here to provide an RDF encoding, but 'any' solution.
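The data structures listed above can be sketched in a few lines of Python. This is only an illustration of how the pieces relate, not an encoding prescribed by any of the surveyed vocabularies: all class and field names are assumptions, character offsets stand in for anchors, and the dependency label `subj` is a made-up example value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Anchor:
    """Pointer to a position in the primary data (here: a character offset)."""
    offset: int


@dataclass
class Region:
    """Group of anchors that delimits a markable."""
    anchors: List[Anchor]


@dataclass
class Node:
    """Annotatable element; the region is optional so that zero nodes are possible."""
    id: str
    region: Optional[Region] = None


@dataclass
class Edge:
    """Relation from one or more nodes to one or more nodes."""
    id: str
    sources: List[Node]
    targets: List[Node]


@dataclass
class Annotation:
    """Linguistic information attached to a node or edge, grouped by annotation space."""
    target_id: str
    space: str
    features: Dict[str, str]


@dataclass
class Graph:
    """Collection of nodes and edges, plus the annotations attached to them."""
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


def text_of(node: Node, primary: str) -> str:
    """Resolve a node to the stretch of primary data covered by its anchors."""
    if node.region is None:
        return ""
    offsets = [anchor.offset for anchor in node.region.anchors]
    return primary[min(offsets):max(offsets)]


primary = "Peter war nicht zuhause"
n1 = Node("n1", Region([Anchor(0), Anchor(5)]))
n2 = Node("n2", Region([Anchor(6), Anchor(9)]))
graph = Graph(
    nodes=[n1, n2],
    edges=[Edge("e1", sources=[n2], targets=[n1])],
    annotations=[
        Annotation("n1", "POS", {"tag": "NE"}),
        Annotation("e1", "DEP", {"label": "subj"}),  # hypothetical label
    ],
)
print(text_of(n1, primary), text_of(n2, primary))
```

Keeping `Node` separate from `Region` is what criterion B.9 (node != pointer) asks for: several nodes can share one region, and a node can exist without any region.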
Web Annotation does not define any specifically 'linguistic' data structures. It does provide pointers ('targets') and an abstract oa:Annotation class.
NIF defines:
  1. String (for the annotation of strings)
    1. Word
    2. Sentence
    3. Phrase
    4. Title
    5. Paragraph
  2. AnnotationUnit (for annotations that need to be distinguished from a particular way, note that these must not be any of the String subclasses)
Generic linguistic data structures (as in LAF) are missing. For specific levels of description (e.g., morphology), see below.
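To make the contrast between the two vocabularies more concrete, the sketch below expresses the same span once as a Web Annotation target with an offset-based selector and once as a NIF string. The document IRI is hypothetical, the Turtle prefix declarations are omitted, and the property and type names (`TextPositionSelector`, `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf`, `nif:referenceContext`, the `#char=` fragment scheme) are given as recalled from the W3C Web Annotation Data Model and the NIF 2.0 core ontology; they should be checked against the specifications before use.

```python
import json

TEXT = "Peter war nicht zuhause"
DOC = "http://example.org/doc.txt"  # hypothetical document IRI


def web_annotation(start, end, label):
    """Web Annotation: the pointer (target) is separate from the annotation and its body."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"{DOC}#anno-{start}-{end}",  # hypothetical annotation IRI
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": label},
        "target": {
            "source": DOC,
            "selector": {"type": "TextPositionSelector", "start": start, "end": end},
        },
    }


def nif_string(start, end, nif_class="nif:Word"):
    """NIF: the string itself is the resource that carries the annotation."""
    uri = f"{DOC}#char={start},{end}"
    context = f"{DOC}#char=0,{len(TEXT)}"
    return "\n".join([
        f"<{uri}> a {nif_class} ;",
        f"    nif:referenceContext <{context}> ;",
        f'    nif:beginIndex "{start}"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
        f'    nif:anchorOf "{TEXT[start:end]}" .',
    ])


print(json.dumps(web_annotation(0, 5, "NE"), indent=2))
print(nif_string(0, 5))
```

In the Web Annotation output the annotation, its body and its target are three distinct objects, whereas the NIF output has a single resource that is both the pointer and the thing being typed.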
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
B.1 Pointers to primary data
6
+
6
+
3
(-)
pointers to tokens, not primary data
3
(-)
pointers to data structures, not primary data
6
+
5
(+)
offset-based pointers possible, last publications recommend to use external vocabularies
6
+
XPointers
6
+
from LAF
6
+
from LAF
6
+
from LAF
6.5
B.2 Pointers: Vocabulary for explicit references
6
+
6
+
2
-
2
-
6
+
5
(+)
offset-based pointers possible, last publications recommend to use external vocabularies
2
-
no RDF statements
0.5
B.3 Pointers: User-provided selectors
2
-
would be considered out of scope
2
-
would be considered out of scope
2
-
6
+
provides the class oa:Selector. For different media, users can provide selector subclasses that encode the information an application needs to identify the annotated element
5
(+)
in combination with Web Annotation
3
(-)
actually, this is hard to tell, technically, this would be possible, but there don't seem to be documented examples
-2
B.4 Pointers: Support the annotation of continuous strings
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
10
B.5 Pointers: Annotation of discontinuous strings
2
-
annotations such as ext:offset_0_14_23_32 are not considered nor supported
2
-
annotations such as ext:offset_0_14_23_32 are not considered nor supported
3
(-)
not natively, but in combination with POWLA
3
(-)
in principle, discontinuous aggregates could exist, but there are no examples for that nor any model in current IGT annotation
6
+
multiple selectors can be combined into an aggregate selector
5
(+)
a terminal is offset-defined, so it apparently has to be a continuous string, but nonterminals can aggregate discontinuous sequences of terminals
5
(+)
the existence of discontinuous anchors needs to be confirmed, but nodes can aggregate other nodes regardless of their position
4
4
4
-1
B.6 Pointers: Annotation of media files
2
-
2
-
2
-
2
-
6
+
core feature
3
(-)
tbc.
-3.5
B.7 Pointers: Support the annotation of timestamps/timelines
2
-
2
-
2
-
2
-
5
(+)
users may provide a specialized selector for such data
5
(+)
if combined with Web Annotation
5
(+)
tbc.; CC: I'm pretty sure this has been addressed
4
4
4
-2.5
B.8 Pointers: standoff annotation
6
+
6
+
6
+
5
(+)
no examples known
6
+
6
+
6
+
4
4
4
6.5
B.9 Generic data structures for linguistic annotation: node != pointer
2
-
2
-
5
(+)
in combination with POWLA
3
(-)
unclear
6
+
pointer = target, node = annotation or body -- depending on interpretation
6
+
6
+
4
4
4
1
B.10 Generic data structures for linguistic annotation: zero nodes
5
(+)
the default encoding for annotations in NIF is by subclasses of nif:String
5
(+)
the default encoding for annotations in NIF is by subclasses of nif:String
5
(+)
for zero tokens in underlying CoNLL format
6
+
5
(+)
Web annotation requires some target.
6
+
from LAF
6
+
4
4
4
5
B.11.a Non-reified representation of edges
3
(-)
nif:subStringOf could be abused for this purpose, for hierarchical relations only, operates on regions/strings, not annotations. Default strategy in NIF is to introduce ad hoc properties, cf. NIF Stanford Core demo
3
(-)
nif:subStringOf could be abused for this purpose, for hierarchical relations only, operates on regions/strings, not annotations. Default strategy in NIF is to introduce ad hoc properties, cf. NIF Stanford Core demo
5
(+)
for semantic roles only
2
-
2
-
5
(+)
for hierarchical relations only
4
4
4
4
-2
B.12 Reified representation of edges (annotation relations)
2
-
2
-
2
-
only in combination with POWLA
2
-
2
-
6
+
5
(+)
"reification" is not directly applicable to non-RDF data
-3.5
B.13 Generic data structures for linguistic annotation: graphs
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
6
+
6
+
4
4
4
4.5
B.14 Generic data structures for linguistic annotation: annotations
3
(-)
(-) (nif:String), resp. (+) (nif:AnnotationUnit, not the default encoding, though)
3
(-)
(-) (nif:String), resp. (+) (nif:AnnotationUnit, not the default encoding, though)
3
(-)
not created by default
3
(-)
tbc
4
6
+
5
(+)
tbc
6
+
4
4
4
0.5
B.15 Generic data structures for linguistic annotation: annotation space ("tagset")
5
(+)
via OLiA
5
(+)
via OLiA
5
(+)
via OLiA
3
(-)
no example known
6
+
5
(+)
via OLiA
6
+
4
4
4
3.5
B.16 Provenance and confidence
3
(-)
NIF 2.0
3
(-)
NIF 2.0
3
(-)
3
(-)
3
(-)
3
(-)
2
-
no RDF extension possible
-4
B.17 Concurrent annotation
2
-
2
-
3
(-)
different properties in different columns, controlled by the user
3
(-)
no example known
2
-
6
+
6
+
4
4
4
-2
B.18 Sequence of annotation units
5
(+)
only for selected annotation units, e.g., nif:nextWord, nif:nextSentence
5
(+)
only for selected annotation units, e.g., nif:nextWord, nif:nextSentence
5
(+)
for native CoNLL-RDF annotation units
5
(+)
for native annotation units
2
-
6
+
6
+
4
4
4
3
B.19 annotation values: plain literals
6
+
6
+
6
+
6
+
6
+
6
+
6
+
4
4
4
7
B.20 annotation values: feature structures
6
+
6
+
5
(+)
can be represented, but are not created by default conversion
5
(+)
no example known
6
+
6
+
e.g., using OLiA
6
+
6
+
tbc
6
+
tbc
6
+
9
score 1.5 1.5 -1 -2 4 7.5 11 1 1 2 0

C. Levels of linguistic analysis: units of annotation

About this category
For a particular level of linguistic analysis, the vocabulary should
  1. define (+) or address ((+)) the relevant units of annotation, and
  2. permit navigation among units of annotation (e.g., retrieving the next annotation of the same kind)
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
C.1 Word-level annotations: word unit
6
+
nif:Word
6
+
nif:Word
6
+
nif:Word
6
+
5
(+)
no designated concept
5
(+)
powla:Terminal, but can be sub-token
6
+
6
+
6
+
6
+
9
C.2 Sentence-level annotation: sentence unit
6
+
nif:Sentence
6
+
nif:Sentence
6
+
nif:Sentence
6
+
5
(+)
no explicit data structure, but can be added
5
(+)
powla:Root, but this doesn't have to be sentential
3
(-)
tbc.
4
6
+
6
+
tbc.
6.5
C.3 morphology: morphological segments
3
(-)
no designated vocabulary, can be accessed as substrings, but not in all cases.
3
(-)
no designated vocabulary, can be accessed as substrings, but not in all cases.
2
-
6
+
5
(+)
no designated vocabulary, but can be added
5
(+)
no designated vocabulary, but can be added
5
(+)
no designated vocabulary, but can be modelled as segments
6
+
tbc.
2
-
tbc
4
0.5
C.4 syntax/text structure: node labels/types
5
(+)
5
(+)
predefined datatypes nif:Word, nif:Phrase, nif:Paragraph, etc.; but note that these do not describe nodes in the sense of LAF, but regions
5
(+)
5
(+)
predefined datatypes nif:Word, nif:Phrase, nif:Paragraph, etc.; but note that these do not describe nodes in the sense of LAF, but regions
5
(+)
phrase structures can only be expressed in combination with POWLA
2
-
no examples for phrase-level annotations
5
(+)
via user-provided subclasses of oa:Annotation
6
+
using an external vocabulary, OLiA
6
+
ISOcat
2
-
no syntax nodes
5
(+)
ISOcat, for syntax, but hard-wired data structures only
5
(+)
ISOcat, for text/discourse, but hard-wired data structures only
4
C.5 semantics: node labels/types
3
(-)
NIF supports entity linking, but no other form of semantic annotation, hence (-)
3
(-)
NIF supports entity linking, but no other form of semantic annotation, hence (-)
3
(-)
can be created from CoNLL-RDF data
2
-
morphology only
3
(-)
Web Annotation is widely used for entity linking, but it does not provide a designated vocabulary for entities, hence (-)
6
+
using an external vocabulary, OLiA
6
+
ISOcat
2
-
morphology only
2
-
tbc
6
+
tbc
-2
score 0 0 -1 3 0 0 1 1 2 1 0

D. Levels of linguistic analysis: sequential structure

About this category
For a particular level of linguistic analysis, the vocabulary should
  1. define (+) or address ((+)) the relevant units of annotation, and
  2. permit navigation among units of annotation (e.g., retrieving the next annotation of the same kind)
As for "navigation" relations between adjacent units: for different levels of linguistic annotation (e.g., morphology, word-level annotations), the vocabulary should provide an explicit means to identify the next (or preceding) unit of annotation. For pre-RDF vocabularies, this can be encoded in the structure of a file (e.g., in XML or CoNLL formats); for RDF vocabularies, it must be expressed as explicit triples.
In Web Annotation, this is absent, hence -. NIF defines such properties for a limited number of possible relations among concepts, e.g., nif:nextWord or nif:nextSentence, but not for others (e.g., morphs).
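The requirement that RDF vocabularies state sequence as explicit triples can be illustrated with a small sketch. It takes a list of unit identifiers in document order and emits one triple per adjacent pair. The `ex:` identifiers and the function name `next_triples` are hypothetical; nif:nextWord is the property named above.

```python
def next_triples(unit_ids, prop):
    """Return one (subject, property, object) triple per pair of adjacent units."""
    return [(left, prop, right) for left, right in zip(unit_ids, unit_ids[1:])]


# Hypothetical identifiers for the four words of one sentence, in document order.
words = ["ex:w1", "ex:w2", "ex:w3", "ex:w4"]
for subject, prop, obj in next_triples(words, "nif:nextWord"):
    print(f"{subject} {prop} {obj} .")
```

In an XML or CoNLL file the same information is carried by element or line order alone, which is why the pre-RDF formats can receive credit for this criterion without any dedicated vocabulary.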
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
D.1 Word-level annotation: sequence of words
6
+
nif:nextWord
6
+
nif:nextWord
6
+
nif:nextWord
6
+
2
-
no sequence properties whatsoever
5
(+)
words are not a designated datatype
5
(+)
tbc., implicitly via offsets?
5
(+)
tbc: implicitly via XML?
5
(+)
tbc: implicitly via XML?
5
(+)
tbc: implicitly via XML?
5.5
D.2 Sentence-level annotation: sequence of sentences
6
+
nif:nextSentence
6
+
nif:nextSentence
6
+
6
+
2
-
5
(+)
tbc
5
(+)
tbc
5
(+)
tbc
5
(+)
tbc
5
D.3 Morphology: sequence of morphological segments
2
-
2
-
2
-
6
+
2
-
5
(+)
tbc
6
+
3
(-)
tbc
3
(-)
tbc
-2.5
D.4 Syntax: discontinuous multi-word segments
2
-
NIF phrases are strings, i.e., necessarily continuous
2
-
NIF phrases are strings, i.e., necessarily continuous
3
(-)
no phrases, could be added when combined with POWLA
3
(-)
no examples known
3
(-)
no vocabulary for phrase-level structures, could be added, e.g., from POWLA
6
+
6
+
4
5
(+)
tbc., syntax is likely the reason for having such nodes in LAF
5
(+)
tbc., in analogy with SynAF?
-0.5
D.5 Syntax/text structure: sequence of elements within a phrase
3
(-)
depends on internal structure of the phrase, if these are words, this could be nif:nextWord, if these are phrases, this is undefined
3
(-)
depends on internal structure of the phrase, if these are words, this could be nif:nextWord, if these are phrases, this is undefined
2
-
no phrases
6
+
2
-
6
+
5
(+)
tbc
2
-
tbc
6
+
tbc
6
+
tbc., e.g., for discourse annotation
0.5
score -1 -1 0 4 -3 2 1 1 0 0 0

E. Levels of linguistic analysis: relational structure

About this category
(At least) two kinds of relations must be distinguished: relations where one node contains the other (hierarchical structure, e.g., phrase structure syntax) and relations between independent nodes (relational structure, e.g., dependency syntax, coreference, etc.).
Web Annotation does not provide any vocabulary for relations between annotations (or bodies). Offset-based selectors do, however, permit reasoning over offsets that can be (ab)used to indicate hierarchical structures. However, this does not hold between annotations, but between bodies only, hence (-), because hierarchical structures between co-extensional elements cannot be expressed.
Similarly, NIF encodes hierarchical structure by means of nif:subString, but note that this is 'not' a relation between LAF nodes (~ nif:AnnotationUnit) but between LAF regions (nif:String). Hence, this is (-).
Example for hierarchical relations between co-extensional elements:
```
[S [VF [NP [NE Peter ] ] ] [LK [V war ] ] [MF [ADVX [ADV nicht] [ADV zuhause] ] ] ]
```
German, 'Peter wasn't home', example inspired by
```
VF -> Peter (Vorfeld, syntactic position)
NP -> Peter (Noun Phrase, constituent at VF position)
NE -> Peter (Named Entity that constitutes the NP)
```
These are co-extensional, but their hierarchical organization (in the annotation) is meaningful and must not be reversed. In the following, NIF and WA both have (-) for hierarchical relations.
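The problem with co-extensional elements can be shown with a short sketch. It parses the bracketed example, computes the span of each node, and groups nodes that cover exactly the same span. Spans are counted in word positions rather than character offsets to keep the code short, and the function names are illustrative only.

```python
import re
from collections import defaultdict

TREE = "[S [VF [NP [NE Peter ] ] ] [LK [V war ] ] [MF [ADVX [ADV nicht] [ADV zuhause] ] ] ]"


def spans(bracketed):
    """Return (label, start, end, depth) per node; start and end are word positions."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", bracketed)
    stack, result, position = [], [], 0
    expect_label = False
    for token in tokens:
        if token == "[":
            expect_label = True
        elif token == "]":
            label, start, depth = stack.pop()
            result.append((label, start, position, depth))
        elif expect_label:
            stack.append((token, position, len(stack)))
            expect_label = False
        else:
            position += 1  # a terminal word advances the position
    return result


def co_extensional(bracketed):
    """Map each span shared by several nodes to their labels, outermost first."""
    groups = defaultdict(list)
    for label, start, end, depth in spans(bracketed):
        groups[(start, end)].append((depth, label))
    return {
        span: [label for _, label in sorted(members)]
        for span, members in groups.items()
        if len(members) > 1
    }


for span, labels in co_extensional(TREE).items():
    print(span, " > ".join(labels))
```

The ordering VF > NP > NE is recovered here only because the parser records the nesting depth of each node. A representation that keeps the spans alone would return the three labels as an unordered set, which is the information loss the (-) ratings refer to.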
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
E.1 Morphology: relations
2
-
no morphological segmentation
2
-
no morphological segmentation
3
(-)
no examples, via link with OntoLex-Morph?
3
(-)
via link with external vocabulary, e.g., OntoLex-Morph?
3
(-)
via link with OntoLex-Morph?
4
6
+
tbc
2
-
tbc
2
-
tbc
-4.5
E.2 Dependency syntax
5
(+)
not part of NIF core, but example implementation with OLiA for Stanford Parser
5
(+)
not part of NIF core, but example implementation with OLiA for Stanford Parser
6
+
native vocabulary
2
-
2
-
5
(+)
via external vocabulary, e.g., OLiA
5
(+)
via external vocabulary: ISOcat
2
-
6
+
tbc
3
(-)
tbc
0.5
E.3 Phrase structure syntax: hierarchical relations
3
(-)
labelled edges do not seem to be foreseen, must be encoded as phrase-level features
3
(-)
labelled edges do not seem to be foreseen, must be encoded as phrase-level features
3
(-)
via POWLA and OLiA
2
-
5
(+)
labels via external vocabulary, OLiA
6
+
via external vocabulary, ISOcat
2
-
6
+
tbc
4
-1
E.4 Phrase structure syntax: other relations
3
(-)
3
(-)
3
(-)
support for Penn-style encoding of traces and coindexing, these need to be manually resolved, though
2
-
2
-
no relational annotation foreseen
5
(+)
via external vocabulary, e.g., OLiA
5
(+)
via external vocabulary, ISOCat
2
-
6
+
4
-2.5
E.5 Semantics: relations
3
(-)
can be extended, e.g., a FrameNet extension, using a separate namespace
3
(-)
can be extended, e.g., a FrameNet extension, using a separate namespace
5
(+)
native support for semantic roles
2
-
2
-
no relational annotation foreseen
5
(+)
via OLiA
5
(+)
via ISOcat
2
-
3
(-)
tbc.
6
+
-2
score -0.5 -0.5 0 -4 -1 0 0 -4 1 1 0

F. Data structures for novel applications

About this category
Aside from the types of annotation listed below, RDF-based technology enables better support for phenomena for which no community standard currently exists. These are use cases that involve linking across documents. Note that requirements for "conventional annotations" (such as those covered by existing W3C, ISO or other standards) are listed below.
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
F.1 Intertextual relations
3
(-)
3
(-)
5
(+)
a partial linking functionality can be implemented using CoNLL-Merge, then corresponding tokens in different editions refer to the same nif:Word -- which may be an empty token, see alignment below
2
-
3
(-)
3
(-)
tbc.
2
-
2
-
2
-
tbc
-5.5
F.2 Collation and alignment
3
(-)
3
(-)
5
(+)
directed alignment only, encoded in CoNLL columns, i.e., CoNLL-RDF properties, cf. here
2
-
3
(-)
extensible for undirected alignment for more than 2 languages: "alignment" as subclass of oa:Annotation with multiple targets
3
(-)
can be extended
2
-
tbc., no examples known
2
-
2
-
tbc
2
-
tbc
-6.5
F.3 Links with lexical resources
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
4
4
4
4
3
F.4 Dialog annotation
2
-
Web Annotation allows annotating multiple targets simultaneously, but it lacks the vocabulary to create links between annotations (e.g., for marking turn shifts), hence (+)
2
-
Web Annotation allows annotating multiple targets simultaneously, but it lacks the vocabulary to create links between annotations (e.g., for marking turn shifts), hence (+)
2
-
only if annotated as plain text, no formal vocabulary
2
-
2
-
tbc., no example known
5
(+)
extensible
5
(+)
tbc., via ISOcat?
2
-
tbc
4
6
+
tbc
-4
score -0.5 -0.5 0.5 -2.5 0 0.5 0 -2 -1 0 0

G. Best practices beyond vocabulary

About this category
Sections A-F are primarily about standardization. Providing best practices is yet another requirement, but it is less a matter of standardization or vocabulary development than of applying existing conventions. Whether these are within the scope of the current discussion is yet to be decided.
NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
score 0 0 0 0 0 0 0 0 0 0 0

All categories

NIF ISO and derivatives
NIF_2.0 NIF_2.1 CoNLL-RDF Ligt Web_Annotation POWLA LAF MAF SynAF SemAF TEI score
A.1 RDF serialization
6
+
RDF
6
+
RDF
6
+
6
+
6
+
RDF, preference for JSON-LD
6
+
2
-
2
-
2
-
2
-
2
A.2 Extent of standardization
5
(+)
widely used community standard, and referred to in W3C standards
5
(+)
widely used community standard, and referred to in W3C standards
3
(-)
some usage, not standardized
2
-
6
+
regular W3C standard
5
(+)
this basically implements LAF
6
+
6
+
6
+
6
+
5
A.3 Documentation
6
+
for NIF 2.0
6
+
for NIF 2.0
6
+
3
(-)
partially documented only
6
+
5
(+)
on-line documentation dated, more up-to-date documentation in Cimiano et al. 2020, but this is not open access
5
(+)
via https://www.cs.vassar.edu/~ide/papers/ISO+24612-2012.pdf
6
+
link tbc.
2
-
unless link/information provided
2
-
unless link/information provided
3.5
A.4 IRI fragment identifiers for strings
6
+
6
+
2
-
provides token identifiers, not string identifiers. surface string lost in the underlying CoNLL format
2
-
5
(+)
3
(-)
not specified, to be provided by complementary vocabulary, e.g., NIF or WA
2
-
proprietary means of defining selectors
-1
A.5 Explicit selectors
6
+
6
+
2
-
2
-
6
+
5
(+)
recent publications recommend to rely on external vocabularies instead
1.5
A.6 Explicit context strings
6
+
6
+
2
-
2
-
2
-
2
-
-2
A.7 API specifications for web services
6
+
https://persistence.uni-leipzig.org/nlp2rdf/specification/api.html
6
+
https://persistence.uni-leipzig.org/nlp2rdf/specification/api.html
6
+
https://www.w3.org/TR/annotation-protocol/
3
A.8 Assign data categories
5
(+)
nif:oliaLink, pointing to OLiA
5
(+)
nif:oliaLink, pointing to OLiA
5
(+)
Publications and examples feature OLiA links, but only via rdf:type assignments; CoNLL-RDF is a NIF fragment, so, nif:oliaLink could be used
5
(+)
Not specified, but designed as a NIF fragment, so, nif:oliaLink could be used
3
(-)
No reference vocabulary defined, but annotations can be defined as subclasses of OLiA classes
5
(+)
LAF and other ISO/TC37 standards: pointing to ISOCat, no longer maintained, successor solutions are only emerging: CCR, DatCatInfo
5
(+)
5
(+)
5
(+)
3.5
A.9 Compatible with Web Annotation vocabulary
5
(+)
Hellmann et al. 2013 describe the use of NIF string URIs as Web Annotation targets
5
(+)
Hellmann et al. 2013 describe the use of NIF string URIs as Web Annotation targets
5
(+)
5
(+)
6
+
5
(+)
5
(+)
A partial reconstruction of LAF within WA has been described by Verspoor et al. 2012, but this does not seem to have been adopted in subsequent research nor evaluated by any third party.
5
(+)
not checked in detail, but cf. LAF
5
(+)
not checked in detail, but cf. LAF
5
(+)
not checked in detail, but cf. LAF
5.5
A.10 Compatible with NIF 2.0 core vocabulary
6
+
6
+
6
+
extends NIF vocabulary with data structures for one-word-per-line annotations, e.g., CoNLL, SketchEngine formats, see here; note the extension for the encoding of trees by means of POWLA
6
+
extends NIF vocabulary with specifications for morphology (interlinear glossed text); still under development, see here
3
(-)
conversion restricted to types of annotation supported by NIF, potential loss of information, e.g., distinction between target and annotation, resp., body and annotation is unclear
5
(+)
2
-
NIF systematically conflates LAF data structures, see A.11
3
A.11 Compatible with ISO standards
2
-
no generic data structures for linguistic annotation; conflates regions and nodes, see below
2
-
no generic data structures for linguistic annotation; conflates regions and nodes, see below
2
-
only a specific subset, possibly SynAF
2
-
only a specific subset, possibly MAF
5
(+)
regions ~ targets, nodes ~ annotation, annotation ~ body; but no linguistic data structures, combination has been explored by Verspoor et al. (2012)
6
+
6
+
6
+
6
+
0.5
B.1 Pointers to primary data
6
+
6
+
3
(-)
pointers to tokens, not primary data
3
(-)
pointers to data structures, not primary data
6
+
5
(+)
offset-based pointers possible, last publications recommend to use external vocabularies
6
+
XPointers
6
+
from LAF
6
+
from LAF
6
+
from LAF
6.5
B.2 Pointers: Vocabulary for explicit references
6
+
6
+
2
-
2
-
6
+
5
(+)
offset-based pointers possible, last publications recommend to use external vocabularies
2
-
no RDF statements
0.5
B.3 Pointers: User-provided selectors
2
-
would be considered out of scope
2
-
would be considered out of scope
2
-
6
+
provides the class oa:Selector. For different media, users can provide selector subclasses that encode the information an application needs to identify the annotated element
5
(+)
in combination with Web Annotation
3
(-)
actually, this is hard to tell, technically, this would be possible, but there don't seem to be documented examples
-2
B.4 Pointers: Support the annotation of continuous strings
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
6
+
10
B.5 Pointers: Annotation of discontinuous strings
2
-
annotations such as ext:offset_0_14_23_32 are not considered nor supported
2
-
annotations such as ext:offset_0_14_23_32 are not considered nor supported
3
(-)
not natively, but in combination with POWLA
3
(-)
in principle, discontinuous aggregates could exist, but there are no examples for that nor any model in current IGT annotation
6
+
multiple selectors can be combined into an aggregate selector
5
(+)
a terminal is offset-defined, so it apparently has to be a continuous string, but nonterminals can aggregate discontinuous sequences of terminals
5
(+)
the existence of discontinuous anchors needs to be confirmed, but nodes can aggregate other nodes regardless of their position
4
4
4
-1
B.6 Pointers: Annotation of media files
2
-
2
-
2
-
2
-
6
+
core feature
3
(-)
tbc.
-3.5
B.7 Pointers: Support the annotation of timestamps/timelines
2
-
2
-
2
-
2
-
5
(+)
users may provide a specialized selector for such data
5
(+)
if combined with Web Annotation
5
(+)
tbc.; CC: I'm pretty sure this has been addressed
4
4
4
-2.5
B.8 Pointers: standoff annotation
6
+
6
+
6
+
5
(+)
no examples known
6
+
6
+
6
+
4
4
4
6.5
B.9 Generic data structures for linguistic annotation: node != pointer
2
-
2
-
5
(+)
in combination with POWLA
3
(-)
unclear
6
+
pointer = target, node = annotation or body -- depending on interpretation
6
+
6
+
4
4
4
1
B.10 Generic data structures for linguistic annotation: zero nodes
5
(+)
the default encoding for annotations in NIF is by subclasses of nif:String
5
(+)
the default encoding for annotations in NIF is by subclasses of nif:String
5
(+)
for zero tokens in underlying CoNLL format
6
+
5
(+)
Web annotation requires some target.
6
+
from LAF
6
+
4
4
4
5
B.11.a Non-reified representation of edges
3
(-)
nif:subStringOf could be abused for this purpose, for hierarchical relations only, operates on regions/strings, not annotations. Default strategy in NIF is to introduce ad hoc properties, cf. NIF Stanford Core demo
3
(-)
nif:subStringOf could be abused for this purpose, for hierarchical relations only, operates on regions/strings, not annotations. Default strategy in NIF is to introduce ad hoc properties, cf. NIF Stanford Core demo
5
(+)
for semantic roles only
2
-
2
-
5
(+)
for hierarchical relations only
4
4
4
4
-2
B.12 Reified representation of edges (annotation relations)
2
-
2
-
2
-
only in combination with POWLA
2
-
2
-
6
+
5
(+)
"reification" is not directly applicable to non-RDF data
-3.5
B.13 Generic data structures for linguistic annotation: graphs
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
6
+
6
+
4
4
4
4.5
B.14 Generic data structures for linguistic annotation: annotations
3
(-)
(-) (nif:String), resp. (+) (nif:AnnotationUnit, not the default encoding, though)
3
(-)
(-) (nif:String), resp. (+) (nif:AnnotationUnit, not the default encoding, though)
3
(-)
not created by default
3
(-)
tbc
4
6
+
5
(+)
tbc
6
+
4
4
4
0.5
B.15 Generic data structures for linguistic annotation: annotation space ("tagset")
5
(+)
via OLiA
5
(+)
via OLiA
5
(+)
via OLiA
3
(-)
no example known
6
+
5
(+)
via OLiA
6
+
4
4
4
3.5
B.16 Provenance and confidence
3
(-)
NIF 2.0
3
(-)
NIF 2.0
3
(-)
3
(-)
3
(-)
3
(-)
2
-
no RDF extension possible
-4
B.17 Concurrent annotation
2
-
2
-
3
(-)
different properties in different columns, controlled by the user
3
(-)
no example known
2
-
6
+
6
+
4
4
4
-2
B.18 Sequence of annotation units
5
(+)
only for selected annotation units, e.g., nif:nextWord, nif:nextSentence
5
(+)
only for selected annotation units, e.g., nif:nextWord, nif:nextSentence
5
(+)
for native CoNLL-RDF annotation units
5
(+)
for native annotation units
2
-
6
+
6
+
4
4
4
3
B.19 annotation values: plain literals
6
+
6
+
6
+
6
+
6
+
6
+
6
+
4
4
4
7
B.20 annotation values: feature structures
6
+
6
+
5
(+)
can be represented, but are not created by default conversion
5
(+)
no example known
6
+
6
+
e.g., using OLiA
6
+
6
+
tbc
6
+
tbc
6
+
9
C.1 Word-level annotations: word unit
6
+
nif:Word
6
+
nif:Word
6
+
nif:Word
6
+
5
(+)
no designated concept
5
(+)
powla:Terminal, but can be sub-token
6
+
6
+
6
+
6
+
9
C.2 Sentence-level annotation: sentence unit
6
+
nif:Sentence
6
+
nif:Sentence
6
+
nif:Sentence
6
+
5
(+)
no explicit data structure, but can be added
5
(+)
powla:Root, but this doesn't have to be sentential
3
(-)
tbc.
4
6
+
6
+
tbc.
6.5
C.3 morphology: morphological segments
3
(-)
no designated vocabulary, can be accessed as substrings, but not in all cases.
3
(-)
no designated vocabulary, can be accessed as substrings, but not in all cases.
2
-
6
+
5
(+)
no designated vocabulary, but can be added
5
(+)
no designated vocabulary, but can be added
5
(+)
no designated vocabulary, but can be modelled as segments
6
+
tbc.
2
-
tbc
4
0.5
C.4 syntax/text structure: node labels/types
5
(+)
5
(+)
predefined datatypes nif:Word, nif:Phrase, nif:Paragraph, etc.; but note that these do not describe nodes in the sense of LAF, but regions
5
(+)
5
(+)
predefined datatypes nif:Word, nif:Phrase, nif:Paragraph, etc.; but note that these do not describe nodes in the sense of LAF, but regions
5
(+)
phrase structures can only be expressed in combination with POWLA
2
-
no examples for phrase-level annotations
5
(+)
via user-provided subclasses of oa:Annotation
6
+
using an external vocabulary, OLiA
6
+
ISOcat
2
-
no syntax nodes
5
(+)
ISOcat, for syntax, but hard-wired data structures only
5
(+)
ISOcat, for text/discourse, but hard-wired data structures only
4
C.5 semantics: node labels/types
3
(-)
NIF supports entity linking, but no other form of semantic annotation, hence (-)
3
(-)
NIF supports entity linking, but no other form of semantic annotation, hence (-)
3
(-)
can be created from CoNLL-RDF data
2
-
morphology only
3
(-)
Web Annotation is widely used for entity linking, but it does not provide a designated vocabulary for entities, hence (-)
6
+
using an external vocabulary, OLiA
6
+
ISOcat
2
-
morphology only
2
-
tbc
6
+
tbc
-2
D.1 Word-level annotation: sequence of words
6
+
nif:nextWord
6
+
nif:nextWord
6
+
nif:nextWord
6
+
2
-
no sequence properties whatsoever
5
(+)
words are not a designated datatype
5
(+)
tbc., implicitly via offsets?
5
(+)
tbc: implicitly via XML?
5
(+)
tbc: implicitly via XML?
5
(+)
tbc: implicitly via XML?
5.5
D.2 Sentence-level annotation: sequence of sentences
6
+
nif:nextSentence
6
+
nif:nextSentence
6
+
6
+
2
-
5
(+)
tbc
5
(+)
tbc
5
(+)
tbc
5
(+)
tbc
5
D.3 Morphology: sequence of morphological segments
2
-
2
-
2
-
6
+
2
-
5
(+)
tbc
6
+
3
(-)
tbc
3
(-)
tbc
-2.5
D.4 Syntax: discontinuous multi-word segments
2
-
NIF phrases are strings, i.e., necessarily continuous
2
-
NIF phrases are strings, i.e., necessarily continuous
3
(-)
no phrases, could be added when combined with POWLA
3
(-)
no examples known
3
(-)
no vocabulary for phrase-level structures, could be added, e.g., from POWLA
6
+
6
+
4
5
(+)
tbc., syntax is likely the reason for having such nodes in LAF
5
(+)
tbc., in analogy with SynAF?
-0.5
D.5 Syntax/text structure: sequence of elements within a phrase
3
(-)
depends on the internal structure of the phrase: if its elements are words, this could be nif:nextWord; if they are phrases, this is undefined
3
(-)
depends on the internal structure of the phrase: if its elements are words, this could be nif:nextWord; if they are phrases, this is undefined
2
-
no phrases
6
+
2
-
6
+
5
(+)
tbc
2
-
tbc
6
+
tbc
6
+
tbc., e.g., for discourse annotation
0.5
E.1 Morphology: relations
2
-
no morphological segmentation
2
-
no morphological segmentation
3
(-)
no examples, via link with OntoLex-Morph?
3
(-)
via link with external vocabulary, e.g., OntoLex-Morph?
3
(-)
via link with OntoLex-Morph?
4
6
+
tbc
2
-
tbc
2
-
tbc
-4.5
E.2 Dependency syntax
5
(+)
not part of NIF core, but example implementation with OLiA for Stanford Parser
5
(+)
not part of NIF core, but example implementation with OLiA for Stanford Parser
6
+
native vocabulary
2
-
2
-
5
(+)
via external vocabulary, e.g., OLiA
5
(+)
via external vocabulary: ISOcat
2
-
6
+
tbc
3
(-)
tbc
0.5
E.3 Phrase structure syntax: hierarchical relations
3
(-)
labelled edges do not seem to be foreseen, must be encoded as phrase-level features
3
(-)
labelled edges do not seem to be foreseen, must be encoded as phrase-level features
3
(-)
via POWLA and OLiA
2
-
5
(+)
labels via external vocabulary, OLiA
6
+
via external vocabulary, ISOcat
2
-
6
+
tbc
4
-1
E.4 Phrase structure syntax: other relations
3
(-)
3
(-)
3
(-)
supports Penn-style encoding of traces and coindexing, though these need to be resolved manually
2
-
2
-
no relational annotation foreseen
5
(+)
via external vocabulary, e.g., OLiA
5
(+)
via external vocabulary, ISOcat
2
-
6
+
4
-2.5
E.5 Semantics: relations
3
(-)
can be extended, e.g., a FrameNet extension, using a separate namespace
3
(-)
can be extended, e.g., a FrameNet extension, using a separate namespace
5
(+)
native support for semantic roles
2
-
2
-
no relational annotation foreseen
5
(+)
via OLiA
5
(+)
via ISOcat
2
-
3
(-)
tbc.
6
+
-2
F.1 Intertextual relations
3
(-)
3
(-)
5
(+)
partial linking functionality can be implemented using CoNLL-Merge; corresponding tokens in different editions then refer to the same nif:Word, which may be an empty token (see alignment below)
2
-
3
(-)
3
(-)
tbc.
2
-
2
-
2
-
tbc
-5.5
F.2 Collation and alignment
3
(-)
3
(-)
5
(+)
directed alignment only, encoded in CoNLL columns, i.e., CoNLL-RDF properties, cf. here
2
-
3
(-)
extensible for undirected alignment for more than 2 languages: "alignment" as subclass of oa:Annotation with multiple targets
3
(-)
can be extended
2
-
tbc., no examples known
2
-
2
-
tbc
2
-
tbc
-6.5
F.3 Links with lexical resources
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
5
(+)
4
4
4
4
3
F.4 Dialog annotation
2
-
Web Annotation allows annotating multiple targets simultaneously, but it lacks the vocabulary to create links between annotations (e.g., for marking turn shifts), hence (+)
2
-
Web Annotation allows annotating multiple targets simultaneously, but it lacks the vocabulary to create links between annotations (e.g., for marking turn shifts), hence (+)
2
-
only if annotated as plain text, no formal vocabulary
2
-
2
-
tbc., no example known
5
(+)
extensible
5
(+)
tbc., via ISOcat?
2
-
tbc
4
6
+
tbc
-4
score 3.5 3.5 -1 -4 2.5 11 14 -1.5 4.5 5.5 0
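
The per-criterion scores in the survey appear to be the sum of the Key's default weights over the cells of a row. The sketch below reproduces that arithmetic for two rows (B.19 and C.2). It is an illustration, not the stylesheet's actual logic: the names `WEIGHTS` and `row_score` are hypothetical, and treating a cell without a code as contributing 0 is an assumption inferred from the row totals.

```python
# Default weights as stated in the survey Key.
WEIGHTS = {"+": 1.0, "(+)": 0.5, "(-)": -0.5, "-": -1.0}


def row_score(codes):
    """Sum the weights of one criterion's cells.

    None marks a cell with no code; it is assumed to contribute 0.
    """
    return sum(WEIGHTS[code] for code in codes if code is not None)


# B.19 annotation values: plain literals (seven "+" cells, three without a code)
b19 = ["+"] * 7 + [None] * 3

# C.2 Sentence-level annotation: sentence unit
c2 = ["+", "+", "+", "+", "(+)", "(+)", "(-)", None, "+", "+"]

print(row_score(b19))  # 7.0
print(row_score(c2))   # 6.5
```

Both results match the scores listed for those rows (7 and 6.5). Criteria that assign their own meaning to the four codes, as the Key allows, may need different weights.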