Markup

Mark Up (Annotation)

Annotation (sometimes called ‘tagging’ or ‘mark up’) is the process of manually or automatically adding information to text for a given purpose. As FORCE11 aims to bring together multiple different groups, we note that the purpose of mark up may differ across these communities. In the typical situation, humans (called ‘annotators’ or ‘taggers’) use appropriate interfaces to make and record interpretations of specific phenomena in text, so that machine learning algorithms can be trained on the results and later perform the same function on new text. In certain applications, particularly in the social sciences and biomedicine, annotation is performed in order to discover empirically the nature and range of variation of the phenomenon in question, or to record and tabulate all occurrences of the phenomenon, i.e., to mine the literature manually. Thus annotation is for some people primarily an activity of corpus creation to support machine learning, while for others it can equally be a method of theory development and discovery. In new types of enhanced publications, annotation also serves to link the content of a paper (manually and/or automatically) to other relevant content: a mention of a chemical compound, for example, might be linked to its structure in PubChem and other databases containing additional information (http://pubs.rsc.org/en/content/articlehtml/2008/mb/b718732g).

The essence of creating an annotation task is specifying exactly what phenomena/fragments should be annotated, selecting the annotation labels, and defining them clearly enough for annotators to understand. The results of annotation may be recorded within the text at each marked location (called ‘in-line annotation’) or in a separate file (called ‘standoff annotation’), in which case they must be accompanied by suitable addressing information to ensure alignment with the source.
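The difference between the two recording styles can be illustrated with a minimal sketch. The example sentence, the labels ‘Chemical’ and ‘Protein’, the XML-style tag syntax, and the helper names to_inline and to_standoff are all assumptions made for illustration; they do not come from any particular annotation tool. Character offsets stand in for the ‘addressing information’ mentioned above, and the sketch assumes annotations that neither overlap nor nest.

```python
import re

# Hypothetical example sentence; not taken from any real corpus.
SOURCE = "Aspirin inhibits the enzyme cyclooxygenase."

# Standoff annotation: records kept apart from the text and addressed by
# character offsets (start inclusive, end exclusive). Labels are illustrative.
STANDOFF = [
    {"start": 0, "end": 7, "label": "Chemical"},
    {"start": 28, "end": 42, "label": "Protein"},
]


def to_inline(text, annotations):
    """Embed standoff annotations in the text as XML-style tags."""
    pieces = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a["start"]):
        pieces.append(text[cursor:ann["start"]])
        span = text[ann["start"]:ann["end"]]
        pieces.append(f"<{ann['label']}>{span}</{ann['label']}>")
        cursor = ann["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


def to_standoff(inline):
    """Strip the tags again, returning the plain text and offset records."""
    pattern = re.compile(r"<(\w+)>(.*?)</\1>")
    pieces = []
    annotations = []
    cursor = 0        # position in the tagged string
    plain_length = 0  # length of the plain text rebuilt so far
    for match in pattern.finditer(inline):
        before = inline[cursor:match.start()]
        pieces.append(before)
        plain_length += len(before)
        label, span = match.group(1), match.group(2)
        annotations.append(
            {"start": plain_length, "end": plain_length + len(span), "label": label}
        )
        pieces.append(span)
        plain_length += len(span)
        cursor = match.end()
    pieces.append(inline[cursor:])
    return "".join(pieces), annotations


inline = to_inline(SOURCE, STANDOFF)
print(inline)
print(to_standoff(inline))
```

One design consequence is visible here: the standoff records are only valid as long as the source text is unchanged, since any edit to the text shifts the offsets, whereas in-line tags travel with the text but alter the source file itself.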

Annotation is not yet a well-grounded endeavor. Several foundational questions remain unanswered and potentially undermine work done to date (Hovy and Lavid, 2010). Nonetheless, a great deal of experience has been gathered and codified over the past few years. Some of the key issues (Hovy, 2010) are:

1) Choosing the Material to Annotate: The questions of coverage, balance, and representativeness are central both to annotation and to corpus creation in general. No phenomenon is present in all genres, domains, and eras in exactly the same way. Knowledge about these phenomena changes over time, perhaps changing the interpretation of content.

2) Choosing and Training Annotators: The question of who annotates arises when one cannot create an algorithm by which the phenomenon in question can be identified and classified automatically, so that some human insight is required. But how much human training is appropriate? How many annotators are needed for a given task? Is there some mark-up that the authors themselves can provide, if given the appropriate tools? For more formal annotation projects, and when annotation is cheap, managers generally hire ten to twenty individuals. For certain tasks, much of the work may be done by one or two annotators with little formal training.

3) Evaluating the Quality of Annotation: Measuring the quality of the results is probably the most difficult issue in annotation. For theory validation and corpus creation, several annotators must independently understand the phenomenon and perform their annotation consistently. But how many annotators? And what measures to apply? And what is an acceptable level of residual disagreement? Different tasks and domains can impose different requirements. The major measures are discussed in (Hovy, 2010).
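As an illustration of the kind of measure involved in the third issue, the sketch below computes Cohen's kappa, a widely used chance-corrected agreement statistic for two annotators: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items on which the annotators agree and p_e is the agreement expected by chance from each annotator's label frequencies. Whether this is among the measures discussed in (Hovy, 2010) is an assumption; the function name cohen_kappa and the ‘gene’/‘drug’ labels are invented for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # p_o: proportion of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # p_e: agreement expected if each annotator assigned labels at random
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch reports perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators for the same eight text fragments.
annotator_1 = ["gene", "gene", "drug", "drug", "gene", "drug", "gene", "drug"]
annotator_2 = ["gene", "gene", "drug", "gene", "gene", "drug", "drug", "drug"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # observed 0.75, expected 0.5 -> 0.5
```

In this invented example the annotators agree on six of eight items (0.75), but half of that agreement would be expected by chance, so kappa is 0.5. What counts as an acceptable value remains task-dependent, as noted above.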

Popular Annotation Services and Tools 

Several on-line annotation services exist, through which managers can crowdsource the process of annotation; examples are Mechanical Turk and CrowdFlower. Mechanical Turk is a service offered by Amazon.com. Annotation managers define the task, specify how much they wish to pay per annotation, provide the data, and upload money to Amazon. Amazon posts the task, collects the annotation decisions, forwards them to the manager, and pays the annotators. A growing number of workshops are devoted to the experiences of managers using Mechanical Turk (e.g., HLT-NAACL Workshop Proceedings, 2010).

CrowdFlower is a service increasingly popular in Europe that offers either complete management of the whole annotation process, including training annotators, or the do-it-yourself style of Mechanical Turk, and provides helpful analysis graphs and charts with the results. 

The QDAP centre at the University of Pittsburgh provides annotators and annotation services, using a tailor-made interface. This service is oriented toward Political Science work. QDAP is familiar with the now rather dated ATLAS.TI PoliSci annotation toolkit.

The Unstructured Information Management Architecture (UIMA) provides a framework in which one can define, build, obtain, and run so-called Analysis Engines that perform Annotations on text.   Casting one’s algorithms into UIMA can be onerous, but the following package from the University of Colorado is helpful: http://code.google.com/p/uutuc/.

Standards

Standards for language-based annotation formalisms are emerging; see the ISO Standards Working Group TC37 SC WG1-1 and (Ide et al., 2003).  However, as attested on the FORCE11 Tools and Resources page, there is no shortage of annotation methods still in development.  

 

References

Hovy, E.H. 2010. Annotation: A Tutorial.  Presentation and booklet presented at various conferences. 

Hovy, E.H. and J.M. Lavid. 2010. Towards a ‘Science’ of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics. International Journal of Translation Studies, 13–36.

Ide, N., L. Romary, and E. de la Clergerie. 2003. International Standard for a Linguistic Annotation Framework. Proceedings of HLT-NAACL'03 Workshop on The Software Engineering and Architecture