GREC Corpus

The GREC corpus is a semantically annotated corpus of 240 MEDLINE abstracts (167 on the subject of E. coli species and 73 on the subject of the Human species). It is intended for training information extraction (IE) systems and resources that extract events from biomedical literature.

The corpus has been manually annotated by biologists with events relating to gene regulation. Each event is centered on either a verb (e.g. transcribe) or a nominalized verb (e.g. transcription). Annotation consists of identifying, as exhaustively as possible, the structurally related arguments of that verb or nominalized verb within the same sentence. Each event argument is then assigned the following information:

* A semantic role from a fixed set of 13 roles which are tailored to the biomedical domain.
* A biomedical concept type (where appropriate).

As a simple example, consider the following sentence:

The narL gene product activates the nitrate reductase operon

The sentence contains a single event centered on the verb activates, with two arguments:

* The narL gene product
* the nitrate reductase operon

The argument The narL gene product is assigned the semantic role AGENT and the biological concept Protein, whilst the argument the nitrate reductase operon is assigned the semantic role THEME and the biological concept Operon.

Other types of argument include:

* LOCATION e.g. In Escherichia Coli glnAP2 may be activated by NifA
* MANNER e.g. cpxA gene increases the levels of csgA transcription by dephosphorylation of CpxR
* CONDITION e.g. Strains carrying a mutation in the crp structural gene fail to repress ODC and ADC activities in response to increased cAMP

The corpus is available for download in two formats:

* A standoff format based on the BioNLP'09 Shared Task format
* An XML format based on the GENIA event annotation format
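The annotation scheme described above (an event trigger plus arguments, each carrying a semantic role and an optional concept type) can be pictured with a small in-memory model. The following Python sketch is illustrative only: the class names `Argument` and `Event` and the helper `by_role` are hypothetical, and the structure does not reproduce the actual standoff or XML file layout of the corpus. It simply encodes the narL example sentence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    """One event argument (hypothetical model, not the corpus file format)."""
    text: str                      # the argument span as it appears in the sentence
    role: str                      # semantic role, e.g. "AGENT" or "THEME"
    concept: Optional[str] = None  # biomedical concept type, where appropriate


@dataclass
class Event:
    """An event centered on a verb or nominalized verb."""
    trigger: str
    arguments: List[Argument] = field(default_factory=list)

    def by_role(self, role: str) -> List[Argument]:
        """Return all arguments annotated with the given semantic role."""
        return [arg for arg in self.arguments if arg.role == role]


sentence = "The narL gene product activates the nitrate reductase operon"

event = Event(
    trigger="activates",
    arguments=[
        Argument("The narL gene product", role="AGENT", concept="Protein"),
        Argument("the nitrate reductase operon", role="THEME", concept="Operon"),
    ],
)

# The trigger and every argument are found within the same sentence.
assert event.trigger in sentence
assert all(arg.text in sentence for arg in event.arguments)

for arg in event.arguments:
    print(f"{arg.role}: {arg.text} ({arg.concept})")
```

Making `concept` optional mirrors the statement that a concept type is assigned only where appropriate; arguments such as MANNER or CONDITION could be stored with `concept=None`.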

Parent organization: National Centre for Text Mining
Supporting agency: JISC
PMID: 19852798