GENIA Project: Mining literature for knowledge in molecular biology

The GENIA project seeks to automatically extract useful information from texts written by scientists, to help overcome the problems caused by information overload. Although the methods are customized for the micro-biology domain, the basic methods are intended to generalize to knowledge acquisition in other scientific and engineering domains.

We are currently working on the key task of extracting event information about protein interactions. This type of information extraction requires combining many sources of knowledge, which we are now developing. These include a parser, ontology, thesaurus, and domain dictionaries, as well as supervised learning models.

The GENIA corpus is a collection of biomedical literature compiled and annotated within the scope of the GENIA project. The goal of the project is to develop text mining (TM) systems for the domain of molecular biology, and the corpus has been developed to provide reference material for the development of bio-TM systems. The corpus currently contains 1,999 Medline abstracts, which were collected using the three MeSH terms human, blood cells, and transcription factors. The corpus has been annotated with various levels of linguistic and semantic information.

The GENIA corpus includes the following:
* POS annotation
* Treebank
* Coreference annotation
* Term annotation
* Event annotation
* Relation annotation
* Cellular localization
* Disease-Gene association
* Pathway corpus

Available tools include:
* GENIA sentence splitter (GeniaSS): a sentence splitter optimized for biomedical texts. GeniaSS reads a text and splits it into sentences by inserting line breaks.
* GENIA tagger: part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text.
* AkaneRE: relation extraction (RE) for protein-protein interactions (PPI) and molecular events.
* XConc suite: a collection of XML-based tools integrated to support corpus development and annotation.
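
To illustrate the input/output behavior described for the sentence splitter (text in, one sentence per line out), here is a minimal Python sketch. It is not GeniaSS and does not reproduce its method: the function name split_sentences and the boundary rule (sentence-final punctuation, whitespace, then an uppercase letter or digit) are assumptions chosen for illustration, and a rule this naive will mis-split many abbreviations found in biomedical text.

```python
import re

# Hypothetical, naive boundary rule (an assumption for illustration only):
# sentence-final punctuation, then whitespace, then an uppercase letter or digit.
# This is NOT the method used by GeniaSS.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> str:
    """Return the text with a line break inserted at each guessed sentence boundary."""
    return _BOUNDARY.sub("\n", text.strip())


if __name__ == "__main__":
    sample = "NF-kappa B is activated in T cells. It binds to the IL-2 promoter."
    print(split_sentences(sample))
```

Running the script prints each guessed sentence on its own line, which mirrors the line-break output format described above.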

Resource Type: 
Parent organization: 
National Centre for Text Mining; University of Tokyo; Tokyo; Japan
Supporting agency: 
Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT); JST - Japan Science and Technology Agency
Grant: 
PMID: