Sapienta

SAPIENT Automation (SAPIENTA) aims to help researchers process scientific papers faster and extract the information they are interested in. The project does this by automatically recognising Core Scientific Concepts (CoreSCs) such as Motivation, Method, Result, and Conclusion in papers and using them to generate automatic summaries.
The goal of the SAPIENTA project is to determine how useful it is to annotate CoreSCs (e.g. Goal, Experiment, Method, Result, Conclusion) in scientific papers. We therefore evaluated the outcomes of the JISC-funded ART project (completed March 2009) to assess the added benefit of annotating CoreSCs in papers. The ART project produced a corpus of 265 papers (> 1 million words) from physical chemistry and biochemistry annotated with such concepts, as well as SAPIENT, a web annotation tool that allowed experts to annotate the papers manually. We have automated the annotation of CoreSCs and delivered the SAPIENTA tool for this purpose, training and testing it on the ART corpus. We have also used the automatically annotated CoreSCs to create automatic summaries, which were evaluated by chemistry experts.
One of the main objectives of the ART project was to create a tool that would enable manual annotation of scientific papers with semantic information about the key components of a paper describing a scientific investigation. To this end, the SAPIENT tool was created, together with a formalism representing the Core Information about Scientific Papers (CISP). CISP defines key generic scientific concepts and their properties, including Goal of investigation, Motivation, Object of investigation, Research Method, Experiment, Result, Observation, and Conclusion.
The CoreSC scheme implements these concepts, together with Hypothesis, Model, and Background, as a sentence-based annotation scheme with three layers. The first layer assigns one of the resulting 11 categories; the second layer annotates properties of the concepts (e.g. New, Old); and the third layer provides identifiers (conceptID) that link together instances of the same concept. For example, all the sentences pertaining to the same method are linked by the same conceptID (e.g. Met1).
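The three-layer scheme described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class and field names (`CoreSC`, `SentenceAnnotation`, `novelty`, `concept_id`, `group_by_concept`) and the example sentences are assumptions made for this illustration, not the actual storage format or API of SAPIENT or SAPIENTA.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CoreSC(Enum):
    """Layer 1: the 11 sentence-level categories named in the description."""
    GOAL = "Goal"
    MOTIVATION = "Motivation"
    OBJECT = "Object"
    METHOD = "Method"
    EXPERIMENT = "Experiment"
    RESULT = "Result"
    OBSERVATION = "Observation"
    CONCLUSION = "Conclusion"
    HYPOTHESIS = "Hypothesis"
    MODEL = "Model"
    BACKGROUND = "Background"


@dataclass(frozen=True)
class SentenceAnnotation:
    """One annotated sentence (hypothetical representation)."""
    text: str
    category: CoreSC                  # layer 1: concept category
    novelty: Optional[str] = None     # layer 2: property, e.g. "New" or "Old"
    concept_id: Optional[str] = None  # layer 3: identifier, e.g. "Met1"


def group_by_concept(annotations):
    """Collect sentences that share a conceptID (layer 3)."""
    groups = defaultdict(list)
    for ann in annotations:
        if ann.concept_id is not None:
            groups[ann.concept_id].append(ann)
    return dict(groups)


# Made-up sentences, used only to show the linking behaviour.
sentences = [
    SentenceAnnotation("Spectra were recorded at room temperature.",
                       CoreSC.METHOD, "Old", "Met1"),
    SentenceAnnotation("The yield increased.", CoreSC.RESULT, None, "Res1"),
    SentenceAnnotation("The same spectrometer was used throughout.",
                       CoreSC.METHOD, "Old", "Met1"),
]
linked = group_by_concept(sentences)
```

Here `linked["Met1"]` holds both method sentences, mirroring how a shared conceptID ties together all sentences that describe the same method.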

Resource Type: 
Parent organization: 
ART project
Supporting agency: 
JISC
Grant: 
PMID: