A Java library and web service for extracting metadata and content from born-digital scientific articles. The system analyses the entire content of a PDF file containing a publication and attempts to extract information such as: the title of the article; journal information (title, etc.); bibliographic information (volume, issue, page numbers, etc.); authors and affiliations; keywords; abstract; bibliographic references; and the hierarchy of structured sections.

Resource Type: 
Parent organization: University of Warsaw; Warsaw; Poland
Supporting agency: National Centre for Research and Development (Poland)