PDFX

A fully automated PDF-to-XML conversion service for scientific articles. It takes a full-text PDF article as input and outputs the hierarchy of its distinct logical elements in XML format.

The elements that PDFX can currently extract are:
* Front Matter
** title
** abstract
** author
** author footnote
* Body Matter
** body text
** h1
** h2
** h3
** image
** table
** figure/table caption
** figure/table reference
** bibliographic item
** bibliographic reference (citation)
* Extras
** header
** footer
** side note
** page number
** email
** URI

Note: This system has been designed for processing scientific articles. While virtually any PDF file is acceptable as input, the quality of the output may be degraded for other document types, e.g. entire books, slide presentations, or spreadsheets and other strictly tabular data.

There are two ways in which you can use PDFX:
* via a web browser
* via any other HTTP client, such as the curl command-line tool
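As a rough illustration of the second option, the sketch below uses Python's standard library as the HTTP client. This entry does not give the service URL or describe the request format, so everything specific here is an assumption: `PDFX_URL` is a placeholder, the function names are hypothetical, and sending the raw PDF bytes in a POST body with an `application/pdf` content type is a guess at a plausible interface, not the documented one.

```python
import urllib.request

# Placeholder endpoint (assumption): the real service URL is not stated in this entry.
PDFX_URL = "https://pdfx.example.org/"


def build_pdfx_request(pdf_bytes: bytes, url: str = PDFX_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying the raw PDF bytes.

    Assumption: the service accepts the PDF as the request body.
    """
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def convert_pdf(pdf_path: str, url: str = PDFX_URL) -> bytes:
    """Send a PDF file to the service and return the response body (expected to be XML)."""
    with open(pdf_path, "rb") as fh:
        request = build_pdfx_request(fh.read(), url)
    # urlopen follows redirects by default.
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a real endpoint substituted for the placeholder, `convert_pdf("article.pdf")` would return the response bytes, which could then be written to a file or passed to an XML parser.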

Resource Type: 
Parent organization: Utopia Documents
Supporting agency: 
Grant: 
PMID: