Tools and Resources

COMING SOON TO FORCE11 - A catalog is being built that will allow our community to add, search for, and rate tools. Launching in October 2013.

Below are a few examples of tools and thoughts for advancing scientific communication. We hope this will be a living document to help you find the right toolkit to tackle this digital jungle. Maintaining a list of tools and exemplars of research documentation done well can help us use each other's work in the best possible way - asking our own questions, but - in terms of tools, at least - joining our fellow scholars in inventing this future.

New! Try out the new search function, brought to FORCE11 via the Neuroscience Information Framework.

Relevant papers and web articles can also be added to our FORCE11 Reading List in Mendeley by joining the FORCE11 group. You are welcome to add relevant articles, regardless of whether or not they are open access, but articles that are not open access should be tagged as such.

 

Alternative metrics

Author Identification

Annotation

Authoring tools

Citation analysis

Computational Linguistics/Text Mining Efforts

Data citation

Data repositories

Ereaders

Hypothesis/claim-based representation of the rhetorical structure of a scientific paper

Mapping initiatives between ontologies

Metadata standards and ontologies

Modular formats for science publishing

Open Citations

Open Data

Peer Review: New Models

Provenance

Publications and reports relevant to scholarly digital publication and data

Resource Management

Semantic publishing initiatives and other enriched forms of publication

Structured Digital Abstracts - modeling science (especially biology) as triples

Structured experimental methods and workflows

Text Extraction

Web Tools

 

Alternative metrics: Alternative ways of measuring impact

  1. Altmetric manifesto
  2. Total impact
  3. Publish or Perish (Harzing.com). Software that allows authors to perform citation analysis using Google Scholar. Calculates a variety of impact factors.
  4. Acumen: create a portfolio for academic evaluation. Currently in development.
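As an illustration of the kind of metric such tools compute, the h-index is the largest h such that at least h of an author's papers have h or more citations each. The sketch below uses the standard definition; it is not Harzing's implementation:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:      # this paper still clears the threshold
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

Tools like Publish or Perish compute this and related indices (g-index, contemporary h-index, and so on) from Google Scholar citation counts.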

Author Identification:

  1. I am not a Scientist I am a Number
  2. Open Researcher & Contributor ID (ORCID)

Annotation

  1. W3C Open Annotation Community Group
  2. Ciccarese P, Ocana M, Das S, Clark T. AO: An Open Annotation Ontology for Science on the Web. Paper at Bio-ontologies 2010 http://esw.w3.org/images/c/c4/AO_paper_Bio-Ontologies_2010_preprint.pdf.
  3. How to express and exchange annotations. https://github.com/nichtich/marginalia/wiki/Support-of-PDF-annotations.
  4. PDFX A PDF-to-XML converter for scientific articles via Utopia Documents.
  5. Peter Sefton on annotation: http://ptsefton.com/2010/11/05/towards-beyond-the-pdf-a-summary-of-work-weve-been-doing.htm/comment-page-1#id9.

Authoring Tools:

  1. Knowledge Blog – using WordPress for science.
  2. Evernote - Now with Tags
  3. Google Docs!
  4. David Argue's zebrafish HTML/JavaScript paper format: http://www.zfishbook.org/NGP/.
  5. The Scalar Project. The Alliance for Networking Visual Culture. The project explores new forms of scholarly publishing aimed at easing the current economic crisis faced by many university presses while also serving as a model for media-rich digital publication.
  6. Semantic MediaWiki
  7. Authorea:  Write your papers on the web
  8. Fiduswriter: an online collaborative editor made especially for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that the same text can later be published in multiple ways.
  9. IPython: a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document

Citation analysis

  1. Publish or Perish (Harzing.com). Software that allows authors to perform citation analysis using Google Scholar. Calculates a variety of impact factors.

Computational Linguistics/Text Mining Efforts

  1. AcroMine (NaCTeM, University of Manchester). Automatically determines the full forms of acronyms.
  2. Argumentative Zoning, work by Simone Teufel and others.
  3. Original work on zones.
  4. Current work in defining elements within Chemistry papers, with Colin Batchelor.
  5. Automatic recognition of sentence types in biomedical abstracts. Tsujii lab, University of Tokyo. Title, conclusion, method, objective, result. See MEDIE (advanced search) for a demo.
  6. GENIA (Tsujii Lab, University of Tokyo) and GREC (NaCTeM, University of Manchester). Textual corpora annotated with biomedical events – permit system training to identify and structure relevant information in biomedical documents automatically.
  7. Hypothesis identification at Xerox. The Xerox Integrated Parser is used to find key rhetorical statements in biology research papers.
  8. In-Context Summaries. The work of Stephen Wan of CSIRO, Sydney, providing in-browser summaries of referenced papers, weighted by the textual context of the in-text reference.
  9. Linking of biomedical Named Entities in documents to related database entries - such links are provided in the BioLexicon. Examples of search engines providing such links are MEDIE and UKPMC.
  10. Metaknowledge annotation of biomedical events. NaCTeM, University of Manchester.
  11. Annotation of interpretative information for biomedical events along 5 different dimensions: Knowledge Type (fact, analysis, observation, etc.), Certainty Level, Polarity, Manner and Source.
  12. OpenCalais. A web service provided by Thomson Reuters that creates semantic markup of submitted text. Good for terms relating to current events, commerce and politics. Weak for scientific terms. Check conditions of use – Thomson Reuters retains text submitted for its own purposes!
  13. REFLECT. European Molecular Biology Laboratory, Heidelberg. A web service that provides semantic markup for gene and protein names in submitted HTML documents, with links to relevant bioinformatics databases.
  14. U-Compare. An integrated text mining/natural language processing system based on the UIMA Framework, allowing documents to be processed by various text-mining tools.

Data citation

  1. Australian National Data Service has a nice page on data citation awareness: http://ands.org.au/guides/data-citation-awareness.html.
  2. David Shotton’s Data Citation Best Practice Discussion Document.
  3. Gary King on data sharing http://gking.harvard.edu/projects/repl.shtml.
  4. The challenges with tracking dataset reuse today, based on DOIs and paper-oriented tools: http://researchremix.wordpress.com/2010/11/09/tracking-dataset-citations-using-common-citation-tracking-tools-doesnt-work/.
  5. Universal Numerical Fingerprint (UNF)
  6. Micah Altman, Gary King, 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data", D-Lib 3(3/4). http://www.dlib.org/dlib/march07/altman/03altman.html.
  7. Oak Ridge National Laboratory Distributed Active Archive Center data citation policy http://daac.ornl.gov/citation_policy.html.
  8. DataCite.org: helping you find, access and reuse research data
  9. DataUp.org: software for documenting, managing and storing research data; assigns a DOI to each data set.
  10. Amsterdam Manifesto: principles of data citation developed by multiple stakeholders at Beyond the PDF 2

Data repositories

  1. re3data.org - Registry of Research Data Repositories

Ereaders:

  1. For some really useful articles on this issue from someone who does understand typography and design see Craig Mod's site, for example this one on Books: http://craigmod.com/journal/ebooks/.
  2. Or this on how the reading experience should work: http://craigmod.com/satellite/bad_ereaders/.

Hypothesis/claim-based representation of the rhetorical structure of a scientific paper

These projects all start from the assumption that a scientific paper is, at heart, a persuasive text that makes a number of claims backed by research data and references: the paper comprises a set of hypotheses supported by evidence in the form of included data or references to other work.

  1. aTags. DERI, 2009- now. aTags ("associative tags") are snippets of HTML that capture the information that is most important to you in a machine-readable, interlinked format. aTags works with any Web text and can store and connect any textual element that is highlighted in a browser.
  2. Cohere. KMI, 2007- now. The Cohere project, which builds on the earlier 'ClaiMaker' project, offers a web-based interface to create claims, hypotheses, or statements, and relate these to other claims using an open set of relationships. It is usable for science, but also for structuring online debates on other topics.
  3. Hypotheses in Biology. UvA, 2009. A methodology and set of proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance.
  4. HyBrow. Stanford, 2008. A prototype bioinformatics tool for designing hypotheses and evaluating them for consistency with existing knowledge
  5. HypER. 2009 – now. HypER is an ad-hoc group of researchers who all represent scientific communications as a set of hypotheses, with relations to evidence. It includes representatives of LiquidPub, Cohere, SWAN, SALT, SPAR, aTags and abcde work. The main focus of HypER has shifted to the W3C HCLS work on Scientific Discourse structures.
  6. SALT. DERI, 2008. SALT is a LaTeX-based authoring tool that allows authors to identify Rhetorical Structure Theory (RST) relations between sentences in their paper. It offers the author the opportunity to define main and secondary (satellite) sentences and create relations between them.
  7. SWAN. Alzheimer Research Forum and Massachusetts General Hospital / Harvard Medical School, 2006 – now. The SWAN Alzheimer Knowledgebase project adds a collection of hand-curated hypotheses and claims to a research paper, which are then related through a set of discourse relationships. These can be browsed, and relations between claims, as well as support networks for a specific claim, can be visualised.
  8. The SWAN Scientific Discourse Ontology is publicly available and has been harmonized with the SIOC and CiTO ontologies for wider use.

Mapping initiatives between ontologies

  1. SWAN/SIOC/CiTO alignment. 2010, HCLS SIG of W3C: Harmonization and alignment between three ontology systems of relevance to citations and rhetorical relationships between publications: SWAN, used for the SWAN project, SIOC, used to describe social media, and the SPAR ontologies CiTO (Citation Typing Ontology) and FaBiO (FRBR-aligned Bibliographic Ontology).
  2. NCBO BioPortal: http://bioportal.bioontology.org/.

Metadata standards and ontologies

  1. Bioinformatics Links Directory: http://www.bioinformatics.ca/links_directory/.
  2. Catalogue of standards and ontologies relevant to life sciences: http://www.biosharing.org/standards_view.
  3. MIBBI: Minimum Information for Biological and Biomedical Investigations: http://www.mibbi.org/.
  4. NCBO BioPortal: http://bioportal.bioontology.org/.
  5. Neuroscience Information Framework: http://www.neuinfo.org/nif/nifgwt.html?tab=registry.
  6. Ontology of Biomedical Investigation. A broad-based community effort to develop an ontology that provides a representation for biomedical experiments.
  7. Open Biological and Biomedical Ontologies: http://www.obofoundry.org/.
  8. Open Archives Initiatives: Object Reuse & Exchange (OAI-ORE): defines standards for the description and exchange of aggregations of Web resources
  9. OREChem project on the Experiment Ontology - there's a slightly out-of-date description of this idea at: http://www.aejournal.net/content/1/1/3
  10. SPAR (Semantic Publishing and Referencing Ontologies) http://purl.org/spar/.

Modular formats for science publishing
These propose greater granularity for the scientific paper, the 'smallest publishable unit' being smaller than the size of a full paper.

  1. ScholOnto: this is the precursor of the Cohere work; the relationship ontology is available in RDF.
  2. abcde format. Utrecht University, 2007. The abcde format is a proposal for a simple, structured format for conference papers in computer science, based on LaTeX. Each paper consists of three sections – Background, Contribution, and Discussion – plus two added elements: A = Annotation (Dublin Core annotation) and E = Entities (RDF-formatted entities of interest, including references). Core Sentences (no contribution to the acronym) are sentences marked up by the author as core elements; they can be extracted to form a structured abstract.
  3. The Annotation Ontology is an OWL vocabulary designed to support stand-off annotation of web documents, without requiring these documents to be under update control of the annotators. It is orthogonal to domain ontologies.
  4. 'Coarse-grained rhetorical structure'. Work done in the HCLS SIG of the W3C since 2009. This group aims to define a 'rhetorical structure' for scientific papers, for use in authoring or mark-up tools. They are working towards a definition of such a format; they have an intermediary proposal of their own and are beginning to compile an overview of existing publishers' proposals.
  5. LiquidPub. EU Project, U Trento and others (2008–2011). A 'liquid' format for science papers is proposed, consisting of a set of research objects connected by links.
  6. Modular Physics Paper. University of Amsterdam (1999). A modular form for Physics papers: by investigating a collection of papers, a more fine-grained structure for science papers and an extensive relationship taxonomy are proposed.
  7. Nanopublications. NBIC, the Netherlands Bioinformatics Centre: The notion of a nanopublication is basically a general scientific assertion, represented using controlled vocabularies as “triples” (subject-predicate-object) in the semantic-web standard RDF format, with additional meta-data concerning provenance.
  8. The Concept Web Alliance proposes to model scientific research as sets of triples (CWA Nanopublications, 2010).
  9. The definition of the format has been published (The Anatomy of a Nanopublication).
  10. See also The Value of Data, motivating the use of nanopublications in Nature Genetics.
  11. Push:  a journal dedicated to publishing original research on writing with source code
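The nanopublication pattern described above (a single assertion plus its provenance) can be sketched in Python. Real nanopublications are RDF named graphs using controlled vocabularies; the tuple representation and all names below are illustrative only:

```python
# Minimal sketch of a nanopublication: one assertion triple
# (subject-predicate-object) bundled with provenance triples about
# that assertion. Illustrative only -- real nanopublications are RDF
# named graphs using standard vocabularies.

def make_nanopub(subject, predicate, obj, author, date):
    """Bundle a single scientific assertion with provenance metadata."""
    assertion = [(subject, predicate, obj)]
    provenance = [
        ("assertion", "attributedTo", author),
        ("assertion", "generatedAt", date),
    ]
    return {"assertion": assertion, "provenance": provenance}

nanopub = make_nanopub(
    "GeneX", "is-associated-with", "DiseaseY",  # a hypothetical claim
    author="J. Researcher", date="2013-01-15",
)
print(nanopub["assertion"][0])
```

The design point is that the smallest publishable unit is a single machine-readable claim, citable and countable on its own, rather than a full narrative paper.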

Open Citations

  1. The Open Citation Corpus. University of Oxford, 2010 onward.
  2. A public RDF triplestore of biomedical literature citations encoded as Open Linked Data, linked using CiTO, the Citation Typing Ontology. Encoding references to some 3.4 million unique papers, representing >20% of all PubMed Central papers published between 1950 and 2010, including all the most highly cited papers in every biomedical field. Citation data freely available under a CC0 waiver from http://opencitations.net/data/ in a variety of formats including RDF and BibJSON. Hopefully soon to include data citations from the Dryad data repository.

Peer Review: new models

  1. F1000 Research. New on-line journal that employs a fast publication process followed by open peer review.
  2. Rubriq: New model for independent peer review where review is decoupled from journal submission;  rather review is done independently and can be shared across all journals and distribution venues. 
  3. Peerage of Science:  A new service for scientific peer review and publishing.  

Provenance

A key part of science is knowing the provenance of a paper, experiment, data item, etc. Provenance includes attribution, sources, experimental workflow, citations and quotes, i.e. who, what, when, where, why.

  1. A comprehensive review of provenance research: Moreau, L. (2010) The Foundations for Provenance on the Web. Foundations and Trends in Web Science, 2 (2--3). pp. 99-241. ISSN 1555-077X.
  2. Open Provenance Model - a model for the interoperable exchange of provenance information arising out of a series of Provenance Challenges focusing on understanding the compatibility and interchange of information between provenance systems
  3. Open Provenance Model Vocabulary: http://open-biomed.sourceforge.net/opmv/ns.html.
  4. W3C Provenance Working Group - follow-on from the group below. Will standardize an exchange format for provenance on the Web.
  5. Prov W3C Provenance Standard
  6. W3C incubator group on provenance - mission was to provide a state-of-the art understanding and develop a roadmap in the area of provenance for Semantic Web technologies, development, and possible standardization. Finishes Dec. 2010.
  7. Wf4Ever. This EU project has a strong workflow-provenance component.
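As a rough sketch of the 'who, what, when, where, why' idea above, provenance can be modelled as typed relations between artifacts, processes and agents. The relation names loosely echo the W3C provenance vocabulary; the graph contents and function are illustrative:

```python
# Toy provenance graph: artifacts (figures, data), processes (analyses)
# and agents (people, labs) linked by typed relations. Relation names
# loosely follow the W3C provenance vocabulary; data is illustrative.

relations = [
    ("figure1.png", "wasGeneratedBy", "plot-analysis"),
    ("plot-analysis", "used", "raw-data.csv"),
    ("plot-analysis", "wasAssociatedWith", "Dr. A"),
    ("raw-data.csv", "wasAttributedTo", "Lab B"),
]

def trace_back(artifact, rels):
    """Walk the graph to collect everything an artifact derives from."""
    found = set()
    frontier = [artifact]
    while frontier:
        current = frontier.pop()
        for subj, _pred, obj in rels:
            if subj == current and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

print(trace_back("figure1.png", relations))
```

A query like this answers the basic provenance question the systems above address at scale: given a result, which data, processes and people does it depend on?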

Publications and reports relevant to scholarly digital publication and data

  1. Charles Bailey's Scholarly Electronic Publishing Bibliography: http://www.digital-scholarship.org/sepb/sepb.html.
    The Scholarly Electronic Publishing Bibliography presents over 3,800 articles, books, and a limited number of other textual sources that are useful in understanding scholarly electronic publishing efforts on the Internet. It covers digital copyright, digital libraries, digital preservation, digital rights management, digital repositories, economic issues, electronic books and texts, electronic serials, license agreements, metadata, publisher issues, open access, and other related topics.
  2. Publishing Research Consortium list of links Publishing Research Links

The list of publications is no longer maintained here; please see the FORCE11 library in Mendeley.

Open Data

  1. Geoffrey Boulton, Michael Rawlins, Patrick Vallance, Mark Walport (2011). Science as a public enterprise: the case for open data. The Lancet, Volume 377, Issue 9778, Pages 1633 - 1635, 14 May 2011. doi:10.1016/S0140-6736(11)60647-8.
  2. Liz Lyon (2010). Open science in the data decade - article in Issue 20 of the Central Government edition of Public Service Review. http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#central-government-2010-04.
  3. Liz Lyon (2007). Dealing with Data: Roles, Rights, Responsibilities and Relationships - Consultancy Report. http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#2007-06-19.
  4. O'Donnell RP, Supp SR, Cobbold SM. (2010). Hindrance of conservation biology by delays in the submission of manuscripts. Conserv. Biol. 24 (2): 615-620. Epub 2010 Jan 11. http://www.ncbi.nlm.nih.gov/pubmed/20067489.
  5. Open Biology. The Royal Society has just launched Open Biology, its first fully open access journal. Open Biology is a rapid, open-access, peer-reviewed online journal publishing high quality research in cell biology, developmental and structural biology, molecular biology, biochemistry, neuroscience, immunology, microbiology and genetics. The Editor-in-Chief, Professor David Glover (FRS) from the University of Cambridge, aims to provide a journal with a fair and speedy review system, run by active, practicing scientists with high expertise in this area, allowing good papers to be published quickly.
  6. Sommers J (2010). The delay in sharing research data is costing lives. Nature Medicine 16 (7): 744. https://chordoma.box.net/shared/static/azpn8pxuzk.pdf.
  7. Denton Declaration:  An open data manifesto:  On May 22, 2012 at the University of North Texas, a group of technologists and librarians, scholars and researchers, university administrators, and other stakeholders gathered to discuss and articulate best practices and emerging trends in research data management.  This declaration bridges the converging interests of these stakeholders and promotes collaboration, transparency, and accountability across organizational and disciplinary boundaries

Resource Management

  1. Zotero

Semantic publishing initiatives and other enriched forms of publication

  1. Adventures in Semantic Publishing. Oxford University, 2009. A paper reporting a manually marked-up version of an epidemiological research paper in PLoS Neglected Tropical Diseases, with data enhancements, better browsing, reference linking and citation typing.
  2. Article of the Future. Cell, 2009 onwards. Tabbed and hyperlinked presentation of the article; Graphical Abstract and Highlights on the landing page
  3. Open Access journals published by Pensoft Journals come with semantic enhancements. Example: PhytoKeys.
  4. Project Prospect. Royal Society of Chemistry, 2009 onwards. RSC editors annotate compounds, concepts and data within the articles and link these to additional electronic resources such as biological databases.
  5. The Scalar Project. The Alliance for Networking Visual Culture. The project explores new forms of scholarly publishing aimed at easing the current economic crisis faced by many university presses while also serving as a model for media-rich digital publication.
  6. Semantic Biochemical Journal. 2010 onwards. Uses Utopia, an innovative PDF reader which allows enrichment of the PDF with interactive figures and active data.
  7. Biotea: the research paper as an interface to the Web of Data. Biotea has semantically processed the full-text, open-access subset of PubMed Central. The RDF model and resulting dataset make extensive reuse of existing ontologies and semantic enrichment services. The model, services, prototype, and datasets are available.
  8. FEBS Letters SDA, 2008 – now. The journal FEBS Letters adds curator-created triples to describe protein-protein interaction to every appropriate paper

Structured Digital Abstracts - modeling science (especially biology) as triples

Representing scientific information as sets of triples. There is a special interest in this representation within biology and life sciences. Some initiatives include:

  1. The Structured Digital Abstract, Seringhaus/Gerstein, 2008. This paper proposes to include a 'structured XML-readable summary of pertinent facts'.
  2. eagle-i: a distributed platform for creating and sharing semantically rich data. It is built around semantic web technologies and follows linked open data principles. In its current incarnation and operational deployment, eagle-i focuses on biomedical research resources.
  3. crowdLabs: a platform for sharing and executing computational tasks.

Structured experimental methods and workflows

  1. Investigation/Study/Assay (ISA). European Bioinformatics Institute and University of Oxford, 2009 – present. The ISA infrastructure is a general-purpose format and freely available desktop software suite targeted to curators and experimentalists that assists in management of experimental metadata, engages with minimum information checklists, ontologies and formats, particularly relating to genomics data for submission to international public repositories (e.g. ENA for genomics, PRIDE for proteomics and ArrayExpress for transcriptomics).
  2. Knowledge Engineering from Experimental Design (KEfED): A structured way of constructing 'observational assertions' based on statistical relationships from experiments. The model is general-purpose and forms a basis for reasoning over experimental data.
  3. myExperiment: a platform to create and exchange experimental workflow components.
  4. VisTrails: an open-source data analysis and visualization tool that supports the creation of documents whose results have deep captions that point to their provenance, and thus can be reproduced and verified. Provenance-rich results derived by VisTrails can be included in LaTeX, Wiki, Microsoft Word and PowerPoint documents.
  5. The NIH Neuroscience Information Framework has developed a registry of over 800,000 unique antibodies from the neuroscience literature, with sourcing and availability information, based on a semantic annotation pipeline supported by the Domeo web annotation toolkit.
  6. Workflows 4Ever (Wf4Ever): addresses some of the biggest challenges for the preservation of scientific workflows in data-intensive science, including: (a) the consideration of complex digital objects that comprise both their static and dynamic aspects, including workflow models, the provenance of their executions, and interconnections between workflows and related resources, (b) the provision of access, manipulation, sharing, reuse and evolution functions for these complex digital objects, (c) integral lifecycle-management functions for workflows and their associated materials. To address these challenges, the Wf4Ever project will investigate and develop technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines.

Text Extraction

  1. LA-PDFText. A PDF-to-XML converter for scientific articles from the Biomedical Knowledge Engineering group at the Information Sciences Institute.
  2. PDFX A PDF-to-XML converter for scientific articles via Utopia Documents.

Web tools

  1. Memento. Memento proposes a technical framework aimed at better integrating the current and the past Web. The framework adds a time dimension to the HTTP protocol and, inspired by content negotiation, introduces the notion of datetime negotiation.
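Datetime negotiation can be sketched in a few lines of Python: the client asks for a resource "as it was" at a given moment by sending an Accept-Datetime header, and a Memento TimeGate redirects to the closest archived snapshot. The header name and RFC 1123 date format come from the Memento framework (RFC 7089); the function name is illustrative, and no network request is made here:

```python
# Build the headers for a Memento-style datetime-negotiated request.
# The Accept-Datetime header carries an RFC 1123 date in GMT.
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when: datetime) -> dict:
    """Headers asking a TimeGate for the resource state nearest `when`."""
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}

headers = memento_headers(datetime(2010, 11, 9, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])  # -> Tue, 09 Nov 2010 00:00:00 GMT
```

These headers would be attached to an ordinary HTTP GET; the negotiation itself is handled server-side by the TimeGate.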