The purpose of this working group is to create a practical, shared formalism using semantic web standards for scientific experimental protocols and data generated by these protocols. Our work is centered on the following competency questions:

  1. The formulation should be able to describe the workflow of a protocol at multiple levels: (i) a full description of every step to permit 'design hacking' at the level of a researcher's lab notebook, (ii) minimal information representations that enable reproduction, (iii) a high-level representation to permit correct interpretation of data. The different levels of representation should interoperate seamlessly.

  2. It should be possible to use this formulation to fully document, and possibly automate, the execution of an experiment.

  3. The approach should be able to generate, from the protocol, a structured representation of experimental data consisting of measurements indexed by parameters and by the metadata of the study. Data should thus be given semantics 'at birth' (at the point of primary data collection or production), so that measurements are taken in the context of their semantics and are never separated from them.

  4. This structured data representation should be able to act as input to (i) workflow computations, (ii) reasoning engines and (iii) representations of claims that cite evidence (e.g., Nanopublications, Micropublications, statements from the SEE or SEPIO frameworks).

  5. Our approach should be implementable as practical tools that leverage and integrate well into existing systems and existing standards for data and experimental design.  

  6. It should be possible to generate these representations from (i) existing structured data using information integration and (ii) written descriptions in papers using natural language processing.  
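
As a rough illustration of competency question 3, the sketch below treats a protocol as a generator of data records: the protocol declares its parameters and its measured variable, and every measurement is created together with the parameter values and protocol identifier that give it meaning. All class names, field names, identifiers and values here (`Protocol`, `Measurement`, `ex:growth-assay`, the parameter grid) are hypothetical and chosen only for illustration; the `placeholder_reading` function stands in for a real instrument reading and does not represent real data.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Protocol:
    """Hypothetical protocol description: parameters plus one measured variable."""
    identifier: str
    parameters: Dict[str, Tuple]  # parameter name -> values to be tested
    measured_variable: str
    unit: str


@dataclass(frozen=True)
class Measurement:
    """A value that carries its semantics from the moment it is created."""
    protocol_id: str
    variable: str
    unit: str
    parameters: Tuple[Tuple[str, object], ...]  # the index of this measurement
    value: float


def design_points(protocol: Protocol) -> Iterator[Dict[str, object]]:
    """Enumerate every combination of parameter values declared by the protocol."""
    names = sorted(protocol.parameters)
    for combo in product(*(protocol.parameters[n] for n in names)):
        yield dict(zip(names, combo))


def collect(protocol: Protocol,
            measure: Callable[[Dict[str, object]], float]) -> List[Measurement]:
    """Run `measure` at each design point and wrap the result with its context."""
    return [
        Measurement(
            protocol_id=protocol.identifier,
            variable=protocol.measured_variable,
            unit=protocol.unit,
            parameters=tuple(sorted(point.items())),
            value=measure(point),
        )
        for point in design_points(protocol)
    ]


def placeholder_reading(point: Dict[str, object]) -> float:
    """Stand-in for an instrument reading; always returns 0.0."""
    return 0.0


protocol = Protocol(
    identifier="ex:growth-assay",  # hypothetical identifier
    parameters={"temperature_c": (25, 37), "drug_dose_um": (0, 10)},
    measured_variable="optical_density",
    unit="OD600",
)
records = collect(protocol, placeholder_reading)
```

Because each record is indexed by its parameter values, the resulting list can be handed to downstream workflow or reasoning tools without consulting a separate description of the experiment.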

 This approach differs from existing methods in that it seeks to define a protocol as a generative model for data, based on an examination of the structure of variables and metadata reported in papers. Our goal is not to create a standard or an ontology, but a practical formalism that extends existing best practices (i.e., the PROV framework). If we are successful, we hope to enable the automation of experimental science, both in terms of experimental execution and understanding. This would accelerate the research cycle and improve our ability to build machine intelligence into our tools.
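
To make the link to PROV concrete, the following is a minimal sketch of how one protocol execution and the dataset it produces might be described with terms from the W3C PROV-O vocabulary, serialized by hand as N-Triples. It is not the working group's formalism. The `http://example.org/` namespace and the resource names are hypothetical, and relating the run to the protocol with `prov:used` is a simplifying assumption (PROV-O also offers qualified associations with `prov:hadPlan` for this purpose).

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration only


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_run(protocol: str, run: str, dataset: str) -> list:
    """Describe a protocol, one execution of it, and the dataset that results."""
    return [
        triple(EX + protocol, RDF_TYPE, PROV + "Plan"),
        triple(EX + run, RDF_TYPE, PROV + "Activity"),
        triple(EX + dataset, RDF_TYPE, PROV + "Entity"),
        # Simplifying assumption: the run "used" the protocol as its plan.
        triple(EX + run, PROV + "used", EX + protocol),
        triple(EX + dataset, PROV + "wasGeneratedBy", EX + run),
    ]


lines = describe_run("protocol/growth-assay", "run/001", "dataset/001")
print("\n".join(lines))
```

A formalism that extends PROV would add, on top of statements like these, the parameters and variables that the protocol declares.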


  1. Development of a lightweight practical formalism for protocols and data using Semantic Web standards. 
  2. Curation of several medium-sized datasets in well-defined scientific use cases to demonstrate feasibility.
  3. Creation of an evaluation framework for practical aspects of modeling, curation and use of these data structures.
  4. Demonstration of the usage of data curated with this methodology by other working groups and communities.  


We are currently in the planning phase of establishing the working group. 

Deliverables and Timeline

We envisage an 18-month project development process to develop prototypes, generate interest and create a stable development platform for this work.


