Decision Trees: Licenses, Attribution, Provenance, Credit and Glitches


During our discussions about how to conduct Scholarly Commons-compliant work using current infrastructures, our group has repeatedly returned to several standing issues in scholarly communication, such as “Attribution vs Credit vs Provenance”. That is, when we assert that all the entities and actions that exist or take place within the Scholarly Commons should be Open, FAIR and Citable, why are we requiring these characteristics? Attribution is the hallmark of scholarship: statements and works are attributed so that credit can be assigned and provenance and responsibility can be determined. Explicit attribution to a stable identity, when possible, is also an important safeguard against malicious or irrelevant works within the Commons, e.g., spam. While the principles as currently written leave any rewards for attribution outside of the commons, the idea that all actions and works within the commons can be traced to a stable identity is central to it.

In many of our discussions, we have conflated the concepts mentioned in the title: attribution, provenance and credit. The need for attribution in the commons is covered in the previous paragraph. Attribution both acknowledges the creator of a work and implies responsibility. Provenance literally means something’s origin. The FAIR principles include provenance as a baseline characteristic and recognise that scholarly objects need provenance in order to be reliably assessed and interwoven into the overall knowledge canon. But provenance can be provided in the absence of attribution, e.g., for objects that have no individual who could be identified as the “author” (think tablets from ancient Sumeria, for instance). Citation is the practice of providing "a quotation from or reference to a book, paper, or author, especially in a scholarly work." A well-constructed citation will include the identifier of the object or work cited, but also elements of attribution in the metadata. So the three are related, but not identical. Ideally, the commons supports an easy and reliable way to achieve all three as is appropriate for any object or situation in the commons.

Credit is defined as: “publicly acknowledge someone as a participant in the production of (something published or broadcast)”. We currently credit people with having participated in a work through the acknowledgements, without formally including them as the originator or primary mover of the work. Because credit itself needs to be FAIR in the commons, we have envisioned that each scholarly work would be accompanied by a fully machine-readable credits list. Note that those who are in the credits list need not assume the overall responsibility for the work; they are simply (and FAIR-ly) acknowledged for having participated in its creation in some identified capacity.
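As a rough illustration of what a machine-readable credits list might look like, here is a minimal Python sketch. The field names (`name`, `orcid`, `role`, `responsible`), the work identifier and the contributor entries are all hypothetical placeholders chosen for this example; they do not represent an agreed schema. The `responsible` flag is one possible way to record that a credited participant need not assume overall responsibility for the work.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Credit:
    """One entry in a hypothetical machine-readable credits list."""
    name: str          # display name of the contributor
    orcid: str         # stable identifier for the contributor (placeholder values below)
    role: str          # the identified capacity in which they took part
    responsible: bool  # True only for those who answer for the work as a whole


def credits_to_json(work_id: str, credits: list[Credit]) -> str:
    """Serialise the credits for one work as JSON so that machines can read it."""
    record = {"work": work_id, "credits": [asdict(c) for c in credits]}
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder data: neither the work identifier nor the people are real.
credits = [
    Credit("A. Author", "0000-0002-1825-0097", "writing", True),
    Credit("B. Helper", "0000-0000-0000-0000", "data curation", False),
]
credits_json = credits_to_json("example-work-id", credits)
print(credits_json)
```

In this sketch, acknowledgement and responsibility are recorded separately, so a participant can be credited for a specific role without being listed as an originator of the work.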

As defined by FAIR (a core feature of the commons), the rights to reuse objects must be instantiated through the provision of a machine-readable license (licenses, like all objects in the commons, should be both human-understandable and machine-readable).
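To make the idea of a machine-readable license concrete, the following Python sketch shows how software might look up the reuse terms attached to an object. The record layout, the `KNOWN_LICENSES` table and the `reuse_terms` function are assumptions made for illustration only; the license identifiers follow the SPDX short-identifier convention and the URLs point to the human-readable Creative Commons deeds.

```python
# Hypothetical lookup table: SPDX-style identifier -> terms a machine can act on.
KNOWN_LICENSES = {
    "CC-BY-4.0": {
        "url": "https://creativecommons.org/licenses/by/4.0/",
        "attribution_required": True,
    },
    "CC0-1.0": {
        "url": "https://creativecommons.org/publicdomain/zero/1.0/",
        "attribution_required": False,
    },
}


def reuse_terms(record: dict) -> dict:
    """Return the reuse terms for an object, or fail if none can be determined."""
    license_id = record.get("license")
    terms = KNOWN_LICENSES.get(license_id)
    if terms is None:
        # Without a recognised license, reuse rights cannot be established.
        raise ValueError(f"No machine-readable license recognised: {license_id!r}")
    return {"object": record["id"], "license": license_id, **terms}


# Illustrative object metadata; the identifier is a placeholder.
record = {"id": "example-decision-tree", "license": "CC-BY-4.0"}
terms = reuse_terms(record)
print(terms["url"], terms["attribution_required"])
```

The same record serves both audiences: the identifier is what a machine checks, and the URL leads a human reader to the understandable version of the license.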

For much of our time in the group, we’ve effectively skirted round these issues because, by recommending the use of CC-BY licences for Scholarly Commons materials and outputs, attribution becomes legally required (although the ethical and cultural facets remain community concerns). However, in order to express the underlying rationale for ‘commons-ing’ as research practice, and thereby enable researchers and communities to understand and adapt the principles to their own domains, we recognise that we need to be more explicit about how we see the principles and practices inter-relating.

The decision trees are the first research objects we’ve attempted to produce in a commons-compliant way, and we worked to make them attributable in order that they comply with the FAIR and Citable aspects of the commons definition. But in trying to do this, we hit several snags.

First, we evaluated several software packages for creating the decision trees. Draw.io is a drawing program that lets you enter hypertext but doesn’t produce citable, stable versions. Zingtree is a decision tree package which, for reasons of cost and output types, is not commons-compliant. These sorts of barriers to practice led to our commissioning the PolicyModels project (which continues to be developed and which we’re planning to roll out at FORCE2017). This program is open source, commons-compliant, and provides a huge proportion of the Open, FAIR and Citable characteristics we’ve been seeking throughout the project. However, for many researchers there will still be a technological barrier to using it.

Second, we wanted to use the Decision Trees Working Group as the author/contributor entity, and also use our ORCIDs to do so. There is a facility to pull together a group within ORCID (https://share-my-id.orcid.org). However, this results in a page that you can only circulate to those you’re inviting to join your group. In other words, to see the Working Group ORCIDs page is to be invited to join the Working Group itself. So future work is clearly indicated in this area.

We’re aware that the issue of group/organisational identifiers has been debated and flagged before, and that there are genuine concerns about how these sorts of identifiers/lists/entities ought to be managed (ethically, who decides who has made what contribution, what is the bar to entry to a particular group, etc.). However, we’ve come to the conclusion that “the community” or “communities” themselves need to be able to decide where the credit goes - whether to the individual, the working group or to the project itself. We need to be able to ask ourselves: which contributions and individuals or entities does the community recognise? There are already instances of this - more prevalent in some fields than others. The New England Journal of Medicine, for instance, has a byline functionality that confers considerable kudos on whoever is name-checked. Accordingly, the decision as to who gets the byline is debated by the authors and could be based upon a number of considerations (e.g. career stage, name recognition by potential readers, degree of controversy of the paper, etc.).

“If you have to cut and paste, then the Commons is broken” Maryann Martone

We strongly believe that practising commons-compliant research should be do-able in a research “business as usual” context. Researchers should be able to do the “right” thing via infrastructures that facilitate them. Automatically associating awards with projects with people with datasets should be easy. Unfortunately, this is not yet the case.

While developing session proposals for FORCE2017, and thinking about community scrutiny and requirements, we have come up against a number of glitches in the current system(s). Time and time again, we hit points in the research workflow where we needed interoperability but instead were faced with effortful cutting and pasting or the dead end of free text. So far, we haven’t identified a set of interoperable platforms on which we could wholly rely.

How might things be improved? We propose that, for individuals, an identity system which uses open identification protocols and is governed by the community should provide the single sign-on for scholarly work. Currently, ORCID is the best-fit solution (throughout this process, we’ve been balancing the need to simplify things for scholars by settling on a specific system for a specific purpose with the fact that the commons itself shouldn’t be relying on any single entity - it is technologically enabled NOT technically driven). Although some might argue for Facebook or Google identities, these are account IDs with no uniqueness or permanence, and we also think it important to be able to distinguish between our work as scholars and our work in other contexts. Those platforms that allow groups to form should also provide a means to associate ORCIDs, provide a stable identity for that group and provide an API so that groups can assert authorship. If the group changes regularly, then the group itself needs to be versioned, just like any other scholarly object.
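The idea of a versioned group with associated ORCIDs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GroupVersion` class, its fields and the `with_member` method are hypothetical and do not correspond to any existing ORCID or platform API. The check-digit function follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers, and the sample iD is the one ORCID uses for its fictitious example researcher.

```python
import re
from dataclasses import dataclass

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the final character of an ORCID iD from its first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of an iD such as 0000-0002-1825-0097."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


@dataclass(frozen=True)
class GroupVersion:
    """Hypothetical immutable snapshot of a group's membership."""
    group_id: str
    version: int
    members: frozenset

    def with_member(self, orcid: str) -> "GroupVersion":
        """Return a new version including the member; older versions stay citable."""
        if not is_valid_orcid(orcid):
            raise ValueError(f"Not a valid ORCID iD: {orcid}")
        if orcid in self.members:
            return self  # membership unchanged, so no new version is needed
        return GroupVersion(self.group_id, self.version + 1, self.members | {orcid})


v1 = GroupVersion("example-working-group", 1, frozenset())
v2 = v1.with_member("0000-0002-1825-0097")
print(v1.version, len(v1.members), v2.version, len(v2.members))
```

Because each snapshot is immutable, a work authored by version 1 of the group keeps pointing at the membership as it stood at that time, even after the group changes.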

But even more important, and something that is being explored in WG4, is making it easier to capture provenance and cite relevant evidence in the appropriate places within a new work. Right now, the process is extraordinarily tedious and involves a lot of cutting and pasting, or deliberate import of reference metadata into reference managers, from which references then have to be inserted manually into the text. We need to consider next-generation technologies for making it easier to insert references and attribute actions and works within the Commons. Technologies such as open web annotation and blockchain may in time contribute to solutions that make it easier for us to automatically record provenance.

Although at time of writing the FORCE2017 programme has not yet been finalised, we are planning to bring these items to the delegates and invite critique, comment and further collaborations. This project has proved to be fascinating - though often challenging - to work on and we’d welcome your input. Even in the course of writing this blog post, we discovered new lines of enquiry and possible actions - including a decision tree for allocating byline credit and an eventual governance model that might evolve into a ‘Committee on Commons Ethics’.

Text mainly by Fiona Murphy and Maryann Martone, based upon the content of Decision Tree Working Group calls - especially with Danny Kingsley, Michael Bar-Sinai and Daniel S Katz.


About Fiona Murphy

After completing a DPhil in English Literature, Fiona held a range of scholarly publishing roles with Oxford University Press, Bloomsbury Academic and Wiley.

As Publisher for Earth and Environmental Sciences at Wiley, she began to specialise in emerging scholarly communications with particular emphasis on Open Science and Open Data....
