Data Citation Principles Glossary


 

Attribution:

(First used in principle 2)

Specification of terms of use of data, usually in the form of a license.

Legal attribution is founded on intellectual property rights and licenses as well as on strong normative values in the research community, and the data citations concern individual rights and norms of credit and publicity. Legal attribution is therefore distinguished in these principles from normative (scholarly) attribution, which is concerned with the incentives and systems of scholarly credit and evaluation (adapted from CoData 2013).

 

Citation:

(First used in preamble)

A formal structured reference to another scholarly published or unpublished work (adapted from https://www.jstage.jst.go.jp/article/dsj/12/0/12_OSOM13-043/_pdf).

In traditional print publishing, a "bibliographic citation" refers to a formal structured reference to another scholarly published or unpublished work. (This is in contrast to formal bibliometric terminology in which references are made, and citations received.) Typically, intra-document citation pointers to these structured references are marked and abbreviated. Each pointer is accompanied by the full bibliographic reference to the work, which appears in the bibliography or reference list, often following the end of the main text, and is called a "reference" or "bibliographic reference." Traditional print citations include "pinpointing" information, typically in the form of a page range that identifies which part of the cited work is being referenced.

The terminology commonly used for digital citation has come to differ from this older print usage. We adopt the more current usage in which "citation" is used to refer to the full bibliographic reference information for the object. The current usage leaves open the issue of the terminology used to describe the more granular references to data, including subsets of observations, variables, or other components and subsets of a larger data set. These granular references are often necessary in-text to describe the precise evidential support for a data table, figure, or analysis and are analogous to the "pin citation" used in the legal profession or the "page reference" used in citing journal articles. The term "deep citation" has been applied to granular citation to subsets of data.

 

Data:

(First used in preamble)

Any record which can be used to support a scholarly research argument, even if it may not be considered valid evidence in all disciplines. In the social sciences, data may include survey responses, interviews and historical documents. Source: modified from http://vso1.nascom.nasa.gov/vso/misc/vocab_2p3.pdf.

The term "data" as used in this document is meant to be broadly inclusive. In addition to digital manifestations of literature (including text, sound, still images, moving images, models, games, and simulations), digital data refers as well to forms of data and databases that are not self-describing -- that generally require the assistance of metadata, computational machinery and/or software in order to be useful, such as various types of laboratory data including spectrographic, genomic sequencing, and electron microscopy data; observational data, such as remote sensing, geospatial, and socio-economic data; and other forms of data either generated or compiled by humans or machines (adapted from CoData Report, 2013).

 

Dataset:

(First used in preamble)

"Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation). - From DataCite Business Models Principles http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf

 

Identifier and persistent identifier:

(First used in principle 6)

An identifier is an association between a character string and an object. Objects can be files, parts of files, names of persons or organizations, abstractions, etc. Objects can be online or offline. Character strings include URLs, serial numbers, names, addresses, etc. A "persistent identifier" is an identifier that is available and managed over time; it will not change if the item is moved or renamed. This means that an item can be reliably referenced for future access by humans and software (from http://n2t.net/ezid/home/understanding).
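The idea that a persistent identifier "will not change if the item is moved or renamed" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `IdentifierRegistry` class, the identifier string, and the example.org locations are hypothetical and do not describe how EZID or any real resolver is implemented.

```python
class IdentifierRegistry:
    """Toy resolver: associates an identifier string with an object's current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        # An identifier is assigned once and never reused for another object.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = location

    def move(self, identifier, new_location):
        # The object moves; the identifier string itself stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        return self._locations[identifier]


# Hypothetical identifier and locations, for illustration only.
registry = IdentifierRegistry()
registry.register("ark:/99999/fk4example", "https://old.example.org/data/survey.csv")
registry.move("ark:/99999/fk4example", "https://new.example.org/archive/survey-2013.csv")
resolved = registry.resolve("ark:/99999/fk4example")
```

A citation that records only the identifier keeps working after the move, because the association is managed by the registry and not by the cited location.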

 

Interoperability:

(First used in principle 8)

The ability to make systems and organizations work together (adapted from Wikipedia). Access to research data, as facilitated by data citations, requires technological infrastructure that is appropriately designed and based on interoperability best practices that include data quality control, security, and authorizations. Currently, interoperability at both the semantic and the infrastructure levels is important to ensure that data citations facilitate access to research data. However, organizations working to develop improved infrastructures that foster interoperability should widely communicate the standards, guidelines, and best practices that are being implemented; adopt standards for data documentation (such as metadata) and dissemination (data citations, including bidirectional links from data to publications and vice versa); and maintain an up-to-date knowledge of the evolution of not only the technologies implemented but also the best practices efforts being executed by the community of practice (adapted from CoData Report, 2013, Ch 5).

 

Machine-actionable:

(First used in introduction to principles)

Content that can be used and manipulated by computers (http://www.libraries.psu.edu/tas/jca/ccda/docs/tf-MRData3.pdf).

 

Metadata:

(First used in preamble)

Information about the data being tracked within a data system. Metadata typically conforms to a metadata information model. Metadata may include, for example, the name of the sensor used to collect the data or person who collected the data, where the data was collected, information about the units and dimensionality of the data, and other notes recorded by the investigator about how the data has been processed. Source: modified from http://vso1.nascom.nasa.gov/vso/misc/vocab_2p3.pdf.

Metadata is information (data) about the object and its disposition, such as the name of the object's creator, the date of creation, the target URL, the version of the object, its title, and so on. (from: http://n2t.net/ezid/home/understanding).
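As a rough illustration of metadata in machine-actionable form, the sketch below stores the elements named above (creator, date of creation, target URL, version, title) in a simple record and serializes it to JSON. The field names, the values, and the `missing_fields` helper are assumptions chosen for the example; they do not follow any particular metadata schema.

```python
import json

# Hypothetical record; field names and values are illustrative only.
metadata = {
    "creator": "Example Research Group",
    "created": "2013-06-01",
    "title": "Example Survey Responses",
    "version": "1.0",
    "target_url": "https://data.example.org/survey/v1",
}

REQUIRED = ("creator", "created", "title", "version", "target_url")


def missing_fields(record, required=REQUIRED):
    """Return the required field names that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# Serializing to JSON lets software exchange and read the record back unchanged.
serialized = json.dumps(metadata, sort_keys=True)
round_tripped = json.loads(serialized)
```

Because the record is structured, software can check it for completeness (`missing_fields(metadata)`) without a person reading it.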

 

Research object:

(First used in preamble)

Sharable, reusable digital objects that enable research to be recorded and reused (adapted from Wikipedia).

 

Scholarship:

(First used in preamble)

Serious formal study or research of a subject (adapted from Merriam-Webster Dictionary).  

 

Verification, provenance and fixity:

(First used in principle 7)

Verification means to reliably establish the relationship between the cited object of an original citation and a current object -- verification enables one to confirm that the data retrieved is the data cited. This is separate from persistence, which remains the responsibility of the archive, not the citation.

Types of verification information include fixity, which can be used directly to assess the integrity of specific content, and provenance, which provides information about parts of the chain of custody and/or processing to which the content was subject. Specific forms of citation verification include, but are not limited to: embedding fixity information in the citation itself; associating the citation with a surrogate (such as a landing page) where additional metadata, such as the data form, fixity, and final stage of provenance, are given explicitly; or associating such metadata directly with the DOI, handle, or other persistent identifier itself, through the persistent identifier’s resolution or index service (adapted from CoData, 2013).
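One common way to express fixity information is a cryptographic checksum. The sketch below assumes a SHA-256 digest recorded at citation time and compares it with the digest of the content retrieved later. The choice of SHA-256, the function names, and the sample bytes are assumptions for illustration; the principles do not prescribe a particular algorithm.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content (assumed fixity format)."""
    return hashlib.sha256(content).hexdigest()


def verify(retrieved: bytes, cited_digest: str) -> bool:
    """Confirm that the retrieved data matches the digest recorded with the citation."""
    return fixity_digest(retrieved) == cited_digest


# Hypothetical dataset content at the time of citation.
original = b"id,response\n1,yes\n2,no\n"
cited_digest = fixity_digest(original)

# Later retrievals: one unchanged copy, one copy with an extra row.
unchanged = verify(original, cited_digest)
altered = verify(original + b"3,maybe\n", cited_digest)
```

A matching digest shows only that the content is bit-for-bit identical; it says nothing about the chain of custody, which is what provenance information covers.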

 

Version:

(First used in principle 7)

A modified dataset based on a single designated dataset -- roughly equivalent to an "edition" in FRBR terms. [1]
This is often denoted with a number that is increased when the data changes; where a formal version is unavailable, it can also be described by, for example, a "timeslice" or access date. [2]

[1] http://archive.ifla.org/VII/s13/frbr/frbr2.htm
[2] Starr, J., & Gastl, A. (2011). isCitedBy: A metadata scheme for DataCite. D-Lib Magazine, 17(1/2). doi:10.1045/january2011-starr