Why data citation is a computational problem
Peter Buneman, University of Edinburgh; Susan Davidson, University of Pennsylvania; James Frew, University of California, Santa Barbara
February 23, 2016
Abstract
Most information is now published in complex, structured, evolving datasets or databases. There is increasing demand that this digital information be treated in the same way as conventional publications and be appropriately cited. While principles and standards have been developed for data citation, they are unlikely to be used unless we can couple the process of extracting information with that of providing a citation for it. We discuss how to generate citations automatically for data in a database, given both how the data was obtained (the query) and its content (the data). We show how the problem of generating a citation is related to a well-understood problem in databases, and we illustrate this with two examples that have radically different citation requirements.