Data Infrastructures in the Sloan Digital Sky Survey and Large Synoptic Survey Telescope Projects
Scientific data sharing is not an end in itself. Rather, data must be managed so that they are discoverable, interpretable, trustworthy, and reusable. Sustaining data access at these levels of quality control is a challenging and expensive undertaking.
While astronomy is one of the oldest sciences, data-intensive astronomy has led to a shift from individual investigators collecting data with private telescopes to collaborative projects that design, build, and maintain shared telescope facilities. Collaborative data collection is more amenable to sharing data, databases, and catalogs with scientists not initially involved in collection.
The Sloan Digital Sky Survey (SDSS), which began taking data in 1998, became the gold standard for sky survey data sharing. The Large Synoptic Survey Telescope (LSST), a successor to the SDSS, will begin its 10-year survey operations in 2022. This dissertation research addresses the who, what, when, where, why, and how of managing astronomy data by examining the data practices of builders and users in these two collaborations. The study focuses on scientific research data; data management tasks; the knowledge, expertise, and experience required; the workforces responsible for these activities; and how data management differs between astronomy populations. Using document analysis, ethnographic fieldwork (n=21 weeks), and semi-structured interviews (n=80), the study reveals how research data are embedded in knowledge infrastructures.
Findings indicate that the SDSS is not a single project but a long-lived data collection effort supported by successive short-term grants, even though the collaboration's goal is a long-lived structure. A full environment of data management funding and expertise is necessary to keep data continually usable. These findings imply that LSST and other scientific collaborations should determine what research data and accompanying documentation to save, when, and for how long, and should take a long-lived approach to building the funding models, infrastructures, and human expertise essential to supporting reusable data.