Session: Data by the people, for the people

Monday, April 18, 2016 - 3:00pm to 4:00pm
Co-chairs: Catherine Brownstein and Rose Relevo

Human subjects research is a gold standard for science; however, the reporting and sharing of human subjects data is mired in ethical issues. How do we ensure that the data is usable downstream? How can we be unbiased and yet promote an active culture of data reuse? How can we integrate knowledge systematically to support a sum that is greater than its parts? What peer review needs to or could happen further upstream of the publishing process? How can scholarly communication be made actionable and available outside of traditional publication venues? This session will discuss these issues, which are rising in importance as we strive to make data public and accessible, covering everything from drug trials to how the public feels about GMOs in their food.

Crowdsourced Human Genetics: What if we put people first? 

Bastian Greshake


In the past, research participants were "human subjects" with a passive role in research and very little influence. Often they didn't get access to their own data. This has changed a lot over the last couple of years: over two million people already have a glimpse into their genome, thanks to direct-to-consumer genetic testing. This development could shift the traditional paradigm of how human genetic research is done.

Research can now be driven by the crowd, which actively participates in the design of studies, donates data and is highly interested in the outcomes. While this opens huge opportunities to accelerate and facilitate research, it also leads to unique challenges. How can this kind of data be collected? How can it be made available in a way that's useful for citizen scientists and academic scientists alike? And how can the vast amount of literature based on this kind of data be made available to the general public?

With openSNP we are trying to create such a data resource, aimed at sharing research findings as well as raw data. People can share their personal genomes, along with their trait data, with the public by dedicating them to the public domain. At the same time we apply text and data mining to summarize the vast quantity of primary publications. In this talk we will see what personal genomics is, how it can be used and linked to the published scientific record, and how this can help scientists and the general public alike.

Phenopackets: Making phenotype profiles FAIR++ for disease diagnosis and discovery

Melissa Haendel

Oregon Health & Science University

An estimated 350 million people worldwide are afflicted with a rare disease. Because each disease is different, there are significant challenges in obtaining enough information relevant to the patient’s condition to help inform diagnosis and treatment. While great strides have been made in exchange formats for sequence data, complementary standards for phenotypes and environment are urgently needed. Patient phenotypic abnormalities are currently described in diverse places and in diverse formats: publications, public databases, electronic health records, clinical testing labs, disease registries, and social media. Here we propose a new standard for exchange of patient phenotype data that is optimized for integration from these distributed contexts. The PXF standard will allow phenotypic data to be captured at the point of publication, to be transmitted in the context of diagnostic testing, to be used for exchange of data in clinical studies, and to serve as a backbone for patient-contributed data registries and social media. Increasing the volume of computable phenotype data across a diversity of systems will support large-scale computational disease analysis using combined genotype and phenotype data, something that patients themselves will now be able to participate in.

Peer review After Results are Known: Are we “PARKing” the Cart Before the Horse?

Erick Turner

Oregon Health & Science University

Peer review will be prone to bias as long as it occurs after study results are known. This allows one to “torture the data until it confesses” to a statistically significant result. When studies refuse to “confess” and remain negative, authors and/or journals generally regard them as “not interesting” or “not publishable”. Such reporting biases become apparent when we use an “inception cohort” of trials from FDA Drug Approval Packages and compare those results to the corresponding journal articles; this approach will be demonstrated with psychotropic drugs. The FDA’s immunity to these biases lies largely in the fact that it is aware of each trial’s existence, and its prespecified methods, before study inception. Why not conduct results-free review of manuscripts and make (at least preliminary) publication decisions based on the importance of the scientific question and the methodological rigor? Two such peer-review models, one of them already underway in the UK, will be presented.

Overcoming obstacles to sharing data about human subjects

Robin Rice

University of Edinburgh

Confidentiality requirements are often pitted against data sharing requirements in social and medical research. Does the need for disclosure control about human subjects necessarily mean that your research data cannot be shared and re-used? This presentation will touch on topics such as informed consent, anonymisation and pseudonymisation techniques, and what it means to be ethical with regard to data sharing about human subjects, including rich, qualitative data and research into social media content. New forms of governance and delivery, such as safe data havens, will be discussed, along with what is lost and what is gained for the public interest when data are shared under conditions other than open access. Robin Rice has over twenty years’ experience as a data librarian in the US and the UK; her team operates the open access Edinburgh DataShare repository and developed the popular Research Data Management Training (MANTRA) open online training course for researchers, as well as leading data-related training at the University of Edinburgh in Scotland.


Dr. Erik Jones


With over 750,000 members organized around 215 communities and thousands of topics, Inspire is the leading social network for health and holds one of the largest collections of patient-created content on the Internet. Through members' discussions we have a wealth of user-generated content about the patient experience, covering everything from the initial diagnosis to adverse events and medication adherence. Using natural language processing of anonymized data, we are able to uncover information that supplements traditional research methods. We can also reach out to large groups of members with common interests and experiences, inviting them to participate in studies and sharing the findings with other members of the community. In addition, we can use what we know about our member populations to engage with members at critical junctures in their medical journey, leading to better patient outcomes. This talk will demonstrate the power of user-generated content for research purposes.

