How do you evaluate a database?
I was speaking with a colleague recently who, like many of us, had experienced the frustration of trying to support his on-line resources. He has assembled a comprehensive on-line resource that is used by the community and has been used by others to publish their studies. It is not GenBank or EBI; it is one of the thousands of on-line databases created by individuals or small groups that the Neuroscience Information Framework and others have catalogued. My colleague has spent years on this resource, pored over hundreds of references and entered close to a million statements in the database. By many measures, it is a successful resource. But in the grant review, he was criticized for not having enough publications. I experienced the same thing in a failed grant for the resource that I had created, the Cell Centered Database. In fairness, that was not the most damning criticism, but it just seemed so very misplaced. I had succeeded in standing up and populating a resource, well before there was any thought of actually sharing data. People used the database and published papers on it, but apparently I should have been spending more time writing about it and less time working on it.
The problems of creating and maintaining these types of resources are well known and were discussed at Beyond the PDF2: to be funded, you have to be innovative. But you don't have to be innovative to be useful. To quote or paraphrase Carole Goble at the recent conference, "Merely being useful is not enough."
But presumably there is a threshold of perceived value where "merely being useful" is enough. I am thinking of the Protein Data Bank or PubMed. These resources are well funded and also well used but hardly innovative. I am guessing that many of the resources like those my colleague and I created were started with the hope that they would be as well supported and integral to people's work as the PDB or PubMed. The truth is, they are not in the same class. But they are still valuable and represent works of scholarship. We are now allowed to list them on our biosketch for NSF. So my question to you is: how do we evaluate these thousands of smaller databases?
Ironically, our peers have no trouble evaluating an article about our databases, but they have much more trouble evaluating the resource itself. How does one weigh 30,000 curated statements against 1 article? What level of page views, visits, downloads and citations makes a database worthwhile? If my colleague had published 10 papers, the reviewers likely wouldn't have checked how often they were cited, particularly if they were recent. What is the equivalent of a citation classic for databases? If you don't have the budget of NCBI, then what level of service can you reasonably expect from these databases? I thought that the gold standard was a published study, by a group unconnected to you, that used your database to do something else. Grant reviewers found that unconvincing. Perhaps I didn't have enough? But how many of these do you need, relative to the size of your community, and on what time frame should you expect them to appear? Sometimes studies take years to publish. Do they need to be from the community that you thought you were targeting (and whose institute may have funded your resource), or does evidence from other communities count?
So perhaps if we want to accept databases and other artefacts in lieu of the article, we should help define a reasonable set of criteria by which they can be evaluated. Anyone care to help here?