On the complexities of sharing scientific data

July 16th, 2008 by dwentworth

Ethan Zuckerman, the Berkman fellow who founded Geekcorps and co-leads Global Voices with Rebecca MacKinnon, has a nice piece today on our efforts to clear the legal hurdles blocking the integration of scientific databases, highlighting research by Melanie Dulong de Rosnay (see our post from earlier this week).

Writes Zuckerman at World Changing:

Creative Commons is a clever use of the copyright system intended to make it easier for people who want to, to share their work with others. Jonathan Coulton has used Creative Commons to enable an army of remixers and videomakers to produce promotional materials for his songs and albums. Authors like Dan Gillmor and Cory Doctorow have used Creative Commons to let people download, translate and make audio versions of their books. And Global Voices uses Creative Commons so that blogs and news sites can use our content without asking us for permission.

What about scientists?


Under US law, pretty much anything you write down is copyrighted. Scrawl an original note on a napkin and it’s protected until 70 years after your death. Facts, however, are another matter – they can’t be copyrighted. So while trivial but creative scribblings are copyrighted, unless you choose to release them into the public domain, the information painstakingly discovered about the human genome – DNA sequences, for instance – isn’t. But the containers they’re stored in – the databases they’re held in – can be copyrighted.

If I sound confused about this stuff, that’s because I am.


This question of complexity is what Melanie’s research has focused on. She looked at the terms of use for roughly 200 databases necessary for work in the life sciences. Evaluating the terms on all those databases, she discovered that only 7 met her stringent definitions of Open Access to data – these databases could be accessed without registration; they could be downloaded for local use; they could be incorporated into other works; they had clear, understandable terms of use. This last factor proved to be the most challenging. She spent hours reading these terms with other experts in the field and discovered that, a great deal of the time, the experts disagreed on what was permitted under a specific agreement.

The reason this is important, Melanie explains, is that scientific research proceeds more quickly when researchers can share resources. But with databases encumbered by different, confusing legal protections, it can become a legal nightmare for researchers to do complex work building new tools that combine information from two databases in a novel way, for instance. And databases that are protected by access restrictions can be out of reach to scientists in developing nations who might not have the financial or technical resources to access them.

So how do you deal with the problem of conflicting terms of use for the “containers” of scientific data?

As Zuckerman points out, we initially offered advice aimed at helping database publishers figure out when it made sense to use Creative Commons licenses. But it was evident that this wouldn’t solve the problem.

After further research, Science Commons collaboratively developed and published the Protocol for Implementing Open Access Data, through which we recommend explicitly returning the data to the public domain, using legal tools like the CC0 waiver or the Open Data Commons Public Domain Dedication and License (ODC-PDDL).

If you’d like to learn more about the protocol, we encourage you to check out the FAQ or send us an email. We’re happy to answer any questions you may have.
