Weblog

On the complexities of sharing scientific data

July 16th, 2008

Ethan Zuckerman, the Berkman fellow who founded Geekcorps and co-leads Global Voices with Rebecca MacKinnon, has a nice piece today on our efforts to clear the legal hurdles blocking the integration of scientific databases, highlighting research by Melanie Dulong de Rosnay (see our post from earlier this week).

Writes Zuckerman at World Changing:

Creative Commons is a clever use of the copyright system intended to make it easier for people who want to, to share their work with others. Jonathan Coulton has used Creative Commons to enable an army of remixers and videomakers to produce promotional materials for his songs and albums. Authors like Dan Gillmor and Cory Doctorow have used Creative Commons to let people download, translate and make audio versions of their books. And Global Voices uses Creative Commons so that blogs and news sites can use our content without asking us for permission.

What about scientists?

[...]

Under US law, pretty much anything you write down is copyrighted. Scrawl an original note on a napkin and it’s protected until 70 years after your death. Facts, however, are another matter - they can’t be copyrighted. So while trivial but creative scribblings are copyrighted, unless you choose to release them into the public domain, the information painstakingly discovered about the human genome - DNA sequences, for instance - aren’t. But the containers they’re stored in - the databases they’re held in - can be copyrighted.

If I sound confused about this stuff, that’s because I am.

[...]

This question of complexity is what Melanie’s research has focused on. She looked at the terms of use for roughly 200 databases necessary for work in the life sciences. Evaluating the terms on all those databases, she discovered that only 7 met her stringent definitions of Open Access to data - these databases could be accessed without registration; they could be downloaded for local use; they could be incorporated into other works; they had clear, understandable terms of use. This last factor proved to be the most challenging. She spent hours reading these terms with other experts in the field and discovered that, a great deal of [the] time, the experts disagreed on what was permitted under a specific agreement.

The reason this is important, Melanie explains, is that scientific research proceeds more quickly when researchers can share resources. But with databases encumbered by different, confusing legal protections, it can become a legal nightmare for researchers to do complex work building new tools that combine information from two databases in a novel way, for instance. And databases that are protected by access restrictions can be out of reach to scientists in developing nations who might not have the financial or technical resources to access them.

So how do you deal with the problem of conflicting terms of use for the “containers” of scientific data?

As Zuckerman points out, we initially offered advice aimed at helping database publishers figure out when it made sense to use Creative Commons licenses. But it was evident that this wouldn’t solve the problem.

After further research, Science Commons collaboratively developed and published the Protocol for Implementing Open Access Data, through which we recommend explicitly returning the data to the public domain, using legal tools like the CC0 waiver or the Open Data Commons Public Domain Dedication and License (ODC-PDDL).

If you’d like to learn more about the protocol, we encourage you to check out the FAQ or send us an email. We’re happy to answer any questions you may have.

Video: Timo Hannay on CC-based science publishing

July 14th, 2008

In case you missed it over at Open Access News, there’s a new 3-part video documentary on publishing open content using Creative Commons licenses. Part two (below) is an interview with Nature’s Timo Hannay — an especially interesting bit in light of the recent discussions about business models for sustainable open access (OA) publishing.

If there’s anything that the interview makes clear, it’s that scientific publishing is undergoing a profound transformation, and people in OA are confronting its most difficult challenges head-on. They’re freeing science while providing us with valuable lessons about the kinds of publishing models that can work in the digital era — a boon on both counts.

Melanie Dulong de Rosnay on opening access to science

July 14th, 2008

This spring, Harvard’s Berkman Center launched the Publius Project, which ccLearn’s Jane Park aptly called a “Web 2.0 version of the Federalist papers.” The concept:  a diverse group of thinkers — scholars, technologists, activists and others — would publish essays about the “constitutional” moments shaping Internet governance. The essays, and responses to them, would serve to seed a broader discussion about the choices we’re making as we collectively build the future of the networked world.

Peter Suber penned an essay entitled The Opening of Science and Scholarship, asking who controls peer-reviewed research, and arguing for a future where scholars themselves ensure it is open access (OA). The latest response, written by Berkman Fellow Melanie Dulong de Rosnay, focuses on opening access to scientific research. De Rosnay is the legal lead of Creative Commons France, and works with us here at Science Commons on open data policy. She asks:

How can society take advantage of the opportunities offered by digital publishing and distributing to share scientific results more quickly and thus facilitate the discovery of new knowledge? … Should we simply ensure access to knowledge without paying a fee, or should we do even more to improve that access, such as enhancing legal and technical capabilities for finding, extracting, annotating and compiling information in order to make better use of it?

De Rosnay’s answer: to take full advantage of OA’s benefits, we should lift all three barriers to access identified in the Budapest Open Access Initiative definition: financial, legal and technical. That way, she writes, “researchers and the public can not only access, but also redistribute and reuse materials in any way, including ways that initial creators had not considered.”

In other words, if we want more innovation and discovery to come from OA research, we should be designing for it — publishing research and data in open formats, with the legal rights “baked in” to make use of it.

If you’d like to hear more from de Rosnay, she’ll be giving a presentation on “Openness for Life Science Databases” tomorrow, Tuesday, July 15th, as part of the Berkman Center’s terrific luncheon series. There may still be spots open if you want to join in person, but if not, anyone is free to watch the live webcast or view the archived video when it is posted. You can find all the details here.

Where’s the CC in Science Commons?

July 10th, 2008

When I joined Science Commons last year, that was the question at the core of the queries I got from curious friends. In most cases, they “got” Creative Commons — it was about freeing culture with licenses for the legal sharing and remixing of creative works. Science Commons, they reasoned, must be about freeing science by creating special licenses for sharing and remixing scientific research.

We do work to “free” science — that is, to make it easier to legally share, integrate and remix research and data, with the goal of accelerating discovery. But you won’t find any specialized licenses at Science Commons. Indeed, when the goal is integration of open scientific databases published under different jurisdictions, we advise against using licenses of any kind, including the CC-BY license.

So what exactly is Science Commons doing, and where’s the CC in it? Glad you asked. Last month, Creative Commons held its first TechSummit, which was graciously hosted by Google. John Wilbanks, who leads Science Commons, gave a short talk on our work, showing what the CC methodology “looks like” in the world of science rather than culture. You can watch the presentation by clicking on part 1, below, which begins with the keynote address by Creative Commons CEO Joi Ito (the Science Commons talk is at 1:05/1:21.53).

Of course, there were lots of other interesting presentations at the TechSummit, which brought together folks from every corner of CC. You can check out the details at the Creative Commons blog, and watch parts 2, 3 and 4 on YouTube.

If you watch the Science Commons presentation and have questions about our mission, methodology or any of our projects, feel free to send us an email. We’re happy to provide more detailed information.

Collaborating for breakthroughs

July 4th, 2008

Over at the FasterCures blog, Margaret Anderson, the organization’s Chief Operating Officer, has a post on the recent Institute of Medicine forum: Breakthrough Business Models: Drug Development for Rare and Neglected Diseases and Individualized Therapies. Anderson, who moderated a panel at the forum, observes that while the Michael J. Fox Foundation is often cited as an example of what’s working well, surprisingly few research foundations embrace its innovative approaches. Key among them:  pursuing collaborations with for-profit companies.

The focus of the forum was on finding new models for drug development, and many speakers echoed Anderson in emphasizing the need for more public-private collaboration. Our own Kaitlin Thaney was there, and spoke with fellow participants about our newest project, the Health Commons. The project, launched in collaboration with CommerceNet, CollabRx and the Public Library of Science (PLoS), is designed specifically to lift barriers to collaborations among non-profit and for-profit entities.

“In the Health Commons, participants agree to share data, knowledge, materials and services under standard, pre-negotiated terms and conditions,” explains Thaney. “That way, resources can move smoothly among participants, without the legal wrangling and delays that can derail collaboration.”

One of the most troublesome areas, for foundations and companies alike, is materials transfer. On the panel that Anderson moderated, Michael Mowatt, who directs the Office of Technology Development at the National Institute of Allergy and Infectious Diseases, NIH, described how using standardized agreements and repositories can facilitate collaboration, and explained how our Biological Materials Transfer Agreement (MTA) project lays the groundwork for “virtual repositories” of biological materials.

You can find more information about the MTA project here. If you’d like to learn more about the Health Commons, you can check out the white paper or video introduction at the project site.  And if you’d like more details on the IOM forum, you can find the agenda and a collection of audio recordings and slides at the IOM forum website.

Update: Public Knowledge co-founder David Bollier has a post sharing his reflections on the Health Commons project:

For those of us who don’t venture into the laboratories of science, it’s difficult to appreciate how fragmented, proprietary and inefficient drug and disease research truly is. At a time when the Internet is making it easier than ever to share and collaborate, some of the most well-funded, high-tech scientific projects today still operate in their own isolated silos. They are effectively cut off from vast quantities of potentially useful research, scientific literature, emerging ideas and potential collaborators. [...]
Tenenbaum and Wilbanks are two of the champions behind an ambitious new project, Health Commons, which aspires to build a new ecosystem for scientific research.

SCOAP3’s Jens Vigen: opening access on a global scale

June 30th, 2008

Among the biggest challenges for opening access to scientific research is developing sustainable ways to fund it. CERN’s SCOAP3 has been a global trailblazer, setting up a system where funding bodies and libraries contribute to the consortium, which pays centrally for peer review. The INIST-CNRS in France has a new interview with Jens Vigen, scientific information officer and director of the CERN library. Vigen explains how SCOAP3 came to be, discusses the principles behind the funding system and where the initiative is headed next.

Excerpt (from the Google English translation):

Three elements were important in [CERN launching SCOAP3]: our long tradition of [pre-print publishing], the fact that the Open Access movement among librarians [had] gained momentum since 2003 and the fact that some journals in particle physics models already offered free access long before the concept [was widely] accepted. CERN, which supported and supports these journals, decided to respond to this moment. A briefing was held in September 2005, with researchers to explain the movement and [raise] awareness, then a two-day seminar in late 2005, with those involved in science communication: publishers, funding agencies and researchers. Following this meeting, a task force was set up bringing together publishers, representatives of agencies and researchers. Their discussions led to the model of sponsorship.

In a few weeks, Science Commons will hold a free, open workshop featuring Vigen and others who are leading the charge for open access to scientific research. The workshop, held July 16-17 in conjunction with ESOF 2008 in Barcelona, Spain, is aimed at defining the foundational principles to foster the growth of open science worldwide. Vigen will talk about the state of open access in nations across the globe and offer his perspective on strategies to take the OA movement to the next level.

If you plan to join us, we encourage you to register here. If you have questions, please let us know. We hope to see you there.

A new open access mandate at Stanford

June 28th, 2008

I’m late to the game on this, but can’t resist passing along the good news: the faculty at the Stanford University School of Education has reportedly voted to adopt an open access (OA) mandate.

Les Carr, who attended the conference where OA luminary John Willinsky shared the news, writes:  “[Willinsky] banged the drum for Open Access and announced an OA mandate for the Stanford School of Education. According to the story, he was describing the Harvard mandate to his colleagues in a meeting and they instantly voted to adopt a similar mandate themselves. Way to go!”

At Science Commons, we work to help scholars retain the rights to share their work, and to bring open access to more institutions [PDF], so it’s extremely encouraging to see faculty authors at Stanford not only embracing OA personally, but also working to implement it at the institutional level — changing the “default setting” for published research from closed to open.

Just in time for the Stanford announcement, C&RL News has published an article that puts the decision in a larger context. The piece, Two new policies widen the path to balanced copyright management: Developments on author rights, explores the implications of the NIH and Harvard mandates, and contains the following apropos observation:

Norms are always more difficult to change than technologies. We are now witnessing a key shift in norms for sharing scholarly work that promises a giant step forward in leveraging the potential of network technologies and digital scholarship to advance research, teaching, policy development, professional practice, and technology transfer.

Hear, hear. Kudos to the faculty at the Stanford School of Education for helping to make it happen.

Update (6/30): Open Access News has additional details.

GSK, caBIG give away cancer data to speed research

June 25th, 2008

It’s no secret that we’re fans of the National Cancer Institute’s caBIG, the Cancer Biomedical Informatics Grid. So we were thrilled to learn that the organization, which connects more than 60 NCI centers with a common infrastructure, played a central role this past week in what Wired is calling a “Massive Cancer Information Giveaway.” The big prize, provided by GlaxoSmithKline (GSK) and shared freely with cancer researchers via caBIG’s platform:  genomic profiling data for over 300 cancer cell lines. The lines were derived from a wide variety of tumors, including breast, prostate, lung and ovarian cancers.

Why would a major pharmaceutical company give away information that its researchers painstakingly uncovered? Put simply, if the goal is to speed the translation of data into drugs, it helps significantly to have more researchers looking at the data and identifying leads.

“Cataloging this type of information in a network like caBIG leads to a ready-made body of biologic information that can be mined by all cancer researchers to further everyone’s understanding of cancer,” explains Dr. Richard Wooster, Director of Translational Medicine Oncology, Research & Development at GlaxoSmithKline, in the company’s media release.  “In turn, we hope this data will further drive the identification of predictive biomarkers and lead to shorter, more directed clinical trials allowing us to bring drugs more quickly to patients who need them.”

Any researcher is free to download the GSK cancer data through caArray. The caArray tool is free and open source.

Science Commons is a strong believer in the utility of a commons-based approach to drug discovery, and this afternoon, John Wilbanks will give a talk at caBIG to discuss how data sharing agreements can help simplify, standardize and automate sharing. We have begun to explore implementing tools such as the CC0 waiver and our machine-readable contracts for transferring materials at caBIG, and we look forward to deepening our involvement as its legal and technical infrastructure continues to take shape.

Update: For another perspective on the giveaway, check out GSK’s big bang on open drug discovery [Business Standard via Rediff News]: “Big pharma claims that it costs as much [as] $1 billion to bring a new molecule to the market and 8-12 years to develop it. That’s something that few companies can afford anymore. For developing countries, too, [open source drug development] may prove to be the route of the future.”

Public domain + community norms = freedom to integrate science

June 23rd, 2008

In the current issue of the Journal of Science Communication, our own John Wilbanks has a note explaining why Science Commons believes that the best — perhaps the only — way to integrate and make use of the exponentially growing number of scientific databases on the global digital network is to mark them explicitly as part of the public domain. This counters the trend toward using “copyleft” licenses for databases, which, despite the good intentions behind it, threatens the usefulness of the data.

“The public domain for science should be the first choice if integration is our goal,” writes Wilbanks, “and there are other strategies that show potential to achieve the social goals embodied in many common-use licensing systems without the negative consequences of a copyright-based approach.”

To help people and organizations mark their data and databases as free to use without restriction, Creative Commons has developed the CC0 waiver, while the Open Data Commons offers the ODC-PDDL. Using either public domain waiver puts you in compliance with the Science Commons Protocol for Implementing Open Access Data.

You can read the full note at the JCOM site, along with two other relevant pieces by our colleagues in the community.

Poynder interviews Leslie Chan: minding the 10/90 gap

June 20th, 2008

Richard Poynder today published yet another remarkable (nay, superb) interview. This time, his subject is Leslie Chan, whom Poynder describes as the “unsung hero” of the open access (OA) movement. Chan works tirelessly to increase the visibility and impact of scientific research from developing countries, which is one way to bridge the “10/90” gap. What’s that, you ask?

Explains Poynder:

The 10/90 gap is the phenomenon in which 90% of the world’s R&D money is spent on the 10% of diseases that primarily affect people in developed countries, while only 10% is spent on diseases that mainly affect the 90% of people who live in the developing world. [...]

Of course there is more than one reason for this dollar-spend inequity (including the fact that Western-based pharmaceutical companies know they cannot make a large profit from selling drugs to treat diseases primarily affecting poor people), but since much of the research into the neglected diseases is undertaken in developing countries themselves, and the findings published in local journals with limited circulations, the relative invisibility of that research makes it far harder to get funding.

And since research tends to be a cumulative process — in which researchers build on the work of previous research in order to arrive at new understandings, and eventual breakthroughs — the invisibility (and consequent shortfall in funding) of [developing countries] research inevitably lengthens the time before cures are developed for the neglected diseases.

Science Commons is working to make it faster, easier and more cost-efficient to find cures for neglected and orphan diseases. On July 16-17, we’re holding a workshop in Barcelona, Spain, in conjunction with the ESOF 2008 conference. The aim:  to define the basic principles that would enable the emergence of global, collaborative infrastructure for accelerating research. We’re honored to have Leslie Chan join us to talk about OASIS, a resource to provide practical steps for implementing OA.

If you’re interested in coming to the workshop, we invite you to register here (the meeting is free and open to the public, but seating is limited). We hope to see you there.