Blog archive for May, 2008

Towards “research in a box”

May 13th, 2008 by dwentworth

At Science Commons, we want to bring the same efficiency to scientific research that the Web brought to commerce. Our Materials Transfer Agreement project isn’t just about contracts — it’s about bringing together all the resources on the Web for finding and ordering materials and getting towards one-click access, with the goal of accelerating discovery.

Chris Kronenthal of the Coriell Institute for Medical Research has an article this week in Bio-IT World that explores the role of “biobanks” in scientific innovation, including a description of our MTA project that puts it in a broader context:

In [fostering growth], biorepositories will have two primary contributions. The first, likely industry changing, will be that of providing “research in a box.” Modern, matured biorepositories have come a long way in streamlining the many processes involved in R&D (materials processing, storage and management, consent management), allowing researchers to focus on tracking their own results. With solid platforms for distribution, like Coriell’s first-of-a-kind Google (“Mini”) driven eCommerce catalogue of specimens and data, researchers can quickly identify which subjects they are interested in, procure said samples, and download phenotypic, genotypic, and any other relevant knowledge pool data.

In an effort to spur progress by reducing the barriers on the distribution of materials for research, too often locked away in various biobanks, organizations such as Science Commons have recognized the need to standardize current hurdles such as locating specimens across various biobanks and the authorizing of material transfer agreements (or MTAs), thus providing a level of accessibility and fluidity to the normally snag-prone process. […]

[Science Commons VP] Wilbanks is clear on the pivotal role that biorepositories will play in furthering research and personalized medicine: “Right now, we’re stuck in a pre-industrial culture of tool making and transfer, where scientists have to beg labs to stop doing research and start making tools… It’s absurd that tool making is slowing down even a single experiment if there’s a way to avoid it. We have the tools, the technologies and the legal systems to bring all the benefits of eCommerce to biological tool making – it just takes the willpower of [donors] and universities – but the entire system rests on biobanks for fulfillment. Scientists don’t get grants for fulfilling orders for cells.”

You can read the entire piece here.

Update (May 14): Plausible Accuracy responds: “It’s amazing to me that it’s taken this long to sort of start generating significant interest in validated, standardized, open repositories. The clones, cell lines, mice, etc that we generate in great quantities need a better method of sharing and distribution than some antiquated version of quid pro quo.”

How to free your facts

May 12th, 2008 by dwentworth

With the open access movement surging — and the discussion surrounding open data gaining velocity — we’re getting more emails with questions about how best to share collections of factual data. One of the most common questions: How do I mark my data explicitly as “open access” and free for anyone to use?

In general, we encourage you to choose waivers, like the Open Data Commons Public Domain Dedication and License (ODC-PDDL) or the Creative Commons CC0 waiver, rather than licenses such as CC-BY or the FDL.

The issues surrounding how to treat factual data are complex. To help bring more clarity for those of you exploring your options, here’s a short overview of the reasons why we generally advise using waivers, prepared by Science Commons Counsel Thinh Nguyen.

Facts are (and should be) free
There is a long tradition in science and law of recognizing basic facts and ideas as existing in the public domain of open discourse. At Science Commons we summarize that by saying “facts are free.”

Of course you can patent some ideas, but you can’t stop people from talking about or referring to them. In fact, the patent system was established to encourage public disclosure of facts and ideas, so that we can discuss them in the open. When Congress wrote the Copyright Act, it made sure to spell out that facts cannot be subject to copyright. “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” (Section 102(b) of the United States Copyright Act)

And there are good reasons for this. Imagine if you couldn’t reference physical constants — like the height of Mount Everest — without permission. Imagine you couldn’t use the laws of gravity to calculate without attributing Isaac Newton each time. Or if you had to get a license from the heirs of Charles Darwin to talk or write about evolution. Such a world would be absurd, and we can easily understand why. We all need access to a basic pool of ideas and concepts in order to have any kind of meaningful discourse. So copyright is supposed to protect creative expression–the unique and individual ways we express ourselves–but not the invariant concepts and ideas that we need to think and carry on a conversation.

Licensing facts can cause legal uncertainty and confusion
So why is it that increasingly, especially online, there is talk about licensing factual data–assertions of rights and obligations over assertions of facts? Part of the answer is that as facts get represented in formats that look more like computer code, the impulse is to treat them like any other computer code. And that means putting a license on them. Part of the answer is that the law is still struggling with how to treat databases, and in some countries, database rights have expanded (particularly in Europe under the database directive). Other countries have loosened copyright standards to allow purely factual databases to be protected. (For a more detailed discussion of these issues, see the Science Commons paper, Freedom to Research: Keeping Scientific Data Open, Accessible, and Interoperable [PDF].)

But even if you could find a legal angle from which to impose licensing or contractual controls over factual data, why would you want to? Doesn’t this just create the very absurdity that Congress and the scientific tradition have been able to avoid for many years?

Attribution for facts can add complexity and hamper reuse
Many people cite the desire to receive attribution. In scientific papers, we have a tradition of citing sources for facts and ideas. But those traditions evolved over hundreds of years. There’s a lot of discretion and judgment that goes into deciding whom to cite and when. At some point, you don’t need to cite Isaac Newton any more for the formula for gravity, or Darwin for the idea of evolution. Sometimes you do, and sometimes you don’t need to, but that’s a matter of common sense. But what happens to common sense when you convert that requirement into a legal requirement? Can a license represent these complex norms and traditions? We don’t think so.

Imposing licensing on data creates all kinds of unanticipated problems. If you have a database with thousands or hundreds of thousands of facts, does each fact have to come with its own attribution and licensing data? How do we aggregate and recombine such data? If we use a tiny piece of that data to make an assertion about the world–to carry on a discourse–do we still have to attribute, and how far does that obligation go? In the future, will every database need its own database of attribution? Will every book need another book in which every word and idea and fact comes with its own genealogy detailing how it made its way through various databases, web sites and so on?

This problem, which we call “attribution stacking,” can saddle science with an unbearable administrative burden. It could shut down present and future sites that aggregate and federate data from many different sources. It could stifle entire fields of research that rely on summarizing, annotating, translating and integrating many different kinds and sources of data.
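To make the stacking effect concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the source names (db-0, portal, federation) are hypothetical, and the rule that each aggregator must keep every earlier credit and add its own is a simplified stand-in for an attribution requirement, not the terms of any particular license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One factual assertion plus every source that must be credited for it."""
    statement: str
    attributions: frozenset


def aggregate(name, facts):
    """Republish facts in a new collection.

    Hypothetical rule: keep all earlier credits and add the aggregator's own.
    """
    return [Fact(f.statement, f.attributions | {name}) for f in facts]


def required_credits(facts):
    """Collect every distinct source a reuser would have to credit."""
    credits = set()
    for f in facts:
        credits |= f.attributions
    return credits


# Three hypothetical source databases with 1,000 facts each.
sources = {
    f"db-{i}": [Fact(f"fact {i}-{n}", frozenset({f"db-{i}"})) for n in range(1000)]
    for i in range(3)
}

# Two layers of aggregation on top of the original sources.
merged = [fact for facts in sources.values() for fact in facts]
portal = aggregate("portal", merged)
federation = aggregate("federation", portal)

print(len(federation))                        # number of attribution records to maintain
print(sorted(required_credits(federation)))   # credits owed for reusing the whole set
print(sorted(federation[0].attributions))     # credits owed for reusing a single fact
```

Even in this toy case, a single fact carries three credits after two rounds of aggregation, and the federated collection needs one attribution record per fact. With a waiver, the attributions field would carry no legal obligation, and citation would remain a matter of scholarly norms.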

The solution: use a waiver for factual data, not a license or contract
Can licensing facts create its own technological absurdities? We think it can, and it will unless we resist the impulse to license. We think the best answer is to go back to what scientists themselves have been doing for centuries: giving attribution without legal requirements. We think Congress got it right when it excluded facts and ideas from copyright protection. And we think it should stay that way, even when those facts happen to get incorporated into databases. That’s why we published the Science Commons Data Protocol and the accompanying FAQ.

We hope that if you are preparing to publish a compilation of factual data, you will choose to waive any rights to the data, whatever they may be.

Harvard Law School goes open access

May 8th, 2008 by dwentworth

As a Berkman Center alum, I’m especially excited to share the news that the faculty of Harvard Law School has voted unanimously to implement an open access mandate (full text here).

The Berkman Center is the wellspring of Creative Commons, and here at Science Commons, we work to make legal scholarship open and accessible to all. The decision, which comes in the wake of the historic vote for open access by Harvard’s Faculty of Arts and Sciences, makes Harvard Law School the first law school to enact an open access mandate.

Here, a brief round-up of commentary:

Peter Suber, open access leader: “This is not only another university OA mandate, and the first for a law school, but another unanimous faculty vote for an OA mandate. The unanimous faculty support makes a very good development positively beautiful.”

John Palfrey, the Berkman Center’s executive director and newly appointed Vice Dean for Library and Information Resources at Harvard Law School, who proposed the mandate to the faculty: “The acceptance of open access ensures that our faculty’s world-class scholarship is accessible today and into the future.”

Robert Darnton, director of the Harvard University Library: “That such a renowned law school should support Open Access so resoundingly is a victory for the democratization of knowledge. Far from turning its back to the outside world, the HLS is sharing its intellectual wealth.”

Tim Armstrong, a former Berkman fellow and current Assistant Professor of Law at the University of Cincinnati College of Law: “As John Willinsky has explained, open access is a force multiplier for scholarship: it correlates with increased influence (as measured by citations) and broader scholarly impact as compared with work published only in closed or proprietary fora.”

Gene Koo, a Berkman fellow and Director of Online Training at Legal Aid University: “[Legal] scholarship has the potential to leap forward by large bounds with policies like Harvard’s in place.”

We agree with David Weinberger: Yay! Congratulations to everyone involved.

National Cancer Institute to use Tranche Network to share data

May 2nd, 2008 by Kaitlin Thaney

The National Cancer Institute will soon be using Tranche to store and share mouse proteomic data from its Mouse Proteomic Technologies Initiative (MPTI). Tranche, a free and open source file sharing tool for scientific data, was one of the earliest testers of CC0. Many thanks to Tranche for providing us with such valuable early feedback on CC0.

From GenomeWeb News:

The MPTI collects tissue and serum measurements from mouse models of different types of cancers using analytical techniques such as mass spectrometry. Tranche researchers, along with University of Michigan researcher Philip Andrews, deposited nearly 1 terabyte of MPTI raw data into the Tranche network, where it can be shared between participating researchers.

The dataset is now being released in publicly accessible formats as well and is available to others in the research community. Because of the encryption used on the site, data on Tranche can be privately used by labs with access to the information until it is ready to be released to the public.

Congratulations to everyone over at Tranche and keep up the good work!

New consensus for defining open access

May 1st, 2008 by dwentworth

Even among those who follow developments in the open access (OA) movement closely, there is sometimes confusion over definitions. Does open access publishing mean placing the work online without price barriers (for free) — or must you also remove permission barriers (for instance, by adopting a Creative Commons license that permits reuse without permission)?

Earlier this week, open access leader Peter Suber and “archivangelist” Stevan Harnad reached consensus on terms to describe these two forms of open access: “weak” OA (removing price barriers alone) and “strong” OA (removing price and permission barriers). Explains Suber:

There are two good reasons why our central term became ambiguous. Most of our success stories deliver OA in the first sense, while the major public statements from Budapest, Bethesda, and Berlin (together, the BBB definition of OA) describe OA in the second sense. […]

We have agreed to use the term “weak OA” for the removal of price barriers alone and “strong OA” for the removal of both price and permission barriers. To me, the new terms are a distinct improvement upon the previous state of ambiguity because they label one of those species weak and the other strong. To Stevan, the new terms are an improvement because they make clear that weak OA is still a kind of OA.

On this new terminology, the BBB definition describes one kind of strong OA. A typical funder or university mandate provides weak OA. Many OA journals provide strong OA, but many others provide weak OA.

Forging agreement on the terms “weak” and “strong” OA is a promising development. Not only could it bring more clarity to the discussion about open access in the community, it could also help more people understand intuitively that there is a spectrum of openness, and choices you can make to maximize the value of that openness.

For further discussion, check out Why weakOA and strongOA are so important, What is strongOA? and Klaus Graf on what is strongOA over at Peter Murray-Rust’s blog.

Update (May 6): Stevan Harnad: “[We] are looking for a shorthand or stand-in for ‘price-barrier-free OA’ and ‘permission-barrier-free OA’ that will convey the distinction without any pejorative connotations for either form of OA.”

Peter Suber: “Stevan is right. Last week we introduced terms (‘weak’ and ‘strong’ OA) to describe an important and widely recognized distinction. But the terms were infelicitous and we’re still looking for better ones… The effort here is not to make any kind of policy recommendation, but simply to achieve new clarity in talking about different policy options.”

Rockefeller U. Press Uses CC Licenses to Reduce Permission Barriers

May 1st, 2008 by Thinh

Leading by example, the Rockefeller University Press has issued a bold challenge to other non-OA publishers to find new ways to strike a balance between sustainable publishing and advancing authors’ freedoms and the public interest. The Press adopted a new copyright policy that returns essential freedoms to authors and extends permissions to the public that are vital to advancing science. This new policy covers its journals, which include the prestigious Journal of Cell Biology, The Journal of Experimental Medicine and The Journal of General Physiology.

Under the policy, there are two license periods. An initial license, available during the first six months after publication, permits sharing and reuse of the work, but prohibits distribution through mirror sites (whether commercial or non-commercial). After these six months, the Press grants the public a standard Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. The two licenses differ only in the mirroring prohibition clause; otherwise, their conditions are essentially the same.

The new policy covers all of the Press’s archives as well. This opens up a rich resource to text-mining and knowledge integration, using technologies such as our Neurocommons project. This allows the corpus of scientific knowledge to be upgraded to take advantage of the Web. That opportunity has been largely missed for vast tracts of the scientific literature, not due to lack of interest or technological means, but due to the lack of access and copyright permission.

The significance of this announcement lies not only in the importance of the journals involved, but also in demonstrating that we need not yield to the false dichotomy between sustainability and access. Finding ways to strike a reasonable balance requires forward-thinking leadership. By going beyond what the NIH Public Access Policy requires and using Creative Commons licenses to remove not only access barriers but also permission barriers, the Press is demonstrating that leadership and its commitment to the interests of the community that it serves.

Here’s an excerpt from the terrific editorial by Emma Hill, Executive Editor, The Journal of Cell Biology and Mike Rossner, Executive Editor, The Rockefeller University Press:

Preying on authors’ desire to publish, and thus their willingness to sign virtually any form placed in front of them, scientific publishers have traditionally required authors to sign over the copyright to their work before publication. […]

At The Rockefeller University Press, we have followed this tradition in the past and obtained copyright from authors as a condition of publication. Several years ago, however, we recognized that the advent of the internet had irrevocably changed the nature and mechanisms of knowledge distribution, and we returned some of those rights to authors. Since July 2000, we have allowed our authors to freely distribute their published work by posting the final, formatted PDF version on their own websites immediately after publication.

With the growing demand for public access to published data, we recently started depositing all of our content in PubMed Central. In a further step to enhance the utility of scientific content, we have now decided to return copyright to our authors. In return, however, we require authors to make their work available for reuse by the public. Instead of relinquishing copyright, our authors will now provide us with a license to publish their work. This license, however, places no restrictions on how authors can reuse their own work; we only require them to attribute the work to its original publication. Six months after publication, third parties (that is, anyone who is not an author) can use the material we publish under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License ( […]

We are pleased to finally comply with the original spirit of copyright in our continuing effort to promote public access to the published biomedical literature.