Blog archive for September, 2008

OWLED 2008 — building the craft of reasoning on the Semantic Web

September 26th, 2008 by Kaitlin Thaney

Registration is now open for OWLED 2008 – OWL: Experiences and Directions – a workshop series for practitioners in industry and academia, tool developers and others interested in the ontology language, to describe real and potential applications, to share experience and to discuss requirements for language extensions/modifications. The forum will be held October 26-27 in Karlsruhe, Germany, co-located with the 7th International Semantic Web Conference (ISWC). The event is co-sponsored by Science Commons, and our own Alan Ruttenberg serves as chair of OWLED.

Like its predecessors, the fifth workshop in the series aims to bring users, implementors and researchers together to measure the current state of need against the state of the art, to share experience applying OWL and to set an agenda for the evolution of the language that satisfies users. With the specification of OWL 2 well under way, the workshop will be an excellent opportunity to learn about, discuss and debate new developments in the language, and to give feedback to the working group.

OWLED is a forum for meeting people who have experience using OWL. Many of the designers and leading researchers who participated in the development of OWL attend, and the workshop is characterized by its interactive nature and the friendly sharing of knowledge.

To register, visit the ISWC Registration page and make your selection in the Conferences Early Registration section.

For more information about registration, including details on deadlines for discounted registration and student support, please visit this page.

Register today! The deadline for getting the best discount is coming up quickly – September 30. We hope to see you there.

Voices from the future of science: Matthew Cockerill of BioMed Central

September 18th, 2008 by dwentworth

Perhaps the best description of what scientific publishing can achieve in the digital age comes from Richard Poynder, the UK journalist who’s been interviewing leaders in the open access (OA) movement. In his interview with our own John Wilbanks, Poynder articulates the vision of a scholarly paper that is “no longer simply an article to be viewed by as many eyeballs as possible, but also the raw material for multiple machines and software agents to data mine, a front-end to hundreds of databases, and the launch pad for an ecommerce system designed to speed up the process of research.”

In this light, Poynder writes, OA is not an end in itself, but “the necessary precondition for a complete revolution in the way that science is done.”

BioMed Central (BMC) is a publisher that has taken bold steps to help realize this vision. A trailblazer in OA since the company’s inception, BMC recently launched the innovative BMC Research Notes, which aims explicitly to “complete the scientific record” by providing a venue for dark data and other kinds of useful information, and to make sure that the data are “published in standard, reusable formats and are exposed to ensure that they are searchable and easily harvested for reuse.” It is also investigating novel approaches for enriching the literature, exploring such areas as data mining and semantic technology. Under the leadership of Matthew Cockerill, who started off at BMC as its Technical Director, the company has seen remarkable growth, introducing new journals in cutting-edge fields, earning impressive impact factors and seeing sharp rises in readership, submissions and revenue.

“I came into this from my own perspective as a biologist, trying to deal with a flood of data and results,” said Cockerill in an interview last year with Information World Review. “One of the founding reasons for BioMed Central was the idea that if you want to encourage the development of tools to analyze research results, as a basic starting point you need to ensure all the research is openly accessible. By opening up access to the original research, we’re helping the community develop better tools to work with that research.”

I talked with Cockerill about the progress we’re making in the OA movement, the factors behind BMC’s success and what he envisions for making OA research even more useful to scientists.

BMC has been a pioneer in road-testing models for making OA publishing sustainable, including introducing an institutional membership program. Can you tell us about some of the milestones you’ve reached? What’s driving your growth?

We’re super happy with how things are going right now — both submissions and access rates are on the rise. We’re now seeing 4 million article downloads per month, and it will be more than 50 million for the year. That’s a lot of access to research. Submissions are currently at about 1,800 per month, up from 1,500 at the same time last year. And it’s worth noting that in terms of submission rates, we expect the next year to be an even bigger growth period. The reason is simple: last year, we had one journal that got its first impact factor. This year, we have 13 journals in the same position! These include prominent titles such as Retrovirology, Journal of Translational Medicine, Biology Direct and BMC Biology. In our experience, the release of an official impact factor can triple submissions to a journal.

We now have a sustainable business model. Revenue has roughly doubled in the last 12 months — in part because we’re publishing more articles, and in part because we’ve structured our institutional membership fees so they’re proportionate to the value of the articles published, as well as realistic for covering the costs of publishing on a large scale. Our fees are still among the lowest for an open access option, and we’ve been able to maintain growth in terms of the absolute number of submissions.

Another important trend that’s going to drive submissions growth over the next year is existing journals switching across to open access with BioMed Central. Quite a few have done so already, coming from Taylor & Francis and other traditional publishers. And the pipeline for such transfers is growing rapidly, now that OA has proven itself as a viable model with many benefits for society publishers.

What’s behind the trend?

There are three factors driving it:

(1) Not all society journals are big money spinners — a large number only just break even, and this can be an unstable situation. With the majority of library serials budgets swallowed up by “big deals,” subscription cancellations are an ongoing worry for society journals. OA provides security and stability — since the publication costs for each article are covered, the society can be sure that the journal will not drain its resources.

(2) OA gives society journals the visibility that many don’t currently have, especially with a dwindling pool of subscribers. OA increases visibility and enhances dissemination, which fits well with the mission of scientific societies to advance communication within their field.

(3) Societies have, quite reasonably, taken a “wait and see” approach to open access. But now that OA journals are doing well, and some societies have led the way, the rest, I think, are seeing it as a much more viable option — just as authors no longer see choosing an OA journal as such a big risk.

So part of why we’re seeing the migration to OA is that even well-endowed libraries can no longer afford to keep the traditional publishing model afloat. And at the same time, OA is attractive to publishers because it promises to leverage the Web better for visibility and reputation building?

Yes. If you’re a publisher starting a new journal, or seeking to reinvigorate an existing journal and gain mindshare, how do you do it? Open access is the natural way to go.

All of this makes OA a much more dynamic, interesting and responsive publishing environment. It keeps us on our toes in terms of the competition, as well.

Looking ahead, do you think the Harvard faculty mandate can help change the culture and attitudes toward making OA happen — not just with words but also actions and investment?

Absolutely — the mandate at Harvard has already changed perceptions, but I think that the next steps from Harvard, in terms of practical actions, such as providing sustainable central funding for open access publication, will be even more profoundly influential.

It’s been encouraging to see the shift over the past few years in the way people think about funding OA. With OA taking off, initially there was a certain amount of concern from libraries and institutions — OA seemed to be turning from this small thing to a more substantial cost, which could be expected to grow from year to year. But now more institutions are recognizing that this is an important transition in scientific publishing. They’re working with us to bring about the changes necessary to move smoothly and progressively from a model of funding subscriptions that keep the results of research locked up, to funding the cost of publishing openly.

Foundations like HHMI and the Wellcome Trust led the way on this, explicitly setting aside and providing institutions with the extra funds to support OA costs. Now, other funders are following suit — the Research Information Network in the UK, for instance, has set up a working group that brings together publishers, university representatives and research councils to develop guidelines and best practices for efficiently channeling funds to OA. Individual institutions are also working on plans for central OA funds, along the lines of the funds at Imperial College, Nottingham and Amsterdam.

In the US, Harvard’s Stuart Shieber has signaled his commitment to putting OA journals on “equal footing” with subscription journals, calling attention to the need for institutional funding. And we also have the Berkeley Research Impact Initiative, which is funding OA and looking at the best way to make it work at the University of California.

Of course, each country has its own complications, and research and university funding in the US is very different from that in the UK. So the change isn’t going to be easy, but it’s happening.

You’ve described opening access to research as laying the foundation for innovation in analyzing and working with the results. What are the next steps for enhancing the usefulness of published research?

Publication shouldn’t just be about putting words on paper, but about contributing to a structure of knowledge — more like contributing to an open source software project. We launched BMC Research Notes with a view to trying to catalyze some of the standards required for that. One of the things we want to do is enable people to share data sets — not only if they happen to fit into the narrative of a traditional paper, but just if they encapsulate useful knowledge.

To some extent, what we need to do next is connect basic scientific concepts with the kind of meaningful interconnections that exist on social network sites between people and their stuff. That isn’t easy stuff to do, in practice, but some of the underlying technologies (e.g. semantic representations like RDF) are increasingly getting towards the point of being practically useful — not least because of the massive enthusiasm for social networks, which has seen the world start looking at graph and network analysis algorithms as hot topics.

Can you elaborate on the connection you’re making between semantic technology and social networks?

The best way to explain what I mean is probably to refer you to a paper published in Genome Biology: Calling on a million minds for community annotation in WikiProteins. It’s about a project, WikiProteins, that implements some of these ideas in terms of mining the literature for conceptual relationships. The project is currently being developed to also use these conceptual relationships to connect researchers to one another. BioMed Central has worked extensively with the folks responsible for WikiProteins, and I’m a co-author on the paper (as is Jimmy Wales of Wikipedia!).

For another perspective, Toby Segaran’s excellent book, Programming Collective Intelligence: Building Smart Web 2.0 Applications, explores techniques to make use of information that can be mined from the pattern of connections in a network, social or otherwise. Perhaps the best known examples of this type of network mining in action are recommendation systems like Amazon’s “Recommended for You” feature, or Apple’s “Genius Playlists.”

Toby is now at Metaweb/Freebase, Danny Hillis’s consumer-focused semantic web project. Although the extraction of semantic structure from scientific data is conceptually similar to this work in the consumer space, it doesn’t always get quite the same resources thrown at it, just as scientific computing does not get the same resources as Nintendo or Xbox. One recent case in point: Microsoft’s announcement that it is abandoning Live Academic Search, its never-quite-successful attempt at a Google Scholar-type service, because it wants to focus on consumer markets where it feels it is easier for the company to add value.

That said, scientific computing can and does benefit from the Nintendo/Xbox innovations, and I think the same can be said about the ingenuity being poured into developing social networks: ironically, some of the commercially driven innovation in the consumer space gradually finds its way back to the scientific information space.

BMC is one of the most prominent OA publishers using Creative Commons licenses. Do you have thoughts to share about your experience?

Returning to the theme of progress in OA, I’m happy that more people (including publishers) are starting to understand that open access doesn’t just mean not hitting a pay-wall at the publisher’s site. It’s about enabling people to get full use and value from the research that’s being conducted.

Creative Commons has it absolutely right with the idea of “some rights reserved,” providing a way for authors and publishers to allow a wide range of uses while reserving the author’s right of attribution. An increasing number of scholarly publishers embrace this principle of licensing open access content to encourage reuse. BMC is now involved in an initiative to set up an association with these other OA publishers, analogous to associations which exist in the world of open source software. The development of such an open access publisher organization reflects the increasing importance of OA publishing, which goes beyond the activities of any one publisher.


Tim Hubbard on open science

September 10th, 2008 by dwentworth

Open licensing advocate and Science Commons friend Victoria Stodden is among those live-blogging the Access to Knowledge 3 (A2K3) Conference that’s wrapping up today, and she’s posted notes from a session at which the Sanger Institute’s Tim Hubbard argued for more data sharing in science — provided that the data are published in a way that makes the information (re)useful:

[Hubbard] says that openness in science needs to happen before publication, the traditional time when scientists release their work. But this is a tough problem. Data must be released in such a way that others can understand and use it. This parallels the argument made in the opening remarks about the value of net neutrality as preserving an innovation platform: in order for data to be used it must be open in the sense that it permits further innovation.

Nicely put.

You can check out more notes from the conference via the A2K3 conference blog.

Nature on Big Data

September 9th, 2008 by dwentworth

There’s plenty to recommend in Nature’s special issue on Big Data (Sept. 3), but readers of this blog might especially appreciate The future of biocuration.

Here’s a glimpse:

Biology, like most scientific disciplines, is in an era of accelerated information accrual and scientists increasingly depend on the availability of each others’ data. Large-scale sequencing centres, high-throughput analytical facilities and individual laboratories produce vast amounts of data such as nucleotide and protein sequences, protein crystal structures, gene-expression measurements, protein and genetic interactions and phenotype studies. By July 2008, more than 18 million articles had been indexed in PubMed and nucleotide sequences from more than 260,000 organisms had been submitted to GenBank[1, 2]. The recently announced project to sequence 1,000 human genomes in three years to reveal DNA polymorphisms is a tip of the data iceberg.

Such data, produced at great effort and expense, are only as useful as researchers’ ability to locate, integrate and access them.


If you’re interested in exploring further, here are a few pointers to other relevant pieces (but be aware that the articles are available free for only two weeks from the publication date):

  • Community cleverness required (editorial) — “Researchers need to adapt their institutions and practices in response to torrents of new data — and need to complement smart science with smart searching.”
  • The next Google (special report) — “Ten years ago this month, Google’s first employee turned up at the garage where the search engine was originally housed. What technology at a similar early stage today will have changed our world as much by 2018?”
  • Welcome to the petacentre (feature) — “What does it take to store bytes by the tens of thousands of trillions? Cory Doctorow meets the people and machines for which it’s all in a day’s work.”
  • Wikiomics (feature) — “Pioneering biologists are trying to use wiki-type web pages to manage and interpret data, reports Mitch Waldrop. But will the wider research community go along with the experiment?”
  • How does your data grow? (commentary) — “Scientists need to ensure that their results will be managed for the long haul. Maintaining data takes big organization, says Clifford Lynch.”

Toward a global platform for open science

September 8th, 2008 by dwentworth

Watching the new video about the NeuroCommons project, I was struck by how many different elements are necessary for making knowledge from one domain in science interoperable — or “remixable” — with knowledge from another. 

In a recent piece on open science, Era of Scientific Secrecy Near End (LiveScience, Sept. 2), Cameron Neylon articulates the vision of a global platform to do just that, built using design principles from open source software:

“Making things more open leads to more innovation and more economic activity, and so the technology that underlies the Web makes it possible to share in a way that was never really possible before, while at same time it also means that [the] kinds of models and results generated are much more rich,” he said.

This is the open source approach to software development, as opposed to commercial closed source approaches, Neylon said. The internals are protected by developers and lawyers, but the platform is available for the public to build on in very creative ways.

“Science was always about mashing up, taking one result and applying it to your [work] in a different way,” Neylon said. “The question is ‘Can we make that as effective [for] samples [of] data and analysis as it [is] for a map and set of addresses for a coffee shop?’ That is the vision.”

That’s a vision Science Commons shares. The past ten years have brought the rise of a robust infrastructure for sharing and remixing cultural content, and thanks to the emergence of innovative tools like Google Maps, more people are grasping the power of open systems for connecting information from disparate sources to make it more useful. Yet we remain in the early stages of building an open infrastructure for science that would make it easy to integrate and make sense of research and data from different sources. 

The NeuroCommons is our effort to jumpstart the process, with the goal of making all scientific research materials (research articles, annotations, data, physical materials) as available and as usable as they can be. If you’re new to the project, we hope you’ll take a look at the video and let us know what you think.

Interested in open science?

September 5th, 2008 by dwentworth

If so, there’s a new discussion list you may want to join, brought to you by the good folks at the Open Knowledge Foundation.

Writes OKF’s Jonathan Gray:

As far as we could tell, there wasn’t a general mailing list for people interested in open science. Hence the new list aims to cover this gap, and to strengthen and consolidate the open science community.

We hope it will be a relatively low volume list for relevant announcements, questions and notes. We also hope to get as full as possible representation from the open science community — so please forward this to anyone you think might be interested to join!

How academic health research centers can foster data sharing

September 2nd, 2008 by dwentworth

PLoS Medicine today published a new paper that provides useful guidelines for people at academic health centers seeking to support scientific data sharing. The paper, Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers, discusses both the enormous benefits and the obstacles to forging a research culture that fosters data sharing, and outlines practical steps people can take to set the process in motion.

Here’s an excerpt summarizing the paper’s recommendations:

Recommendations for Academic Health Centers to Encourage Data Sharing

  1. Commit to sharing research data as openly as possible, given privacy constraints. Streamline IRB, technology transfer, and information technology policies and procedures accordingly.
  2. Recognize data sharing contributions in hiring and promotion decisions, perhaps as a bonus to a publication’s impact factor. Use concrete metrics when available.
  3. Educate trainees and current investigators on responsible data sharing and reuse practices through class work, mentorship, and professional development. Promote a framework for deciding upon appropriate data sharing mechanisms.
  4. Encourage data sharing practices as part of publication policies. Lobby for explicit and enforceable policies in journal and conference instructions, to both authors and peer reviewers.
  5. Encourage data sharing plans as part of funding policies. Lobby for appropriate data sharing requirements by funders, and recommend that they assess a proposal’s data sharing plan as part of its scientific contribution.
  6. Fund the costs of data sharing, support for repositories, adoption of sharing infrastructure and metrics, and research into best practices through federal grants and AHC funds.
  7. Publish experiences in data sharing to facilitate the exchange of best practices.

The paper, co-authored by Heather Piwowar, Michael Becich, Howard Bilofsky and Rebecca Crowley, was written on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (you can read our previous posts about caBIG by following this link).

Progress on the CC0 public domain waiver

September 2nd, 2008 by dwentworth

Over at the main Creative Commons blog, Diane Peters has the scoop on draft 3 of the CC0 public domain waiver, a tool for those who wish to relinquish their rights under copyright to a work and mark it with machine-readable metadata for harvesting as part of the public domain. It is this type of tool that Science Commons advocates using in our Protocol for Implementing Open Access Data, a method for legally integrating scientific databases regardless of the country of origin. The goal of the protocol, to use Catriona MacCallum’s phrase, is to increase the “Lego factor” for scientific data.

The news, in brief: Creative Commons has added language to the CC0 waiver to ensure that it makes sense and can be useful for people across the globe. Explains Diane:

We remain dedicated to pursuing a Universal CC0, but with some substantial revision to the text. Here are a few of the changes you will see in draft 3 as a result of [the community’s] comments and discussions:

  • Inclusion of a Statement of Purpose that provides context and explanation for issues CC0 attempts to solve while also identifying limitations inherent in such an attempt;
  • Clarifying language about the IP rights affected by CC0 through a new comprehensive definition of “Copyright Related Rights”; and
  • Emphasis on the possible existence of privacy and publicity rights of others with respect to a work, and the need for those to be cleared where appropriate.

Creative Commons plans to take CC0 out of beta in late October or early November, and comments on this draft are due on September 26. If you’d like to check out the waiver or weigh in, visit the newly updated CC0 Wiki and subscribe to the cc-licenses mailing list.