Data: Gold in the archives

June 5th, 2006 by John Wilbanks

The Howard Hughes Medical Institute Bulletin on the need to preserve primary data, not just the published data…

There’s a wasteful data economy out there, and if we don’t save primary data, we’ll just have to pay someone to generate it again, and again, and again. The good news is that re-mining data works:

“Her quest was to fill knowledge gaps in the neural circuitry of the roundworm Caenorhabditis elegans—that ubiquitous experimental-model organism. Chen indeed “discovered” several new neural synapses and neuromuscular junctions, but she did it without so much as lifting a pipette or looking through a microscope.”

The bad part:

“A quiet crisis looms in many labs as the volume of data generated by large-scale science grows at an alarming rate.”

2 Responses

  1. Research Cooperative, on October 7th, 2006 at 6:36 pm

    The volume of new data is alarming in some areas of research, but the lack of new data in many other areas is also alarming. The crisis lies in the question of how to recognise priorities in research, and how to balance the benefits of centralised large-scale science against those of decentralised small-scale science. In theory, the internet should allow small-scale research projects to link up with related projects worldwide, and thus achieve more effect.

    One starting point would be to comprehensively index existing research journals and newsletters worldwide, along with their approximate geographical reach (distribution) and the contact details for submissions and back issues. Effective publication and effective data-mining are two sides of the same coin.

  2. AJ Chen, on December 12th, 2006 at 3:42 pm

    What's needed is for researchers themselves to open up their source data on the web. Open data are freely shared and so reduce the time spent repeating experiments. To encourage researchers to open up raw experimental data, easy-to-use web publishing tools need to be available to end users (i.e. researchers). Community search engines also need to be built, so that researchers see immediate benefits as they explore the open data approach.

    The scientific publishing task under the W3C HCLS group is exploring the semantic web for open data. A demo of an open data tool and search engine is available at

    AJ Chen