Nature on Big Data

September 9th, 2008 by dwentworth

There’s plenty to recommend in Nature’s special issue on Big Data (Sept. 3), but readers of this blog might especially appreciate The future of biocuration.

Here’s a glimpse:

Biology, like most scientific disciplines, is in an era of accelerated information accrual and scientists increasingly depend on the availability of each others’ data. Large-scale sequencing centres, high-throughput analytical facilities and individual laboratories produce vast amounts of data such as nucleotide and protein sequences, protein crystal structures, gene-expression measurements, protein and genetic interactions and phenotype studies. By July 2008, more than 18 million articles had been indexed in PubMed and nucleotide sequences from more than 260,000 organisms had been submitted to GenBank [1, 2]. The recently announced project to sequence 1,000 human genomes in three years to reveal DNA polymorphisms is a tip of the data iceberg.

Such data, produced at great effort and expense, are only as useful as researchers’ ability to locate, integrate and access them.


If you’re interested in exploring further, here are a few pointers to other relevant pieces (but be aware that the articles are available free for only two weeks from the publication date):

  • Community cleverness required (editorial) — “Researchers need to adapt their institutions and practices in response to torrents of new data — and need to complement smart science with smart searching.”
  • The next Google (special report) — “Ten years ago this month, Google’s first employee turned up at the garage where the search engine was originally housed. What technology at a similar early stage today will have changed our world as much by 2018?”
  • Welcome to the petacentre (feature) — “What does it take to store bytes by the tens of thousands of trillions? Cory Doctorow meets the people and machines for which it’s all in a day’s work.”
  • Wikiomics (feature) — “Pioneering biologists are trying to use wiki-type web pages to manage and interpret data, reports Mitch Waldrop. But will the wider research community go along with the experiment?”
  • How does your data grow? (commentary) — “Scientists need to ensure that their results will be managed for the long haul. Maintaining data takes big organization, says Clifford Lynch.”
