PubChem and Competition

June 6th, 2005 by John Wilbanks

I’ve been meaning to blog this for a while.

There was an article in Science (registration or a one-time fee is required for access) a few weeks ago. If you aren't at an institution with a Science subscription, you may want to check out the page on PubChem.

In a nutshell, there's an organization called the American Chemical Society (ACS), which has a division called the Chemical Abstracts Service, or CAS. According to the various press coverage tagged at Connotea, CAS operates on a budget of over $200M a year and has about 1200 employees. It provides its customers with a database full of small-molecule information: about 25M compounds, drawn from the chemical literature and other sources. In my bioinformatics days, everyone I talked to in the pharma industry was a CAS customer. It was a constant in the discovery equation, not a variable.

And there's good reason for that. Detailed information on small molecules is essential to drug discovery and research (think about it: a lot more drugs have come out of the canon of chemistry research and publishing than out of the genome, at least to date!). And if you're going to bet hundreds of millions of dollars on a clinical trial, you are probably going to splurge on the best tools on the market for the task. That is why the Science article described CAS as a "significant contribution to the society's $317 million in annual revenue from publications."

Now, public databases are key for work like the BioDASH demo at the World Wide Web Consortium. They are used to link up the biological and chemical worlds through the web and the semantic web, and they are a rich resource for researchers who may not have the funding for all the tools that pharma can afford. A lot of academic research is aimed simply at identifying the mechanism of a disease, and perhaps at finding some hints as to what kinds of chemicals might (might!) have some therapeutic impact. That is a long way from a drug. The distance is why these researchers don't have the funding of a pharma company behind them, and why pharma spends so much more money (and takes so much more risk). I'd always seen this reflected in the public-private database split as well. Public databases help public researchers (and, cough, venture-funded bioinformatics entrepreneurs) get farther with less expense; private databases are what you use when you lay down a Massive Bet On A Molecule.

Along comes a US government initiative called PubChem. PubChem comes out of the National Institutes of Health's Molecular Libraries Roadmap Initiative, which aims to provide "public sector biomedical researchers" with a database of information about the biological activity of small molecules. Like all the US government databases at the NCBI, it will be integrated with the existing body of knowledge in the Entrez search system, and like all the US databases there, it will be free to all: no logins, and full data dumps allowed with no charge or clickthrough agreement. It's a small database (~650,000 compounds and growing) with a small staff and a small budget. It combines new, raw data generated by the Roadmap Initiative with data available from other public sources. Government-funded raw data + public data. That's PubChem.

The reaction from CAS and ACS was swift. Robert Massie, President of CAS, said: "If NIH would limit itself to publishing NIH-funded information, this controversy would disappear immediately." Brian Dougherty, senior adviser to the chief strategy officer at ACS, said: "We think this is going to put us out of business if it keeps growing and no parameters are set."

The folks behind PubChem have spoken out as well. Christopher Austin, senior adviser to the translational research director at the NIH Chemical Genomics Center at the National Human Genome Research Institute, stated that killing PubChem "would have profoundly negative effects on this new paradigm of making medical discoveries, right at the time that it is just getting started…Unfettered access to a large number of different types of information is what allows fundamental new discoveries to be made."

It's not just those at NIH getting into the fight. Richard Roberts, winner of the 1993 Nobel Prize in Physiology or Medicine, recently resigned from the ACS over this dispute. Here's an excerpt: "The recent legal actions against Google have also disturbed me very much, but the current opposition to PubChem is reprehensible and without any redeeming merit." (He's referring to the ACS's other current fight, a lawsuit against Google Scholar.)

This is one of two important database debates in the US, the other being the fight against open access to weather data. It will be interesting to watch both move forward, to say the least.

One Response

  1. David Bradley, on April 18th, 2007 at 10:56 am

    There's yet another chemical structure search engine on the scene now. ChemSpider Chemical Searching is different, though. It is essentially a meta search engine that pools the various academic, commercial, and free-for-all structure databases on the web and lets you search across all of them using chemical names, SMILES strings, and InChIs, all from a single box. PubChem, ChEBI, and others are searchable, allowing you to discover which molecules are online. Even if at the end of the search you have to pay for access to a non-OA database, at least you will know the structure exists and can exclude or include it in your follow-up work.

    David Bradley