NeuroCommons Project

Open Source Knowledge Management

At the core of Science Commons' mission is the goal of developing and maintaining a free, open source data integration toolkit that makes it easy for all scientists, academic and corporate, to get accurate answers to complicated queries about public databases.

Primary Activities and Achievements

Primary Architect of Neurocommons Knowledge Base:

– Integrated a number of existing public databases, including:

– The 2007 Medical Subject Headings (MeSH), available for download and also converted to SKOS (Simple Knowledge Organization System, a family of formal languages designed for representing classification schemes, thesauri, taxonomies, etc.)
– Brain Architecture Management System (BAMS), an online resource for information about neural circuitry
– Brain Pharm and NeuronDB, databases of neuronal properties and of agents that act on neuronal receptors in neurological research
– Entrez Gene, a searchable database of genes (for PubMed references)
– Gene Ontology Annotations (GOA) @ EBI, a database of GO annotations for proteins specifically in the UniProt Knowledgebase and International Protein Index
– Gene Ontology (GO), a collection of category relationships in a controlled vocabulary to describe gene and gene product attributes
– HomoloGene, a system for automated detection of homologs among the annotated genes of sequenced eukaryotic genomes, developed by the National Center for Biotechnology Information (NCBI)
– Mammalian Phenotype Ontology, an ontology of standard terms for annotating mammalian phenotypic data
– Medline 2007 baseline, a collection of raw files for text mining and public use
– Neurocommons text mining pilot, the starting point of our proof-of-concept project “the Neurocommons”, aimed at mining the text from neuroscience-related PubMed abstracts for addition to the knowledgebase
– Rat Genome Database, a collection of rat genomic data
– Semantic Web Applications in Neuromedicine (SWAN), a project devoted to developing knowledge bases for Alzheimer Disease research using Semantic Web technology

– Developed and deployed a search engine for the knowledge base

– The search engine allows questions such as these to be asked:

• “From what’s been published in journals and databases, what signal transduction genes may be active in pyramidal neurons?”
• “From what’s been published in journals and databases, what plasmids are available under standard contracts for Alzheimer’s disease?”
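Questions like the first one above span more than one source: one database says which genes take part in signal transduction, another says which genes are expressed in a given cell type. The sketch below shows that kind of cross-source join over a toy set of triples. It is an illustration only: the identifiers, predicate names and facts are all hypothetical, and it does not describe how the Neurocommons search engine is actually implemented.

```python
# Hypothetical (subject, predicate, object) triples. None of these identifiers
# or facts come from the Neurocommons knowledge base.
TRIPLES = [
    ("gene:A", "participates_in", "process:signal_transduction"),
    ("gene:B", "participates_in", "process:signal_transduction"),
    ("gene:C", "participates_in", "process:dna_repair"),
    ("gene:A", "expressed_in", "cell:pyramidal_neuron"),
    ("gene:C", "expressed_in", "cell:pyramidal_neuron"),
]


def subjects(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def genes_in_process_and_cell(triples, process, cell):
    """Join two triple patterns on their shared subject (the gene)."""
    in_process = subjects(triples, "participates_in", process)
    in_cell = subjects(triples, "expressed_in", cell)
    return sorted(in_process & in_cell)


if __name__ == "__main__":
    print(genes_in_process_and_cell(
        TRIPLES, "process:signal_transduction", "cell:pyramidal_neuron"))
```

The join only works because both toy sources name the gene the same way, which is the practical reason for integrating the databases under shared identifiers.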

– Continue to develop and extend the knowledge base through various activities, including:

– Encourage deployment of mirror sites worldwide

• Ireland, Bulgaria, Poland, and Australia have developed sites, and more are in progress
• Build “push button” installation scripts to facilitate global mirroring

– Deploy scripts and systems to refresh content, ensuring the knowledge base remains current

– Developed proof-of-concept “data mashups” demonstrating the power of open source software combined with open data

– Google Maps / Allen Mouse Brain Atlas demo

• Known as the Google “Moupse”, a name that reflects the intersection of Google Maps and the mouse brain images from the Allen Brain Atlas; it shows the potential of combining open data with, in this case, the Google Maps API

– Developed and deployed open namespaces for Semantic Web data integration and queries

– Built resolutions for major public databases like PubMed and Entrez, which can be found at http://purl.org/science/owl/sciencecommons

– Built namespace for “realist” descriptions of experiments with the National Center for Biomedical Ontology and the Open Biological Ontology Foundry. An example of a resolved name can be found at http://sw.neurocommons.org/cgi-bin/obiterm?ref=OBI_0000225
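The resolved-name example above follows a simple pattern: a fixed resolver address plus a `ref` query parameter carrying the term identifier. A minimal sketch of building and parsing such addresses follows. The function names are hypothetical, the base address is copied from the example URL, and the code only manipulates strings; it makes no network request and assumes nothing about whether the resolver is still online.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address copied from the example URL in the text.
OBI_RESOLVER = "http://sw.neurocommons.org/cgi-bin/obiterm"


def obi_term_url(term_id):
    """Build a resolver URL for an OBI term identifier such as 'OBI_0000225'."""
    return f"{OBI_RESOLVER}?{urlencode({'ref': term_id})}"


def term_id_from_url(url):
    """Recover the term identifier from a resolver URL of the same shape."""
    return parse_qs(urlsplit(url).query)["ref"][0]


if __name__ == "__main__":
    url = obi_term_url("OBI_0000225")
    print(url)
    print(term_id_from_url(url))
```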

Developed a Process for Text Mining of Scientific Writings:

– Tested utility of text mining on scientific journal articles by mining the PubMed abstracts for gene and protein relationships, using a commercially available product targeted at the pharmaceutical industry:

– Parsed 16,000,000 abstracts and classified 874,727 as related to the central nervous system (CNS); of these, the software recognized 368,688 and extracted relationships from 94,381

– Created a graph of 30,000 relationships among 5,500 genes and proteins, describing gene-protein relationships in the brain and nervous system

– Integrated the aforementioned knowledge into the Neurocommons knowledgebase, where it serves as the annotations for Science Commons’ open source analytics platform, the Neurocommons
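The counts reported above form a funnel, and the yield at each stage is easier to see as a ratio. The sketch below computes those ratios from the figures in the text, along with the average number of relationships touching each gene or protein in the resulting graph. It assumes the four counts are successive stages of one pipeline and that each relationship links two entities; neither assumption is stated explicitly in the source.

```python
# Stage counts as reported in the text-mining summary above.
STAGES = [
    ("abstracts parsed", 16_000_000),
    ("classified as CNS-related", 874_727),
    ("recognized by the software", 368_688),
    ("relationships extracted", 94_381),
]

# Graph size as reported above.
RELATIONSHIPS = 30_000
ENTITIES = 5_500


def stage_yields(stages):
    """Return (name, fraction of previous stage, fraction of first stage)."""
    first = stages[0][1]
    rows = []
    for (_, prev), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count / prev, count / first))
    return rows


def mean_degree(edges, nodes):
    """Average number of edge endpoints per node (assumes two per edge)."""
    return 2 * edges / nodes


if __name__ == "__main__":
    for name, of_prev, of_all in stage_yields(STAGES):
        print(f"{name}: {of_prev:.1%} of previous stage, "
              f"{of_all:.2%} of all abstracts")
    print(f"mean degree: {mean_degree(RELATIONSHIPS, ENTITIES):.1f}")
```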

Current Activities and Future Achievements

Extending the Knowledge Base:

– Launched a community text mining challenge to enhance the knowledgebase for rare brain diseases

– Collected a focus set of 2,000 full-text papers on Huntington’s disease

– Distributed the set to key members of the community to mine as an exercise

– Launched a project to classify patents that are likely to restrict access to generic testing and diagnostics

– Currently exploring the use of text mining and spam-filtering technology on patents at the university level (in progress)

– Liberated valuable genomic software into the commons

– Negotiated the open-source release of award-winning, peer-reviewed software written by Millennium Pharmaceuticals (to run on top of the Neurocommons knowledgebase).

– Currently performing code review and extension to ensure the software runs correctly on the Neurocommons knowledgebase and preparing code for open source release
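The patent classification project above mentions spam-filtering technology. The source does not say which technique is being explored; one common approach in spam filters is a naive Bayes text classifier, sketched below with add-one smoothing. The training snippets and the labels "restrictive" and "other" are invented for illustration and are not real patent data.

```python
import math
from collections import Counter

# Hypothetical training snippets; not taken from any real patent.
TOY_DOCS = [
    ("method for diagnosing disease by detecting gene mutation", "restrictive"),
    ("isolated gene sequence used in diagnostic testing", "restrictive"),
    ("apparatus for cooling a laboratory freezer", "other"),
    ("improved pipette tip rack design", "other"),
]


def train(docs):
    """Count words per label and documents per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TOY_DOCS)
    print(classify("diagnostic testing of a gene mutation", *model))
```

A real classifier would need a much larger labeled set and a held-out evaluation; four snippets only show the mechanics.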

Built and Continue to Foster Strong Community Partnerships

– Ontology for Biomedical Investigations (OBI) / the National Center for Biomedical Ontology (NCBO)

– Biomedical Informatics Research Network (BIRN)

– Massachusetts Institute of Technology (MIT)

– The Neuroscience Information Framework (NIF)

– The Society for Neuroscience (SfN)

– The World Wide Web Consortium (W3C)