Title: P1253814652aeSXf
The Human Metabolome Database (HMDB)
Dan Tzur, Kevin Jeroncic, Kevin Jewell, David Block,
Craig Knox, Roman Eisner, An Chi Guo, Paul Stothard,
Ian Forsythe, Savita Shrivastava, Russ Greiner,
David Wishart
University of Alberta, Edmonton, Canada
Abstract
The Human Metabolome Database (HMDB) is
a freely available electronic database containing
detailed information about small molecule
metabolites found in the human body. The database
is designed to contain or link three kinds of
data: 1) chemical data, 2) clinical data and 3)
molecular biology/biochemistry data. The database
currently contains more than 1400 metabolite
entries including both water-soluble and
lipid-soluble metabolites as well as metabolites
that would be regarded as abundant (>1 mM) or
relatively rare (<1 nM). Additionally, more than
3,000 protein (and DNA) sequences are linked to
these metabolite entries. Each MetaboCard entry
contains more than 80 data fields with half of
the information being devoted to
chemical/clinical data and the other half devoted
to enzymatic or biochemical data. Many data
fields are hyperlinked to other databases (KEGG,
PubChem, MetaCyc, ChEBI, PDB, Swiss-Prot,
GenBank) and a variety of structure and pathway
viewing applets. The database supports extensive
text, sequence, chemical structure and relational
query searches. The HMDB is available at
http://www.hmdb.ca
Introduction
Browsing the HMDB
A GenBank for the Metabolome
Searching the HMDB
Metabolomics involves the rapid, high-throughput
characterization of the small molecule
metabolites found in an organism. Since the
metabolome is closely tied to an organism's
genotype, metabolomics offers a unique
opportunity to look at genotype-phenotype
relationships. Metabolomics is increasingly being
used in a variety of health applications
including pharmacology, pre-clinical drug trials,
toxicology, transplant monitoring, newborn
screening and clinical chemistry. However, a key
limitation of metabolomics is that the human
metabolome is not well characterized, nor is the
existing information well archived or easily
accessible.
A similar situation existed in the early
days of the Human Genome Project, where almost no
centralized resources existed that contained any
information about gene or protein sequences.
Indeed, most sequence databases were private
collections archived and maintained by individual
labs or underpaid students. With the creation of
publicly accessible databases such as
GenBank, Swiss-Prot and the PIR, along with the
placement of these databases on the World Wide
Web, the fields of genomics and proteomics were
fundamentally transformed. Our objective with the
Human Metabolome Project is to create a GenBank
for the metabolome, with the hope that a freely
accessible, current and well-archived resource
could be as transformative to metabolomics as
GenBank has been to genomics. Our GenBank is
called the HMDB.
Fig. 2. The HMDB may be browsed, viewed and sorted
using a variety of pull-down options similar to
those seen in PubMed. Each metabolite is listed
in a synoptic table with data on names,
structure, molecular weight (MW), etc.
Fig. 4. The HMDB may be searched by name, by
chemical structure, by sequence (using BLAST) or
in a relational manner (using the data extractor).
This allows detailed querying of the database at
many levels by users with different needs or
skills.
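The relational style of search mentioned in the caption can be illustrated with a small in-memory SQLite database. This is only a sketch under stated assumptions: the table names, column names and rows below are hypothetical placeholders and do not reflect the actual HMDB schema, its data extractor or its contents. It shows how metabolite entries linked to protein entries can be queried with a join.

```python
import sqlite3

# Hypothetical schema: NOT the real HMDB layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metabolite (id INTEGER PRIMARY KEY, name TEXT, solubility TEXT);
    CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metabolite_protein (
        metabolite_id INTEGER REFERENCES metabolite(id),
        protein_id INTEGER REFERENCES protein(id)
    );
    """
)

# Placeholder rows; the names are made up and carry no biological meaning.
conn.executemany(
    "INSERT INTO metabolite VALUES (?, ?, ?)",
    [(1, "metabolite A", "water"), (2, "metabolite B", "lipid")],
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?)",
    [(10, "enzyme X"), (11, "enzyme Y")],
)
conn.executemany(
    "INSERT INTO metabolite_protein VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 11)],
)


def proteins_for(name_fragment):
    """Return (metabolite, protein) pairs for metabolites whose name contains the fragment."""
    rows = conn.execute(
        """
        SELECT m.name, p.name
        FROM metabolite AS m
        JOIN metabolite_protein AS mp ON mp.metabolite_id = m.id
        JOIN protein AS p ON p.id = mp.protein_id
        WHERE m.name LIKE ?
        ORDER BY m.name, p.name
        """,
        (f"%{name_fragment}%",),
    )
    return rows.fetchall()


print(proteins_for("metabolite A"))
```

A text search by name and a relational query over linked proteins are combined here in one function; a real deployment would expose them as separate search modes, as the caption describes.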
[Figure 1 labels: Metabolomics, Proteomics, Genomics; 1400?? Chemicals]
The HMDB MetaboCard
Database Description
Using the HMDB
The HMDB currently contains more than 1400
metabolite entries including both water-soluble
and lipid-soluble metabolites. This number
continues to grow as more metabolites are
identified through experimental and archival
research. The HMDB is structured
in a card format similar to GeneCards to
facilitate rapid searching and easy browsing of
the data contained within the database. The HMDB
is intended to provide more than just chemical
(or metabolite) data. It is also designed to
provide detailed information about the medical,
clinical and biochemical relevance of each
metabolite. This is done by linking the
metabolite data to a large number of external
databases and by using a variety of sources to
extensively annotate each of the more than 80
data fields that are typically associated with
each metabolite in the database. Most of the HMDB is
manually curated.
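The card layout described above can be pictured as a record with two groups of fields plus a set of outbound links. The following is a minimal sketch, assuming hypothetical field names and placeholder values; the real MetaboCard has more than 80 fields, and none of the names below are taken from the HMDB itself.

```python
from dataclasses import dataclass, field


@dataclass
class MetaboCard:
    """Hypothetical, heavily abridged stand-in for one HMDB entry."""

    accession: str  # placeholder identifier, not a real HMDB accession
    common_name: str
    # Chemical/clinical half of the card (field names are assumptions).
    chemical_clinical: dict = field(default_factory=dict)
    # Enzymatic/biochemical half of the card (field names are assumptions).
    biochemical: dict = field(default_factory=dict)
    # External database name -> identifier in that database.
    external_links: dict = field(default_factory=dict)

    def field_count(self) -> int:
        """Count the data fields defined across both halves of the card."""
        return len(self.chemical_clinical) + len(self.biochemical)


card = MetaboCard(
    accession="EXAMPLE0001",
    common_name="example metabolite",
    chemical_clinical={"molecular_weight": None, "normal_concentration": None},
    biochemical={"pathways": [], "enzymes": []},
    external_links={"KEGG": "placeholder", "PubChem": "placeholder"},
)

print(card.field_count())  # 4
```

Keeping the two halves in separate dictionaries mirrors the split between chemical/clinical and enzymatic/biochemical data, while the link dictionary stands in for the hyperlinks to databases such as KEGG and PubChem.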
The HMDB may be used for both educational and
research purposes. At one level, the HMDB
contains most of the known information about
human metabolism. Users may search for compounds,
enzymes, metabolic pathways, etc. to learn more
about their role in metabolism, catabolism and
disease. Additionally, users may use the database
to identify new metabolites, to compare measured
concentrations with reference concentrations, to
identify possible diseases or disorders and to
learn more about new biomarkers. The HMDB is not
a finished database. It continues to expand and
the features and content continue to change. We
encourage users to provide us with their
feedback. The HMDB is available at
http://www.hmdb.ca
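Comparing a measured concentration with a reference range, one of the uses listed above, comes down to a unit conversion followed by a range check. The sketch below makes that concrete. The abundance thresholds (>1 mM and <1 nM) come from the abstract; the "intermediate" label, the function names and the example numbers are assumptions made purely for illustration and are not HMDB reference values.

```python
# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_molar(value: float, unit: str) -> float:
    """Convert a concentration to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def abundance_class(conc_molar: float) -> str:
    """Classify by the abstract's thresholds: >1 mM abundant, <1 nM rare."""
    if conc_molar > 1e-3:
        return "abundant"
    if conc_molar < 1e-9:
        return "rare"
    return "intermediate"  # label assumed here, not used in the source


def compare_to_reference(measured: float, low: float, high: float) -> str:
    """Place a measured value relative to a reference range (all in mol/L)."""
    if measured < low:
        return "below reference range"
    if measured > high:
        return "above reference range"
    return "within reference range"


# Illustrative numbers only; these are not real reference concentrations.
measured = to_molar(2.5, "mM")
low, high = to_molar(1.0, "mM"), to_molar(2.0, "mM")

print(abundance_class(measured))
print(compare_to_reference(measured, low, high))
```

Converting everything to mol/L before comparing avoids mixing units when measured and reference values are reported on different scales.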
Figure 1. The Pyramid of Life, a diagram
illustrating the relationship between
metabolomics, genomics and proteomics.
[Figure 1 labels: 3000 Enzymes; 25,000 Genes]
It is estimated that only ¼ to ½ of endogenous
human metabolites in blood or urine have been
positively identified. Of those that have been
identified, very few have any information on
their normal concentration ranges, biological
role or associated pathways. Furthermore, there
is no central, electronic repository that lists
this kind of information about the metabolome.
Fig. 3. To obtain more information about a given
metabolite, users should click on the MetaboCard
button seen in the browser view. Each MetaboCard
entry contains more than 80 data fields with half
of the information being devoted to
chemical/clinical data and the other half devoted
to enzymatic or biochemical data. Many data
fields are hyperlinked to other databases and a
variety of structure/pathway viewing applets.