Title: BiGG: Biochemical, Genetic and Genomic Database
Jun Young Park1, Jan Schellenberger2, Tom M. Conrad3, Bernhard Ø. Palsson1,2
1 Department of Bioengineering, University of California San Diego (jyp007_at_ucsd.edu, bpalsson_at_ucsd.edu)
2 Bioinformatics Program, University of California San Diego (jschelle_at_ucsd.edu)
3 Department of Chemistry and Biochemistry, University of California San Diego (tconrad_at_ucsd.edu)
ABSTRACT
We describe BiGG, a database of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with a standard nomenclature, which allows components to be compared across different species. Furthermore, BiGG contains links to several publicly available databases where additional information can be found and integrated. In addition, BiGG contains a customized export tool that enables the generation of SBML files for further network analysis by external software packages. BiGG addresses a need in the systems biology community to have access to high-quality curated metabolic reconstructions.
Introduction
The last ten years have seen the emergence of many genome-scale metabolic reconstructions. These manually curated, component-by-component (bottom-up) reconstructions of genomic and bibliomic data have led to a biochemically, genetically and genomically structured (BiGG) knowledgebase.
Such reconstructions are of interest for their detailed curated content and for their utility in assessing metabolic capabilities. A metabolic reconstruction can be mathematically represented as an in silico model for computing allowable network states through the application of governing chemical and genetic constraints under the constraint-based reconstruction and analysis (COBRA) framework. Furthermore, gap analysis identifies possible missing reactions by finding so-called dead-end metabolites, which can be produced by the network but not consumed.
BROWSING
Reactions
Reactions may be searched for by name, EC number, or associated gene, or by using the model name as the only search parameter. Searches may also specify compartment, pathway, or metabolite participation. Results may be limited to reactions with known gene associations or with high or low confidence, and transport reactions may be excluded. In addition, reactions may be searched across reconstructions, allowing for model comparison. Lists of reactions matching a set of criteria may be exported as a tab-delimited flat file. The exported files can contain information for multiple models, simplifying model comparison.
EXPORTING
The BiGG database is capable of exporting reconstructions in SBML format. This XML format is widely used for distributing systems biology models. The user has several options to customize export on the Web.
Compartmentalization
A compartment in a metabolic reconstruction has a distinct pool of metabolites and a set of reactions which may be unique to that compartment. By default, reactions and metabolites are compartmentalized in the models, meaning they exist in distinct compartments such as the Cytosol or the Golgi. The user can instead choose to export the model partially or fully decompartmentalized. If partially decompartmentalized, reactions and metabolites ordinarily assigned to subcompartments of the Cytosol (Mitochondria, Peroxisome, etc.) are instead assigned to the Cytosol, while the Extraorganism compartment is untouched. In a fully decompartmentalized model, there are no compartments and all reactions and metabolites exist in an unsegregated single-compartment system.
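The partial and full decompartmentalization options described under Compartmentalization can be sketched in a few lines. This is only an illustration, not the BiGG exporter's code: the function names, the mode strings, the tuple representation of a metabolite, and the one-letter codes for Extraorganism ("e") and Mitochondria ("m") are all assumptions made for the example.

```python
from collections import defaultdict

EXTRAORGANISM = "e"  # assumed one-letter code for the Extraorganism compartment
CYTOSOL = "c"        # Cytosol code, matching the suffix used on the maps


def relocate(compartment, mode):
    """Map a compartment code to its code under the chosen export mode."""
    if mode == "compartmentalized":
        return compartment
    if mode == "partial":
        # Everything except Extraorganism collapses into the Cytosol.
        return compartment if compartment == EXTRAORGANISM else CYTOSOL
    if mode == "full":
        return None  # a single unsegregated pool
    raise ValueError(f"unknown mode: {mode!r}")


def decompartmentalize(stoichiometry, mode):
    """Rewrite one reaction.

    stoichiometry maps (metabolite, compartment) to a coefficient
    (negative for reactants, positive for products).
    """
    merged = defaultdict(float)
    for (metabolite, compartment), coefficient in stoichiometry.items():
        merged[(metabolite, relocate(compartment, mode))] += coefficient
    # Species whose coefficients cancel drop out of the reaction.
    return {species: c for species, c in merged.items() if c != 0}


# Hypothetical reactions, for illustration only.
mito_transport = {("atp", "m"): -1, ("atp", "c"): 1}
glucose_uptake = {("glc", "e"): -1, ("glc", "c"): 1}

print(decompartmentalize(mito_transport, "partial"))  # {}
print(decompartmentalize(glucose_uptake, "partial"))
print(decompartmentalize(glucose_uptake, "full"))     # {}
```

In this sketch a pure transport reaction between the Mitochondria and the Cytosol has no net effect once the two pools are merged, while transport across the Extraorganism boundary survives partial decompartmentalization. How the actual exporter treats reactions that become empty is not stated here.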
Optional Information
The user can choose which optional information to
include in the SBML file. The notes field of the
Reaction entries can include Boolean strings
corresponding to the GPR statements. The GPR
field is read and interpreted by the COBRA
toolbox. The SBML file may also include
information on genes, proteins and citations.
Because the SBML specification does not include
fields for this kind of data, this information is
stored in the notes field of the reaction
entries.
Metabolites
Metabolites may be searched for by name, KEGG ID, CAS ID, or charge. Searches may be limited by compartment, pathway, and organism. In addition to basic metabolite information such as formula and charge, the reactions in which the metabolite participates are listed and categorized by the metabolite's role as a reactant or a product. This feature facilitates the tracing of a metabolite through a pathway in the absence of graphical pathway maps. Lists of metabolites matching a set of search criteria may be exported, and contain information such as metabolite name, abbreviation, formula, KEGG ID, and CAS ID.
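The reactant/product categorization can be illustrated with a small index built from reaction stoichiometry. The sketch below is not BiGG's implementation: the function names and the toy network are hypothetical, and every reaction is treated as irreversible in the direction written. The same index also yields the dead-end metabolites used in gap analysis, that is, metabolites that are produced but never consumed.

```python
from collections import defaultdict


def index_by_role(reactions):
    """Return (consumed_by, produced_by): metabolite -> list of reaction IDs.

    reactions maps a reaction ID to {metabolite: coefficient}, with negative
    coefficients for reactants and positive coefficients for products.
    """
    consumed_by = defaultdict(list)
    produced_by = defaultdict(list)
    for reaction_id, stoichiometry in reactions.items():
        for metabolite, coefficient in stoichiometry.items():
            if coefficient < 0:
                consumed_by[metabolite].append(reaction_id)
            elif coefficient > 0:
                produced_by[metabolite].append(reaction_id)
    return consumed_by, produced_by


def dead_end_metabolites(reactions):
    """Metabolites that some reaction produces but no reaction consumes."""
    consumed_by, produced_by = index_by_role(reactions)
    return sorted(m for m in produced_by if m not in consumed_by)


# Hypothetical toy network, for illustration only.
toy_network = {
    "uptake_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"B": -1, "D": 1},
    "export_D": {"D": -1},
}

consumed_by, produced_by = index_by_role(toy_network)
print(consumed_by["B"])                   # ['R2', 'R3']
print(produced_by["B"])                   # ['R1']
print(dead_end_metabolites(toy_network))  # ['C']
```

Following a metabolite from the reactions that produce it to the reactions that consume it is the tabular equivalent of tracing it along a pathway map.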
Figure: NMN Metabolism in S. cerevisiae.
DATABASE CONTENTS
BiGG includes seven different genome-scale reconstructions of six organisms: Homo sapiens Recon 1, Escherichia coli iJR904 and iAF1260, Saccharomyces cerevisiae iND750, Staphylococcus aureus iSB619, Methanosarcina barkeri iAF692, and Helicobacter pylori iIT341.
Gene-Protein-Reaction (GPR) associations
Figure: GPR associations for a single-gene reaction and a multiple-gene reaction (examples shown: sphingosine kinase 2, platelet-activating factor acetylhydrolase). Each diagram links DNA to mRNA (transcription), mRNA to protein (translation, complexing), and protein to reaction (activity).
The on or off state of each reaction in the network may be controlled by the genotype and expression level of its associated genes. Some cases involve multiple genes and proteins whose relationship is described using Boolean logic. For example, a single protein may be composed of subunits coded by two (or more) genes, all of which are required. GPRs may be used to evaluate the effects of gene knockouts and gene regulation on the metabolic reconstructions, ruling out reactions whose necessary genes are not available.
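To make the Boolean logic concrete, the following sketch evaluates a GPR rule against a set of available genes. It assumes rules are written with the words "and" and "or" plus parentheses, and that a reaction with an empty rule is not gene-controlled. The function name and the gene names are hypothetical, and this is not the COBRA toolbox's parser.

```python
import re


def evaluate_gpr(rule, active_genes):
    """Return True if the Boolean GPR rule is satisfied by active_genes."""
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return True  # assumption: no gene association means always available
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse the right side to advance pos
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token == ")" or token.lower() in ("and", "or"):
            raise ValueError(f"unexpected token {token!r}")
        return token in active_genes  # a gene is "on" if it is available

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in rule")
    return result


# Hypothetical rule: a two-subunit complex (geneA and geneB) or an isozyme (geneC).
rule = "(geneA and geneB) or geneC"
print(evaluate_gpr(rule, {"geneA", "geneB", "geneC"}))  # True
print(evaluate_gpr(rule, {"geneB", "geneC"}))           # True: geneA knocked out
print(evaluate_gpr(rule, {"geneB"}))                    # False: geneA and geneC knocked out
```

A reaction whose rule evaluates to False under a given knockout would be ruled out of the network for that simulation.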
Metabolic Maps
Each model includes several metabolic maps. All the maps are drawn in SVG format and can be displayed in all major browsers. When maps are available that include a chosen reaction or metabolite, they are listed on the details page under the appropriate organism. Primary molecules are drawn larger than non-primary molecules. Molecules that are outside the cell (extraorganism) are colored yellow. Molecules in different compartments have different suffixes in their names. For example, Cytosol is c and Nucleus is n. For reversible reactions, the arrows pointing to reactant-side molecules have smaller arrowheads. The reaction or metabolite the user searched for is highlighted in red so that it is easier to locate on the map. The components of the maps, lines and circles, are hyperlinked and display more information when clicked. This graphical representation provides the user with another way of understanding chemical pathways.
The Website
BiGG is available at http://bigg.ucsd.edu/
Simulation
The SBML file contains a few additional reactions that are necessary for simulation purposes. For example, the exported H. pylori iIT341 model contains the reactions DM_HMFURN, sink_ahcys(c), and sink_amob. To run meaningful simulations, it is important that the bounds of exchange fluxes be specified to model the environment. Including the flux bound vectors in the SBML file simplifies the simulation process. In addition, the upper and lower flux bounds of all reactions may be refined before exporting, allowing the user to create SBML files with customized parameters.
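The role of flux bounds can be shown with a minimal feasibility check: a flux distribution is allowable only if every flux lies within its bounds and every metabolite is balanced at steady state. The sketch below is a simplified illustration under those two constraints, not an SBML reader or a COBRA toolbox routine. The function name, the reaction names, and the bound values are hypothetical, and fluxes are taken as positive in the direction each reaction is written.

```python
def is_feasible(stoichiometry, bounds, fluxes, tol=1e-9):
    """Check flux bounds and steady-state mass balance for a flux distribution.

    stoichiometry: reaction ID -> {metabolite: coefficient}
    bounds:        reaction ID -> (lower bound, upper bound)
    fluxes:        reaction ID -> flux value
    """
    # 1. Every flux must lie within its lower and upper bound.
    for reaction_id, flux in fluxes.items():
        lower, upper = bounds[reaction_id]
        if flux < lower - tol or flux > upper + tol:
            return False
    # 2. Net production of every metabolite must be zero.
    balance = {}
    for reaction_id, coefficients in stoichiometry.items():
        for metabolite, coefficient in coefficients.items():
            balance[metabolite] = (
                balance.get(metabolite, 0.0) + coefficient * fluxes[reaction_id]
            )
    return all(abs(net) <= tol for net in balance.values())


# Hypothetical three-reaction network: uptake of A, conversion to B, secretion of B.
stoichiometry = {
    "uptake_A": {"A": 1},
    "convert": {"A": -1, "B": 1},
    "secrete_B": {"B": -1},
}
# The uptake bound models the environment: at most 10 units of A are available.
bounds = {
    "uptake_A": (0.0, 10.0),
    "convert": (0.0, 1000.0),
    "secrete_B": (0.0, 1000.0),
}

within_limits = {"uptake_A": 5.0, "convert": 5.0, "secrete_B": 5.0}
over_uptake = {"uptake_A": 12.0, "convert": 12.0, "secrete_B": 12.0}
unbalanced = {"uptake_A": 5.0, "convert": 5.0, "secrete_B": 4.0}

print(is_feasible(stoichiometry, bounds, within_limits))  # True
print(is_feasible(stoichiometry, bounds, over_uptake))    # False: exceeds uptake bound
print(is_feasible(stoichiometry, bounds, unbalanced))     # False: B accumulates
```

Tightening or relaxing the uptake bound changes which flux distributions pass the check, which is why the exchange bounds need to reflect the intended environment.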
COBRA Compatibility
The exported SBML files are compatible with the COBRA toolbox, which supports many computational procedures. Using the COBRA toolbox, the SBML file exported from BiGG may be imported as a network data structure into Matlab.
Database Schema
Reconstructions are developed in and stored on a Genomatica (San Diego, CA)-supplied SimPheny™ server running an Oracle™ database. Access to this database is provided by a read-only client with several tables and views for accessing information on Reactions, Metabolites, Genes, Proteins and Citations. All queries are performed by a Linux/Apache server using Perl with the CGI and DBI modules.
Figure: The BiGG browser and exporter. The map to the right shows a part of Carbohydrate Metabolism in human.
Figure: The left diagram shows the number of reactions shared by the three largest reconstructions (E. coli iAF1260, H. sapiens, and S. cerevisiae). The numbers in parentheses represent non-exchange reactions.
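Because the reconstructions share a standard nomenclature, counts of shared components reduce to set operations on identifiers. The sketch below computes the seven regions of a three-set diagram. The function name and the reaction identifiers are hypothetical and do not reproduce the counts from the BiGG reconstructions.

```python
def venn_regions(a, b, c):
    """Count the members of each of the seven regions of a three-set diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical reaction identifiers for three models.
model_a = {"R1", "R2", "R3", "R4"}
model_b = {"R2", "R3", "R5"}
model_c = {"R3", "R4", "R6", "R7"}

regions = venn_regions(model_a, model_b, model_c)
print(regions["all_three"])   # 1
print(sum(regions.values()))  # 7
```

The region counts always sum to the size of the union of the three sets, which is a quick consistency check on any such diagram.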
Figure: The right diagram shows the number of metabolites shared by the three largest reconstructions (E. coli iAF1260, H. sapiens, and S. cerevisiae).
CONCLUSION
- The scope of covered reactions is often greater than for other databases.
- BiGG uses both genetic and literature-based data to assess whether a reaction is present.
- BiGG assigns confidence levels to each reaction which can be used when evaluating the resultant model.
- BiGG includes relationships between genes and proteins (GPR).
- Compartmentalization in BiGG gives a more accurate description of reactions involving membrane transporters.
- BiGG bridges the gap between a reconstruction and a model.
- The BiGG database provides the first collection of curated high-quality metabolic reconstructions suitable for study with COBRA methods.
References
- Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protocols 2007, 2(3):727-738.
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 1999, 27(1):29-34.
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19(4):524-531.