Title: The Ontrez project at NCBO
1. The Ontrez project at NCBO
- Nigam Shah
- nigam_at_stanford.edu
2. Public data repositories
- Around 1100 databases in the NAR 2008 database issue.
- High-throughput gene expression data in repositories such as GEO, SMD, ArrayExpress
- Clinical trial repositories such as caBIG, TrialBank, clinicaltrials.gov
- Guideline repositories such as www.guideline.gov
- Image repositories such as BIRN
- Observational studies such as Framingham, NHANES, AMCIS.
3. Database annotation
- Ontology-based annotation is not as widespread as desired
- Most annotation is still free text
- Possible reasons
- Lack of a one-stop shop for bio-ontologies
- Lack of tools to annotate experimental data
- Manual → Phenote
- Automatic → ?
- Lack of a sustainable mechanism to create ontology-based annotations
4. Different kinds of annotations
- Expression profiling of cultured bladder smooth
muscle cells subjected to repetitive mechanical
stimulation for 4 hours. Chronic overdistension
results in bladder wall thickening, associated
with loss of muscle contractility. Results
identify genes whose expression is altered by
mechanical stimuli.
- ELMO1 expression is altered by mechanical stimuli
- Other experiments
- ELMO1 associated_with actin cytoskeleton organization and biogenesis
(Diagram labels: Low level result; metadata; summary result; annotation; Chronic Bladder Overdistension)
5. Annotations as assertions
- Annotation: An assertion declaring a relationship between a biomedical entity and a type in an ontology.
- e.g. p53 cell death
- Annotations tell us what the biologists believe to be true (in particular or in general)
- Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator.
- Semantics of annotations are not always declared a priori (e.g. associated_with, involves)
6. Annotations as Meta-data
- Metadata: The text description accompanying a dataset in a database.
- Metadata-annotations should be machine processed (and indexed using ontologies) because
- The volume is orders of magnitude more than the summary results
- These annotations are not stating any biological fact
- Hence they don't need a curator to create them
- These annotations are to be used to LOCATE datasets accurately as soon as they are available in a public repository
- We cannot afford to have a curation bottleneck
7. High level goal
- Process the metadata annotations to automatically tag the elements in public repositories with as many ontology terms as possible.
- For example, in the case of GEO dataset 906
- Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
- Gets tagged with
- Expression, Expression of bladder, bladder, smooth, bladder muscle, muscle, smooth muscle, cells, mechanical, mechanical stimulation, stimulation, Chronic, results, bladder overdistension, associated, associated with, with, loss, genes, altered
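The tagging step described above can be illustrated with a minimal dictionary-lookup sketch. This is not the Ontrez implementation, which the slides do not describe. It only assumes that ontology terms are available as a flat lexicon and that every word sequence of up to three words in the metadata text is looked up in it. The lexicon entries and the `EX:` concept IDs are made up for illustration.

```python
import re

# Hypothetical mini-lexicon: lower-cased term -> made-up concept ID.
# A real lexicon would be built from the terms of many ontologies.
LEXICON = {
    "bladder": "EX:0001",
    "smooth muscle": "EX:0002",
    "muscle": "EX:0003",
    "mechanical stimulation": "EX:0004",
    "gene expression": "EX:0005",
}

def tag_text(text, lexicon, max_words=3):
    """Return (term, concept_id, word_offset) for every lexicon term in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    hits = []
    for start in range(len(words)):
        for size in range(1, max_words + 1):
            if start + size > len(words):
                break
            candidate = " ".join(words[start:start + size])
            if candidate in lexicon:
                hits.append((candidate, lexicon[candidate], start))
    return hits

description = ("Expression profiling of cultured bladder smooth muscle cells "
               "subjected to repetitive mechanical stimulation for 4 hours.")

for term, concept_id, offset in tag_text(description, LEXICON):
    print(offset, term, concept_id)
```

Overlapping matches are kept on purpose, so "smooth muscle" and "muscle" are both reported. This mirrors the goal of tagging with as many ontology terms as possible.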
8. Tagging: annotating with ontology terms
10. Querying the annotation index
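One way to picture an annotation index is as an inverted index from ontology term to the repository elements tagged with it. The sketch below is an assumption about the general shape of such an index, not a description of the Ontrez data model. The element ids and annotations are placeholders, and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each term to the set of element ids annotated with it."""
    index = defaultdict(set)
    for element_id, term in annotations:
        index[term.lower()].add(element_id)
    return index

def query(index, *terms):
    """Return the ids of elements tagged with all of the given terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical annotations; the ids are placeholders, not real records.
ANNOTATIONS = [
    ("dataset-A", "bladder"),
    ("dataset-A", "smooth muscle"),
    ("dataset-B", "smooth muscle"),
    ("trial-C", "bladder"),
]

index = build_index(ANNOTATIONS)
print(sorted(query(index, "bladder")))
print(sorted(query(index, "bladder", "smooth muscle")))
```

Because elements from different repositories share one index, a single term lookup can return datasets and trials together.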
15. What new science do we enable?
16. New Science enabled
- Nature study on image features and gene expression
- Correlation between protein and gene expression for cancer classification
- Correlating gene expression and drug effect information for predicting drug efficacy
- Training and testing image processing algorithms
17. Decoding global gene expression programs in liver cancer by noninvasive imaging
- Eran Segal, Claude B Sirlin, Clara Ooi, Adam S Adler, Jeremy Gollub, Xin Chen, Bryan K Chan, George R Matcuk, Christopher T Barry, Howard Y Chang and Michael D Kuo
- Nature Biotechnology 25, 675-680 (2007). Published online 21 May 2007
18. Correlation of protein and gene expression for the stratification of breast cancer patients
19. There are 20 other diseases for which this is possible!
21. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms.
22. Current status of the prototype
23. Ontrez target resources
24. Where can we go?
- Become a service for annotating biomedical text.
- People send us text, we send back recognized concepts (maybe even relationships)
- Given a set of concepts, we provide a similarity metric between them
- Both these services can be plugged into a variety of community and collaborative annotation tools
- Become the one-stop shop for finding items across a wide variety of resources
- Integrate on the disease dimension. Gene cards exist, disease cards don't
- Focus on approx. 15 resources in the next year.
- PDB and PLoS are interested
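The slides do not say which similarity metric is intended for the concept-similarity service mentioned above. As one possible illustration, the sketch below scores two concepts by the Jaccard overlap of their ancestor sets in an is-a hierarchy. The metric choice, the toy hierarchy, and the function names are all assumptions.

```python
# Hypothetical is-a hierarchy: concept -> list of direct parents.
PARENTS = {
    "smooth muscle": ["muscle"],
    "skeletal muscle": ["muscle"],
    "muscle": ["tissue"],
    "epithelium": ["tissue"],
    "tissue": [],
}

def ancestors(concept, parents):
    """Return the concept itself plus everything reachable via is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen

def similarity(a, b, parents):
    """Jaccard overlap of the two ancestor sets, a value between 0 and 1."""
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

print(similarity("smooth muscle", "skeletal muscle", PARENTS))
print(similarity("smooth muscle", "epithelium", PARENTS))
```

Concepts that share more of their path to the root score higher, so the two muscle types come out closer to each other than either does to epithelium.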
25. Research questions - 1
26. Research questions - 2
27. Credits and collaborations
- Clement Jonquet
- Nipun Bhatia
- Manhong Dai
- Fan Meng
- Brian Athey
- Mark Musen