Exploring and Exploiting the Biological Maze


1
Exploring and Exploiting the Biological Maze
  • Zoé Lacroix
  • Arizona State University

2
Data collection queries
  • Scientific protocol
  • Must be able to reproduce the process
  • Involve multiple resources
  • Data sources
  • Applications

3
Expressing scientific protocols
  • Scientific protocols mix design and
    implementation
  • Design
  • What the protocol does (tasks)
  • Scientific objects involved
  • Implementation
  • How the protocol is executed
  • Data sources and applications

4
Expressing scientific protocols
  • Scientific protocols are driven by their
    implementation
  • Scientists use the resources they know
  • data (quality)
  • access to data
  • format, limits, etc.
  • Scientists may not exploit better resources
    because they do not know them
  • Queries should be driven by the design, the
    implementation should meet the design needs

5
Example - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • The alternative splicing pipeline will provide
    a complete characterization of variations in
    proteins due to splice variation or SNPs evident
    in repositories of contiguous genome sequence
    data and expressed sequence tags (ESTs). The
    pipeline applies secondary structure, tertiary
    structure, domain motif detection and sequence
    comparison tools to proteins encoded by genes
    with alternatively spliced forms or SNPs.
  • Courtesy of Dr. Marta Janer, Institute for
    Systems Biology

6
Step 2 - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • From GenBank, dbEST and the RIKEN Clone
    Collection, collect all EST and full-length cDNA
    sequences from the target organisms of interest
    (in this case, human and mouse) that match the
    query proteins (mouse DNA binding proteins) using
    tblastn. Map the query protein to the target DNA
    sequences, keeping track of which query amino
    acids correspond to which nucleotides.
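
The bookkeeping in this step (which query amino acid corresponds to which nucleotides) can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function name `map_query_to_nucleotides` and its parameters are assumptions, and the input is assumed to be one gapped protein-level alignment of the kind tblastn reports, with a 1-based query start (amino acids) and a 1-based subject start (nucleotides).

```python
from typing import Dict, Optional, Tuple

Codon = Tuple[int, int, int]


def map_query_to_nucleotides(
    query_aln: str,
    sbjct_aln: str,
    query_start: int,
    sbjct_start: int,
    strand: int = 1,
) -> Dict[int, Optional[Codon]]:
    """Map each query amino-acid position to the three subject nucleotides.

    Hypothetical helper: both alignment strings are amino-acid letters with
    '-' for gaps. sbjct_start is the first base of the first aligned codon,
    read in the direction of translation; strand is +1 or -1.
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have the same length")
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")

    mapping: Dict[int, Optional[Codon]] = {}
    q_pos = query_start
    s_pos = sbjct_start
    for q_char, s_char in zip(query_aln, sbjct_aln):
        q_gap = q_char == "-"
        s_gap = s_char == "-"
        if not q_gap:
            # A query residue facing a subject gap has no nucleotides.
            mapping[q_pos] = None if s_gap else (
                s_pos, s_pos + strand, s_pos + 2 * strand
            )
            q_pos += 1
        if not s_gap:
            # Each translated subject residue consumes one codon.
            s_pos += 3 * strand
    return mapping


# Made-up alignment, for illustration only.
example = map_query_to_nucleotides("MKT-LV", "MK-ALV", 5, 100)
for aa_position, codon in sorted(example.items()):
    print(aa_position, codon)
```

Keeping unaligned residues in the mapping with a value of `None` is a design choice for this sketch; it makes gaps visible to later steps instead of silently dropping them.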

7
Step 2 - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • Same text as slide 6, annotated with: Data sources

8
Step 2 - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • Same text as slide 6, annotated with: tools

9
Step 2 - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • Same text as slide 6, annotated with: tasks

10
Step 2 - Pipeline for Analysis of Protein
Variation Due to Alternative Splicing and SNPs
  • Same text as slide 6, annotated with: Scientific objects

11
Pipeline Selecting Target Proteins
  • Step 1: retrieve all proteins from SMART and
    Swiss-Prot with a textual search for the keyword
    apoptosis
  • Step 2: retrieve all proteins from Swiss-Prot
    with a signal peptide feature and the keyword
    apoptosis
  • Step 3: retrieve their binding partners from
    DIP, BIND and the C. elegans dataset
  • Step 4: run through a signal peptide prediction
    program such as SigPep to check for the presence
    of signal peptides in each of the sequences
  • Step 5: homology search using BLAST of the
    retrieved sequences with proteins predicted from
    the Drosophila melanogaster genome might yield
    additional candidates
  • Output: final set of signal peptide proteins
    involved in apoptosis
  • Courtesy of Dr. Terry Gaasterland, The
    Rockefeller University
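
One way to picture the split between design and implementation in a pipeline like this one is sketched below. It is a toy illustration under stated assumptions, not the pipeline's real code: the records in `TOY_ENTRIES` are made up, the three task functions are hypothetical stand-ins for real resource calls, only three of the five steps are modelled, and the dataflow between them is one possible reading of the slide.

```python
from typing import Callable, Dict, List, Set, Tuple

# Made-up records standing in for real database entries (assumption).
TOY_ENTRIES = {
    "P1": {"keywords": {"apoptosis"}, "signal_peptide": True, "partners": {"P3"}},
    "P2": {"keywords": {"apoptosis"}, "signal_peptide": False, "partners": set()},
    "P3": {"keywords": {"transport"}, "signal_peptide": True, "partners": {"P1"}},
    "P4": {"keywords": {"transport"}, "signal_peptide": False, "partners": set()},
}


def keyword_search(_: Set[str]) -> Set[str]:
    """Retrieve proteins annotated with 'apoptosis' (the input is ignored)."""
    return {pid for pid, e in TOY_ENTRIES.items() if "apoptosis" in e["keywords"]}


def add_binding_partners(proteins: Set[str]) -> Set[str]:
    """Extend the candidate set with the binding partners of its members."""
    partners: Set[str] = set()
    for pid in proteins:
        partners |= TOY_ENTRIES[pid]["partners"]
    return proteins | partners


def keep_signal_peptides(proteins: Set[str]) -> Set[str]:
    """Keep only candidates flagged as having a signal peptide."""
    return {pid for pid in proteins if TOY_ENTRIES[pid]["signal_peptide"]}


# Design: what the protocol does, as an ordered list of tasks.
DESIGN: List[str] = ["keyword search", "binding partners", "signal peptide check"]

# Implementation: which resource executes each task, and how.
Task = Callable[[Set[str]], Set[str]]
IMPLEMENTATION: Dict[str, Tuple[str, Task]] = {
    "keyword search": ("Swiss-Prot", keyword_search),
    "binding partners": ("DIP", add_binding_partners),
    "signal peptide check": ("SigPep", keep_signal_peptides),
}


def run_protocol(
    design: List[str], implementation: Dict[str, Tuple[str, Task]]
) -> Tuple[Set[str], List[str]]:
    """Run each task in order and record which resource executed it."""
    result: Set[str] = set()
    trace: List[str] = []
    for task in design:
        resource, func = implementation[task]
        result = func(result)
        trace.append(f"{task} via {resource}")
    return result, trace


final_set, executed = run_protocol(DESIGN, IMPLEMENTATION)
print(sorted(final_set))
print(executed)
```

Because `DESIGN` never names a resource, swapping DIP for BIND would only change one entry of `IMPLEMENTATION`; the list of tasks stays the same.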
12
Design and implementation
13
Expressing scientific pipelines with BioNavigation
  • Queries are expressed at a conceptual level
    (design)

  • [Figure: conceptual level, showing the scientific
    classes Disease, Gene, DNA Seq., Protein Seq. and
    Citation]
14
Conceptual graph
  • Labeled edges
  • Scientifically meaningful edges

15
Conceptual graph
16
Mapping to physical resources
17
Mapping to physical resources
  • [Figure: scientific classes at the conceptual
    level (Disease, Gene, DNA Seq., Protein Seq.,
    Citation) mapped to data sources at the physical
    level (GenBank, PubMed, HUGO, OMIM, NCBI Protein)]
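
The mapping from the conceptual level to the physical level can be sketched as a lookup from scientific classes to data sources. The class and source names come from the slide, but which source implements which class is an assumption made here for illustration (the transcript does not preserve the arrows of the figure), and `physical_paths` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed mapping; listing two sources for Gene is purely illustrative.
CLASS_TO_SOURCES: Dict[str, List[str]] = {
    "Disease": ["OMIM"],
    "Gene": ["HUGO", "GenBank"],
    "DNA Seq.": ["GenBank"],
    "Protein Seq.": ["NCBI Protein"],
    "Citation": ["PubMed"],
}


def physical_paths(conceptual_path: List[str]) -> List[Tuple[str, ...]]:
    """Expand a path over scientific classes into candidate source paths."""
    choices = [CLASS_TO_SOURCES[cls] for cls in conceptual_path]
    return list(product(*choices))


for path in physical_paths(["Disease", "Gene", "Citation"]):
    print(" -> ".join(path))
```

A single conceptual query can expand into several physical paths, which is why choosing among resources becomes a question in its own right.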
18
Exploring biological metadata
  • Return all citations that are related to some
    disease or condition
  • Diabetes: 11, Aging: 71, Cancer: 391
  • Link: Entrez provides an index with the Links in
    the display option from each entry
  • Parse: parsing each entry to retrieve its
    related entries
  • All: Entrez provides an index with the Links in
    the display option, which allows looking at a set
    of entries at a time

19
Selecting biological resources
  • 3 resources that look the same
  • Are they the same?
  • 3 paths that will retrieve PubMed entries related
    to citations
  • Do they have the same semantics?
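
Whether three paths have the same semantics can be checked empirically by comparing the sets of entries they return. The sketch below is an assumption-laden illustration: the path names and entry identifiers are made up, and `overlap_report` is a hypothetical helper that reports, for each pair of paths, the number of shared entries and the Jaccard index.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def overlap_report(
    results: Dict[str, Set[int]]
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each pair of paths, return (shared entries, Jaccard index)."""
    report: Dict[Tuple[str, str], Tuple[int, float]] = {}
    for a, b in combinations(sorted(results), 2):
        shared = results[a] & results[b]
        union = results[a] | results[b]
        # Two empty result sets are treated as identical.
        jaccard = len(shared) / len(union) if union else 1.0
        report[(a, b)] = (len(shared), jaccard)
    return report


# Made-up identifiers, not real PubMed entries.
toy_results = {
    "path_A": {1, 2, 3, 4},
    "path_B": {3, 4, 5},
    "path_C": {1, 2, 3, 4},
}
for pair, (shared, jaccard) in overlap_report(toy_results).items():
    print(pair, shared, round(jaccard, 2))
```

Paths whose result sets coincide (Jaccard index 1.0) behave the same on this query; anything lower signals that the paths differ in what they retrieve.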

20
Results for the disease conditions diabetes,
aging and cancer
21
Overlap results for the disease conditions
diabetes
22
Evaluating resources
  • Similar applications
  • Different outputs
  • Similar data sources
  • Different output
  • Number of resources
  • Different output
  • Order of resources
  • Different output

23
Exploiting semantics of resources
  • Number of entries
  • Characterization of entries (number of
    attributes)
  • Time

24
Exploiting the semantics of links
25
BioNavigation (joint work with Louiqa Raschid and
Maria-Esther Vidal)
  • Conceptual graph
  • No labeled links
  • Queries
  • Regular expressions of concepts
  • ESearch
  • Path cardinality: number of instances of paths
    of the result. For a path of length 1 between two
    sources S1 and S2, it is the number of pairs (e1,
    e2) of entries e1 of S1 linked to an entry e2 of
    S2.
  • Target Object Cardinality: number of distinct
    objects retrieved from the final data source.
  • Evaluation Cost: cost of the evaluation plan,
    which involves both the local processing cost and
    remote network access delays.
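
The two cardinalities defined above can be computed on a toy example as follows. This is a sketch under assumptions, not BioNavigation's implementation: links between consecutive sources are represented as sets of (entry, entry) pairs, all identifiers are made up, and the function names are hypothetical. Evaluation cost is not modelled.

```python
from typing import Dict, List, Set, Tuple

Link = Set[Tuple[str, str]]


def path_instances(links: List[Link]) -> List[Tuple[str, ...]]:
    """Enumerate every chain of linked entries along the path."""
    if not links:
        return []
    instances: List[Tuple[str, ...]] = sorted(links[0])
    for step in links[1:]:
        # Index the next link table by its source entry.
        by_source: Dict[str, List[str]] = {}
        for src, dst in sorted(step):
            by_source.setdefault(src, []).append(dst)
        instances = [
            inst + (dst,)
            for inst in instances
            for dst in by_source.get(inst[-1], [])
        ]
    return instances


def path_cardinality(links: List[Link]) -> int:
    """Number of path instances in the result."""
    return len(path_instances(links))


def target_object_cardinality(links: List[Link]) -> int:
    """Number of distinct entries reached in the final data source."""
    return len({inst[-1] for inst in path_instances(links)})


# Made-up links: S1 -> S2 and S2 -> S3.
s1_to_s2 = {("d1", "g1"), ("d1", "g2"), ("d2", "g2")}
s2_to_s3 = {("g1", "c1"), ("g2", "c1"), ("g2", "c2")}

print(path_cardinality([s1_to_s2]))
print(path_cardinality([s1_to_s2, s2_to_s3]))
print(target_object_cardinality([s1_to_s2, s2_to_s3]))
```

On this toy data the path of length 2 has more instances than distinct target objects, which shows why the two measures are reported separately.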

26
Work in progress
  • Conceptual graph
  • Labeled links
  • Queries
  • Complex dataflows
  • Physical graph
  • Access to a BioMetaDatabase
  • Data sources
  • Applications

27
Representing the conceptual graph in Protégé
28
Visualization Limitations in Protégé
  • Using the GraphViz plugin
  • Shows only IsA hierarchy

29
  • TgiViz plugin

30
Conclusion
  • Scientists need support to select resources to
    express their protocols
  • Semantics of resources may be exploited to
    enhance the data collection process
  • Need for a repository of biological metadata
    (BioMetaDatabase)