Title: Asymmetries in Retrieval of Gene Function Information


1
Asymmetries in Retrieval of Gene Function Information
  • Timothy B. Patrick, PhD¹
  • Lillian C. Folk, MS²
  • Catherine K. Craven, MLS³
  • ¹Healthcare Administration and Informatics,
    University of Wisconsin-Milwaukee
  • ²College of Veterinary Medicine and ³Health
    Management and Informatics,
    University of Missouri-Columbia

2
Acknowledgements
  • 2004 Donald A. B. Lindberg Research Fellowship
  • University of Missouri National Library of
    Medicine Biomedical and Health Informatics
    Research Training grant

3
Overview
  • Background
  • What is an asymmetry in retrieval of gene
    function information?
  • Life science information retrieval and processing
    workflows
  • Example of asymmetrical workflows
  • Compare three apparently equivalent asymmetrical
    workflows
  • Conclusion
      • Documentation standards
      • Multidisciplinary teams for life science workflows

4
What is an Asymmetry in Retrieval?
  • Taking different paths to get the same kind of
    information about a given biological object
  • The paths are life science information retrieval
    and processing workflows

5
Complex Information Retrieval
  • May involve the use of multiple information
    resources (databases and analysis tools) in
    combination
  • Such combinations of resources are often
    represented as workflows.

6
Workflow Standards
  • Business Process Execution Language for Web
    Services Version 1.1
  • http://www-128.ibm.com/developerworks/library/specification/ws-bpel/
  • Simple Conceptual Unified Flow Language (SCUFL)
  • Taverna Workbench
  • http://taverna.sourceforge.net/

7
Logical Workflows
  • A logical workflow is analogous to a logical
    process model, with processes, data links, and
    control links
  • Key aspects of the workflow are its inputs, its
    outputs, and the processes that transform the data

(Figure: Sequence ID → get DNA sequence → Sequence string → Similarity search → results)
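The logical workflow in the figure can be written down as plain data: named processes, the data items each one consumes and produces, and the data links that connect them. The sketch below is a minimal Python model under our own assumptions; the names `Process`, `LogicalWorkflow`, and `data_links` are hypothetical and are not taken from SCUFL, BPEL, or Taverna, and control links are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Process:
    """A logical process: a named step with its input and output data items."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class LogicalWorkflow:
    processes: list[Process] = field(default_factory=list)

    def data_links(self) -> list[tuple[str, str, str]]:
        """Return (producer, data item, consumer) triples linking processes."""
        links = []
        for producer in self.processes:
            for item in producer.outputs:
                for consumer in self.processes:
                    if item in consumer.inputs:
                        links.append((producer.name, item, consumer.name))
        return links

    def inputs(self) -> set[str]:
        """Data items that are consumed but never produced: the workflow inputs."""
        produced = {o for p in self.processes for o in p.outputs}
        return {i for p in self.processes for i in p.inputs} - produced

    def outputs(self) -> set[str]:
        """Data items that are produced but never consumed: the workflow outputs."""
        consumed = {i for p in self.processes for i in p.inputs}
        return {o for p in self.processes for o in p.outputs} - consumed


# The slide's example: Sequence ID -> get DNA sequence -> Sequence string
# -> Similarity search -> results.
similarity = LogicalWorkflow([
    Process("get DNA sequence", ("Sequence ID",), ("Sequence string",)),
    Process("Similarity search", ("Sequence string",), ("results",)),
])

print(similarity.inputs())      # {'Sequence ID'}
print(similarity.outputs())     # {'results'}
print(similarity.data_links())  # [('get DNA sequence', 'Sequence string', 'Similarity search')]
```

Deriving inputs and outputs from the data links, rather than declaring them separately, keeps the model consistent with the slide's point that inputs, outputs, and transforming processes are the key aspects of a workflow.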
8
Physical Workflows
  • A physical workflow is analogous to a physical
    process model: it has processes, data links, and
    control links, but each process is carried out by
    a specific resource or tool
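Continuing the same example, a physical workflow can be sketched by binding each logical process name to a concrete callable and running the steps in order. This is a minimal illustration under our own assumptions: `fetch_sequence_stub` and `similarity_search_stub` are hypothetical stand-ins for a real sequence-retrieval service and a real similarity-search tool, and they return canned values without contacting any database.

```python
from typing import Callable


# Hypothetical stand-ins for concrete services; no network calls are made.
def fetch_sequence_stub(sequence_id: str) -> str:
    return {"SEQ1": "ATGGCC"}.get(sequence_id, "")


def similarity_search_stub(sequence: str) -> list[str]:
    return ["hit-for-" + sequence] if sequence else []


# A physical workflow: an ordered list of (logical process, bound service).
PhysicalStep = tuple[str, Callable]

physical_workflow: list[PhysicalStep] = [
    ("get DNA sequence", fetch_sequence_stub),
    ("Similarity search", similarity_search_stub),
]


def run(steps: list[PhysicalStep], value):
    """Pass the value through each bound service in order, printing each step."""
    for name, service in steps:
        value = service(value)
        print(f"{name}: {value!r}")
    return value


result = run(physical_workflow, "SEQ1")
```

Swapping either stub for a different service changes the physical workflow while leaving the logical workflow (the ordered process names) unchanged.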

9
Physical Workflow
Antoon Goderis, Ulrike Sattler and Carole Goble.
Applying DLs to workflow reuse and repurposing.
Description Logics Workshop, Edinburgh, Scotland,
24-26 July 2005.
10
Asymmetry
  • Asymmetry means that the paths, or workflows,
    differ: from the same set of potential inputs
    about some biological object, they take
    different paths to produce the same kind of
    results.
  • Asymmetrical workflows are equivalent if they
    actually produce the same results.
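The two definitions can be stated operationally. In the minimal sketch below, which uses our own hypothetical representation, each workflow is a path of steps plus a function from one input to a set of results. Two workflows are asymmetric when they start from the same kind of input and end in the same kind of output through different paths; they count as equivalent on a test set only if they return identical result sets for every input in it. Agreement on a finite test set is evidence of equivalence, not proof, while a single disagreement is enough to show non-equivalence. The lookup tables are placeholder values, not data from this study.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Workflow:
    path: tuple[str, ...]           # kinds of data/resources visited, in order
    run: Callable[[str], set[str]]  # maps one input to a set of results


def is_asymmetric(a: Workflow, b: Workflow) -> bool:
    """Same kind of input and output, but a different path between them."""
    return (a.path[0] == b.path[0] and a.path[-1] == b.path[-1]
            and a.path != b.path)


def disagreements(a: Workflow, b: Workflow, inputs: Iterable[str]) -> dict:
    """Inputs on which the two workflows return different result sets."""
    out = {}
    for x in inputs:
        ra, rb = a.run(x), b.run(x)
        if ra != rb:
            out[x] = {"only_a": ra - rb, "only_b": rb - ra}
    return out


# Toy lookup tables (placeholders only).
direct = {"ACC1": {"P1", "P2"}, "ACC2": set()}
linked = {"ACC1": {"P1"}, "ACC2": {"P3"}}

w1 = Workflow(("Accession number", "PubMed", "PubMed ID"),
              lambda acc: direct.get(acc, set()))
w2 = Workflow(("Accession number", "Nucleotide", "PubMed links", "PubMed ID"),
              lambda acc: linked.get(acc, set()))

print(is_asymmetric(w1, w2))  # True
print(disagreements(w1, w2, ["ACC1", "ACC2"]))
```

Reporting which results each workflow returns on its own, rather than a single yes/no, shows where two asymmetrical workflows diverge.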

11
This Study
  • An example of asymmetrical workflows that may look
    equivalent to a user but are not, because of
    various features of the resources involved.
  • Recognizing that they are not equivalent requires
    knowledge of metadata about the resources.

12
Three Workflows
14
http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx
15
http://www.mygrid.org.uk/images/pagemaster/GravesDiseasescenario_1.png
17
Three Workflows
  • PubMed: Affymetrix → GenBank accession number → PubMed → PubMed ID
  • Nucleotide: Affymetrix → GenBank accession number → Nucleotide → PubMed links → PubMed → PubMed ID
  • Gene: Affymetrix → GenBank accession number → Gene → PubMed links → PubMed → PubMed ID
18
Methods
  • We first collected representative DNA accession
    numbers associated with genes expressed in a
    microarray experiment designed to identify
    changes in gene expression associated with
    skeletal muscle recovery from
    immobilization-induced sarcopenia. Using a rat
    model, the experiment sought to identify
    differences in gene expression associated with
    successful recovery from sarcopenia in young
    muscle as compared with failed recovery in old
    muscle.
  • NIH grant AG18881
  • Pattison JS, Folk LC, Madsen RW, Childs TE, Booth
    FW. Transcriptional profiling identifies
    extensive downregulation of extracellular matrix
    gene expression in sarcopenic rat soleus muscle.
    Physiological Genomics 15(1):34-43, 2003.
  • Pattison JS, Folk LC, Madsen RW, Booth FW.
    Selected contribution: identification of
    differentially expressed genes between young and
    old rat soleus muscle during recovery from
    immobilization-induced atrophy. Journal of
    Applied Physiology 95(5):2171-9, 2003.
  • Pattison JS, Folk LC, Madsen RW, Childs TE,
    Spangenburg EE, Booth FW. Expression profiling
    identifies dysregulation of myosin heavy chains
    IIb and IIx during limb immobilization in the
    soleus muscles of old rats. Journal of Physiology
    553(Pt 2):357-68, 2003.

19
Methods
  • Next, we retrieved the Unique Identifiers (UIs)
    of Entrez PubMed citations associated with each
    accession number through each of the three
    Entrez resources:
      • directly, in the case of Entrez PubMed
      • indirectly, via PubMed links, in the case of
        Entrez Nucleotide and Entrez Gene
  • We then compared the number of PubMed IDs
    retrieved by the three resources for each
    accession number.
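The three retrieval routes can be approximated today with NCBI's E-utilities. The sketch below only builds request URLs and sends nothing; it is our own assumption about how the routes map onto E-utilities, not the procedure used in this study. It assumes the database names `pubmed`, `nuccore` (Entrez Nucleotide), and `gene`; it assumes the direct route searches PubMed's Secondary Source ID field (`[si]`); and it assumes ELink accepts an accession number as `id` for `nuccore` (otherwise the accession would first be converted to a UID). The Gene route needs a Gene ID, which would first have to be looked up, so `gene_id` is a hypothetical placeholder. The function names are ours.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def pubmed_direct(accession: str) -> str:
    """Route 1: search PubMed's Secondary Source ID [si] field for the accession."""
    # retmax caps the number of IDs returned; 1000 is an arbitrary choice.
    return EUTILS + "esearch.fcgi?" + urlencode(
        {"db": "pubmed", "term": f"{accession}[si]", "retmax": 1000})


def nucleotide_links(accession: str) -> str:
    """Route 2: follow Entrez Nucleotide -> PubMed links for the sequence record."""
    return EUTILS + "elink.fcgi?" + urlencode(
        {"dbfrom": "nuccore", "db": "pubmed", "id": accession})


def gene_links(gene_id: str) -> str:
    """Route 3: follow Entrez Gene -> PubMed links (needs a Gene ID, not an accession)."""
    return EUTILS + "elink.fcgi?" + urlencode(
        {"dbfrom": "gene", "db": "pubmed", "id": gene_id})


acc = "AF001892"  # accession used as an example in the PubMed help text
for url in (pubmed_direct(acc), nucleotide_links(acc)):
    print(url)
```

Because route 1 queries a PubMed field while routes 2 and 3 follow links stored with the sequence and gene records, nothing guarantees that the three routes return the same PubMed IDs.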

33
Summary of PubMed IDs by Accession Number
(Figure: number of PubMed IDs per accession number for the PubMed, Nucleotide, and Gene workflows)
34
Methods
  • Compared the number of PubMed IDs produced for
    each accession number by each workflow.
  • Applied the non-parametric Kendall's W test
  • PubMed versus Nucleotide versus Gene
  • p < .05
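Kendall's W can be computed directly from the per-accession counts. The minimal sketch below assumes one data layout consistent with the slide: each accession number acts as a "rater" that ranks the three workflows (PubMed, Nucleotide, Gene) by how many PubMed IDs each returned. Tied counts get average ranks, and the standard tie correction is applied. Significance uses the large-sample approximation χ² = m(k - 1)W with k - 1 degrees of freedom. For k = 3 workflows that is 2 degrees of freedom, whose upper tail is exactly exp(-χ²/2), so no statistics library is needed. The approximation is rough when there are few accessions. The function names are ours, and the toy counts are placeholders, not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(rows):
    """Kendall's coefficient of concordance W, with the standard tie correction.

    rows: one row per rater (here, per accession number), each holding one score
    per object (here, the PubMed ID count from each of the three workflows).
    Returns (W, chi-square statistic, p-value).
    """
    m, k = len(rows), len(rows[0])
    ranked = [average_ranks(r) for r in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    mean_sum = m * (k + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    # Tie correction: add t^3 - t for every group of t tied scores within a row.
    ties = sum(r.count(v) ** 3 - r.count(v) for r in rows for v in set(r))
    denom = m ** 2 * (k ** 3 - k) - m * ties
    if denom == 0:
        raise ValueError("every row is fully tied; W is undefined")
    w = 12 * s / denom
    chi2 = m * (k - 1) * w  # large-sample approximation, df = k - 1
    # For k = 3 (df = 2) the chi-square upper tail is exactly exp(-x / 2);
    # other values of k would need a general chi-square distribution function.
    p = math.exp(-chi2 / 2) if k == 3 else math.nan
    return w, chi2, p


# Toy counts (PubMed, Nucleotide, Gene) for three made-up accessions.
w, chi2, p = kendalls_w([[2, 1, 0], [3, 1, 0], [1, 0, 0]])
print(f"W = {w:.3f}, chi-square = {chi2:.3f}, p = {p:.4f}")
```

W near 1 means the accessions consistently rank the workflows the same way (one workflow systematically returns more IDs), while W near 0 means no consistent ordering.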

35
The Three Workflows Are Not Equivalent
36
The SI field identifies secondary source
databanks and accession numbers of outside
resources discussed in MEDLINE articles. The
field is composed of the source followed by a
slash followed by an accession number and can be
searched with one or both components, e.g.,
genbank [si], AF001892 [si], genbank/AF001892
[si]. The SI field and the Entrez sequence
database links are not linked. The PubMed links
to these databases are created from the reference
field of the GenBank or GenPept flat file. These
references include citations that discuss the
specific sequence presented in these flat files.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?ridhelppubmed.box.pubmedhelp.Box_1_Search_Field_Dpubmedhelp.Secondary_Source_ID_
37
Conclusions
38
Need for Documentation
  • The first conclusion I draw from this project is
    that workflow details need to be documented.
  • In another study, we examine how information
    processing and retrieval methods are documented
    in published reports of microarray experiments.

39
Multidisciplinary Teams for Workflows
  • The second conclusion I draw is that developing
    workflows requires multidisciplinary teams.

41
domain expert (scientist)
42
domain metadata expert (information specialist)
domain expert (scientist)
43
Knowledge-Enabled Workflows
(Figure: metadata, tools, and information items, with roles for the domain metadata expert (information specialist) and the domain expert (scientist))