Title: Asymmetries in Retrieval of Gene Function Information
1 Asymmetries in Retrieval of Gene Function Information
- Timothy B. Patrick, PhD (1)
- Lillian C. Folk, MS (2)
- Catherine K. Craven, MLS (3)
- (1) Healthcare Administration and Informatics, University of Wisconsin-Milwaukee
- (2) College of Veterinary Medicine, University of Missouri-Columbia
- (3) Health Management and Informatics, University of Missouri-Columbia
2 Acknowledgements
- 2004 Donald A. B. Lindberg Research Fellowship
- University of Missouri National Library of Medicine Biomedical and Health Informatics Research Training grant
3 Overview
- Background
  - What is an asymmetry in retrieval of gene function information?
  - Life science information retrieval and processing workflows
- Example of asymmetrical workflows
  - Compare three apparently equivalent asymmetrical workflows
- Conclusion
  - Documentation standards
  - Multidisciplinary teams for life science workflows
4 What is an Asymmetry in Retrieval?
- Taking different paths to get the same kind of information about a given biological object
- Life science information retrieval and processing workflows
5 Complex Information Retrieval
- May involve using multiple information resources (databases and analysis tools) in combination
- Such combinations of resources are often represented as workflows.
6 Workflow Standards
- Business Process Execution Language for Web Services Version 1.1
  - http://www-128.ibm.com/developerworks/library/specification/ws-bpel/
- Simple Conceptual Unified Flow Language (SCUFL)
  - Taverna Workbench
    - http://taverna.sourceforge.net/
7 Logical Workflows
- A logical workflow is analogous to a logical process model, with processes, data links, and control links.
- The key aspects of the workflow are its inputs, its outputs, and the processes that transform the data.
[Figure: example logical workflow. Sequence ID → get DNA sequence → Sequence string → Similarity search → results]
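As a rough illustration of this slide, a logical workflow can be written down as nothing more than named processes together with the data items each one consumes and produces. The data links are then implied by matching item names, and the workflow's own inputs and outputs fall out as the items that no process produces or that no process consumes. This is a minimal Python sketch, not a workflow standard. The `Process` tuple and the helper names are hypothetical, control links are not modeled, and the two processes come from the example figure.

```python
from typing import Dict, List, Set, Tuple

# A process maps to (data items it consumes, data items it produces).
# Data links are implied wherever one process produces an item another consumes.
Process = Tuple[List[str], List[str]]


def _consumed(processes: Dict[str, Process]) -> Set[str]:
    return {item for ins, _ in processes.values() for item in ins}


def _produced(processes: Dict[str, Process]) -> Set[str]:
    return {item for _, outs in processes.values() for item in outs}


def workflow_inputs(processes: Dict[str, Process]) -> Set[str]:
    """Items some process needs but no process produces."""
    return _consumed(processes) - _produced(processes)


def workflow_outputs(processes: Dict[str, Process]) -> Set[str]:
    """Items some process produces but no process consumes."""
    return _produced(processes) - _consumed(processes)


def execution_order(processes: Dict[str, Process]) -> List[str]:
    """Order processes so that each runs only after all of its inputs exist."""
    available = workflow_inputs(processes)
    pending = dict(processes)
    order: List[str] = []
    while pending:
        ready = sorted(name for name, (ins, _) in pending.items()
                       if set(ins) <= available)
        if not ready:
            raise ValueError("workflow has a cycle or an unsatisfiable input")
        for name in ready:
            order.append(name)
            available |= set(pending.pop(name)[1])
    return order


# The example from the figure, encoded with this hypothetical representation.
EXAMPLE: Dict[str, Process] = {
    "get DNA sequence": (["Sequence ID"], ["Sequence string"]),
    "Similarity search": (["Sequence string"], ["results"]),
}
```

Keeping the representation this bare makes the point of the slide visible: nothing in it says which database or tool performs "get DNA sequence"; that choice belongs to the physical workflow.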
8 Physical Workflows
- A physical workflow is like a physical process model, with processes, data links, and control links.
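The step from logical to physical can be sketched as a binding: each abstract process in a logical workflow is assigned a concrete resource. In this hypothetical Python sketch, the class and function names are illustrative, and the two NCBI services chosen for the example steps are our own assumption, not bindings taken from the slides. Two physical workflows that bind the same logical step to different resources are one place where asymmetries can arise.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogicalStep:
    """An abstract process: what is done, not which resource does it."""
    name: str
    consumes: str
    produces: str


@dataclass(frozen=True)
class PhysicalStep:
    """A logical step bound to a concrete resource (provider and endpoint)."""
    step: LogicalStep
    provider: str
    endpoint: str


def bind(workflow: List[LogicalStep],
         bindings: Dict[str, Tuple[str, str]]) -> List[PhysicalStep]:
    """Turn a logical workflow into a physical one, failing on unbound steps."""
    physical = []
    for step in workflow:
        if step.name not in bindings:
            raise KeyError(f"no concrete resource bound to {step.name!r}")
        provider, endpoint = bindings[step.name]
        physical.append(PhysicalStep(step, provider, endpoint))
    return physical


# The logical workflow from slide 7, in this hypothetical encoding.
LOGICAL = [
    LogicalStep("get DNA sequence", "Sequence ID", "Sequence string"),
    LogicalStep("similarity search", "Sequence string", "results"),
]

# One possible set of bindings; these choices are illustrative assumptions.
NCBI_BINDINGS = {
    "get DNA sequence": ("NCBI EFetch",
                         "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"),
    "similarity search": ("NCBI BLAST", "https://blast.ncbi.nlm.nih.gov/Blast.cgi"),
}

if __name__ == "__main__":
    for p in bind(LOGICAL, NCBI_BINDINGS):
        print(f"{p.step.name}: {p.provider} ({p.endpoint})")
```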
9 Physical Workflow
Source: Antoon Goderis, Ulrike Sattler and Carole Goble, "Applying DLs to workflow reuse and repurposing," Description Logics Workshop, Edinburgh, Scotland, 24-26 July 2005.
10 Asymmetry
- Asymmetry means that the paths or workflows differ: starting from the same set of potential inputs about some biological object, they take different paths to produce the same kind of results.
- Asymmetrical workflows are equivalent if they actually produce the same results, not merely the same kind of results.
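The equivalence criterion in the second bullet can be stated operationally: run each workflow on the same inputs and compare what comes back. A minimal Python sketch, assuming each workflow is a function from an accession number to a set of identifiers. The function names, the toy lookup tables, and the IDs below are hypothetical and are not data from this study.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

# A workflow takes an input identifier and returns a set of result identifiers.
Workflow = Callable[[str], Set[str]]


def disagreements(workflows: Dict[str, Workflow],
                  inputs: Iterable[str]) -> Dict[str, Dict[str, FrozenSet[str]]]:
    """For each input on which the workflows differ, record every workflow's result."""
    found: Dict[str, Dict[str, FrozenSet[str]]] = {}
    for item in inputs:
        results = {name: frozenset(run(item)) for name, run in workflows.items()}
        if len(set(results.values())) > 1:
            found[item] = results
    return found


def equivalent(workflows: Dict[str, Workflow], inputs: Iterable[str]) -> bool:
    """Same results on every input, not merely the same kind of results."""
    return not disagreements(workflows, inputs)


# Hypothetical lookup tables standing in for two asymmetrical workflows.
_DIRECT = {"ACC1": {"101", "102"}, "ACC2": {"201"}}
_LINKED = {"ACC1": {"101"}, "ACC2": {"201"}}


def direct(accession: str) -> Set[str]:
    return _DIRECT.get(accession, set())


def linked(accession: str) -> Set[str]:
    return _LINKED.get(accession, set())
```

Both toy workflows return the same kind of result (sets of citation IDs), yet they disagree on `ACC1`. Agreement on a sample of inputs is only evidence of equivalence; a single disagreement is enough to show the workflows are not equivalent.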
11 This Study
- An example of asymmetrical workflows that may look equivalent to a user but are not, because of various features of the resources involved.
- Recognizing that they are not equivalent requires knowledge of metadata about the resources.
12 Three Workflows
14 http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx
15 http://www.mygrid.org.uk/images/pagemaster/GravesDiseasescenario_1.png
17 Three Workflows
- Workflow 1 (PubMed): Affymetrix → GenBank Accession number → PubMed → PubMed ID
- Workflow 2 (Nucleotide): Affymetrix → GenBank Accession number → Nucleotide → PubMed links → PubMed → PubMed ID
- Workflow 3 (Gene): Affymetrix → GenBank Accession number → Gene → PubMed links → PubMed → PubMed ID
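Read as three paths, the diagram shows the structure that makes these workflows asymmetrical: they start from the same kind of input and end in the same kind of output, but diverge after the GenBank accession number. The Python sketch below encodes our reading of the flattened diagram and checks that property. The step names and the `shared_prefix` and `is_asymmetric` helpers are our own.

```python
from typing import Dict, List, Sequence

# Our reading of the three paths in the diagram (hypothetical encoding).
WORKFLOWS: Dict[str, List[str]] = {
    "Pubmed": ["Affymetrix", "GenBank accession number", "PubMed", "PubMed ID"],
    "Nucleotide": ["Affymetrix", "GenBank accession number", "Nucleotide",
                   "PubMed links", "PubMed", "PubMed ID"],
    "Gene": ["Affymetrix", "GenBank accession number", "Gene",
             "PubMed links", "PubMed", "PubMed ID"],
}


def shared_prefix(paths: Sequence[Sequence[str]]) -> List[str]:
    """Steps that every path has in common before the first divergence."""
    prefix: List[str] = []
    for steps in zip(*paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return prefix


def is_asymmetric(paths: Sequence[Sequence[str]]) -> bool:
    """Same starting input and same kind of output, but not the same path."""
    same_start = len({p[0] for p in paths}) == 1
    same_end = len({p[-1] for p in paths}) == 1
    distinct_paths = len({tuple(p) for p in paths}) > 1
    return same_start and same_end and distinct_paths
```

Whether the three asymmetrical paths are also equivalent cannot be read off their structure; that requires comparing their actual results.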
18 Methods
- We first collected representative DNA accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia. Using a rat model, this experiment sought to identify differences in gene expression associated with successful recovery from sarcopenia in young muscle as compared to failed recovery in old muscle.
- NIH grant AG18881
- Pattison JS, Folk LC, Madsen RW, Childs TE, Booth FW. Transcriptional profiling identifies extensive downregulation of extracellular matrix gene expression in sarcopenic rat soleus muscle. Physiological Genomics 15(1):34-43, 2003.
- Pattison JS, Folk LC, Madsen RW, Booth FW. Selected Contribution: Identification of differentially expressed genes between young and old rat soleus muscle during recovery from immobilization-induced atrophy. Journal of Applied Physiology 95(5):2171-9, 2003.
- Pattison JS, Folk LC, Madsen RW, Childs TE, Spangenburg EE, Booth FW. Expression profiling identifies dysregulation of myosin heavy chains IIb and IIx during limb immobilization in the soleus muscles of old rats. Journal of Physiology 553(Pt 2):357-68, 2003.
19 Methods
- Next, we retrieved the unique identifiers (UIs) of the Entrez PubMed citations associated with each accession number by each of the three Entrez resources:
  - directly, in the case of Entrez PubMed
  - indirectly, via PubMed links, in the case of Entrez Nucleotide and Entrez Gene
- We then compared the number of PubMed IDs retrieved by the three resources for each accession number.
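For readers who want to repeat the three retrievals programmatically, NCBI's E-utilities expose each step. This Python sketch only builds request URLs and parses JSON responses; it does not reproduce the original procedure, whose exact queries the slides do not give. Assumptions: the direct PubMed search uses the SI field (as the help text quoted on slide 36 suggests), Nucleotide is queried under its current database name `nuccore`, and a Gene record can be found by searching Gene for the accession number. The helper names are hypothetical, and the `esearchresult`/`idlist` and `linksets`/`linksetdbs`/`links` keys reflect the E-utilities JSON output as we understand it; check them against NCBI's documentation.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """ESearch request that returns matching UIDs as JSON."""
    return EUTILS + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmode": "json"})


def elink_url(dbfrom: str, uid: str) -> str:
    """ELink request for PubMed records linked from one Nucleotide or Gene UID."""
    return EUTILS + "elink.fcgi?" + urlencode(
        {"dbfrom": dbfrom, "db": "pubmed", "id": uid, "retmode": "json"})


def workflow_requests(accession: str) -> Dict[str, str]:
    """First request of each workflow for one accession number (assumed queries)."""
    return {
        # Workflow 1 (direct): search the accession in PubMed's SI field.
        "Pubmed": esearch_url("pubmed", f"{accession}[si]"),
        # Workflows 2 and 3 (indirect): find the record, then follow its PubMed links.
        "Nucleotide": esearch_url("nuccore", accession),
        "Gene": esearch_url("gene", accession),
    }


def ids_from_esearch(body: str) -> List[str]:
    """UIDs from an ESearch JSON response."""
    return json.loads(body)["esearchresult"]["idlist"]


def pubmed_ids_from_elink(body: str) -> List[str]:
    """PubMed IDs from an ELink JSON response, without duplicates, in order."""
    ids: List[str] = []
    for linkset in json.loads(body).get("linksets", []):
        for linked_db in linkset.get("linksetdbs", []):
            if linked_db.get("dbto") == "pubmed":
                for pmid in linked_db.get("links", []):
                    if pmid not in ids:
                        ids.append(pmid)
    return ids
```

Fetching is left to `urllib.request.urlopen`. For workflows 2 and 3, each UID returned by the first ESearch would be passed to `elink_url`, and the response to `pubmed_ids_from_elink`; counting the resulting IDs per accession number gives the per-workflow counts that the study compared.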
33 Summary of PubMed IDs by Accession Number
[Chart: number of PubMed IDs per accession number for the PubMed, Nucleotide, and Gene workflows]
34 Methods
- Compared the number of PubMed IDs produced for each accession number by each workflow.
- Applied the non-parametric Kendall's W test
  - PubMed versus Nucleotide versus Gene
  - p < .05
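The slide names the test but not its setup. A common arrangement for comparing several related samples, assumed here, treats each accession number as a "rater" that ranks the three workflows by how many PubMed IDs they returned. Kendall's W then measures agreement among the rankings, and m(n - 1)W, for m accession numbers and n workflows, is referred to a chi-square distribution with n - 1 degrees of freedom (this is the Friedman statistic). The Python sketch includes the usual correction for tied ranks, which matters because many accession numbers may yield identical counts. With three workflows, df = 2, and the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed. The function names are our own.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(rows: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Tie-corrected Kendall's W, its chi-square statistic, and degrees of freedom.

    rows: one row per accession number, one column per workflow (the PubMed ID counts).
    """
    if not rows:
        raise ValueError("need at least one row")
    m, n = len(rows), len(rows[0])
    rank_sums = [0.0] * n
    tie_term = 0
    for row in rows:
        if len(row) != n:
            raise ValueError("every row must have one value per workflow")
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    mean = m * (n + 1) / 2
    s = sum((r - mean) ** 2 for r in rank_sums)
    denom = m ** 2 * (n ** 3 - n) - m * tie_term
    if denom == 0:
        raise ValueError("every row is fully tied; W is undefined")
    w = 12 * s / denom
    return w, m * (n - 1) * w, n - 1


def p_value_df2(chi2: float) -> float:
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2)
```

For more than three workflows the degrees of freedom change, and a general chi-square survival function (for example `scipy.stats.chi2.sf`) would replace `p_value_df2`.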
35 The Three Workflows Are Not Equivalent
36 The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank[si], AF001892[si], genbank/AF001892[si]. The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
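The help text describes two independent sources of accession-to-citation links, and the gap between them is one concrete reason the direct and linked workflows can return different citations. A minimal Python sketch with hypothetical function names: one helper builds the three SI query forms given in the quoted text, and the other pulls PubMed IDs from the `PUBMED` lines inside the REFERENCE blocks of a GenBank flat file, which the help text names as the source of the sequence-to-PubMed links. The flat-file fragment in the test is synthetic.

```python
import re
from typing import List

# A PUBMED subfield line inside a REFERENCE block, e.g. "   PUBMED   12345678".
_PUBMED_LINE = re.compile(r"\s+PUBMED\s+(\d+)\s*$")


def si_queries(accession: str, source: str = "genbank") -> List[str]:
    """The three SI-field query forms: source only, accession only, both."""
    return [f"{source}[si]", f"{accession}[si]", f"{source}/{accession}[si]"]


def pubmed_ids_from_flatfile(text: str) -> List[str]:
    """PubMed IDs cited in a GenBank flat file's REFERENCE blocks, in order."""
    ids: List[str] = []
    for line in text.splitlines():
        match = _PUBMED_LINE.match(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Because the two sources are maintained separately, an article can carry `genbank/AF001892` in its SI field while the sequence record's REFERENCE blocks cite different articles, or none at all.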
37 Conclusions
38 Need for Documentation
- The first conclusion I draw from this project is that workflow details need to be documented.
- In another study, we examine how information processing and retrieval methods are documented in published reports of microarray experiments.
39 Multidisciplinary Teams for Workflows
- The second conclusion I draw is that developing workflows requires multidisciplinary teams.
41 [Diagram: domain expert (scientist)]
42 [Diagram: domain metadata expert (information specialist); domain expert (scientist)]
43 Knowledge-Enabled Workflows
[Diagram: metadata, tools, and information items, shown with the domain metadata expert (information specialist) and the domain expert (scientist)]