
Title: TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data


1
TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
  • Harry Hochheiser
  • Eric Baehrecke
  • Stephen Mount
  • Ben Shneiderman

Harry Hochheiser is supported by a fellowship from America Online.
2
Time Series Data
  • Real-valued function over time
  • Goal: find patterns
    • Starts Low, Ends High
    • Outliers
    • Periodic Patterns
    • Laggards and Leaders
  • Hypothesis generation
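The slide names "laggards and leaders" as a pattern of interest but does not say how to detect it. One common approach is to compare two series at a range of time shifts and keep the shift with the highest Pearson correlation. This is an assumption, not something taken from TimeSearcher. The function names `pearson` and `best_lag` are hypothetical, so treat this as a minimal sketch.

```python
from statistics import fmean, pstdev
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the shift L in [-max_lag, max_lag] maximizing corr(a[t], b[t + L]).

    A positive L means b repeats a's shape L time points later (b lags a);
    a negative L means b leads a.
    """
    n = min(len(a), len(b))
    best, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair a[t] with b[t + lag] over the overlapping time points only.
        if lag >= 0:
            xs, ys = a[: n - lag], b[lag:n]
        else:
            xs, ys = a[-lag:n], b[: n + lag]
        if len(xs) < 3:  # too little overlap to be meaningful
            continue
        r = pearson(xs, ys)
        if r > best_r:
            best, best_r = lag, r
    return best


leader = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0]
laggard = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0]  # same peak, two time points later
print(best_lag(leader, laggard, max_lag=3))  # 2
print(best_lag(laggard, leader, max_lag=3))  # -2
```

Correlation ignores differences in overall level and scale. As a result, a gene whose peak is simply smaller but equally delayed would still be reported with the same lag.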

3
Microarray Data
Chu et al. The transcriptional program of sporulation in budding yeast. Science 1998 Oct 23;282(5389):699-705.
4
Timeboxes
  • Rectangular query regions
  • Value must be in range for all time points in region
  • Combine multiple timeboxes for conjunctive query

[Figure: example timebox queries, "Sharp Rise" and "Panic Reversal"]
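The timebox semantics above can be sketched in a few lines. This assumes each item is a list of values indexed by time point. The `Timebox` class and `query` function are hypothetical names, not TimeSearcher's API. The example combines two timeboxes into a "starts low, ends high" query.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Timebox:
    """A rectangular query region: time points t_start..t_end (inclusive)
    and values v_min..v_max (inclusive)."""
    t_start: int
    t_end: int
    v_min: float
    v_max: float

    def matches(self, series: Sequence[float]) -> bool:
        # The value must lie in range at every time point the box spans.
        return all(self.v_min <= series[t] <= self.v_max
                   for t in range(self.t_start, self.t_end + 1))


def query(data: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Conjunctive query: keep only items that satisfy every timebox."""
    return [name for name, series in data.items()
            if all(box.matches(series) for box in boxes)]


# Hypothetical expression profiles over five time points.
data = {
    "geneA": [0.1, 0.2, 0.5, 1.8, 2.1],
    "geneB": [1.5, 1.2, 0.9, 0.4, 0.2],
    "geneC": [0.2, 0.3, 0.4, 0.6, 0.9],
}
starts_low = Timebox(t_start=0, t_end=1, v_min=0.0, v_max=0.5)
ends_high = Timebox(t_start=3, t_end=4, v_min=1.5, v_max=3.0)
print(query(data, [starts_low]))             # ['geneA', 'geneC']
print(query(data, [starts_low, ends_high]))  # ['geneA']
```

Adding a timebox can only narrow the result set, which is what makes the combined query conjunctive.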
5
TimeSearcher/Microarray demo
6
TimeSearcher
  • Interactive exploration of time-series data
  • Dynamic queries (<100 ms)
  • Linear display of individual items
  • Create queries on graph area
  • Move, scale timeboxes to modify query
  • Drag-and-Drop for query-by-example
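The slides do not describe how drag-and-drop query-by-example builds its query. The sketch below makes an assumption: the dragged item becomes one narrow timebox per time point, widened by a hypothetical `tolerance` parameter. Moving a box shifts its time span and value range, and the query is re-run after each change, as a dynamic-query interface would do. All names here are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# (t_start, t_end, v_min, v_max); time indices are inclusive.
Box = Tuple[int, int, float, float]


def boxes_from_example(example: Sequence[float], tolerance: float) -> List[Box]:
    """One single-time-point box per sample, centered on the example's value."""
    return [(t, t, v - tolerance, v + tolerance) for t, v in enumerate(example)]


def move(box: Box, dt: int = 0, dv: float = 0.0) -> Box:
    """Dragging a box shifts its time span by dt and its value range by dv."""
    t0, t1, lo, hi = box
    return (t0 + dt, t1 + dt, lo + dv, hi + dv)


def matches(series: Sequence[float], box: Box) -> bool:
    t0, t1, lo, hi = box
    return all(lo <= series[t] <= hi for t in range(t0, t1 + 1))


def query(data: Dict[str, Sequence[float]], boxes: List[Box]) -> List[str]:
    # Re-run after every edit to the boxes, as a dynamic-query display would.
    return [name for name, s in data.items() if all(matches(s, b) for b in boxes)]


data = {
    "g1": [0.0, 1.0, 2.0, 1.0],
    "g2": [0.1, 1.2, 1.9, 0.8],
    "g3": [2.0, 1.0, 0.0, 1.0],
}
qbe = boxes_from_example(data["g1"], tolerance=0.25)
print(query(data, qbe))                       # ['g1', 'g2']
print(query(data, [move(qbe[0], dv=2.0)]))    # ['g3']
```

A smaller tolerance makes the example query stricter. With `tolerance=0.15`, only `g1` itself matches.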

7
Other Applications
  • Time is just a linearly ordered sequence
  • Use TimeSearcher for general sequences
  • E.g., DNA
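The slides do not specify how a DNA sequence becomes a series. One simple encoding is assumed here for illustration. Align the sequences, then record at each position the fraction that carry a pyrimidine (C or T). Position then plays the role that time plays for expression data. The function name `pyrimidine_profile` is hypothetical.

```python
from typing import List


def pyrimidine_profile(aligned: List[str]) -> List[float]:
    """Fraction of sequences carrying a pyrimidine (C or T) at each aligned position.

    Position along the alignment stands in for time, so the result can be
    browsed like any other time series.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to a common length")
    seqs = [s.upper() for s in aligned]
    return [sum(s[i] in "CT" for s in seqs) / len(seqs) for i in range(length)]


print(pyrimidine_profile(["CTTAG", "TTCAG", "ATTAG", "CCTAG"]))
# [0.75, 1.0, 1.0, 0.0, 0.0]
```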

8
TimeSearcher for analysis of weak signals in nucleotide sequences: application to the case of the Arabidopsis thaliana branch site consensus splicing signal
Steve Mount, Cell Biology and Molecular Genetics
Harry Hochheiser and Ben Shneiderman, Human-Computer Interaction Lab
Steven Salzberg, The Institute for Genomic Research
Splicing signals are recognized during early steps in the biochemical process of splicing.
[Figure: pre-mRNA showing Exon 1, branch site, polypyrimidine tract (Y)n, AG and Exon 2, with splicing factors U1, SF1, U2AF65 and U2AF35]
9
Consensus sequences
Two-step pre-mRNA splicing mechanism with branched intermediate (diagram courtesy of Dr. Martinez Hewlett)
Branch site sequences:
  • Yeast (Saccharomyces cerevisiae): invariant TACTAAC
  • Humans (Homo sapiens): consensus TNYTRAYY
  • Fruit flies (Drosophila melanogaster): invariant WCTAATY
  • Weeds (Arabidopsis thaliana): invariant CTRAY
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
Here we sought to verify and extend the experimentally determined branch site consensus CTRAY reported by Simpson et al. (2002). Our long-term goal is to characterize an even weaker signal, the exonic splicing enhancer.
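Searching for a degenerate consensus such as CTRAY comes down to expanding each code from the slide's legend into a character class. A minimal sketch follows, using Python's standard `re` module. The names `CODES`, `consensus_pattern` and `find_sites` are hypothetical. Only the codes defined on the slide (Y, W, R, N) plus the four plain bases are handled.

```python
import re
from typing import List

# Degenerate codes from the slide legend, plus the four plain bases.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}


def consensus_pattern(consensus: str):
    """Compile a consensus such as 'CTRAY' into a regular expression."""
    return re.compile("".join(CODES[c] for c in consensus.upper()))


def find_sites(seq: str, consensus: str) -> List[int]:
    """0-based start positions of every match, including overlapping ones."""
    body = consensus_pattern(consensus).pattern
    # A zero-width lookahead lets finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={body})", seq.upper())]


print(find_sites("GGCTAACTTTCTGATAG", "CTRAY"))  # [2, 10]
```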
16
Conclusions
TimeSearcher can be used to identify weak signals in aligned nucleotide sequences. Analysis of 8,550 exons from Arabidopsis supports the branch site consensus WYTRAY.
[Figure: number of over-represented words (one-sigma and two-sigma thresholds) vs. distance to 3' splice site, with regions labeled "Branch site" and "Pyrimidines". Words labeled in the figure: ACTAA, ACTGA, ATAAC, ATTGA, CTAAA, CTAAC, CTAAT, CTCAT, CTGAC, TAACG, TAACT, TCTAA, TGACT, TGATT, TTAAC; consensus WYTRAY.]
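The figure counts, at each position relative to the 3' splice site, how many words are over-represented at one- and two-sigma thresholds. The slides do not give the statistic used. The sketch below assumes one plausible choice. It compares a word's count at a position with that word's mean and standard deviation across all positions. It also applies a hypothetical `min_count` floor to suppress one-off words. If the sequences are aligned at their 3' ends, each index maps directly to a distance from the splice site. All function names are illustrative.

```python
from collections import Counter
from statistics import fmean, pstdev
from typing import List


def word_counts_by_position(seqs: List[str], k: int) -> List[Counter]:
    """counts[i][w] = number of aligned sequences whose k-mer starting at i is w."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to a common length")
    counts = [Counter() for _ in range(length - k + 1)]
    for s in seqs:
        for i in range(length - k + 1):
            counts[i][s[i:i + k]] += 1
    return counts


def overrepresented_per_position(seqs: List[str], k: int, sigmas: float,
                                 min_count: int = 2) -> List[int]:
    """Number of words, at each position, whose count exceeds that word's mean
    count across all positions by more than `sigmas` standard deviations.

    Words seen fewer than `min_count` times at a position are ignored.
    """
    counts = word_counts_by_position(seqs, k)
    result = [0] * len(counts)
    for word in set().union(*counts):
        series = [c[word] for c in counts]  # this word's count at each position
        mu, sd = fmean(series), pstdev(series)
        if sd == 0:
            continue  # evenly spread word: never over-represented
        for i, n in enumerate(series):
            if n >= min_count and n > mu + sigmas * sd:
                result[i] += 1
    return result


seqs = ["GGGGCTAACGG", "TTTTCTAACTT", "CCCCCTAACCC", "AAAACTAACAA"]
print(overrepresented_per_position(seqs, k=5, sigmas=2))  # [0, 0, 0, 0, 1, 0, 0]
```

Without the `min_count` floor (`min_count=1`), every word that occurs once in this tiny sample is flagged, giving `[4, 4, 4, 4, 1, 4, 4]`. This is why some guard against rare words seems necessary on small data sets.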
17
Future Work
  • Extensions to query model
    • Leaders and laggards
    • Identification of regulatory genes
    • Multiple time-varying values
    • Variable-time timeboxes
  • Collaborations with biologists
    • Inform design
    • What sort of queries are of interest?

18
Conclusions
  • TimeSearcher: an interactive tool for graphical exploration of time-series data
  • Ongoing use for analyzing microarray data and sequence data
  • We're interested in working with motivated users who have real data sets
  • www.cs.umd.edu/hcil/timesearcher