Year One Milestones

Title: Year One Milestones


1
Year One Milestones For the VIMSS Comparative
Genomics Pipeline Eric Alm EJAlm_at_lbl.gov
2
Outline
  • Comparative Genomics Database
  • Over 125 genomes
  • Perl-based API
  • Over 600 genomes expected soon!
  • Protein Annotation Web Tools
  • Comparative Genomics Browser
  • Genome Annotation
  • Operon prediction
  • Regulon prediction
  • Motif detection
  • Data Analysis
  • Prokaryotic gene expression database
  • Integration with FGC

3
Protein Annotation Pages
Release Date 8/15/2003
4
Comparative Genomics Browser
5
Genome Annotation
  • Operon prediction
  • Parameter-free prediction tool for
    uncharacterized genomes
  • Survey of Operon structure in prokaryotes
  • Evolution of operons
  • Operons Early scenario
  • Regulon prediction
  • DNA motif detection
  • Structure based Protein-DNA interface
    modeling/design

6
Operon Prediction
  • Why predict operon structure?
  • Functional annotation
  • Evolutionary questions
  • Few genomes with experimental data
  • Computational Approaches
  • Microarray coexpression (Liao et al.)
  • Metabolic pathways (Kasif et al.)
  • Intergenic length (Collado-Vides et al.)
  • Gene clusters (Salzberg et al.)

7
Outline of Approach
  • Calculate number of operons (Nop) in genome
  • Based on the number of direction changes
  • Verify with intergenic length distributions
  • For each pair of genes calculate log-likelihood
    that the genes are on the same operon
  • Find the optimal map (based on pairwise score)
    that partitions the genome into Nop operons
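The three steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code. It assumes a linear gene order, and it assumes that transcription units are oriented independently, so that about half of all operon boundaries coincide with a strand change (giving Nop of roughly twice the number of direction changes). The slides do not state the exact estimator. The function names and the `pair_scores` input are hypothetical.

```python
from typing import List, Sequence


def estimate_num_operons(strands: Sequence[str]) -> int:
    """Estimate Nop as twice the number of strand changes (assumed estimator)."""
    changes = sum(1 for a, b in zip(strands, strands[1:]) if a != b)
    return max(1, 2 * changes)


def partition_into_operons(
    strands: Sequence[str],
    pair_scores: Sequence[float],
    n_operons: int,
) -> List[List[int]]:
    """Split genes 0..n-1 into operons.

    pair_scores[i] is the log-likelihood that genes i and i+1 share an operon.
    Strand changes always break an operon; the remaining breaks are placed at
    the lowest-scoring same-strand pairs until n_operons is reached.
    """
    n = len(strands)
    if len(pair_scores) != n - 1:
        raise ValueError("need one score per adjacent gene pair")
    forced = [i for i in range(n - 1) if strands[i] != strands[i + 1]]
    optional = sorted(
        (i for i in range(n - 1) if strands[i] == strands[i + 1]),
        key=lambda i: pair_scores[i],
    )
    extra = max(0, n_operons - 1 - len(forced))
    cuts = sorted(forced + optional[:extra])

    operons, start = [], 0
    for cut in cuts:
        operons.append(list(range(start, cut + 1)))
        start = cut + 1
    operons.append(list(range(start, n)))
    return operons
```

Because the score here is a sum over adjacent pairs, cutting at the lowest-scoring pairs maximizes the total score of the pairs kept together. If the strand changes alone already produce more operons than requested, the function returns that larger number.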

8
Scoring Function
  • Intergenic length
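  • The most common separations are overlaps of 1 bp or 4 bp
    between the stop codon and the next start codon
  • TGATG (TGA and ATG share one base)
  • ATGA (ATG and TGA share four bases of overlap)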
  • Large intergenic lengths rarely observed
  • mRNA instability

9
Scoring Function
  • Gene Neighbors
  • Gene pairs that tend to occur physically nearby
    on the chromosome in many divergent genomes

10
Gene Neighbors
  • Calculating GNM score
  • Orthologs are defined as Bidirectional Best BLAST
    hits
  • The probability of two genes being neighbors by
    chance is estimated from the fraction of
    adjacent genes on convergent transcripts that are
    neighbors
  • Related genomes are clustered such that (Pchance < eps)
  • Only one genome per cluster can contribute to the
    total score to avoid overestimating significance
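The ortholog definition and the chance estimate above can be sketched as follows. This is an illustrative reading of the bullets, not the pipeline's implementation. The input dictionaries (best BLAST hit per gene in each direction) and the set of neighboring gene pairs in the second genome are hypothetical inputs, and the choice to count only convergent pairs with orthologs for both genes in the denominator is an assumption.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def bidirectional_best_hits(
    best_a_to_b: Dict[str, str],
    best_b_to_a: Dict[str, str],
) -> Dict[str, str]:
    """Keep gene pairs (a, b) where a's best hit is b and b's best hit is a."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def chance_neighbor_probability(
    convergent_pairs: Iterable[Tuple[str, str]],
    orthologs: Dict[str, str],
    neighbors_in_b: Set[FrozenSet[str]],
) -> float:
    """Fraction of convergent adjacent pairs in genome A whose orthologs are
    neighbors in genome B. Convergent pairs cannot share an operon, so any
    conserved adjacency among them is treated as chance."""
    mapped = [
        frozenset((orthologs[g1], orthologs[g2]))
        for g1, g2 in convergent_pairs
        if g1 in orthologs and g2 in orthologs
    ]
    if not mapped:
        return 0.0
    hits = sum(1 for pair in mapped if pair in neighbors_in_b)
    return hits / len(mapped)
```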

Intergenic length distributions for true
positives (highly conserved gene neighbors) and
combined (same direction) sets
Log-likelihood scores depend on true positive
distribution, true negative distribution,
combined distribution, and prior knowledge of Nop
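One way to read this caption is as a two-component mixture: the combined (same-direction) distribution is a prior-weighted blend of the true-positive and true-negative distributions, so the negative component can be obtained by subtraction. The sketch below follows that reading. It is an assumption about the method, not a confirmed description, and all function and parameter names are hypothetical.

```python
import math


def prior_from_nop(n_genes: int, n_operons: int, n_same_strand_pairs: int) -> float:
    """Prior that a same-strand adjacent pair lies within one operon.

    With n_genes genes in n_operons operons (linear order assumed), there are
    n_genes - n_operons within-operon adjacent pairs.
    """
    return (n_genes - n_operons) / n_same_strand_pairs


def log_odds_same_operon(
    tp_density: float,
    combined_density: float,
    prior: float,
    floor: float = 1e-12,
) -> float:
    """Posterior log-odds that a pair at a given intergenic length is in an operon.

    Assumes combined = prior * TP + (1 - prior) * TN, so the weighted negative
    component is combined - prior * TP. Values are floored to avoid log(0).
    """
    positive = max(prior * tp_density, floor)
    negative = max(combined_density - prior * tp_density, floor)
    return math.log(positive) - math.log(negative)
```

For example, with a prior of 0.4, a true-positive density of 0.2, and a combined density of 0.1 at some intergenic length, the weighted positive and negative components are 0.08 and 0.02, giving log-odds of log(4).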
11
Scoring Function
  • Phylogenetic profiles
  • Genes that co-occur in the same genomes tend to
    be functionally related
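A phylogenetic profile can be represented as a vector of 0/1 flags, one per genome, marking where a gene has a homolog. The sketch below compares two profiles with a Hamming distance and a Jaccard similarity. Both measures are illustrative choices, since the slides do not say which similarity measure the pipeline uses.

```python
from typing import Sequence


def hamming_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(1 for a, b in zip(p, q) if a != b)


def jaccard_similarity(p: Sequence[int], q: Sequence[int]) -> float:
    """Genomes containing both genes divided by genomes containing either."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0
```

Genes with a small Hamming distance or a high Jaccard similarity co-occur across genomes and are candidates for a functional link.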

Taken from Marcotte et al., 1999
12
Scoring Function
  • Intrinsic termination signals
  • Rho-independent terminators can be detected using
    RNA folding algorithms (RNAfold) in some organisms

13
Operon Prediction
  • Limitations
  • Alternative transcripts
  • Limited by accurate prediction of start codon
  • Parameter-free version requires accurate Codon
    Adaptation Index
  • Preliminary Benchmarks in E. coli

[Table: accuracy by scoring function; values not preserved in this transcript]
Based on experimentally verified E. coli operons
Accuracy = (TP + TN) / (TP + TN + FP + FN)
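The accuracy measure used for this benchmark, (TP + TN) / (TP + TN + FP + FN), can be computed from pairwise calls as in the short sketch below. The counts in the usage note are invented for illustration and are not results from the presentation.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of gene pairs classified correctly (same operon or not)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no gene pairs to evaluate")
    return (tp + tn) / total
```

With hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, `accuracy(40, 45, 5, 10)` gives 85 correct calls out of 100.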
14
Regulon (?) Prediction
Phylogenetic Profiles
Gene Neighbors
15
cis-element Prediction
  • Whole genome methods
  • Moby Dick
  • DimerFinder (Results in Biofiles for Dv, Gm, So)
  • Predicted regulon methods
  • AlignACE
  • GIBBS-sampler
  • MEME
  • Phylogenetic footprinting
  • What genomes to use?

16
Choosing Genomes for Phylogenetic Footprinting
Fully sequenced gamma-proteobacteria
17
Integration with the Functional Genomics Core
  • Prokaryotic Gene Expression DB
  • Currently no central repository
  • Growing amounts of data in many species
  • Comparative analysis of gene expression
  • Evaluating predicted Regulons
  • Correlating gene expression with genome structure

18
Prokaryotic Gene Expression DB
  • >20 organisms
  • >35 different treatments
  • Expecting 90 publications this year
  • Currently >820 experiments in our DB
19
Comparative Analysis of Gene Expression
Heat shock response in two species
20
Comparative Analysis of Heat Shock Response
5 most up-regulated genes in both species
Name    Description
b1664   possible enzyme
clpB    heat shock protein
dnaJ    chaperone with DnaK; heat shock protein
dnaK    chaperone Hsp70; DNA biosynthesis; autoregulated heat shock proteins
ftsJ    cell division protein
fucP    fucose permease
grpE    phage lambda replication; host DNA synthesis; heat shock protein; protein repair
hflB    degrades sigma32, integral membrane peptidase, cell division protein
hslV    heat shock protein hslVU, proteasome-related peptidase subunit
htpG    chaperone Hsp90, heat shock protein C 62.5
hybC    probable large subunit, hydrogenase-2
ibpA    heat shock protein
lon     DNA-binding, ATP-dependent protease La; heat shock K-protein
miaA    delta(2)-isopentenylpyrophosphate tRNA-adenosine transferase
mopA    GroEL, chaperone Hsp60, peptide-dependent ATPase, heat shock protein
mopB    GroES, 10 Kd chaperone binds to Hsp60 in pres. Mg-ATP, suppressing its ATPase activity
rpoD    RNA polymerase, sigma(70) factor; regulation of proteins induced at high temperatures
rpoH    RNA polymerase, sigma(32) factor; regulation of proteins induced at high temperatures
rseA    sigma-E factor, negative regulatory protein
yaiU    putative flagellin structural protein
ybbN    putative thioredoxin-like protein
ybeD    orf, hypothetical protein
ycdQ    orf, hypothetical protein
yfjI    orf, hypothetical protein
21
Detecting cis-regulatory motifs
22
Evaluating Predicted Regulons
Correlation
  • E. coli data for 14 conditions
  • Blattner Lab
  • Regulons are nearly as tightly correlated as
    operons
  • Operon structure in distantly related species can
    be used to infer coregulation
  • Shewanella data for 4 conditions
  • ORNL
  • Regulons are significantly more correlated than
    random, but not as tightly correlated as for E.
    coli
  • Fewer conditions
  • No mutant regulator data
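A comparison like the one summarized above (regulons versus operons versus random gene sets) can be sketched as the mean pairwise Pearson correlation of expression profiles within a gene set, set against the same statistic for randomly drawn sets. This is an illustrative sketch only. The `expression` mapping (gene name to a list of measurements across conditions) and the function names are hypothetical, and the slides do not specify the exact statistic used.

```python
import math
import random
from itertools import combinations
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mean_pairwise_correlation(
    expression: Dict[str, Sequence[float]], genes: Sequence[str]
) -> float:
    """Average correlation over all gene pairs in a predicted operon or regulon."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def random_background(
    expression: Dict[str, Sequence[float]],
    set_size: int,
    n_trials: int = 100,
    seed: int = 0,
) -> float:
    """Mean of the same statistic over randomly drawn gene sets of equal size."""
    rng = random.Random(seed)
    genes = sorted(expression)
    total = sum(
        mean_pairwise_correlation(expression, rng.sample(genes, set_size))
        for _ in range(n_trials)
    )
    return total / n_trials
```

With only a few conditions, as in the Shewanella data, correlations between unrelated genes are noisier, which is one reason a random background of matched set size is a useful reference.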

23
Gene Expression and Genome Structure
  • Test Case - Cyanobacteria
  • Circadian Rhythms
  • Expression of nearly all genes is tied to 24-hour
    cycle
  • Heterologous genes/promoters adapt to host cycle
  • Clock gene has homology to Helicase/Recombinase
  • Genome Structure
  • Little known about DNA replication
  • Very little GC-skew - no obvious peak
  • Little conservation of gene order/operons

24
Gene Expression and Genome Structure
Cyanobacterial Clocks
25
Summary
  • Web based comparative genomics tools
  • Operon predictions
  • Gene interaction predictions
  • Gene neighbors (Regulons?)
  • Phylogenetic profiles
  • Correlated microarray expression
  • Release Date 8/15/2003
  • Operon prediction tool
  • Insights into the evolution of operons
  • Regulon predictions
  • Gene neighbors
  • Phylogenetic profiles
  • Coexpressed genes
  • cis-regulatory motif detection
  • Dimer-based method
  • Comparative Gene Expression DB
  • Preliminary comparative analysis of microarray
    data

26
Future Directions
  • Automated input pipeline for new genomes
  • Complete parameter-free operon prediction tool
  • cis-regulatory motif predictions
  • Regulon-based methods
  • Phylogenetic profiling
  • Integrate experimental data into protein
    annotation pages
  • Microarray
  • Proteomics
  • Gene deletion
  • Construct DB queries over experimental results
    via website
  • Other ideas?
  • Release Comparative Gene Expression DB
  • Work with Functional Genomics Core to add genomic
    context to high-throughput data

27
Acknowledgements
  • Director
  • Adam Arkin
  • VIMSS - Arkin Lab
  • Katherine Huang
  • Richard Koche
  • Sarah Wang
  • Dubchak Lab
  • Simon Minovitsky
  • Volunteer
  • Vladmir Ulyashin
  • ORNL
  • Jizhong Zhou
  • Matthew Fields
  • Dorothea Thompson
  • Yongqing Liu
  • Adam Leaphart
  • Haichun Gao
  • and others