Identification of Transcription Factor Binding Sites

1
Identification of Transcription Factor Binding
Sites
  • Lior Harpaz
  • Ofer Shany
  • 09/05/2004

2
Goal: find TFBSs
(Figure labels: input, output)
3
Importance
  • TFs regulate gene expression.
  • Identification of TFs can teach us about:
    • Mapping of regulatory pathways
    • Potential functions of genes

4
Experimental Methods
  • Footprinting
  • EMSA - electrophoretic mobility shift assay

Problems
  • Time consuming
  • Do not scale up to whole genomes

5
Computational Methods - Goals
  • Identifying known TFBSs in previously unknown
    locations.
  • Identifying unknown TFBSs.

6
Computational Methods
  • Basic idea - locate TFBS using sequence-searching

Problems
  • Short sequences (5-15 bp)
  • Degenerate sequences
  • Location
  • Biological reality
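
The sequence-searching idea, and the difficulty caused by short, degenerate sites, can be sketched as follows. This is a minimal illustration only: the consensus string, the target sequence, and the function name `find_sites` are hypothetical and do not come from the slides. Degenerate positions are written with standard IUPAC ambiguity codes.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]

# Hypothetical consensus and sequence, for illustration only.
hits = find_sites("TGANTCA", "ccTGACTCAggTGAGTCAtt")
print(hits)  # [2, 11]
```

Under a uniform random-sequence model, a 7-bp pattern with one fully degenerate position is expected to match about once every 4^6 = 4096 positions per strand, so a genome-wide scan returns many chance matches. That is the motivation for the additional evidence discussed on the following slides.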

7
Computational Methods
  • Possible solutions

Conservation implies functional importance
  • mRNA expression pattern
  • Phylogenetic footprinting
  • Network-level conservation

8
Phylogenetic footprinting
  • Identify orthologous genes
  • Concentrate on conserved non-coding regions
    (possible regulatory regions)
  • Look for conserved motifs.
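
The last step above can be sketched on a toy example. The code assumes the orthologous upstream regions have already been aligned to equal length (with "-" for gaps) and simply reports windows in which every column is identical across species. The sequences and the function name `conserved_windows` are hypothetical; real studies use a motif-finding algorithm rather than this exact-identity rule.

```python
def conserved_windows(aligned, width, min_identity=1.0):
    """Return (start, segment) for alignment windows conserved across all sequences."""
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned"
    # A column counts as conserved when every sequence has the same non-gap base.
    conserved = [
        len({s[i] for s in aligned}) == 1 and aligned[0][i] != "-"
        for i in range(length)
    ]
    results = []
    for start in range(length - width + 1):
        window = conserved[start:start + width]
        # Keep the window if the fraction of conserved columns is high enough.
        if sum(window) / width >= min_identity:
            results.append((start, aligned[0][start:start + width]))
    return results

# Hypothetical aligned non-coding regions from three species.
aligned = [
    "ACGTTGACATTTCG",
    "TCGATGACATAACG",
    "GGCTTGACATCTAG",
]
print(conserved_windows(aligned, width=6))  # [(4, 'TGACAT')]
```

Lowering `min_identity` below 1.0 tolerates some mismatched columns, which is closer to how degenerate binding sites behave.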

9
Why should it work ?
  • 40% alignment between the human and mouse genomes
  • 80% of mouse genes have orthologs in the human genome
  • Only 1-5% of the human genome encodes proteins.

10
Things to consider
  • Choosing genomes.
  • Locating transcriptional start site.
  • Alignment method.

11
More things to consider
  • Different evolution rates for different regions
    in the genome.
  • PSSM score cut-off
  • Note - TFBSs within ORFs are not detected.
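
To make the PSSM score cut-off concrete, here is a minimal sketch of building a log-odds position-specific scoring matrix from a few known sites and scanning a sequence with a threshold. The sites, the target sequence, the pseudocount, the uniform background of 0.25, and the cut-off of 5.0 are all assumptions chosen for illustration, not values from the slides.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position-specific scoring matrix from aligned sites."""
    width = len(sites[0])
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(sites) + 4 * pseudocount
        # Log2 ratio of the smoothed base frequency to the background frequency.
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm

def scan(sequence, pssm, cutoff):
    """Return (position, score) for every window scoring at or above the cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, round(score, 2)))
    return hits

# Hypothetical known sites and target sequence, for illustration only.
known_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pssm = build_pssm(known_sites)
print(scan("GGTGACTCAGGATGAGTCTT", pssm, cutoff=5.0))
```

With the cut-off at 5.0 both the exact site and a one-mismatch variant are reported; raising it to 8.0 keeps only the exact site. The choice of cut-off therefore trades missed sites against false predictions.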

12
Phylogenetic footprinting in proteobacterial genomes
  • Study set of 190 genes of E. coli with known
    TFBSs.
  • Orthologs were searched in eight other bacteria.
  • Motif search by Bayesian Gibbs sampling.

13
Bayesian Gibbs sampling
  • Algorithm for motif search.
  • Each motif is assigned a MAP (maximum a posteriori
    probability) value.

14
Bayesian Gibbs sampling
  • Parameters and extensions
  • Model sequence
  • Palindromic patterns
  • Background pattern
  • Distribution of spacing between TFBSs and
    translation start site
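
For orientation, the core loop of Gibbs sampling for motif search can be sketched as below. This is a plain site sampler that assumes exactly one site of fixed width per sequence and a uniform background. It is not the Bayesian sampler described on these slides: it has no palindromic model, no background model, no spacing prior, and it does not compute a MAP value. The function name, parameters, and sequences are hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=500, pseudocount=1.0, seed=0):
    """Return one motif start per sequence found by a basic site sampler."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out, seq in enumerate(seqs):
            # Build a profile from every sequence except the held-out one.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(seqs, starts))
                      if k != held_out]
            total = len(others) + 4 * pseudocount
            profile = []
            for i in range(width):
                column = [site[i] for site in others]
                profile.append({b: (column.count(b) + pseudocount) / total
                                for b in "ACGT"})
            # Weight each candidate start by its likelihood ratio against a
            # uniform background, then sample a new start for this sequence.
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for i, base in enumerate(seq[start:start + width]):
                    ratio *= profile[i][base] / 0.25
                weights.append(ratio)
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Hypothetical upstream regions (uppercase A, C, G, T only).
seqs = [
    "ACGTACGTTGACTCAGGCTA",
    "TTGACTCACCGTAGGCATCG",
    "GGCATCGATTTGACTCAACG",
    "CATGTGACTCATTGCGGACA",
    "AGGCTTAACGGTGACTCATC",
]
starts = gibbs_motif_search(seqs, width=7)
for s, p in zip(seqs, starts):
    print(p, s[p:p + 7])
```

Because the procedure is stochastic, practical implementations run several restarts and keep the alignment with the best score.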

15
Results
  • Overall, in 146/184 sets, motifs matched known
    regulatory sequences.
  • In 18 sets (with 1 ortholog) only 67% of known
    sites were matched, and with low MAP values.
  • In 166 sets (with ≥2 orthologs) 81% of motifs
    matched known regulatory sequences.
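
As a quick consistency check, taking the 67 and 81 figures above as rounded percentages of their respective groups reproduces the overall 146/184 count. Treating them as per-set percentages is an assumption made for this check.

```python
sets_one_ortholog = 18
sets_multi_ortholog = 166

# Assumed reading: 67% and 81% are fractions of the sets in each group.
matched_one = round(0.67 * sets_one_ortholog)      # about 12 sets
matched_multi = round(0.81 * sets_multi_ortholog)  # about 134 sets

total_matched = matched_one + matched_multi
total_sets = sets_one_ortholog + sets_multi_ortholog
print(total_matched, total_sets)  # 146 184
```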

16
Results
  • Out of the 166 sets (with ≥2 orthologs):
    • 131 corresponded to known TFBSs.
    • 3 corresponded to known stem-loop structures.
    • 32 contained predictions with large MAP
      values, which could be undocumented sites.
  • Documented sites were found in 138 cases without
    using palindromic models.

17
Identification of a new TF
  • New sites found near fabA, fabB and yqfA.
  • YijC binds to these sites.
  • Site location, protein structure and previous
    experimental results suggest that YijC is a
    repressor of the fab genes.
  • Indication of yqfA's involvement in fatty-acid
    metabolism.

18
Genomic scale phylogenetic footprinting
  • 2113 E. coli ORFs were used.
  • 187 new sites identified as probable sites for 46
    known TFs.
  • Remaining sites are expected to represent unknown
    TFBSs.
  • MAP values of predicted sites were lower.

19
MAP values left-shift
20
Ortholog distribution (figure panels: study set, full set)
21
Conclusions
  • New sites for known TFs were found.
  • Regulatory stem-loops are conserved.
  • New sites for unknown TFs are predicted.
  • A new TF was identified (YijC).
  • A gene function was predicted (yqfA).

22
Questions?
23
Network level conservation
  • Each TF regulates the expression of many genes
    (20-400).
  • Conservation of global gene expression requires
    the conservation of regulatory mechanisms.

24
(No Transcript)
25
Data analysis
  • Total motifs: 80,000
  • After P-value filter: 12,000
  • After low-complexity filter: 7,673
  • After hierarchical clustering: 1,269

26
Validation
  • 34/48 known sites discovered.
  • Large fraction of matches for significant
    p-values.

27
Identification of known binding sites
28
Biological Significance
  • Functional coherence
  • Expression coherence

29
Characteristic Features
  • Conservation of binding affinity
  • Conservation of position and orientation

30
References
  • Bulyk M. Computational prediction of
    transcription-factor binding site locations.
    Genome Biol. 2003; 5:201.
  • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS,
    Derbyshire V, Lawrence CE. Phylogenetic
    footprinting of transcription factor binding
    sites in proteobacterial genomes. Nucleic Acids
    Res. 2001; 29:774-782.
  • Pritsker M, Liu YC, Beer MA, Tavazoie S.
    Whole-genome discovery of transcription factor
    binding sites by network-level conservation.
    Genome Res. 2004; 14:99-108.

31
Sensitivity Vs. Specificity
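
The two quantities in this slide title have standard definitions, sketched below. Treating the 34/48 validation result from the earlier slide as a sensitivity (known sites recovered out of all known sites) is an assumption, and the true-negative and false-positive counts are hypothetical placeholders, since the slides give none.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real sites that were predicted: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-sites correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

# 34 of 48 known sites discovered, read here as TP = 34 and FN = 14.
print(round(sensitivity(34, 14), 3))

# Hypothetical counts, for illustration only.
print(specificity(true_neg=90, false_pos=10))
```

Loosening a score cut-off typically raises sensitivity and lowers specificity, and tightening it does the reverse.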