Promoter Analysis - PowerPoint PPT Presentation

Provided by: tau1

1
Promoter Analysis
  • Goals, Problems & Solutions

Chaim Linhart & Rani Elkon, Dec. 03
2
Outline
  • Background
  • Questions
  • Some answers
  • PRIMA

3
Regulation of Expression
  • Each cell contains a copy of the whole genome
  • BUT utilizes only a subset of the genes
  • Most genes are highly regulated
  • Their expression is limited to specific tissues,
    developmental stages, and physiological conditions

How is the expression of genes regulated?
One way is through transcriptional regulation
4
Regulation of Transcription
  • The conditions in which a gene is transcribed are
    mainly encoded in the DNA, in a region called the
    promoter
  • Each promoter contains several short DNA
    subsequences, called binding sites (BSs) that
    are bound by specific proteins called
    transcription factors (TFs)

5
Regulation of Transcription (II)
6
Regulation of Transcription (III)
  • By binding to a gene's promoter, TFs can either
    promote or repress the recruitment of the
    transcription machinery
  • The conditions in which a gene is transcribed are
    determined by the specific combination of BSs in
    its promoter

7
Regulation of Transcription (IV)
  • Assumption:
  • Co-expression → Transcriptional co-regulation →
    Common BSs

8
DNA chips
→ Data analysis (normalization, clustering) → Co-expression
9
WH-questions
  • So we know why we're looking for common BSs
  • What exactly are we trying to find?
  • Where should we look for it?
  • How can we find it?

10
Promoter Region (Where?)
  • What is the promoter region?
  • Upstream of the Transcription Start Site (TSS)
  • Too short → miss many real BSs (false negatives)
  • Too long → lots of wrong hits (false positives)
  • Length is species dependent (e.g., yeast: 600bp,
    human: thousands of bp)
  • Common practice: 500-2000bp
  • Mask-out repetitive sequences?
  • Common practice: Yes
  • Consider both strands?
  • Common practice: Yes
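
The choices above (upstream window length, masking, both strands) can be illustrated with a minimal sketch. All function names here are hypothetical, the TSS is assumed to be given as a 0-based index on the forward strand, and repeats are assumed to be soft-masked (lowercase); none of this comes from the slides.

```python
# Hypothetical helpers; not taken from the presentation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_promoter(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bases upstream of the TSS, on the gene's strand.

    Assumes `tss` is a 0-based index of the first transcribed base.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    # Minus strand: upstream lies at higher forward-strand coordinates.
    end = min(len(chrom_seq), tss + 1 + length)
    return reverse_complement(chrom_seq[tss + 1:end])

def mask_repeats(seq):
    """Replace soft-masked (lowercase) bases with N so they cannot match."""
    return "".join("N" if base.islower() else base for base in seq)

def both_strands(promoter):
    """Return the promoter and its reverse complement for scanning."""
    return promoter, reverse_complement(promoter)
```

Scanning both strings returned by `both_strands` is one simple way to "consider both strands"; a default of 1000bp falls inside the 500-2000bp range quoted above.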

11
Promoter Region II
  • Additional problems
  • Where exactly is the TSS?
  • What about 1st exon, intron?
  • Multiple transcripts
  • Answers actually depend on the TF

12
The What? question
  • Computational tasks
  • New BSs of known TFs
  • New motifs (BSs of unknown TFs)
  • Modules: combinations of TFs

13
BSs Models
  • Exact string(s)
  • Example
  • BS: TACACC, TACGGC
  • CAATGCAGGATACACCGATCGGTA
  • GGAGTACGGCAAGTCCCCATGTGA
  • AGGCTGGACCAGACTCTACACCTA
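
A minimal sketch of the exact-string model, run on the three example sequences above. The function name `find_exact` is hypothetical.

```python
def find_exact(seq, motifs):
    """Return sorted (position, motif) pairs for every exact occurrence."""
    hits = []
    for motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            # Continue from pos + 1 so overlapping occurrences are found too.
            pos = seq.find(motif, pos + 1)
    return sorted(hits)

sequences = [
    "CAATGCAGGATACACCGATCGGTA",
    "GGAGTACGGCAAGTCCCCATGTGA",
    "AGGCTGGACCAGACTCTACACCTA",
]
for s in sequences:
    print(find_exact(s, ["TACACC", "TACGGC"]))
```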

14
BSs Models (II)
  • String with mismatches
  • Example
  • BS: TACACC with up to 1 mismatch
  • CAATGCAGGATTCACCGATCGGTA
  • GGAGTACAGCAAGTCCCCATGTGA
  • AGGCTGGACCAGACTCTACACCTA
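
The mismatch model can be sketched as a sliding window with a Hamming-distance cutoff. This is a brute-force illustration with hypothetical names, not any of the published algorithms.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_with_mismatches(seq, motif, max_mismatches=1):
    """Return (position, window) for windows within max_mismatches of motif."""
    k = len(motif)
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatches
    ]

print(find_with_mismatches("CAATGCAGGATTCACCGATCGGTA", "TACACC"))
```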

15
BSs Models (III)
  • Degenerate string
  • Example
  • BS: TASDAC (S = {C,G}, D = {A,G,T})
  • CAATGCAGGATACAACGATCGGTA
  • GGAGTAGTACAAGTCCCCATGTGA
  • AGGCTGGACCAGACTCTACGACTA
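
One way to search for a degenerate string is to translate it into a regular expression using the standard IUPAC nucleotide codes. The sketch below does this; the function names are hypothetical, and the lookahead is used only so that overlapping matches are also reported.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degenerate_to_regex(motif):
    """Turn e.g. 'TASDAC' into '[T][A][CG][AGT][A][C]'."""
    return "".join("[" + IUPAC[ch] + "]" for ch in motif)

def find_degenerate(seq, motif):
    """Return (position, matched string) pairs, including overlapping ones."""
    pattern = re.compile("(?=(" + degenerate_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

print(find_degenerate("CAATGCAGGATACAACGATCGGTA", "TASDAC"))
```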

16
BSs Models (IV)
  • Position Weight Matrix (PWM)
  • Example BS

Need to set score threshold
  • ATGCAGGATACACCGATCGGTA 0.0605
  • GGAGTAGAGCAAGTCCCGTGA 0.0605
  • AAGACTCTACAATTATGGCGT 0.0151
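
The matrix for this example did not survive in the transcript, so the sketch below uses a made-up PWM (consensus TACACC, probability 0.7 for the consensus base in each column) purely to show the mechanics: a window's score is the product of its per-position probabilities, and a hit is a window scoring at or above the threshold. The matrix values, names, and threshold are all assumptions and will not reproduce the scores listed above.

```python
def column(consensus_base):
    """Hypothetical PWM column: 0.7 for the consensus base, 0.1 otherwise."""
    return {b: (0.7 if b == consensus_base else 0.1) for b in "ACGT"}

# Made-up 6-column PWM; NOT the matrix from the original slide.
HYPOTHETICAL_PWM = [column(b) for b in "TACACC"]

def pwm_score(window, pwm):
    """Product of per-position probabilities; unknown bases (e.g. N) score 0."""
    score = 1.0
    for base, col in zip(window, pwm):
        score *= col.get(base, 0.0)
    return score

def scan(seq, pwm, threshold):
    """Return (position, window, score) for windows scoring >= threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        s = pwm_score(window, pwm)
        if s >= threshold:
            hits.append((i, window, s))
    return hits

print(scan("GGTACACCGG", HYPOTHETICAL_PWM, threshold=0.05))
```

Lowering the threshold admits windows with more mismatches to the consensus, which is the false-positive versus false-negative trade-off behind "need to set score threshold".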

17
BSs Models (V)
  • More complex models
  • PWM with spacers (e.g., for p53)
  • Markov model (dependency between adjacent columns
    of PWM)
  • Hybrid models, e.g., mixture of two PWMs

And we also need to model the non-BSs sequences
in the promoters
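
As an illustration of modeling the non-BS sequence, a common simple choice is a zero-order background (independent base frequencies) combined with a log-odds score. The sketch below assumes that choice; the slides do not say which background model was used, and the names and the pseudocount are hypothetical.

```python
import math
from collections import Counter

def background_frequencies(seqs, pseudocount=1.0):
    """Estimate zero-order base frequencies from a set of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(b for b in s if b in "ACGT")
    total = sum(counts[b] + pseudocount for b in "ACGT")
    return {b: (counts[b] + pseudocount) / total for b in "ACGT"}

def log_odds(window, pwm, bg):
    """Sum over positions of log2(P(base | PWM column) / P(base | background)).

    Assumes every PWM entry is non-zero (e.g. after adding pseudocounts).
    """
    return sum(math.log2(col[b] / bg[b]) for b, col in zip(window, pwm))

bg = background_frequencies(["AACC", "GGTT"])
toy_pwm = [
    {"A": 0.1, "C": 0.2, "G": 0.2, "T": 0.5},
    {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1},
]
print(log_odds("TA", toy_pwm, bg))
```

A positive score means the window is more likely under the PWM than under the background; a negative score means the reverse.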
18
How to find novel motifs
  • Degenerate string
  • YMF (Sinha & Tompa 02)
  • String with mismatches
  • WINNOWER (Pevzner & Sze 00)
  • Random Projections (Buhler & Tompa 02)
  • MULTIPROFILER (Keich & Pevzner 02)
  • PWM
  • MEME (Bailey & Elkan 95)
  • AlignACE (Hughes et al. 98)
  • CONSENSUS (Hertz & Stormo 99)

19
How to find TF modules
  • BioProspector (Liu et al. 01)
  • Co-Bind (GuhaThakurta & Stormo 01)
  • MITRA (Eskin & Pevzner 02)
  • CREME (Sharan et al. 03)
  • MCAST (Bailey & Noble 03)

20
PRIMA: PRomoter Integration in Microarray Analysis
  • Goal: Identify TFs whose BSs are abundant
    (statistically over-represented) in promoters of
    co-expressed genes
  • Limited to known TFs
  • Uses PWM to model BSs
  • Allows multiple BSs per promoter
  • Integrated into Expander

21
PRIMA input-output
  • Input:
  • Promoter sequences (typically 1200bp) of:
  • Background (BG) set, typically all genes
  • Target set, i.e., co-expressed genes
  • PWMs of known TFs, e.g., from TRANSFAC
  • Output:
  • p-values of over-represented TFs

22
PRIMA algorithm
  • For each PWM:
  • Compute a threshold score for declaring hits of
    the PWM (hit: a subsequence that is similar to
    the PWM, i.e., a hypothetical BS)
  • Scan BG and target-set promoters for hits
  • Apply a statistical test to decide whether the
    number of hits in the target-set is significantly
    higher than expected by chance, given the
    distribution of hits in the BG
  • (Find co-occurring pairs of TFs)
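
The slides do not specify which statistical test PRIMA applies. As a simplified stand-in, the sketch below uses a plain hypergeometric tail on the number of promoters with at least one hit, which ignores multiple hits per promoter. It assumes the target set is a subset of the BG set, and all names are hypothetical.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def enrichment_pvalue(bg_hit_counts, target_ids):
    """Simplified over-representation test for one PWM.

    bg_hit_counts: dict mapping every BG promoter id to its number of hits.
    target_ids: ids of the co-expressed genes (assumed to be a subset of BG).
    """
    N = len(bg_hit_counts)                                  # BG promoters
    K = sum(1 for c in bg_hit_counts.values() if c > 0)     # BG promoters with a hit
    n = len(target_ids)                                     # target promoters
    k = sum(1 for g in target_ids if bg_hit_counts[g] > 0)  # target promoters with a hit
    return hypergeom_tail(N, K, n, k)

# Toy data: 6 background promoters, 2 of them in the target set.
bg = {"g1": 2, "g2": 0, "g3": 1, "g4": 0, "g5": 0, "g6": 0}
print(enrichment_pvalue(bg, ["g1", "g3"]))
```

In a full run this would be repeated for each PWM, and the resulting p-values would need a multiple-testing correction, since many TFs are tested against the same target set.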

23
PRIMA results on HCC
  • We ran PRIMA on 568 genes that are periodically
    expressed in the human cell-cycle (data from
    Whitfield et al. 02)

24
PRIMA results on HCC (II)
  • Locations of hits

25
PRIMA results on HCC (III)
  • Co-occurring pairs of TFs

26
PRIMA future directions
  • More information to utilize:
  • Distribution of hit locations
  • Modules: co-occurrence of TFs, possibly with
    distance and/or strand bias
  • Homology: BSs are more conserved than the rest of
    the promoter

27
PRIMA in EXPANDER
28
PRIMA in EXPANDER (II)
29
Acknowledgements
  • PRIMA
  • Rani Elkon
  • Roded Sharan
  • Ron Shamir
  • Yossi Shiloh
  • Expander
  • Adi Maron-Katz
  • Amos Tanay
  • Israel Steinfeld
  • Naama Arbili
  • Roded Sharan

30
Questions?