Finding regulatory modules from local alignment

Slides: 26
Provided by: ukko4
Transcript and Presenter's Notes

Title: Finding regulatory modules from local alignment


1
Finding regulatory modules from local alignment
  • -
  • Department of Computer Science
    Helsinki Institute of Information Technology (HIIT)
  • University of Helsinki
  • Erice, 30 Nov 2005

2
Pairwise alignment of strings
  • A = STOCKHOLM
    B = TUKHOLMA
  • minimum number of mutation steps: a → b (substitution),
    a → ε (deletion), ε → b (insertion)
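
The three mutation steps above correspond to the classic unit-cost edit distance. The following is a minimal sketch, assuming each substitution, deletion and insertion costs 1; the function name `edit_distance` is illustrative and does not come from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance: substitution, deletion and insertion each cost 1."""
    # prev[j] holds the distance between the current prefix of a and the j-prefix of b.
    prev = list(range(len(b) + 1))  # empty prefix of a: j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # empty prefix of b: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution a -> b
                prev[j] + 1,               # deletion a -> (empty)
                curr[j - 1] + 1,           # insertion (empty) -> b
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("STOCKHOLM", "TUKHOLMA"))
```

For the slide's pair of strings this gives 4: delete S, substitute O by U, delete C, and insert the final A.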

3
Dynamic programming
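$$d_{i,j} = \min\begin{cases} d_{i-1,j-1} \text{ if } a_i = b_j, \text{ else } \infty \\ d_{i-1,j} + 1 \\ d_{i,j-1} + 1 \end{cases}$$

where $d_{i,j}$ is the distance between the i-prefix of A and the
j-prefix of B (without substitutions)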

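[Figure: the m×n table d, with rows indexed by the characters $a_i$ of A and columns by the characters $b_j$ of B. Cell $d_{i,j}$ is computed from $d_{i-1,j-1}$, $d_{i-1,j}$ (+1) and $d_{i,j-1}$ (+1); the final value is $d_{m,n}$.]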
4
$$d_{i,j} = \min\begin{cases} d_{i-1,j-1} \text{ if } a_i = b_j, \text{ else } \infty \\ d_{i-1,j} + 1 \\ d_{i,j-1} + 1 \end{cases}$$

optimal alignment by trace-back from $d_{m,n} = d_{ID}(A,B)$
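
The recurrence and the trace-back can be sketched as follows. This is an illustrative implementation of the insertion/deletion distance described on the slides, not code from the presentation; the function name `indel_alignment` and the use of "-" as the gap symbol are assumptions.

```python
INF = float("inf")


def indel_alignment(a: str, b: str):
    """Indel distance (no substitutions) plus one optimal alignment by trace-back."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete the whole i-prefix of a
    for j in range(1, n + 1):
        d[0][j] = j  # insert the whole j-prefix of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The diagonal step is allowed only on a match; otherwise it costs infinity.
            diag = d[i - 1][j - 1] if a[i - 1] == b[j - 1] else INF
            d[i][j] = min(diag, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Trace-back from d[m][n] to d[0][0], collecting alignment columns in reverse.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-")  # deletion from a
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])  # insertion from b
            j -= 1
    return d[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    distance, row_a, row_b = indel_alignment("STOCKHOLM", "TUKHOLMA")
    print(distance)
    print(row_a)
    print(row_b)
```

Several optimal alignments may exist; the trace-back above returns one of them, preferring matches over deletions and deletions over insertions.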
5
Homology searches
  • find homologous sequences: compare a new sequence
    against all old ones in the database; the most popular
    computational task in present-day molecular
    biology; an approximate string matching problem
  • BLAST - big success
  • good homology => same biological function

[Figure: a new sequence is compared against the database.]
6
Multiple alignment
  • multiple alignment of sequence families to find
    interesting conserved motifs; NP-hard =>
    heuristics, hidden Markov models, MCMC
  • comparison of entire genomes

7
Gene enhancer module prediction
8
Problem
  • Gene expression regulation in multicellular
    organisms is controlled in combinatorial fashion
    by so-called transcription factors (TFs).
  • Transcription factors bind to DNA cis-elements
    (TF binding sites) on enhancer modules
    (promoters), and multiple factors need to bind to
    activate the module.
  • In mammals, the modules are few and far between.
  • The problem: locate functional regulatory
    modules, that is, find interesting patterns.

9
Gene enhancer modules
[Figure: DNA carrying an enhancer module and genes gene1 to gene4; labels: transcription factors, transcription (DNA to RNA), translation (RNA to proteins).]
10
Model of cell-type-specific regulation of target gene expression
[Figure, first panel: common targets (e.g. Patched); labels: GLI, GLI, ubiquitously expressed TF, transcription.]
[Figure, second panel: cell-type-specific targets (e.g. N-myc); labels: GLI, X, Y (tissue-specific TFs), transcription.]
11
Binding affinity matrices
  • The TF binding sites are represented by affinity
    matrices.
  • A column per position
  • A row per nucleotide
  • Discovered:
      • Computationally
      • Traditional wet lab
      • Microarrays

     9  11  49  51   0   1   1   4
    19   3   0   0   0  45  25  16
     5   1   2   0  17   0   4  21
    18  36   0   0  34   5  21  10
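
To illustrate how such a matrix is typically used, the sketch below converts the counts from the slide into log-odds weights and scans a sequence for high-scoring sites. Several things here are assumptions not stated on the slide: the row order A, C, G, T, the pseudocount of 1, the uniform background of 0.25, and all function names.

```python
import math

# Count matrix from the slide: one row per nucleotide, one column per position.
COUNTS = [
    [9, 11, 49, 51, 0, 1, 1, 4],
    [19, 3, 0, 0, 0, 45, 25, 16],
    [5, 1, 2, 0, 17, 0, 4, 21],
    [18, 36, 0, 0, 34, 5, 21, 10],
]
ROW_ORDER = "ACGT"  # assumption: the slide does not label the rows


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn counts into log2(frequency / background) weights, column by column."""
    width = len(counts[0])
    totals = [sum(row[c] for row in counts) + 4 * pseudocount for c in range(width)]
    return [
        [math.log2((row[c] + pseudocount) / totals[c] / background) for c in range(width)]
        for row in counts
    ]


def score_site(matrix, site):
    """Sum the weights of the bases of a site as long as the matrix is wide."""
    return sum(matrix[ROW_ORDER.index(base)][pos] for pos, base in enumerate(site))


def scan(matrix, sequence, threshold):
    """Return (start, score) for every window scoring at least the threshold."""
    width = len(matrix[0])
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_site(matrix, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    pwm = log_odds(COUNTS)
    print(scan(pwm, "GGGGCTAATCCGGGGG", threshold=5.0))
```

The sketch handles only the letters A, C, G and T and scans a single strand; a practical scanner would also score the reverse complement and decide how to treat ambiguous bases.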
12
Binding affinity matrices
     9  11  49  51   0   1   1   4
    19   3   0   0   0  45  25  16
     5   1   2   0  17   0   4  21
    18  36   0   0  34   5  21  10
13
Determined TF binding profiles (JASPAR)
14
Finding conserved motifs of binding sites
  • looking at one (human) genome gives too many
    positives
  • comparative genomics approach
  • take the 200 kb regions surrounding the same
    genes (paralogs and orthologs) of different
    mammals: human, mouse, chicken, ...
  • find conserved clusters (motifs) of binding
    sites
  • cluster = group of binding sites with good local
    alignment => Smith-Waterman type algorithm with
    a novel scoring function
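
The idea of a cluster as a group of binding sites with a good local alignment can be sketched at the level of site lists rather than nucleotides. The scoring below is a deliberately simplified stand-in and is not the EEL scoring function, which the slides do not spell out: matched sites must belong to the same TF, each matched pair adds the two site scores, and a hypothetical `shift_penalty` is charged per base of difference in spacing between consecutive matched sites. The `Site` type and `best_cluster` are illustrative names.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Site(NamedTuple):
    pos: int      # start coordinate of the site in its own sequence
    tf: str       # transcription factor whose matrix matched here
    score: float  # affinity score of the match (assumed positive)


Cell = Tuple[int, int]


def best_cluster(a: List[Site], b: List[Site],
                 shift_penalty: float = 0.01) -> Tuple[float, List[Cell]]:
    """Best local chain of same-TF site pairs; both lists must be sorted by pos."""
    score: Dict[Cell, float] = {}
    back: Dict[Cell, Optional[Cell]] = {}
    best, best_end = 0.0, None
    for i, sa in enumerate(a):
        for j, sb in enumerate(b):
            if sa.tf != sb.tf:
                continue
            # gain 0.0 means "start a new cluster here", as in local alignment.
            gain, prev = 0.0, None
            for (k, l), s in score.items():
                if k < i and l < j:
                    # Penalise differences in spacing between the two species.
                    shift = abs((sa.pos - a[k].pos) - (sb.pos - b[l].pos))
                    if s - shift_penalty * shift > gain:
                        gain, prev = s - shift_penalty * shift, (k, l)
            score[(i, j)] = sa.score + sb.score + gain
            back[(i, j)] = prev
            if score[(i, j)] > best:
                best, best_end = score[(i, j)], (i, j)
    # Trace back from the best-scoring pair to recover the cluster.
    chain: List[Cell] = []
    cell = best_end
    while cell is not None:
        chain.append(cell)
        cell = back[cell]
    return best, chain[::-1]


if __name__ == "__main__":
    human = [Site(100, "GLI", 5.0), Site(150, "X", 4.0), Site(5000, "Y", 3.0)]
    mouse = [Site(300, "GLI", 5.0), Site(352, "X", 4.0), Site(900, "Z", 3.0)]
    print(best_cluster(human, mouse))
```

The inner loop over all earlier pairs makes this quadratic in the number of matched pairs, which is acceptable for a sketch but would need a tighter formulation for whole-genome comparisons.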

15
Smith-Waterman
  • find the best local alignment of strings A and B:
    substring X of A and substring Y of B such that X
    and Y have the best-scoring pairwise alignment

[Figure: substring X of A aligned with substring Y of B.]
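
A minimal sketch of the classic Smith-Waterman algorithm on plain strings follows. The scores (match +2, mismatch -1, linear gap -1) are textbook defaults chosen for illustration; they are assumptions and not the novel scoring function mentioned in the talk.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (score, X, Y): the best local alignment score and the two substrings."""
    m, n = len(a), len(b)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment start anywhere: this is what makes it local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_pos[0]], b[j:best_pos[1]]


if __name__ == "__main__":
    print(smith_waterman("STOCKHOLM", "TUKHOLMA"))
```

The function returns only the substrings X and Y; recording the trace-back steps instead of just the start position would give the full alignment with gaps.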
16
Computational identification of enhancer elements
  • Preserved in evolution
  • Affinities of functional cis-elements.
  • Spatial arrangement of elements within a module.

[Figure: human and mouse sequences compared.]
17
Parameter optimization
  • The scoring function has 3 free parameters.
  • Find good parameters by greedy hill climbing
    using training data.
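
Greedy hill climbing over a small parameter vector can be sketched as below. The objective here is a toy function standing in for "how well the predictions agree with the training data"; the real objective, the parameter meanings, the step schedule and the name `hill_climb` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def hill_climb(objective: Callable[[Sequence[float]], float],
               start: Sequence[float],
               step: float = 0.5,
               min_step: float = 1e-3) -> Tuple[List[float], float]:
    """Greedy coordinate-wise hill climbing that maximises the objective."""
    current = list(start)
    best = objective(current)
    while step >= min_step:
        improved = False
        for idx in range(len(current)):
            for delta in (step, -step):
                candidate = current.copy()
                candidate[idx] += delta
                value = objective(candidate)
                if value > best:  # greedy: accept any improving move immediately
                    current, best, improved = candidate, value, True
        if not improved:
            step /= 2  # no neighbour is better: refine the step size
    return current, best


if __name__ == "__main__":
    # Toy stand-in objective with a single optimum at (1.0, -2.0, 0.3).
    target = (1.0, -2.0, 0.3)

    def toy_objective(params: Sequence[float]) -> float:
        return -sum((p - t) ** 2 for p, t in zip(params, target))

    print(hill_climb(toy_objective, [0.0, 0.0, 0.0]))
```

Hill climbing finds only a local optimum and terminates here because the toy objective is bounded above; with real training data, restarting from several starting points is a common safeguard.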

18
Whole genome comparisons
  • Whole genomes can be analyzed with our
    implementation, EEL (Enhancer Element Locator).
  • We compared human genes to orthologs in mouse,
    rat, chicken, fugu, tetraodon and zebrafish
  • 100 kbp flanking regions on both sides of the
    gene.
  • Coding regions masked out.
  • About 20 000 comparisons for each pair of
    species.

19
Annotating the human genome with mammalian enhancer elements
20
EEL output
  • Output from the EEL program.
  • Previously known functional sites are highlighted.
  • DNA between the sites is aligned just for the
    output.

21
Enhancer prediction for N-myc
200 kb Mouse N-Myc genomic region
200 kb Human N-Myc genomic region
Conserved GLI binding sites in two predicted
enhancer elements, CM5 and CM7
22
Wet-lab verification
  • Selected some predicted enhancer modules for
    wet-lab verification
  • Fused a 1 kb DNA segment containing the predicted
    enhancer to a marker gene (LacZ) with a minimal
    promoter, and generated transgenic embryos.

23
Enhancer prediction for N-myc
200 kb Mouse N-Myc genomic region
200 kb Human N-Myc genomic region
Conserved GLI binding sites in two predicted
enhancer elements, CM5 and CM7
24
Summary
  • input: 100 kb flanking sequences of DNA of
    orthologous pairs of genes from human and mouse
  • find all good enough TF binding sites from the
    sequences
  • find the best local alignments of the binding
    sites using the EEL scoring function
  • output the sequences in good local alignments;
    these are the putative enhancers
  • postprocessing: an expert biologist selects the
    most promising predictions for wet-lab
    verification; hopefully he/she has good luck!

25
Acknowledgements
  • Kimmo Palin
  • Outi Hallikas (Biom)
  • Jussi Taipale (Biom)