Local Multiple Sequence Alignment: Sequence Motifs
Slides: 28
Provided by: esti86
Transcript and Presenter's Notes
1
Local Multiple Sequence Alignment: Sequence Motifs
2
Motifs
  • A motif is a short sequence common to a set of
    sequences
  • Regulatory motifs (TF binding sites)
  • Functional sites in proteins (e.g. a DNA-binding
    motif)

3
Regulatory Motifs
  • DNA in every cell of an organism is identical
  • Different cells have different functions
  • Transcription is a crucial aspect of regulation
  • Transcription factors (TFs) affect transcription
    rates
  • TFs bind to regulatory motifs
  • Motifs are 6-20 nucleotides long
  • TFs can be activators or repressors
  • Motifs are usually located near the target gene,
    mostly upstream

[Figure: the TFs SBF and MCM1 bound to the SBF motif and the MCM1 motif, upstream of the transcription start site of Gene X]
4
E. coli promoter sequences
5
Challenges
  • How to recognize a regulatory motif?
  • Can we identify new occurrences of known motifs
    in genome sequences?
  • Can we discover new motifs within upstream
    sequences of genes?

6
1. Motif Representation
  • Exact motif: CGGATATA
  • A consensus represents only the deterministic
    (invariant) nucleotides.
  • Example: HAP1 binding sites in 5 sequences.
  • Consensus motif: CGGNNNTANCGG
  • N stands for any nucleotide.
  • Representing only the consensus loses information.
    How can this be avoided?

CGGATATACCGG
CGGTGATAGCGG
CGGTACTAACGG
CGGCGGTAACGG
CGGCCCTAACGG
------------
CGGNNNTANCGG
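The consensus on this slide can be derived mechanically from the five aligned HAP1 sites. The following is a minimal sketch, not the presenter's code: the function name `consensus` and the `threshold` parameter are illustrative assumptions, and a column is written as N unless one base reaches the threshold fraction of sequences.

```python
from collections import Counter

def consensus(sites, threshold=1.0):
    """Return a consensus string for equal-length aligned sites.

    A column keeps its most common base only if that base occurs in at
    least `threshold` (a fraction) of the sites; otherwise it becomes 'N'.
    The threshold is an assumption made for this sketch.
    """
    if len({len(s) for s in sites}) != 1:
        raise ValueError("all sites must have the same length")
    out = []
    for column in zip(*sites):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(sites) >= threshold else "N")
    return "".join(out)

# The five HAP1 binding sites listed on the slide.
hap1_sites = [
    "CGGATATACCGG",
    "CGGTGATAGCGG",
    "CGGTACTAACGG",
    "CGGCGGTAACGG",
    "CGGCCCTAACGG",
]

print(consensus(hap1_sites))
```

With the default threshold only fully conserved columns survive, which reproduces CGGNNNTANCGG. Lowering the threshold keeps majority bases such as the A in column 9, but the column frequencies themselves are still discarded.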
7
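Consensus considerations
[Figure: E. coli promoter layout, in order: -35 hexamer (TTGACA), spacer (15 - 19 bases), -10 hexamer (TATAAT), interval (5 - 9 bases), transcription start site]
A weight matrix contains more information
[Figure: weight matrices over positions 1 - 6 of the -35 and -10 hexamers, based on 450 known promoters. Values are recoverable for one matrix, whose column maxima spell the -35 consensus TTGACA:]
1 2 3 4 5 6
A 0.1 0.1 0.1 0.5 0.2 0.5
T 0.7 0.7 0.2 0.2 0.2 0.2
G 0.1 0.1 0.5 0.1 0.1 0.2
C 0.1 0.1 0.2 0.2 0.5 0.1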
9
PSPM: Position Specific Probability Matrix
  • Represents a motif of length k.
  • Defines P_i(n) for n ∈ {A, C, G, T} and i = 1, ..., k.
  • P_i(A): frequency of nucleotide A at position i.
  • Each k-mer is assigned a probability.
  • Example: P(TCCAG) = 0.5 × 0.25 × 0.8 × 0.7 × 0.2
    = 0.014

1 2 3 4 5
A 0.1 0.25 0.05 0.7 0.6
C 0.3 0.25 0.8 0.1 0.15
T 0.5 0.25 0.05 0.1 0.05
G 0.1 0.25 0.1 0.1 0.2
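The probability a PSPM assigns to a k-mer is the product of one matrix entry per position. A minimal sketch using the matrix on this slide; the dictionary layout and the function name `kmer_probability` are illustrative choices, not part of the original deck.

```python
from math import prod

# PSPM from the slide: one row per nucleotide, one column per position.
PSPM = {
    "A": [0.1, 0.25, 0.05, 0.7, 0.6],
    "C": [0.3, 0.25, 0.8, 0.1, 0.15],
    "T": [0.5, 0.25, 0.05, 0.1, 0.05],
    "G": [0.1, 0.25, 0.1, 0.1, 0.2],
}

def kmer_probability(kmer, pspm):
    """Multiply P_i(n) over the positions i of the k-mer."""
    k = len(next(iter(pspm.values())))
    if len(kmer) != k:
        raise ValueError(f"expected a k-mer of length {k}")
    return prod(pspm[base][i] for i, base in enumerate(kmer))

print(kmer_probability("TCCAG", PSPM))
```

The product assumes positions are independent of one another, which is the assumption a PSPM makes by construction.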
10
Graphical Representation: Sequence Logo
  • Horizontal axis: position of the base in the
    sequence.
  • Vertical axis: amount of information.
  • Letter stack order indicates importance.
  • Letter height indicates frequency.
  • Consensus can be read across the top of the
    letter columns.
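The "amount of information" on the vertical axis can be made concrete. The sketch below assumes the common logo convention, which the slide does not spell out: column information is log2(4) minus the Shannon entropy of the column, in bits, and each letter's height is its frequency times that information. No small-sample correction is applied, and the function names are illustrative.

```python
from math import log2

# PSPM used throughout the deck.
PSPM = {
    "A": [0.1, 0.25, 0.05, 0.7, 0.6],
    "C": [0.3, 0.25, 0.8, 0.1, 0.15],
    "T": [0.5, 0.25, 0.05, 0.1, 0.05],
    "G": [0.1, 0.25, 0.1, 0.1, 0.2],
}

def column_information(probs):
    """Information content in bits: log2(alphabet size) minus entropy."""
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return log2(len(probs)) - entropy

def logo_heights(pspm):
    """Per position, map each base to its letter height (frequency x bits)."""
    k = len(pspm["A"])
    heights = []
    for i in range(k):
        column = {base: pspm[base][i] for base in pspm}
        info = column_information(list(column.values()))
        heights.append({base: p * info for base, p in column.items()})
    return heights

for position, column in enumerate(logo_heights(PSPM), start=1):
    tallest = max(column, key=column.get)
    print(position, tallest, round(sum(column.values()), 3))
```

Under this convention the uniform second column carries zero information, so its letter stack has no height, while the strongly conserved third column produces a tall C.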

11
2. Identification of Known Motifs within Genomic
Sequences
  • Motivation:
  • Identify new genes controlled by the same TF.
  • Infer the function of these genes.
  • Enable a better understanding of the regulation
    mechanism.

14
Detecting a Known Motif within a Sequence using
PSPM
  • The PSPM is moved along the query sequence.
  • At each position the sub-sequence is scored for a
    match to the PSPM.
  • Example sequence: ATGCAAGTCT
  • Position 1: ATGCA
    0.1 × 0.25 × 0.1 × 0.1 × 0.6 = 1.5 × 10^-4
  • Position 2: TGCAA
    0.5 × 0.25 × 0.8 × 0.7 × 0.6 = 0.042

1 2 3 4 5
A 0.1 0.25 0.05 0.7 0.6
C 0.3 0.25 0.8 0.1 0.15
T 0.5 0.25 0.05 0.1 0.05
G 0.1 0.25 0.1 0.1 0.2
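Moving the PSPM along the query sequence amounts to scoring every window of length k. A minimal sketch of that scan with the slide's matrix and example sequence; the function name `scan` and the 1-based position reporting are choices made for this illustration.

```python
from math import prod

PSPM = {
    "A": [0.1, 0.25, 0.05, 0.7, 0.6],
    "C": [0.3, 0.25, 0.8, 0.1, 0.15],
    "T": [0.5, 0.25, 0.05, 0.1, 0.05],
    "G": [0.1, 0.25, 0.1, 0.1, 0.2],
}

def scan(sequence, pspm):
    """Score every length-k window; return (1-based position, window, P)."""
    k = len(pspm["A"])
    results = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        p = prod(pspm[base][i] for i, base in enumerate(window))
        results.append((start + 1, window, p))
    return results

for position, window, p in scan("ATGCAAGTCT", PSPM):
    print(position, window, f"{p:.2e}")
```

A sequence of length 10 yields six windows of length 5. The first two match the slide's worked values, 1.5 × 10^-4 and 0.042.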
15
Detecting a Known Motif within a Sequence using
PSSM
  • Is it a random match, or is it indeed an
    occurrence of the motif?
  • PSPM → PSSM (Position Specific Scoring Matrix)
  • Odds score matrix: O_i(n), where n ∈ {A, C, G, T},
    for i = 1, ..., k
  • Defined as O_i(n) = P_i(n) / P(n), where P(n) is
    the background frequency of n.
  • A larger O_i(n) means higher odds that n at
    position i is part of a real motif.

16
PSSM as Odds Score Matrix
  • Assumption: the background frequency of each
    nucleotide is 0.25.
  • Going to log scale we get an additive score: the
    log odds matrix (log2 O_i).

Original PSPM (P_i), row A:
1 2 3 4 5
A 0.1 0.25 0.05 0.7 0.6
Odds matrix (O_i = P_i / 0.25), row A:
1 2 3 4 5
A 0.4 1 0.2 2.8 2.4
Log odds matrix (log2 O_i), row A:
1 2 3 4 5
A -1.322 0 -2.322 1.485 1.263
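The two conversions on this slide, dividing by the background and then taking log2, can be sketched for the full matrix rather than row A alone. The function names are illustrative, and the uniform background of 0.25 is the slide's stated assumption.

```python
from math import log2

PSPM = {
    "A": [0.1, 0.25, 0.05, 0.7, 0.6],
    "C": [0.3, 0.25, 0.8, 0.1, 0.15],
    "T": [0.5, 0.25, 0.05, 0.1, 0.05],
    "G": [0.1, 0.25, 0.1, 0.1, 0.2],
}
BACKGROUND = {base: 0.25 for base in "ACGT"}

def odds_matrix(pspm, background):
    """O_i(n) = P_i(n) / P(n)."""
    return {b: [p / background[b] for p in row] for b, row in pspm.items()}

def log_odds_matrix(pspm, background):
    """log2 O_i(n); every P_i(n) must be > 0 or log2 raises ValueError."""
    return {b: [log2(p / background[b]) for p in row] for b, row in pspm.items()}

odds = odds_matrix(PSPM, BACKGROUND)
log_odds = log_odds_matrix(PSPM, BACKGROUND)
print([round(x, 3) for x in log_odds["A"]])
```

Because the logarithm turns products into sums, a window score becomes a sum of matrix entries instead of a product.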
17
Calculating using Log Odds Matrix
  • Log odds ≤ 0 implies a random match; log odds > 0
    implies a real match (?).
  • Example sequence: ATGCAAGTCT
  • Position 1: ATGCA
    -1.32 + 0 - 1.32 - 1.32 + 1.26 = -2.7;
    odds = 2^-2.7 = 0.15
  • Position 2: TGCAA
    1 + 0 + 1.68 + 1.48 + 1.26 = 5.42;
    odds = 2^5.42 = 42.8

1 2 3 4 5
A -1.32 0 -2.32 1.48 1.26
C 0.26 0 1.68 -1.32 -0.74
T 1 0 -2.32 -1.32 -2.32
G -1.32 0 -1.32 -1.32 -0.32
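With the log odds matrix, scoring a window is a sum, and the odds are recovered as 2 raised to that sum. A minimal sketch using the rounded matrix values shown on the slide. Treating a score above 0 as a candidate match follows the slide's tentative rule, and the function names are illustrative.

```python
# Log odds matrix from the slide (values rounded to two decimals).
LOG_ODDS = {
    "A": [-1.32, 0, -2.32, 1.48, 1.26],
    "C": [0.26, 0, 1.68, -1.32, -0.74],
    "T": [1, 0, -2.32, -1.32, -2.32],
    "G": [-1.32, 0, -1.32, -1.32, -0.32],
}

def log_odds_score(window, matrix):
    """Sum the log odds entries for the bases of one window."""
    return sum(matrix[base][i] for i, base in enumerate(window))

def positive_hits(sequence, matrix):
    """Return (1-based position, window, score, odds) where score > 0."""
    k = len(matrix["A"])
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        score = log_odds_score(window, matrix)
        if score > 0:
            hits.append((start + 1, window, score, 2 ** score))
    return hits

for position, window, score, odds in positive_hits("ATGCAAGTCT", LOG_ODDS):
    print(position, window, round(score, 2), round(odds, 1))
```

On the example sequence only the window at position 2 (TGCAA) scores above zero.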
21
Calculating the probability of a match
  • ATGCAAG
  • Position 1: ATGCA, odds 0.15
  • Position 2: TGCAA, odds 42.8
  • Position 3: GCAAG, odds 0.18

P(i) = S_i / Σ_j S_j, where S_i is the odds score at position i
Example: P(1) = 0.15 / (0.15 + 42.8 + 0.18) = 0.003
P(1) = 0.003, P(2) = 0.992, P(3) = 0.004
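The normalization on this slide divides each position's odds score by the sum over all candidate positions. A minimal sketch under the implicit assumption that the motif occurs exactly once among the listed positions and that every position is equally likely a priori; the function name is illustrative.

```python
def match_probabilities(odds_scores):
    """Normalize odds scores so they sum to 1 across candidate positions."""
    total = sum(odds_scores)
    if total <= 0:
        raise ValueError("odds scores must have a positive sum")
    return [s / total for s in odds_scores]

# Odds scores for positions 1-3 of ATGCAAG, taken from the slide.
odds_scores = [0.15, 42.8, 0.18]
probabilities = match_probabilities(odds_scores)
for position, p in enumerate(probabilities, start=1):
    print(f"P({position}) = {p:.3f}")
```

The probabilities are only relative to the positions supplied: adding more windows to the list changes every value.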
22
Building a PSSM
  • Collect all known sequences that bind a certain
    TF.
  • Align all sequences (using multiple sequence
    alignment).
  • Compute the frequency of each nucleotide in each
    position (PSPM).
  • Incorporate background frequency for each
    nucleotide (PSSM).
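The last two steps, counting nucleotides per column and then incorporating the background, can be sketched for the HAP1 sites shown earlier in the deck. The pseudocount is an assumption added here, not something the slides mention: without it, a base never observed in a column would have frequency 0 and an undefined log odds score. Function names and the uniform background are likewise illustrative.

```python
from math import log2

BASES = "ACGT"

def build_pspm(aligned_sites, pseudocount=1.0):
    """Column-wise nucleotide frequencies with an (assumed) pseudocount."""
    n = len(aligned_sites)
    k = len(aligned_sites[0])
    if any(len(s) != k for s in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    pspm = {b: [] for b in BASES}
    for i in range(k):
        column = [s[i] for s in aligned_sites]
        for b in BASES:
            pspm[b].append((column.count(b) + pseudocount) / (n + 4 * pseudocount))
    return pspm

def build_pssm(pspm, background=None):
    """Log odds of each frequency against the background (uniform by default)."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    return {b: [log2(p / background[b]) for p in pspm[b]] for b in BASES}

hap1_sites = [
    "CGGATATACCGG",
    "CGGTGATAGCGG",
    "CGGTACTAACGG",
    "CGGCGGTAACGG",
    "CGGCCCTAACGG",
]
pspm = build_pspm(hap1_sites)
pssm = build_pssm(pspm)
print(round(pspm["C"][0], 3), round(pssm["C"][0], 3))
```

The alignment step itself is taken as given here; the sites are assumed to be already aligned and of equal length.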

23
PROBLEMS
  • When searching for a motif in a genome using a
    PSSM or other methods, the motif is usually found
    all over the place.
  • → The motif is considered real if found in the
    vicinity of a gene.
  • When the binding sites of a specific TF are
    checked experimentally (location analysis), the
    sites bound by the TF are in some cases similar
    to the PSSM and sometimes not!

24
3. Finding new Motifs
  • We are given a group of genes, which presumably
    contain a common regulatory motif.
  • We know nothing of the TF that binds to the
    putative motif.
  • The problem: discover the motif.

25
Difficulties in Computational Identification
  • Each motif can start at any of m-k+1 columns: for
    n sequences of length m and a motif of length k
    there are (m-k+1)^n possibilities.
  • Noise: mismatches are allowed (the motif is not
    exact), and not all sequences contain the motif.
  • Statistical significance: k is short (6-20
    nucleotides) and m ranges from 10s (prokaryotes)
    to 1000s (eukaryotes) of nucleotides → a random
    motif can appear by chance in the sequences.
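Both the size of the search space and the chance-occurrence problem can be put into numbers. The sketch below counts the m-k+1 possible start positions per sequence and estimates how often one fixed exact k-mer is expected in a random sequence. The independent, uniform base model (probability 0.25 per base) is an assumption made for illustration, as are the function names.

```python
def alignment_count(n_sequences, seq_length, motif_length):
    """Number of ways to pick one motif start position in each sequence."""
    starts = seq_length - motif_length + 1
    if starts < 1:
        raise ValueError("motif is longer than the sequence")
    return starts ** n_sequences

def expected_chance_hits(seq_length, motif_length, base_prob=0.25):
    """Expected exact occurrences of one fixed k-mer in a random sequence,
    assuming independent bases that each match with probability base_prob."""
    return (seq_length - motif_length + 1) * base_prob ** motif_length

# Ten upstream regions of 1000 nucleotides, motif of length 6.
print(alignment_count(10, 1000, 6))
print(round(expected_chance_hits(1000, 6), 3))
```

Under this model a specific 6-mer is expected about once in every four 1000-nucleotide sequences, which is why a short motif found in a few upstream regions is weak evidence on its own.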

26
Computational Methods
  • This problem has received a lot of attention from
    computer scientists.
  • Methods include:
  • Probabilistic methods: hidden Markov models
    (HMMs), expectation maximization (EM), Gibbs
    sampling, etc.
  • Enumeration methods: problematic for inexact
    motifs of length k > 10.
  • Current status: the problem is still open.
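To see why enumeration becomes problematic as k grows, consider a brute-force sketch: try all 4^k candidate k-mers and count how many sequences contain each one within d mismatches. This is one simple enumeration scheme chosen for illustration, not a method named in the deck, and the toy sequences are made up.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs(kmer, sequence, max_mismatches):
    """True if some window of the sequence is within max_mismatches of kmer."""
    k = len(kmer)
    return any(
        hamming(kmer, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )

def enumerate_motifs(sequences, k, max_mismatches):
    """Return (best support, k-mers reaching it) over all 4**k candidates."""
    support = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        support[kmer] = sum(occurs(kmer, s, max_mismatches) for s in sequences)
    best = max(support.values())
    return best, sorted(m for m, count in support.items() if count == best)

# Hypothetical toy input: ACGT planted exactly once and twice with one mismatch.
toy_sequences = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
best_support, best_motifs = enumerate_motifs(toy_sequences, k=4, max_mismatches=1)
print(best_support, best_motifs)
```

The outer loop visits 4^k candidates, so the work grows by a factor of 4 for every extra motif position: about a million candidates at k = 10 and about a trillion at k = 20.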

27
Tools on the Web
  • MEME: Multiple EM for Motif Elicitation.
    http://meme.sdsc.edu/meme/website/
  • metaMEME: uses an HMM method.
    http://meme.sdsc.edu/meme
  • MAST: Motif Alignment and Search Tool.
    http://meme.sdsc.edu/meme
  • TRANSFAC: database of eukaryotic cis-acting
    regulatory DNA elements and trans-acting factors.
    http://transfac.gbf.de/TRANSFAC/
  • eMotif: scans, builds, and searches for motifs at
    the protein level.
    http://motif.stanford.edu/emotif/