Motif discovery - PowerPoint PPT Presentation

Provided by: acil150
Transcript and Presenter's Notes

1
Tutorial 5
Motif discovery
2
Multiple sequence alignments and motif discovery
  • Motif discovery
  • MEME
  • MAST
  • TOMTOM
  • GOMO
  • PROSITE

3
Can we find motifs using multiple sequence alignment?
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  • Motif
  • A widespread pattern with a biological
    significance

Position  1    2    3    4    5    6    7    8    9    10
A         0    0    0    0    0    3/6  1/6  2/6  0    0
D         0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E         0    0    4/6  1    0    0    0    0    1    5/6
G         0    1/6  0    0    1    2/6  0    0    0    0
H         0    1/6  0    0    0    0    0    0    0    0
N         0    1/6  0    0    0    0    0    0    0    0
Y         1    0    0    0    0    0    0    3/6  0    0
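The position frequency table can be recomputed directly from the six aligned sequences on this slide. The following is a minimal sketch; the names SITES and frequency_matrix are illustrative and not part of any tool mentioned in the tutorial.

```python
from collections import Counter
from fractions import Fraction

# The six aligned motif instances from the slide (flanking dots removed).
SITES = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]


def frequency_matrix(sites):
    """Return {residue: [frequency at each position]} for equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    residues = sorted(set("".join(sites)))
    matrix = {r: [] for r in residues}
    # zip(*sites) walks the alignment column by column.
    for column in zip(*sites):
        counts = Counter(column)
        for r in residues:
            matrix[r].append(Fraction(counts[r], len(sites)))
    return matrix


if __name__ == "__main__":
    for residue, row in frequency_matrix(SITES).items():
        print(residue, " ".join(str(f) for f in row))
```

Fractions are printed in lowest terms, so 3/6 appears as 1/2. Every column sums to 1, which is a quick sanity check for any hand-built table of this kind.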
4
Can we find motifs using multiple sequence
alignment (MSA)?
YES!
NO
5
Using MSA for motif discovery
  • Works only if the motif occurrences line up in the alignment on their own
  • For most motifs this is not the case!

6
ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Input sequences
Scoring matrix
Gap scoring
Output format
Email address
7
Muscle
http://www.ebi.ac.uk/Tools/muscle/index.html
Input sequences
Output format
Email address
8
Motif search: from de novo motifs to motif annotation
gapped motifs
Large DNA data
http://meme.sdsc.edu/
9
MEME: Multiple EM for Motif Elicitation
  • http://meme.sdsc.edu/
  • Motif discovery from unaligned sequences
  • Genomic or protein sequences
  • Flexible model of motif presence (Motif can be
    absent in some sequences or appear several times
    in one sequence)

Expectation-maximization
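To make the expectation-maximization idea concrete, here is a toy sketch for DNA under the simplest assumption of exactly one motif site per sequence. It is not MEME's implementation: MEME also allows zero or several sites per sequence, picks its own starting points and motif widths, and reports statistical scores. The function name em_oops, the seed_site argument, the uniform background and the pseudocount are all assumptions made for illustration.

```python
ALPHABET = "ACGT"


def em_oops(seqs, seed_site, iterations=30, pseudocount=0.1):
    """Fit one fixed-width motif, assuming exactly one site per sequence."""
    width = len(seed_site)
    background = 1.0 / len(ALPHABET)  # uniform background (a simplification)
    # Start from a model that mostly emits the seed letters (0.7 + 3 * 0.1 = 1).
    pwm = [{a: (0.7 if a == c else 0.1) for a in ALPHABET} for c in seed_site]
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior probability of each possible start in each sequence.
        posteriors = []
        for s in seqs:
            weights = []
            for start in range(len(s) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= pwm[j][s[start + j]] / background
                weights.append(ratio)
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: re-estimate each motif column from the expected letter counts.
        counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
        for s, post in zip(seqs, posteriors):
            for start, p in enumerate(post):
                for j in range(width):
                    counts[j][s[start + j]] += p
        pwm = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in counts]
    return pwm, posteriors


if __name__ == "__main__":
    # Made-up sequences, each containing one planted copy of TGACTCA.
    seqs = ["CCGTTGACTCAGGA", "TGACTCATTCGGCA", "GGCATTTGACTCAC", "ACGGTGACTCATCG"]
    pwm, posteriors = em_oops(seqs, "TGACTCA")
    print("consensus:", "".join(max(col, key=col.get) for col in pwm))
    print("starts:", [max(range(len(p)), key=p.__getitem__) for p in posteriors])
```

EM only climbs to a local optimum, so the result depends on the starting model. The sketch takes an explicit seed for that reason rather than starting from random values.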
10
MEME - Input
Email address
How many times in each sequence?
Input file (fasta file)
Range of motif lengths
How many motifs?
How many sites?
11
MEME - Output
Motif score
12
MEME - Output
Motif score
Motif length
Number of times
13
MEME - Output
Low uncertainty = high information content
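The link between uncertainty and information content can be written down in a few lines: the information of a motif column is the maximum possible entropy for the alphabet minus the observed entropy. This is a minimal sketch without the small-sample correction that logo software may apply; the function name information_content is illustrative.

```python
import math


def information_content(column, alphabet_size):
    """Information content in bits of one motif column.

    column maps each letter to its probability at this position;
    alphabet_size is 4 for DNA and 20 for proteins.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    # A fully conserved DNA position versus a completely uncertain one.
    print(information_content({"A": 1.0}, 4))
    print(information_content({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}, 4))
```

A fully conserved DNA position carries 2 bits and a uniform one carries 0 bits. For proteins the maximum is log2(20), about 4.32 bits.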
14
MEME - Output
Multilevel Consensus
15
Patterns can be presented as regular expressions
  • [AG]-x-V-x(2)-{YW}
  • [] - Either residue
  • x - Any residue
  • x(2) - Any residue in the next 2 positions
  • {} - Any residue except these
  • Examples: AYVACM, GGVGAA
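The pattern syntax maps almost one-to-one onto ordinary regular expressions. The sketch below translates the slide's pattern, written with its brackets as [AG]-x-V-x(2)-{YW}, into a Python regex and checks the two example strings. It covers only the elements listed on this slide plus x(n,m) ranges; anchors and other PROSITE syntax are left out, and the function name prosite_to_regex is illustrative.

```python
import re

# One pattern element: x, a single residue, [..] or {..}, optionally
# followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
    print(regex)
    for candidate in ("AYVACM", "GGVGAA", "AYVACY"):
        print(candidate, bool(re.fullmatch(regex, candidate)))
```

The third candidate ends in Y, which the final {YW} element excludes, so it does not match.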

16
MEME - Output
Position in sequence
Strength of match
Sequence names
Motif within sequence
17
MEME - Output
Sequence names
Motif location in the input sequence
Overall strength of motif matches
18
What can we do with motifs?
  • MAST - Search for them in non-annotated sequence
    databases (protein and DNA).
  • TOMTOM - Find the proteins that bind the DNA
    motifs.
  • GOMO - Find putative target genes (DNA) of motifs
    and analyze their associated annotation terms.
  • PROSITE - Search for them in annotated protein
    sequence databases.

19
MAST
http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi
  • Searches for motifs (one or more) in sequence
    databases
  • Like BLAST, but with motifs as input
  • Similar to iterations of PSI-BLAST
  • Profile defines strength of match
  • Multiple motif matches per sequence
  • Combined E value for all motifs
  • MEME uses MAST to summarize results
  • Each MEME result is accompanied by the MAST
    result for searching the discovered motifs on the
    given sequences.
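The idea that a profile defines the strength of a match can be illustrated by scoring every window of a sequence against a position probability matrix with log-odds scores. This is only a sketch of the scanning step: MAST additionally converts scores into p-values and combines them into per-sequence E-values, which is not shown. The function name scan, the uniform background and the toy matrix are assumptions.

```python
import math


def scan(sequence, pwm, background=0.25):
    """Return (start, log-odds score in bits) for every window of the sequence.

    pwm is a list of columns, each mapping a letter to its probability;
    probabilities must be non-zero (use pseudocounts when building the matrix).
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(
            math.log2(pwm[j][sequence[start + j]] / background)
            for j in range(width)
        )
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy 3-column DNA motif that strongly prefers T, C, A.
    pwm = [{b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in "TCA"]
    for start, score in scan("GGTCAG", pwm):
        print(start, round(score, 2))
```

Windows that agree with the motif get positive scores and mismatching windows get negative ones, so the best-scoring start marks the most likely site.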

20
MAST - Input
Email address
Database
Input file (motifs)
21
MAST - Output
Input motifs
Presence of the motifs in a given database
22
TOMTOM
http://meme.sdsc.edu/meme/doc/tomtom.html
  • Searches one or more query DNA motifs against one
    or more databases of target motifs, and reports
    for each query a list of target motifs, ranked
    by p-value.
  • The output contains results for each query, in
    the order that the queries appear in the input
    file.
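Comparing a query motif with a target motif comes down to sliding one matrix along the other and scoring the overlapping columns. The sketch below uses the mean Euclidean distance between columns as the score. This is a stand-in chosen for brevity and not TOMTOM's actual procedure, which also turns scores into p-values. The names column_distance and best_offset and the min_overlap parameter are hypothetical.

```python
import math


def column_distance(p, q):
    """Euclidean distance between two probability columns over the same alphabet."""
    return math.sqrt(sum((p[a] - q[a]) ** 2 for a in p))


def best_offset(query, target, min_overlap=2):
    """Slide query along target; return (mean column distance, offset) of the best overlap.

    offset is the target position aligned with the first query column.
    """
    best = None
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_distance(p, q) for p, q in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, offset)
    return best


if __name__ == "__main__":
    def col(letter):
        # Nearly deterministic column for one letter.
        return {a: (0.97 if a == letter else 0.01) for a in "ACGT"}

    query = [col(c) for c in "CA"]
    target = [col(c) for c in "TCAG"]
    print(best_offset(query, target))
```

A mean distance favours short overlaps, which is one reason real tools rely on calibrated p-values rather than raw scores.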

23
TOMTOM - Input
Input motif
Background frequencies
Database
24
DNA IUPAC code
  • A --> adenosine          M --> A C (amino)
  • C --> cytidine           S --> G C (strong)
  • G --> guanine            W --> A T (weak)
  • T --> thymidine
  • B --> G T C              D --> G A T
  • R --> G A (purine)       H --> A C T
  • Y --> T C (pyrimidine)   V --> G C A
  • K --> G T (keto)         N --> A G C T (any)

Example: YCAY = [TC]CA[TC]
IUPAC: International Union of Pure and Applied Chemistry
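The IUPAC codes expand mechanically into character classes, so a degenerate DNA motif such as YCAY can be turned into a regular expression and used to search a sequence. A minimal sketch follows; the names IUPAC and iupac_to_regex are illustrative.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Expand a degenerate IUPAC DNA motif into a regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for letters outside the code
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


if __name__ == "__main__":
    regex = iupac_to_regex("YCAY")
    print(regex)
    print([m.start() for m in re.finditer(regex, "GGTCATCCACAA")])
```

The expansion lists bases alphabetically, so YCAY becomes [CT]CA[CT], which is the same set of sequences as [TC]CA[TC].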
25
TOMTOM - Output
Input motif
Matching motifs
26
TOMTOM - Output
Wrong input, ok results
27
JASPAR
  • Profiles
  • Transcription factor binding sites
  • Multicellular eukaryotes
  • Derived from published collections of experiments
  • Open data access

28
logo
score
organism
Name of gene/protein
29
GOMO
  • GOMO takes DNA binding motifs to find putative
    target genes and analyze their associated GO
    terms. A list of significant GO terms that can be
    linked to the given motifs will be produced.
  • GOMO returns a list of GO-terms that are
    significantly associated with target genes of the
    motif.
  • Gene Ontology provides a controlled vocabulary to
    describe gene and gene product attributes in any
    organism.

30
GOMO - Input
Email address
Database
Input file (motifs)
31
GOMO - Output
Input motifs
GO annotation
MF - Molecular function
BP - Biological process
CC - Cellular component
32
Prosite
http://www.expasy.org/tools/scanprosite
  • ProSite is a database of protein domains and
    motifs that can be searched by either regular
    expression patterns or sequence profiles.

33
(No Transcript)
34
Prosite - input
Input motif: a regular expression
Database
Filters
35
Prosite - Output
Input motif
Location in the protein sequence
protein