Genome - PowerPoint PPT Presentation

About This Presentation
Title:

Genome

Description:

Genome, nucleus, tissue, cell. The chromosomes contain the set of instructions for living beings. The chromosomes are the volumes of an encyclopedia called the Genome.

Slides: 24
Provided by: LCL6
Learn more at: https://www.cs.upc.edu
Tags: alignment | genome

Transcript and Presenter's Notes

Title: Genome


1
Genome
  • The chromosomes are the volumes of an
    encyclopedia called Genome

2
Chromosome
>human chromosome
TACGTATACTGCATCGATGCTATACGACGAT
CGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTAC
ACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAC
GATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATC
GATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACG
TTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGCGATGCGAC
GATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTAT
ACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCG
ATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGT
ACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGA
TCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATC
GTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAG
CTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCA
CACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGA
CGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACG
GTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTG
CTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCG
ATGCTACGTACGACGATCGATATTAATGCAATCATGCCGATGCGACGATG
CGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGA
CGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGC
TATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGA
TCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGT
ACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGC
AGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGC
ATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACG
ATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGAT
CGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTAC
ACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAG
CTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGC
TACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGAT
GCTACGATCGATGCTATACGACGATCGTAGCTAGCTGCATGCTAGCGATG
CTACGATCGATGCTATACGACGATCGTAGCTTACGACGTACGTTACGTAC
GATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATC
GTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGT
CGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCAT
CGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTAC
GTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGT
ACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACG
ATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTA
CGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCA
ATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGC
GCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGAT
GCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTAC
GATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATC
GTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGT
GCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCT
GCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCAGC
ATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATG
CTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATG
CGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGT
AGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACC
GCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTA
CGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTAC
GTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCT
ACGATCGATGCTATACGACGATCGTAGCTGCTACGCATGCCTACGTACGT
ATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATT
AATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGATCGTCGAT
CGTCAGCTCGATACGTTACGATCTACGATTACGATCATCTATACTATACT
ATACGATATATCTAGATATCGATCTA.ACTCCATTCTTTAAACCGTACTA
CACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAA
CTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGAT
TAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACT
ACGATTAACACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTA
GAATACGTATACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCT
ACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGC
ACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCA
TGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTTG
CATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGT
TACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGC
GTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTA
CGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGA
TGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATAC
GACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGAT
GCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTAC
GATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATC
GTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGT
GCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCT
GCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACA
CGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACG
ATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGT
ACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCT
AGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGAT
GCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCG
ATGCTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAGCTAC
GCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTAC
GACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACG
ATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCA
CACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTAC
GTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATC
GATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGT
ACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGA
TGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACG
TACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGAC
GATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCT
ACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGC
AATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGAT
CGTAGCTATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTAC
GTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACG
ATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCA
CACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTAC
TGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTAC
GTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGAT
GCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACG
TACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAAT
CATGCAGCTGCATGCTAGCGATGCTACGGTACGTATCCTACGTACGATCG
TGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGC
TGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGC
GACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTAT
CCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAA
TGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATG
CCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGA
TGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCAC
ACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGAC
GATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGG
TACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGC
TAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGA
TGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGC
GATGCTACGATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACG
ACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTA
CGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCA
CGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCAT
GCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACG
ACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGC
GACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTAT
CCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAA
TGCAATCATGCAGCTGCATGCTAGCGATGCTGTCACGTAGCATGCTGACG
TACGATCGATTCGATCGATCGTACGATCGTAGCTAGCTAGTCGTAGCGAC
GTAGGATTCACGTAGCGATGCGTAGCGTAGCATGCTGACGATGCATCGAT
CGATGCATCATGCTAGCGTAGCTAGCTAGCATGACTGATCGATTAACGGT
ACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGA
TATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACAC
CGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCT
ACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTA
CGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGC
TACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGC
AGCGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGT
ACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGA
TCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATC
GTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGT
GCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCT
GCATGCTAGCGATGCTACGACGACGATCGATATTAATGCAATCATGCAGC
TGCATGCTAGCGATGCTACGTACGATCGTATGCTAGCTAGCATGCATGCA
TGCATGCAT ..
3
Information retrieval
  • Bioinformatics: Sequence and Genome Analysis.
    David W. Mount
  • Flexible Pattern Matching in Strings (2002).
    Gonzalo Navarro and Mathieu Raffinot
  • Algorithms on Strings (2001).
    M. Crochemore, C. Hancart and T. Lecroq
  • http://www-igm.univ-mlv.fr/~lecroq/string/index.html

4
String Matching
String matching: the definition of the problem
(text, pattern)
depends on what we are given: the text or the patterns.
  • Exact matching
    • The patterns ---> Data structures for the
      patterns
      • 1 pattern ---> The algorithm depends on p and
        the alphabet size
      • k patterns ---> The algorithm depends on k, p
        and the alphabet size
      • Extensions
      • Regular expressions
    • The text ---> Data structure for the text
      (suffix tree, ...)
  • Approximate matching
    • Dynamic programming
    • Sequence alignment (pairwise and multiple)
    • Sequence assembly: hash algorithm
  • Probabilistic search
    • Hidden Markov Models
5
Exact string matching: one pattern
How do string matching algorithms perform the search?
For instance, given the sequence
CTACTACTACGTCTATACTGATCGTAGCTACTACATGC
search for the pattern ACTGA,
and for the pattern TACTACGGTATGACTAA.
6
Exact string matching: brute-force algorithm
Example
Given the pattern ATGTA, the search is
G T A C T A G A G G A C G T A T G T A C T G ...
7
Exact string matching: brute-force algorithm
[Figure: two diagrams, each showing the pattern aligned beneath the text]
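The brute-force search from the slides above can be sketched in Python as follows. This is a minimal illustration rather than the presenter's own code: the function name `brute_force_search`, the 0-based positions, and the choice to report every occurrence are assumptions made here.

```python
def brute_force_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start position of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    occurrences = []
    # Try every window position; the window always moves by exactly one.
    for start in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the end of the pattern.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            occurrences.append(start)
    return occurrences


# Text and pattern taken from the example slide.
text = "GTACTAGAGGACGTATGTACTG"
print(brute_force_search(text, "ATGTA"))  # [14]
```

Because the shift is always 1, the worst case costs on the order of n times m character comparisons, which is what the later algorithms try to improve on.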
8
Exact string matching: one pattern
How do the matching algorithms perform the search?
A window slides along the text, and the pattern is
compared against it.
At each step the comparison is made and the
window is shifted to the right.
What distinguishes the algorithms?
  1. How the comparison is made.
  2. The length of the shift.

9
Exact string matching: one pattern (text on-line)
Experimental efficiency (Navarro & Raffinot)
BNDM: Backward Nondeterministic Dawg Matching
BOM: Backward Oracle Matching
[Figure: map of the most efficient algorithm (Horspool, BOM or BNDM) by pattern length (axis ticks 2 to 256) and a second axis (ticks 2 to 64), with w marked]
10
Horspool algorithm
We need a preprocessing phase to construct the
shift table.
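A minimal Python sketch of the preprocessing and search phases, assuming the usual textbook formulation of Horspool: the window is compared right to left, and it is then shifted according to the text character aligned with the window's last position. The names `build_shift_table` and `horspool_search` are illustrative and do not come from the slides.

```python
def build_shift_table(pattern: str) -> dict[str, int]:
    """Preprocessing: shift for each character found in pattern[:-1].

    Characters missing from the table shift the window by the full
    pattern length (handled by the caller).
    """
    m = len(pattern)
    # Later (rightmost) occurrences overwrite earlier ones in the dict.
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start position of every occurrence of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare the window against the pattern from right to left.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            occurrences.append(pos)
        # Shift by the table entry of the window's last text character.
        pos += shift.get(text[pos + m - 1], m)
    return occurrences


print(build_shift_table("ATGTA"))                           # {'A': 4, 'T': 1, 'G': 2}
print(horspool_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))   # [14]
```

On this text the window visits positions 0, 1, 5, 7, 12 and 14, so the match is found after six windows instead of the fifteen the brute-force search needs to reach it.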
11
Horspool algorithm example
12
Horspool algorithm example
13
Horspool algorithm example
14
Horspool algorithm example
15
Horspool algorithm example
16
Horspool algorithm example
17
Horspool algorithm example
18
Questions about the Horspool algorithm
Given the pattern ATGTA, the shift table is
[Table: shift value for each character, not preserved]
Given a random text drawn from an
equally likely (uniform) probability distribution (EPD):
1. Determine the expected shift of the window.
What if the probability distribution is not uniform?
2. Determine the expected number of shifts
for a text of length n.
3. Determine the expected number of comparisons
in the suffix search phase.
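One way to explore the first question numerically is sketched below. It assumes the DNA alphabet ACGT, the standard Horspool shift rule (rightmost occurrence of the character in the pattern without its last position), and that the text character aligned with the end of the window is drawn independently from the given distribution. The function names and the skewed distribution are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def horspool_shift_table(pattern: str, alphabet: str) -> dict[str, int]:
    """Shift for every alphabet character aligned with the window's last position."""
    m = len(pattern)
    table = {c: m for c in alphabet}        # default: shift by the whole pattern
    for i, c in enumerate(pattern[:-1]):    # the last pattern character is excluded
        table[c] = m - 1 - i                # rightmost occurrence wins
    return table


def expected_shift(pattern: str, probabilities: dict[str, Fraction]) -> Fraction:
    """Expected shift when the last window character follows `probabilities`."""
    table = horspool_shift_table(pattern, "".join(probabilities))
    return sum(p * table[c] for c, p in probabilities.items())


uniform = {c: Fraction(1, 4) for c in "ACGT"}
# Hypothetical non-uniform distribution, for comparison only.
skewed = {"A": Fraction(2, 5), "C": Fraction(1, 10),
          "G": Fraction(1, 10), "T": Fraction(2, 5)}

print(horspool_shift_table("ATGTA", "ACGT"))  # {'A': 4, 'C': 5, 'G': 2, 'T': 1}
print(expected_shift("ATGTA", uniform))       # 3
print(expected_shift("ATGTA", skewed))        # 27/10
```

Under the same independence assumption, a rough estimate for the second question is the text length n divided by this expected shift; the estimate ignores boundary effects at the end of the text.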
19
Exact string matching: one pattern (text on-line)
Experimental efficiency (Navarro & Raffinot)
[Figure: same efficiency map as on slide 9]
20
BNDM algorithm
21
BNDM algorithm example
Given the pattern ATGTA
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
D1 = ( 0 0 1 0 0 )
D2 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 )
D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
22
BNDM algorithm example
D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 )
D5 = ( 1 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D6 = ( 0 0 0 0 0 ) & ( ) = ( 0 0 0 0 0 )
Found!
23
BNDM algorithm example
Given the pattern ATGTA
How is the shift determined?
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D3 = ( 0 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
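The bit-vector steps in the BNDM examples can be reproduced with a short Python sketch. It follows the formulation given by Navarro and Raffinot as far as it is recalled here, so treat it as an approximation of the method rather than the presenter's code. The name `bndm_search`, the 0-based positions, and the use of unbounded Python integers (instead of a machine word of w bits) are assumptions.

```python
def bndm_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start position of every occurrence of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # masks[c]: the leftmost pattern character maps to the highest bit,
    # so for ATGTA: A -> 10001, T -> 01010, G -> 00100.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1          # m ones
    high = 1 << (m - 1)          # highest bit: a pattern prefix was recognised
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        d = full
        while d != 0:
            # Read the window right to left and keep the factors still alive.
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    last = j     # a prefix starts here: remember it for the shift
                else:
                    occurrences.append(pos)   # whole window read: occurrence
            d = (d << 1) & full
        pos += last              # shift to the last recognised prefix
    return occurrences


print(bndm_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))  # [14]
```

On the example text the windows start at positions 0, 5, 10 and 14, with shifts of 5, 5 and 4, which matches the four groups of D vectors on the slides: the shift is the full pattern length unless the highest bit of D was set at some step with j > 0.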