Title: BRNO, Czech Republic
1BRATISLAVA, Slovakia
BRNO, Czech Republic
PADOVA, Italy
2PRIMEX 1.0 and VPCR 2.0: Processing genomic
sequence data for efficient and accurate
simulation of PCR reactions with genomic DNA as
template
- Matej Lexa1,2, Ivano Zara1, Giorgio Valle1
- 1 CRIBI Biotechnology Center, University of Padova, via U. Bassi 58/b, 35131 Padova, Italy
- 2 Laboratory of Functional Genomics and Proteomics, Masaryk University Brno, Kotlarska 2, 61137 Brno, Czech Republic
3SUMMARY
- - full genomic sequence data increasingly available
- - new bioinformatic tools
- - gene identification, regulatory sequence discovery, similarity searches (short patterns, genes, genomes)
- - PCR reaction prediction (VPCR 2.0)
- - trivial for a well-designed pair of primers
- - difficult in special situations
- - search for primer annealing sites (PRIMEX 1.0)
- - mathematical model of the PCR reaction
- PRIMEX is a tool that can find all relevant primer-binding sites in a single genome in a fraction of a second.
- VPCR is a set of routines that analyze the output provided by PRIMEX and run a dynamic mathematical model of the PCR amplification process.
6ACGATGACGATGCAGCAGCGATGACGCGACTGTAGC TGCAGCATGACGA
CTTTAGATGACGATGACGCGAGA ACGATGACGATGCAGCAGCGATGACG
CGACTGTAGC TGCAGCATGACGACTTTAGATGACGATGACGCGAGA AC
GATGACGATGCAGCAGCGATGACGCGACTGTAGC TGCAGCATGACGACT
TTAGATGACGATGACGCGAGA ACGATGACGATGCAGCAGCGATGACGCG
ACTGTAGC TGCAGCATGACGACTTTAGATGACGATGACGCGAGA ACGA
TGACGATGCAGCAGCGATGACGCGACTGTAGC TGCAGCATGACGACTTT
AGATGACGATGACGCGAGA ACGATGACGATGCAGCAGCGATGACGCGAC
TGTAGC TGCAGCATGACGACTTTAGATGACGATGACGCGAGA
9PLAYERS: primer, template, DNA polymerase
CYCLES: denaturation, annealing, polymerization
11PRIMEX 1.0
ATATTAGATACGTTGACG easily found
by BLAST GCGGTAGATACGTTGAAC ATACTAGGTACGTAGAAC
not found by
BLAST GTATTAGATACGTTGAAC
BLAST is not designed to identify primer binding
sites
12PRIMEX 1.0
- - detects similarity to short sequences (12-40 bp) in entire genomes
- - designed for speed and full coverage
- - splits queries into words (typically 8 or 10 bases)
- - creates a memory-resident word lookup table
- - operates in server mode
- - can search a 100MB genome for an exact 20 bp match instantly
- - about 4 seconds when allowing 4 mismatches
- http://bioinformatics.cribi.unipd.it/primex/
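The word lookup idea on this slide can be illustrated with a minimal sketch. It is not the PRIMEX implementation: the function names, the use of a Python dictionary as the memory-resident table, and the restriction to exact word matches (m1 = 0) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_word_index(genome, w):
    """Map every word of length w to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - w + 1):
        index[genome[i:i + w]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def search(genome, index, query, w, max_mismatches):
    """Seed with exact matches of non-overlapping query words, then verify
    the full-length alignment and keep hits within the mismatch limit."""
    hits = set()
    for offset in range(0, len(query) - w + 1, w):
        word = query[offset:offset + w]
        for pos in index.get(word, ()):
            start = pos - offset
            if start < 0 or start + len(query) > len(genome):
                continue
            mm = hamming(query, genome[start:start + len(query)])
            if mm <= max_mismatches:
                hits.add((start, mm))
    return sorted(hits)


# Hypothetical toy "genome" containing one near-match to the query.
genome = "CCCCC" + "ATGAAAATGTTTATGCCCGA" + "TTTTT"
index = build_word_index(genome, 8)
hits = search(genome, index, "ATGAAAATGTTTATGCCCGG", 8, 4)
```

Building the index once and keeping it in memory is what makes a server mode attractive: each later query only pays for the lookups and the verification step.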
13PRIMEX 1.0
Flow chart of the basic functions of
the Primex 1.0 server
14PRIMEX 1.0
ATAGTAGGTCCGTCGATA 18 bp -> 3 words of 6 bp each
Allowing m1 = 1 mismatch per word, the worst case scenario is
GTATTAGGTACGTTGACA 5 mismatches
A sequence with 6 mismatches may or may not be found, depending on the position of individual mismatches:
GTATTAGATACGTTGACA not found
GTATTAGGTACGTTGACG found
Splitting queries into smaller words speeds up
search, but limits the number of mismatches
guaranteed to be found by the algorithm.
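The examples on this slide can be checked with a short sketch. It assumes that a candidate is reported whenever at least one of the non-overlapping query words matches the corresponding target word with at most m1 mismatches; the function names are hypothetical and this is not the PRIMEX code itself.

```python
def split_words(seq, w):
    """Split a sequence into consecutive non-overlapping words of length w."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def mismatches(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_found(query, target, w, m1):
    """True if any query word matches its target word with <= m1 mismatches."""
    pairs = zip(split_words(query, w), split_words(target, w))
    return any(mismatches(q, t) <= m1 for q, t in pairs)


query = "ATAGTAGGTCCGTCGATA"
for target in ("GTATTAGGTACGTTGACA", "GTATTAGATACGTTGACA", "GTATTAGGTACGTTGACG"):
    print(target, mismatches(query, target), is_found(query, target, w=6, m1=1))
```

The two 6-mismatch targets differ only in how the mismatches are distributed over the three words, which is why one is detected and the other is not.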
15PRIMEX 1.0
The sensitivity limit mmax (the maximal number of
mismatches guaranteed to be found by PRIMEX)
calculated for different m1 (mismatches allowed
in a lookup word) and w (word length). Values
were calculated using Equation 4. Values shown in
bold fall into the range 3-8 and are thought to
represent settings of PRIMEX with sensitivity
appropriate for PCR primer searches. Values
outside this range are shown in gray.
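Equation 4 is not reproduced on the slide. As a rough sketch, a sensitivity limit of this kind can be derived from a pigeonhole argument, assuming the query is split into non-overlapping words of length w: a site is missed only if every word carries at least m1 + 1 mismatches. The formula and function names below are that assumption, and may differ from the equation actually used to fill the table.

```python
def guaranteed_mismatches(query_len, w, m1):
    """Largest total mismatch count that cannot hide from every word.

    With n = query_len // w non-overlapping words, missing a site needs
    at least n * (m1 + 1) mismatches, so one fewer is always detected.
    """
    n_words = query_len // w
    return n_words * (m1 + 1) - 1


def sensitivity_table(query_len, word_lengths, m1_values):
    """Tabulate the limit for each (w, m1) combination."""
    return {
        (w, m1): guaranteed_mismatches(query_len, w, m1)
        for w in word_lengths
        for m1 in m1_values
    }


table = sensitivity_table(18, word_lengths=(6, 8, 10), m1_values=(0, 1, 2))
for (w, m1), m_max in sorted(table.items()):
    print(f"w={w:2d} m1={m1} mmax={m_max}")
```

For the 18 bp example with w = 6 and m1 = 1, this gives the 5 mismatches quoted on the previous slide.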
16PRIMEX 1.0
Primex 1.0 WWW client interface
17PRIMEX 1.0
0 ATGAAAATGTTTATGCCCGG TTGAAAATGTTTATGCACCG 15217430 10431147 17
0 ATGAAAATGTTTATGCCCGG AAGAAAATGTCTATGCCCGA 15237134 110303864 17
0 ATGAAAATGTTTATGCCCGG ATGAACATGCTTATGCCCGA 15224037 35273639 17
0 CCGGGCATAAACATTTTCAT CCGGGAATAATCATTTTCAA 15228160 49317860 - 17
0 CCGGGCATAAACATTTTCAT CCGGTCAAAAACATTTTCAG 15217430 16374988 - 17
0 ATGAAAATGTTTATGCCCGG ATGAAAGTGGTCGTGCTCGG AE004438 2221653 15
0 ATGAAAATGTTTATGCCCGG ATGAACGTGTTGCTGGCCGG AE004437 1019964 15
0 ATGAAAATGTTTATGCCCGG ACGTAGCTTTTTATGCCCGG 2822278 2496447 15
0 CCGGGCATAAACATTTTCAT CAGGGCATCCACATCCTCAT AE004438 2190083 - 15
0 CCGGGCATAAACATTTTCAT GCGGGCATAGACATTCCGAT AE004437 572380 - 15
0 CCGGGCATAAACATTTTCAT CCGGACATACCCATTTTTGT AE004437 168142 - 15
0 CCGGGCATAAACATTTTCAT CCGAGTATAAATCTTTTCTT AE004437 1344099 15
The results of an oligonucleotide search with
PRIMEX 1.0 provide the sequences of the
oligonucleotide and the corresponding genomic
sequence, the name or number of the clone in which
the similarity occurs, the position within the genomic
sequence, the DNA strand containing the similarity,
and the number of matching bases.
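The last column of the output can be recomputed from the two sequences in each record. The sketch below is an illustration only, with hypothetical helper names: it counts matching bases for the first record shown and confirms that the oligonucleotide listed for minus-strand hits is the reverse complement of the query.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def matching_bases(oligo, genomic):
    """Count positions at which the oligo and the genomic sequence agree."""
    return sum(a == b for a, b in zip(oligo, genomic))


query = "ATGAAAATGTTTATGCCCGG"
print(matching_bases(query, "TTGAAAATGTTTATGCACCG"))
print(reverse_complement(query))
```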
18PRIMEX 1.0
PROGRAM             TOTAL    MISMATCHES (0 1 2 3 4 5 6)    SEARCH TIME
BLAST               162      1 0                           12 s
BLAST-O             2        1                             5 s
FASTA               72       1 0 4 13 23 17 12             53 s
FASTA-O             1        1                             19 s
BLAT                0        0                             80 s
SSAHA               0        0                             71 s
SSAHA-O             3        1                             10 s
CGC FP-O            1        1                             10 s
EMBOSS              983      1 0 5 104 86 8                18 s
EMBOSS-O            1        1                             14 s
TACG                1        1                             49 s
PRIMEX              214      1 0 14 199                    56 s
PRIMEX-S (2,5)      12140    1 0 14 199 1686 10240         19 s
PRIMEX-S (2,4)      1900     1 0 14 199 1686               4 s
PRIMEX-S (2,3)      214      1 0 14 199                    1 s
PRIMEX-S (2,2)      15       1 0 14                        < 1 s
PRIMEX-S (2,1)      1        1 0                           << 1 s
PRIMEX-SO (0,0)     1        1                             << 1 s
Performance of various search programs when
looking for oligonucleotide AAAAAATGATCAATTTACAT
in the Arabidopsis thaliana genome. The -O suffix
represents settings when programs are expected to
return exact matches or only a small number of
high-similarity matches. The -S suffix indicates
that the program was started as a server before
the query. The numbers in parentheses are m1 and
m2 settings of PRIMEX.
20VPCR 2.0
- - highly improved version of VPCR 1.0 (Lexa et al., 2001)
- - predicts PCR products for a set of primers and template
- - possible priming identified by BLAST or PRIMEX
- - Tm calculated using the nearest neighbor method
- - amplification evaluated using a mathematical simulation model
- - partitions PCR primers into bound and unbound fractions according to Tm
- - binding sites must be properly located -> amplicons
- - detailed output
- Lexa M, Horak J, Brzobohaty B (2001). Virtual PCR. Bioinformatics 17, 192-193.
- http://elanor.sci.muni.cz/cgi-bin/vpcr2.cgi (stable features)
- http://grup.cribi.unipd.it/cgi-bin/mateo/vpcr2.cgi (testing)
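One simple way to picture the partition of primers into bound and unbound fractions is a two-state van't Hoff approximation, in which half of the primer is bound at Tm and the fraction shifts with the annealing temperature. This is only a sketch under stated assumptions: the formula, the function name, and the placeholder enthalpy of -100 kcal/mol are not taken from VPCR 2.0.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def bound_fraction(anneal_c, tm_c, delta_h=-100.0):
    """Fraction of primer bound at the annealing temperature.

    Two-state approximation with an apparent equilibrium constant of 1
    at Tm; delta_h (kcal/mol, negative for duplex formation) is an
    assumed placeholder, not a value from VPCR.
    """
    t = anneal_c + 273.15
    tm = tm_c + 273.15
    exponent = (delta_h / R_KCAL) * (1.0 / t - 1.0 / tm)
    return 1.0 / (1.0 + math.exp(exponent))


# Annealing at 50 oC for a strong site (Tm 66.8 oC) and a weak site (Tm 43.2 oC).
for tm in (66.8459, 43.1923):
    print(tm, round(bound_fraction(50.0, tm), 4))
```

Under this approximation a site with Tm well above the annealing temperature is almost fully occupied, while a site with Tm a few degrees below it binds only a small share of the primer.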
21VPCR 2.0
- VPCR 2.0 WWW interface where primer sequences are
entered
22 PCR (AMPLICON_No PRODUCT_CONCM SIZE FASTA_CLONE BEG_POS END_POS UPPER_PRIMER_No LOWER_PRIMER_No)
Read 2 primers, 31 matches and 1 amplicons.
TEMP=50oC
PCR limited by Taq in cycle 19
0 1e-06 1187 NC_003070.1 6577885 6579072 0 1
PRIMERS
GTTTTGCTAAGGTCTTGGCCTCTATACAT
GTTGGTGAGGTCATGAGGATGGAGATTC
MATCHES (PRIMER_No PRIMER_SEQ TEMPLATE_SEQ FASTA_CLONE BEG_POS STRAND SCORE M_TEMPoC)
0 AAGGTCTTGGCCTC AAGGTCTTGGCCTC NC_003074.1 4052449 - 28.2 53.5614
0 AAGGTCTTGGCCTCTATA AAGGTCTTGGACTCTATA NC_003070.1 19171321 - 28.2 43.1923
0 GTTTTGCTAAGGTCTTGG GTTTTGCTAATGTCTTGG NC_003070.1 10404776 28.2 49.6022
0 GTTTTGCTAAGGTCTTGGCCTCTATACAT GTTTTGCTAAGGTCTTGGCCTCTATACAT NC_003070.1 6577885 58 66.8459
0 TTGGCCTCTATACA TTGGCCTCTATACA NC_003076.1 21496944 28.2 47.5339
0 TTTGCTAAGGTCTT TTTGCTAAGGTCTT NC_003071.1 2021450 - 28.2 46.0847
0 TTTTGCTAAGGTCT TTTTGCTAAGGTCT NC_003076.1 15035528 - 28.2 46.0847
1 AGGTCATGAGGATG AGGTCATGAGGATG NC_003070.1 11195930 28.2 49.2183
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003070.1 10547021 - 28.2 47.0381
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003070.1 17772190 - 28.2 47.0381
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003070.1 25369149 28.2 47.0381
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003071.1 1609787 28.2 47.0381
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003074.1 9753848 - 28.2 47.0381
1 ATGAGGATGGAGAT ATGAGGATGGAGAT NC_003076.1 18619666 28.2 47.0381
1 ATGAGGATGGAGATT ATGAGGATGGAGATT NC_003070.1 22070521 30.2 48.8308
1 ATGAGGATGGAGATT ATGAGGATGGAGATT NC_003075.1 3448770 - 30.2 48.8308
1 ATGAGGATGGAGATT ATGAGGATGGAGATT NC_003076.1 2296392 - 30.2 48.8308
1 ATGAGGATGGAGATT ATGAGGATGGAGATT NC_003076.1 24175011 30.2 48.8308
1 ATGAGGATGGAGATTC ATGAGGATGGAGATTC NC_003074.1 5095219 - 32.2 51.1376
1 CATGAGGATGGAGA CATGAGGATGGAGA NC_003071.1 4243804 28.2 48.791
1 CATGAGGATGGAGAT CATGAGGATGGAGAT NC_003071.1 14914479 30.2 50.068
1 CATGAGGATGGAGAT CATGAGGATGGAGAT NC_003071.1 3587863 30.2 50.068
1 CATGAGGATGGAGAT CATGAGGATGGAGAT NC_003074.1 5281258 30.2 50.068
1 GGTGAGGTCATGAG GGTGAGGTCATGAG NC_003071.1 10969069 - 28.2 50.8517
1 GTTGGTGAGGTCAT GTTGGTGAGGTCAT NC_003070.1 2088336 28.2 50.4802
1 GTTGGTGAGGTCATGAGGATGGAGATTC GTTGGTGAGGTCATGAGGATGGAGATTC NC_003070.1 6579072 - 56 68.6215
1 TCATGAGGATGGAGATTC TCATGAGGATGGAGATTC NC_003074.1 20049188 - 36.2 55.4072
1 TGAGGATGGAGATT TGAGGATGGAGATT NC_003071.1 11460130 - 28.2 47.4391
1 TGAGGATGGAGATT TGAGGATGGAGATT NC_003074.1 2234673 28.2 47.4391
1 TGGTGAGGTCATGAGGAT TGGTGAGGTTATGAGGAT NC_003075.1 1824156 - 28.2 49.4059
1 TGGTGAGGTCATGAGGAT TGGTGAGGTTATGAGGAT NC_003075.1 1833543 - 28.2 49.4059
AMPLICONS (FASTA_CLONE SIZE MATCH_No PRIMER_SEQ TEMPLATE_SEQ BEG_POS SCORE M_TEMPoC MATCH_No PRIMER_SEQ TEMPLATE_SEQ BEG_POS SCORE M_TEMPoC)
NC_003070.1 1187 3 GTTTTGCTAAGGTCTTGGCCTCTATACAT GTTTTGCTAAGGTCTTGGCCTCTATACAT 6577885 58 66.8459 25 GTTGGTGAGGTCATGAGGATGGAGATTC GTTGGTGAGGTCATGAGGATGGAGATTC 6579072 56 68.6215
Entering VPCR 2.0 script Mon Mar 17 14:55:10 MET 2003
Read 16 potential primer sequences.
Processing..................................................
Primers expanded to 16 sequences.
2 primers found in expanded sequences.
Proceeding with BLAST search for candidate primer annealing sites. Takes 10-20 seconds per primer, please wait.
Read 2 primer sequences.
DATALIB at.fna
Executing BLAST search for each primer.
Searching ...
Blast search complete.
Extracting data ..
CLONES FOUND.
Mon Mar 17 14:55:23 MET 2003
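The step from MATCHES to AMPLICONS in this output can be sketched as pairing a plus-strand site with a downstream minus-strand site on the same clone. The pairing rule, the `max_size` cutoff, the `Site` record and the reading of a blank STRAND field as "+" are assumptions made for illustration; only the positions are taken from the output above.

```python
from collections import namedtuple

# Hypothetical record for one line of the MATCHES section.
Site = namedtuple("Site", "primer clone pos strand")


def find_amplicons(sites, max_size=3000):
    """Pair each '+' site with '-' sites downstream on the same clone."""
    amplicons = []
    for fwd in sites:
        if fwd.strand != "+":
            continue
        for rev in sites:
            if rev.strand != "-" or rev.clone != fwd.clone:
                continue
            size = rev.pos - fwd.pos
            if 0 < size <= max_size:
                amplicons.append((fwd.clone, size, fwd.primer, rev.primer))
    return amplicons


sites = [
    Site(0, "NC_003070.1", 6577885, "+"),   # full-length match of primer 0
    Site(1, "NC_003070.1", 6579072, "-"),   # full-length match of primer 1
    Site(0, "NC_003070.1", 19171321, "-"),  # partial match, too far away
    Site(1, "NC_003070.1", 11195930, "+"),  # partial match, no partner nearby
]
amplicons = find_amplicons(sites)
```

With these four sites the only pair that satisfies the rule is the one reported in the AMPLICONS section, and the difference between the two positions equals the listed SIZE of 1187.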
23VPCR 2.0
- Primers containing a common sequence bind at many
locations in the genome, so many PCR
products are possible. In this example, primers
ATTTGGGTTATTAATTGAGAAA and GCGACTGAGCAAGACTC were
used with the human chromosome 15 sequence. When run
against the whole human genome, the number of
products was so high that none exceeded the
threshold of visibility on the gel. The reaction
ran out of Taq polymerase in cycle 9.
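Why many competing products exhaust the polymerase early can be illustrated with a toy model. It is not the VPCR simulation model: it assumes that every amplicon doubles each cycle until the total number of extensions needed exceeds a fixed per-cycle polymerase capacity, after which the capacity is shared equally. All names and numbers are hypothetical.

```python
def simulate_pcr(n_amplicons, start_copies, capacity, cycles=30):
    """Return (copies per amplicon, first polymerase-limited cycle or None)."""
    copies = float(start_copies)
    limited_cycle = None
    for cycle in range(1, cycles + 1):
        demand = copies * n_amplicons  # extensions needed to double everything
        if demand <= capacity:
            copies *= 2
        else:
            if limited_cycle is None:
                limited_cycle = cycle
            copies += capacity / n_amplicons  # capacity shared by all products
    return copies, limited_cycle


single = simulate_pcr(n_amplicons=1, start_copies=1000, capacity=1e9)
crowded = simulate_pcr(n_amplicons=1000, start_copies=1000, capacity=1e9)
print(single, crowded)
```

In this toy model the crowded reaction becomes polymerase-limited many cycles earlier and each individual product ends at a far lower copy number, which mirrors the behaviour described on the slide.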
24VPCR 1.0
VPCR 2.0
- Comparison of VPCR 1.0 and VPCR 2.0 simulation
results with PCR reactions carried out using a
set of Arabidopsis primers and genomic DNA (Lexa
et al., 2001)