Title: Using BLAST options to refine a search
1. Using BLAST options to refine a search
- Address the question: how many of the Phytophthora/tomato interaction ESTs are tomato?
- A: It will depend on the conditions. With E-value < 1 x 10^-8, match length > 200 bp, identities > 95%, and match overlap > 50%, 2,100 (54%) show a match with 1,622 unique ESTs.
- Can the question be addressed more easily by refining the BLAST search?
- Other BLAST options.
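The four conditions on this slide can be applied to a list of hits as in the minimal Python sketch below. The `Hit` record, the helper names, and the example values are hypothetical and are not taken from the slides. "Match overlap" is assumed here to mean alignment length divided by query length; the slide does not define it.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One query/subject match (hypothetical record layout)."""
    query: str
    subject: str
    pct_identity: float
    align_len: int
    evalue: float


def passes(hit, query_len, max_evalue=1e-8, min_len=200,
           min_identity=95.0, min_overlap=0.5):
    """Apply the slide's thresholds to a single hit.

    Assumption: overlap = alignment length / query length.
    """
    overlap = hit.align_len / query_len
    return (hit.evalue < max_evalue
            and hit.align_len > min_len
            and hit.pct_identity > min_identity
            and overlap > min_overlap)


def summarize(hits, query_lengths):
    """Return (number of matching queries, number of unique subjects hit)."""
    kept = [h for h in hits if passes(h, query_lengths[h.query])]
    return len({h.query for h in kept}), len({h.subject for h in kept})


# Made-up example data, for illustration only.
example_hits = [
    Hit("q1", "s1", 99.0, 400, 1e-50),  # passes every condition
    Hit("q2", "s1", 99.0, 250, 1e-30),  # overlap 250/600 is too low
    Hit("q3", "s2", 90.0, 500, 1e-40),  # identity is too low
    Hit("q4", "s1", 97.0, 300, 1e-20),  # passes every condition
]
example_lengths = {"q1": 600, "q2": 600, "q3": 600, "q4": 500}
example_summary = summarize(example_hits, example_lengths)
```

The pair returned by `summarize` corresponds to the two numbers reported on the slide: how many interaction ESTs have a match, and how many unique ESTs they match.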
2. ./blastall.exe
-e  Expectation value <E> [Real]; default = 10.0
3. ./blastall.exe
-m  alignment view options: 0 = pairwise, 1 = query-anchored showing identities, . . . 7 = XML Blast output, 8 = tabular, 9 = tabular with comment lines
6. Run nucleotide BLAST (blastn)
/cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -e 0.01
grep -c Strand OUTE2.txt
3 (with the default E-value this was 82)
/cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./PhytophSeq1.txt -o PhytOUTE1.txt -e 1e-8
grep -c Strand PhytOUTE1.txt
108,787 (with the default E-value this was 292,568)
NOTE: The BLAST run that compares 3,921 sequences to a database of 116,711 sequences will take some time (15 minutes on my laptop).
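The grep -c step counts the lines of the report that contain the word Strand. A minimal Python equivalent is sketched below for cases where grep is not available. The function names and the abbreviated sample report are illustrative assumptions. As far as I understand the pairwise blastn report, each alignment block carries one Strand line, so the count tracks alignments rather than distinct database sequences.

```python
def count_lines_containing(text, needle="Strand"):
    """Count lines of a report that contain `needle`, like `grep -c`."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_in_file(path, needle="Strand"):
    """Same count, read line by line so large output files are not loaded whole."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


# Abbreviated, illustrative fragment of a pairwise blastn report.
sample_report = """\
 Score = 1237 bits, Expect = 0.0
 Identities = 630/632 (99%)
 Strand = Plus / Plus

 Score = 1017 bits, Expect = 0.0
 Identities = 519/521 (99%)
 Strand = Plus / Plus
"""

sample_count = count_lines_containing(sample_report)
```

Calling `count_in_file("OUTE2.txt")` would then play the role of `grep -c Strand OUTE2.txt`, assuming the output file is in the current directory.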
7. Searching..................................................done

                                                                      Score     E
Sequences producing significant alignments:                           (bits)  Value

gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, ...    1237    0.0
gi|16248018|gb|BI933546.1|BI933546 EST553435 tomato flower, anth...    1017    0.0
gi|4384985|gb|AI489614.1|AI489614 EST247953 tomato ovary, TAMU S...     908    0.0

>gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, anthesis, Cornell University Solanum lycopersicum cDNA clone cTOD9L3, mRNA sequence
Length = 632

Score = 1237 bits (624), Expect = 0.0
Identities = 630/632 (99%)
Strand = Plus / Plus

Query  1504  gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac  1563
Sbjct  1     gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac  60

Query  1564  ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa  1623
Sbjct  61    ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa  120
8. Run nucleotide BLAST (blastn)
/cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -m 8
-m  alignment view options: 8 = tabular format

9. Columns of the tabular output, left to right: query, subject, identities (%), length, mismatch, gap openings, query start, query end, subject start, subject end, e-value, bit score

query                   subject                             identities  length  mismatch  gap openings  query start  query end  subject start  subject end  e-value  bit score
Slycopersicum.sequence  gi|9292199|gb|BE354223.1|BE354223   99.68       632     2         0             1504         2135       1              632          0.0      1237
Slycopersicum.sequence  gi|16248018|gb|BI933546.1|BI933546  99.62       521     2         0             1668         2188       1              521          0.0      1017
Slycopersicum.sequence  gi|4384985|gb|AI489614.1|AI489614   99.57       466     2         0             1818         2283       1              466          0.0      908
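The tabular format is convenient to read programmatically because every hit is one line with twelve fields. The Python sketch below parses such lines into named fields. The `TabularHit` and `parse_tabular` names are hypothetical, and the sample rows are the three hits from the slide with the "|" separators in the identifiers restored, which is an assumption about the original output.

```python
from typing import NamedTuple


class TabularHit(NamedTuple):
    """The twelve columns of -m 8 output, in order."""
    query: str
    subject: str
    pct_identity: float
    length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(text):
    """Parse tabular BLAST output; '#' comment lines (as in -m 9) are skipped."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()  # the identifiers contain no whitespace
        hits.append(TabularHit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits


# The three rows shown on the slide (separators in the IDs are assumed).
sample_rows = """\
Slycopersicum.sequence gi|9292199|gb|BE354223.1|BE354223 99.68 632 2 0 1504 2135 1 632 0.0 1237
Slycopersicum.sequence gi|16248018|gb|BI933546.1|BI933546 99.62 521 2 0 1668 2188 1 521 0.0 1017
Slycopersicum.sequence gi|4384985|gb|AI489614.1|AI489614 99.57 466 2 0 1818 2283 1 466 0.0 908
"""

sample_hits = parse_tabular(sample_rows)
# Query span covered by each alignment (end - start + 1).
sample_spans = [h.q_end - h.q_start + 1 for h in sample_hits]
```

With named fields, thresholds on identity, length, or E-value become simple comparisons instead of column counting.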
10. tblastn
- Running BLAST with a protein or peptide query against nucleotide data (translated BLAST vs nucleotide data)
- /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13.txt -o PEPTIDEOUT.txt (-e )
- Try:
- /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt
- Then try:
- /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt -e 50
11. From Xiaodong: Other useful BLAST options
(1) -b [integer]: number of database sequences to show alignments for. The default value is 250. A smaller number reduces the size of the output file and makes the BLAST searches faster.
(2) -v [integer]: number of database sequences to show one-line descriptions for. The default value is 500. A smaller -v value has a similar effect to a smaller -b value.
(3) -a [integer]: number of processors to use. Most laptops have only one processor, but on a Linux workstation with multiple processors, using all of them will drastically reduce the execution time.
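When BLAST is driven from a script, these options can be assembled into an argument list. The sketch below is one way to do that in Python. The `build_blastall_args` helper is hypothetical, the -b, -v, and -a values are arbitrary examples, and the file names are reused from the earlier slides.

```python
def build_blastall_args(program, database, query, output,
                        evalue=None, alignments=None,
                        descriptions=None, processors=None):
    """Build a blastall argument list; options left as None are omitted."""
    args = ["blastall", "-p", program, "-d", database,
            "-i", query, "-o", output]
    optional = [
        ("-e", evalue),        # expectation value cutoff
        ("-b", alignments),    # database sequences to show alignments for
        ("-v", descriptions),  # database sequences to show descriptions for
        ("-a", processors),    # number of processors to use
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


# Example values are arbitrary; adjust them to the search at hand.
example_args = build_blastall_args(
    "blastn", "./TA496Seq1.txt", "./PhytophSeq1.txt", "PhytOUTE1.txt",
    evalue="1e-8", alignments=5, descriptions=5, processors=2,
)
```

Assuming blastall is on the PATH, the list could be passed to `subprocess.run(example_args, check=True)`, and `os.cpu_count()` is one way to choose a value for -a.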
12. From Xiaodong: Other useful BLAST options
(4) -m 7 gives results in XML format, which is useful if users will import the BLAST output into Blast2GO for GO assignment and metabolic pathway prediction.
(5) -l [string]: restrict the search of the database to a list of GIs (GenInfo Identifiers), a specific identifier for each sequence in GenBank. The string is the name of the file containing the GIs of all the sequences in the subset you want to search against. Use this option to search against subsets of a large database without creating multiple databases. The advantage is that the E-values from all the searches against the subsets are comparable. If the subsets were individual databases, their different sizes would make the E-values incomparable between searches.
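A GI list for the -l option can be collected from sequence identifiers of the form gi|number|... The Python sketch below extracts the numbers and formats them one per line, which is the plain-text layout I understand -l to accept. The helper names are hypothetical, and the example identifiers are the ones from the earlier slides with the "|" separators assumed.

```python
import re

# Matches the GI number at the start of an identifier, with or without '>'.
GI_PATTERN = re.compile(r"^>?gi\|(\d+)\|")


def extract_gis(deflines):
    """Return the GI numbers found in the lines, in order, without duplicates."""
    gis = []
    seen = set()
    for line in deflines:
        match = GI_PATTERN.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            gis.append(match.group(1))
    return gis


def format_gi_list(gis):
    """One GI per line (assumed text format for the -l file)."""
    return "".join(gi + "\n" for gi in gis)


example_deflines = [
    ">gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds",
    "gi|16248018|gb|BI933546.1|BI933546",
    "gi|4384985|gb|AI489614.1|AI489614",
    "gi|9292199|gb|BE354223.1|BE354223",  # duplicate, kept only once
]
example_gis = extract_gis(example_deflines)
example_file_text = format_gi_list(example_gis)
```

Writing `example_file_text` to a file and passing that file name to -l would restrict the search to those three sequences while still searching the one large database.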