Intro to Bioinformatics - PowerPoint PPT Presentation

About This Presentation
Title:

Intro to Bioinformatics

Description:

Tutorial 3 - BLAST. Intro to Bioinformatics. Linear costs are available only with megablast and are determined by the match/mismatch scores; lookup table ...

Slides: 26
Provided by: 4408155

Transcript and Presenter's Notes


1
Tutorial 3 - BLAST
  • Intro to Bioinformatics

2
BLAST
  • What is BLAST?
  • Basic Local Alignment Search Tool
  • Set of similarity search programs for exploring
    sequence databases. 

Database
Query
BLAST program
3
Why perform similarity search?
  • One sequence by itself is not informative; it
    must be analyzed by comparative methods against
    existing sequence databases to develop hypotheses
    concerning relatives and function.
  • There are 3 possibilities:
  • A perfect match.
  • A similar sequence.
  • Not even one similar sequence.

4
BLAST Databases
Name      Query type           Database
blastn    Genomic              Genomic
blastp    Protein              Protein
blastx    Translated genomic   Protein
tblastn   Protein              Translated genomic
tblastx   Translated genomic   Translated genomic
  • blastn automatically searches the opposite strand.
  • Translated genomic query: the query is genomic and is
    translated to protein using the 6 possible reading frames.
One search in tblastx is like ___ searches of blastp
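The six reading frames mentioned on this slide can be illustrated with a short Python sketch. It assumes the standard genetic code and an uppercase A/C/G/T input; the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the example sequence are hypothetical and are not part of BLAST itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Translate a nucleotide sequence in 3 forward and 3 reverse frames."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Each translated search compares every one of these frames, which is why translated queries and translated databases multiply the amount of work relative to a plain protein search.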
5
http://www.ncbi.nlm.nih.gov/BLAST/
6
Place Query
Choose Database
7
BLASTN Databases
Database               Contents
Gene collection        GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Genomic + Transcript   Complete human and mouse genome + transcriptome
EST                    Expressed sequence tags
mito                   Mitochondrial sequences
vector                 Vector subset of GenBank
month                  GenBank, EMBL, DDBJ, PDB from the last 30 days
Envi                   Environmental samples
http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases
8
Place Query
Choose Database
Optimize similarity level of the search
Limit output size
Threshold for results significance
Primary word match (16-64 nt)
Reward and penalty for matching and mismatching
bases
Cost to create and extend a gap
Remove low information content
Limit search to specific organism
9
Search for homologs of the chick olfactory
receptor 6 gene
10
Global Alignments
Local Alignments
Query sequence
Matched Areas of database sequences
11
Sequence description
E value
Score(bits)
Sequence Identifier
Identity
Coverage
12
Score and E value
Identities and gaps
Strand
13
Multiple hits on the same subject
14
Design of the BLAST survey
  • Consider your research question
  • Are you looking for a particular gene in a
    particular species? BLAST against the genome of
    that species.
  • Are you looking for additional members of a gene
    family across all species? BLAST against the
    gene collection database.
  • Are you looking for exact motif matches?
    Increase the gap penalty or use megablast.

15
Score and E-value
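Score: $S = \sum(\text{identities} + \text{mismatches}) - \sum(\text{gap penalties})$
Bit score: $S' = \dfrac{\lambda S - \ln K}{\ln 2}$
E-value: $E = K \, m \, n \, e^{-\lambda S}$
  • $K$: depends on search space
  • $\lambda$: depends on scoring system
  • $m$: query length (bp)
  • $n$: effective length (total number of bases) of the
    database (bp)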
16
Score and E-value
  • The score is a measure of the similarity of the
    query to a sequence from the database.
  • The E-value is a measure of the reliability of
    the score.
  • The E-value is defined as the number of alignments
    with a score of at least S that are expected to
    occur by chance in a search of this size.

17
Score and E-value
  • The size of the E-value
  • The typical threshold for a good E-value from a
    BLAST search is $E = 10^{-6}$ (1e-6) or lower.
  • The reason for such low values is that a chance
    probability of 0.001 per entry, in a million-entry
    database, would still leave about 1000 entries due
    to chance. A probability of 1e-6 would leave only
    about one entry due to chance.

18
Exercise
Calculate S, S' and E for the following BLAST
hit:
ACGTCGATCGAGCT
AGGTCGTC-GAGGT
  • Given the following parameters:
  • Query length = 150
  • $\lambda = 1.37$
  • $K = 0.711$
  • Average sequence length in database = 270
  • Number of sequences in database = 4,554,026

$S = \sum(\text{Id} + \text{MM}) - \sum \text{GP} = 13 - 1 = 12$
$S' = \dfrac{1.37 \cdot 12 - \ln(0.711)}{\ln 2} = \dfrac{16.44 + 0.341}{0.693} \approx 24.2$
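The score calculation on this slide can be checked with a minimal Python sketch. It follows the slide's simplified arithmetic, in which every aligned column (identity or mismatch) contributes 1 and every gap column costs 1; that scoring scheme, the function names and the `gap_penalty` parameter are assumptions made for illustration, not BLAST's actual reward/penalty settings.

```python
import math


def raw_score(query: str, subject: str, gap_penalty: int = 1) -> int:
    """Slide-style raw score: aligned columns minus gap penalties."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(query, subject))
    gaps = sum(1 for q, s in columns if q == "-" or s == "-")
    aligned = len(columns) - gaps  # identities + mismatches
    return aligned - gap_penalty * gaps


def bit_score(score: float, lam: float, k: float) -> float:
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


S = raw_score("ACGTCGATCGAGCT", "AGGTCGTC-GAGGT")
S_bits = bit_score(S, lam=1.37, k=0.711)
print(S, round(S_bits, 1))
```

Keeping the raw score and the bit score in separate functions makes it easy to swap in a different scoring scheme without touching the normalization step.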
19
Exercise
Calculate S, S' and E for the following BLAST
hit:
ACGTCGATCGAGCT
AGGTCGTC-GAGGT
  • Given the following parameters:
  • Query length = 150
  • $\lambda = 1.37$
  • $K = 0.711$
  • Average sequence length in database = 270
  • Number of sequences in database = 4,554,026

$E = 0.711 \times 150 \times 270 \times 4{,}554{,}026 \times e^{-1.37 \cdot 12}$
$E = 131135455683 \times 7.24 \times 10^{-8} \approx 9504.27$
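The E-value arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the slide's formula, E = K * m * n * exp(-lambda * S), with the database length n taken as average sequence length times number of sequences, as the slide does; the function and variable names are hypothetical.

```python
import math


def e_value(k: float, lam: float, query_len: int, db_len: int, score: float) -> float:
    """Expected number of chance alignments: K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


# Parameters from the exercise.
query_len = 150
db_len = 270 * 4_554_026  # average sequence length * number of sequences
E = e_value(k=0.711, lam=1.37, query_len=query_len, db_len=db_len, score=12)
print(round(E, 2))
```

An E-value in the thousands means a hit with this score is expected many times by chance alone, so the alignment is not significant.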
20
Exercise
What is the minimal score needed to achieve a
significant E value ($10^{-6}$, i.e. 1e-6)?
$131135455683 \, e^{-1.37 S} = 10^{-6}$
$\ln(131135455683 \, e^{-1.37 S}) = \ln(10^{-6})$
$\ln(131135455683) + \ln(e^{-1.37 S}) = -13.81$
$25.6 - 1.37 S = -13.81$
$S = \dfrac{-13.81 - 25.6}{-1.37} \approx 28.77$
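The same rearrangement can be written as a small Python helper that inverts E = K * m * n * exp(-lambda * S) for S. It is a sketch under the exercise's parameters; `min_score` and its argument names are hypothetical.

```python
import math


def min_score(k: float, lam: float, query_len: int, db_len: int, e_target: float) -> float:
    """Solve K * m * n * exp(-lambda * S) = e_target for S."""
    return (math.log(k * query_len * db_len) - math.log(e_target)) / lam


S_min = min_score(k=0.711, lam=1.37, query_len=150,
                  db_len=270 * 4_554_026, e_target=1e-6)
print(round(S_min, 2))        # threshold as a real number
print(math.ceil(S_min))       # smallest whole-number score that meets it
```

Because the raw score in this exercise takes whole-number values, rounding the threshold up gives the smallest score that actually reaches the target E-value.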
21
1. ????? ????? ?????????? ??? CFTR ????
22
2. ???? ????? ?????? ??? CFTR ??????? ???????
?????
23
3. ??? CFTR ???? ?????? ,ABC transporters????
???? ?????? ???? ?????? ??ABC transporters


24
4. ?????? ??? ?? ?????, ???? ??????? ????? ??
?-BLAST . ?????? ??????? ?? ????? ?????? ????
??????? ????? (???? ?? ?-Algorithm parameters,
????????? ?? Filters and Masking ???? ?? ????? ??
""Low Complexity regions) ????
>my protein
MQNSHSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGAR
PCDISRILQVSNGCVSKILGRYYETGSIRPRAIGGSKPRVATPEVVSKIA
QYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSINRVLRNLASEKQQMG
ADGMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPTQDGCQQQEGGGENTN
SISSNGEDSDEAQMRLQLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVF
ARERLAAKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASNTPSHIPIS
SSFSTSVYQPIPQPTTPVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMA
NNLPMQPPVPSQTSSYSCMLPTSPSVNGRSYDTYTPPHMQTHMNSQPMGT
SGTTSTGLISPGVSVPVQVPGSEPDMSQYWPRLQ
a. ????? ????? ?? BLAST ???????? BLAST
PROTEIN ????? ???? ??????? Swissprot ??? ??????
?????? ???????? ????? ?????? ??? - Paired box
protein Pax-6 . ?????????? ?????? ????
?????????? ?? ???? ????? ?? 731 (Rattus
norvegicus, Human, Bovine). b. ???? ?? ?-BLAST
?-alignments ????? ??? ????? ?? ?????? ???????
??????? ????? ??????. ????? ?? ???????? ????
???????? ?????? ???? ??????? ?????, ??? ???????
???? ???????? ?????? ?? ??????? ??????? ?????,
??? ???? ?????????? ????? ???.
25
5. ????? ?? ?????? RecA ?? E. coli (???? ????
P0A7G6. ???? ????? ?? ???? ???????? ?????, ??
???? ????? ?? ?? ???? ?????). ??? ????? ?? ???
(Saccharomyces cerevisiae . ???? ???????? ??
??????? ?? organism) ?- BLAST ??? ?? ?????? .a
????? ????? ?? BLAST ???????? TBLASTN (nr
Database) .b ?? ?? ?????? ??? ???? ?? ????? ????
?????? RAD57 . c ????? ?????? ???? ?????
BLOSUM62 ???? ?- gap penalty 11,1(????? ?"? ?????
?? ?????? Search Summary ?????? ???? ????? ????
?? ?????? )? d. ???? ????? ?? ????? ???? ???????
14,042,622  