de novo Sequence Analysis

Description:

PSI-BLAST: sensitive protein searching. Multiple sequence alignment (proteins) ... Local: Smith-Waterman; finds region(s) of highest similarity and builds outward ...

Slides: 29
Provided by: benjam92
1
de novo Sequence Analysis
Lab work
de novo cDNA analysis, de novo genomic sequence analysis, de novo protein analysis
Confirms/disagrees with in silico predictions
Development of programs
Predictions
Sequence analysis tools
Major portion of slides by Jane Loveland and Dustin Schones
2
de novo Sequence Analysis
  • Assign function to a sequence
  • Align to annotated sequences
  • Similarity searching and pairwise alignment
  • BLAST: cDNA and genomic clones, proteins
  • BLAT: genome searching
  • Gene structure
  • PSI-BLAST: sensitive protein searching
  • Multiple sequence alignment (proteins)
  • CLUSTALW: perform alignment
  • JalView, GeneDoc: edit and view alignment
  • Find open reading frames
  • ORF Finder
  • If translated, map protein domains
  • InterProScan

3
sequence alignment
  • sequence analysis ? sequence alignment
  • what
  • why
  • similar sequence
  • infer homology
  • infer function

sequence → structure → function
4
pairwise alignments; multiple sequence alignments
5
Global vs. Local
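The difference between the two modes can be sketched with one dynamic-programming routine. This is a minimal illustration, not the method of any particular tool: the function name `align_score` and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for the example. The global mode (Needleman-Wunsch style) scores the two sequences end to end; the local mode (Smith-Waterman style) floors every cell at zero and reports the best-scoring region.

```python
def align_score(a, b, local, match=2, mismatch=-1, gap=-2):
    """Return the optimal alignment score of a and b.

    local=True  -> Smith-Waterman style: cells never drop below 0 and the
                   best cell anywhere in the table is returned.
    local=False -> Needleman-Wunsch style: end gaps are penalised and the
                   bottom-right cell is returned.
    Scoring values are illustrative assumptions.
    """
    # First row: zeros for local, accumulated gap penalties for global.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                cell = max(cell, 0)  # restart the alignment instead of going negative
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best if local else prev[-1]


# The shared region "ACG" dominates the local score even though the
# flanking residues do not match.
print(align_score("TTTACGTTT", "GGACGGG", local=True))
print(align_score("ACGT", "ACGT", local=False))
```

Only the score is returned here; recovering the alignment itself would need a traceback table, which is omitted to keep the sketch short.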
6
BLAST
Basic Local Alignment Search Tool
  • idea: find high-scoring local alignments between
    the query sequence and the target database
  • assumption: true-match alignments are very likely to
    contain very high-scoring matches within them
  • heuristic theme: search quickly for homologous
    regions, and only then do slow/exact
    alignments

7
BLAST family
8
BLAST family
9
BLAST Steps
  • For each word of length W in the query,
    generate a list of all possible words
    (neighborhood) with a score of at least threshold
    T (determined by using the scoring matrix)

10
Determine the locations of all common words
between the query and the database (word hits).
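The word-generation and word-hit steps can be sketched as follows. This is a toy version under stated assumptions: it uses a DNA alphabet and a simple match/mismatch score (2 and -1) in place of a real scoring matrix, and the function names are hypothetical. It enumerates every possible word, which is only practical for very small W.

```python
from itertools import product


def word_score(w1, w2, match=2, mismatch=-1):
    """Score two equal-length words position by position (toy scoring)."""
    return sum(match if x == y else mismatch for x, y in zip(w1, w2))


def neighborhood(word, threshold, alphabet="ACGT"):
    """All words of the same length scoring at least `threshold` against `word`."""
    candidates = ("".join(c) for c in product(alphabet, repeat=len(word)))
    return {c for c in candidates if word_score(word, c) >= threshold}


def word_hits(query, database, w, threshold):
    """Return (query_offset, database_offset) pairs for every word hit."""
    # Map each neighborhood word to the query offsets that produced it.
    lookup = {}
    for q in range(len(query) - w + 1):
        for nb in neighborhood(query[q:q + w], threshold):
            lookup.setdefault(nb, []).append(q)
    # Scan the database once, looking each word up in the table.
    hits = []
    for d in range(len(database) - w + 1):
        for q in lookup.get(database[d:d + w], []):
            hits.append((q, d))
    return hits


# With match=2/mismatch=-1 and W=3, T=3 admits the exact word plus every
# word with a single mismatch.
print(sorted(neighborhood("ACG", 3)))
print(word_hits("ACG", "TTACGTT", 3, 6))
```

Raising T shrinks the neighborhood and speeds up the scan at the cost of sensitivity; lowering it does the opposite.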
11
(No Transcript)
12
BLAST Steps
  • use dynamic programming to extend hits until
    the score drops by a value of X; expensive,
    about 90% of the run time
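The stopping rule can be illustrated with a simplified, ungapped extension in one direction. This is a sketch only: it swaps the dynamic programming named on the slide for a plain diagonal walk, and the function name and scoring values (match 2, mismatch -1) are assumptions. Extension stops once the running score has fallen more than X below the best score seen so far.

```python
def extend_right(query, subject, q_start, s_start, x_drop, match=2, mismatch=-1):
    """Extend a hit to the right without gaps.

    Returns (best_score, best_length): the highest running score reached
    and the number of aligned positions at which it was reached.
    """
    score = best = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score > x_drop:
            break  # dropped more than X below the best: give up
    return best, best_len


# Four matches, then mismatches; the walk stops after the third mismatch.
print(extend_right("ACGTACGT", "ACGTTTTT", 0, 0, x_drop=2))
```

A larger X lets the extension cross longer poorly matching stretches, which finds more alignments but costs more time.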

13
Evaluates the statistical significance of
extended hits and reports only those above the
determined threshold.
14
(No Transcript)
15
BLAST statistical evaluation
  • for local, ungapped alignments
  • m: size of query; n: size of database
  • E: expected number of HSPs with scores of at least S
  • p: probability of finding at least one HSP with a score of at least S
  • good tutorial at
  • http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
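For ungapped local alignments these quantities are related by the Karlin-Altschul formulas E = K·m·n·e^(-λS) and p = 1 - e^(-E). The sketch below evaluates them; K and λ depend on the scoring system, and the values used in the example call are illustrative placeholders, not parameters taken from the slides.

```python
import math


def expected_hsps(m, n, score, k, lam):
    """E: expected number of HSPs scoring at least `score` by chance.

    m, n -- query and database sizes
    k, lam -- Karlin-Altschul parameters K and lambda (scoring-system dependent)
    """
    return k * m * n * math.exp(-lam * score)


def p_value(e_value):
    """p: probability of at least one HSP, assuming a Poisson count with mean E."""
    return -math.expm1(-e_value)  # numerically stable form of 1 - exp(-E)


# Placeholder parameters for illustration only.
e = expected_hsps(m=300, n=1_000_000, score=60, k=0.1, lam=0.3)
print(e, p_value(e))
```

For small E the two numbers are almost equal, which is why E-values are often read as if they were probabilities when they are well below 1.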

16
BLAT
  • BLAST-Like Alignment Tool (BLAT)
  • Good for aligning mRNA, ESTs to genome
  • fast
  • aligns whole mRNA, not just exons
  • handles introns and splice sites
  • Sequences need to be 95% identical or better
  • Available at:
  • UCSC Genome Browser
  • Ensembl

17
BLAT
  • Steps for cDNA alignment:
  • 1. break the cDNA into n-base chunks
  • 2. use the index to find regions in the genome similar
    to each chunk of cDNA
  • 3. do a detailed alignment between each genome region and
    its cDNA chunk
  • 4. dynamic programming: stitch together the
    detailed alignments of chunks into an alignment of
    the whole

18
  • genome: cacaattatcacgaccgc (K = 8-13 for a real
    genome)

genome 3-mers (non-overlapping): cac aat tat cac gac cgc
positions:                       0   3   6   9   12  15
cDNA: aattctcac
cDNA 3-mers (overlapping):       aat att ttc tct ctc tca cac
positions:                       0   1   2   3   4   5   6
example from Jim Kent
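The example above can be reproduced with a small index. This is a sketch of the indexing idea only, with hypothetical function names: the genome is indexed by non-overlapping k-mers, the cDNA is scanned with overlapping k-mers, and each lookup yields a (cDNA position, genome position) hit. The detailed alignment and stitching steps are not shown.

```python
from collections import defaultdict


def build_index(genome, k):
    """Index the genome by its non-overlapping k-mers: k-mer -> list of positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(cdna, index, k):
    """Look up every overlapping k-mer of the cDNA; return (cdna_pos, genome_pos) pairs."""
    hits = []
    for q in range(len(cdna) - k + 1):
        for g in index.get(cdna[q:q + k], []):
            hits.append((q, g))
    return hits


genome = "cacaattatcacgaccgc"
cdna = "aattctcac"
index = build_index(genome, 3)
hits = find_hits(cdna, index, 3)
print(hits)  # [(0, 3), (6, 0), (6, 9)]
```

Two of the three hits, (0, 3) and (6, 9), share the same offset between genome and cDNA position, so they fall on one diagonal and would be grouped as a candidate region; the hit (6, 0) stands alone.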
19
PSI-BLAST
Position-Specific Iterated BLAST
  • database searches using position-specific scoring
    matrices are more powerful than searches using a single
    sequence
  • STEPS
  • collect all DB sequences that align with E-value <
    T
  • align these to make a position-specific scoring
    matrix
  • use the scoring matrix to search for new hits
  • iterate
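The matrix-building step can be sketched as a log-odds table computed column by column. This is a deliberately simplified illustration, not PSI-BLAST's actual procedure: it assumes ungapped sequences of equal length, a uniform background, and a flat pseudocount, and it omits sequence weighting. The function names and the DNA alphabet are assumptions for the example.

```python
import math
from collections import Counter


def build_pssm(aligned, alphabet, pseudocount=1.0):
    """Build a log-odds matrix: one dict (residue -> score) per alignment column."""
    background = 1.0 / len(alphabet)  # uniform background (assumption)
    total = len(aligned) + pseudocount * len(alphabet)
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        column = {}
        for residue in alphabet:
            freq = (counts[residue] + pseudocount) / total
            column[residue] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def pssm_score(pssm, word):
    """Score a word of the same length as the matrix."""
    return sum(column[ch] for column, ch in zip(pssm, word))


aligned = ["ACG", "ACG", "ATG"]
pssm = build_pssm(aligned, "ACGT")
print(pssm_score(pssm, "ACG"), pssm_score(pssm, "ATG"), pssm_score(pssm, "TTT"))
```

Because the second column tolerates both C and T, "ATG" still scores well even though it differs from the majority sequence, which is the extra sensitivity a position-specific matrix gives over a single query.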

20
PSI-Blast
21
ORF-finder
  • graphical analysis tool which finds all open
    reading frames in a sequence
  • looks for start and stop codons
  • assumes an upstream start and a downstream stop if the ORF
    is at least 100 amino acids long
  • ORFs can be selected to view as DNA sequence or
    amino acid sequence
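The start/stop scan can be sketched in a few lines. This is a minimal illustration and not the ORF Finder implementation: it assumes the standard stop codons, ATG as the only start codon, the forward strand only, and complete ORFs only (it does not model the assumed upstream start or downstream stop at the sequence ends). The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    coding codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# A short test sequence with one ORF (ATG AAA TTT TAG) in the third frame.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))  # [(2, 14, 2)]
```

Scanning the reverse complement with the same function would cover the other three reading frames.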

22
ClustalW: DNA and protein alignments
Copy and paste sequences. The alignment may be viewed
and edited in JalView.
Available at EBI (http://www.ebi.ac.uk)
23
ClustalW Output
24
Standard ClustalW Output
25
JalView Alignment Editor and Viewer
26
GeneDoc Alignment Editor and Viewer
27
  • Integrated documentation resource for protein
    domains, families and sites
  • Integrated view of databases
  • Intuitive interface for text and sequence
    searches

Available at EBI (http://www.ebi.ac.uk)
28
When To Use What When
  • Genomic sequence searches: BLAT
  • DNA vs. genome
  • cDNA vs. genome
  • protein vs. genome
  • For cDNA sequences: BLAST
  • cDNA vs. nucleotide (nt)
  • cDNA vs. protein (nr)
  • For protein sequences: BLAST and PSI-BLAST
  • Protein vs. protein (nr)
  • BLAST: similar species
  • PSI-BLAST: high sensitivity, distant species

Same species
Same or similar species