Department of Computer Science and Engineering - PowerPoint PPT Presentation

About This Presentation
Slides: 10
Provided by: bridges

Transcript and Presenter's Notes

1
Significance of Sequence Alignments
  • Fall 2003
  • Dr. Susan Bridges

2
Homologous sequences?
  • When two protein or nucleotide sequences are very
    similar, biologists infer that they are homologues.
  • Homologues share a common ancestor and usually have
    similar function or similar structure.
  • When we get an alignment score, we need to know
    the probability that an alignment between
    unrelated sequences would reach the score observed
    for the two sequences of interest.
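
One way to make the last bullet concrete is a shuffle test: score the real pair, then score many shuffled (hence unrelated) versions of one sequence and count how often they do as well. The sketch below is only an illustration, not the method from the lecture; the scoring values (match 2, mismatch -1, gap -2), the function names, and the number of trials are all assumptions.

```python
import random

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def empirical_p_value(a, b, trials=200, seed=0):
    """Fraction of shuffled versions of b that score at least as well as b itself."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)
```

Calling empirical_p_value(seq_a, seq_b) returns the observed score and the estimated probability. Shuffling keeps the composition of the sequence but destroys its order, which is a common, though imperfect, stand-in for "unrelated".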

3
Orthologs, Paralogs and Homologs
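[Figure: an ancestral organism with genes X and Y gives rise, through
speciation and duplication, to organisms A and B and to genes X1, X2, Ya,
and Yb; all of these genes are homologs.]
X1 and X2 are orthologs with the same function.
Paralogs Ya and Yb may have different but related
functions.
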
4
Homologues
  • How similar must sequences be in order to be
    considered homologues?
  • For proteins: more than 25 percent of the amino
    acids are identical
  • For DNA: more than 70 percent of the nucleotides
    are identical
  • The region below the 25 percent and 70 percent
    limits is called the "twilight zone"
  • These estimates only hold for alignments of at
    least 100 nucleotides or amino acids.
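
The rule of thumb above can be sketched in a few lines. This is an illustrative sketch only: percent identity has several definitions in practice, and here it is assumed to be matches divided by aligned columns where neither sequence has a gap. The function names and the "-" gap character are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned (non-gap) columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs), len(pairs)

def classify(aligned_a, aligned_b, kind):
    """Apply the rule of thumb: 25% for protein, 70% for DNA, length >= 100."""
    threshold = {"protein": 25.0, "dna": 70.0}[kind]
    identity, length = percent_identity(aligned_a, aligned_b)
    if length < 100:
        return "too short for the rule of thumb"
    return "likely homologues" if identity > threshold else "twilight zone"
```

For example, classify(a, b, "dna") on two gapped, equal-length alignment rows reports whether the pair clears the 70 percent limit, falls in the twilight zone, or is too short to judge.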

5
Distributions
  • First efforts assumed that alignment scores would
    be normally distributed.
  • This is not the case.
  • Local alignment scores were found to follow the
    Gumbel Extreme Value Distribution.
  • BLAST uses a variation of this distribution,
    known as Karlin-Altschul statistics.
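
To see why the choice of distribution matters, the sketch below compares the upper tail of a Gumbel distribution with that of a normal distribution that has the same mean and standard deviation. The parameter names mu (location) and lam (inverse scale) and any values passed in are assumptions for illustration, not fitted to real alignment data.

```python
import math

def gumbel_tail(x, mu, lam):
    """P(S >= x) for a Gumbel distribution with CDF exp(-exp(-lam * (x - mu)))."""
    # expm1 keeps precision when the tail probability is very small.
    return -math.expm1(-math.exp(-lam * (x - mu)))

def normal_tail(x, mean, sd):
    """P(S >= x) for a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def matched_normal(mu, lam):
    """Mean and standard deviation of the Gumbel, for a comparable normal."""
    euler_gamma = 0.5772156649015329
    return mu + euler_gamma / lam, math.pi / (lam * math.sqrt(6.0))
```

With hypothetical values such as mu = 30 and lam = 0.25, the Gumbel tail at a high score is orders of magnitude larger than the matched normal tail. Assuming normality would therefore make high chance scores look far rarer than they are.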

6
Distributions
(figure slide; image not included in the transcript)

7
Values Describing Scores
  • Both the Gumbel Extreme Value Distribution and
    Karlin-Altschul statistics use E-values and
    P-values
  • E-value (Expect value): the expected number of
    times a match scoring at least this well would be
    found by chance
  • P-value (probability): the probability of finding
    an alignment scoring at least this well by chance,
    under the model's assumptions
  • Important note: alignments that are
    statistically significant may not be biologically
    significant
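
The two values are linked by the standard Karlin-Altschul formulas, E = K * m * n * exp(-lambda * S) and P = 1 - exp(-E), where m and n are the query and database lengths and S is the raw score. The sketch below simply evaluates them. K and lambda depend on the scoring system, so any values passed in are assumptions, and the function names are hypothetical.

```python
import math

def expect_value(score, m, n, lam, k):
    """E = K * m * n * exp(-lam * S): expected chance hits scoring >= S."""
    return k * m * n * math.exp(-lam * score)

def p_value(e_value):
    """P = 1 - exp(-E): chance of at least one such hit (Poisson count)."""
    return -math.expm1(-e_value)
```

For small E the two values are nearly equal (P is approximately E), which is why E-values and P-values are often used interchangeably for strong hits. An E-value of 1 corresponds to a P-value of about 0.63.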

8
Evaluating Scores: E-values
  • Goal is to find an alignment that is unlikely by
    chance
  • The lower the E-value, the better (the match is
    less likely to happen by chance).
  • E-values close to 1 or above indicate that the
    alignment may not be significant, because about
    one such match is expected by chance alone.

9
BLAST and E-values
  • BLAST approximates E-values
  • BLAST tends to underestimate E-values (it reports
    hits as more significant than they are).
  • Rules of thumb for BLAST E-values
  • An E-value above 10^-4 (0.0001) is not
    necessarily interesting
  • To be confident of a homolog, the E-value should
    be lower than 0.0001.
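
The 0.0001 rule of thumb is easy to apply to BLAST's tabular output (-outfmt 6), where the E-value is the 11th tab-separated column and the subject ID is the 2nd. The sketch below is a minimal illustration; the function name and the three example rows are made up, not real search results.

```python
def confident_hits(tabular_lines, cutoff=1e-4):
    """Return (subject id, E-value) pairs whose E-value is below the cutoff."""
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        e_value = float(fields[10])  # 11th column of -outfmt 6: evalue
        if e_value < cutoff:
            kept.append((fields[1], e_value))
    return kept

# Hypothetical rows in -outfmt 6 column order (not real BLAST output).
example = [
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t370",
    "query1\tsubjB\t31.0\t120\t80\t3\t5\t124\t9\t128\t0.002\t42.1",
    "query1\tsubjC\t27.0\t90\t64\t2\t1\t90\t3\t92\t3.5\t28.9",
]
print(confident_hits(example))
```

Only the first example row passes the cutoff. The other two are not ruled out as homologs, but they would need supporting evidence beyond the E-value.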