Detecting and Interpreting Genetic Homology - PowerPoint PPT Presentation

Provided by: dnalearni

1
Detecting and Interpreting Genetic Homology
2
Overview
  • Definitions and examples of genetic homology
  • Inferring homology from sequence similarity using
    FASTA
  • Practical homology detection

3
Homology Defined
  • "The same organ in different animals under every
    variety of form and function." Sir Richard Owen,
    Lectures on the Comparative Anatomy and
    Physiology of the Invertebrate Animals, 1843.
  • "The mechanism of homology is heredity." Allan
    Boyden, Homology and Analogy: A century after the
    definitions of "homologue" and "analogue" of
    Richard Owen, 1943.
  • "Homology is a relation bearing on recency of
    common ancestry." Olivier Rieppel, Homology and
    logical fallacy, 1992.

4
A working definition
  • Structures or organs which share a common
    ancestor.

5
Anatomical Homology
6
Sharing a Common Ancestor: Orthology
7
Protein Orthology
8
Sharing a Common Ancestor: Paralogy
9
Functional Conservation
  • Biochemical function: what the protein does on a
    biochemical level; almost always conserved
    between homologues.
  • Example: Serine/Threonine Kinase
  • Physiological function: what role the protein
    plays within the organism; conserved only
    between orthologous proteins.
  • Example: Glycogen Synthase Kinase

10
Biochemical Function
  • F1 ATP Synthase
  • Main energy conversion motor for all organisms.

11
Sequence Evolution
  • Analysis of sequences has revealed certain
    mutational regularities.
  • Closely related sequences: transition/transversion
    bias.
  • Distantly related sequences: PAM mutation
    probabilities.
  • We can use these mutational regularities to
    identify distantly related sequences that we
    would not recognize by visual examination alone.
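
The transition/transversion regularity can be illustrated with a minimal Python sketch. It assumes two DNA sequences that are already aligned to the same length. The function name `count_substitutions` is hypothetical, and gapped or ambiguous positions are simply skipped. A transition swaps one purine for another (A and G) or one pyrimidine for another (C and T). A transversion swaps a purine for a pyrimidine, or vice versa.

```python
# Purines and pyrimidines: a transition stays within one class,
# a transversion crosses between classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (transitions, transversions) for two pre-aligned DNA sequences.

    Hypothetical helper: identical, gapped ('-') or ambiguous positions
    are skipped rather than counted.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # no substitution, or not a plain nucleotide
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    ts, tv = count_substitutions("ACGTACGTAC", "GCGTATGAAC")
    print(f"transitions={ts}, transversions={tv}")
```

Counting the two classes separately is what lets a model weight them differently when comparing closely related sequences.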

12
Recognizing Protein Homology
  • Relies primarily on understanding the similarity
    expected between unrelated (random) sequences.
  • Only by knowing what random similarity looks like
    can we tell when two proteins are unusually (or
    significantly) similar, and thus homologous.
  • NOTE: "Significant similarity" is not a definition
    of homology.
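
One common way to see what random similarity looks like for a particular pair of sequences is to shuffle one of them many times and re-score each shuffled copy. The Python sketch below assumes a simple Smith-Waterman local alignment with match/mismatch scores and a linear gap penalty rather than a PAM or BLOSUM matrix. The functions `sw_score` and `shuffle_z_score` and all default parameter values are hypothetical choices for illustration.

```python
import random


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_z_score(query: str, target: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """How many standard deviations the real score lies above shuffled scores.

    Shuffling keeps the target's length and composition but destroys any
    residue order inherited from a common ancestor, so the shuffled scores
    approximate random similarity for this pair.
    """
    rng = random.Random(seed)
    real = sw_score(query, target)
    letters = list(target)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        random_scores.append(sw_score(query, "".join(letters)))
    mean = sum(random_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in random_scores) / (n_shuffles - 1)
    sd = var ** 0.5
    return (real - mean) / sd if sd > 0 else float("inf")


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy peptide, not real data
    print(round(shuffle_z_score(query, query), 1))
```

A large z-score means the real score sits far above the shuffled ones. Random local alignment scores follow an extreme value distribution rather than a normal one, so a z-score is only a rough guide.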

13
(No Transcript)
14
Homology Detection Strategy
  • Compare sequence of interest ("query") to a
    database of known sequences.
  • Tabulate all similarity scores.
  • Fit scores to a known distribution to detect
    scores that are significantly greater than
    expected by chance.
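
FASTA estimates significance by fitting the similarity scores from the database search to an extreme value distribution. The Python sketch below illustrates the idea in simplified form. It uses a method-of-moments Gumbel fit and applies no length correction, whereas FASTA's own procedure is more elaborate (for example, it corrects scores for library sequence length). The names `fit_gumbel` and `e_value` are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores: list[float]) -> tuple[float, float]:
    """Method-of-moments Gumbel fit; returns (mu, beta) = (location, scale).

    For a Gumbel distribution: mean = mu + gamma * beta and
    standard deviation = beta * pi / sqrt(6).
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def e_value(score: float, mu: float, beta: float, n_db: int) -> float:
    """Expected number of chance hits scoring >= score among n_db comparisons."""
    z = -(score - mu) / beta
    # P(S >= score) = 1 - exp(-exp(z)); expm1 keeps precision for tiny p,
    # and the cap on z avoids overflow for scores far below mu.
    p = -math.expm1(-math.exp(min(z, 700.0)))
    return n_db * p


if __name__ == "__main__":
    # Synthetic "random" scores drawn as Gumbel quantiles (not real search data).
    n = 5000
    synthetic = [25.0 - 5.0 * math.log(-math.log((k - 0.5) / n)) for k in range(1, n + 1)]
    mu, beta = fit_gumbel(synthetic)
    for s in (40.0, 60.0, 80.0):
        print(f"score {s:.0f}: E = {e_value(s, mu, beta, n):.3g}")
```

Because most database sequences are unrelated to the query, the bulk of the scores describes random similarity, and the rare true homologues appear as outliers with small E-values.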

15
Practical Considerations
  • What database should I search?
  • What kind of sequences should I search with?
  • What E-value is significant?
  • What can I reliably infer about the function of
    my sequence based on homology?

16
Databases
  • Bigger databases have more sequences.
  • Bigger databases are also more redundant, which
    can skew the statistics.
  • Bigger databases are also more poorly annotated
    (homology with an "unidentified sequence" doesn't
    really tell you much).
  • Bigger databases take longer to search.

17
Databases, Part 2
  • Smaller databases (like Swiss-Prot) are often
    better curated and annotated.
  • Smaller databases are much less redundant.
  • Smaller databases can contain phylogenetically
    relevant sequences (all plant or all fungi)
  • Smaller databases are much faster to search.

18
Databases Available
  • GenBank: more than 10,000,000 nucleotide sequences
    (11 billion nucleotides), very redundant, poorly
    annotated.
  • Swiss-Prot: 90,000 protein sequences, low
    redundancy, well annotated.
  • ESTs: sizes vary, redundancy varies; main
    advantage is large sampling of putatively
    expressed sequences.

19
What's a significant E-value?
  • For a single search, a FASTA E-value of 0.003 is
    significant, though the relationship it indicates
    is typically quite distant.
  • For multiple searches, the E-value cutoff varies
    according to the number of searches.

20
Multiple Database Searches
  • 15,000 EST query sequences
  • A 0.001 E-value cutoff means that you should
    expect one false positive in 1000 searches.
  • Thus with 15,000 searches, you should expect 15
    false positives with a cutoff of 0.001.
  • To reduce the chances of identifying a false
    positive, set the E-value cutoff lower.
  • For 15,000 searches, an E-value cutoff of 0.00001
    will mean that you should expect 0.15 false
    positives.
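
The arithmetic on this slide can be written out as a short Python sketch. Expected false-positive counts add across searches, so the expected total is the number of searches times the E-value cutoff. Dividing a tolerated number of false positives by the number of searches is a Bonferroni-style adjustment. The Poisson estimate of the chance of at least one false positive is an added assumption, and all function names are hypothetical.

```python
import math


def expected_false_positives(n_searches: int, e_cutoff: float) -> float:
    """Expected chance hits summed over all searches at a given E-value cutoff."""
    return n_searches * e_cutoff


def cutoff_for(n_searches: int, tolerated_false_positives: float) -> float:
    """E-value cutoff that keeps the expected false-positive count at the target."""
    return tolerated_false_positives / n_searches


def prob_at_least_one(expected: float) -> float:
    """Assumes false positives are Poisson-distributed with the given mean."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    n = 15_000  # number of EST queries, as on the slide
    for cutoff in (1e-3, 1e-5):
        fp = expected_false_positives(n, cutoff)
        print(f"cutoff {cutoff:g}: expect {fp:.2f} false positives, "
              f"P(at least one) = {prob_at_least_one(fp):.2f}")
```

For the slide's numbers this reproduces the 15 and 0.15 expected false positives. `cutoff_for(15_000, 0.15)` returns the 0.00001 cutoff.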

21
Inferring Physiological Function from Homology
22
Make Predictions
  • NXT1 will definitely bind to the nuclear pore
    complex, so it is probably involved in nuclear
    transport.
  • Unlike NTF2, NXT1 will not bind Ran-GDP.
  • Conformational changes in Ran upon GTP binding
    may induce structural changes that allow it to
    bind to NXT1.

23
Summary
  • Homologues share a common ancestor.
  • Genetic homology is inferred from significant
    similarity.
  • Biochemical function can be reliably inferred
    from genetic homology; physiological function
    cannot.