1 of 23 - PowerPoint PPT Presentation

Provided by: ber788
Tags: opossum
Transcript and Presenter's Notes

1
Comparative Genomics
2
Overview
  • Orthologues and paralogues
  • Protein families
  • Genome-wide DNA alignments
  • Syntenic blocks

3
Comparative Genomics
  • Tells us what is common and what is unique
    between different species at the genome level
  • Allows us to achieve a greater understanding of
    vertebrate evolution
  • The function of human genes and other regions may
    be revealed by studying their counterparts in
    lower organisms
  • Helps us identify both coding and non-coding
    genes and regulatory elements

4
Species in Ensembl
[Figure: tree of the species in Ensembl. Clade labels shown: placentals, mammals, monotremes, marsupials, other birds, birds, paleognaths, reptiles, passerines, crocodiles, turtles, lizards, amphibians, teleosts, fishes, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates]
5
Homologue Relationships
  • Orthologues: any pairwise gene relation where the
    ancestor node is a speciation event
  • Paralogues: any pairwise gene relation where the
    ancestor node is a duplication event
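
The two definitions above can be illustrated with a small sketch: find the last common ancestor of two genes in a gene tree and read off its event tag. The `Node` class, the `classify` function and the gene names are hypothetical, made up for this illustration; they are not part of Ensembl.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node (hypothetical). Leaves are genes; internal nodes carry an event tag."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Return the set of gene names found below (or at) this node."""
    if not node.children:
        return {node.name}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def classify(root, gene_a, gene_b):
    """Classify a gene pair by the event at its last common ancestor node."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaves(root):
        raise ValueError("need two different genes that are both in the tree")
    node = root
    while True:
        # Descend while a single child still contains both genes.
        child = next((c for c in node.children if {gene_a, gene_b} <= leaves(c)), None)
        if child is None:
            return "orthologue" if node.event == "speciation" else "paralogue"
        node = child


# Toy tree: one duplication followed by a human/mouse speciation in each copy.
tree = Node("root", "duplication", [
    Node("copy_A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("copy_B", "speciation", [Node("human_B"), Node("mouse_B")]),
])

print(classify(tree, "human_A", "mouse_A"))  # ancestor node is a speciation
print(classify(tree, "human_A", "human_B"))  # ancestor node is the duplication
```

Note that in this toy tree human_A and mouse_B also come out as paralogues, because their ancestor node is the duplication even though the genes sit in different species.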

6
Orthologue and Paralogue Types
7
Orthologue and Paralogue Types
8
Orthologue / Paralogue Prediction Algorithm
  • (1) Load the longest translation of each gene
    from all species used in Ensembl.
  • (2) Run WU-BLASTP and Smith-Waterman comparisons
    of every gene against every other (both self and
    non-self species) in a genome-wide manner.
  • (3) Build a graph of gene relations based on Best
    Reciprocal Hits (BRH) and Blast Score Ratio (BSR)
    values.
  • (4) Extract the connected components (single
    linkage clusters), each cluster representing a
    gene family.
  • (5) For each cluster, build a multiple alignment
    based on the protein sequences using MUSCLE.
  • (6) For each aligned cluster, build a
    phylogenetic tree with TreeBeST, using the CDS
    back-translation of the protein multiple
    alignment from the original DNA sequences.
    Reconciling this tree with the species tree
    yields a rooted tree with internal duplication
    tags.
  • (7) From each gene tree, infer pairwise gene
    relations of orthology and paralogy types.
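
Steps (3) and (4) can be sketched roughly as follows. This is a simplified illustration, not the Ensembl pipeline code: the function names, the toy scores, the BSR formula (pair score divided by the larger of the two self-scores) and the 0.33 cut-off are all assumptions made for the example.

```python
from collections import defaultdict


def best_hits(scores, species):
    """For each query gene, keep its top-scoring hit in every other species."""
    best = {}
    for (query, target), score in scores.items():
        if species[query] == species[target]:
            continue  # best reciprocal hits are only defined between species here
        key = (query, species[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return best


def build_graph(scores, species, bsr_min=0.33):
    """Link two genes if they are best reciprocal hits or their BSR passes the cut-off."""
    best = best_hits(scores, species)
    graph = defaultdict(set)
    for gene in species:
        graph[gene]  # make sure unlinked genes still appear as singleton nodes
    for (query, target), score in scores.items():
        if query == target:
            continue
        reciprocal = (
            best.get((query, species[target]), (None,))[0] == target
            and best.get((target, species[query]), (None,))[0] == query
        )
        # Assumed BSR definition: pair score over the larger self-score.
        bsr = score / max(scores[(query, query)], scores[(target, target)])
        if reciprocal or bsr >= bsr_min:
            graph[query].add(target)
            graph[target].add(query)
    return graph


def connected_components(graph):
    """Single-linkage clusters: every connected component becomes one family."""
    seen, clusters = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in seen:
                continue
            seen.add(gene)
            component.add(gene)
            stack.extend(graph[gene] - seen)
        clusters.append(component)
    return clusters


# Toy input: bit scores for (query, target) pairs, including self-hits.
species = {"hA": "human", "hB": "human", "mA": "mouse", "mB": "mouse"}
scores = {
    ("hA", "hA"): 100, ("hB", "hB"): 100, ("mA", "mA"): 100, ("mB", "mB"): 100,
    ("hA", "mA"): 80, ("mA", "hA"): 80,
    ("hB", "mB"): 70, ("mB", "hB"): 70,
    ("hA", "hB"): 10, ("hB", "hA"): 10,
}
families = connected_components(build_graph(scores, species))
print(sorted(sorted(f) for f in families))
```

In the toy data the weak hA/hB hit is neither reciprocal-best nor above the cut-off, so the two human genes end up in separate clusters.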

9
  • Find the Ensembl F8 (coagulation factor VIII)
    gene of human and go to its GeneView page.
  • Are there any within-species paralogues predicted
    for this gene?
  • Is there an orthologue predicted for this gene in
    mouse?
  • Retrieve an alignment between the human and mouse
    gene on both nucleotide and peptide level.
  • Have a look at the gene tree for this gene (hint:
    have a look at the side menu when in GeneView).
  • Can you identify the duplication event that gave
    rise to the two human paralogues?

10
Clustering Strategy
  • BLASTP all-versus-all comparison of:
    • all Ensembl protein predictions
    • all metazoan (animal) proteins in UniProt
  • Markov clustering
  • For each cluster:
    • calculation of multiple sequence alignments with
      ClustalW
    • assignment of a consensus description
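
To give a feel for the Markov clustering step, here is a toy pure-Python version of the expand/inflate loop. It is only a sketch: real family clustering would run a dedicated MCL program on BLASTP-derived similarities, and the function names, the inflation value of 2.0 and the 0/1 similarity matrix are assumptions for illustration.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=50, tol=1e-6):
    """Toy Markov clustering on a symmetric similarity matrix."""
    n = len(similarity)
    # Self-loops keep every column non-empty before normalisation.
    m = normalise_columns([[float(similarity[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                      # expansion: flow spreads out
        inflated = normalise_columns(                # inflation: strong flow wins
            [[value ** inflation for value in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row of the converged matrix lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > tol) for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Two groups of three proteins, similar within a group and unrelated between groups.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(similarity))
```

Raising the inflation parameter tends to split the data into smaller, tighter clusters; lowering it tends to merge them.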

11
  • FamilyView

[Screenshot labels: JalView multiple alignments; consensus annotation; Ensembl family members within human; UniProt family members; family members in other Ensembl species]
12
  • JalView

13
Whole Genome Alignments
  • Functional sequences evolve more slowly than
    non-functional sequences, therefore sequences
    that remain conserved may perform a biological
    function.
  • Comparing genomic sequences from species at
    different evolutionary distances allows us to
    identify:
    • Coding genes
    • Non-coding genes
    • Non-coding regulatory sequences

14
Selection of Species for DNA comparisons
15
BLASTZ-net, tBLAT and PECAN
  • BLASTZ-net (comparison at the nucleotide level) is
    used for species that are evolutionarily close,
    e.g. human - mouse
  • Translated BLAT (comparison at the amino acid
    level) is used for evolutionarily more distant
    species, e.g. human - zebrafish
  • PECAN is used for multispecies alignments, from
    which conservation scores and constrained
    elements are calculated:
    • 7 eutherian mammals
    • 10 amniote vertebrates

16
  • Find the Ensembl BRCA2 (Breast cancer type 2
    susceptibility protein) gene for human and go to
    its ContigView page.
  • Have a look at the conserved sequences with mouse
    and zebrafish. Which parts of the BRCA2 gene show
    the highest conservation? With which species is
    the conservation the highest, mouse or zebrafish?
  • Have a look at the conservation score and
    constrained elements.

17
  • AlignSliceView

[Screenshot labels: human, mouse, dog, rat]
18
  • MultiContigView vs. AlignSliceView

19
  • Find the Ensembl A1bg (alpha-1-B glycoprotein)
    gene for mouse.
  • Retrieve the genomic sequence alignment between
    the mouse and rat A1bg gene (hint: have a look at
    the side menu when in GeneView).

20
Syntenic Blocks
  • Genome alignments are refined into larger
    syntenic regions
  • Alignments are clustered together when the
    distance between them is less than 200 kb and
    their order and orientation are consistent
  • Any clusters smaller than 100 kb are discarded
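
The clustering rule above can be sketched as a greedy pass over alignments sorted by reference position. This is a minimal sketch under stated assumptions, not the Ensembl implementation: the `Aln` record, the function names, measuring the 100 kb cut-off on the reference span, and comparing only neighbouring alignments are all choices made for the illustration.

```python
from typing import NamedTuple

MAX_GAP = 200_000    # alignments closer than this may be joined (from the slide)
MIN_BLOCK = 100_000  # clusters smaller than this are discarded (from the slide)


class Aln(NamedTuple):
    """One pairwise genomic alignment (hypothetical record layout)."""
    ref_chr: str
    ref_start: int
    ref_end: int
    tgt_chr: str
    tgt_start: int
    tgt_end: int
    strand: int  # +1 same orientation, -1 inverted


def compatible(prev, cur):
    """True if cur can extend the block ending in prev."""
    if (prev.ref_chr, prev.tgt_chr, prev.strand) != (cur.ref_chr, cur.tgt_chr, cur.strand):
        return False
    ref_gap = cur.ref_start - prev.ref_end
    # On the inverted strand the target coordinates must run backwards.
    if cur.strand > 0:
        tgt_gap = cur.tgt_start - prev.tgt_end
    else:
        tgt_gap = prev.tgt_start - cur.tgt_end
    return 0 <= ref_gap < MAX_GAP and 0 <= tgt_gap < MAX_GAP


def syntenic_blocks(alignments):
    """Group alignments into blocks and drop blocks that are too short."""
    blocks, current = [], []
    for aln in sorted(alignments, key=lambda a: (a.ref_chr, a.ref_start)):
        if current and not compatible(current[-1], aln):
            blocks.append(current)
            current = []
        current.append(aln)
    if current:
        blocks.append(current)
    # Assumption: block size is measured as the span on the reference genome.
    return [b for b in blocks if b[-1].ref_end - b[0].ref_start >= MIN_BLOCK]


# Toy data: two nearby collinear alignments plus one isolated short alignment.
alignments = [
    Aln("1", 0, 50_000, "4", 1_000_000, 1_050_000, 1),
    Aln("1", 100_000, 160_000, "4", 1_100_000, 1_160_000, 1),
    Aln("1", 500_000, 520_000, "7", 3_000_000, 3_020_000, 1),
]
blocks = syntenic_blocks(alignments)
print(len(blocks), [len(b) for b in blocks])
```

Because only neighbouring alignments are compared, a single interleaved alignment to another chromosome would split a block here; a production method would need to be more tolerant of such noise.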

21
  • SyntenyView

[Screenshot labels: human chromosome, mouse chromosome, mouse chromosome]
22
  • CytoView

[Screenshot labels: syntenic blocks, orientation, chromosome]
23
Questions and Answers