Title: Understanding human genome (Structural) evolution through chimpanzee and mouse comparisons
1 Understanding human genome (Structural) evolution through chimpanzee and mouse comparisons
- Todd D. Taylor, Ph.D.
- Genome Annotation and Comparative Analysis Team
- Computational and Experimental Systems Biology Group
- RIKEN Genomic Sciences Center
- taylor_at_gsc.riken.jp
- Bioinformatics and Comparative Genome Analysis Course
- Institut Pasteur Tunis, Tunisia
- April 2, 2007
2 RIKEN Genomic Sciences Center, Yokohama, Japan
3 Key projects
- Human
- Chromosome 21 (Nature, May 2000)
- 17 of 33.5 Mb
- Chromosome 18p (Nature, September 2005)
- 16 Mb
- Chromosome 11q (Nature, March 2006)
- 81 Mb
- 4-5% contribution to the Human Genome Project
- Chimpanzee
- Chromosome 22q (Nature, May 2004)
- 33.5 Mb (syntenic to human chr21)
- Chromosome Y (Nature Genetics, January 2006)
- Development of novel methods for gene and promoter prediction
- Identifying genes missed by other high-throughput methods
- Identification of unique regulatory mechanisms
4 Comparative approaches for identification of functional elements
- Looking for similarities
- Compare with distant species, like mouse
- Regions that are conserved may be important
- Looking for differences
- Compare with close species, like primates
- Regions that are different may be important
- Of course, there are exceptions to every rule!
5 Phylogeny of human and its close relatives
[Figure: phylogenetic tree placing Homo among Pan, Gorilla, Pongo, gibbons, Old World monkeys, New World monkeys, prosimians, rodents, Lagomorpha, Metatheria, Prototheria and Sauropsida (Reptilia, Aves). Nested clades: Hominidae, Hominoidea, Catarrhini, Anthropoidea, Primates, Eutheria (placentalia), Mammalia, Amniota (amniotes). Divergence times of 5, 250 and 350 MYa. Mammalian characters listed: heterodonty, mammary glands, homeothermy, hair, placentation (in most), amnion, internal fertilization, sweat and sebaceous glands, anucleate red blood cells.]
6 Mouse genome mapped on the human genome
- 34% maps to identical sequence in the human genome
Hiram Clawson and Kate Rosenbloom (UCSC), 9 June 2006
7 Chimpanzee genome mapped on the human genome
- 95% maps to identical sequence in the human genome
Hiram Clawson and Kate Rosenbloom (UCSC), 9 June 2006
8 Looking for similarities
9 Human Chr21 DSCR vs. Mouse Chr16
10 Multi-species comparisons
11 Potential enhancer elements that are evolutionarily conserved
Nobrega et al., Science 302, 413 (2003)
12 Knocking out conserved sequences
13 Effect on Dach1 gene expression pattern
14 Looking for differences
15 How different are humans and chimps?
16 Important differences with humans
- Size
- Intelligence
- Language
- Ageing
- Disease susceptibility
- Cancer
- Schizophrenia
- Autism
- Triplet expansion diseases
- AIDS
- Hepatitis
17 What makes us human? Is humanity written in our genome?
Newton, April 2002
18 Science 295, 131-134 (2002)
19 Separation of chromosomes by dual-laser cell sorting
20 BESs mapped to the human genome
21 BES identity distribution
22 Crude differences between human and chimpanzee genomes
- Number of simple repetitive sequences
- Insertion of Alu and L1 elements
- Unique sequences
- Local duplications
- Translocations
- Inversions
- Fewer CpG Islands predicted in chimp
23 Whole chromosome sequencing strategy
- Compare with small representative human chromosome (21)
- Clone-based sequencing strategy
- Map chimp BAC-end sequences to human chr. 21
- Screen libraries for additional clones to fill gap regions
3 gaps, over 99% coverage
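The strategy above ends with a gap count and a coverage figure. The sketch below shows how such numbers can be derived by merging overlapping clone intervals. It assumes that chimp clones have already been placed on human chr21 as half-open coordinate intervals. The helper names `merge_clones` and `gaps_and_coverage` and the toy coordinates are hypothetical and do not come from the project's actual pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on human chr21 coordinates


def merge_clones(clones: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting clone intervals into contigs."""
    merged: List[Interval] = []
    for start, end in sorted(clones):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous contig, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps_and_coverage(clones: List[Interval],
                      region: Interval) -> Tuple[List[Interval], float]:
    """Return the uncovered gaps inside `region` and the fraction covered."""
    reg_start, reg_end = region
    # Clip clones to the region and drop those lying entirely outside it.
    clipped = [(max(s, reg_start), min(e, reg_end))
               for s, e in clones if s < reg_end and e > reg_start]
    contigs = merge_clones(clipped)
    gaps: List[Interval] = []
    cursor = reg_start
    for start, end in contigs:  # contigs are sorted and non-overlapping
        if start > cursor:
            gaps.append((cursor, start))
        cursor = end
    if cursor < reg_end:
        gaps.append((cursor, reg_end))
    covered = sum(end - start for start, end in contigs)
    return gaps, covered / (reg_end - reg_start)


if __name__ == "__main__":
    # Hypothetical clone placements (bp) on a 1,000 bp toy region.
    toy_clones = [(0, 300), (250, 600), (620, 900), (900, 1000)]
    print(gaps_and_coverage(toy_clones, (0, 1000)))  # ([(600, 620)], 0.98)
```

In this toy region, one 20 bp gap remains between the merged contigs.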
24 Whole chromosome comparison
25 Larger structural rearrangements
26 Sequence alteration events per bp
27 Distribution of divergence of the autosomes (whole chimp genome)
Chimpanzee Sequencing and Analysis Consortium. Nature 437, 69-87 (2005)
28 Base substitution rate
- Overall: 1.44%
- SINE/Alu: 1.81%
- LINE/L1: 1.38%
- CpG islands: 2.26%
- Simple repeats: 4.06%
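The rates above are percentages of aligned bases that differ between the two species. The sketch below computes that figure for one pairwise alignment. It assumes the alignment is given as two equal-length strings with `-` for gaps, and that columns containing a gap or an `N` are skipped. How the original analysis handled gaps and ambiguous bases is not stated here. The function name is hypothetical.

```python
def substitution_rate(seq_a: str, seq_b: str) -> float:
    """Percent of comparable aligned columns where the two bases differ.

    Columns with a gap ('-') or an unknown base ('N') in either sequence
    are skipped, so indels do not contribute to this rate.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differences / compared


# Human and chimpanzee rows from the cladistic-inference slide.
HOMO = "ACGTGTTTGAAATATTACTGATTGTAA"
PAN = "ACGAGTTTGAAATATTATTGATTGTAA"
print(f"{substitution_rate(HOMO, PAN):.2f}%")  # 2 of 27 columns differ: 7.41%
```

A 27-base toy alignment gives a much higher rate than the genome-wide 1.44%. That is expected, because the example was chosen to contain differences.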
29 Correlation between alteration events
30 Statistics of HSA21q and PTR22q
31 Species-specific repeat expansions
32 Emergence of human-specific characteristics
Human-specific characteristics have been acquired during the 5 million years since the divergence between Pan and Homo.
[Figure: Phylogeny of Hominidae. Pongo (orangutan), Gorilla, Pan (chimpanzee) and Homo (human) are shown along a time axis, with the Pan/Homo split at 5-6 MYa.]
33 Cladistic inference
Homo       ACGTGTTTGAAATATTACTGATTGTAA
Pan        ACGAGTTTGAAATATTATTGATTGTAA
Gorilla    ACGTGTTTGAATCATTATTGATTGTAA
Orangutan  ACGTGTTTAAATTATTATTGGTTGCAA
LCA        ACGTGTTTGAAATATTATTGATTGTAA
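The inference on this slide can be made mechanical. The sketch below applies a simple outgroup (parsimony) rule. Wherever human and chimp differ, the base also seen in an outgroup is taken as the ancestral (LCA) state. A species whose base departs from that ancestor carries a lineage-specific change. The rule is an illustrative assumption rather than a specific published method, and the function names are hypothetical. The sequences are copied from the slide, and the reconstructed ancestor reproduces the slide's LCA row.

```python
from typing import Dict, List, Sequence

# Aligned sequences from the slide; the slide's LCA row is used as a check.
SEQS: Dict[str, str] = {
    "Homo": "ACGTGTTTGAAATATTACTGATTGTAA",
    "Pan": "ACGAGTTTGAAATATTATTGATTGTAA",
    "Gorilla": "ACGTGTTTGAATCATTATTGATTGTAA",
    "Orangutan": "ACGTGTTTAAATTATTATTGGTTGCAA",
}
OUTGROUPS = ["Gorilla", "Orangutan"]


def infer_lca(seqs: Dict[str, str], a: str, b: str,
              outgroups: Sequence[str]) -> str:
    """Infer the a-b ancestor: where a and b differ, keep the base an outgroup shares."""
    bases = []
    for i, (x, y) in enumerate(zip(seqs[a], seqs[b])):
        if x == y:
            bases.append(x)
            continue
        supported = [base for base in (x, y)
                     if any(seqs[o][i] == base for o in outgroups)]
        # Ambiguous ('N') when neither or both bases are seen in the outgroups.
        bases.append(supported[0] if len(supported) == 1 else "N")
    return "".join(bases)


def lineage_specific_sites(seqs: Dict[str, str], target: str, sister: str,
                           outgroups: Sequence[str]) -> List[int]:
    """0-based columns where `target` differs from the inferred ancestor."""
    lca = infer_lca(seqs, target, sister, outgroups)
    return [i for i, (t, anc) in enumerate(zip(seqs[target], lca))
            if anc != "N" and t != anc]


print(infer_lca(SEQS, "Homo", "Pan", OUTGROUPS))               # matches the LCA row
print(lineage_specific_sites(SEQS, "Homo", "Pan", OUTGROUPS))  # [17]: T->C in human
print(lineage_specific_sites(SEQS, "Pan", "Homo", OUTGROUPS))  # [3]: T->A in chimp
```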
34 Human-specific large insertions
35 Species-specific insertion-deletions
[Figure: indel assays on human, chimpanzee, gorilla and orangutan template DNA; positive amplification found for both chimp and human template DNA.]
36 Example 1: Deletion in Human Lineage
Example 2: Insertion in Human Lineage
Example 3: Deletion in Chimp Lineage
Example 4: Allelic Deletion in Chimp Lineage
[Figure: PCR gel images for each example, with lanes labeled Pt, Hs, Gg and Pp.]
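The four examples apply the same outgroup logic to the presence or absence of a segment. The sketch below is hypothetical and rests on three assumptions:

- Gg and Pp serve as outgroups to Hs and Pt.
- Each species is scored simply as "present" or "absent".
- The outgroup state, when the outgroups agree, is ancestral.

The allelic case in Example 4 needs per-individual data, so it is not modeled here.

```python
from typing import Dict, Sequence


def classify_indel(present: Dict[str, bool],
                   outgroups: Sequence[str] = ("Gg", "Pp")) -> str:
    """Assign a human/chimp presence-absence difference to a lineage.

    `present` maps species codes (Hs, Pt and the outgroup codes) to whether
    the segment is present. The ancestral state is taken from the outgroups
    when they agree; otherwise the event is left unresolved.
    """
    hs, pt = present["Hs"], present["Pt"]
    if hs == pt:
        return "no human/chimp difference"
    outgroup_states = {present[o] for o in outgroups}
    if len(outgroup_states) != 1:
        return "unresolved"
    ancestral = outgroup_states.pop()
    # Exactly one of hs/pt differs from the ancestral state; that lineage changed.
    lineage, derived = ("human", hs) if hs != ancestral else ("chimp", pt)
    event = "insertion" if derived else "deletion"
    return f"{event} in {lineage} lineage"


print(classify_indel({"Hs": False, "Pt": True, "Gg": True, "Pp": True}))
print(classify_indel({"Hs": True, "Pt": False, "Gg": True, "Pp": True}))
```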
37 Human chromosome 21 gene catalog
- 284 genes
- 223 known
- 19 novel CDS
- 25 novel transcripts
- 12 putative
- 5 predicted
- 85 pseudogenes
38 Gene catalog comparison
- We lacked information for 6 genes located in sequencing gaps
- 6 hsa21 genes are absent from the ptr22 sequence (H2BFS, 5 KAP genes from the 21q22.1 cluster)
- 4 hsa21 genes appear to be pseudogenes in chimp
- 3 ptr22 pseudogenes are absent from the hsa21 sequence
- 1 hsa21 pseudogene has a complete ORF in ptr22
39 ORF comparison
- 83% of genes have at least one amino acid replacement
- 10 of the potential ptr22 proteins are predicted to have a different length
- Amino acid insertion or deletion
- Different start codon
- Different stop codon
- Other, more complex rearrangement
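The function below gives a rough way to sort length differences into the categories above. It assumes the human and chimp proteins are already aligned, with `-` marking gaps. Gap position is used as a proxy for the cause:

- Leading gaps are taken as a hint of a different start codon.
- Trailing gaps suggest a different stop codon.
- Internal gaps indicate an amino acid insertion or deletion.
- A mix of these is treated as a more complex rearrangement.

This is an illustrative sketch with a hypothetical function name, not the method used for the ptr22 annotation. A gene counts toward the "at least one replacement" figure when `replacements` is above zero.

```python
from typing import Dict, Union


def compare_orfs(human: str, chimp: str) -> Dict[str, Union[int, str]]:
    """Compare two aligned protein sequences ('-' marks a gap)."""
    if len(human) != len(chimp):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(human, chimp))
    # Replacements: columns where both have a residue but the residues differ.
    replacements = sum(1 for h, c in pairs if h != c and "-" not in (h, c))
    if len(human.replace("-", "")) == len(chimp.replace("-", "")):
        return {"replacements": replacements, "length_difference": "none"}
    gapped = ["-" in pair for pair in pairs]
    n = len(gapped)
    # Lengths of the gap runs at the N-terminal and C-terminal ends.
    lead = next((i for i, g in enumerate(gapped) if not g), n)
    trail = next((i for i, g in enumerate(reversed(gapped)) if not g), n)
    internal = any(gapped[lead:n - trail])
    kinds = []
    if lead:
        kinds.append("different start codon")
    if trail:
        kinds.append("different stop codon")
    if internal:
        kinds.append("amino acid insertion or deletion")
    label = kinds[0] if len(kinds) == 1 else "other, more complex rearrangement"
    return {"replacements": replacements, "length_difference": label}


print(compare_orfs("MKV-LLE", "MRVALLE"))
```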
40 Amino acid length differences
41 Complex rearrangement: TCP10L
42 Gene conservation
43 Species-specific amino acid replacements
- Human-specific replacements
- KIAA0184
- COL6A2
- HUNK
- AGPAT3
- DSCR3
- PWP2H
- STCH
- SLC5A3
- CHAF1B
- SIM2
- KCNE2
- APP
- C21orf98
- C21orf61
- IFNAR1
- UBASH3A
- TMPRSS3
- Chimp-specific replacements
- BACE2
- TIAM1
- BACH1
- FAM3B
- C21orf33
- ADAMTS1
- C21orf103
- ITGB2
- HLCS
- DNMT3L
- IFNGR2
- PPIA3L
- C21orf59
- MRPL39
- CLDN17
- KRTAP11-1
- CCT8
44 Distribution of Ka/Ks ratios
45 Distribution of Ka/Ks ratios
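Ka/Ks compares two rates: nonsynonymous (amino acid changing) substitutions per nonsynonymous site, and synonymous substitutions per synonymous site. Values well below 1 suggest purifying selection. Values near 1 suggest neutral evolution, and values above 1 suggest possible positive selection. The sketch below computes the ratio under the following assumptions:

- The site counts (N, S) and difference counts (Nd, Sd) for a gene pair come from a Nei-Gojobori-style counting step, which is not shown.
- A Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), is applied to each observed proportion p.

The counts in the example are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """Ka/Ks from nonsynonymous/synonymous site counts and difference counts."""
    ka = jukes_cantor(n_diffs / n_sites)  # pN = Nd / N
    ks = jukes_cantor(s_diffs / s_sites)  # pS = Sd / S
    if ks == 0.0:
        raise ZeroDivisionError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


# Hypothetical counts for one human-chimp gene pair.
print(round(ka_ks(n_sites=300, s_sites=100, n_diffs=3, s_diffs=3), 3))  # 0.329
```

Because human and chimp sequences are so similar, many genes have few or no synonymous differences. In those cases the ratio is undefined or very noisy, which is worth keeping in mind when reading a Ka/Ks distribution.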
46 GO categories with highest divergence rates in hominids
Chimpanzee Sequencing and Analysis Consortium. Nature 437, 69-87 (2005)
47 Evolutionary transcriptomics
Correlate phenotype with genotype: using Affymetrix arrays, it could be shown that the amount of transcript per gene varies in a species-specific manner (Enard et al. 2001).
→ What DNA sequence differences are responsible for the observed differences in transcript levels?
48 Multiple probes per gene
[Figure: gene model showing enhancer, promoter, transcription start site (TSS), 5' UTR and 3' UTR, with regions annotated for transcriptional control and RNA stability.]
49 Probes mapped to human chr21
237 genes annotated for chromosome 21; 189 represented on the Affymetrix A-E arrays
50 Gene expression profiling
- 189 annotated genes represented on the Affymetrix A-E arrays (Hellmann, Pääbo)
52 Primate phylogenetic shadowing?
- Identifying cis-regulatory elements in the human genome is a major challenge of the post-genomic era
- Promoters and enhancers that regulate gene expression in normal and diseased cells and tissues
- Inter-species sequence comparisons have emerged as a major technique for identifying human regulatory elements
- Particularly comparisons to the sequenced mouse, chicken and fish genomes
- A significant fraction of empirically defined human regulatory modules are:
- Too weakly conserved in other mammalian genomes, such as the mouse, to distinguish them from nonfunctional DNA
- Completely undetectable in nonmammalian genomes
- Identifying such significantly divergent functional sequences will require complementary methods to complete the functional annotation of the human genome
- Deep intra-primate sequence comparison is a novel alternative to the commonly used distant-species comparisons
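The idea behind shadowing is that pooling many closely related primates accumulates enough variation for slowly changing, constrained regions to stand out. The sketch below is a crude illustration of that principle. It assumes a gap-free multiple alignment of primate sequences, and the window size and threshold are chosen arbitrarily. It simply counts variable columns in sliding windows. It ignores the phylogenetic weighting of variation that real shadowing methods apply. The alignment and function names are hypothetical.

```python
from typing import List, Tuple


def variable_columns(alignment: List[str]) -> List[bool]:
    """True for each alignment column where not all sequences share one base."""
    return [len(set(column)) > 1 for column in zip(*alignment)]


def conserved_windows(alignment: List[str], window: int,
                      max_variable: float) -> List[Tuple[int, int]]:
    """Half-open windows whose fraction of variable columns is <= max_variable."""
    flags = variable_columns(alignment)
    hits = []
    for start in range(len(flags) - window + 1):
        fraction = sum(flags[start:start + window]) / window
        if fraction <= max_variable:
            hits.append((start, start + window))
    return hits


# Toy alignment of four hypothetical primate sequences (not real data).
ALIGNMENT = [
    "AAAAACGTACGT",
    "AAAAACGAACGA",
    "AAAAATGTACCT",
    "AAAAACGTTCGT",
]
print(conserved_windows(ALIGNMENT, window=5, max_variable=0.0))  # [(0, 5)]
```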
54 Identification of known and novel conserved sequences
55 Evolutionary conservation of six primate-conserved sequences
56 Non-coding sequences with primate-specific conservation include three regulatory elements
57 Nature 424, 788-793 (2003)
58 Proportion of human aligned sequence by category
59 Relative contribution of different mutational events
60 Conjoined genes
61 Conjoined genes: a novel gene class
Fused transcript formed by combining the exons of two or more distinct genes (child genes)
[Figure: exon-intron structures of child gene A, child gene B and the conjoined gene A-B.]
- Transcript A-B combines at least one exon (complete or partial overlap) from both gene A and gene B
- Usually supported by only a few mRNA/EST sequences, and rarely by a CCDS
- Currently, about 32 known cases found by searching NCBI Entrez (including 8 from chr 11 recently submitted by our group)
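The A-B criterion above amounts to an interval-overlap test. The sketch below assumes that exons are given as half-open genomic intervals on the same strand. Strand, splice-site and supporting-evidence checks are ignored. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open [start, end) genomic interval


def overlaps(a: Exon, b: Exon) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def is_conjoined(transcript: List[Exon], gene_a: List[Exon],
                 gene_b: List[Exon]) -> bool:
    """True if the transcript overlaps (fully or partly) an exon of A and an exon of B."""
    hits_a = any(overlaps(t, e) for t in transcript for e in gene_a)
    hits_b = any(overlaps(t, e) for t in transcript for e in gene_b)
    return hits_a and hits_b


# Hypothetical coordinates for two adjacent child genes.
gene_a = [(100, 200), (300, 400)]
gene_b = [(1000, 1100), (1200, 1300)]
fused = [(100, 200), (350, 400), (1000, 1050)]  # partial overlap with A's 2nd exon
print(is_conjoined(fused, gene_a, gene_b))   # True
print(is_conjoined(gene_a, gene_a, gene_b))  # False: only gene A's exons
```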
62 Experimental verification
Chr1 SRP9-EPHX1 fusion (1 EST as evidence: DA417873)
Alternative splicing and novel exons observed in fused mRNA
63 Conservation of conjoined genes in other mammalian species
27% of conjoined genes conserved in chimpanzee
6.5% of conjoined genes conserved in mouse
Exons considered were part of conjoined gene mRNAs
64 Acknowledgments
- Chimpanzee Chr 22 Sequencing Consortium
- Chinese National Human Genome Center at Shanghai, China
- KRIBB Genome Research Center, Daejeon, Korea
- National Yang Ming University Genome Research Center, Taipei, Taiwan
- National Institute of Genetics, Mishima, Japan
- RIKEN Genomic Sciences Center, Yokohama, Japan
- GBF, Dept. of Genome Analysis, Braunschweig, Germany
- Institute for Molecular Biotechnology, Jena, Germany
- Max-Planck Institute for Molecular Genetics, Berlin, Germany
- RIKEN
- Yoshiyuki Sakaki
- Tulika P. Srivastava
- Vineet K. Sharma
- Asao Fujiyama
- Masahira Hattori
- Atsushi Toyoda
- Yoko Kuroki
- Yasushi Totoki
- Hideki Noguchi
- Hidemi Watanabe
- Takehiko Itoh (MRI)