Title: Identification of genes by function interaction
1 Identification of genes by function (interaction)
2 Yeast two-hybrid screen for protein interactors
3 Genomics
4 Gene expression
Structural genomics
- Genome mapping
- Genome sequencing
- Genome annotation
Nucleus: DNA (genome) → pre-mRNA
Cytoplasm: mRNA
Functional genomics
- mRNA (transcriptome)
- Proteins (proteome)
- Metabolites (metabolome)
5 History of genome sequencing
- 1977 bacteriophage φX174 (5,386 bp, 11 genes)
- 1981 mitochondrial genome (16,568 bp; 13 proteins, 2 rRNAs, 22 tRNAs)
- 1986 chloroplast genome (120,000-200,000 bp)
- 1992 Saccharomyces chromosome III (315 kb, 182 ORFs)
- 1995 Haemophilus influenzae (1.8 Mb)
- 1996 Saccharomyces whole genome (12.1 Mb; over 600 people in 100 laboratories)
- 1997 E. coli (4.6 Mb, 4,200 proteins)
- 1998 Caenorhabditis elegans (97 Mb, 19,000 genes)
- 2000 Arabidopsis thaliana (115 Mb, 25,000-30,000 genes)
- 2001 mouse (1 year!)
- 2001 Homo sapiens (2 projects)
- 2005 Pan (chimpanzee), rice
- 2006 Populus (poplar)
Technological improvements
6 DNA sequencing principle (Sanger's method)
- Polymerization from a primer in the presence of a low concentration of chain terminator: dideoxynucleotide (ddNTP)
- Random termination at every position where that nucleotide occurs
7 Original arrangement
- Radioactively (RA) labelled primer
- 4 separate reactions, each with an individual ddNTP
- ddNTP:dNTP ratio ca. 1:20 (1:100)
- PAGE separation
(Figure: sequencing gel with lanes A, T, C, G; fragments separated by size, band read-out C T G G A T C T A G C)
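The principle of slides 6 and 7 can be sketched as a toy simulation. It is idealized and assumes termination happens at every possible position, and it works directly with the newly synthesized strand, so complementarity to the template is ignored. Each of the four ddNTP reactions yields fragments whose lengths mark where that nucleotide sits; ordering all bands by size, as the gel does, reads the sequence. The function names are hypothetical.

```python
def sanger_lanes(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths in each of the four ddNTP reactions (idealized).

    Every position where the lane's nucleotide is incorporated yields one
    terminated fragment; lengths are counted from the end of the primer.
    """
    lanes = {base: [] for base in "ATCG"}
    for position, base in enumerate(synthesized, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all bands by size, shortest first."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("CTGGATCTAGC")  # read-out from the slide 7 gel
print(lanes["G"])       # [3, 4, 10]
print(read_gel(lanes))  # CTGGATCTAGC
```

With fluorescent ddNTPs (slide 8) the four lanes collapse into one reaction, and the base is identified by the dye colour instead of the lane.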
8 Automated sequencing with fluorescence-labelled ddNTPs
- Each ddNTP is labelled with a different fluorescent dye, so all four terminations run together in one reaction
- Separation by size in a capillary, with fluorescence detection
9 Genome sequencing is more than sequencing of DNA
- 1 sequencing reaction: 300-800 bp
- Typical genome: hundreds of millions to billions of bp
- How to manage?
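The scale problem of slide 9 is simple arithmetic. Reaching coverage c of a genome of size G with reads of length L takes about N = c·G / L reads. The sketch below assumes 500 bp reads (inside the 300-800 bp range above) and 8x coverage (inside the 7-9x used for shotgun sequencing). The ~3 Gb human size is an illustrative round figure.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Reads needed so that total sequenced bases reach coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


# Assumed: 500 bp reads, 8x coverage; genome sizes are illustrative
for name, size in [("H. influenzae", 1_800_000),
                   ("Arabidopsis", 115_000_000),
                   ("human (~3 Gb)", 3_000_000_000)]:
    print(name, reads_needed(size, 500, 8))
```

Tens of millions of reads for a mammalian genome is what makes the ordering strategies on the next slides necessary.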
10 Strategies of genome sequencing
- Classical strategy (map-based assembly)
  - minimal amount of DNA sequencing
  - sorting of big DNA fragments, successive reading (the original strategy of human genome sequencing)
  - scaffold for genome sequence assembly
  - time consuming
- Whole genome shotgun (WGS)
  - random (7-9x redundant) sequencing
  - computational sorting of sequence data (first applied to Haemophilus)
  - problems with repetitive DNA
- Combination: hierarchical shotgun, chromosome shotgun
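The 7-9x redundancy of WGS can be motivated with the standard Poisson (Lander-Waterman) estimate. If reads land at random, the fraction of the genome covered by no read at coverage c is about e^(-c). This is a textbook approximation, not necessarily the reasoning behind the slide. It assumes uniformly random read positions and ignores cloning bias and repeats.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of the genome hit by no read: e^(-c)."""
    return math.exp(-coverage)


for c in (1, 3, 5, 7, 9):
    print(f"{c}x coverage: {uncovered_fraction(c):.5f} of the genome missed")
```

At 8x the estimate is about 0.03 %, which for the 1.8 Mb Haemophilus genome is roughly 600 bp left unsequenced. Each extra unit of coverage buys less, so random sequencing stops in this range and remaining gaps are closed by targeted work.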
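11 Hierarchical shotgun sequencing vs. whole-genome shotgun sequencing
- Hierarchical shotgun: production of overlapping clones (e.g. BACs, YACs) and construction of a physical map, then shearing of DNA and sequencing of subclones, then assembly
- Whole-genome shotgun: shearing of DNA and sequencing of subclones, then assembly (no clone map)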
13 Hierarchical shotgun sequencing
- First step: library of big DNA inserts (genome fragments)
  - phage (λ) vectors: 30 kb
  - cosmids: 50 kb
  - BACs (bacterial artificial chromosomes): 100-300 kb
  - YACs (yeast artificial chromosomes): ca. 0.5-1 Mb
14 Number of 100 kb BACs needed to cover the whole genome (with 99.995% probability)
(Cullins, 2004)
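The slide does not show the formula. A common estimate is the Clarke-Carbon formula N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome in one clone (insert size / genome size) and P is the probability that a given locus is present in the library. The sketch assumes equal-sized inserts and random cloning. It borrows the 115 Mb Arabidopsis size from slide 5 purely as an illustration.

```python
import math


def bacs_needed(genome_bp: float, insert_bp: float, probability: float) -> int:
    """Clarke-Carbon estimate: clones needed so that any given locus is in the
    library with the requested probability (assumes random, equal-sized inserts)."""
    fraction = insert_bp / genome_bp  # share of the genome carried by one clone
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


# Illustration: 100 kb BACs, 115 Mb genome, 99.995 % probability
n = bacs_needed(115e6, 100e3, 0.99995)
print(n, "BACs, about", round(n * 100e3 / 115e6, 1), "genome equivalents")
```

This gives roughly 11,400 BACs, about ten genome equivalents. Doubling the genome size roughly doubles the number of clones.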
15 Physical BAC map of a genome
- Arrangement (position, orientation) of individual BACs in the genome
- Fundamental for classical sequencing
- Very useful for assembly of shotgun sequences
- How to make the map from BACs with unknown sequence?
16 Map construction: BAC fingerprinting
- Sequencing of BAC DNA ends
- Restriction sites
- Map construction needs BACs totalling 10-20x more bp than the genome (Arabidopsis: 20,000 BACs; rice: 70,000)
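Restriction fingerprinting can be sketched in two steps. Digesting a clone gives a set of fragment sizes (the bands on a gel), and two clones that overlap share some of those sizes. The helpers below are hypothetical. EcoRI (G^AATTC) is used only as an example enzyme. Real fingerprint-mapping software also scores the chance that shared bands arise by coincidence, which this sketch does not do.

```python
def digest(seq: str, site: str, cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset is where the enzyme cuts inside its site (1 for EcoRI, G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def shared_bands(a: list[int], b: list[int], tolerance: int = 0) -> int:
    """Count bands of a that pair one-to-one with a band of b (greedy, within tolerance)."""
    remaining = sorted(b)
    shared = 0
    for size in sorted(a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                del remaining[i]
                shared += 1
                break
    return shared


bac1 = digest("AAGAATTCAAAAGAATTCAA", "GAATTC")  # toy clone sequence
bac2 = [7, 10, 25]                               # hypothetical bands of a second clone
print(bac1, shared_bands(bac1, bac2))
```

Clones sharing many bands are placed next to each other. This is why a highly redundant library (10-20x) is needed: every region must be present in several overlapping clones.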
17 MTP, minimum tiling path: the lowest possible number of BACs that covers the sequence
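Once BAC positions are known from the map, choosing a minimum tiling path is an interval-cover problem. It is solved exactly by a greedy rule: from the end of the region covered so far, take the BAC that starts inside it and reaches furthest. The coordinates below are hypothetical (in kb). The sketch assumes map positions are accurate, which real fingerprint maps only approximate.

```python
def minimum_tiling_path(bacs, target_end):
    """Greedy interval cover: fewest BACs covering [0, target_end).

    bacs: list of (name, start, end), half-open coordinates.
    Returns the chosen names in order, or None if the library leaves a gap.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])
    path, covered, i = [], 0, 0
    while covered < target_end:
        best = None
        # among BACs starting inside the covered region, keep the one reaching furthest
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # gap: no BAC extends the covered region
        path.append(best[0])
        covered = best[2]
    return path


library = [("A", 0, 120), ("B", 50, 200), ("C", 100, 260),
           ("D", 190, 330), ("E", 250, 400), ("F", 320, 420)]
print(minimum_tiling_path(library, 400))  # ['A', 'C', 'E']
```

Only the MTP clones are then shotgun-sequenced, which keeps redundant sequencing to a minimum.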
18 Walking method of genome sequencing
- Redundant BAC library (tens of thousands of clones); sequencing of BAC ends
- Sequencing of a few (hundreds of) seed clones
- Walking from the seeds by BAC-end sequences
19 Theoretically, one seed per chromosome is enough, but the walk then takes many steps
- Better to use more seed clones
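The benefit of more seeds can be shown with a toy walk. In each round, every walker sequences the BAC whose end sequence lies inside the region it has already sequenced and that reaches furthest to the right. Walkers from different seeds run in parallel, so the number of rounds is set by the longest walk. The sketch makes several assumptions: overlap is read from hypothetical coordinates instead of end-sequence matches, walks go rightward only, and the first seed starts at position 0.

```python
def walk_steps(bacs, start_clone, stop_at):
    """Steps needed to walk from a sequenced clone until stop_at is reached."""
    covered = start_clone[2]
    steps = 0
    while covered < stop_at:
        # BACs whose left end falls inside the sequenced region and extend past it
        candidates = [bac for bac in bacs if bac[1] < covered and bac[2] > covered]
        if not candidates:
            raise ValueError(f"walk stalled at {covered}: gap in the BAC library")
        covered = max(bac[2] for bac in candidates)
        steps += 1
    return steps


def rounds_needed(bacs, seeds, chromosome_end):
    """Sequencing rounds when all seeds walk rightward in parallel."""
    seeds = sorted(seeds, key=lambda bac: bac[1])
    stops = [seed[1] for seed in seeds[1:]] + [chromosome_end]
    return max(walk_steps(bacs, seed, stop) for seed, stop in zip(seeds, stops))


# Hypothetical library: 100 kb BACs starting every 25 kb along a 1 Mb region
bacs = [(f"B{i}", start, start + 100) for i, start in enumerate(range(0, 901, 25))]
print(rounds_needed(bacs, [bacs[0]], 1000))            # one seed: 12 rounds
print(rounds_needed(bacs, [bacs[0], bacs[20]], 1000))  # two seeds: 6 rounds
```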
20 Filling of gaps: shorter clones are better
- optimal: libraries with different insert sizes (2, 10 and 50 kbp)
21 Whole Genome Shotgun (WGS) and variations
(Figure: genome; sequencing of clone ends, 500 bp from each end with a known distance between them; clones from plasmids (2-10 kbp) and cosmids (40 kbp))
22 Genome (chromosome, BAC...) assembly
1. Looking for overlaps between sequences
2. Assembly into contigs
3. Assembly into supercontigs using the information from sequence pairs (distance between ends)
4. Complete consensus sequence
..ACGATTACAATAGGTT..
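Steps 1, 2 and 4 can be illustrated with a minimal greedy assembler: repeatedly merge the two sequences with the longest suffix-prefix overlap until nothing overlaps. This is a simplified stand-in for real assemblers. It assumes error-free reads from a single strand and a minimum overlap of 3 bp. The three reads are hypothetical pieces of the consensus shown on slide 22.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedy overlap assembly: keep merging the pair with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: each string is one contig
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ACGATTAC", "ATTACAATA", "CAATAGGTT"]))  # ['ACGATTACAATAGGTT']
```

Step 3, joining contigs into supercontigs, needs the extra distance information from read pairs. Overlaps alone cannot order contigs that do not touch.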
23 Repetitive sequences and contig assembly
Repeats are a problem if they are longer than a sequencing read
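Why read length matters can be shown directly. Take a hypothetical genome with three copies of a 7 bp repeat, and swap the two unique segments between the copies. With perfect coverage (one read starting at every position), reads shorter than the repeat give exactly the same collection for both genomes, so no assembler could tell them apart. Reads long enough to span a whole repeat copy break the tie.

```python
from collections import Counter


def all_reads(genome: str, read_len: int) -> Counter:
    """Idealized perfect coverage: one read starting at every position."""
    return Counter(genome[i:i + read_len] for i in range(len(genome) - read_len + 1))


R = "GATTACA"                                    # hypothetical 7 bp repeat
g1 = "ACT" + R + "GGA" + R + "TTC" + R + "CAG"
g2 = "ACT" + R + "TTC" + R + "GGA" + R + "CAG"   # unique segments swapped

print(all_reads(g1, 6) == all_reads(g2, 6))    # True: reads shorter than the repeat
print(all_reads(g1, 10) == all_reads(g2, 10))  # False: reads span a whole repeat copy
```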
24 Use of a physical map for genome assembly (STS, sequence-tagged sites: short sequences with known positions on chromosomes)
Supercontigs with scaffold (BAC-end sequences with known distance)
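Pairs of ends with a known distance also size the gap between two contigs in a scaffold. If one end lies in contig A and its mate in contig B, the clone length equals the part of the clone in A, plus the gap, plus the part in B. The sketch assumes A lies left of B, both contigs are in forward orientation, and every clone has the nominal insert size. The numbers are hypothetical.

```python
from statistics import mean


def estimate_gap(contig_a_len: int, pairs: list[tuple[int, int]], insert_size: int) -> float:
    """Estimate the gap between contig A (left) and contig B (right) from end pairs.

    Each pair is (start of the end read in A, outer end of its mate in B),
    both in contig coordinates. A negative result means the contigs overlap.
    """
    return mean(insert_size - (contig_a_len - a_start) - b_end for a_start, b_end in pairs)


# Hypothetical: 150 kb contig A, two BACs of ~100 kb bridging into contig B
print(estimate_gap(150_000, [(120_000, 50_000), (100_000, 31_000)], 100_000))
```

Averaging several pairs smooths out the spread in real insert sizes. This is also why libraries with different, well-defined insert sizes help when filling gaps.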
25 What to do with the genome sequence? Annotate it!
- Searching for genes
  - automatic prediction of coding sequences
  - prediction of introns/exons
  - prediction according to related sequences
  - confirmation by cDNA and ESTs
- Prediction of function
  - from experimentally characterized homologues
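The simplest form of automatic coding-sequence prediction is an open reading frame (ORF) scan: look for ATG followed, in the same frame, by a stop codon. This is a naive sketch, not how real gene predictors work. They also model splice sites, introns and codon statistics. It scans only the forward strand, and the 30-codon minimum is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Naive ORF scan on the forward strand: ATG ... stop in each reading frame.

    Returns (frame, start, end), 0-based and half-open, including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [(2, 2, 17)]
```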
26 Comparison of genetic and physical maps
(Figure: Arabidopsis chromosome IV, physical map (bp) vs. genetic map (cM))
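Comparing the two maps amounts to dividing physical distance by genetic distance between markers placed on both. The sketch computes kb per cM for each adjacent marker pair. The marker names and positions are hypothetical. A region where recombination is rare (for example around a centromere) shows up as a high ratio.

```python
def kb_per_cm(markers):
    """Physical-to-genetic distance ratio between consecutive markers.

    markers: list of (name, physical_bp, genetic_cM), ordered along the chromosome.
    """
    ratios = []
    for (n1, bp1, cm1), (n2, bp2, cm2) in zip(markers, markers[1:]):
        d_cm = cm2 - cm1
        ratio = (bp2 - bp1) / 1000 / d_cm if d_cm > 0 else float("inf")
        ratios.append((n1, n2, ratio))
    return ratios


# Hypothetical markers: (name, position in bp, position in cM)
markers = [("m1", 0, 0.0), ("m2", 2_000_000, 10.0), ("m3", 4_000_000, 11.0)]
for left, right, ratio in kb_per_cm(markers):
    print(f"{left}-{right}: {ratio:.0f} kb/cM")
```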
27 Large genomes: alternative strategies of sequencing
- isolation of individual chromosomes (wheat)
- shotgun sequencing of non-methylated DNA (maize)
- sequencing of ESTs (potato)
28 Expressed Sequence Tags (ESTs)
- short sequenced regions of cDNA (300-600 nt)
- usually gene fragments (they originate primarily from mRNA)
- highly redundant, but also incomplete!
- problems:
  - no regulatory sequences (promoters) and no introns
  - only transcripts of certain genes
29 Expressed Sequence Tags (ESTs)
Preparation of an EST library
- mRNA
- RT with oligo(dT) primer → cDNA
- cleavage of RNA from the RNA-cDNA heteroduplex (RNase H)
- 2nd-strand cDNA synthesis
- cleavage with a restriction endonuclease
- adaptor ligation, cloning
- sequencing
30 Assembly of EST contigs: Unigenes
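Grouping redundant ESTs into unigene clusters can be sketched as single-linkage clustering. Two ESTs that share any k-mer are joined, and a union-find structure merges them transitively. This is a simplified stand-in for the alignment-based clustering real pipelines use. It assumes all ESTs are on the same strand, and k = 8 and the sequences are hypothetical.

```python
from itertools import combinations


def cluster_ests(ests: dict[str, str], k: int = 8) -> list[set[str]]:
    """Group ESTs into clusters ('unigenes') by single linkage on shared k-mers."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    kmers = {name: {seq[i:i + k] for i in range(len(seq) - k + 1)} for name, seq in ests.items()}
    for a, b in combinations(ests, 2):
        if kmers[a] & kmers[b]:  # any shared k-mer links the two ESTs
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


ests = {"est1": "ACGATTACAATAGG", "est2": "TTACAATAGGTTCC", "est3": "GGGCCCAAATTTGGG"}
print(cluster_ests(ests))  # [{'est1', 'est2'}, {'est3'}] (set order may vary)
```

Single linkage is simple but can wrongly join members of gene families that share short identical stretches. A larger k or a full overlap check reduces this.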
31 EST and genomic sequencing projects (group: organisms)
- Asterids: Lycopersicon (tomato), Nicotiana (tobacco), Solanum (potato)
- Rosids: Arabidopsis, Brassica (oilseed rape), Gossypium (cotton), Glycine (soybean), Lotus, Medicago, Populus (poplar)
- Monocots: Hordeum (barley), Oryza (rice), Sorghum, Triticum (wheat), Zea (maize)
- Conifers: Pinus (pine)
32 New technology for DNA sequencing
20 Mbp in 4 hours; a bacterial genome in 4 days
33 454 technology (nanodrop sequencing)
34 454 technology
35 454 technology
36 454 technology
37 http://mammoth.psu.edu/rico.d/index.html