Title: Genomics
1 Genomics
- Class: Molecular Biology, GIBMS 2004
- Source: Molecular Biology by Robert F. Weaver
- 2nd Edition, McGraw-Hill Publishing, 2002
2 Subjects To Be Covered
- Sequencing of Genomes
  - The human genome project
  - Vectors of large scale genome projects
  - The clone-by-clone strategy
  - Shotgun sequencing
  - Progress in sequencing human genome
- Genomics and Its Applications
  - Techniques in functional genomics
  - Positional cloning
  - Applications of functional genomics
  - Other applications
  - Bioinformatics and proteomics
3 Sequencing of Genomes
- 1977: Fred Sanger, φX174 bacteriophage, 5,375 nt
  - Concept of the ORF as a coding region
  - Amino acid sequences of phage proteins
  - Overlapping genes (Figure 24-1), only in viruses
- 1995: Craig Venter and Hamilton Smith
  - Haemophilus influenzae (1,830,137 nt), first free-living organism
  - Mycoplasma genitalium (smallest free-living organism, 580,000 nt, 470 genes)
- 1996: Saccharomyces cerevisiae (first eukaryote), 12,068,000 nt
- 1997: Escherichia coli, 4,639,221 nt; genetically more important
- Many firsts followed
  - 1999: Human chromosome 22, 53,000,000 nt
  - 2000: Drosophila melanogaster, 180,000,000 nt
  - 2001: Human working draft, 3,200,000,000 nt
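The idea of an ORF as a candidate coding region, and of overlapping genes read in different frames, can be sketched in a few lines. This is a minimal illustration only: it scans the forward strand, treats ATG as the sole start codon, and uses a made-up toy sequence rather than real phage data. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive, and include the stop codon.
    min_codons is the minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical toy sequence (not phage data) with ORFs in two reading frames.
toy = "ATGGATGCCTAAGTGA"
print(find_orfs(toy))  # [(0, 12, 0), (4, 16, 1)]
```

The two ORFs reported for the toy sequence share positions 4 to 11 but are read in different frames, which is the situation the overlapping-genes bullet describes.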
6 Sequencing of Genomes: Human Genome Project
- International project
- Controversial; proposed in 1990
  - Size and cost (500,000 pages just to print; time to read them?)
  - Social implications: even more controversial
- Approaches
  - Systematic and conservative: Francis Collins, expected to finish by 2005
  - 1998: Craig Venter, Celera (VitaGenomics, Taiwan), by 2000 using shotgun sequencing → needs powerful computers
- Rough drafts of the human genome
  - Announced June 26, 2000
  - 3,200,000,000 nt, 85-99% complete
8 Sequencing of Genomes: Vectors for Large-Scale Genome Projects
- Vectors needed: yeast and bacterial artificial chromosomes
  - For comparison, cosmid cloning capacity is 50 kb
- Yeast artificial chromosomes (YACs) (Fig. 24-2)
  - Large capacity; self-replicating
  - 1,000,000 nt capacity
  - Inefficient; difficult to isolate; unstable (linear); cryptic
- Bacterial artificial chromosomes (BACs) (Fig. 24-3)
  - Based on the F and F′ plasmids that conjugate between bacterial cells
  - After insertion, these can mobilize the whole host chromosome between cells
  - 300,000 nt capacity
10 Constructed in 1992. MCS: multiple cloning site, for cloning. CmR: chloramphenicol resistance gene, for selection.
12 Sequencing of Genomes: The Clone-by-Clone Strategy
- Map the whole genome (genetically and physically)
  - Use overlapping clones → clone-by-clone sequencing strategy
  - Look for flag posts (landmarks)
- Tools for mapping genes
  - Restriction fragment length polymorphisms (RFLPs) (Fig. 24-4)
    - Used to determine the position of a gene or a stretch of DNA
    - How to look for RFLPs?
  - Variable number tandem repeats (VNTRs)
    - Repeated sequences in tandem, derived from minisatellites
  - Sequence-tagged sites (STSs) (Fig. 24-5)
    - Short (60-1000 bp) sequences detectable by PCR
  - Microsatellites: repeats of very short sequences
    - Highly polymorphic, so genetic mapping is possible
    - Useful in physical mapping or in locating a specific sequence in the genome
13 Two individuals are polymorphic with respect to a HindIII site (in red)
14 Primers for PCR were designed from the sequences of small areas of DNA that were already known
15 Sequencing of Genomes: The Clone-by-Clone Strategy
- Tools for gene mapping: landmarks that relate to gene positions
- Construction of a physical map with sequencing data
  - Mapping with STSs (Fig. 24-6)
    - Very laborious because of the sizes of the BACs
  - Radiation hybrid mapping
    - Ionizing radiation creates chromosome fragments
    - Fragments form hybrid cells with hamster cells
    - Individually cloned cells are examined
- For mapping human chromosomes
  - A set of landmarks or signposts is needed to relate the positions of genes
  - 1998: STS-based maps constructed that included 30,000 genes
16 After a number of positive BACs have been identified, one can begin mapping by screening these BACs for STSs in a sequential manner
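As a rough sketch of how STS screening results translate into a map, the snippet below records which STSs each BAC tested positive for and reports the pairs of BACs that share at least one STS, since shared STSs imply overlapping inserts. The BAC and STS names are hypothetical, and real STS-content mapping also has to cope with false positives and ambiguous orderings, which this ignores.

```python
from itertools import combinations

def sts_overlaps(bac_sts):
    """Map each pair of BACs to the set of STSs they share (non-empty only)."""
    overlaps = {}
    for a, b in combinations(sorted(bac_sts), 2):
        shared = bac_sts[a] & bac_sts[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

# Hypothetical screening results: BAC name -> STSs detected by PCR.
screen = {
    "BAC1": {"STS1", "STS2"},
    "BAC2": {"STS2", "STS3"},
    "BAC3": {"STS3", "STS4"},
}
for pair, shared in sts_overlaps(screen).items():
    print(pair, sorted(shared))
```

In this toy data BAC1 overlaps BAC2 and BAC2 overlaps BAC3, so the three clones can be laid out as a single contig in that order.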
18 Sequencing of Genomes: Shotgun Sequencing
- The shotgun sequencing strategy (Fig. 24-7)
  - Go directly to sequencing, without mapping
  - 1996: Craig Venter, Hamilton Smith, Leroy Hood
  - 500 nt at each end × 300,000 BAC clones = 300 million nt, about 10% of the total human genome
  - The 500-nt sequenced stretches are dispersed about every 5 kb
  - Each acts as a sequence-tagged connector (STC) for its BAC clone
  - Each of the 300,000 clones connects via STCs to about 30 other clones
  - Fingerprinting of each clone
  - BAC walking
19 (1) BAC library (2) Plasmid library (3) Fingerprinting (4) BAC walking; needs a powerful computer
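The STC numbers on the shotgun slide can be checked with simple arithmetic. The sketch below assumes two sequenced ends per BAC clone and an average BAC insert of 150 kb; the insert size is an assumption not stated on the slide, and the function name `stc_statistics` is hypothetical.

```python
def stc_statistics(genome_nt, n_clones, read_nt=500,
                   ends_per_clone=2, bac_insert_nt=150_000):
    """Back-of-envelope STC figures for a BAC-end sequencing project."""
    n_stcs = n_clones * ends_per_clone      # one STC per sequenced end
    sequenced_nt = n_stcs * read_nt         # total nt read from BAC ends
    spacing_nt = genome_nt / n_stcs         # average distance between STCs
    return {
        "sequenced_nt": sequenced_nt,
        "genome_fraction": sequenced_nt / genome_nt,
        "stc_spacing_nt": spacing_nt,
        # bac_insert_nt is an assumed average insert size (150 kb)
        "stcs_per_bac": bac_insert_nt / spacing_nt,
    }

stats = stc_statistics(genome_nt=3_200_000_000, n_clones=300_000)
for name, value in stats.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions the ends add up to 300 million nt (a little under 10% of a 3.2-billion-nt genome), the STCs fall roughly every 5 kb, and a 150-kb BAC contains about 28 of them, in line with the "about 30 other clones" figure.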
21 Sequencing of Genomes: Progress in Sequencing the Human Genome
- Progress
  - Working draft: 90% complete, with 1% error
  - Final draft: as complete as possible, with less than 0.01% error (1 in 10,000)
- Functionally complete
  - 33,464,000 of the 34,491,000 nt (97.02%) were sequenced
  - Error rate of 1 per 50,000 nt; primarily 22q
- 1999: Final draft of human chromosome 22
- 2000: Final draft of human chromosome 21
- 2001: Working draft of all human chromosomes
- What did we learn from chromosome 22?
  - (1) Still contains 11 gaps of unclonable and unsequenceable DNA
  - (2) 800 genes (679 known, related pseudogenes, 100 predicted, 225 unknown)
  - (3) Exons account for 3% of total length
  - (4) Recombination rates vary along the chromosome (Fig. 24-8)
  - (5) Local and long-range duplications
  - (6) Large regions of 22q are conserved in mouse (Fig. 24-9)
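The coverage and error figures quoted for chromosome 22 can be reproduced directly. This is only a worked check of the slide's numbers; the variable names are illustrative, and the expected error counts assume errors are spread uniformly along the sequence.

```python
sequenced_nt = 33_464_000   # nt of chromosome 22 sequenced
total_nt = 34_491_000       # nt in the region targeted

coverage_pct = 100 * sequenced_nt / total_nt

# Expected number of wrong bases at the quoted error rates
# (assumes errors are spread uniformly, which is a simplification).
errors_achieved = sequenced_nt / 50_000        # 1 error per 50,000 nt
errors_final_limit = sequenced_nt / 10_000     # final-draft ceiling: 1 in 10,000
errors_working_draft = sequenced_nt * 0.01     # working-draft level: 1%

print(f"coverage: {coverage_pct:.2f}%")
print(f"errors at 1/50,000: about {errors_achieved:.0f}")
print(f"errors at 1/10,000: about {errors_final_limit:.0f}")
print(f"errors at 1%: about {errors_working_draft:.0f}")
```

The achieved rate of 1 per 50,000 is five times better than the 1-in-10,000 ceiling set for a final draft.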
24 Sequencing of Genomes: Progress in Sequencing the Human Genome
- 1999: Final draft of human chromosome 22
- 2000: Final draft of human chromosome 21
  - Involved in Down syndrome (trisomy 21)
  - Primarily from 21q, with minor contributions from 21p
  - A total of 33,500,000 nt were sequenced (99.7% of total length)
  - Three gaps remain for which no sequence is available
  - Relatively low gene density: 225 identified genes (127 known, 98 predicted)
  - Estimated total number of human genes
    - 40,000 genes (based on chromosomes 21 and 22)
    - 30,000 genes (working draft of all chromosomes)
  - Large regions of conservation between human and mouse chromosomes
  - Identity of the gene(s) responsible for Down syndrome still unknown
- 2001: Working draft of all human chromosomes
25 Sequencing of Genomes: Progress in Sequencing the Human Genome
- 1999: Final draft of human chromosome 22
- 2000: Final draft of human chromosome 21
- 2001: Working draft of all human chromosomes
  - 2.9 billion (Venter et al.) to 3.2 billion (Collins et al.) nt
  - Gaps and inaccuracies, but nevertheless extremely informative
  - 25,000-40,000 genes (another 12,000 possible genes)
  - Only 2× more than fruit flies
  - An organism's complexity is not proportional to its gene number
  - Expression of the human genome is more complex
    - Alternative splicing? 40% of genes
    - Post-translational modifications?
  - Source of human genes: importation (from bacteria?)
  - About 50% of the human genome came from transposon action
    - All known transposons in humans are now inactive
27 Genomics and Its Applications
- Structural genomics
  - Sequencing data
  - What can we use genomic DNA sequences for?
- Applications
  - Studying the expression of large numbers of genes
    - Functional genomics
  - Finding/identifying the functions of genes, especially in diseases
    - Positional cloning
  - Others
29 Genomics and Its Applications: Techniques in Functional Genomics
- Blotting analysis in the past; now miniaturized blotting analysis
  - Used to study the pattern of expression of genes
- DNA microarrays
  - 0.25-1 nL (billionths of a liter) per spot (Fig. 24-10)
  - 5,808 DNA spots per microscope slide
- DNA microchips
  - Oligonucleotides synthesized directly on glass chips (Fig. 24-11)
  - Oligonucleotide array
  - How long must an oligonucleotide be to uniquely identify a human gene in a mixture of all other human genes?
  - Hybridization analysis on a DNA chip (Fig. 24-12)
  - 300,000 oligonucleotides in a 0.5 × 0.5 inch glass area
  - Expression of every yeast gene at the same time has been determined
- Serial analysis of gene expression (SAGE) (Fig. 24-13)
  - Short cDNAs (tags) are synthesized from all mRNAs in a cell
  - Tags are linked together in clones and sequenced to determine which genes are expressed, and how much
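The question about oligonucleotide length has a simple lower bound if one assumes, as a simplification, that the target is a random sequence with equal base frequencies: an n-mer is expected to be unique once 4^n exceeds the amount of sequence being searched. The function name below is hypothetical.

```python
def min_unique_oligo_length(target_nt):
    """Smallest n such that 4**n >= target_nt (random-sequence assumption)."""
    n = 1
    while 4 ** n < target_nt:
        n += 1
    return n

# Whole human genome, using the 3.2-billion-nt figure from the working draft.
print(min_unique_oligo_length(3_200_000_000))  # 16
```

Since 4^15 is about 1.07 billion and 4^16 is about 4.29 billion, 16 nt is the theoretical minimum for a 3.2-billion-nt genome. Real sequences are not random, so probes used in practice are longer than this bound.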
30 A 1 × 3 inch glass microscope slide with 5,808 tiny spots of DNA
31 Circle: reactive groups. Red: photosensitive blocking agent. Blue: masking agent.
32 Serum-starved: green (3). Serum-stimulated: red (2, 4).
35 Genomics and Its Applications: Positional Cloning
- Before the genomic era
  - Positional cloning was used
    - → to look for a gene responsible for a disease without knowing the function of its protein product
    - → to locate a gene responsible for a disease on the chromosome
- Strategies of positional cloning
  - Obtain markers closely linked to the disease
  - Scan the regions between markers for possible genes
  - Search for exons with the exon trap technique
  - Locate CpG islands, which tend to associate with genes
  - Other tools
- The Human Genome Project made the scanning much easier
36 Genomics and Its Applications: Positional Cloning
- Exon trap or exon amplification technique (Fig. 24-14)
  - Look for ORFs?
  - More efficient with the exon trap technique
  - Vector contains a chimeric gene under SV40 promoter control
  - Look for exons in amplified products after cloning of cDNA
  - All exons or ORFs contain splice sites and thus survive propagation in cells
- Locate CpG islands
  - Active human genes tend to associate with unmethylated CpG
  - Inactive human genes mostly have methylated CpG
  - HpaII cuts CCGG only when it is unmethylated
  - HpaII will therefore cut only at active genes
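The HpaII logic above can be sketched as a site search that skips methylated sites. The representation is a simplification chosen for illustration: methylation is passed in as a set of start positions of methylated CCGG sites, and both the function name and the toy sequence are hypothetical.

```python
def hpaii_cut_sites(seq, methylated_sites=frozenset()):
    """Return 0-based start positions of CCGG sites that HpaII would cut.

    methylated_sites holds start positions of CCGG sites assumed to be
    methylated; those sites are reported as uncut.
    """
    seq = seq.upper()
    sites = []
    pos = seq.find("CCGG")
    while pos != -1:
        if pos not in methylated_sites:
            sites.append(pos)
        pos = seq.find("CCGG", pos + 1)
    return sites

toy = "AACCGGTTCCGGAA"                              # CCGG at positions 2 and 8
print(hpaii_cut_sites(toy))                         # [2, 8]
print(hpaii_cut_sites(toy, methylated_sites={8}))   # [2]
```

A cluster of cut sites in a short stretch of DNA is the signature used to flag a candidate CpG island.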
39 Genomics and Its Applications: Applications of Functional Genomics
- Huntington's disease (HD)
  - Progressive nerve disorder: emotional disturbances, adventitious movements
  - Single dominant gene with a linked RFLP identified (Fig. 24-15)
  - Two polymorphic sites were present in affected families
  - Four haplotypes (haploid genotypes) were possible (Fig. 24-16)
  - Which haplotype is associated with Huntington's disease? (Fig. 24-17)
  - Answer: haplotype C (both HindIII sites present) is strongly associated with the disease
  - However, this haplotype association varies between families
  - An RFLP can be used as a genetic marker, just like a gene
  - The HD gene was mapped to a region on chromosome 4 with CAG repeats
  - Normal individuals: 11-34 CAG repeats (98% have fewer than 24 repeats)
  - Affected patients: >42 CAG repeats
- Cystic fibrosis (CF)
40 Four haplotypes (A, B, C, D) result from the combinations of the presence or absence of the two HindIII sites
41
Haplotype   Site 1    Site 2    Fragments (kb)
A           Absent    Present   17.5, 3.7, 1.2
B           Absent    Absent    17.5, 4.9
C           Present   Present   15.0, 3.7, 1.2
D           Present   Absent    15.0, 4.9
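The haplotype table follows a simple rule: the first HindIII site decides whether the large fragment is 17.5 or 15.0, and the second site decides whether the 4.9 fragment stays intact or is split into 3.7 and 1.2. A small sketch that regenerates the table from that rule is given below; the function name is hypothetical and the sizes are taken from the table as listed.

```python
def hindiii_fragments(site1_present, site2_present):
    """Fragment sizes detected for a haplotype, per the table's pattern."""
    large = [15.0] if site1_present else [17.5]
    small = [3.7, 1.2] if site2_present else [4.9]
    return large + small

haplotypes = {
    "A": (False, True),
    "B": (False, False),
    "C": (True, True),
    "D": (True, False),
}
for name, (site1, site2) in haplotypes.items():
    print(name, hindiii_fragments(site1, site2))
```

Note that 3.7 + 1.2 = 4.9, which is what one expects if the second site lies inside the 4.9 fragment.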
42 (1) Most individuals with the C haplotype already have the disease. (2) No disease sufferers lack the C haplotype.
43 Genomics and Its Applications: Applications of Functional Genomics
- Huntington's disease (HD)
  - The HD gene was located to a region near the end of human chromosome 4
  - Identification of the HD gene
    - Number of CAG repeats in a putative gene
    - Normal: ranged from 11 to 34; 98% had <24
    - Diseased: all have >42, and up to 100
  - Prospective studies using an animal (mouse) model
  - Applications
    - Genetic screening of potential patients
    - Gene therapy?
  - Normal function of the HD gene product (huntingtin)
  - How the expansion of CAG repeats causes disease
    - Extra glutamines in the huntingtin protein?
- Cystic fibrosis (CF)
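The CAG-repeat ranges quoted for Huntington's disease lend themselves to a small screening sketch: count the longest uninterrupted run of CAG in a sequence and compare it with the slide's ranges. The function names are hypothetical, the example sequences are made up, and repeat counts between 35 and 42 are left unclassified because the slides give no range for them. This is an illustration, not a diagnostic method.

```python
import re

def longest_cag_run(seq):
    """Number of repeats in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_cag(n_repeats):
    """Classify a repeat count using only the ranges given on the slides."""
    if n_repeats > 42:
        return "affected range (>42)"
    if 11 <= n_repeats <= 34:
        return "normal range (11-34)"
    return "outside the ranges given"

# Hypothetical sequences for illustration only.
normal_like = "AT" + "CAG" * 20 + "CAA" + "CAG" * 5
expanded = "GG" + "CAG" * 45 + "TT"
for seq in (normal_like, expanded):
    n = longest_cag_run(seq)
    print(n, classify_cag(n))
```

Because CAG encodes glutamine, each extra repeat adds one glutamine to the protein, which is the link to the "extra glutamines in the huntingtin protein" question above.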
44 Genomics and Its Applications: Applications of Functional Genomics
- Huntington's disease (HD)
- Cystic fibrosis (CF)
  - Most common lethal genetic disease affecting Caucasians
  - Autosomal recessive mutation; carrier rate is 1/20
  - Affects secretory epithelia; 1/1,600 live births
  - Accumulation of mucus → infections
  - Linkage to known markers was established on 7q31
  - Positional cloning and chromosome walking followed (Fig. 24-18)
  - Unclonable region
  - Chromosome jumping (over unclonable regions) (Fig. 24-19)
  - The CF gene spans 250 kb of DNA and includes at least 24 exons
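The carrier rate and the birth incidence quoted for cystic fibrosis are consistent with each other, as a short calculation shows. The sketch assumes random mating and simple autosomal-recessive inheritance, so that an affected child requires two carrier parents and then has a 1-in-4 chance of inheriting both mutant alleles.

```python
from fractions import Fraction

carrier_rate = Fraction(1, 20)          # chance that one parent is a carrier
both_carriers = carrier_rate ** 2       # both parents carriers (random mating)
affected_child = Fraction(1, 4)         # child inherits both mutant alleles

incidence = both_carriers * affected_child
print(incidence)  # 1/1600
```

Using exact fractions avoids rounding and makes the 1/20 × 1/20 × 1/4 structure of the result visible.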
47 Genomics and Its Applications: Applications of Functional Genomics
- Huntington's disease (HD)
- Cystic fibrosis (CF)
  - Identification and authentication of the CF gene
    - (1) Expressed in all tissues affected by CF
    - (2) Gene product contains membrane-spanning domains
      - Regulates ion transport across the membrane
      - CFTR: cystic fibrosis transmembrane conductance regulator
    - (3) Most CF patients have a 3-bp deletion in the CFTR gene
      - A phenylalanine is missing
  - Applications: transgenic animal model
  - Applications: gene therapy; CFTR protein as a drug
49 Genomics and Its Applications: Other Applications
- Post-genomic era
- Single nucleotide polymorphisms (SNPs)
  - SNPs could be linked to human diseases
  - Associations with
    - polygenic traits, such as intelligence
    - responses to drugs → pharmacogenomics
  - The vast majority of SNPs lie outside genes
  - Similarities and differences between RFLPs and SNPs in humans
- Testing the function of each and every gene in microorganisms
  - Intentional, targeted mutation
- Protein-protein interactions and activities of gene products
  - Yeast two-hybrid system
51 Genomics and Its Applications: Bioinformatics and Proteomics
- To access, analyze, and interpret sequences in databases
- Bioinformatics
  - Combines biology with computerized data processing knowledge
  - Building and manipulating biological databases
- Proteomics
  - Gene → genome, genomics
  - Transcripts → transcriptome, transcriptomics
  - Protein → proteome, proteomics
  - Separation of proteins: 2-D PAGE
  - Analysis of proteins: mass spectrometry (Fig. 24-20)
  - Protein (antibody) microchips
52 Matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) mass spectrometry