Title: Why genomics?
1Why genomics?
- Genomics represents a complete change in the way we are able to think about the life sciences.
- Genomics enables rapid and efficient discovery of important genes related to commodity quality and improvement.
- Genomics approaches provide the ability to look at complex traits and pathways.
2What is genomics?
Genomic approaches include
- Structural
  - DNA sequence (genomic, cDNA)
  - Molecular mapping (AFLP, microsatellite, RFLP, etc.)
  - Genotyping and fingerprinting
- Functional
  - Gene expression analysis (RNA, proteins)
  - Gene function analysis (knockouts, mutations, biochemical assays)
  - Gene interactions
- Bioinformatics
  - Compilation and analysis of collected data
3What is genomics?
- Genomic science is the industrialization of molecular biology to address complex biological questions.
- It is the integration of biology, engineering, and statistics to solve the sequence of a complex genome and then mine the sequence data to obtain biological insights.
- Although DNA sequence is central to genomics, it is simply the starting point for large-scale genome analysis.
4The Power of Genomics
Organism                   Genome Size          Genes       Year
ΦX174                      5,400 bp             10          1977
Tobacco mosaic virus       6,300 bp             4           1982
Smallpox virus             185,000 bp           200         1993
Escherichia coli           4,600,000 bp         4,390       1997
Saccharomyces cerevisiae   12,100,000 bp        6,000       1996
Caenorhabditis elegans     100,000,000 bp       20,000      1998
Homo sapiens               3,000,000,000 bp     23,000      2001
Arabidopsis thaliana       125,000,000 bp       25,000      2000
Oryza sativa (Rice)        466,000,000 bp       32-50,000   2002
Nicotiana tabacum          4,500,000,000 bp     36,263
Triticum aestivum          16,000,000,000 bp    40-80,000
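One way to read the table is as gene density: genome size grows by orders of magnitude while gene count does not. The sketch below divides gene count by genome size for a few rows copied from the table. The function name `genes_per_mb` is only illustrative, and the figures are the slide's own rounded values, not current annotations.

```python
# Rows copied from the table above: (genome size in bp, gene count).
genomes = {
    "Escherichia coli": (4_600_000, 4_390),
    "Saccharomyces cerevisiae": (12_100_000, 6_000),
    "Arabidopsis thaliana": (125_000_000, 25_000),
    "Homo sapiens": (3_000_000_000, 23_000),
    "Nicotiana tabacum": (4_500_000_000, 36_263),
}

def genes_per_mb(size_bp, genes):
    """Gene density expressed as genes per megabase of genome."""
    return genes / (size_bp / 1_000_000)

density = {name: genes_per_mb(*row) for name, row in genomes.items()}

# Print from densest to sparsest genome.
for name, d in sorted(density.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {d:8.1f} genes/Mb")
```

On these numbers the bacterial genome carries on the order of a thousand genes per megabase, while the large plant and human genomes carry fewer than ten.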
5Developing World Wants More Protein
6Therapeutic Proteins
or
7What improvements should be targeted in crop
improvement research?
8Improvements Needed
- Even with modern breeding technologies and agrochemicals, significant yield loss still occurs.
- Average yields are merely 21.6% of record yields.
- What causes the 78.4% loss in yield?
J.S. Boyer, Plant Productivity and Environment, Science, Vol. 218, October 29, 1982, pp. 443-448.
9Improvements Needed
J.S. Boyer, Plant Productivity and Environment, Science, Vol. 218, October 29, 1982, pp. 443-448.
10Improvements Needed
11Biotech Sales
All dollar figures are for 1999
12Where will we find genes that confer hardiness?
13Weeds and Primitive Crops Have Advantageous Traits
- Ancient farmers selected for major, visibly desirable traits through hand selection over thousands of years (domestication)
- Modern breeding improved these traits in high-input environments
Hardy wild tomatoes
- Primitive crops and weeds retain invisible
traits that confer hardiness under
disadvantageous conditions
14How to Improve Crops
- Genomics allows the identification of the genes and gene networks responsible for hardiness, paving the way for the reintroduction of hardiness into crops or other target species.
15Why sequence plant genomes?
- Basis for the world's food supply, which fails to keep up with demand.
- Basic model species for understanding general biology.
- Complete genome sequences will enable us to improve and enhance desirable traits in cultivated plants and to limit expression of undesirable traits.
16Detailed questions that may be approached through
a genomics approach include
- Metabolism/secondary product biosynthetic pathways
- Stress response
  - including pests, pathogens, abiotic and water/nutritional
- Growth/development
  - including protein/oil content, flowering, maturity
17What are the advantages of obtaining a complete
inventory of plant coding sequences?
- Gene discovery/novel sequences
- Promoters/control of gene expression
- Microarray analysis/expression profiling
- Biochemical pathways
- Intellectual property
18Plant Genomes Can Be Larger Than The Human Genome
Relative Genome Sizes
19A Small Portion of the Genome Comprises Genes
Plant Genome Composition: Junk vs. Genes
20Sequence ESTs
21ESTs: Taking Advantage of the Cell to Sequence the Genes
Central Dogma: DNA → RNA → Protein
EST: Expressed Sequence Tag
22The Problem: ESTs Miss Rarely Expressed Genes
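A toy simulation can illustrate why this happens: ESTs are sampled from the transcript pool, so a gene's chance of being picked scales with its expression level. The sketch below is not a model of any real library. The gene names, expression levels, and library size are all hypothetical, chosen only to make the effect visible.

```python
import random

def sample_ests(expression, n_ests, seed=0):
    """Draw n_ests transcripts at random, weighted by expression level,
    and return the set of genes observed at least once."""
    rng = random.Random(seed)
    genes = list(expression)
    weights = [expression[g] for g in genes]
    return set(rng.choices(genes, weights=weights, k=n_ests))

# Hypothetical transcript pool: 10 abundant genes and 90 rare ones.
expression = {f"abundant_{i}": 1000 for i in range(10)}
expression.update({f"rare_{i}": 1 for i in range(90)})

seen = sample_ests(expression, n_ests=500)
abundant_seen = sum(g.startswith("abundant") for g in seen)
rare_seen = sum(g.startswith("rare") for g in seen)

print(f"abundant genes tagged: {abundant_seen} of 10")
print(f"rare genes tagged:     {rare_seen} of 90")
```

With these made-up weights, most of the 500 reads land on the abundant genes again and again, and only a handful of the rare genes are tagged at all.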
23How did ESTs do?
Organism      Genes     ESTs          EST matches
C. elegans    19,099    109,000       40%
A. thaliana   25,498    >113,000      60%
H. sapiens    31,778    >3,000,000    60%
An incomplete and confusing picture
24Plant Genome Sequencing Project
Mapping
Chromosome
Fingerprint analysis identifies BAC, PAC clones.
BAC
Library Core
1.5 Kb M13 insert 3 Kb plasmid insert
9x coverage random reads with both dye-labeled
primer and terminator.
Production
Reads called using Phred and Asp. Assembled with Phrap.
Prefinishing
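The "9x coverage random reads" figure can be turned into rough numbers with the idealized Lander-Waterman model, in which the expected fraction of bases left unsequenced at coverage c is e^(-c). The sketch below assumes reads land uniformly at random, a 150 kb BAC clone (the typical size quoted elsewhere in this deck), and a 500 bp read length, which is an assumption not stated on the slide.

```python
import math

def reads_needed(target_bp, read_len_bp, coverage):
    """Number of random reads needed to reach the given average coverage."""
    return math.ceil(coverage * target_bp / read_len_bp)

def expected_uncovered_fraction(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)

clone_bp = 150_000   # typical BAC insert size from the deck
read_len_bp = 500    # assumed read length, not given in the source
coverage = 9         # "9x coverage" from the slide

n_reads = reads_needed(clone_bp, read_len_bp, coverage)
missed_fraction = expected_uncovered_fraction(coverage)

print(f"reads needed: {n_reads}")
print(f"expected unsequenced bases: {missed_fraction * clone_bp:.1f}")
```

Under these assumptions only a few tens of bases per clone are expected to go unsampled, which is consistent with the pipeline leaving the remaining gaps to the finishing step.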
25Reads called using Phred and Asp. Assembled with Phrap.
Mapped contigs using PCR array and RP sequencing.
Prefinishing
Sequence edited and gaps closed using primer walking and dye terminator sequencing.
Finishing
Quality check by PCOP programs. Assembly check
using digest data. Gene homology search using
BLAST programs.
Clone Analysis
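The pipeline names Phred for base calling. Phred attaches a quality score Q to each called base, defined from the estimated error probability P as Q = -10 log10(P). The helper names in the sketch below are illustrative, and the example read qualities are made up.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an estimated error probability to a Phred quality score."""
    return -10 * math.log10(p)

def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given per-base scores."""
    return sum(phred_to_error_prob(q) for q in qualities)

# Hypothetical read: 400 high-quality bases followed by a weaker tail.
read_qualities = [40] * 400 + [20] * 100
errors_in_read = expected_errors(read_qualities)

print(f"Q20 -> P = {phred_to_error_prob(20):.4f}")
print(f"P = 0.001 -> Q = {error_prob_to_phred(0.001):.0f}")
print(f"expected errors in read: {errors_in_read:.2f}")
```

A score of 20 therefore corresponds to one expected error in 100 bases, and a score of 30 to one in 1,000.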
26A whole-genome clone-based map
- Eases selection of clones for sequencing
- Critical to accurate sequence assembly and alignment
- Allows identification of repeat regions (non-coding)
- Based on BAC ends, ESTs and fragment analysis
- A critical component of the methyl filtration strategy
27Create a BAC Library
- Bacterial Artificial Chromosome
- Key reagent for any major sequencing project
- A typical BAC contains 150,000 bp of DNA
- BAC library replicated and spotted onto filters
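How many BAC clones a library needs can be estimated with the classic Clarke-Carbon formula, N = ln(1 - P) / ln(1 - I/G), where I is the insert size, G the genome size, and P the desired probability that any given locus is present in the library. The sketch below assumes random, unbiased cloning. The 125,000,000 bp Arabidopsis genome size comes from the table earlier in this deck, and the 99% target is an assumption for illustration.

```python
import math

def clones_needed(genome_bp, insert_bp, probability):
    """Clarke-Carbon estimate of library size for a given chance that
    any particular locus is represented at least once."""
    return math.ceil(
        math.log(1 - probability) / math.log(1 - insert_bp / genome_bp)
    )

genome_bp = 125_000_000  # Arabidopsis thaliana, from the genome size table
insert_bp = 150_000      # typical BAC insert
n_clones = clones_needed(genome_bp, insert_bp, probability=0.99)
fold_coverage = n_clones * insert_bp / genome_bp

print(f"clones needed: {n_clones}")
print(f"equivalent genome coverage: {fold_coverage:.1f}x")
```

Real libraries are usually built deeper than this minimum, since cloning is never perfectly random.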
28A BAC Library
Figure labels: NotI Digested; 150 kb, 100 kb, 50 kb; 7.4 kb vector
29Physical Mapping
genome
- Get a set of large clones (BAC, 150 Kbp)
- Sequence a minimum tiling subset
30Mapping clones to a genome
Step 1: BAC clones are assigned to a chromosomal position by anchoring to ESTs and genetic markers.
31Mapping clones to a genome
Figure labels: clones A, B, C, D, E, F, G
Step 2: Clones are cut with restriction enzymes, and contigs are assembled by identifying similarities in the fragment patterns, a process also known as fingerprinting.
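The fingerprinting idea in Step 2 can be sketched as follows: each clone is reduced to a list of restriction fragment sizes, two clones are linked when enough fragments match within a size tolerance, and linked clones are grouped into contigs. This is a deliberately simplified illustration. Real fingerprinting software uses a probabilistic overlap score, and the fragment sizes, the `tol` tolerance, and the `min_shared` threshold here are all hypothetical.

```python
def shared_bands(a, b, tol=5):
    """Count fragments of a that pair with a distinct fragment of b
    whose size differs by at most tol bp (two-pointer sweep)."""
    a, b = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared

def build_contigs(fingerprints, min_shared=3, tol=5):
    """Link clones sharing at least min_shared bands and return the
    connected groups (contigs), using a small union-find."""
    parent = {clone: clone for clone in fingerprints}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    names = list(fingerprints)
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            a, b = fingerprints[names[x]], fingerprints[names[y]]
            if shared_bands(a, b, tol) >= min_shared:
                parent[find(names[x])] = find(names[y])

    groups = {}
    for c in names:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fragment sizes (bp) for four clones.
fingerprints = {
    "A": [500, 1200, 2300, 4100, 5200],
    "B": [1201, 2298, 4103, 6000, 7500],
    "C": [4100, 6002, 7499, 8800, 9100],
    "D": [300, 950, 1800, 3300],
}
contigs = build_contigs(fingerprints)
print(contigs)
```

Here A overlaps B and B overlaps C, so the three clones fall into one contig even though A and C share only a single band, while D stays on its own.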
32BAC Fingerprinting
33Mapping clones to a genome
Step 3: Minimally overlapping clones are selected to create a tiling path for sequencing. The sequence from the chosen set of clones will represent the genomic segment to which they were mapped.
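Selecting a tiling path can be viewed as an interval-cover problem: given each clone's start and end position on the map, pick the fewest clones that together span the region. The greedy sketch below is one reasonable reading of "minimally overlapping" and is not necessarily the procedure the project used. The clone coordinates (in kb) are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: at each step, among clones starting at or
    before the covered position, take the one reaching furthest right."""
    ordered = sorted(clones.items(), key=lambda kv: kv[1][0])
    path, covered, i = [], region_start, 0
    while covered < region_end:
        best_name, best_end = None, covered
        # Scan every not-yet-examined clone that starts inside covered range.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (start, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best_name)
        covered = best_end
    return path

# Hypothetical clone positions on a 550 kb contig: name -> (start, end) in kb.
clones = {
    "A": (0, 150),
    "B": (40, 190),
    "C": (120, 270),
    "D": (140, 300),
    "E": (260, 410),
    "F": (290, 440),
    "G": (400, 550),
}
path = minimal_tiling_path(clones, region_start=0, region_end=550)
print(path)
```

In this example four of the seven clones are enough to span the contig, so only those four would go forward to sequencing.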
34Identifying Gene Rich BACs
Gene rich BAC clones are identified by
hybridizing with ESTs or methyl clones. These
and the surrounding BACs are sequenced.
35What are the advantages of obtaining a complete
inventory of plant coding sequences?
- Gene discovery/novel sequences
- Promoters/control of gene expression
- Microarray analysis/expression profiling
- Biochemical pathways
- Intellectual property
36Detailed questions that may be approached through
a genomics approach include
- Metabolism/secondary product biosynthetic pathways
- Stress response
  - including pests, pathogens, abiotic and water/nutritional
- Growth/development
  - including suckering, flowering, maturity
37Representation of the Arabidopsis chromosomes
38Duplication in the Arabidopsis genome
39Arabidopsis contains 41 described gene families
14-3-3 family                         Kinesins
ABC superfamily                       Lipid metabolism
ABC transporters                      Major intrinsic protein
AAAP family                           Miscellaneous
Antiporters                           MYB
Aquaporins                            Myosins
Calcineurin-like B calcium sensors    NADPH P450 reductases
CHO esterase                          Nodulin-like
CBL-int. S-T Pkases                   Org. solute co-transporters
CW biosynthesis                       Phospholipase D
Chor./Mit.                            Polysaccharide lyase
Cyt P450                              Pollen coat proteome
Cyt. B5                               Primary pumps (ATPases)
Cytoskeleton                          Receptor kinase-like
Euk. init. Factors                    SNARE interacting prot.
Expansins                             Other SNARES
Glycoside hydrolase                   Syntaxins
Glycosyltransferase                   Trehalose biosynthesis
Hsfs                                  WRKY transcription factors
Inorg. solute co-trans.               Xyloglucan fucosyltransferase
Ion channels
40Functional analysis of the Arabidopsis genome
42The tomato and potato genomes are very similar
- Potato and tomato are highly syntenic
- Same chromosome number: 12
- Diploid genome size: 900 Mb
- Colinear chromosomes, only 5 inversions
- Share >99% of their genes
- Can produce viable, though sterile, interspecific F1s
43Tomato-pepper co-linearity is dispersed
44 1-2% of plant genomes are composed of R gene clusters
Arabidopsis: 200 genes
Rice: 1000 genes
R gene for RKN