Title: Study of Gene Expression: Statistics, Biology, and Microarrays
2 Study of Gene Expression: Statistics, Biology, and Microarrays
- Ker-Chau Li
- Statistics Department
- UCLA
- kcli_at_stat.ucla.edu
3 PART I. Cellular Biology
- Macromolecules: DNA, mRNA, protein
4 Why Biology?
5 Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but effective resource and technological advances have accelerated the expected completion date to 2003.
Project goals are to
- identify all of the approximately 30,000 genes in human DNA,
- determine the sequences of the 3 billion chemical base pairs that make up human DNA,
- store this information in databases,
- improve tools for data analysis,
- transfer related technologies to the private sector, and
- address the ethical, legal, and social issues (ELSI) that may arise from the project.
Recent Milestones
- June 2000: completion of a working draft of the entire human genome
- February 2001: analyses of the working draft are published
Human Genome Program, U.S. Department of Energy,
Genomics and Its Impact on Medicine and Society
A 2001 Primer, 2001
6 Future Challenges: What We Still Don't Know
- Gene number, exact locations, and functions
- Gene regulation
- DNA sequence organization
- Chromosomal structure and organization
- Noncoding DNA types, amount, distribution, information content, and functions
- Coordination of gene expression, protein synthesis, and post-translational events
- Interaction of proteins in complex molecular machines
- Predicted vs experimentally determined gene function
- Evolutionary conservation among organisms
- Protein conservation (structure and function)
- Proteomes (total protein content and function) in organisms
- Correlation of SNPs (single-base DNA variations among individuals) with health and disease
- Disease-susceptibility prediction based on gene sequence variation
- Genes involved in complex traits and multigene diseases
- Complex systems biology including microbial consortia useful for environmental restoration
- Developmental genetics, genomics
7 Medicine and the New Genomics
- Gene Testing
- Gene Therapy
- Pharmacogenomics
Anticipated Benefits
- improved diagnosis of disease
- earlier detection of genetic predispositions to disease
- rational drug design
- gene therapy and control systems for drugs
- personalized, custom drugs
8 Anticipated Benefits
Molecular Medicine
- improved diagnosis of disease
- earlier detection of genetic predispositions to disease
- rational drug design
- gene therapy and control systems for drugs
- pharmacogenomics "custom drugs"
Microbial Genomics
- rapid detection and treatment of pathogens (disease-causing microbes) in medicine
- new energy sources (biofuels)
- environmental monitoring to detect pollutants
- protection from biological and chemical warfare
- safe, efficient toxic waste cleanup
9 Anticipated Benefits
Agriculture, Livestock Breeding, and Bioprocessing
- disease-, insect-, and drought-resistant crops
- healthier, more productive, disease-resistant farm animals
- more nutritious produce
- biopesticides
- edible vaccines incorporated into food products
- new environmental cleanup uses for plants like tobacco
12 What is a gene?
14 SNP and Genetic Disease
18 Mitochondrial ATP Synthase; E. coli ATP Synthase
These images depicting models of ATP synthase subunit structure were provided by John Walker. Some equivalent subunits from different organisms have different names.
20 PART II. Microarray
- Genome-wide expression profiling
21 Differential Gene Expression: tissues, organs
23 Next Step in Genomics
- Transcriptomics involves large-scale analysis of messenger RNAs (molecules that are transcribed from active genes) to follow when, where, and under what conditions genes are expressed.
- Proteomics, the study of protein expression and function, can bring researchers closer than gene expression studies to what's actually happening in the cell.
- Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
- Knockout studies are one experimental method for understanding the function of DNA sequences and the proteins they encode. Researchers inactivate genes in living organisms and monitor any changes that could reveal the function of specific genes.
- Comparative genomics, analyzing DNA sequence patterns of humans and well-studied model organisms side by side, has become one of the most powerful strategies for identifying human genes and interpreting their function.
24 Microarray
25 Microarray
- Allows measuring the mRNA level of thousands of genes in one experiment (system-level response)
- The data generation can be fully automated by robots
- Common experimental themes:
  - Time Course
  - Mutation/Knockout Response
26 Reverse transcription
Color: Cy3 (green), Cy5 (red)
27 Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale
Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown
29 PART III. Statistics
- Low-level analysis
- Comparative expression
- Feature extraction
- Classification, clustering
- Pearson correlation
- Liquid association
30 Image analysis
- Convert an image into a number representing the ratio of the levels of expression between red and green channels
- Color bias
- Spatial, tip, spot effects
- Background noise
- cDNA, oligonucleotide arrays
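As a rough illustration of the first bullet, the sketch below turns one spot's red and green intensities into a background-corrected log2 ratio. The function name `log_ratio`, the plain background subtraction, and the `floor` parameter are assumptions made for illustration; the slides do not specify the image-analysis procedure actually used.

```python
import math

def log_ratio(red, green, red_bg=0.0, green_bg=0.0, floor=1.0):
    """Return log2((red - red_bg) / (green - green_bg)) for one spot.

    Intensities at or below background are clamped to `floor` so the
    logarithm stays defined (an illustrative choice, not a standard).
    """
    r = max(red - red_bg, floor)
    g = max(green - green_bg, floor)
    return math.log2(r / g)

# Hypothetical spot: red 900 and green 300, each over a background of 100.
print(log_ratio(900, 300, red_bg=100, green_bg=100))  # log2(800 / 200) = 2.0
```

A positive value means the red channel is brighter, a negative value means the green channel is brighter, and 0 means equal expression. Color bias and spatial effects would need a separate normalization step that this sketch does not attempt.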
31 Genome-wide expression profile: a basic structure
        cond1  cond2  ..  condp
Gene1   x11    x12    ..  x1p
Gene2   x21    x22    ..  x2p
...
Genen   xn1    xn2    ..  xnp
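One way to hold such a profile in code is a mapping from gene name to a row of values, with a shared list of condition labels. The sketch below is only an illustration: the gene names, condition labels, numbers, and the helper names `expression` and `column` are all hypothetical.

```python
# Hypothetical toy profile: 3 genes (rows) measured under 4 conditions (columns).
conditions = ["cond1", "cond2", "cond3", "cond4"]
profile = {
    "Gene1": [0.2, 1.5, -0.3, 0.8],
    "Gene2": [-1.1, 0.0, 0.4, 2.3],
    "Gene3": [0.9, -0.7, 1.2, -0.5],
}

def expression(gene, cond):
    """Look up x_ij: the value for `gene` (row i) under `cond` (column j)."""
    return profile[gene][conditions.index(cond)]

def column(cond):
    """Return every gene's value under one condition (one array or sample)."""
    j = conditions.index(cond)
    return {gene: row[j] for gene, row in profile.items()}

print(expression("Gene2", "cond4"))
print(column("cond1"))
```

Rows are the unit for gene-level statistics, while columns correspond to individual mRNA samples.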
32 Cond1, cond2, ..., condp denote various environmental conditions, time points, cell types, etc. under which mRNA samples are taken.
Note: numerous cells are involved.
Data quality issues:
1. chip (manufacturer)
2. mRNA sample (user)
It is important to have a homogeneous sample so that cellular signals can be amplified.
- Yeast Cell Cycle data: ideally all cells are engaged in the same activities
- synchronization
33 Example 1
- Comparative expression
- Normal versus cancer cells
- ALL versus AML
34 E. Lander's group at MIT
- Cancer classification (leukemia)
- ALL and AML (arising from lymphoid or myeloid precursors, respectively)
- Require different treatments
- Traditional methods: nuclear morphology
- Enzyme-based histochemical analysis (1960)
- Antibodies (1970)
- Genome-wide expression comparison
35 ALL (acute lymphoblastic leukemia), AML (acute myeloid leukemia)
36 Gene selection
- For each gene (row), compute a score defined as
  (sample mean of X - sample mean of Y) / (standard deviation of X + standard deviation of Y),
  where X = ALL, Y = AML
- Genes (rows) with the highest scores are selected.
- Does it work?
- 34 new leukemia samples
- 29 are predicted with 100% accuracy; 5 are weak prediction cases