Title: Structural Genomics
1Structural Genomics
- Entire genomes are being sequenced
- Databases full of genomic sequences
- Provides the string of nucleotides
- Copy number
- Intergenic regions
2Information in a Genome
Genome (DNA)
Transcriptome (RNA)
Proteome (protein)
3Functional Genomics
- Understand gene function and gene interactions
- Transcriptional Regulation
- Which genes change, and by how much
- What tissue
- What stimulus
- Proteins
- difficult to purify
- may require modification for activity
- Monitoring RNA is relatively easy
- Remember proteins do the work
4Northern Blot Strategy
- Harvest tissue and extract RNA
- Electrophoresis yields size separation
- Design and label a probe of known sequence
- Hybridize: complementary strands anneal
- Visualize and quantify signal
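Hybridization relies on base pairing, so a probe for a known transcript can be written down as its reverse complement. The sketch below is a minimal Python illustration using the RNA fragment from the next slide as the target; the function name `dna_probe_for` is an assumption made for this example, not part of the original material.

```python
# Watson-Crick pairing from an RNA target base to the DNA probe base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_probe_for(rna_target: str) -> str:
    """Return the DNA probe (5'->3') that anneals to an RNA target (5'->3')."""
    # Strands pair antiparallel, so complement each base and reverse the order.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_target))


target = "GUACCGUAGUCGACU"
probe = dna_probe_for(target)
print(probe)  # AGTCGACTACGGTAC
```

A real probe would also be checked for length, melting temperature and cross-hybridization, none of which this sketch attempts.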
5Northern Blot Strategy
---GUACCGUAGUCGACU---
6Northern Blot Strategy
7Northern Blot Strategy
Dot Blot or Slot Blot
Northern Blot
Con
Increasing treatment level
8Northern Blot Strategy
Final Product
9Northern Blot Strategy Expanded
- Ideal situation would be to study expression of all genes simultaneously
- Problem
- Probe generation requires prior knowledge of sequence
- Large number of probes to be made
- Solution
- Genome projects: genomic and EST/cDNA fragments
- Electronic databases
- High-throughput, assembly-line-style, semi-automated processes to make microarray chips
10Microarray/Chip Reverse Northern
- Array Fabrication
- Target Labeling and Hybridization
- Detection and quantitation of signal
11Microarray Fabrication
- ssDNA (the probe) deposited on a solid surface in a defined grid
- Advantages
- Can put many different DNAs on a single surface
- Don't need prior knowledge of the gene or its function
12Microarray Fabrication
- Spotting DNA on Glass Slides
- Probe generation
- Amplify cDNA inserts from a cDNA library
- large fragments (full-length or near full-length)
- Design synthetic oligos from electronic database
- oligonucleotides (50-80 mers)
- Deposit probe in defined positions
- Requires automation or semi-automation
13Microarray Fabrication
- Robot deposits DNA at defined coordinates
- Pins deposit small amounts of liquid on surface
- ul containing 1-10 ng per spot
- 10,000-40,000 (100,000?) spots per slide
15Microarray Fabrication
16Microarray Fabrication
- Each spot, representing a different sequence, has a unique physical location
- May or may not have knowledge of the sequence
EST189022
M91589 Rat Beta-arrestin1 cds
U73142 Rat Mitogen activated protein kinase cds
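Because every spot has a unique physical location, an array layout can be treated as a lookup table from grid coordinates to probe identity. A minimal sketch follows, using the three identifiers above; the (row, column) coordinates and the helper name `probe_at` are hypothetical.

```python
# Map each (row, column) grid coordinate to the sequence spotted there.
# The coordinates are hypothetical; the identifiers are the ones listed above.
spot_layout = {
    (0, 0): "EST189022",
    (0, 1): "M91589 Rat Beta-arrestin1 cds",
    (0, 2): "U73142 Rat Mitogen activated protein kinase cds",
}


def probe_at(row: int, col: int) -> str:
    """Look up which probe sits at a grid position; 'empty' if nothing is spotted."""
    return spot_layout.get((row, col), "empty")


print(probe_at(0, 1))  # M91589 Rat Beta-arrestin1 cds
```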
17Basic Microarray Experiment
18Assumptions of Gene Expression Studies
- Tight correlation between gene function and expression pattern
- Expression patterns determine cell type and function
- Expressed genes reflect the environment and/or internal state of the cell
19Target Labeling and Hybridization
- RNA extraction
- mRNA extracted from control and treated experimental units
- Target Labeling
- cDNA synthesis used to incorporate labeled nucleotides
- Hybridization
- Labeled target binds specifically and quantitatively to its complement (probe) on the microarray
20 RNA Isolation
- Raw sample contains biochemical contaminants
which need to be removed (protein, DNA, cellular
debris)
mRNA quality has the largest effect on the
success of the experiment
21RNA Isolation Quality Assessment
22Target Labeling
- To allow quantitative measurements, nucleic acid
is labeled with nucleotides that contain a fluor,
biotin or radioactivity
23 Target Labeling
24 Target Labeling
RNA Extraction
25 Target Labeling
26 Target Labeling
TTTT
(dT)24 Primed
27 Target Labeling
TTTT
Reverse Transcription
28 Target Labeling
T T T T
First Strand
29 Target Labeling
T T T T
RNaseH nicks RNA Strand
30Target Labeling
Cells
A A A A
mRNA
T T T T
DNA Polymerase Extends From Nick
31Target Labeling
Cells
A A A A
mRNA
T T T T
A A A A
cDNA
T T T T
DNA Polymerase Extends From Nick
35Target Labeling
Cells
A A A A
mRNA
T T T T
A A A A
cDNA
T T T T
Second Strand
36Target Labeling
- Prepare mRNA from different sources
- Incorporation of different fluorescent labels
- Cy3 (green) for one sample
- Cy5 (red) used for other sample
- Mix samples
- Hybridize
37Hybridization
- The labeled target is selectively bound to complementary probes
- Note: we know the location of each probe on the array
x 12,000
38Competitive Hybridization
39Competitive Hybridization Reference Sample
40Scanning / Visualization
- Signal intensity for each probe is quantitatively measured at each of two wavelengths
- Signal intensity represents the quantity of the transcript from each original sample
- Thus, the ratio of the signal intensities represents the relative change in gene expression between samples
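The ratio calculation can be sketched in a few lines of Python. This is only an illustration: the intensities are made-up numbers, the choice of Cy5 as numerator is an assumption, and taking the log2 of the ratio is a common convention rather than something stated on the slide.

```python
import math


def log2_ratio(cy5: float, cy3: float) -> float:
    """log2(Cy5 / Cy3): positive means more signal in the Cy5-labeled sample."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


# Hypothetical background-corrected intensities per spot: (Cy5, Cy3)
spots = {
    "spot_1": (4000.0, 1000.0),
    "spot_2": (500.0, 2000.0),
    "spot_3": (1500.0, 1500.0),
}

for name, (cy5, cy3) in spots.items():
    print(name, cy5 / cy3, log2_ratio(cy5, cy3))
```

On the log2 scale a 4-fold increase (+2) and a 4-fold decrease (-2) are symmetric around zero, which is why the log ratio is often preferred over the raw ratio.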
41Data Analysis
- Large data sets require computing power
- Clustering genes with common expression patterns
is a common way to show microarray results
42Data Analysis
43 Microarray Data Analysis
- Data Analysis
- Lists
- Differentially Expressed Genes
- Above a fold change (2X, 3X, 5X)
- ANOVA
- Cluster
- Groups similarly responding genes or arrays
- Hierarchical
- K-Means
- PCA
- Functional Annotation
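A fold-change list is the simplest of these analyses. The sketch below filters genes by a threshold on the expression ratio; the gene names and ratios are placeholders, and treating a decrease of the same magnitude (ratio at most 1/fold) as differentially expressed is an assumption of this example.

```python
def differentially_expressed(ratios, fold=2.0):
    """Return IDs whose ratio is at least `fold` up or at most 1/fold down."""
    return sorted(
        gene for gene, ratio in ratios.items()
        if ratio >= fold or ratio <= 1.0 / fold
    )


# Hypothetical treated/control expression ratios.
ratios = {"gene_a": 4.0, "gene_b": 0.25, "gene_c": 1.2, "gene_d": 2.0, "gene_e": 0.6}

print(differentially_expressed(ratios, fold=2.0))  # ['gene_a', 'gene_b', 'gene_d']
print(differentially_expressed(ratios, fold=3.0))  # ['gene_a', 'gene_b']
```

A fold-change cutoff ignores replicate variability, which is the gap the ANOVA item in the list addresses.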
44Data Analysis Lists
45Data Analysis ANOVA Table
46Data Analysis Agglomerative Hierarchical Clustering
- Bottom-up clustering method
- Clusters have sub-clusters, which have sub-clusters, etc.
- Process
- Each signal value is a separate cluster
- Evaluate all pair-wise distances between clusters
- Construct a distance matrix using the distance values
- Look for the pair of clusters with the shortest distance
- Remove the pair from the matrix and merge them
- Evaluate all distances from this new cluster to all other clusters
- Repeat until the distance matrix is reduced to a single element
- Visualize tree
- Results
- It can produce an ordering of the objects, which may be informative
- Smaller clusters are generated, which may be helpful for discovery
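The merge loop can be sketched in plain Python as follows. Two choices here are assumptions, since the slide does not specify them: Euclidean distance between profiles, and single linkage (cluster distance is the shortest member-to-member distance). For brevity the sketch recomputes distances on each pass instead of maintaining an explicit distance matrix, and the example profiles are made up.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2, points):
    """Cluster distance = shortest distance between any two members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points):
    """Return the merge history as (members_a, members_b, distance) tuples."""
    clusters = [(i,) for i in range(len(points))]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        # Evaluate all pairwise cluster distances and pick the shortest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        distance = single_linkage(a, b, points)
        # Remove the pair and replace it with the merged cluster.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, distance))
    return merges


# Hypothetical expression profiles (one tuple per gene).
profiles = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for a, b, distance in agglomerate(profiles):
    print(a, b, round(distance, 3))
```

The merge history, read from first to last, is the tree: early merges are the short branches and the final merge is the root.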
47Data Analysis Agglomerative Hierarchical Clustering
Points in n-dimensional space
Create difference measure between each pair
48Data Analysis Agglomerative Hierarchical Clustering
Points in n-dimensional space
Create difference measure between each pair
Find 2 most similar and consider 1
49Data Analysis Agglomerative Hierarchical Clustering
Points in n-dimensional space
Create difference measure between each pair
Find 2 most similar and consider 1
Find most similar to group, consider 1
50Data Analysis Agglomerative Hierarchical Clustering
Points in n-dimensional space
Create difference measure between each pair
Find 2 most similar and consider 1
Find most similar to group, consider 1
Repeat process till only 1 point
Create tree
51Data Analysis Agglomerative Hierarchical Clustering
Data set contains only significant genes
Green (-), Red (+), Black (0)
Black lines indicate similarity
Short lines imply greater similarity
Rows are individual genes
Columns are different chips
52Data Analysis Agglomerative Hierarchical Clustering
Two different cluster analyses
Small branches of larger clusters
53Data Analysis Agglomerative Hierarchical Clustering
Small branch of large cluster analysis
Lines show similarity between treatments
10 KPa most similar to each other
2 most similar to 10 KPa
101 KPa somewhat dissimilar to each other
Gene ID list on right
54Data Analysis K-Means Clustering
- K-Means Clustering
- creates a specific number of non-hierarchical clusters
- non-deterministic and iterative
- Properties
- always K clusters.
- always at least one item in each cluster.
- clusters are non-hierarchical
- every member is closer to its own cluster than to any other cluster
55Data Analysis K-Means Clustering
- Process
- The dataset is partitioned into K clusters randomly, with roughly the same number of data points in each
- Calculate cluster centroid
- For each data point
- Calculate the distance from data point to each cluster centroid
- If data point is closest to its own cluster, leave it where it is. If not, move it into the closest cluster
- Recalculate centroid
- Repeat second step until a complete pass through all the data points results in no data point moving from one cluster to another
- Initial partition can greatly affect the final clusters that result, in terms of inter-cluster and intra-cluster distances and cohesion
- Try different numbers of groups
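The process above can be sketched in plain Python. This is an illustrative version under stated assumptions, not a reference implementation: distance is Euclidean (`math.dist`, Python 3.8+), the random partition is made reproducible with a `seed` parameter, and a point moves only when another centroid is strictly closer. The data are made up.

```python
import math
import random


def centroid(points):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(data, k, seed=0):
    """Return one cluster label (0..k-1) per data point."""
    if not 1 <= k <= len(data):
        raise ValueError("need 1 <= k <= number of data points")
    rng = random.Random(seed)
    # Random initial partition with roughly equal cluster sizes.
    labels = [i % k for i in range(len(data))]
    rng.shuffle(labels)

    def centers():
        return [
            centroid([p for p, lab in zip(data, labels) if lab == c])
            for c in range(k)
        ]

    current = centers()
    moved = True
    while moved:  # stop after a full pass in which no point moves
        moved = False
        for i, point in enumerate(data):
            dists = [math.dist(point, c) for c in current]
            nearest = dists.index(min(dists))
            # Move only if another centroid is strictly closer than its own.
            if dists[nearest] < dists[labels[i]]:
                labels[i] = nearest
                current = centers()  # recalculate centroids after the move
                moved = True
    return labels


# Hypothetical profiles forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_means(data, k=2))
```

Running it with several seeds and several values of `k` is one way to see how much the initial partition and the number of groups affect the result.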
56Data Analysis K-Means Clustering
Approximately 250 genes
Significant treatment effect
Partitioned into 10 clusters
Identify different expression patterns
Isolate individual groups
57Data Analysis K-Means Clustering
Isolate individual groups
Find mean of each column
Plot mean
Graphs display different expression patterns
58Data Analysis Principal Component Analysis
- Reduces the dimensionality of the data
- Axis of greatest effect identified
- Relation between this line and points determined
- Repeat process
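As a rough illustration of the first two steps, the sketch below finds the axis of greatest variance by power iteration on the covariance matrix and then projects each point onto it. Power iteration is one of several ways to do this and is an assumption of the example, as are the function name and the made-up data; later components would be found by repeating the process on what remains after removing the first axis.

```python
import math


def first_component(rows, iterations=200):
    """Return (unit axis of greatest variance, projection of each row onto it)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]
    v = [1.0] * dims  # starting guess; fails if orthogonal to the main axis
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("starting vector has no component along the main axis")
        v = [x / norm for x in w]
    # Relation between the axis and each point: its coordinate along the axis.
    scores = [sum(c[j] * v[j] for j in range(dims)) for c in centered]
    return v, scores


# Hypothetical two-dimensional data lying along the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
axis, scores = first_component(rows)
print([round(x, 3) for x in axis])
print([round(s, 3) for s in scores])
```

Here two measurements per point collapse onto a single axis without losing information, which is the sense in which the method reduces dimensionality.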
59Data Analysis Principal Component Analysis Time Zero
60Data Analysis Principal Component Analysis
61Data Analysis Principal Component Analysis
62Data Analysis Principal Component Analysis
63Data Analysis Principal Component Analysis
64Data Analysis Principal Component Analysis
65Data Analysis Principal Component Analysis
66Data Analysis Principal Component Analysis
67Data Analysis Principal Component Analysis
68Data Analysis Functional Annotation
- Annotation
- Gene description
- thioredoxin reductase 1
- Gene function (Ontology)
- Signal transducer
- Pathway information
- Proteasome degradation
69Data Analysis Functional Annotation
Genes within a cluster or influenced by a
treatment can be annotated
70Data Analysis Functional Annotation
General
Specific
71Data Analysis Functional Annotation