Title: Chip arrays and gene expression data
1 Chip arrays and gene expression data
2 Motivation
3
With chip array technology, one can measure the expression of all genes at once (even of all exons). This can answer questions such as:
1. Which genes are expressed in a muscle cell?
2. Which genes are expressed during the first week of pregnancy in the mother? In the new baby?
3. Which genes are expressed in cancer?
4
4. If one mutates a transcription factor, which genes are no longer expressed?
5. Which genes are not expressed in the brain of a baby with an intellectual disability?
6. Which genes are expressed when one is asleep versus when the same person is awake?
5 Technology
6 DNA chip: in each spot there is a specific marked DNA molecule. Upon hybridization with a marked mRNA molecule (or a cDNA one), the intensity of the hybridization can be quantified by light.
7 Affymetrix
The base is a wafer.
A light-sensitive chemical compound that prevents
coupling between the wafer and the first
nucleotide of the DNA probe being created.
8 Affymetrix
The blue cap is light-sensitive. A mask covers some of the cells. When the chip is illuminated, a reaction with a nucleotide can happen only in the cells that the light reaches.
9 Affymetrix
The nucleotide that is added is itself chemically linked to a new light-sensitive cap.
10 Affymetrix
The entire process is called photolithography.
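The masking cycle described on the slides above can be sketched as a small simulation. This is a minimal illustration, not Affymetrix's actual manufacturing procedure: the function name `synthesize`, the fixed A, C, G, T order of the steps, and the three short example probes are all assumptions made for the example.

```python
def synthesize(probes):
    """Grow all probes in parallel, one masked illumination step at a time."""
    if any(set(p) - set("ACGT") for p in probes):
        raise ValueError("probes may only contain A, C, G and T")
    grown = [""] * len(probes)   # what has been built in each cell so far
    steps = 0                    # number of masks (illumination steps) used
    while grown != list(probes):
        for base in "ACGT":
            # The mask lets light reach only the cells whose next
            # nucleotide is the base offered in this step.
            lit = [len(g) < len(p) and p[len(g)] == base
                   for g, p in zip(grown, probes)]
            if any(lit):
                # Illuminated cells lose their cap, couple the new
                # nucleotide, and are capped again by that nucleotide.
                grown = [g + base if is_lit else g
                         for g, is_lit in zip(grown, lit)]
                steps += 1
    return grown, steps

example_probes = ["ACG", "AGT", "CCA"]   # made-up probes, far shorter than 25-mers
built, masks_used = synthesize(example_probes)
print(built, masks_used)
```

Each pass through the inner loop corresponds to one mask: only the cells whose next nucleotide matches the offered base are illuminated and extended, so many different probes are built with a limited number of steps.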
11 Affymetrix
12 Affymetrix
Affymetrix: each probe is 25 bp, a part of an exon.
The reader
The chip itself
In one cm², more than 10^6 different oligos.
13 Affymetrix
Affymetrix: each probe is 25 nucleotides long. Beyond this length there is a technological problem: the synthesis becomes inaccurate. With such short probes, each mRNA can hybridize to more than one probe. The solution: each gene is covered by several probes.
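Because each gene is covered by several probes, the probe readings have to be combined into one value per gene. The sketch below uses a median, which is only one simple and outlier-tolerant choice and is not claimed to be what Affymetrix software does. The gene names, the intensities, and the name `probe_sets` are invented for illustration.

```python
from statistics import median

# Hypothetical intensities: several probes per gene, one number per probe.
probe_sets = {
    "geneA": [120.0, 135.0, 900.0, 128.0],  # 900.0 mimics a cross-hybridizing probe
    "geneB": [15.0, 22.0, 18.0],
}

# One summary value per gene; the median is barely affected by the outlier.
gene_expression = {gene: median(values) for gene, values in probe_sets.items()}
print(gene_expression)
```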
14 Affymetrix
Affymetrix: one can buy ready-made chips (human genome, mouse genome) or design (print) a custom chip, which is more expensive.
15 Affymetrix
Detection: mRNA is isolated from the tissue (cells, viruses). cDNA is synthesized and fluorescently labeled. Sometimes the cDNA is amplified using PCR. The intensity in each cell (probe) is measured by the reader.
16 Microarray movie
From
17 Agilent
Agilent developed DNA printers: in each spot, picoliters of nucleotides are added. They can make probes of up to 60-mers (Agilent is derived from Hewlett-Packard).
Standard phosphoramidite chemistry
18 Agilent
Hybridization to Agilent probes is more accurate. If there is hybridization to a probe, the gene it represents is probably expressed.
19 Agilent
But it is impossible to know how many probes are in each cell, so absolute fluorescence intensities are meaningless.
20 Agilent
Solution: in the same experiment, hybridize samples from two conditions, for example mRNA from healthy cells (in red) versus mRNA from tumor cells (in green). The Agilent reader gives the ratio of the two colors.
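A two-color ratio is often easier to read on a log2 scale, where equal up- and down-regulation are symmetric around zero. The sketch below assumes the color assignment from the slide (red = healthy, green = tumor); the spot names, the intensities, and the helper `log_ratio` are hypothetical.

```python
import math

def log_ratio(tumor_green, healthy_red):
    """log2 of tumor over healthy intensity for one spot."""
    return math.log2(tumor_green / healthy_red)

# Made-up readings per spot: (green = tumor, red = healthy).
spots = {
    "gene1": (400.0, 100.0),  # stronger in tumor
    "gene2": (50.0, 200.0),   # weaker in tumor
    "gene3": (150.0, 150.0),  # unchanged
}

ratios = {gene: log_ratio(green, red) for gene, (green, red) in spots.items()}
print(ratios)
```

Because only the ratio within a spot is used, the unknown number of probe molecules in that spot cancels out, which is the point of hybridizing both samples to the same chip.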
21 Stanford cDNA chips
In this approach, long cDNA sequences (> 300 bp) are produced in a cell (a clone) and are linked to each chip cell. Producing long cDNAs this way is cheaper than synthesizing them one nucleotide at a time. As in the case of Agilent, it is impossible to control the number of probes in each cell.
22 Stanford cDNA chips
23 Analyzing Output
24 Output
Each cell holds either an absolute number or a relative one, depending on the technology used.
25 Repeats
A repeat can be either the same sample on a different chip or a real biological repeat: a different sample.
26 Expression profile
Genes 1 and 3 show the same trend (both go high under the same conditions). That is, they have the same expression profile.
27 Clustering
In general, we want to find all the genes that share the same expression profile, which is suggestive of a functional linkage. Clustering algorithms do exactly that.
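As a rough illustration of grouping genes by expression profile, the sketch below measures profile similarity with the Pearson correlation and puts genes in the same cluster when their correlation passes a threshold. This is one simple scheme among many, not a specific algorithm named in the slides. The expression values, the threshold of 0.9, and the function names are assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails on constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def cluster_by_profile(profiles, threshold=0.9):
    """Merge genes into one cluster whenever a pair correlates at or above threshold."""
    cluster_of = {gene: {gene} for gene in profiles}
    for a, b in combinations(profiles, 2):
        if cluster_of[a] is not cluster_of[b] and pearson(profiles[a], profiles[b]) >= threshold:
            merged = cluster_of[a] | cluster_of[b]
            for gene in merged:
                cluster_of[gene] = merged
    # Collect each distinct cluster once, in order of first appearance.
    seen, clusters = set(), []
    for gene in profiles:
        if id(cluster_of[gene]) not in seen:
            seen.add(id(cluster_of[gene]))
            clusters.append(sorted(cluster_of[gene]))
    return clusters

# Made-up expression values under four conditions.
profiles = {
    "gene1": [1.0, 5.0, 2.0, 8.0],
    "gene2": [8.0, 2.0, 5.0, 1.0],
    "gene3": [2.0, 10.0, 4.0, 16.0],  # same trend as gene1
    "gene4": [9.0, 3.0, 6.0, 2.0],    # same trend as gene2
}
print(cluster_by_profile(profiles))
```

Correlation is used rather than raw distance so that two genes with the same trend but different overall levels still count as sharing a profile.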
28 Clustering
Clustering of the conditions can suggest two types of brain tumor (bt).
29 Clustering
Bi-clustering: clustering both on the conditions and on the genes.
30 Applications
31 Applications
Think of increasing the glucose concentration for E. coli and running a chip array at various concentrations. One can potentially discover all genes in the glucose pathway. Knocking out a gene → discover all genes that interact with it.
32 Applications
Analyzing the expression of genes can help reveal the gene network of a given organism.
33 Gene network
34 Clinical
Does someone have a brain tumor?
35 MammaPrint
Used to assess the risk that a breast tumor will spread to other parts of the body (metastasis). It is based on the well-known 70-gene breast cancer gene signature. In February 2007, the FDA cleared the MammaPrint test for use in the U.S.
36 Sequence by hybridization
It was thought that the following procedure could work for sequencing a genome:
- Make a chip containing all x-mers (e.g., x = 25).
- Hybridize a genome to the chip.
- By analyzing all the hybridizations with their overlaps, assemble the genome.
- Problem: it doesn't work.
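A small sketch can illustrate two difficulties with this procedure. They are offered as possible reasons only, since the slide does not say why the method fails: the number of distinct 25-mers is enormous, and repeated sequence can make different genomes light up exactly the same probes. The helper `spectrum` and the toy sequences are assumptions for illustration, using x = 3 instead of 25.

```python
def spectrum(genome, x):
    """Set of x-mers present in the genome: what an ideal chip would report."""
    return {genome[i:i + x] for i in range(len(genome) - x + 1)}

# Number of distinct probes needed for a chip with all 25-mers.
n_probes = 4 ** 25

# Two toy genomes of different length built from the same tandem repeat.
genome_a = "ACGACGT"
genome_b = "ACGACGACGT"

# The chip only reports presence or absence of each x-mer,
# so both genomes give an identical hybridization pattern.
same_signal = spectrum(genome_a, 3) == spectrum(genome_b, 3)
print(n_probes, same_signal)
```

In this toy case the chip cannot tell how many copies of the repeat are present, so the assembly is ambiguous even with perfect hybridization.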
37 ChIP-on-chip
A method for measuring protein-DNA interaction. Proteins that bind DNA include:
- Those responsible for transcription regulation: transcription factors (TFs)
- Replication proteins
- Histones
38 ChIP-on-chip
The first "ChIP" stands for Chromatin ImmunoPrecipitation and the second "chip" refers to DNA microarrays. The method is used mostly to detect TF binding sites.
39 ChIP-on-chip
40 Tiling arrays
Here the chip array should include not only protein-coding genes but also control regions, or simply the entire genome.
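The idea of covering a whole region with probes, regardless of gene annotation, can be sketched as follows. The function name `tile_probes`, the probe length of 25, the step sizes, and the toy sequence are assumptions for illustration; real tiling designs also deal with issues such as repeats and probe quality, which are ignored here.

```python
def tile_probes(genome, probe_len=25, step=25):
    """Return (start, probe) pairs taken every `step` bases along the genome."""
    last_start = len(genome) - probe_len
    return [(start, genome[start:start + probe_len])
            for start in range(0, last_start + 1, step)]

toy_genome = "ACGT" * 25  # 100 nt stand-in for a genomic region

end_to_end = tile_probes(toy_genome, probe_len=25, step=25)   # no overlap
overlapping = tile_probes(toy_genome, probe_len=25, step=10)  # 15 nt overlap
print(len(end_to_end), len(overlapping))
```

A smaller step gives denser coverage at the cost of more probes on the chip.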
41 Deep sequencing movie
From http://www.illumina.com/
42 Deep sequencing reads
Wurtzel et al. Genome Research (2009)
43 Protein-protein interaction (PPI)
44 Some facts
- Human genome: 20,000-30,000 genes, 500,000 proteins.
- At a given time in a cell, 10,000 proteins are present (the proteome).
- An estimated > 80% of proteins interact.
- The network includes hubs.
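A hub is a protein with many interaction partners, which can be found by counting how often each protein appears in a list of interactions. The interaction list, the protein names, and the cutoff `min_degree = 4` below are invented for the example; there is no single agreed threshold for calling a protein a hub.

```python
from collections import Counter

# Hypothetical undirected interactions, one pair per interaction.
interactions = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("E", "F"),
]

# Degree = number of interactions each protein takes part in.
degree = Counter()
for first, second in interactions:
    degree[first] += 1
    degree[second] += 1

min_degree = 4  # assumed cutoff for this toy network
hubs = [protein for protein, d in degree.items() if d >= min_degree]
print(degree, hubs)
```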
45 Large-scale studies of protein-protein interactions (PPIs) give very noisy data:
- 40-80% of interactions are false negatives (true interactions that are unidentified).
- 30-60% of interactions are false positives (interactions that are inferred but are not real).
46 Method 1: affinity tag purification of complexes in vivo.
Say we want to know what interacts with protein X. We construct a plasmid with the gene coding for X, which will be used as bait (blue), fused to a known tag (in white).
47 Method 1: affinity tag purification of complexes in vivo.
In the cell, protein X (the bait) fused to the tag is expressed and interacts with some proteins. The cells are lysed and the protein complex is isolated using a solid support linked to a ligand that can interact with the tag.
48 Method 1: affinity tag purification of complexes in vivo.
Bound proteins are eluted, separated on a gel, and identified using mass spectrometry (MS). The method is biased towards proteins of high abundance.
49 Method 2: yeast two-hybrid system.
Some transcription factors are composed of two domains: BD, which binds the DNA, and AD (in red), which activates transcription. They need to interact in order to express the gene.
50 Yeast two-hybrid system.
In order to check whether protein A (bait) interacts with protein B (prey), protein A is expressed fused to BD, and protein B fused to AD. The reporter gene is expressed only if A and B interact.
51 Protein-protein interactions are fundamental for functional annotation. If X interacts with Y, and Y is known to be related to muscle development, maybe X is also related to muscle development: guilt by association.
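Guilt by association can be sketched as a vote among the annotated interaction partners of an unannotated protein. This is a deliberately naive illustration: the function `predict_function`, the interaction list, and the annotations are all invented, and given the noise in PPI data any such prediction is only a hypothesis to test.

```python
from collections import Counter, defaultdict

def predict_function(protein, interactions, known):
    """Suggest the most common known function among the protein's partners."""
    partners = defaultdict(set)
    for first, second in interactions:
        partners[first].add(second)
        partners[second].add(first)
    votes = Counter(known[p] for p in partners[protein] if p in known)
    # No annotated partner means no suggestion.
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical interaction data and annotations.
interactions = [("X", "Y"), ("X", "Z"), ("X", "W"), ("Q", "R")]
known = {
    "Y": "muscle development",
    "Z": "muscle development",
    "W": "DNA repair",
}

print(predict_function("X", interactions, known))
print(predict_function("Q", interactions, known))
```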