CZ5225: Modeling and Simulation in Biology Lecture 2: Gene Expression Profiles and Microarray Data Analysis Prof. Chen Yu Zong Tel: 6874-6877 Email: yzchen@cz3.nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, NUS

Transcript and Presenter's Notes

1
CZ5225: Modeling and Simulation in Biology
Lecture 2: Gene Expression Profiles and Microarray Data Analysis
Prof. Chen Yu Zong
Tel: 6874-6877
Email: yzchen@cz3.nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, NUS
2
Biology and Cells
  • All living organisms consist of cells.
  • Humans have trillions of cells. Yeast - one
    cell.
  • Cells are of many different types (blood, skin,
    nerve), but all arose from a single cell (the
    fertilized egg)
  • Each cell contains a complete copy of the genome
    (the program for making the organism), encoded in
    DNA.

3
DNA
  • DNA molecules are long double-stranded chains. 4
    types of bases are attached to the backbone:
    adenine (A), guanine (G), cytosine (C), and
    thymine (T). A pairs with T, C with G.
  • A gene is a segment of DNA that specifies how to
    make a protein.
  • Human DNA has about 25-35K genes; rice has about
    50-60K, but shorter, genes.

4
Exons and Introns
  • Exons are coding DNA (translated into a protein),
    which make up only about 2% of the human genome
  • Introns are non-coding DNA, which provide
    structural integrity and regulatory (control)
    functions
  • Exons can be thought of as program data, while
    introns provide the program logic
  • Humans have much more control structure than rice

5
Gene Expression
  • Cells are different because of differential gene
    expression.
  • About 40% of human genes are expressed at one
    time.
  • A gene is expressed by transcribing DNA into
    single-stranded mRNA
  • mRNA is later translated into a protein
  • Microarrays measure the level of mRNA expression

6
Molecular Biology Overview
Nucleus
Cell
Chromosome
Protein
Gene (DNA)
Gene (mRNA), single strand
cDNA
7
Gene Expression
  • Genes control cell behavior by controlling which
    proteins are made by a cell
  • Housekeeping genes vs. cell/tissue-specific
    genes
  • Regulation
  • Transcriptional (promoters and enhancers)
  • Post-transcriptional (RNA splicing, stability,
    localization, small non-coding RNAs)

8
Gene Expression
  • Regulation
  • Translational (3' UTR repressors, poly-A tail)
  • Post-transcriptional (RNA splicing, stability,
    localization, small non-coding RNAs)
  • Post-translational (protein modification:
    carbohydrates, lipids, phosphorylation,
    hydroxylation, methylation, precursor protein)

cDNA
9
Gene Expression Measurement
  • mRNA expression represents dynamic aspects of
    the cell
  • mRNA expression can be measured with the latest
    technology
  • mRNA is isolated and labeled with fluorescent
    protein
  • mRNA is hybridized to the target; the level of
    hybridization corresponds to light emission, which
    is measured with a laser

10
Traditional Methods
  • Northern Blotting
  • Single RNA isolated
  • Probed with labeled cDNA
  • RT-PCR
  • Primers amplify specific cDNA transcripts

11
Microarray Technology
  • Microarray
  • New Technology (first paper 1995)
  • Allows study of thousands of genes at same time
  • Glass slide of DNA molecules
  • Molecule: string of bases (25 bp to 500 bp)
  • uniquely identifies gene or unit to be studied

12
Gene Expression Microarrays
  • The main types of gene expression microarrays
  • Short oligonucleotide arrays (Affymetrix)
  • cDNA or spotted arrays (Brown/Botstein).
  • Long oligonucleotide arrays (Agilent Inkjet)
  • Fiber-optic arrays
  • ...

13
Fabrications of Microarrays
  • Size of a microscope slide

Images: http://www.affymetrix.com/
14
Differing Conditions
  • Ultimate Goal
  • Understand expression level of genes under
    different conditions
  • Helps to
  • Determine genes involved in a disease
  • Pathways to a disease
  • Used as a screening tool

15
Gene Conditions
  • Cell types (brain vs. liver)
  • Developmental (fetal vs. adult)
  • Response to stimulus
  • Gene activity (wild vs. mutant)
  • Disease states (healthy vs. diseased)

16
Expressed Genes
  • Genes under a given condition
  • mRNA extracted from cells
  • mRNA labeled
  • Labeled mRNA is mRNA present in a given condition
  • Labeled mRNA will hybridize (base pair) with
    corresponding sequence on slide

17
Two Different Types of Microarrays
  • Custom spotted arrays (up to 20,000 sequences)
  • cDNA
  • Oligonucleotide
  • High-density (up to 100,000 sequences) synthetic
    oligonucleotide arrays
  • Affymetrix (25 bases)
  • SHOW AFFYMETRIX LAYOUT

18
Custom Arrays
  • Mostly cDNA arrays
  • 2-dye (2-channel)
  • RNA from two sources (cDNA created)
  • Source 1 labeled with red dye
  • Source 2 labeled with green dye

19
Two Channel Microarrays
  • Microarrays measure gene expression
  • Two different samples
  • Control (green label)
  • Sample (red label)
  • Both are washed over the microarray
  • Hybridization occurs
  • Each spot is one of 4 colors

20
Microarray Technology
21
Microarray Image Analysis
  • Microarrays detect gene interactions: 4 colors
  • Green: high control
  • Red: high sample
  • Yellow: equal
  • Black: none
  • Problem is to quantify image signals
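
One way to picture the quantification problem is as a rule that maps the two channel intensities of a spot to one of the four colors. The sketch below is an assumption-laden illustration: the background level of 100 and the 2-fold cutoff are hypothetical thresholds, not values from the lecture.

```python
def spot_color(red: float, green: float,
               background: float = 100.0, fold: float = 2.0) -> str:
    """Classify a spot from its red (sample) and green (control) intensity.

    `background` and `fold` are hypothetical thresholds chosen for illustration.
    """
    if red < background and green < background:
        return "black"    # no signal in either channel
    if red >= fold * green:
        return "red"      # high in sample
    if green >= fold * red:
        return "green"    # high in control
    return "yellow"       # roughly equal in both


for red, green in [(2000, 300), (250, 1800), (900, 1000), (20, 35)]:
    print(red, green, spot_color(red, green))
```

In practice the analysis keeps the numeric intensities rather than a color label; the label is only a visual summary.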

22
Single Color Microarrays
  • Prefabricated
  • Affymetrix (25mers)
  • Custom
  • cDNA (500 bases or so)
  • Spotted oligos (70-80 bases)

23
Microarray Animations
  • Davidson University
  • http://www.bio.davidson.edu/courses/genomics/chip/chip.html
  • Imagecyte
  • http://www.imagecyte.com/array2.html

24
Basic idea of Microarray
  • Construction
  • Place array of probes on microchip
  • Probe (for example) is oligonucleotide 25 bases
    long that characterizes gene or genome
  • Each probe has many, many clones
  • Chip is about 2cm by 2cm
  • Application principle
  • Put (liquid) sample containing genes on
    microarray and allow probe and gene sequences to
    hybridize and wash away the rest
  • Analyze hybridization pattern

25
Microarray analysis
Operation principle: samples are tagged with
fluorescent material to show the pattern of
sample-probe interaction (hybridization). A
microarray may have 60K probes.
26
(No Transcript)
27
Gene Expression Data
  • Gene expression data on p genes for n samples

                     mRNA samples
Genes   sample1  sample2  sample3  sample4  sample5  ...
  1        0.46     0.30     0.80     1.51     0.90  ...
  2       -0.10     0.49     0.24     0.06     0.46  ...
  3        0.15     0.74     0.04     0.10     0.20  ...
  4       -0.45    -1.03    -0.79    -0.56    -0.32  ...
  5       -0.06     1.06     1.35     1.09    -1.09  ...

Gene expression level of gene i in mRNA sample j
  = Log (Red intensity / Green intensity)
  or Log(Avg. PM - Avg. MM)
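
The two expression measures named on this slide can be sketched as follows. The slide does not state the base of the logarithm; base 2 is assumed here because it makes a two-fold change equal to one unit. The function names and the intensity values are hypothetical.

```python
import math


def log_ratio(red: float, green: float) -> float:
    """Two-channel measure: log2(red intensity / green intensity)."""
    return math.log2(red / green)


def pm_mm_signal(pm: list[float], mm: list[float]) -> float:
    """Single-channel measure: log2(average PM - average MM).

    PM and MM are the perfect-match and mismatch probe intensities of one
    probe set. The measure is undefined when the difference is not positive.
    """
    diff = sum(pm) / len(pm) - sum(mm) / len(mm)
    if diff <= 0:
        raise ValueError("average PM must exceed average MM")
    return math.log2(diff)


print(log_ratio(400.0, 100.0))    # sample is 4-fold higher than control
print(log_ratio(100.0, 400.0))    # sample is 4-fold lower than control
print(pm_mm_signal([1100.0, 1300.0], [100.0, 252.0]))
```

With a log ratio, up-regulation and down-regulation of the same magnitude are symmetric around zero, which is why negative entries appear in the table.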
28
Some possible applications
  • Sample from specific organ to show which genes
    are expressed
  • Compare samples from healthy and sick host to
    find gene-disease connection
  • Probes are sets of human pathogens for disease
    detection

29
Huge amount of data from single microarray
  • If just two colors, then the amount of data on an
    array with N probes is 2N
  • Cannot analyze pixel by pixel
  • Analyze by pattern: cluster analysis

30
Major Data Mining Techniques
  • Link Analysis
  • Associations Discovery
  • Sequential Pattern Discovery
  • Similar Time Series Discovery
  • Predictive Modeling
  • Classification
  • Clustering

31
Cluster Analysis: Grouping Similarly Expressed
Genes, Cell Samples, or Both
  • Strengthens signal when averages are taken within
    clusters of genes (Eisen)
  • Useful (essential ?) when seeking new subclasses
    of cells, tumours, etc.
  • Leads to readily interpreted figures

32
Some clustering methods and software
  • Partitioning: K-Means, K-Medoids, PAM, CLARA
  • Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK
  • Density-based: CAST, DBSCAN, OPTICS, CLIQUE
  • Grid-based: STING, CLIQUE, WaveCluster
  • Model-based: SOM (self-organized map), COBWEB,
    CLASSIT, AutoClass
  • Two-way Clustering
  • Block clustering

33
Assessment of various methods
  • Algorithmic Approaches to Clustering Gene
    Expression Data, Ron Shamir, School of Computer
    Science, Tel-Aviv University, Tel-Aviv
  • http://citeseer.nj.nec.com/shamir01algorithmic.html
  • Conclusion: hierarchical clustering exceptional

34
Partitioning

35
Density-based clustering
36
Hierarchical (used most often)

37
Hierarchical Clustering: grouping similarly
expressed genes
Gene Expression Profile Analysis
Rows: samples (A, B, C, ...); columns: genes 1 to 1000

gene    1    2    3    4   ..   ..  1000
      0.4  0.9    0  0.5   ..   ..   0.8
      0.2  0.8  0.3  0.2   ..   ..   0.7
      0.6  0.2    0  0.7   ..   ..   0.3

38
After Clustering
Gene Expression Profile Analysis
Rows: samples (A, B, C, ...); columns: genes, reordered by clustering

gene   ..    3    1    4   ..    2  1000
       ..    0  0.4  0.5   ..  0.9   0.8
       ..  0.3  0.2  0.2   ..  0.8   0.7
       ..    0  0.6  0.7   ..  0.2   0.3

39
Figure: clustered data compared with the same data
randomized by row, by column, and by both (axis: time)
Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998)
40
Types of Similarity Measurements
  • Distance measurements
  • Correlation coefficients
  • Association coefficients
  • Probabilistic similarity coefficients

41
Correlation Coefficients
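  • The most popular correlation coefficient is the
    Pearson correlation coefficient (1892)
  • Correlation between X = {X1, X2, ..., Xn} and
    Y = {Y1, Y2, ..., Yn}:
    s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
  • where \bar{X} and \bar{Y} are the means of X and Y,
    and s_{XY} is the similarity between X and Y
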
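
A minimal sketch of the Pearson coefficient as a similarity between two expression profiles follows. It uses the standard textbook definition (covariance divided by the product of the standard deviations); the function name is hypothetical, and the two example profiles are taken from rows 1 and 5 of the gene expression table shown earlier.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations (denominator).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (dev_x * dev_y)


gene1 = [0.46, 0.30, 0.80, 1.51, 0.90]
gene5 = [-0.06, 1.06, 1.35, 1.09, -1.09]
print(pearson(gene1, gene5))
```

The result always lies between -1 and 1, and the correlation of a profile with itself is 1, which is the normalization used when building a tree from similarities.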
42
Use of Similarity for Tree Construction
  • Normalize similarity so that sXX = 1
  • Then have n x n similarity matrix S whose diagonal
    elements are 1
  • Define distance matrix by (for example)
  • D = 1 - S
  • Diagonal elements of D are 0
  • Now use distance matrix to build tree (using some
    tree-building software; recall lecture on
    Phylogeny)

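
The conversion from a similarity matrix to a distance matrix can be sketched as below. The function name and the example matrix are hypothetical; the only rule taken from the slide is D = 1 - S with a unit diagonal in S.

```python
def distance_matrix(S: list[list[float]]) -> list[list[float]]:
    """Turn a normalized similarity matrix S into the distance matrix D = 1 - S."""
    n = len(S)
    for i in range(n):
        # The slide requires self-similarity to be normalized to 1.
        if abs(S[i][i] - 1.0) > 1e-9:
            raise ValueError("diagonal elements of S must be 1")
    return [[1.0 - S[i][j] for j in range(n)] for i in range(n)]


# Hypothetical similarity matrix for three genes.
S = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
D = distance_matrix(S)
for row in D:
    print(row)
```

With correlation as the similarity, distances range from 0 (perfectly correlated) to 2 (perfectly anti-correlated).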
43
A dendrogram (tree) for clustered genes
E.g. p = 5
  • Let p = number of genes.
  • 1. Calculate within class correlation.
  • 2. Perform hierarchical clustering which will
    produce (2p-1) clusters of genes.
  • 3. Average within clusters of genes.
  • 4. Perform testing on averages of clusters of
    genes as if they were single genes.

Cluster 6 = (1,2)
Cluster 7 = (1,2,3)
Cluster 8 = (4,5)
Cluster 9 = (1,2,3,4,5)
1
2
3
4
5
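
The count of (2p-1) clusters can be checked with a small agglomerative clustering sketch: p single-gene clusters plus p-1 merges. Average linkage is assumed here (the slide does not say which linkage is used), and the one-dimensional values that generate the distance matrix are hypothetical, chosen so that the merges reproduce the groups in the dendrogram. The order in which clusters are created may differ from the slide's numbering.

```python
from itertools import combinations


def average_linkage(D, a, b):
    """Mean pairwise distance between clusters a and b (tuples of 1-based genes)."""
    return sum(D[i - 1][j - 1] for i in a for j in b) / (len(a) * len(b))


def hierarchical_clusters(D):
    """Return all clusters: p singletons followed by p - 1 merged clusters."""
    p = len(D)
    active = [(i + 1,) for i in range(p)]
    all_clusters = list(active)
    while len(active) > 1:
        # Merge the closest pair of currently active clusters.
        a, b = min(combinations(active, 2),
                   key=lambda pair: average_linkage(D, pair[0], pair[1]))
        merged = tuple(sorted(a + b))
        active = [c for c in active if c not in (a, b)] + [merged]
        all_clusters.append(merged)
    return all_clusters


# Hypothetical one-dimensional profiles for p = 5 genes.
values = [1.0, 1.1, 1.5, 5.0, 5.2]
D = [[abs(u - v) for v in values] for u in values]

clusters = hierarchical_clusters(D)
for number, members in enumerate(clusters, start=1):
    print("Cluster", number, "=", members)
```

Every cluster in the list, including the single genes, can then be averaged and tested as if it were one gene, as in steps 3 and 4 above.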
44
A real case
Nature, Feb 2000. Paper by Alizadeh, A. et al.:
"Distinct types of diffuse large B-cell lymphoma
identified by gene expression profiling"
45
Validation Techniques: Hubert's Γ Statistics
  • X = X(i, j) and Y = Y(i, j) are two n x n matrices
  • X(i, j) = similarity of gene i and gene j
  • Y(i, j) = 1 if genes i and j are in the same
    cluster, 0 otherwise
  • Hubert's Γ statistic represents the point serial
    correlation
  • where M = n (n - 1) / 2
  • A higher value of Γ represents better
    clustering quality.

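
The formula on this slide did not survive in the transcript. As a hedged sketch, the commonly cited raw form is Γ = (1/M) Σ X(i, j) Y(i, j) over all pairs i < j, and the normalized form is the correlation between the M entries of X and the M entries of Y. The code below implements the normalized form, assuming that is what "point serial correlation" refers to; the function name, similarity values, and cluster labels are hypothetical.

```python
import math


def hubert_gamma(X, labels):
    """Normalized Hubert's Gamma between similarities X and a clustering.

    X is an n x n similarity matrix; labels[i] is the cluster of gene i.
    Only the M = n(n-1)/2 pairs with i < j are used.
    """
    n = len(X)
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(X[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    M = len(xs)
    mean_x, mean_y = sum(xs) / M, sum(ys) / M
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / M)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / M)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Gamma is undefined when X or Y is constant")
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (M * sd_x * sd_y)


# Hypothetical similarities: genes 0 and 1 are alike, genes 2 and 3 are alike.
X = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.8],
    [0.2, 0.0, 0.8, 1.0],
]
good = hubert_gamma(X, [0, 0, 1, 1])   # clustering that matches the similarities
bad = hubert_gamma(X, [0, 1, 0, 1])    # clustering that cuts across them
print(good, bad)
```

The clustering that groups similar genes together scores higher, in line with the last bullet above.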
46
Discovering sub-groups
47
Gene Expression is Time-Dependent
Time Course Data
48
Sample of time course of clustered genes
49
Limitations
  • Cluster analyses
  • Usually outside the normal framework of
    statistical inference
  • Less appropriate when only a few genes are likely
    to change
  • Needs lots of experiments
  • Single gene tests
  • May be too noisy in general to show much
  • May not reveal coordinated effects of positively
    correlated genes.
  • Hard to relate to pathways

50
Useful Links
  • Affymetrix: www.affymetrix.com
  • Michael Eisen Lab at LBL (hierarchical clustering
    software Cluster and Tree View (Windows)):
    rana.lbl.gov/
  • Review of Currently Available Microarray
    Software: www.the-scientist.com/yr2001/apr/profile1_010430.html
  • ArrayExpress at the EBI: http://www.ebi.ac.uk/arrayexpress/
  • Stanford MicroArray Database: http://genome-www5.stanford.edu/
  • Yale Microarray Database: http://info.med.yale.edu/microarray/
  • Microarray DB: www.biologie.ens.fr/en/genetiqu/puces/bddeng.html