Persistent Systems Pvt. Ltd. http://www.persistent.co.in - PowerPoint PPT Presentation

Slides: 32
Provided by: drmusht

1
Gene Expression Analysis Using Microarrays
  • Dr Mushtaq Ahmed
  • Technology Incubation Division
  • Persistent Systems Private Ltd
  • Pune

2
Topics
  1. Introduction
  2. Data Storage and Exchange Standards
  3. Analysis (Clustering)
  4. Conclusion and References

3
1. Introduction
  • Structure Activity Relationship
  • Structural vs. Functional Genomics
  • Principles of Microarray Experiment
  • Applications

4
Structure Activity Relationship
GENES (finite)
EXPERIMENTAL SETUP
Functional Genomics OR Confirmation Work
Structural Genomics OR Prediction Work
FUNCTIONS (infinite)
PROTEINS
5
Source: Yale Bioinformatics
6
Principles of a Microarray Experiment: Hybridization
  • Environment → Functions → Proteins → mRNA → cDNA
  • Different incubations of cells result in up- or
    down-regulation of different sets of genes.
  • Microarray provides a medium for matching known
    and unknown DNA samples based on base-pairing
    rules and automating the process of identifying
    the unknowns
  • The set of expressed genes (at the mRNA stage) is
    isolated and identified using hybridization on a
    microarray chip

7
HTS Using Hybridization
Microarray Chip
Target cDNA (variables to be detected)
Probe oligos/cDNA (gene templates)

Samples
Hybridization
Analysis of outcome
Pathways
Functional Annotation
Targets/Leads
Disease Class.
Physiological states
8
Timeline for drug discovery
  • Discovery (5 yrs): 5000 (gene expression study)
  • Pre-Clinical (1 yr): 50
  • Clinical (6 yrs): 5
  • Review (2 yrs): 1
  • Marketed
9
2. Data Storage and Exchange Standards
  • Raw and Processed Data
  • Conceptual View of Database
  • Example of ArrayExpress
  • Issues
  • Standardization for Exchange

10
Raw data images
  • Red (Cy5) dot: overexpressed or up-regulated
  • Green (Cy3) dot: underexpressed or down-regulated
  • Yellow dot: equally expressed
  • Intensity: absolute level
  • red/green: ratio of expression
    (2 = 2x overexpressed, 0.5 = 2x underexpressed)
  • log2(red/green): log ratio
    (1 = 2x overexpressed, -1 = 2x underexpressed)
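The ratio and log-ratio conventions above can be checked with a few lines of Python. This is a minimal sketch: the function names and the spot intensities are hypothetical and chosen only to reproduce the 2x, 0.5x and 1x cases from the slide.

```python
import math

def log_ratio(red, green):
    """Return log2(red/green) for one spot's Cy5 (red) and Cy3 (green) intensity."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

def describe(red, green):
    """Label a spot from the sign of its log ratio."""
    value = log_ratio(red, green)
    if value > 0:
        return "up-regulated"
    if value < 0:
        return "down-regulated"
    return "equally expressed"

# Hypothetical intensities, for illustration only.
spots = {
    "geneA": (2000.0, 1000.0),  # ratio 2   -> log ratio  1
    "geneB": (500.0, 1000.0),   # ratio 0.5 -> log ratio -1
    "geneC": (800.0, 800.0),    # ratio 1   -> log ratio  0
}
for name, (red, green) in spots.items():
    print(name, red / green, log_ratio(red, green), describe(red, green))
```

The log scale makes over- and underexpression symmetric around zero, which the raw ratio (2 versus 0.5) is not.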

cDNA plotted microarray
11
Microarray Expression Value Representation
expression value types
composite spots
primary measurements
derived values
primary spots
composite images e.g., green/red ratios
primary images
Source: MGED
12
Gene expression database: a conceptual view
Samples
Gene expression matrix
Genes
Gene expression levels
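A gene expression matrix of this kind can be sketched as a plain table with one row per gene and one column per sample. The orientation, the gene and sample names, and the values below are all assumptions made for illustration; the slide only names the three parts.

```python
# Row labels (genes) and column labels (samples); names are hypothetical.
genes = ["gene1", "gene2", "gene3"]
samples = ["sampleA", "sampleB"]

# levels[i][j] is the expression level of genes[i] in samples[j]
# (for example a log ratio); the numbers are made up.
levels = [
    [1.0, -0.5],
    [0.0, 2.0],
    [-1.0, 0.3],
]

def expression(gene, sample):
    """Look up a single gene expression level."""
    return levels[genes.index(gene)][samples.index(sample)]

def gene_profile(gene):
    """One row: the levels of a gene across all samples."""
    return list(levels[genes.index(gene)])

def sample_profile(sample):
    """One column: the levels of all genes in one sample."""
    j = samples.index(sample)
    return [row[j] for row in levels]

print(expression("gene2", "sampleB"))
print(gene_profile("gene1"))
print(sample_profile("sampleA"))
```

Clustering genes compares rows of this matrix, while classifying samples compares columns.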
13
(No Transcript)
14
DAG Representation of Biomaterials
Source: MGED
15
ArrayExpress (MGED) Design
Source: MGED
16
ArrayExpress (MGED) Architecture
application server
Web server
MAML data
ArrayExpress
data warehouse
data submission Curation database
image server?
Curation pipeline
Source: MGED
17
Issues in Storage
  • Size of Data
  • Experiments
  • 100 000 genes, 320 cell types
  • 2000 compounds, 3 time points, 2 concentrations,
    2 replicates
  • Data
  • 8 x 10^11 data points
  • 1 x 10^15 bytes (1 petabyte) of data
  • Others
  • Raw data are images
  • lack of standard measurement units for gene
    expression
  • lack of standards for sample annotation
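The data-point figure on this slide follows from multiplying the experimental dimensions. The short calculation below reproduces it. The bytes-per-point number is only what the quoted petabyte total implies; the slide does not say how that storage is split between images and derived values.

```python
# Experimental dimensions taken from the slide.
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = (genes * cell_types * compounds
               * time_points * concentrations * replicates)
print(f"{data_points:.2e} data points")  # roughly 8 x 10^11

# The slide quotes about 1 x 10^15 bytes (1 petabyte) in total.
# Dividing gives the storage implied per data point (a derived figure,
# not one stated on the slide).
total_bytes = 10**15
bytes_per_point = total_bytes / data_points
print(f"about {bytes_per_point:.0f} bytes per data point")
```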

18
Standardization
  • MIAME (Minimum Information About a Microarray Experiment)
  • Experimental design, Array design
  • Samples, Hybridisations
  • Measurements, Controls
  • OMG-LSR-DTF
  • Life Sciences Research Domain Task Force, Gene
    Expression RFP
  • Submitters: EBI (MAML), Rosetta (GEML), NetGenics
  • Proposed MAGE-ML (MAML + GEML)
  • Annotations and data, with data stored as a set of
    external 2D matrices
  • Data format independent of particular scanner or
    image analysis software
  • Samples and treatments can be represented as
    Directed Acyclic Graphs (DAGs)
  • Concept of composite images and composite spots
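To make the DAG idea concrete, the sketch below records which biomaterial is derived from which and asks for an order in which every source comes before what is made from it. The material names are hypothetical and this is not the MAGE-ML or ArrayExpress schema. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Each biomaterial maps to the materials it was derived from.
# All names are hypothetical.
derived_from = {
    "cell_culture": [],
    "treated_culture": ["cell_culture"],
    "control_rna": ["cell_culture"],
    "treated_rna": ["treated_culture"],
    "cy3_labeled_extract": ["control_rna"],
    "cy5_labeled_extract": ["treated_rna"],
    "hybridization_mix": ["cy3_labeled_extract", "cy5_labeled_extract"],
}

def processing_order(graph):
    """List the materials so that every source precedes what is derived from it."""
    return list(TopologicalSorter(graph).static_order())

def is_acyclic(graph):
    """True if the derivation graph has no cycles, i.e. it is a valid DAG."""
    try:
        processing_order(graph)
    except CycleError:
        return False
    return True

order = processing_order(derived_from)
print(order)
print(is_acyclic(derived_from))
```

A graph rather than a tree is needed because one material can have several sources, as with the pooled hybridization mix here.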

19
3. Data Analysis (Clustering)
  • Normalization
  • Hierarchical Clustering
  • Divisive Clustering
  • Other Methods
  • Visual Tools

20
Normalization
  • Assumption
  • Average expression ratio = 1
  • The amount of mRNA from both samples is the same
  • Total Intensity
  • Calculate a factor to rescale the intensities of all
    the genes so that
  • total Cy3 = total Cy5
  • Regression Techniques
  • Adjust the intensities so that
  • Slope of the scatter plot of Cy3 vs Cy5 = 1
  • Using ratio statistics
  • Based on the expression of housekeeping genes, a
    probability density of the ratio is developed, which
    is used for normalization
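The total-intensity and regression approaches can be sketched in a few lines. The intensities are made up, the function names are hypothetical, and the regression is assumed to be a least-squares line through the origin, which the slide does not specify.

```python
def total_intensity_factor(cy3, cy5):
    """Factor that rescales Cy5 so that total Cy5 equals total Cy3."""
    return sum(cy3) / sum(cy5)

def slope_through_origin(x, y):
    """Least-squares slope of y against x for a line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical spot intensities; Cy5 is uniformly 1.5x brighter than Cy3.
cy3 = [100.0, 200.0, 400.0, 800.0]
cy5 = [150.0, 300.0, 600.0, 1200.0]

# Total intensity: rescale so that the two channel totals match.
factor = total_intensity_factor(cy3, cy5)
cy5_scaled = [v * factor for v in cy5]

# Regression: divide by the fitted slope so that the new slope is 1.
slope_before = slope_through_origin(cy3, cy5)
cy5_regressed = [v / slope_before for v in cy5]
slope_after = slope_through_origin(cy3, cy5_regressed)

print(factor, sum(cy3), sum(cy5_scaled))
print(slope_before, slope_after)
```

With a purely multiplicative dye bias, as in this toy data, both methods give the same correction. They differ when the bias depends on intensity.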

21
(No Transcript)
22
Clustering
  • Hierarchical
  • Single, Complete and Average Linkage
  • Divisive
  • K-means
  • Self Organizing Maps (SOM)
  • Others
  • Principal Component Analysis (PCA)
  • Supervised Methods

23
Hierarchical clustering
  • Distance metrics or Similarity Measures
  • Euclidean, Pearson, distance of slopes, etc.
  • Cost functions
  • Single Linkage
  • Min distance of any two members (one from each of
    the two clusters)
  • Complete Linkage
  • Max distance of any two members (one from each of
    the two clusters)
  • Average Linkage
  • UPGMA
  • WPGMA
  • Within Groups
  • Ward's Method
  • Join the clusters whose merge produces the smallest
    possible increase in the sum of squared errors
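The distance measures and linkage rules listed above can be written down directly. This is a minimal sketch with hypothetical function names and toy clusters. Pearson distance is taken here as 1 minus the correlation, which is one common convention, and average linkage is computed as the unweighted mean over all between-cluster pairs (UPGMA).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def pairwise(c1, c2, dist):
    """All distances between one member of c1 and one member of c2."""
    return [dist(a, b) for a in c1 for b in c2]

def single_linkage(c1, c2, dist=euclidean):
    return min(pairwise(c1, c2, dist))

def complete_linkage(c1, c2, dist=euclidean):
    return max(pairwise(c1, c2, dist))

def average_linkage(c1, c2, dist=euclidean):
    d = pairwise(c1, c2, dist)
    return sum(d) / len(d)

# Two toy clusters of 2-D profiles.
cluster1 = [(0.0, 0.0), (0.0, 1.0)]
cluster2 = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster1, cluster2))    # closest pair
print(complete_linkage(cluster1, cluster2))  # farthest pair
print(average_linkage(cluster1, cluster2))   # mean over all pairs
print(pearson_distance((1.0, 2.0, 3.0), (2.0, 4.0, 6.0)))
```

An agglomerative run repeatedly merges the two clusters with the smallest linkage value, so the choice of linkage and of distance function can change the resulting tree.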

24
(No Transcript)
25
Divisive clustering
  • K-means
  • k random (or specified) points used to create
    clusters, average vectors for the clusters then
    used iteratively
  • Knowledge of the probable number of clusters (k) needed
  • Used in combination with PCA and hierarchical
    clustering
  • Self Organizing Maps
  • User defined geometric configurations as
    partitions
  • Random vectors generated for each partition and
    TRAINED till convergence (ANN based)
  • Visualization Methods
  • Helps in cluster visualization
  • Scatter Plot, Web plot, histogram
  • May help in clustering itself
  • E.g., SuperGrouper utility of MaxdView
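The K-means loop described above (assign each point to the nearest center, replace each center by the average vector of its cluster, repeat) can be sketched as follows. The data and the function names are hypothetical, and the starting points are specified rather than random so that the run is reproducible.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def mean_vector(points):
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def k_means(points, centers, max_iter=100):
    """Iterate assignment and averaging until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean_vector(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return labels, centers

# Hypothetical 2-D expression profiles forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centers)
```

The number of starting centers fixes k in advance, which is why some prior idea of the probable number of clusters is needed.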

26
(No Transcript)
27
Other Clustering Methods
  • PCA (Principal Component Analysis)
  • Closely related to SVD (Singular Value Decomposition)
  • Reduces dimensionality of gene expression space
  • Finds best view that helps separate data into
    groups
  • Supervised Methods
  • SVM (Support Vector Machine)
  • Previous knowledge of which genes expected to
    cluster is used for training
  • Binary classifier uses feature space and
    kernel function to define an optimal
    hyperplane
  • Also used for classification of samples:
    expression fingerprinting for disease
    classification
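The dimensionality reduction performed by PCA can be illustrated with the first principal component alone. The sketch below centers the data, builds the covariance matrix and finds its leading eigenvector by power iteration. That method is chosen here only because it needs no external library; the slides do not prescribe it. The data and function names are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of already centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def first_principal_component(cov, iterations=200):
    """Power iteration: multiply by cov and renormalize repeatedly.

    Assumes the all-ones start vector is not orthogonal to the leading
    eigenvector, which holds for the toy data below.
    """
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def project(centered, component):
    """Coordinate of each row along the component (2-D reduced to 1-D)."""
    return [sum(v * c for v, c in zip(row, component)) for row in centered]

# Hypothetical profiles measured under two conditions, lying on one line.
rows = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0), (8.0, 4.0)]
centered = center_columns(rows)
pc1 = first_principal_component(covariance_matrix(centered))
scores = project(centered, pc1)
print(pc1)
print(scores)
```

Here the two measured dimensions collapse onto a single axis without losing information, because the toy data are perfectly correlated. Real expression data would keep several components.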

28
(No Transcript)
29
4. Conclusion and References
  • Microarrays make HTS with hybridization possible
  • No single standard unit for measuring expression
    levels
  • Handling and interpretation not yet exact
  • Assumption: elements in a cluster must share some
    commonality
  • Classification depends on method used for
    clustering, normalization, distance function
  • No single correct way of classification; biological
    understanding is the ultimate guide
  • Provides extension to existing knowledge (e.g.,
    classifying a novel gene into a known pathway)

30
Software
  • Databases
  • Public repositories
  • GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
  • In-house databases
  • Stanford, MIT, University of Pennsylvania
  • Organism specific databases
  • Mouse Genome Informatics Database
  • Proprietary databases
  • Gene Logic, NCI, Synergy (NetGenics), Genomics
    Knowledge Platform (Incyte)
  • Analysis Tools
  • Public Domain
  • maxdView (University of Manchester)
  • CyberT, RCluster interfaces of GeneX
  • Proprietary
  • Spotfire, Xpression NTI (InforMax Inc.)

31
References
  • Microarray Gene Expression Database Group
  • http://www.mged.org
  • National Center for Genome Resources
  • http://genex.ncgr.org
  • University of Manchester, Bioinformatics Group
  • http://bioinf.man.ac.uk/microarray/resources.html
  • Nature Reviews Genetics
  • http://www.nature.com/nrg/