1
Introduction to Microarray Analysis
  • Uma Chandran PhD, MSIS
  • Department of Biomedical Informatics
  • chandranur_at_upmc.edu
  • 412-623-7841
  • 12/17/08

2
My Background
  • Bioinformatics Analysis Service
  • UPCI
  • Department of Biomedical Informatics
  • Clinical Genomics Facility
  • Runs expression, SNP and microRNA microarrays
  • Bioinformatics tightly integrated with data
    analysis
  • Expression, SNP, proteomic, integration of
    proteomic and genomic data

3
Workshop Objectives
  • Introduction to microarray analysis
  • Understand general principles
  • BRB Array Tools from NCI
  • HSLS also offers Array Assist, Genespring GX
  • Not an advanced analysis course; those are offered
    through DBMI and Biostatistics
  • Not a statistics course
  • Will discuss some statistical issues
  • Should consult literature, statistician to
    understand methods in detail

4
What is a microarray
  • Probes on chips
  • Detect target RNA in samples
  • High throughput
  • 10000s of specific probes
  • Measure global gene expression
  • Glass beads, chips, slides

5
(No Transcript)

6
Bioinformatic approaches for analysis
  • Measuring 10000s of data points simultaneously
  • High dimensional data
  • 10 experiments x 50K = 500K data points
  • How to find real differences over the noise
  • Statistical approaches

7
Bioinformatic approaches for analysis
  • Class Comparison
  • Which genes are up or down in tumors v normal,
    untreated v treated
  • Class Discovery
  • Within the tumor samples, are there subgroups
    that have a specific expression profile?
  • Class prediction, pathway analysis etc

8
Challenges in microarray analysis
  • Different platforms
  • Illumina, Affymetrix, Agilent
  • Many file types, many data formats
  • Need to learn platform dependent methods and
    software required
  • Analysis
  • How to get started?
  • Which methods? Which software? Many freely
    available tools. Some commercial
  • How to interpret results

9
Public databases
  • Many sources for public data: labs, consortia,
    government
  • Publications require that data files including
    raw files be made public
  • GEO - http://www.ncbi.nlm.nih.gov/geo/
  • Array Express - http://www.ebi.ac.uk/arrayexpress/
    ae-main0

10
What tools to use
  • Gene Spring GX (HSLS)
  • Today's exercise uses
  • BRB Array Tools from NCI
  • Excel Interface
  • First install the R statistical package and
    Bioconductor packages
  • Fairly easy to use if you don't have access to
    commercial tools
  • Analysis is robust, display and graphics are
    minimal
  • Learn concepts using BRB

11
GEO data for Exercise
  • Raetz et al
  • Characterization of T-ALL, T-LL, B-ALL
  • T-ALL and T-LL are morphologically
    indistinguishable
  • Are there expression differences?
  • Class comparison, class discovery, prediction

12
Files
  • C:\Desktops\cmcclass\HSLSclass\
  • Treeview, Cluster
  • Files may also be under C:\

13
Hands on 1
  • Google GEO
  • Query Raetz et al
  • Open .cel files for class
  • .cel files
  • Affymetrix files
  • Affymetrix has many file types, including .dat,
    .cel, .chp
  • Need Affy software to open these files
  • Freely downloadable

14
Microarray analysis: Data Preprocessing
  • Objective
  • Convert image of thousands of signals to a
    signal value for each gene or probe set
  • Multiple steps
  • Image analysis
  • Background and noise subtraction
  • Normalization
  • Expression value for a gene or probe set
  • Image analysis and background/noise subtraction
    usually done by proprietary software

Gene         Signal
Gene 1       100
Gene 2       150
Gene 3       75
...
Gene 10000   500

15
Normalization
Treated Control
  • Corrects for variation in hybridization etc
  • Assumption that no global change in gene
    expression
  • Without normalization
  • Intensity value for gene will be lower on Chip B
  • Many genes will appear to be downregulated when
    in reality they are not

Gene         Chip A   Chip B
Gene 1       100      50
Gene 2       150      75
Gene 3       75       32
...
Gene 10000   500      250

16
How to normalize?
  • Many methods, e.g. Affy MAS5.0
  • Median scaling: median intensity for all chips
    should be the same
  • Known genes, house keeping, invariant genes
  • Quantile - RMA
  • Normalization method may differ depending on
    platform
  • Illumina: cubic spline
  • Affymetrix
  • Choose method
  • .cel to .chp file
  • Which method to choose?
  • Know the biology
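
As a rough illustration of two of the methods listed above, the sketch below applies median scaling and a simple quantile normalization to two chips. The intensities reuse the four example values from the Normalization slide; the function names are hypothetical, and this is not the MAS5.0, RMA or BRB Array Tools implementation (ties, background correction and probe-level summarization are all ignored).

```python
from statistics import median

def median_scale(chips):
    """Rescale each chip so that all chips share the same median intensity."""
    medians = [median(chip) for chip in chips]
    target = sum(medians) / len(medians)  # common target: mean of the chip medians
    return [[value * target / m for value in chip]
            for chip, m in zip(chips, medians)]

def quantile_normalize(chips):
    """Give every chip the same intensity distribution (ties are not averaged)."""
    n_genes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean across chips at each rank
    reference = [sum(s[rank] for s in sorted_chips) / len(chips)
                 for rank in range(n_genes)]
    normalized = []
    for chip in chips:
        order = sorted(range(n_genes), key=lambda i: chip[i])  # gene indices, lowest first
        result = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            result[gene_index] = reference[rank]
        normalized.append(result)
    return normalized

# Example values for four genes on two chips; chip B is uniformly dimmer
chip_a = [100.0, 150.0, 75.0, 500.0]
chip_b = [50.0, 75.0, 32.0, 250.0]

scaled_a, scaled_b = median_scale([chip_a, chip_b])
quant_a, quant_b = quantile_normalize([chip_a, chip_b])
print("median scaled:", scaled_a, scaled_b)
print("quantile normalized:", quant_a, quant_b)
```

After either step the genes on chip B no longer look uniformly downregulated, which is the point of normalizing before comparing chips.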

17
(No Transcript)

18
BRB Array tools
  • Website
  • Excel plug-in; uses R and Fortran
  • Import, choose correct format
  • From .cel files
  • Process using GCRMA or MAS5.0
  • Or directly from processed files
  • Attaches annotation
  • Create experiment labels

19
Illumina Format

20
Hands on 2
  • Open Excel
  • Click on Array Tools
  • Look at data import options
  • Wizard
  • General format: Affy, non-Affy
  • Affy data
  • Import .cel files or already normalized files
  • Various normalization options
  • Clicking OK will import the data; don't do this now
    because it is time and memory intensive
  • Already normalized files: look at MAS5.0.txt
  • For the next step, we will work with data that
    has already been imported into BRB

21
Hands on continued
  • Look at a normalized file: MAS5.0.txt
  • Open this file in Excel
  • Absent, Present calls unique to Affy
  • This already normalized file can also be imported
    into BRB

22
Quality control
  • Will not go into detail here because platform
    specific
  • Read the literature for platform
  • Open the file labeled MAS5.0 report. Folder: cel
    file to import
  • Scale factor
  • P calls
  • Should be at least 40%
  • If RNA quality poor, then fewer present calls
  • Control probes
  • GAPDH has 3', middle and 5' probes
  • Ratio of 3'/5'
  • Other spike-in probes for cDNA to cRNA
    synthesis
  • For hybridization
  • Background/noise
  • All platforms will have control probes and
    quality metrics
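
The two Affymetrix checks above (percentage of present calls and the 3'/5' ratio of a control gene) come down to simple arithmetic. The sketch below uses made-up detection calls and signal values; the function names are hypothetical, and the 40 cutoff is read here as a percentage, which is an assumption.

```python
def percent_present(calls):
    """Percentage of probe sets with a Present ('P') detection call."""
    return 100.0 * sum(1 for call in calls if call == "P") / len(calls)

def three_to_five_ratio(signal_3prime, signal_5prime):
    """Ratio of the 3' probe-set signal to the 5' probe-set signal of a control gene."""
    return signal_3prime / signal_5prime

# Hypothetical detection calls for one chip: P = Present, M = Marginal, A = Absent
calls = ["P", "A", "P", "P", "M", "A", "P", "A", "P", "A"]
present = percent_present(calls)
print(f"Present calls: {present:.0f}%")
if present < 40:  # assumed cutoff, taken from the slide and read as 40%
    print("Fewer present calls than expected; check RNA quality")

# Hypothetical GAPDH signals from the 3' and 5' probe sets
ratio = three_to_five_ratio(1200.0, 1000.0)
print(f"GAPDH 3'/5' ratio: {ratio:.2f}")
```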

23
BRB import
  • Spot filters
  • For Affy array, check off
  • For other arrays, could exclude if negative
  • Set to threshold
  • Normalization
  • If importing already normalized as in MAS5.0,
    check off
  • If RMA, already normalized, check off
  • For other methods that do not automatically
    normalize, choose a method here
  • Gene Filters
  • For now, leave at 20%
  • Rest, check off

24
Hands on
  • Go to BRB analysis folder
  • BRB analysis\Project\Raetz.xls file
  • This file shows what an imported project looks
    like
  • Experiment Descriptor is the class label for each
    experiment

25
Data Analysis
  • Part 2- Data analysis
  • Class discovery
  • Class comparison
  • Class prediction
  • Biological annotation
  • Pathway analysis

26
Class Discovery
  • Objective?
  • Can data tell us which classes are similar?
  • Are there subgroups?
  • Do T-ALL, T-LL, B-ALL fall into distinct groups?
  • Methods
  • Hierarchical clustering
  • K-means, SOM etc
  • These are Unsupervised Methods
  • Class IDs are not known to the algorithm
  • For example, does not know which one is cancer or
    non cancer
  • Do the expression values differentiate the
    classes? Does the method discover new classes?

27
Hands on Class discovery
  • Multidimensional scaling in BRB
  • Raetz.xls
  • Choose defaults in BRB
  • Eisen's Cluster
  • Filter
  • Accept
  • Adjust
  • Cluster
  • Different clustering metrics will give different
    results
  • Not a very robust method but very popular
  • Use as exploratory tool

28
Multidimensional scaling - MDS

29
Hands on - Hierarchical Clustering
  • Eisen Cluster and Treeview
  • Import data
  • Filter
  • Filter or not to filter, P calls, SD etc
  • Accept filter
  • Adjust data
  • Log transform (important), center, normalize
  • Clustering
  • Cluster array or genes
  • Clustering genes is computationally intensive
  • Choose distance metric
  • .cdt file created
  • Open with Treeview
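
The adjust and cluster steps above can be sketched in a few lines. This is a minimal illustration, not what Cluster or BRB Array Tools do internally: the sample names and intensities are made up, the distance metric (1 minus Pearson correlation) and average linkage are just one possible choice, and the function names are hypothetical.

```python
import math

def log2_center(profile):
    """Log-transform one sample's intensities, then center them on zero."""
    logged = [math.log2(value) for value in profile]
    mean = sum(logged) / len(logged)
    return [value - mean for value in logged]

def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(samples):
    """Agglomerative clustering of arrays; returns the merges in order."""
    names = list(samples)
    dist = {(a, b): correlation_distance(samples[a], samples[b])
            for a in names for b in names if a != b}
    clusters = {name: [name] for name in names}

    def cluster_distance(p, q):
        # Average distance over all pairs of members of the two clusters
        pairs = [(a, b) for a in clusters[p] for b in clusters[q]]
        return sum(dist[pair] for pair in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        p, q = min(((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((p, q))
        clusters[p + "+" + q] = clusters.pop(p) + clusters.pop(q)
    return merges

# Hypothetical intensities for five genes in four samples
raw = {
    "T-ALL_1": [200, 50, 400, 80, 100],
    "T-ALL_2": [220, 45, 380, 90, 110],
    "B-ALL_1": [40, 300, 60, 500, 100],
    "B-ALL_2": [50, 280, 70, 450, 120],
}
samples = {name: log2_center(values) for name, values in raw.items()}
merges = average_linkage(samples)
for left, right in merges:
    print("merge:", left, "with", right)
```

The class labels in the sample names are never used by the algorithm, which is what makes this an unsupervised method. Swapping in a different distance function is the easiest way to see that the choice of metric changes the tree.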

30
Class comparison: differential expression analysis
  • What genes are up regulated between control and
    test or multiple test conditions
  • Normal v tumor
  • Treated v untreated
  • Fold change
  • Not sufficient, need statistics
  • Statistics
  • t test, non-parametric tests, FDR
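
To make the point that fold change alone is not sufficient, the sketch below compares two hypothetical genes with the same two-fold change but very different replicate variability. The intensities are made up, and Welch's t statistic is used as one possible choice of test statistic.

```python
import math
from statistics import mean, stdev

def fold_change(test, control):
    """Ratio of the group means."""
    return mean(test) / mean(control)

def welch_t(test, control):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = math.sqrt(stdev(test) ** 2 / len(test) + stdev(control) ** 2 / len(control))
    return (mean(test) - mean(control)) / se

# Hypothetical intensities, four replicates per group
gene_a_control = [98.0, 101.0, 100.0, 101.0]   # tight replicates, mean 100
gene_a_tumor = [199.0, 202.0, 198.0, 201.0]    # tight replicates, mean 200
gene_b_control = [40.0, 160.0, 70.0, 130.0]    # noisy replicates, mean 100
gene_b_tumor = [60.0, 340.0, 110.0, 290.0]     # noisy replicates, mean 200

for name, tumor, control in [("gene A", gene_a_tumor, gene_a_control),
                             ("gene B", gene_b_tumor, gene_b_control)]:
    print(name,
          "fold change:", fold_change(tumor, control),
          "t statistic:", round(welch_t(tumor, control), 2))
```

Both genes show a fold change of 2, but only gene A has a large t statistic, because its replicates agree with each other.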

31
Class comparison
  • Many analysis methods
  • May produce different results
  • Different underlying statistics and methods
  • t test
  • t test with permutations
  • SAM
  • Empirical Bayesian
  • Depends on underlying assumptions about data
  • High throughput data with many rows and few
    samples
  • What is the distribution
  • Variance from gene to gene
  • Save raw data files to try different methods and
    compare results
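
As one example of how these methods differ in their assumptions, a permutation test makes no assumption about the distribution of the data: it asks how often a random relabeling of the samples gives a difference at least as large as the observed one. The sketch below is a simplified illustration for a single gene with made-up log2 values; it uses the difference in means rather than a t statistic, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabeling the pooled samples into two groups
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical log2 expression values for one gene
control = [7.1, 6.9, 7.3, 7.0]
tumor = [8.4, 8.9, 8.6, 8.8]
p_value = permutation_p_value(control, tumor)
print("permutation p value:", round(p_value, 4))
```

With four samples per group there are only 70 possible relabelings, so the smallest achievable two-sided p value is 2/70. This is one reason sample size matters when many genes are tested.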

32
Fold change does not take variation into account
Modified from mAdb
http://nciarray.nci.nih.gov/

33
Hypothesis Testing
(Figure labels: Normal, Tumor; mean1, mean2 and their difference d; null hypothesis, alternative hypotheses)

34
Statistical power
  • t test
  • Test hypothesis that the two means are not
    statistically different
  • Adding confidence to the fold change value
  • Mean
  • Standard deviation
  • Sample size
  • Calculates statistic
  • You choose cutoff or threshold
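  • Give me gene list at a cutoff of p < 0.05
  • 95% confidence that the means for that gene in
    control and treated are different (strictly, a
    difference this large would arise by chance less
    than 5% of the time if the means were equal)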

35
Experimental Design: Very important!!!
  • Sample size
  • How many samples in test and control
  • Will depend on many factors such as whether
    tissue culture or tissue sample
  • Power analysis
  • Replicates
  • Technical v biological
  • Biological replicates are more important for more
    heterogeneous samples
  • Need replicates for statistical analysis
  • To pool or not to pool
  • Depends on objective
  • Sample acquisition or extraction
  • Laser captured or gross dissected
  • All experimental steps from sample acquisition to
    hybridization
  • Microarray experiments are very expensive. So,
    plan experiments carefully
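
A power analysis for sample size can be approximated with a standard normal-approximation formula for comparing two group means. The sketch below is a rough guide only: the effect size and standard deviation are assumed to be on the log2 scale, the function name is hypothetical, and the default cutoff of 0.05 does not account for testing thousands of genes at once.

```python
import math
from statistics import NormalDist

def samples_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate samples needed per group to detect a mean difference `effect`.

    Uses n = 2 * ((z_(1-alpha/2) + z_power) * sd / effect) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Detect a two-fold change (1.0 on the log2 scale)
print("homogeneous samples (sd 0.5):", samples_per_group(effect=1.0, sd=0.5))
print("heterogeneous samples (sd 1.0):", samples_per_group(effect=1.0, sd=1.0))
print("stricter cutoff (alpha 0.001, sd 1.0):",
      samples_per_group(effect=1.0, sd=1.0, alpha=0.001))
```

Doubling the variability roughly quadruples the number of samples needed, which is why heterogeneous tissue samples need more biological replicates than cell lines.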

36
t tests
  • Results might look like
  • At p < 0.05, there are 300 genes up and 200 genes
    downregulated
  • 95% confidence that the means of these genes in
    the two groups are different
  • At p < 0.05, x genes up and y genes down with a
    fold change of at least 3.0

37
Multiple comparison
  • Microarrays have multiple comparison problem
  • p < 0.05 says 95% confidence that the means are
    different, therefore 5% due to chance
  • 5% of 10,000 is 500
  • 500 genes are picked up by chance
  • Suppose t tests select 1000 genes at a p of 0.05
  • 500/1000: approximately 50% of the genes will be
    false positives
  • Very high false discovery rate; need more
    confidence
  • How to correct?
  • Correction for multiple comparison
  • p value and a corrected p value

38
Corrections for multiple comparisons
  • Involve corrections to the p value so that the
    corrected p value is higher than the raw p value
  • Bonferroni
  • Benjamini-Hochberg
  • Significance Analysis of Microarrays
  • Tusher et al. at Stanford
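
The first two corrections listed above are short enough to sketch directly. The raw p values below are made up, and the function names are hypothetical; this is not the BRB Array Tools or SAM implementation.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, which control the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.008, 0.012, 0.041, 0.20]   # hypothetical raw p values for five genes
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

cutoff = 0.05
print("pass raw cutoff:       ", sum(p < cutoff for p in raw))
print("pass Bonferroni:       ", sum(p < cutoff for p in bonf))
print("pass Benjamini-Hochberg:", sum(p < cutoff for p in bh))
```

Both corrections raise the p values, so fewer genes pass the same cutoff. Bonferroni is the more stringent of the two.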

39
Hands on BRB
  • Class comparison
  • Choose comparison
  • Which tests are available?
  • P value cutoff
  • How is multiple testing correction being done?
  • Stringent p value, FDR
  • How is the output reported?
  • Can you figure out how many genes are regulated
    at different p values and different cutoffs
  • How to interpret results
  • Look at gene lists generated by our analysis v
    those generated in the paper

40
BRB Hands on
  • Check Experiment desc file
  • Set up Class Comparison
  • T-ALL v T-LL
  • Choose p value
  • Random variance
  • Options
  • Save file
  • Run

41
BRB Class Comparison
  • Output folder
  • Check the .html file
  • Look at results
  • P value
  • Fold change
  • Annotation
  • Click on annotation
  • Cut and paste to save into Excel

42
Many studies, many methods
Dupuy and Simon, JNCI 2007

43
How to manipulate Gene lists
  • Create gene lists
  • Venn Diagram
  • Can be done even though study done on different
    platforms
  • Compare MAS and RMA
  • Venn Diagram
  • Compare B-ALL v T-LL and T-LL v B-ALL
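
What a Venn diagram of two gene lists shows can also be computed with set operations. The gene lists below are made up for illustration. Matching on gene symbols rather than platform-specific probe IDs is what allows lists from different platforms to be compared.

```python
# Hypothetical gene lists from two analyses of the same data (e.g. MAS5.0 and RMA)
mas5_genes = {"CD3E", "CD19", "MYC", "NOTCH1", "TAL1"}
rma_genes = {"CD3E", "CD19", "LMO2", "NOTCH1", "PAX5"}

shared = mas5_genes & rma_genes      # overlap region of the Venn diagram
only_mas5 = mas5_genes - rma_genes   # found only with MAS5.0
only_rma = rma_genes - mas5_genes    # found only with RMA

print("shared:", sorted(shared))
print("MAS5.0 only:", sorted(only_mas5))
print("RMA only:", sorted(only_rma))
```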

44
Venn Diagram
http://www.pangloss.com/seidel/Protocols/venn.cgi
http://ncrr.pnl.gov/software/VennDiagramPlotter.stm

45
Conclusion
  • Other analysis
  • Class prediction
  • Gene list from class comparison can be used in
    pathway analysis
  • HSLS pathway workshops on Ingenuity, DAVID,
    Pathway Architect
  • Future
  • Integrate expression data with other data such as
    SNP or microRNA
  • GEO has some data analysis features