SNP chips - PowerPoint PPT Presentation

1
SNP chips
  • Advanced Microarray Analysis
  • Mark Reimers,
  • Dept Biostatistics, VCU, Fall 2008

2
Affy SNP chips
3
SNP Chip Probe Design
  • Ten 25-mers overlapping the SNP
  • Alleles A and B
  • Sense and Anti-sense
  • or PM and MM (old)

4
RMA for SNP chips
  • Initial Affy software wasn't very accurate
  • Rabbee & Speed (2006) proposed RLMM, an RMA-like
    method using
  • Quantile normalization
  • Two variables (A and B signals)
  • Discriminant analysis
  • Much better than Affy software
  • Variant (BRLMM) adopted by Affy
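
The quantile normalization step listed above can be illustrated with a minimal sketch. It assumes each chip is a plain list of intensities of equal length and ignores ties; the function name `quantile_normalize` is hypothetical and is not taken from RLMM or any Affymetrix software.

```python
def quantile_normalize(chips):
    """Force every chip to share the same intensity distribution.

    chips: list of equal-length lists of intensities, one list per chip.
    Returns new lists; ties are not treated specially (an assumption).
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]

    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized


# Toy data: two chips, three probes each (made-up numbers).
example = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every chip contains the same set of values, and only the ranking of the probes within each chip differs.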

5
Discriminating SNPs
  • Estimate a common covariance for the clusters on
    training set (HapMap) data
  • Separate clusters by Mahalanobis metric
  • Use pre-defined clusters and metric to tell apart
    alleles on new data
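
A minimal sketch of this kind of discriminant rule, assuming two-dimensional (A signal, B signal) points and a single pooled covariance shared by the three genotype clusters. The training points are made-up toy values, not HapMap data, and the helper names are hypothetical.

```python
def pooled_covariance(clusters):
    """clusters: dict genotype -> list of (a, b) training points.

    Returns (cluster means, pooled 2x2 within-cluster covariance).
    """
    means = {}
    sxx = sxy = syy = 0.0
    n_total = 0
    for genotype, pts in clusters.items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        means[genotype] = (mx, my)
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += len(pts)
    dof = n_total - len(clusters)  # one mean estimated per cluster
    return means, ((sxx / dof, sxy / dof), (sxy / dof, syy / dof))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance in 2-D, using the explicit 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def call_genotype(point, means, cov):
    """Assign the genotype whose cluster mean is nearest in Mahalanobis distance."""
    return min(means, key=lambda g: mahalanobis_sq(point, means[g], cov))


# Toy training set (hypothetical values standing in for HapMap calls).
training = {
    "AA": [(10.0, 2.0), (11.0, 3.0), (9.0, 2.5), (10.5, 2.0)],
    "AB": [(6.0, 6.0), (7.0, 6.5), (6.5, 7.0), (5.5, 6.0)],
    "BB": [(2.0, 10.0), (3.0, 11.0), (2.5, 9.0), (2.0, 10.5)],
}
means, cov = pooled_covariance(training)
new_call = call_genotype((9.5, 3.0), means, cov)
```

Because the cluster means and covariance are fixed from the training set, a new chip whose intensities are distributed differently can be miscalled, which is the concern raised on the next slide.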

6
Success Rate
  • 90% (MPAM) to 98% (CRLMM) called at comparable
    accuracy on HapMap data
  • Cross-validation estimate
  • BUT
  • New chips don't have the same distributions
    as the training set

7
CRLMM - a heroic solution
  • RLMM couldn't be extended across labs
  • Still problems with several hundred SNPs
  • CRLMM addresses both these issues by careful
    normalization
  • Achieves accuracy of 99.85% on hets, 99.95% on
    homozygotes
  • Most complicated statistical calculation in BioC!

8
CRLMM Overview
  • Normalize intensity on each chip separately by
  • Summarize θ_{A+}, θ_{B+}, θ_{A−}, θ_{B−} by median polish:
    M+ = θ_{A+} − θ_{B+}, M− = θ_{A−} − θ_{B−}
  • Model log ratio bias on each chip by
  • Estimate log ratio bias using E-M
  • Where Z_i indexes which SNP state is likely
  • k = 1, 2, 3 for AA, AB, BB

9
Normalization Step 1
  • Regress (PM) intensity on sequence predictors and
    fragment length

[Figures: h_b(t) for all four bases on two chips; g(L) and 95% CI on one chip]

10
Normalization Step 1
  • Too many h_b(t)'s
  • Impose constraint
  • h_b(t) is a cubic spline with 5 df on [1, 25]
  • Forces neighboring values of h to be close
  • Allows variation in smoothness (unlike loess)
  • Subtract fitted values from signal
  • BUT bias still present

11
Step 2: Summarization
  • Median Polish
  • Tukey's exploratory method for arrays of numbers
  • Iterative method
  • Subtract medians of each row and each column (and
    accumulate) until medians converge
  • Robust
  • Fast
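
The iterative row and column sweeps can be sketched as below. This is a generic median polish on a small table, assuming rows are probes and columns are chips; the function name, iteration cap, and tolerance are hypothetical choices, not the settings used in CRLMM.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Decompose table[i][j] into overall + row_eff[i] + col_eff[j] + residual."""
    rows, cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols

    for _ in range(max_iter):
        # Row sweep: subtract each row's median and accumulate it.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            for j in range(cols):
                resid[i][j] -= row_med[i]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Column sweep: subtract each column's median and accumulate it.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once the medians being removed are negligible.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid


# Toy table (made-up log intensities): 3 probes by 3 chips.
toy = [[7.0, 9.0, 10.0], [8.0, 10.0, 11.0], [10.0, 12.0, 13.0]]
overall, row_eff, col_eff, resid = median_polish(toy)
```

Under the rows-as-probes assumption, `overall + col_eff[j]` would serve as the summarized value for chip j, and a single outlying probe ends up in the residuals instead of shifting the summary.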

12
Step 3: Ratio Normalization
  • Fit bias function
  • of form
  • m reflects allele bases
  • But what is k?
  • Estimate by E-M

[Figure: f_L(L) for one chip]

13
E-M Algorithm
  • Systematic way to guess and improve
  • Start with putative assignments to classes
  • i.e. guess k based on overall separations
  • Estimate bias for each k: f_{i,k}
  • Use residuals from fit to classify again
  • Repeat until convergence
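
The guess-and-improve loop can be illustrated with a deliberately simplified sketch. It is not the CRLMM model: the bias functions f_{i,k} are replaced by one constant mean per class, the classes share a single variance, and the log ratios are made-up toy values. All names and starting values are hypothetical.

```python
import math


def em_three_classes(m_values, means=(2.0, 0.0, -2.0), n_iter=50):
    """E-M for a 3-class 1-D Gaussian mixture with shared variance.

    Classes k = 1, 2, 3 stand for AA, AB, BB; `means` is the initial guess.
    Returns (fitted class means, most likely class per SNP).
    """
    mu = list(means)
    sigma2 = 1.0
    weights = [1.0 / 3.0] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each class for each SNP given current fit.
        # The Gaussian normalizing constant cancels because variance is shared.
        resp = []
        for m in m_values:
            dens = [w * math.exp(-((m - u) ** 2) / (2.0 * sigma2))
                    for w, u in zip(weights, mu)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate class means, shared variance, and class weights.
        nk = [sum(r[k] for r in resp) for k in range(3)]
        mu = [sum(r[k] * m for r, m in zip(resp, m_values)) / nk[k]
              for k in range(3)]
        sigma2 = sum(r[k] * (m - mu[k]) ** 2
                     for r, m in zip(resp, m_values) for k in range(3))
        sigma2 = max(sigma2 / len(m_values), 1e-6)  # guard against collapse
        weights = [n / len(m_values) for n in nk]

    calls = [max(range(3), key=lambda k: r[k]) + 1 for r in resp]
    return mu, calls


# Toy log ratios (made-up): four SNPs near each of +2, 0, and -2.
toy_m = [2.1, 1.9, 2.0, 2.3, 0.1, -0.1, 0.0, 0.2, -1.9, -2.1, -2.0, -2.2]
fitted_means, calls = em_three_classes(toy_m)
```

In the slides' procedure the M-step would instead refit the bias function for each class and the E-step would reclassify from the residuals of that fit; the alternation between the two steps is the part this sketch is meant to show.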

14
Final Step: Calling
  • Aim: separation in two-dimensional log-ratio
    space
  • Accuracy > 99.85% on all HapMap calls