Title: Felix Naef
1. Felix Naef, Marcelo Magnasco, GL meeting, Nov. 19, 2001
felix_at_funes.rockefeller.edu
Outline
Excursions into GeneChip data analysis
- Background subtraction
- Probeset statistics
2. Background estimation
- estimate both the mean B and the fluctuations σ
- needed in the low-intensity regime
- includes light reflection from the substrate, photodetector dark current, and some cross-hybridization (i.e. small residues)
- by the CLT, the background is expected to be a Gaussian variable
3.
- idea: B is insensitive to MM and visible at low intensity
- select probes such that PM - MM < ε (locally?)
- use ε = 50 (new) or 100 (old settings)
- P(PM) or P(MM) is the convolution of a Gaussian and a step function
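The selection rule and the "Gaussian convolved with a step function" shape can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the use of the absolute difference in the selection, and the example values B = 200 and σ = 30. If the true signal has a flat (step-function) density for non-negative values and the background is Gaussian with mean B and width σ, the observed intensity density is proportional to the Gaussian cumulative distribution, which the code checks numerically.

```python
import math


def select_background_probes(pairs, eps=50.0):
    """Keep (PM, MM) pairs whose difference is below eps.

    Assumption: the difference is taken in absolute value.
    """
    return [(pm, mm) for pm, mm in pairs if abs(pm - mm) < eps]


def gaussian_pdf(x, mean, sigma):
    """Density of a Gaussian with the given mean and width."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_closed_form(x, b, sigma):
    """Unit step convolved with a Gaussian: the Gaussian CDF at x."""
    return 0.5 * (1.0 + math.erf((x - b) / (sigma * math.sqrt(2.0))))


def density_numeric(x, b, sigma, s_max=2000.0, ds=0.5):
    """Midpoint-rule integral over the signal s in [0, s_max] of the
    Gaussian background density evaluated at x - s."""
    n = int(s_max / ds)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        total += gaussian_pdf(x - s, b, sigma) * ds
    return total


# Hypothetical background parameters, for illustration only.
B, SIGMA = 200.0, 30.0
for x in (140.0, 200.0, 260.0):
    print(x, density_closed_form(x, B, SIGMA), density_numeric(x, B, SIGMA))
```

In this picture the rising edge of the low-intensity histogram carries the information about B (its midpoint) and σ (its width); how the fit was actually performed is not stated on the slide.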
4. dependence on ε
5. trick for dealing with negative values
6. PM vs. MM distribution
zoom
7. PM vs. MM histogram
8. MM > PM across different chips
MM > PM is not concentrated at low intensities: 27% of probe pairs with MM > PM are in the top quartile
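A statistic of this kind can be computed as in the following sketch. The definition of a pair's intensity (here the mean of PM and MM) and the nearest-rank quartile cutoff are assumptions, since the slide does not say how the quartiles were defined; the function name and the toy data are hypothetical.

```python
def top_quartile_fraction(pairs):
    """Fraction of probe pairs with MM > PM that lie in the top
    intensity quartile of all pairs.

    Assumption: the intensity of a pair is the mean of PM and MM.
    """
    intensities = sorted((pm + mm) / 2.0 for pm, mm in pairs)
    # Nearest-rank 75th percentile over all probe pairs.
    cutoff = intensities[int(0.75 * len(intensities))]
    flagged = [(pm, mm) for pm, mm in pairs if mm > pm]
    if not flagged:
        return 0.0
    in_top = sum(1 for pm, mm in flagged if (pm + mm) / 2.0 >= cutoff)
    return in_top / len(flagged)


# Toy data, not taken from any chip.
toy_pairs = [(10, 12), (20, 15), (30, 35), (40, 30),
             (50, 45), (60, 70), (70, 65), (80, 90)]
print(top_quartile_fraction(toy_pairs))
```

If MM > PM pairs were spread evenly over intensities, this fraction would be close to 0.25, which is the baseline against which the figure on the slide should be read.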
9. probe pair trajectories (80 chips)
- take all (PM, MM) for a given probe set
- center of mass (x, y)
- ellipsoid of inertia → σ1 and σ2
- histogram the centers of mass
- color code according to σ = σ1 / S(min(x, y)) (noise detrending)
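The center of mass and the two principal widths of the ellipsoid of inertia can be sketched as follows. This assumes the widths are the square roots of the eigenvalues of the 2x2 covariance matrix of the (PM, MM) points, and that eccentricity means the ratio of the larger to the smaller width; neither definition is spelled out on the slides, and the function names are hypothetical.

```python
import math


def trajectory_ellipse(points):
    """Return the center of mass and the two principal widths of a
    set of (x, y) points, with the larger width first."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the (population) covariance matrix.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = 0.5 * (sxx + syy)
    radius = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam1 = half_trace + radius
    lam2 = max(half_trace - radius, 0.0)  # guard against round-off
    return (cx, cy), math.sqrt(lam1), math.sqrt(lam2)


def eccentricity(width1, width2):
    """Assumed definition: ratio of the larger to the smaller width."""
    return math.inf if width2 == 0 else width1 / width2


# Toy trajectory, elongated along the x axis.
center, w1, w2 = trajectory_ellipse([(1, 0), (-1, 0), (0, 0.5), (0, -0.5)])
print(center, w1, w2, eccentricity(w1, w2))
```

Whether the points should be taken on a linear or logarithmic intensity scale is not stated; the sketch works on whatever coordinates it is given.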
10. all probe sets
blue: large σ, green: mid, red: small
11. probes with well defined trajectories (eccentricity > 3): 1/3 of probes
blue: large, green: mid, red: small
12. PM within a probe set
Is the brightness of the probes reasonably uniform? Or do different probes have very different hybridization efficiencies?
13. So what can possibly be happening?
- sequence dependent hybridization efficiencies
- are kinetic effects important?
- cross-hybridization beyond what is detectable by MM probes
- this is hard to assess without sequence info
- sequence dependent fabrication efficiencies?
- variable probe densities
14. Composite scores
- What have we learned from previous slides?
- MM are not consistently behaving as expected
- What about not using them?
- The probe set intensities vary over decades
- difficult to estimate absolute intensities using averages (alternative: Li and Wong)
- we focus on ratio scores
15. Outline of algorithm
- estimate background (mean and std)
- discard noisy and saturated probes
- use either only PM or PM-MM as raw intensities
- average the remaining log-ratios in an outlier-robust way (robust regression to intercept), SE
- normalize by centering the (possibly local) log-ratio distribution
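The steps above can be sketched for a single probe set compared between two chips. This is an illustration under stated assumptions, not the authors' implementation: the noise cut (`noise_k` background standard deviations), the saturation level, the use of one background estimate for both chips, and the Huber M-estimator standing in for "robust regression to intercept" are all choices made here. The standard error and local (intensity-dependent) centering are left out.

```python
import math
import statistics


def huber_location(values, c=1.345, tol=1e-9, max_iter=100):
    """Robust location estimate (an intercept-only robust regression)
    by iteratively reweighted averaging with Huber weights."""
    values = list(values)
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    scale = 1.4826 * mad  # MAD rescaled to match a Gaussian std
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = [1.0 if abs(v - mu) <= c * scale
                   else c * scale / abs(v - mu) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def probe_set_log_ratio(pm_a, pm_b, bg_mean, bg_std,
                        noise_k=2.0, saturation=40000.0):
    """Robust average log2 ratio of PM intensities, chip A over chip B.

    noise_k and saturation are hypothetical thresholds.
    """
    ratios = []
    for a, b in zip(pm_a, pm_b):
        a_net, b_net = a - bg_mean, b - bg_mean
        if a_net < noise_k * bg_std or b_net < noise_k * bg_std:
            continue  # too close to background: noisy probe
        if a >= saturation or b >= saturation:
            continue  # saturated probe
        ratios.append(math.log2(a_net / b_net))
    return huber_location(ratios) if ratios else None


def center(log_ratios):
    """Global normalization: subtract the median log-ratio."""
    med = statistics.median(log_ratios)
    return [r - med for r in log_ratios]


# Toy probe set: four clean probes, one saturated, one near background.
print(probe_set_log_ratio([1100, 2100, 4100, 8100, 50000, 120],
                          [600, 1100, 2100, 4100, 30000, 110],
                          bg_mean=100.0, bg_std=20.0))
```

Using PM - MM instead of PM as the raw intensity, the other option named in the outline, would only change what is passed in as `pm_a` and `pm_b` and how the background is handled.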