Title: Probe selection for Microarrays
1Probe selection for Microarrays
- Considerations and pitfalls
2Probe selection wish list
- Probe selection strategy should ensure
- Biologically meaningful results (The truth...)
- Coverage, Sensitivity (... The whole truth...)
- Specificity (... And nothing but the truth)
- Annotation
- Reproducibility
3Technology
- Probe immobilization
- Oligonucleotide coupling: synthesis with linker, covalent coupling to surface
- Oligonucleotide photolithography
- ds-cDNA coupling: cDNA generated by PCR, nonspecific binding to surface
- ss-cDNA coupling: PCR with one modified primer, covalent coupling, 2nd strand removal
- Spotting
- With contact (pin-based systems)
- Without contact (ink jet technology)
4Technology-specific requirements
- General
- Not too short (sensitivity, selectivity)
- Not too long (viscosity, surface properties)
- Not too heterogeneous (robustness)
- Degree of importance depends on method
- Single strand methods (Oligos, ss-cDNA)
- Orientation must be known
- ss-cDNA methods are not perfect
- ds-cDNA methods don't care
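The general requirements (not too short, not too long, not too heterogeneous) can be sketched as a simple filter over candidate probes. This is a minimal illustration only: the length bounds, the use of GC content as a stand-in for heterogeneity, and the names `gc_fraction` and `filter_probes` are all assumptions, and suitable values depend on the immobilization method.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_probes(probes, min_len=50, max_len=70, max_gc_dev=0.10):
    """Keep probes inside the length bounds whose GC content is close
    to the median GC content of the length-filtered set.

    All thresholds are illustrative assumptions, not recommendations.
    """
    # Length filter: not too short, not too long.
    sized = {name: s for name, s in probes.items()
             if min_len <= len(s) <= max_len}
    if not sized:
        return {}
    # Homogeneity filter: stay near the median GC fraction.
    gcs = sorted(gc_fraction(s) for s in sized.values())
    median = gcs[len(gcs) // 2]
    return {name: s for name, s in sized.items()
            if abs(gc_fraction(s) - median) <= max_gc_dev}


candidates = {
    "a": "ACGT" * 15,   # 60 nt, 50% GC
    "b": "AT" * 30,     # 60 nt, 0% GC (outlier)
    "c": "ACGT" * 5,    # 20 nt (too short)
    "d": "ACGA" * 15,   # 60 nt, 50% GC
}
kept = filter_probes(candidates)
```

GC content is only a rough proxy; a melting temperature estimate could be substituted in the same place.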
5Probe selection approaches
[Figure: probe selection approaches plotted by accuracy and throughput. Labels: selected genes, ESTs, selected gene regions, cluster representatives, anonymous]
6Non-Selective Approaches
- Anonymous (blind) spotting
- Using clones from a library without prior sequencing
- Only clones with interesting expression pattern are sequenced
- Normalization of library highly recommended
- Typical uses
- HT-arrays of exotic organisms or tissues
- Large-scale verification of Differential Display clones
- EST spotting
- Using clones from a library after sequencing
- Little justification, since sequence availability allows selection
7Spotting of cluster representatives
- Sequence Clustering
- For human/mouse/rat EST clones: public cluster libraries
- UniGene (NCBI)
- THC (TIGR)
- For custom sequences: clustering tools
- STACK_PACK (SANBI)
- JESAM (HGMP)
- PCP (Paracel, commercial)
8A benign clustering situation
9In the absence of 5'-3' links
Two clusters corresponding to one gene
10Overlap too short
Three clusters corresponding to one gene
11Chimeric ESTs
One cluster corresponding to two genes
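The clustering situations on the preceding slides can be reproduced with a toy single-linkage clustering. This is a sketch under stated assumptions: the EST names, the overlap lengths, the `min_overlap` threshold, and the function `cluster_ests` are hypothetical, and real clustering tools apply far more elaborate criteria.

```python
def cluster_ests(names, overlaps, min_overlap=40):
    """Single-linkage clustering: two ESTs join the same cluster if
    their pairwise overlap is at least min_overlap bases."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), length in overlaps.items():
        if length >= min_overlap:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


# Hypothetical data: est1..est3 come from gene X, est4 from gene Y,
# and "chim" is a chimeric EST containing parts of both genes.
names = ["est1", "est2", "est3", "chim", "est4"]
overlaps = {
    ("est1", "est2"): 120,
    ("est2", "est3"): 25,   # overlap too short: gene X is split
    ("est3", "chim"): 90,
    ("chim", "est4"): 80,   # chimera bridges gene X and gene Y
}
clusters = cluster_ests(names, overlaps)
```

In this toy data set the short est2/est3 overlap splits one gene across two clusters, while the chimeric EST pulls sequences from two genes into a single cluster.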
12Chimeric ESTs ... continued
- Chimeric ESTs are quite common
- Chimeric ESTs are a major nuisance for array probe selection
- One of the fusion partners is usually a highly expressed mRNA
- Double-picking of chimeric ESTs can fool even cautious clustering programs
- UniGene contains several chimeric clusters
- The annotation of chimeric clusters is erratic
- Chimeric ESTs can be detected by genome comparison
- There is one particularly bad class of chimeric sequences that will be the subject of the exercises
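Detection by genome comparison can be sketched as follows, assuming the EST has already been aligned to the genome by some external tool and the alignment is available as a list of blocks. The `Block` record, the function `looks_chimeric`, and the 1 Mb locus span are illustrative assumptions, not a description of any particular program.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One aligned segment of an EST (hypothetical record layout)."""
    est_start: int
    est_end: int
    chrom: str
    genome_start: int


def looks_chimeric(blocks, max_locus_span=1_000_000):
    """Flag an EST whose aligned blocks map to different chromosomes,
    or to positions on one chromosome further apart than a single
    locus plausibly spans (threshold is an assumption)."""
    if len(blocks) < 2:
        return False
    if len({b.chrom for b in blocks}) > 1:
        return True
    positions = [b.genome_start for b in blocks]
    return max(positions) - min(positions) > max_locus_span


# Spliced but non-chimeric: two exons 5 kb apart on one chromosome.
normal = [Block(0, 200, "chr1", 10_000), Block(200, 450, "chr1", 15_000)]
# Suspect: the two halves of the EST map to different chromosomes.
suspect = [Block(0, 200, "chr1", 10_000), Block(200, 450, "chr7", 88_000)]
```

A flag from such a check is a reason to inspect the clone, not proof of chimerism; genuine trans-spliced or rearranged transcripts would be flagged as well.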
13How to select a cluster representative
- If possible, pick a clone with completely known sequence
- Avoid problematic regions
- Alu-repeats, B1, B2 and other SINEs
- LINEs
- Endogenous retroviruses
- Microsatellite repeats
- Avoid regions with high similarity to non-identical sequences
- In many clusters, orientation and position relative to ORF are unknown and cannot be selected for
- Test selected clone for sequence correctness
- Test selected clone for chimerism
- Some commercial providers offer sequence-verified UniGene cluster representatives
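Of the problematic regions listed above, microsatellite repeats are simple enough to mask with a regular expression. The sketch below covers only that one class; SINEs, LINEs, and endogenous retroviruses need a repeat library and a dedicated masking tool. The function name `mask_microsatellites` and the thresholds (unit of 1 to 6 bases, at least 5 copies) are assumptions chosen for illustration.

```python
import re

# A unit of 1-6 bases followed by at least 4 further copies of itself.
_MICROSAT = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def mask_microsatellites(seq: str) -> str:
    """Replace simple tandem repeats with runs of 'N'."""
    return _MICROSAT.sub(lambda m: "N" * len(m.group(0)), seq.upper())


example = "GTTAGG" + "CA" * 6 + "TGGATT"
masked = mask_microsatellites(example)
```

Masked positions can then be excluded when choosing the region to spot or to search for cross-hybridizing sequences.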
14Selection of genes
- If possible, use all of them
- Biased selection
- Selection by tissue
- Selection by topic
- Selection by visibility
- Selection by known expression properties
- Selection from unbiased pre-screen
- Use sources of expression information
- EST frequency
- Published array studies
- SAGE data
15Selection of gene regions
[Figure: candidate gene regions for probe selection: 5' UTR, ORF, 3' UTR]
16Alternative polyadenylation
17Alternative polyadenylation
- Constitutive polyA heterogeneity
- 3' fragments: reduced sensitivity
- No impact on expression ratio
- Regulated polyA heterogeneity
- Fragment choice influences expression ratio
- Multiple fragments necessary
- Detection of cryptic polyA signals
- Prediction (AATAAA)
- Polyadenylated ESTs
- SAGE tags
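Prediction of candidate polyA signals amounts to scanning the 3' UTR for the AATAAA hexamer. A minimal sketch follows; the function name `find_polya_signals` is an assumption, and a hexamer match alone is weak evidence, which is why the slide also lists polyadenylated ESTs and SAGE tags as supporting data.

```python
def find_polya_signals(utr: str, motif: str = "AATAAA"):
    """Return 0-based start positions of every motif occurrence,
    including overlapping ones. RNA input (U) is converted to DNA (T)."""
    utr = utr.upper().replace("U", "T")
    hits = []
    pos = utr.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(motif, pos + 1)
    return hits


utr = "GCGC" + "AATAAA" + "CCTTGG" + "AATAAA" + "GG"
signals = find_polya_signals(utr)
```

More than one hit in a 3' UTR suggests that fragments downstream of the first signal may be absent from part of the transcript population.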
18Alternative splicing
19Alternative splicing
- Constitutive splice form heterogeneity
- Fragment in alternative exon: reduced sensitivity
- No impact on expression ratio
- Regulated splice form heterogeneity
- Fragment choice influences expression ratio
- Multiple fragments necessary
- Detection of alternative splicing events
- Hard/Impossible to predict
- EST analysis (beware of pre-mRNA)
- Literature
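The difference between constitutive and regulated heterogeneity can be made concrete with a small calculation; the same reasoning applies to alternative polyadenylation. The numbers and the function `measured_ratio` are invented for illustration: the probe is assumed to detect only the isoform that contains its target fragment.

```python
def measured_ratio(total_a, frac_a, total_b, frac_b):
    """Expression ratio (condition B / condition A) seen by a probe
    that detects only one isoform.

    total_x: total mRNA level of the gene in condition x
    frac_x:  fraction of that mRNA carrying the probed fragment
    """
    return (total_b * frac_b) / (total_a * frac_a)


# Constitutive heterogeneity: the isoform fraction stays at 25%.
# Signal is lower (25 and 50 instead of 100 and 200), but the
# ratio still reflects the true two-fold induction.
constitutive = measured_ratio(100, 0.25, 200, 0.25)

# Regulated heterogeneity: total mRNA is unchanged, but the isoform
# fraction shifts from 25% to 75%. The probe reports a three-fold
# change that does not exist at the level of total mRNA.
regulated = measured_ratio(100, 0.25, 100, 0.75)
```

This is why regulated heterogeneity calls for multiple fragments: only a probe shared by all isoforms, or one probe per isoform, separates a change in total level from a change in isoform usage.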
20Alternative promoter usage
21Alternative promoter usage
- What is the desired readout?
- If promoter activity matters most: multiple fragments
- If overall mRNA level matters most: downstream fragment
- Detection of alternative promoter usage
- Prediction difficult (possible?)
- EST analysis
- Literature
22UDP-Glucuronosyltransferases
[Figure: UGT1A8 and UGT1A7]
23Selection of gene regions
- Coding region (ORF)
- Annotation relatively safe
- No problems with alternative polyA sites
- No repetitive elements or other funny sequences
- danger of close isoforms
- danger of alternative splicing
- might be missing in short RT products
- 3' untranslated region
- Annotation less safe
- danger of alternative polyA sites
- danger of repetitive elements
- less likely to cross-hybridize with isoforms
- little danger of alternative splicing
- 5' untranslated region
- close linkage to promoter
- frequently not available
24A checklist
- Pick a gene
- Try to get a complete cDNA sequence
- Verify sequence architecture (e.g. cross-species comparison)
- Mask repetitive elements (and vector!)
- If possible, discard 3'-UTR beyond first polyA signal
- Look for alternative splice events
- Use remaining region of interest for similarity searches
- Mask regions that could cross-hybridize
- Use the remaining region for probe amplification or EST selection
- When working with ESTs, use sequence-verified clones
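The last steps of the checklist, picking what remains after masking, can be sketched as a search for the longest unmasked stretch. This assumes that the earlier steps have replaced repeats, vector, and potentially cross-hybridizing regions with 'N'; the function name `longest_unmasked_region` and the `min_len` default are hypothetical.

```python
import re


def longest_unmasked_region(masked_seq: str, min_len: int = 200):
    """Return (start, end) of the longest run of unmasked bases,
    end-exclusive, or None if no run reaches min_len."""
    best = None
    for m in re.finditer(r"[ACGT]+", masked_seq.upper()):
        if best is None or m.end() - m.start() > best[1] - best[0]:
            best = (m.start(), m.end())
    if best is None or best[1] - best[0] < min_len:
        return None
    return best


# Toy sequence: 12 clean bases, 4 masked, 40 clean, 2 masked, 2 clean.
seq = "ACGT" * 3 + "NNNN" + "ACGT" * 10 + "NN" + "AC"
region = longest_unmasked_region(seq, min_len=20)
```

The returned coordinates delimit the region to use for primer design or for choosing an EST that covers it.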