Title: Talk to geWorkbench group
1Talk to geWorkbench group
- Functional Genomics I Microarray Analysis
2The c-Src Protein
3Amino Acids
4Amino Acid Symbols
5Amino Acid Sequence of c-Src
- MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAE
- PKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGD
- WWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRES
- ETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL
- CHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTL
- KPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKY
- LRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYT
- ARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVER
- GYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
6Information Transfer at the Sequence Level: How a Protein is Made
7DNA
8Double-Stranded DNA
9Base Pairing in DNA
10Double Helical Structure of DNA
11RNA: Ribonucleic Acid
- Differs from DNA
- 1. Sugar has an extra oxygen
- 2. Uracil (U) instead of thymine (T)
- 3. Single strand not double helix
12Protein Synthesis
13How Many Nucleotides Are Required to Code for an Amino Acid?
14Three Nucleotides Code for an Amino Acid
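The counting argument behind this answer can be written out as follows, assuming the standard figures of 4 nucleotides and 20 amino acids (neither number is stated on the slide itself). A codon of length n can distinguish 4^n amino acids, so n = 3 is the smallest length that suffices:

```latex
\[
4^{1} = 4 < 20, \qquad 4^{2} = 16 < 20, \qquad 4^{3} = 64 \ge 20 .
\]
```

Because 64 codons cover only 20 amino acids plus stop signals, several codons map to the same amino acid.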
15The Genetic Code
16DNA Sequence of Src
- CATCGAGGTTTTGAGAGGCTAACTCTCCCAAAAAGGACCATGGGTAGCAACAAGAGCAAG
- CCCAAGGATGCCAGCCAGCGGCGCCGCAGCCTGGAGCCCGCCGAGAACGTGCACGGCGCT
- GGCGGGGGCGCTTTCCCCGCCTCGCAGACCCCCAGCAAGCCAGCCTCGGCCGACGGCCAC
- CGCGGCCCCAGCGCGGCCTTCGCCCCCGCGGCCGCCGAGCCCAAGCTGTTCGGAGGCTTC
- AACTCCTCGGACACCGTCACCTCCCCGCAGAGGGCGGGCCCGCTGGCCGGTGGAGTGACC
- ACCTTTGTGGCCCTCTATGACTATGAGTCTAGGACGGAGACAGACCTGTCCTTCAAGAAA
- GGCGAGCGGCTCCAGATTGTCAACAACACAGAGGGAGACTGGTGGCTGGCCCACTCGCTC
- AGCACAGGACAGACAGGCTACATCCCCAGCAACTACGTGGCGCCCTCCGACTCCATCCAG
- GCTGAGGAGTGGTATTTTGGCAAGATCACCAGACGGGAGTCAGAGCGGTTACTGCTCAAT
- GCAGAGAACCCGAGAGGGACCTTCCTCGTGCGAGAAAGTGAGACCACGAAAGGTGCCTAC
- TGCCTCTCAGTGTCTGACTTCGACAACGCCAAGGGCCTCAACGTGAAGCACTACAAGATC
- CGCAAGCTGGACAGCGGCGGCTTCTACATCACCTCCCGCACCCAGTTCAACAGCCTGCAG
- CAGCTGGTGGCCTACTACTCCAAACACGCCGATGGCCTGTGCCACCGCCTCACCACCGTG
- TGCCCCACGTCCAAGCCGCAGACTCAGGGCCTGGCCAAGGATGCCTGGGAGATCCCTCGG
17Translation of Src Gene to Src Protein
- CATCGAGGTTTTGAGAGGCTAACTCTCCCAAAAAGGACCATGGGTAGCAACAAGAGCAAG
- M G S N K S K
- CCCAAGGATGCCAGCCAGCGGCGCCGCAGCCTGGAGCCCGCCGAGAACGTGCACGGCGCT
- P K D A S Q R R R S L E P A E N V H G A
- .
- .
- .
- GAGCGGCCCACCTTCGAGTACCTGCAGGCCTTCCTGGAGGACTACTTCACGTCCACCGAG
- E R P T F E Y L Q A F L E D Y F T S T E
- CCCCAGTACCAGCCCGGGGAGAACCTCTAGGCACAGGCGGGCCCAGACCGGCTTCTCGGC
- P Q Y Q P G E N L
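The translation shown on this slide can be reproduced with a short sketch. It assumes the standard genetic code and reads the coding strand directly (T in place of U); the names `CODON_TABLE` and `translate` are illustrative and not part of the original talk.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, start=0):
    """Translate DNA codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


# First 60 bases of the Src sequence from the slide.
first_line = "CATCGAGGTTTTGAGAGGCTAACTCTCCCAAAAAGGACCATGGGTAGCAACAAGAGCAAG"
start = first_line.find("ATG")  # first start codon in this fragment
print(translate(first_line, start))

# Last 60 bases from the slide; the reading frame begins at position 0.
last_line = "CCCCAGTACCAGCCCGGGGAGAACCTCTAGGCACAGGCGGGCCCAGACCGGCTTCTCGGC"
print(translate(last_line))
```

The first call yields the opening residues M G S N K S K, and the second stops at the TAG codon after P Q Y Q P G E N L, matching the slide.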
18Transcription of DNA and translation of RNA vary with biological conditions
19Two kinds of microarray platforms
- Spotted Array - 2 color - Pat Brown (Stanford)
- Synthesized Oligonucleotide - 1 color - Affymetrix
20Spotted (2 Color) Arrays
21Concentrations from 2 Color Experiments
22Affymetrix 1 Color Arrays
23Spotted vs Affymetrix
- Spotted Arrays
  - Advantages: Long pieces.
  - Disadvantages: Uncertainties in spot reading.
- Affymetrix Arrays
  - Advantages: Probes in same place, can be read precisely.
  - Disadvantages: Short pieces.
24Concentrations from 1 Color Experiments
25Probeset intensity as an average of probe intensities
26Problems with averaging probes
- Var(probes within probeset) > Var(the same probe across slides)
- MM > PM (40% of slides)
27Problems to be solved in chip reading
- 1. Highly variable probe intensities compared to probe set intensity.
- 2. Correct for nonspecific binding realistically.
- 3. Correct for background within chips.
- 4. Correct for intensity variation between chips.
28Steps in GCRMA
- 1. Background correction - in each chip.
- 2. Normalization - between chips.
- 3. Summarization of probes to probe sets.
29GCRMA Background Correction
30Boxplot of Unnormalized Chips
31Quantile Normalization
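A minimal sketch of quantile normalization between chips, assuming each chip is a list of intensities for the same probes in the same order. The function name `quantile_normalize` is illustrative; ties are simply broken by position here, which a production implementation may handle differently.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    Each chip's values are replaced by the mean, across chips, of the
    values that share the same rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank across all chips.
    rank_means = [sum(s[r] for s in sorted_chips) / len(chips) for r in range(n)]

    normalized = []
    for chip in chips:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


chips = [[5, 2, 3], [4, 1, 6]]
print(quantile_normalize(chips))
```

After normalization every chip contains exactly the same set of values, so boxplots of the chips line up; only the assignment of values to probes differs between chips.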
32Intensity Plots
33Contributions to probe intensity
34Constraint on probe-effects
35Median polish algorithm
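A sketch of Tukey's median polish, assuming an additive model in which each log intensity is an overall level plus a row effect plus a column effect plus a residual. Treating rows as probes and columns as chips is an assumption made for illustration, as are the function and variable names.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Fit value = overall + row effect + column effect + residual
    by alternately sweeping out row and column medians."""
    rows, cols = len(table), len(table[0])
    resid = [list(map(float, row)) for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols

    for _ in range(max_iter):
        change = 0.0
        # Sweep the median out of each row.
        for i in range(rows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
            change += abs(m)
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep the median out of each column.
        for j in range(cols):
            m = median([resid[i][j] for i in range(rows)])
            for i in range(rows):
                resid[i][j] -= m
            col_eff[j] += m
            change += abs(m)
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        if change < tol:  # nothing left to sweep out
            break
    return overall, row_eff, col_eff, resid


# Perfectly additive toy table: 10 + row effect (-1, 0, 1) + column effect (-2, 0, 2).
table = [[7, 9, 11], [8, 10, 12], [9, 11, 13]]
overall, row_eff, col_eff, resid = median_polish(table)
print(overall, row_eff, col_eff)
```

Under the rows-as-probes assumption, the expression estimate for a chip is the overall level plus that chip's column effect. The fitted row effects have median zero, which is one way of imposing a constraint on the probe effects.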
36Expression ratios
37Need for a measure of variability
Experiment  Replicate A  Replicate B  Average
1           2            6            4
2           1            15           8
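The two experiments in the table can be compared with a short sketch that reports the sample standard deviation next to each average. Using the sample standard deviation as the measure of variability is an assumption here; the slide only says that some measure is needed.

```python
from statistics import mean, stdev

# Replicate values taken from the table above.
replicates = {1: [2, 6], 2: [1, 15]}

for experiment, values in replicates.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(experiment, mean(values), round(stdev(values), 2))
```

Experiment 2 has the larger average, but its replicates are spread far more widely (standard deviation near 9.9 against about 2.8), so the average alone overstates how reliable that measurement is.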
38Approximation of the normal distribution
39Equation of the normal distribution
- μ: mean (average)
- σ: standard deviation
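For reference, the standard form of the normal density, written with mean \mu and standard deviation \sigma:

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)
\]
```

The mean sets the center of the curve and the standard deviation sets its width.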
40Effect of the standard deviation
41Standard deviation and percent
42Estimates of the mean and standard deviation
43The z distribution
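A common form of the z statistic for a sample mean, given here as an assumption about what the slide presents: it measures how far the mean \bar{x} of n observations lies from a hypothesized mean \mu, in units of the standard error, when the population standard deviation \sigma is known.

```latex
\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
\]
```

Under the null hypothesis, z follows the standard normal distribution with mean 0 and standard deviation 1.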
44Does experimental CO2 = 10.00 mg/m3?
45The t-distribution
46The t-distribution of the difference of 2 means
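A sketch of a t statistic for the difference of two means. It uses Welch's unequal-variance form, which is an assumption; the talk may instead use the pooled-variance version. The function name `welch_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, degrees of freedom) for the difference of the means of a and b."""
    # Squared standard error contributed by each group.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


control = [1, 2, 3]
treated = [4, 5, 6]
t, df = welch_t(control, treated)
print(t, df)
```

The statistic is then compared against a t-distribution with the computed degrees of freedom to obtain a p-value.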
47Problems applying t-test to microarrays
- 1. Multiple tests - thousands of genes.
- 2. Multiple conditions - more than 2 conditions.
- Solution: LIMMA
- LInear Models for Microarray Analysis.
48The log transformation of intensities
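One reason for working with logs can be shown in a few lines: on the log2 scale, a two-fold increase and a two-fold decrease are equally far from zero. The function name `log_ratio` and the intensity values are illustrative.

```python
from math import log2


def log_ratio(treated, control):
    """log2 expression ratio of two intensities."""
    return log2(treated / control)


print(log_ratio(200, 100))  # two-fold up
print(log_ratio(100, 200))  # two-fold down
print(log_ratio(100, 100))  # no change
```

On the raw scale the same changes are ratios of 2 and 0.5, which are not symmetric around the no-change value of 1.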
49The t-distribution of the difference of 2 means
50Variance Stabilization
51Benjamini-Hochberg False Discovery Correction
- Uncorrected p-value: rate of false discovery if only 1 test.
- Corrected p-value: rate of false discovery if all of the genes above it were tested and accepted.
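A sketch of the Benjamini-Hochberg adjustment: each p-value is multiplied by the number of tests and divided by its rank, then the adjusted values are forced to be non-decreasing in rank. The function name and the example p-values are illustrative, not data from the talk.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvalues))
```

A gene's adjusted p-value estimates the false discovery rate incurred if that gene and every gene ranked above it are all declared significant.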
52Components of Multiple Comparisons
- 1. The number of conditions.
- 2. The number of comparisons between conditions.
53Multiple Conditions
54Comparing 2 conditions out of 3
55Bonferroni Correction for Multiple Comparisons
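The Bonferroni correction is simpler: each p-value is multiplied by the number of comparisons and capped at 1. A minimal sketch with illustrative p-values:

```python
def bonferroni(pvalues):
    """Return Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)  # number of comparisons
    return [min(1.0, p * m) for p in pvalues]


print(bonferroni([0.01, 0.04, 0.5]))
```

Equivalently, each comparison is tested at the significance level divided by the number of comparisons, which controls the chance of any false positive but becomes very conservative when thousands of genes are tested.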
56Bayesian Log Odds B-value
57AffylmGUI
- R (Statistical Programming Language)
- Bioconductor (R Programs for Biology)
- LIMMA
- AffylmGUI: 1 Color
- Limma GUI: 2 Color