Title: R and Eddie for Breast Cancer Bioinformatics
1. R and Eddie for Breast Cancer Bioinformatics
2. Microarrays
- Assay Samples on a Genome Wide Scale
- Developed in 1990s
- Widely Applied in Biology
3. Microarrays Widely Applied in Breast Cancer
Sorlie et al Nature 2000
4. Microarray Approaches in Breast Cancer
- Unsupervised
  - Molecular subtyping (intrinsic set): molecular subtypes
- Supervised
  - Patient based (recurrence/metastasis/survival): poor vs good
  - Model based (ER/grade/hypoxia/proliferation): low vs high
Prognostic Implications?
Sims (2009). J. Clin Path. (available online)
5. What Can They Measure?
SNP arrays
CGH arrays
COPY NUMBER VARIATION
METHYLATION
Genomic DNA
Methylation arrays
Gene
TRANSCRIPTION
mRNA
miRNA
Gene expression (mRNA) arrays
miRNA arrays
TRANSLATION
Protein sequence
POST-TRANSLATIONAL MODIFICATIONS
Peptide arrays
3D Protein structure
6. How We Use Eddie
7. Integration of Previous Studies
Richardson et al. / Farmer et al.
ERBB2 (216836_s_at)
GRB7 (210761_s_at)
ERBB2 (210930_s_at)
GATA3 (209604_s_at)
GATA3 (209602_s_at)
GATA3 (209603_s_at)
FBP1 (209696_at)
ESR1 (205225_at)
NAT1 (214440_at)
SEMA3C (203789_at)
XBP1 (200670_at)
KRT17 (212236_at)
KRT17 (205157_at)
KRT5 (201820_at)
Farmer et al. (2005) Oncogene 24(29):4660-71 [2]: 49 tumours, U133A, amplified RNA; 27 luminal, 16 basal, 6 molecular apocrine
Richardson et al. (2006) Cancer Cell 9:121-32 [1]: 40 tumours, U133 Plus 2, standard labelling; 18 basal-like, 20 non-basal-like, 2 BRCA
Richardson et al. / Farmer et al.
ERBB2 (210930_s_at)
ERBB2 (216836_s_at)
GRB7 (210761_s_at)
NAT1 (214440_at)
XBP1 (200670_at)
GATA3 (209603_s_at)
GATA3 (209604_s_at)
GATA3 (209602_s_at)
FBP1 (209696_at)
ESR1 (205225_at)
KRT5 (201820_at)
KRT17 (212236_at)
KRT17 (205157_at)
Sims et al. (2008) BMC Medical Genomics 1:42
8. Data Repositories
[Figure: Microarray Hybridisations by Year in ArrayExpress. X axis: Year (2004 to 2009). Y axis: Number of Hybridisations (0 to 300,000). Statistics taken from ArrayExpress for January of each year.]
9. Semi-Automated Pre-Processing of Studies from Repositories
10. Running Analyses with Varying Parameters Using Array Jobs
[Figure: Number of Loci by Distance Parameter. X axis: Distance Parameter (0 to 1000). Y axis: Number of Loci (Relative to Distance 0), 0 to 100. Series: Mouse, Human Interphase, Human Mitotic.]
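The distance-parameter sweep on this slide maps naturally onto an array job: every task runs the same script and picks one parameter value from its task index. The following is a minimal sketch in Python rather than the R code used for the talk. It assumes a Grid Engine-style scheduler that exports the task index as SGE_TASK_ID; the parameter grid (0 to 1000 in steps of 100, read off the plot axis) and the function name are illustrative, not taken from the talk.

```python
import os

# Hypothetical parameter grid: distance values 0, 100, ..., 1000.
DISTANCES = list(range(0, 1001, 100))

def distance_for_task(task_id, distances=DISTANCES):
    """Map a 1-based array-job task index to one distance parameter."""
    if not 1 <= task_id <= len(distances):
        raise ValueError(f"task id {task_id} outside 1..{len(distances)}")
    return distances[task_id - 1]

if __name__ == "__main__":
    # Assumption: the scheduler sets SGE_TASK_ID for array-job tasks.
    # Outside an array job the variable may be unset or non-numeric,
    # so fall back to task 1 to keep the script runnable interactively.
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    distance = distance_for_task(task_id)
    print(f"task {task_id}: running analysis with distance = {distance}")
```

Under Grid Engine, a submission along the lines of `qsub -t 1-11 <job script>` would start one task per grid value. Each task is independent, so tasks can run concurrently on separate nodes.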
11. Analysis of WNT Signalling in Breast Cancer
WNT Gene Sets
2 Datasets
Groups of Functionally Related Genes
Determine Association with Clinical Variables
Process Multiple Breast Cancer Datasets
Remaining Datasets
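One simple way to relate a gene set to a clinical variable is to score each sample by the average standardised expression of the set's genes, then compare the scores between clinical groups. The sketch below illustrates that idea in Python. It is not necessarily the method used in the talk, and the gene identifiers, group labels, and function names are placeholders.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardise one gene's expression across samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def gene_set_scores(expression, gene_set):
    """Per-sample score: mean z-score over the set's genes found in the data."""
    rows = [zscore(expression[g]) for g in gene_set if g in expression]
    if not rows:
        raise ValueError("no gene from the set is present in the dataset")
    return [mean(col) for col in zip(*rows)]

def group_difference(scores, labels, group_a, group_b):
    """Difference in mean gene-set score between two clinical groups."""
    a = [s for s, lab in zip(scores, labels) if lab == group_a]
    b = [s for s, lab in zip(scores, labels) if lab == group_b]
    return mean(a) - mean(b)

# Toy data: two placeholder genes measured in four samples.
expression = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 4.0, 6.0, 8.0]}
labels = ["neg", "neg", "pos", "pos"]
scores = gene_set_scores(expression, ["GENE_A", "GENE_B", "GENE_MISSING"])
difference = group_difference(scores, labels, "pos", "neg")
```

Genes missing from a dataset are skipped, which matters when the same gene set is applied to datasets from different array platforms.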
12. Detecting Regions of Co-Regulation in Breast Cancer
Regions for Further Analysis
Process Breast Cancer Datasets
Decide on Parameters
Find Regions of Interest
Significance Testing by Permutation
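Significance testing by permutation can be sketched as follows: compute a statistic for the best window of adjacent genes, then shuffle the gene order many times to see how often chance alone produces a window at least as good. This is a generic illustration in Python. The per-gene score, the window statistic, and all names are assumptions, not the talk's actual procedure.

```python
import random

def window_means(scores, width):
    """Mean score of every run of `width` adjacent genes."""
    return [sum(scores[i:i + width]) / width
            for i in range(len(scores) - width + 1)]

def permutation_p_value(scores, width, n_perm=1000, seed=0):
    """Empirical p-value for the best window, shuffling gene order."""
    rng = random.Random(seed)
    observed = max(window_means(scores, width))
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(window_means(shuffled, width)) >= observed:
            hits += 1
    # Add-one correction: a finite number of permutations cannot show p = 0.
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: three adjacent high-scoring genes among 30.
toy_scores = [0.0] * 30
toy_scores[10:13] = [1.0, 1.0, 1.0]
observed, p_value = permutation_p_value(toy_scores, width=3)
```

Permutations are independent of one another, so a large permutation count can be split across cluster jobs and the hit counts summed afterwards.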
13. Mapping of Short Sequence Tags to Transcriptional Start Sites
[Figure: Number of Tags by Distance from TSS. X axis: Distance from TSS (-2000 to 2000). Y axis: Number of Tags (0 to 900). Series: Low, Medium, High.]
Enrichment by Gene Group
Gene Locations
Parallelization of Mapping Reduces Estimated Time from 11 hrs to 2 hrs
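Mapping parallelizes well because each tag is handled independently: the tags can be split into chunks, each chunk mapped separately, and the per-bin counts merged at the end. The sketch below shows the idea in Python on a single chromosome. It ignores strand, and the window and bin sizes and all function names are illustrative assumptions rather than details from the talk.

```python
from bisect import bisect_left
from collections import Counter

def nearest_tss_offset(tag_pos, tss_sorted):
    """Signed distance from a tag to its nearest TSS (tag minus TSS)."""
    i = bisect_left(tss_sorted, tag_pos)
    # Only the TSSs just before and just after the tag can be nearest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda tss: abs(tag_pos - tss))
    return tag_pos - nearest

def bin_offsets(tag_positions, tss_sorted, window=2000, bin_size=500):
    """Count tags per distance bin, keeping tags within the window of a TSS."""
    counts = Counter()
    for pos in tag_positions:
        offset = nearest_tss_offset(pos, tss_sorted)
        if -window <= offset < window:
            counts[(offset // bin_size) * bin_size] += 1
    return counts

def map_in_chunks(tag_positions, tss_sorted, n_chunks):
    """Map independent chunks and merge; each chunk could be a separate job."""
    total = Counter()
    for k in range(n_chunks):
        total += bin_offsets(tag_positions[k::n_chunks], tss_sorted)
    return total

# Toy data: two TSSs and seven tags on one chromosome.
tss = [1000, 5000]
tags = [900, 1000, 1499, 3100, 4400, 5600, 9000]
serial = bin_offsets(tags, tss)
chunked = map_in_chunks(tags, tss, n_chunks=3)
```

Here the chunks run one after another in a loop. On a cluster, each chunk would instead be submitted as its own task and the resulting counters summed once all tasks finish.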
14. The Future
- More Parallel Jobs
- Bigger Jobs and Parallel Processing
- E.g. the affyPara Bioconductor Package
- More Mass Sequencing
- More Data!
15. Thanks!
Bickmore Lab: Sehrish Rafique
Andy Sims, Arran Turnbull, Liang Liang, Colette Meyer, Robert Kitchen
Breakthrough Unit: Elad Katz, Sylvie Dubois-Marshall, Charlene Kay
Lee Murphy, Angie Fawkes, Louise Evenden (WTCRF)
Bartlett Lab: Melanie Spears, Karen Taylor, Carrie Cunningham
Bauke Ylstra, VU Medisch Centrum, Amsterdam
Nick Gilbert, Bernie Ramsahoye, Catherine Naughton, Jayne Culley, Ben Skerry, Jacqueline Dickson
Meehan Lab: Colm Nestor, Donncha Dunican
Dimitra Dafou, Kate Lawrenson (UCL EGA Institute for Women's Health)