Title: Genomic Arrays: Tools for cancer gene discovery
1Genomic Arrays: Tools for cancer gene discovery
- Ian Roberts
- MRC Cancer Cell Unit
- Hutchison MRC Research Centre
- ir210_at_cam.ac.uk
2What's a genomic array?
- A platform of regularly spaced genomic sequences
- All known genes or a subset of genes of interest
- A tool for querying the genome about damage
- Genomic gains (oncogenes)
- Genomic losses (tumour suppressor genes)
- Applications
- Research: disease gene discovery
- Clinical: diagnostic tests
3Comparative genomic hybridisation
Available probe
Tumour DNA (Test)
Normal DNA (Reference)
4New generation arrays produce large amounts of data
Agilent 244K array
Raw data are foreground and background signal intensities in two channels.
The median ratio of the foreground intensities is the important value.
243,504 defined spots
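As a rough illustration of how the two-channel intensities become one value per spot, the sketch below computes a background-subtracted log2(test/reference) ratio with awk. The column layout (spot ID, test foreground, test background, reference foreground, reference background) and the function name `log2_ratio` are assumptions made for illustration. They are not the Agilent file format, and in the workflow described in these slides this step is handled in R by snapCGH.

```shell
#!/usr/bin/env bash
# Hypothetical input: tab-separated columns
#   spot_id  test_fg  test_bg  ref_fg  ref_bg
# Output: spot_id and log2((test_fg - test_bg) / (ref_fg - ref_bg)).
log2_ratio() {
    awk -F'\t' '{
        tumour = $2 - $3
        normal = $4 - $5
        # Skip spots whose corrected signal is not positive (log undefined).
        if (tumour <= 0 || normal <= 0) next
        printf "%s\t%.3f\n", $1, log(tumour / normal) / log(2)
    }'
}

# Example: a spot with twice the tumour signal of the reference.
printf 'spotA\t1200\t200\t600\t100\n' | log2_ratio
```

A positive value suggests a gain in the tumour DNA and a negative value a loss; the thresholds used to call either are not given in the slides.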
5aCGH data analysis ...
6Genomic array analysis strategy using R
- Array data is processed by the snapCGH R package
- Correct array data for background noise and mean distribution
- Order data by genomic location
- Apply an aCGH segmentation algorithm
- Draw some plots
- Determine significant findings (in-house R functions)
- Common and minimum genomic regions of gain and loss
- Summarise output
R: www.cran.r-project.org
snapCGH: www.bioconductor.org
parrot R on camgrid: http://www.bio.cam.ac.uk/local/condor-parrot.html
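The in-house R functions for common regions are not shown in the slides. One simple way to think about a common region of gain (or loss) is the overlap of the affected segment across samples: the largest start and the smallest end. The sketch below implements only that idea, assuming each sample contributes one segment on the same chromosome. The input layout and the name `common_region` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical input: one line per sample, tab-separated
#   sample  start_bp  end_bp
# Output: "start<TAB>end" of the overlap shared by all samples,
# or nothing if the segments do not all overlap.
common_region() {
    awk -F'\t' '
        NR == 1 { lo = $2; hi = $3; next }
        { if ($2 > lo) lo = $2; if ($3 < hi) hi = $3 }
        END { if (NR > 0 && lo <= hi) printf "%d\t%d\n", lo, hi }
    '
}

# Example: three tumours with overlapping gains.
printf 'T1\t100\t500\nT2\t200\t600\nT3\t150\t450\n' | common_region
```

For the three example segments the shared interval runs from 200 to 450.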
7Old vs. New genomic array plots
Chromosome 7
8Significant region detection is computationally intensive
9Distributed aCGH analysis
Consolidate output
10Condor job scripting in BASH and R
- BASH function
- Responsible for producing required condor files for discrete jobs
- Default_submit has 2 positional parameters
- R script name: $1
- Data files: $2
- Initiates aCGH analysis on grid.
- Condor dagman R function set
- R-scripter
- Writes the appropriate R script for the current job
- R-condor-submitter
- Writes the condor job submission file
- R-condor-executer
- Writes the condor job executable file
- R-job-descriptor
- Writes the condor dagman description file
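A minimal sketch of what such a BASH function might look like is given below. It keeps the two positional parameters described above (R script name first, data file second) and writes a Condor submit description, then builds a small DAGMan file in which every analysis job is a parent of one consolidation job. The file names, the wrapper `run_R.sh`, the function `write_dag` and the job names are assumptions for illustration, not the scripts used on camgrid.

```shell
#!/usr/bin/env bash
# default_submit <R script> <data file>
# Writes <script>.submit describing one Condor job (hypothetical layout).
default_submit() {
    local rscript=$1 data=$2
    local base=${rscript%.R}
    cat > "${base}.submit" <<EOF
universe                = vanilla
executable              = run_R.sh
arguments               = ${rscript} ${data}
transfer_input_files    = ${rscript},${data}
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = ${base}.out
error                   = ${base}.err
log                     = ${base}.log
queue
EOF
}

# write_dag <dag file> <consolidate submit file> <per-job submit files...>
# Each analysis job becomes a parent of a single consolidation job.
write_dag() {
    local dag=$1 final=$2 sub
    shift 2
    local i=0 parents=""
    : > "$dag"
    for sub in "$@"; do
        i=$((i + 1))
        echo "JOB chunk${i} ${sub}" >> "$dag"
        parents="${parents} chunk${i}"
    done
    echo "JOB consolidate ${final}" >> "$dag"
    # Only declare the dependency when there is at least one parent job.
    if [ "$i" -gt 0 ]; then
        echo "PARENT${parents} CHILD consolidate" >> "$dag"
    fi
}
```

Under these assumptions, calling `default_submit seg_chr7.R chr7.txt` followed by `write_dag acgh.dag consolidate.submit seg_chr7.submit` would leave a submit file and a DAG ready for `condor_submit_dag acgh.dag`.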
11End user abstraction: start_aCGH.sh
- aCGH analysis undertaken by a single shell command
- Manages array data input
- Collects user specified parameters
- Chromosome range
- Segmentation algorithms
- Significance thresholds
- Links condor R job scripting
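The slides do not show how start_aCGH.sh collects its parameters. The sketch below shows one plausible way to gather the three kinds of setting listed above with `getopts`. The option letters, the default values and the function name `parse_acgh_args` are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical option parsing for a start_aCGH.sh-style wrapper.
#   -c  chromosome range, e.g. 1-22
#   -a  segmentation algorithm name
#   -t  significance threshold
parse_acgh_args() {
    local opt
    OPTIND=1
    # Assumed defaults, for illustration only.
    chrom_range="1-22"; algorithm="DNAcopy"; threshold="0.05"
    while getopts "c:a:t:" opt; do
        case $opt in
            c) chrom_range=$OPTARG ;;
            a) algorithm=$OPTARG ;;
            t) threshold=$OPTARG ;;
            *) echo "usage: start_aCGH.sh [-c range] [-a algorithm] [-t threshold] files..." >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    data_files="$*"
    n_files=$#
    # Validate the range (digits-digits) before any jobs are written.
    case $chrom_range in
        *[!0-9-]*|-*|*-|*-*-*) echo "bad chromosome range: $chrom_range" >&2; return 1 ;;
        *-*) ;;
        *) echo "bad chromosome range: $chrom_range" >&2; return 1 ;;
    esac
}

# Example call with two array files.
parse_acgh_args -c 7-9 -a DNAcopy -t 0.01 array1.txt array2.txt
echo "chromosomes=$chrom_range algorithm=$algorithm threshold=$threshold files=$n_files"
```

Checking the inputs in the wrapper, before any Condor files are generated, keeps a mistyped parameter from being discovered only after jobs have been queued on the grid.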
12start_aCGH.sh session on mole
13. continued
1 hr to 6 hr later!
aCGH region information and plots
14Summary findings (38 arrays)
- Rapid identification of regions of interest
- Easy comparison of aCGH analysis via different algorithms
15Real life application
Retrospective analysis confirms initial findings! (summary of 38 samples)
16Future development
- Tailor output for specific user requirements
- Produce overall summary plot
- Apply approach to expression arrays
17www.bio.cam.ac.uk/ir210
- Grace Ng
- Steph Carter
- Konstantina Karagavriliidou
- Jenny Barna
- Mark Calleja
- Nick Coleman