Title: Canadian Bioinformatics Workshops
1Canadian Bioinformatics Workshops
3Module 1 Introduction to Clinical Biomarkers
- Sohrab Shah
- Centre for Translational and Applied Genomics
- Molecular Oncology Breast Cancer Research Program
- BC Cancer Agency
- sshah_at_bccrc.ca
4Module Overview
- Introduction to biomarkers
- a brief history of clinical biomarkers in cancer
- use of biomarkers in clinical practice
- future of biomarker discovery
- Measuring molecular variation
- technology for measuring biomarkers and data
- alternative splicing
- Interlude: Alternative splicing in clinical genomics
- Example study: Chin et al Cancer Cell (2006)
5What is a biomarker?
6What is a biomarker?
- Wikipedia: A biomarker is a substance used as an
indicator of a biologic state. It is a
characteristic that is objectively measured and
evaluated as an indicator of normal biologic
processes, pathogenic processes, or pharmacologic
responses to a therapeutic intervention.
http://en.wikipedia.org/wiki/Biomarker
7What is a biomarker?
- Huntington's Outreach Project for Education at
Stanford: A specific biological trait, such as
the level of a certain molecule in the body, that
can be measured to indicate the progression of a
disease or condition.
http://www.stanford.edu/group/hopes/sttools/gloss/b.html
8What is a biomarker?
- HUPO: Used to indicate or measure a biological
process (for instance, levels of a specific
protein in blood or spinal fluid, genetic
mutations, or brain abnormalities observed in a
PET scan or other imaging test). Detecting
biomarkers specific to a disease can aid in the
identification, diagnosis, and treatment of
affected individuals and people who may be at
risk but do not yet exhibit symptoms.
http://www.hupo.org/overview/glossary/
9What is a biomarker?
- Lilly trials: A biomarker is a measurement of a
variable related to a disease that may serve as
an indicator or predictor of that disease.
Biomarkers are parameters from which the presence
or risk of a disease can be inferred, rather than
being a measure of the disease itself.
http://www.lillytrials.com/docs/terminology.html
10What is a biomarker?
- Biomarkers consortium (NIH): Biomarkers are
characteristics that are objectively measured and
evaluated as indicators of normal biological
processes, pathogenic processes, or pharmacologic
responses to therapeutic intervention.
http://www.biomarkersconsortium.org/
11Why are biomarkers important?
- Clinical: Indicators for management of care
- Diagnostics
- Prognostics
- Therapeutic targets
- Basic science: Help us to better understand
mechanisms of disease
12Types of biomarkers in cancer
- Diagnostic
- detect and identify a given type of cancer in an
individual
- Prognostic
- Predict the probable course of disease
- Predictive
- response to therapy
Kulasingam and Diamandis, Nature Clinical
Practice Oncology (2008)
13Clinical biomarkers in cancer: a brief history
Kulasingam and Diamandis, Nature Clinical
Practice Oncology (2008)
14Why are so few new biomarkers in clinical use?
Ludwig and Weinstein, Nature Reviews Cancer (2005)
15Barriers and challenges to biomarker adoption and
development
Gutman and Kessler, Nat Rev Cancer (2006)
16Barriers and challenges to biomarker adoption and
development
Ludwig and Weinstein, Nature Reviews Cancer (2005)
Lack of standards for sample prep
Problems with overfitting
The need for prospective trials
17Biomarker discovery
- Have all the important markers been found?
- What will we learn from sequencing the cancer
genome?
- 100,000 mutations described to date (COSMIC)
- Mostly obtained through targeted studies
- NGS offers the ability to do mutation discovery
in an unbiased way
18Biomarker discovery
- New biology is being discovered: what is the
clinical relevance?
- miRNA
- highly conserved non-coding elements
- lnc-RNA
19Biomarker discovery
- Large-scale, high throughput projects
http://cancergenome.nih.gov/
http://www.icgc.org/
20Where does bioinformatics come in?
- large cohorts
- high dimensional data
- robust algorithmic and statistical tools are
needed to bring knowledge from data
21Case report in ovarian cancer
Shah et al (2009) NEJM
22FOXL2 mutation in granulosa cell tumors of the
ovary
- 402 C>G, C>W in all 4 index GCT cases
- Found in 86/89 additional GCTs
- Not found in 800 other cancers
- Disease that can be difficult to diagnose by
histology
- Finding provides a diagnostic and a target for a
novel therapeutic
23The process of discovery
Nader Rifai, Michael A Gillette and Steven A Carr,
Nature Biotechnology (2006)
24The process of discovery
- An ideal genomics study becomes genetic
25What Have We Learned?
- What is a biomarker
- Few biomarkers are currently in use in cancer
- New technologies are showing promising results
- The process of biomarker discovery
26Module Overview
- Introduction to biomarkers
- a brief history of clinical biomarkers in cancer
- use of biomarkers in clinical practice
- future of biomarker discovery
- Measuring molecular variation
- technology for measuring biomarkers and data
- alternative splicing
- Interlude: Alternative splicing in clinical genomics
- Example study: Chin et al Cancer Cell (2006)
27Measuring molecules for biomarker discovery
- Measurement technologies in current use in
genomics/proteomics
- Gene expression microarrays
- Genomic microarrays for SNP and copy number
- Next generation sequencing
- Immunohistochemistry and tissue microarrays
- Capillary-based sequencing
- Mass spectrometry
28Gene expression microarrays
- Biology: transcript quantitation
- Technology: hybridization and fluorescence
intensity
- Limitations
- probing for what you know
29Example questions
- Which genes are differentially expressed in my
samples vs control?
- What subgroups can be identified in my population
based on gene expression?
- Can a gene expression signature be used to
classify a new sample?
30Gene expression
- The data
- a data matrix X, with N rows and P columns, N>>P
- X(i,j) represents the relative quantity of
transcript i for sample j
- Analysis
- Normalization
- Differential expression
- Unsupervised clustering
- Classification
- Longitudinal studies (time course)
- Network reconstruction
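As a rough illustration of the differential expression step on the data matrix X described above (rows are transcripts, columns are samples), the sketch below ranks transcripts by Welch's t-statistic between two sample groups. It is only a minimal example under stated assumptions: the toy values, the `is_case` labels and the function names are hypothetical, the data are assumed to be normalized and on a log scale, and real analyses would normally use dedicated Bioconductor tools with multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t-statistic for two lists of expression values."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_transcripts(X, is_case):
    """Rank rows of X (transcripts) by |t| between case and control columns."""
    scores = []
    for i, row in enumerate(X):
        case = [v for v, c in zip(row, is_case) if c]
        control = [v for v, c in zip(row, is_case) if not c]
        scores.append((i, welch_t(case, control)))
    # Largest absolute statistic first.
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Toy matrix (hypothetical values): 3 transcripts (rows) x 6 samples (columns).
X = [
    [5.1, 5.0, 4.9, 5.0, 5.2, 4.8],  # similar in both groups
    [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],  # higher in cases
    [3.0, 3.4, 2.8, 3.3, 2.9, 3.1],  # similar in both groups
]
is_case = [True, True, True, False, False, False]
ranking = rank_transcripts(X, is_case)
print(ranking[0])  # transcript index 1 has the largest |t|
```

With N>>P, many transcripts are tested on few samples, so a ranking like this is a starting point for follow-up rather than a final result.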
31Software for gene expression
- Tons of it!
- Bioconductor and R
- http://www.bioconductor.org/
- 320 software packages
- 400 annotation packages
- Books, tutorials
- GenePattern
- http://www.broadinstitute.org/cancer/software/genepattern/index.html
32High density genotyping arrays and array CGH
- Biology: single nucleotide polymorphisms, DNA
copy number changes
- Genotyping for 1 million SNPs
- Genome-wide copy number changes
- Allele specific copy number changes
- Human variation
- Congenital abnormalities (mental
retardation/autism)
- Somatic alterations in cancer
33Example Questions
- Which regions in the genome are recurrently
altered in my cohort?
- Can the cohort be stratified into subgroups based
on copy number profiles?
34High density genotyping arrays
- Technology: hybridization and fluorescence
intensity
- Data: array CGH
35High density genotyping arrays
- Analysis
- Normalization for
- allelic cross talk
- fragment length
- GC content
- Segmentation
- Regression based approaches
- find breakpoints
- Classification using state-space models (HMM)
- find breakpoints and classify segments
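To make the idea of finding breakpoints concrete, here is a minimal sketch of the simplest regression-style segmentation: fit two constant segments to an ordered series of probe log ratios and pick the split that minimises the squared error. This is an illustration only, not the algorithm used by DNAcopy, GLAD or the HMM-based tools; the function name and toy values are hypothetical, and real profiles contain many breakpoints and require a stopping rule.

```python
def best_breakpoint(log_ratios):
    """Return (k, sse): the split index k that minimises the squared error
    of a two-segment piecewise-constant fit (segments [0:k] and [k:])."""

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best_k, best_sse = None, float("inf")
    for k in range(1, len(log_ratios)):
        total = sse(log_ratios[:k]) + sse(log_ratios[k:])
        if total < best_sse:
            best_k, best_sse = k, total
    return best_k, best_sse


# Toy log ratios along a chromosome (hypothetical): a gain starts at probe 4.
probes = [0.02, -0.05, 0.01, 0.03, 0.61, 0.58, 0.66, 0.60]
k, _ = best_breakpoint(probes)
print(k)  # 4
```

A state-space model such as an HMM goes one step further by also labelling each segment (for example loss, neutral, gain) rather than only locating the change.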
36Software for high density genotyping arrays
- Array CGH
- Bioconductor (BioHMM, aCGH, DNAcopy, GLAD)
- CNA-HMMer
- see refs (Module 1)
- SNP arrays
- Normalisation
- aroma.affymetrix (CRMA) (normalisation)
- Allele specific copy number
- QuantiSNP
- PennCNV
- Genotyping
- CRLMM, BRLMM, BirdSeed
- Visualisation
- IGV (Broad), Sigma2 (BCCRC)
37Immunohistochemical staining
- Biology: protein levels/localisation
- Technology: labeled antibody binding to an
antigen. Done in high throughput on a tissue
microarray
- Limitation: must have an antibody available
38Example questions
- Is my protein of interest expressed in my sample?
- Which part of the cell does my protein of
interest localise to?
- How abundantly expressed is my protein?
- Diagnosis? Prognosis? Predictive?
39Immunohistochemical staining
- The data
- Low-throughput, highly specific
Figure panels: mutant beta-catenin, no mutation
40Immunohistochemistry for subtyping
Kobel et al PLoS Medicine (2008)
41Next generation sequencing
- Biology: single nucleotide variants, genome
rearrangements, copy number changes, inversions,
transcript expression, insertions/deletions
- Technology: massively parallel single molecule
sequencing producing millions of short sequence
reads
42Example questions
- What does an individual tumour/person/animal
look like at nucleotide resolution?
- What is the genome architecture of my sample?
- What single nucleotide variants/indels exist in
my sample?
- What transcripts are expressed and at what
quantity?
- What are the recurrent aberrations in my set of
samples?
- What pathways are dysregulated by mutations?
43Next generation sequencing
Figure labels: Predicted SNVs, Confirmed SNVs, Aligned reads, Unaligned reads, Clinically relevant SNVs, Confirmed somatic, Confirmed germline, Recurrent SNVs with functional significance, False positives
44Next generation sequencing data
45Software for next generation sequence data
- Alignment
- Maq, BWA, BowTie, SOAP, SSAHA, Eland
- Samtools
- SNVs
- Maq, SOAPSNP, SNVMix (unpublished)
- Indels
- Samtools, Pindel
- Copy number
- Chiang et al, Nature Methods (2009)
- Expression
- Mortazavi et al, Science (2008)
- Take the workshop
- http://bioinformatics.ca/workshops/high_throughput
46Validation!
- High throughput measurement technologies are
noisy
- Predictions must be validated using lower
throughput, but more accurate experimental assays
47Example: Copy number amplifications predicted
from next generation sequencing
INSR amplicon
Chr19
6/24 (25%) recurrence in TMA; implications for
tamoxifen resistance
7/29 (24%) recurrence in TMA
48What Have We Learned?
- Technology for measuring human variation
- Gene expression
- DNA copy number and allelic variation
- Protein quantitation/localisation
- Next generation sequencing
- Lab focuses on gene expression and copy number
linked to clinical data
- Measurement technologies are getting denser
- this means more data
- Validation is critical in order to make
conclusions
49Questions?
50Coffee break
- Back at 10:50
- Next: Transcript structure and clinical genomics
51Module 1 Transcript Structure
- Anna Lapuk, PhD
- Vancouver Prostate Centre
- alapuk_at_prostatecentre.com
52AS is ubiquitous in the cell
- 75% of all human genes undergo alternative splicing (AS)
- AS is implicated in human disease including
cancer.
53Tightly regulated splicing of pre-mRNA
Spliceosome Trans-regulators
Cis-regulators
54Causes and consequences of aberrant splicing in
cancer
Altered splicing machinery
Cis mutations
modified from Srebrow, A. et al. (2006)
55Examples of cancer specific splice isoforms
56Subtype-specific alternative splicing of CD44
gene in BRCA
Miki Yamamoto, Rich Neve
57Interrogating AS on the whole genome level
Affymetrix Human Exon Chip 1.0 ST design
Array design: every possible exon in the genome
- 5.5 million probes
- 1.4 million exons (probesets)
- 300,000 transcripts (265,000 unknown, 35,000 known)
- 4 probes per probeset
58
59Detection of AS in microarray data.
AS detection pipeline
Inclusion isoform data
exclusion isoform data
60Tumor subtype specific splicing signatures in BRCA
61Splicing and transcription act independently
Little or zero overlap between alt spliced genes
and DE genes in the same cells
62Implications of AS for Clinical Applications
63Module 1 Example study in clinical genomics
- Sohrab Shah
- Centre for Translational and Applied Genomics
- Molecular Oncology Breast Cancer Research Program
- BC Cancer Agency
- sshah_at_bccrc.ca
64Module Overview
- Introduce Chin et al Cancer Cell (2006)
- overview of study
- goals and biological questions
- the breast cancer expression subtypes
- data types and data sets generated in this study
- clinical outcome
- major conclusions
65Genomic and transcriptional aberrations linked to
breast cancer pathophysiologies
Chin et al. Cancer Cell (2006)
- Why did we choose this study?
- Cited 206 times since 2006
- Contains many of the important concepts
encountered in large scale clinical genomics
studies
- Integrated analysis of copy number and expression
- Data and clinical phenotypes freely available
- Limitations and caveats
- Data generated is on older, obsolete platforms
- Goals
- identify genomic events that can be assayed to
better stratify patients according to clinical
behaviour
- develop insights into how molecular aberrations
contribute to breast cancer pathogenesis
- discover genes that might be therapeutic targets
in patients that do not respond well to current
therapies
66Expression subtypes in breast cancer
- Lab
- Reduction of 22000 features to 100s (genefilter)
- hierarchical clustering after feature selection
(cluster)
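The two lab steps above (reduce the feature set, then cluster the samples) can be sketched roughly as follows. This is not the genefilter or cluster R packages used in the lab; it is a small standard-library illustration that assumes variance as the filtering criterion and Euclidean distance with average linkage for clustering. The function names and toy matrix are hypothetical.

```python
from math import dist
from statistics import pvariance


def top_variable_rows(X, n_keep):
    """Indices of the n_keep rows (features) with the largest variance."""
    order = sorted(range(len(X)), key=lambda i: pvariance(X[i]), reverse=True)
    return sorted(order[:n_keep])


def average_linkage(samples, n_clusters):
    """Agglomerative clustering of sample vectors (Euclidean distance,
    average linkage). Returns clusters as sorted lists of sample indices."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the two clusters.
                d = sum(dist(samples[i], samples[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy matrix (hypothetical): 4 features (rows) x 4 samples (columns).
X = [
    [1.0, 1.1, 5.0, 5.2],  # variable across samples
    [2.0, 2.0, 2.1, 2.0],  # nearly constant
    [0.5, 0.4, 3.9, 4.1],  # variable across samples
    [7.0, 7.1, 7.0, 7.0],  # nearly constant
]
keep = top_variable_rows(X, 2)
samples = [[X[i][j] for i in keep] for j in range(len(X[0]))]
groups = average_linkage(samples, 2)
print(keep, groups)  # [0, 2] [[0, 1], [2, 3]]
```

Filtering first matters because nearly constant features add noise to the distances without helping to separate subtypes.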
67Copy number profiles related to expression
subtypes
Figure panels: all, basal, erbb2, lumA, lumB
- Lab
- Processing copy number data
- Genome wide plotting of frequency of copy number
alteration
- package aCGH
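The frequency of copy number alteration mentioned in the lab notes for this slide is, per probe, the fraction of samples called as gained or lost. The sketch below shows that calculation on toy data. It is not the aCGH package; the function name and the log2 ratio thresholds of +0.3 and -0.3 are assumptions chosen for illustration only.

```python
def alteration_frequency(log_ratios, gain=0.3, loss=-0.3):
    """log_ratios: list of samples, each a list of per-probe log2 ratios.
    Returns (gain_freq, loss_freq) with one fraction per probe.
    The gain/loss thresholds are illustrative assumptions."""
    n_samples = len(log_ratios)
    n_probes = len(log_ratios[0])
    gain_freq = [sum(s[p] > gain for s in log_ratios) / n_samples
                 for p in range(n_probes)]
    loss_freq = [sum(s[p] < loss for s in log_ratios) / n_samples
                 for p in range(n_probes)]
    return gain_freq, loss_freq


# Toy data (hypothetical): 4 samples x 3 probes.
profiles = [
    [0.5, 0.0, -0.6],
    [0.4, 0.1, -0.1],
    [0.0, 0.0, -0.5],
    [0.6, -0.4, 0.0],
]
gains, losses = alteration_frequency(profiles)
print(gains)   # [0.75, 0.0, 0.0]
print(losses)  # [0.0, 0.25, 0.5]
```

Computing these frequencies separately for each expression subtype gives one profile per subtype, which is what the panels on this slide compare.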
68Unsupervised clustering of copy number
Samples fall into 3 main groups: 1q/16q, Amplifying and Complex
Green: amplification; Red: deletion; Yellow: high-level amplicons
Correlation with clinical phenotypes?
- Lab
- clustering copy number profiles
- package aCGH
69Correlation of amplicons with survival
Figure panels: Copy number subtypes; Expression subtypes; Recurrent amplifications (lumA, 8p11-12)
- Lab
- Module 4
- Integration with outcome data
- package survival
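Survival curves of the kind compared on this slide are usually Kaplan-Meier estimates. The sketch below is a minimal standard-library version of that estimator, shown only to make the calculation explicit; it is not the R survival package used in the lab. The function name and the follow-up times, event indicators and group labels in the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 if the event was observed,
    0 if the observation was censored.
    Returns [(t, S(t))] at each distinct time with at least one event."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone with time t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


# Toy cohort (hypothetical), split by amplification status.
amplified = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
not_amplified = kaplan_meier([2, 5, 6, 8], [0, 1, 0, 0])
print(amplified)      # [(1, 0.75), (3, 0.375), (4, 0.0)]
print(not_amplified)  # [(5, 0.6666666666666667)]
```

Comparing the two curves formally would need a test such as the log-rank test, which this sketch does not include.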
70Expression patterns in non-copy number induced
genes preserved
71Comments?
72Limitations of study?
73What we will do in the lab
- Expression subtypes
- feature selection and clustering methods
- Copy number profiles
- Copy number subtypes
- Association with outcome
- Survival curves of subtypes split by
amplifications
74What Have We Learned?
- Review of Chin et al Cancer Cell (2006)
75Lunch
- On your own
- (Food court Downstairs)
- Back at 13:30