Canadian Bioinformatics Workshops - PowerPoint PPT Presentation

Provided by: GaryB123

1
Canadian Bioinformatics Workshops
  • www.bioinformatics.ca

2
(No Transcript)
3
Module 1: Introduction to Clinical Biomarkers
  • Sohrab Shah
  • Centre for Translational and Applied Genomics
  • Molecular Oncology Breast Cancer Research Program
  • BC Cancer Agency
  • sshah_at_bccrc.ca

4
Module Overview
  • Introduction to biomarkers
  • a brief history of clinical biomarkers in cancer
  • use of biomarkers in clinical practice
  • future of biomarker discovery
  • Measuring molecular variation
  • technology for measuring biomarkers and data
  • alternative splicing
  • Interlude: Alternative splicing in clinical
    genomics
  • Example study: Chin et al., Cancer Cell (2006)

5
What is a biomarker?
6
What is a biomarker?
  • Wikipedia: A biomarker is a substance used as an
    indicator of a biologic state. It is a
    characteristic that is objectively measured and
    evaluated as an indicator of normal biologic
    processes, pathogenic processes, or pharmacologic
    responses to a therapeutic intervention.

http://en.wikipedia.org/wiki/Biomarker
7
What is a biomarker?
  • Huntington's Outreach Project for Education at
    Stanford: A specific biological trait, such as
    the level of a certain molecule in the body, that
    can be measured to indicate the progression of a
    disease or condition.

http://www.stanford.edu/group/hopes/sttools/gloss/b.html
8
What is a biomarker?
  • HUPO: Used to indicate or measure a biological
    process (for instance, levels of a specific
    protein in blood or spinal fluid, genetic
    mutations, or brain abnormalities observed in a
    PET scan or other imaging test). Detecting
    biomarkers specific to a disease can aid in the
    identification, diagnosis, and treatment of
    affected individuals and people who may be at
    risk but do not yet exhibit symptoms.

http://www.hupo.org/overview/glossary/
9
What is a biomarker?
  • Lilly trials: A biomarker is a measurement of a
    variable related to a disease that may serve as
    an indicator or predictor of that disease.
    Biomarkers are parameters from which the presence
    or risk of a disease can be inferred, rather than
    being a measure of the disease itself.

http://www.lillytrials.com/docs/terminology.html
10
What is a biomarker?
  • Biomarkers Consortium (NIH): Biomarkers are
    characteristics that are objectively measured and
    evaluated as indicators of normal biological
    processes, pathogenic processes, or pharmacologic
    responses to therapeutic intervention.

http://www.biomarkersconsortium.org/
11
Why are biomarkers important?
  • Clinical: Indicators for management of care
  • Diagnostics
  • Prognostics
  • Therapeutic targets
  • Basic science: Help us to better understand
    mechanisms of disease

12
Types of biomarkers in cancer
  • Diagnostic: detect and identify a given type of
    cancer in an individual
  • Prognostic: predict the probable course of
    disease
  • Predictive: predict response to therapy

Kulasingam and Diamandis, Nature Clinical
Practice Oncology (2008)
13
Clinical biomarkers in cancer: a brief history
Kulasingam and Diamandis, Nature Clinical
Practice Oncology (2008)
14
Why are so few new biomarkers in clinical use?
Ludwig and Weinstein, Nature Reviews Cancer (2005)
15
Barriers and challenges to biomarker adoption and
development
Gutman and Kessler, Nat Rev Cancer (2006)
16
Barriers and challenges to biomarker adoption and
development
Ludwig and Weinstein, Nature Reviews Cancer (2005)
Lack of standards for sample prep
Problems with overfitting
The need for prospective trials
17
Biomarker discovery
  • Have all the important markers been found?
  • What will we learn from sequencing the cancer
    genome?
  • 100,000 mutations described to date (COSMIC)
  • Mostly obtained through targeted studies
  • NGS offers the ability to do mutation discovery
    in an unbiased way

18
Biomarker discovery
  • New biology is being discovered: what is the
    clinical relevance?
  • miRNA
  • highly conserved non-coding elements
  • lnc-RNA

19
Biomarker discovery
  • Large-scale, high throughput projects

http://cancergenome.nih.gov/
http://www.icgc.org/
20
Where does bioinformatics come in?
  • large cohorts
  • high dimensional data
  • robust algorithmic and statistical tools are
    needed to bring knowledge from data

21
Case report in ovarian cancer
Shah et al (2009) NEJM
22
FOXL2 mutation in granulosa cell tumors of the
ovary
  • 402 C>G, C>W in all 4 index GCT cases
  • Found in 86/89 additional GCTs
  • Not found in 800 other cancers
  • Disease that can be difficult to diagnose by
    histology
  • Finding provides a diagnostic and a target for a
    novel therapeutic

23
The process of discovery
Nader Rifai, Michael A Gillette and Steven A Carr,
Nature Biotechnology (2006)
24
The process of discovery
  • An ideal genomics study becomes genetic

25
What Have We Learned?
  • What is a biomarker
  • Few biomarkers are currently in use in cancer
  • New technologies are showing promising results
  • The process of biomarker discovery

26
Module Overview
  • Introduction to biomarkers
  • a brief history of clinical biomarkers in cancer
  • use of biomarkers in clinical practice
  • future of biomarker discovery
  • Measuring molecular variation
  • technology for measuring biomarkers and data
  • alternative splicing
  • Interlude: Alternative splicing in clinical
    genomics
  • Example study: Chin et al., Cancer Cell (2006)

27
Measuring molecules for biomarker discovery
  • Measurement technologies in current use in
    genomics/proteomics
  • Gene expression microarrays
  • Genomic microarrays for SNP and copy number
  • Next generation sequencing
  • Immunohistochemistry and tissue microarrays
  • Capillary-based sequencing
  • Mass spectrometry

28
Gene expression microarrays
  • Biology: transcript quantitation
  • Technology: hybridization and fluorescence
    intensity
  • Limitations: probing for what you know

29
Example questions
  • Which genes are differentially expressed in my
    samples vs control?
  • What subgroups can be identified in my population
    based on gene expression?
  • Can a gene expression signature be used to
    classify a new sample?

30
Gene expression
  • The data
  • a data matrix X, with N rows and P columns, N >> P
  • X(i,j) represents the relative quantity of
    transcript i for sample j
  • Analysis
  • Normalization
  • Differential expression
  • Unsupervised clustering
  • Classification
  • Longitudinal studies (time course)
  • Network reconstruction
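
A minimal sketch of the first two analysis steps (normalization, then differential expression) on a matrix laid out as above: rows are transcripts, columns are samples. It assumes log2 median-centering as the normalization and Welch's t statistic as the test; both are illustrative choices, not the methods used in the workshop lab, and the toy intensities and per-array scale factors are made up.

```python
import math
from statistics import mean, median, variance

def log2_median_center(X):
    """Log2-transform X (rows = transcripts, columns = samples) and
    subtract each sample's median so that columns are comparable."""
    logged = [[math.log2(v) for v in row] for row in X]
    n_samples = len(logged[0])
    col_medians = [median(row[j] for row in logged) for j in range(n_samples)]
    return [[row[j] - col_medians[j] for j in range(n_samples)] for row in logged]

def welch_t(a, b):
    """Welch's t statistic for two groups of measurements."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_transcripts(X, case_cols, control_cols):
    """Return (transcript index, t statistic) pairs, largest |t| first."""
    norm = log2_median_center(X)
    scores = []
    for i, row in enumerate(norm):
        t = welch_t([row[j] for j in case_cols], [row[j] for j in control_cols])
        scores.append((i, t))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)

# Toy data (hypothetical): 6 transcripts x 6 samples.
# Columns 0-2 are cases, columns 3-5 are controls.
raw = [
    [100, 105, 96, 102, 98, 101],        # unchanged
    [3200, 3300, 3100, 1600, 1650, 1580],  # higher in cases
    [200, 190, 210, 205, 195, 202],      # unchanged
    [400, 410, 395, 390, 405, 398],      # unchanged
    [5, 5.5, 4.8, 20, 21, 19],           # lower in cases
    [50, 52, 49, 51, 48, 50],            # unchanged
]
# Simulate arrays with different overall brightness (assumed factors).
scale = [1.0, 2.0, 1.0, 1.0, 0.5, 1.0]
X = [[v * s for v, s in zip(row, scale)] for row in raw]

ranking = rank_transcripts(X, case_cols=[0, 1, 2], control_cols=[3, 4, 5])
top_two = sorted(i for i, _ in ranking[:2])
print(top_two)
```

The per-array scale factors drop out after median-centering, which is the point of normalizing before testing. With thousands of transcripts, the t statistics would also need a multiple-testing correction.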

31
Software for gene expression
  • Tons of it!
  • Bioconductor and R
  • http://www.bioconductor.org/
  • 320 software packages
  • 400 annotation packages
  • Books, tutorials
  • GenePattern
  • http://www.broadinstitute.org/cancer/software/genepattern/index.html

32
High density genotyping arrays and array CGH
  • Biology: single nucleotide polymorphisms and DNA
    copy number changes
  • Genotyping for 1 million SNPs
  • Genome-wide copy number changes
  • Allele specific copy number changes
  • Human variation
  • Congenital abnormalities (mental
    retardation/autism)
  • Somatic alterations in cancer

33
Example Questions
  • Which regions in the genome are recurrently
    altered in my cohort?
  • Can the cohort be stratified into subgroups based
    on copy number profiles?

34
High density genotyping arrays
  • Technology: hybridization and fluorescence
    intensity
  • Data: array CGH
  • Data: SNP genotyping

35
High density genotyping arrays
  • Analysis
  • Normalization for allelic cross talk, fragment
    length and GC content
  • Segmentation
  • Regression based approaches: find breakpoints
  • Classification using state-space models (HMM):
    find breakpoints and classify segments
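
The HMM idea (find breakpoints and classify segments in one pass) can be sketched with a three-state Viterbi decoder over log2 ratios ordered along a chromosome. The state means, emission spread and transition probability below are assumed values chosen for illustration; packages such as BioHMM or CNA-HMMer estimate their parameters from the data and are not reproduced here.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}  # assumed log2-ratio means
SD = 0.2      # assumed emission standard deviation
STAY = 0.98   # assumed probability of staying in the same state

def log_gauss(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(log_ratios):
    """Most likely state path for probes ordered along a chromosome."""
    stay = math.log(STAY)
    switch = math.log((1 - STAY) / (len(STATES) - 1))
    start = math.log(1 / len(STATES))
    score = {s: start + log_gauss(log_ratios[0], MEANS[s], SD) for s in STATES}
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + (stay if p == s else switch))
            pointers[s] = prev
            new_score[s] = (score[prev] + (stay if prev == s else switch)
                            + log_gauss(x, MEANS[s], SD))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def segments(path):
    """Collapse a state path into (start, end, state) runs; end is inclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i - 1, path[start]))
            start = i
    return runs

# Toy log2 ratios (hypothetical): neutral, gained, neutral, lost.
ratios = [0.05, -0.1, 0.02, 0.1, 0.55, 0.48, 0.6, 0.45, 0.52,
          -0.03, 0.08, 0.0, -0.05, -0.6, -0.45, -0.5, -0.55]
calls = segments(viterbi(ratios))
print(calls)
```

A high stay probability discourages single-probe state changes, so isolated noisy probes tend to be absorbed into the surrounding segment.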

36
Software for high density genotyping arrays
  • Array CGH
  • Bioconductor (BioHMM, aCGH, DNAcopy, GLAD)
  • CNA-HMMer
  • see refs (Module 1)
  • SNP arrays
  • Normalisation
  • aroma.affymetrix (CRMA) (normalisation)
  • Allele specific copy number
  • QuantiSNP
  • PennCNV
  • Genotyping
  • CRLMM, BRLMM, BirdSeed
  • Visualisation
  • IGV (Broad), Sigma2 (BCCRC)

37
Immunohistochemical staining
  • Biology: protein levels/localisation
  • Technology: labeled antibody binding to an
    antigen. Done in high throughput on a tissue
    microarray
  • Limitation: must have an antibody available

38
Example questions
  • Is my protein of interest expressed in my sample?
  • Which part of the cell does my protein of
    interest localise to?
  • How abundantly expressed is my protein?
  • Diagnosis? Prognosis? Predictive?

39
Immunohistochemical staining
  • The data
  • Low-throughput, highly specific

(Figure panels: mutant beta-catenin, no mutation)
40
Immunohistochemistry for subtyping
Kobel et al PLoS Medicine (2008)
41
Next generation sequencing
  • Biology: single nucleotide variants, genome
    rearrangements, copy number changes, inversions,
    transcript expression, insertions/deletions
  • Technology: massively parallel single molecule
    sequencing producing millions of short sequence
    reads

42
Example questions
  • What does an individual tumour/person/animal
    look like at nucleotide resolution?
  • What is the genome architecture of my sample?
  • What single nucleotide variants/indels exist in
    my sample?
  • What transcripts are expressed and at what
    quantity?
  • What are the recurrent aberrations in my set of
    samples?
  • What pathways are dysregulated by mutations?

43
Next generation sequencing
  • The data

(Figure labels: aligned reads, unaligned reads,
predicted SNVs, false positives, confirmed SNVs,
confirmed somatic, confirmed germline, clinically
relevant SNVs, recurrent SNVs with functional
significance)
44
Next generation sequencing data
45
Software for next generation sequence data
  • Alignment
  • Maq, BWA, BowTie, SOAP, SSAHA, Eland
  • Samtools
  • SNVs
  • Maq, SOAPSNP, SNVMix (unpublished)
  • Indels
  • Samtools, Pindel
  • Copy number
  • Chiang et al, Nature Methods (2009)
  • Expression
  • Mortazavi et al, Science (2008)
  • Take the workshop
  • http://bioinformatics.ca/workshops/high_throughput

46
Validation!
  • High throughput measurement technologies are
    noisy
  • Predictions must be validated using lower
    throughput, but more accurate experimental assays
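
One way to keep track of validation is to compare the high-throughput predictions with the calls an independent assay confirmed and report the validation rate. The sketch below assumes variants are represented as (chromosome, position, alternate base) tuples; the function name and the coordinates are hypothetical.

```python
def validation_summary(predicted, tested, confirmed):
    """predicted: variant calls from the high-throughput platform.
    tested: the calls sent to a lower-throughput assay.
    confirmed: the tested calls that the assay confirmed.
    All three are sets of (chromosome, position, alternate base)."""
    assayed = predicted & tested
    validated = assayed & confirmed
    false_positives = assayed - confirmed
    rate = len(validated) / len(assayed) if assayed else float("nan")
    return {
        "validated": sorted(validated),
        "false_positives": sorted(false_positives),
        "untested": sorted(predicted - tested),
        "validation_rate": rate,
    }

# Hypothetical calls, for illustration only.
predicted = {("chr1", 1000, "A"), ("chr1", 2000, "G"), ("chr2", 500, "T"),
             ("chr3", 750, "C"), ("chr4", 90, "G")}
tested = {("chr1", 1000, "A"), ("chr1", 2000, "G"), ("chr2", 500, "T"),
          ("chr3", 750, "C")}
confirmed = {("chr1", 1000, "A"), ("chr1", 2000, "G"), ("chr3", 750, "C")}

summary = validation_summary(predicted, tested, confirmed)
print(summary["validation_rate"])
```

Calls that were never assayed are reported separately so that they are not counted as either confirmed or false.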

47
Example: Copy number amplifications predicted
from next generation sequencing
INSR amplicon
Chr19
6/24 (25%) recurrence in TMA, implications for
tamoxifen resistance
7/29 (24%) recurrence in TMA
48
What Have We Learned?
  • Technology for measuring human variation
  • Gene expression
  • DNA copy number and allelic variation
  • Protein quantitation/localisation
  • Next generation sequencing
  • Lab focuses on gene expression and copy number
    linked to clinical data
  • Measurement technologies are getting denser
  • this means more data
  • Validation is critical in order to make
    conclusions

49
Questions?
50
Coffee break
  • Back at 10:50
  • Next: Transcript structure and clinical genomics

51
Module 1: Transcript Structure
  • Anna Lapuk, PhD
  • Vancouver Prostate Centre
  • alapuk_at_prostatecentre.com

52
Alternative splicing (AS) is ubiquitous in the cell
  • 75% of all human genes undergo AS
  • AS is implicated in human disease including
    cancer.

53
Tightly regulated splicing of pre-mRNA
Spliceosome Trans-regulators
Cis-regulators
54
Causes and consequences of aberrant splicing in
cancer
Altered splicing machinery
Cis mutations
modified from Srebrow, A. et al. (2006)
55
Examples of cancer specific splice isoforms
56
Subtype-specific alternative splicing of CD44
gene in BRCA
Miki Yamamoto, Rich Neve
57
Interrogating AS on the whole genome level
Affymetrix Human Exon Chip 1.0 ST design
  • Exon level arrays

Array design: every possible exon in the genome
  • 5.5 million probes
  • 1.4 million exons (probesets)
  • 300,000 transcripts (265,000 unknown, 35,000
    known)
  • 4 probes per probeset
58
(No Transcript)
59
Detection of AS in microarray data.
AS detection pipeline
Inclusion isoform data
exclusion isoform data
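
The slide does not spell out the detection pipeline. As a rough illustration of how exon-level array data can flag an alternatively spliced exon, the sketch below computes a splicing index: each exon's log2 signal is expressed relative to the gene-level signal in the same sample, and the relative values are compared between two groups. This is one widely used summary for exon arrays and is an assumption here, not necessarily the method behind the slide; the exon names and intensities are made up.

```python
from statistics import mean

def splicing_index(exon_log2, group_a, group_b):
    """exon_log2: {exon_id: [log2 intensity per sample]} for ONE gene.
    Returns {exon_id: SI}, the between-group difference in exon signal
    after subtracting each sample's gene-level (mean exon) signal."""
    n_samples = len(next(iter(exon_log2.values())))
    gene_level = [mean(v[j] for v in exon_log2.values()) for j in range(n_samples)]
    si = {}
    for exon, values in exon_log2.items():
        rel = [values[j] - gene_level[j] for j in range(n_samples)]
        si[exon] = mean(rel[j] for j in group_a) - mean(rel[j] for j in group_b)
    return si

# Hypothetical gene with four exons; samples 0-1 are group A, 2-3 group B.
# The gene is more highly expressed in group B, but exon e3 drops there,
# which is the pattern expected if e3 is skipped in group B.
gene = {
    "e1": [8, 8, 9, 9],
    "e2": [8, 8, 9, 9],
    "e3": [8, 8, 7, 7],
    "e4": [8, 8, 9, 9],
}
si = splicing_index(gene, group_a=[0, 1], group_b=[2, 3])
candidate = max(si, key=lambda e: abs(si[e]))
print(candidate, si[candidate])
```

Dividing out the gene-level signal is what separates a splicing change from a plain change in transcription of the whole gene.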
60
Tumor subtype specific splicing signatures in BRCA
61
Splicing and transcription act independently
Little or no overlap between alternatively spliced
genes and differentially expressed (DE) genes in
the same cells
62
Implications of AS for Clinical Applications
63
Module 1: Example study in clinical genomics
  • Sohrab Shah
  • Centre for Translational and Applied Genomics
  • Molecular Oncology Breast Cancer Research Program
  • BC Cancer Agency
  • sshah_at_bccrc.ca
64
Module Overview
  • Introduce Chin et al Cancer Cell (2006)
  • overview of study
  • goals and biological questions
  • the breast cancer expression subtypes
  • data types and data sets generated in this study
  • clinical outcome
  • major conclusions

65
Genomic and transcriptional aberrations linked to
breast cancer pathophysiologies
Chin et al., Cancer Cell (2006)
  • Why did we choose this study?
  • Cited 206 times since 2006
  • Contains many of the important concepts
    encountered in large scale clinical genomics
    studies
  • Integrated analysis of copy number and expression
  • Data and clinical phenotypes freely available
  • Limitations and caveats
  • Data generated is on older, obsolete platforms
  • Goals
  • identify genomic events that can be assayed to
    better stratify patients according to clinical
    behaviour
  • develop insights into how molecular aberrations
    contribute to breast cancer pathogenesis
  • discover genes that might be therapeutic targets
    in patients that do not respond well to current
    therapies

66
Expression subtypes in breast cancer
  • Lab
  • Reduction of 22000 features to 100s (genefilter)
  • hierarchical clustering after feature selection
    (cluster)
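
The two lab steps (cut the feature set down, then cluster the samples hierarchically) can be sketched as follows. The slide names the R packages genefilter and cluster; this is a plain Python illustration of the idea, not a port of either. Variance as the filter criterion, correlation distance and average linkage are assumed choices, and the toy matrix is made up.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

def top_variable_features(X, k):
    """Indices of the k rows (features) of X with the largest variance."""
    order = sorted(range(len(X)), key=lambda i: pvariance(X[i]), reverse=True)
    return sorted(order[:k])

def correlation_distance(a, b):
    """1 - Pearson correlation between two sample profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1 - cov / norm

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of samples; returns a list of index sets."""
    clusters = [{i} for i in range(len(profiles))]

    def dist(c1, c2):
        return mean(correlation_distance(profiles[i], profiles[j])
                    for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters

# Toy matrix (hypothetical): rows are features, columns are 4 samples.
X = [
    [5.0, 5.1, 5.0, 4.9],  # nearly constant, filtered out
    [9.0, 8.8, 2.0, 2.2],  # high in samples 0 and 1
    [1.0, 1.2, 7.9, 8.1],  # high in samples 2 and 3
    [6.0, 6.0, 6.1, 6.0],  # nearly constant, filtered out
    [3.0, 3.3, 6.0, 6.2],  # moderately higher in samples 2 and 3
]
keep = top_variable_features(X, k=3)
profiles = [[X[i][j] for i in keep] for j in range(len(X[0]))]
groups = sorted(sorted(c) for c in average_linkage(profiles, n_clusters=2))
print(keep, groups)
```

Filtering first matters because features that barely vary add noise to the distances without helping to separate subtypes.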

67
Copy number profiles related to expression
subtypes
(Figure panels: all, basal, erbb2, lumA, lumB)
  • Lab
  • Processing copy number data
  • Genome wide plotting of frequency of copy number
    alteration
  • package aCGH
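
The quantity behind a genome-wide frequency plot is, for each probe, the fraction of samples called gained and the fraction called lost. A minimal sketch, assuming fixed log2-ratio thresholds of +/-0.3 for calling gains and losses (an illustrative choice, not the aCGH package's procedure) and a made-up cohort:

```python
def alteration_frequency(log_ratios, gain=0.3, loss=-0.3):
    """log_ratios: one list of per-probe log2 ratios per sample, with
    probes in the same genomic order in every sample.
    Returns a (gain frequency, loss frequency) pair for each probe."""
    n = len(log_ratios)
    freqs = []
    for probe_values in zip(*log_ratios):
        gains = sum(v >= gain for v in probe_values)
        losses = sum(v <= loss for v in probe_values)
        freqs.append((gains / n, losses / n))
    return freqs

# Hypothetical cohort: 4 samples x 5 probes.
samples = [
    [0.0, 0.5, 0.6, -0.1, -0.5],
    [0.1, 0.4, 0.0, 0.0, -0.6],
    [-0.1, 0.7, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.4, -0.4, -0.7],
]
freqs = alteration_frequency(samples)
print(freqs)
```

Running the same function on the samples of one expression subtype at a time gives the per-subtype profiles that can then be compared.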
68
Unsupervised clustering of copy number
Samples fall into 3 main groups: 1q/16q,
Amplifying and Complex
Green: amplification, Red: deletion, Yellow:
high-level amplicons
Correlation with clinical phenotypes?
  • Lab
  • clustering copy number profiles
  • package aCGH

69
Correlation of amplicons with survival
Copy number subtypes
Expression subtypes
  • Lab
  • Module 4
  • Integration with outcome data
  • package survival
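
Linking an amplicon to outcome usually starts with a survival curve per group. The slide points to the R package survival; the sketch below is a plain Python illustration of the Kaplan-Meier estimate only, with hypothetical follow-up times (in months) and group labels.

```python
def kaplan_meier(times, events):
    """times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns [(time, survival probability)] at each time with an event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored patients leave the risk set
        i += leaving
    return curve

# Hypothetical groups, split by presence of an amplification.
amplified = kaplan_meier([5, 8, 12, 20, 24], [1, 1, 1, 0, 1])
not_amplified = kaplan_meier([10, 25, 30, 36, 40], [1, 0, 0, 1, 0])
print(amplified)
print(not_amplified)
```

Censored patients contribute to the number at risk until their last follow-up, which is why they cannot simply be dropped. Comparing the two curves formally would need a test such as the log-rank test, which is not shown here.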

(Figure labels: recurrent amplifications, lumA,
8p11-12)
70
Expression patterns in non-copy number induced
genes preserved
71
Comments?
72
Limitations of study?
73
What we will do in the lab
  • Expression subtypes
  • feature selection and clustering methods
  • Copy number profiles
  • Copy number subtypes
  • Association with outcome
  • Survival curves of subtypes split by
    amplifications

74
What Have We Learned?
  • Review of Chin et al Cancer Cell (2006)

75
Lunch
  • On your own
  • (Food court Downstairs)
  • Back at 13:30
