Functional Genomics with Next-Generation Sequencing

1
Functional Genomics with Next-Generation
Sequencing
  • Jen Taylor
  • Bioinformatics Team
  • CSIRO Plant Industry

2
Capacity and Resolution
  • Next generation sequencing
  • Increasing capacity leads to increased resolution

Eric Lander, Broad Institute
3
How Does a Genome Work?
  • Parts Description
  • Function?
  • Interconnectedness?
  • Comparisons
  • Population-level
  • Between genomes

4
Application domains
  • Reference genome
  • No Reference Genome
  • Partially sequenced
  • UNsequenced
  • PUN Genomes

5
Impact of a Reference Genome
Figure labels: Sequence Data, Alignment, Genome, Read Density, Characterisation
6
Applications of Next Generation Sequencing
  • Profiling of Variation
  • Genetic variation
  • Transcript variation
  • Epigenetic variation
  • Metagenomic variation
  • Discovery
  • Novel genomes
  • Novel genes
  • Novel transcripts
  • Small / long non-coding RNA

7
RNASeq
  • Qualitative transcript diversity
  • Quantitative transcript abundance
  • Impact of NGS
  • Observation of transcript complexity
  • Transcript discovery
  • Small / long non-coding RNA
  • Analytical challenges
  • Transcript complexity
  • Compositional properties

8
RNASeq
  • Sample: Total RNA, PolyA RNA, Small RNA
  • Library Construction
  • Sequencing
  • Base calling, QC
  • Reference: Mapping to Genome
  • PUN: Assembly to Contigs
  • Analysis: Digital Counts, Reads per kilobase per million
    (RPKM), Transcript structure, Secondary structure,
    Targets or Products
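The RPKM measure named on this slide can be written out as a short calculation. This is a minimal sketch of the standard definition (reads per kilobase of transcript per million mapped reads); the function name and the example numbers are hypothetical and are not taken from the presentation.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

Dividing by transcript length makes long and short transcripts comparable; dividing by library size makes samples of different depth comparable.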
9
RNASeq: Transcript Complexity
  • Mapping
  • Reads with multiple locations
  • Conserved domains?
  • Sequencing error?
  • Reads Spanning Exons
  • Gapped alignments?
  • Sequencing error?

ERANGE pipeline: Mortazavi et al., Nature Methods,
Vol. 5, No. 7, July 2008
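One common heuristic for reads with multiple locations is to share each such read among its candidate genes in proportion to the uniquely mapped reads those genes already have. The sketch below illustrates that idea only; the function name, the data layout, and the even-split fallback are assumptions, and it is not a reproduction of any particular pipeline's code.

```python
def allocate_multireads(unique_counts, multireads):
    """Share multi-mapped reads among genes, weighted by unique read counts.

    unique_counts: dict mapping gene name -> number of uniquely mapped reads
    multireads: iterable of tuples, each listing the candidate genes of one read
    """
    totals = {gene: float(count) for gene, count in unique_counts.items()}
    for candidates in multireads:
        weights = [unique_counts.get(gene, 0) for gene in candidates]
        weight_sum = sum(weights)
        for gene, weight in zip(candidates, weights):
            # Assumption: with no unique evidence, split the read evenly.
            if weight_sum > 0:
                share = weight / weight_sum
            else:
                share = 1.0 / len(candidates)
            totals[gene] = totals.get(gene, 0.0) + share
    return totals


# Hypothetical counts: four reads map equally well to geneA and geneB.
totals = allocate_multireads({"geneA": 30, "geneB": 10}, [("geneA", "geneB")] * 4)
print(totals)
```

Discarding multi-mapped reads instead would systematically under-count paralogous genes and transcripts that share conserved domains.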
10
RNASeq: Compositional properties
  • Depth of Sequence
  • Sequence count reflects transcript abundance
  • Majority of the data can be dominated by a small
    number of highly abundant transcripts
  • Ability to observe transcripts of smaller
    abundance is dependent upon sequence depth
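The dependence on depth can be illustrated with a simple sampling model. This is a sketch under a stated assumption: each read is drawn independently and lands on a given transcript with probability equal to that transcript's fraction of the library. The function name and the example fraction are hypothetical.

```python
def detection_probability(transcript_fraction, total_reads):
    """Probability of seeing at least one read from a transcript.

    Assumes independent reads, each hitting the transcript with
    probability `transcript_fraction`.
    """
    return 1.0 - (1.0 - transcript_fraction) ** total_reads


# A rare transcript making up one ten-millionth of the library.
for depth in (1_000_000, 10_000_000, 100_000_000):
    print(depth, detection_probability(1e-7, depth))
```

Under this model the rare transcript is usually missed at one million reads and almost always observed at one hundred million.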

11
RNASeq: Compositional properties
  • Composition
  • Sequence counts are a composition of a fixed
    number of total sequence reads
  • Therefore they are sum-constrained and not
    independent
  • Large variations in component numbers and sizes
    can produce artefacts

Figure panels: True Reads, RPKM
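The sum constraint can be made concrete with a toy example. In this sketch the abundances, gene names, and sequencing depth are all hypothetical: two samples differ only in one gene, yet at a fixed total read count the unchanged genes appear to drop.

```python
def expected_reads(abundance, depth):
    """Expected reads per gene when a fixed number of reads is shared
    out in proportion to each gene's true abundance."""
    total = sum(abundance.values())
    return {gene: depth * amount / total for gene, amount in abundance.items()}


depth = 1_000_000  # fixed total number of reads in both samples

# Only geneC changes between the two hypothetical samples.
sample1 = {"geneA": 100, "geneB": 100, "geneC": 100}
sample2 = {"geneA": 100, "geneB": 100, "geneC": 1000}

reads1 = expected_reads(sample1, depth)
reads2 = expected_reads(sample2, depth)

# geneA is unchanged in truth but receives a smaller share of the reads.
print(reads1["geneA"], reads2["geneA"])
```

Because the genes here are assumed to have equal length, RPKM values scale the same way as the read counts, so the apparent fall in geneA and geneB is an artefact of composition, not of biology.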
12
RNASeq - Correspondence
  • Good correspondence with
  • Expression Arrays
  • Tiling Arrays
  • qRT-PCR
  • Range of up to 5 orders of magnitude
  • Better detection of low abundance transcripts
  • Greater power to detect
  • Transcript sequence polymorphism
  • Novel trans-splicing
  • Paralogous genes
  • Individual cell type expression

13
Reference Genome - RNASeq
14
Reference Genome - RNASeq
Human Exome: 180,000 exons targeted (CCDS database),
plus 700 miRNA (Sanger v13) and 300 ncRNA
15
Epigenome
  • Protein-DNA interactions: ChIPSeq
  • Nucleosome positioning
  • Histone modification
  • Transcription factor interactions
  • Methylation: MethylSeq
  • Impact of NextGen
  • Whole genome profiling
  • Resolution
  • Analytical challenges
  • Systematic bias
  • Unambiguous mapping
  • Robust event calling

Image: ClearScience
16
ChIPSeq
MNase Linker Digest
17
ChIPSeq
MNase Digest
Remove Nucleosomes
18
ChIPSeq methods
Pepke et al., 2009
19
MethylSeq using Bisulfite conversion
20
Limited publications from BS-Seq
  • Mammals
  • Methylation predominantly occurs at CpG sites
  • Several publications in human
  • One publication in mouse
  • Plants
  • Methylation occurs at CG, CHH, CHG sites
  • Two publications in Arabidopsis

H = A, C, or T
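The three sequence contexts can be told apart by looking at the two bases that follow a cytosine. This is a minimal sketch using the IUPAC meaning of H (A, C, or T, that is, any base except G); the function name is hypothetical and only the forward strand is considered.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at `pos` as 'CG', 'CHG', or 'CHH'.

    Returns None if the base is not C, if the context runs off the
    end of the sequence, or if it contains an ambiguous base.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    following = seq[pos + 1 : pos + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in "ACT":
        return None
    if following[1] == "G":
        return "CHG"
    if following[1] in "ACT":
        return "CHH"
    return None


print(cytosine_context("ACGT", 1))  # C followed by G
print(cytosine_context("CAGT", 0))  # C, then A, then G
print(cytosine_context("CTTA", 0))  # C, then T, then T
```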
21
Problems of mapping BS-seq reads
  • Reduced sequence complexity

>> A C G T T T T T T A G T T >>
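The reduced complexity follows directly from the chemistry: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines stay as C. The sketch below applies that rule in silico to the Watson strand shown on the next slide (A Cm G T T C T C C A G T C); the function name and the way methylated positions are passed in are assumptions made for illustration.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Convert every unmethylated C to T; methylated Cs are left as C.

    methylated_positions: 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(seq.upper())
    )


watson = "ACGTTCTCCAGTC"  # the C at index 1 is methylated (Cm)
converted = bisulfite_convert(watson, methylated_positions={1})
print(converted)
```

After conversion most of the read is built from only three letters, which is why many more genomic locations look like plausible matches.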
22
Problems of mapping BS-seq reads
  • Increased search space

Watson >> A Cm G T T C T C C A G T C >>
Crick  << T G Cm A A G A G G T C A G <<
23
ELAND
  • Mapping reads to genome sequences
  • Mapping reads to two converted genome sequences
  • Cross match for reads mapping to multiple
    positions in converted genomes
  • Mapping results were combined to generate
    methylation information
  • ELAND only allows 2 mismatches.

Lister et al., Cell (2008)
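The idea of mapping against a converted genome and then recovering methylation can be sketched as follows. This is an illustration of the general three-letter approach for Watson-strand reads only, using exact matching; it is not ELAND or the published pipeline, and the function names and example sequences are hypothetical. Crick-strand reads would be handled analogously against a G-to-A converted genome.

```python
def c_to_t(seq):
    """Collapse C into T so that converted and unconverted Cs match alike."""
    return seq.upper().replace("C", "T")


def call_methylation(reference, read):
    """Place a Watson-strand bisulfite read by exact match in C-to-T space,
    then call each reference cytosine it covers.

    Returns (offset, calls), where calls maps reference position to True
    (read shows C: methylated) or False (read shows T: unmethylated).
    Returns None if the read does not match. Only the first match is used,
    so reads with several possible locations are not resolved here.
    """
    reference = reference.upper()
    read = read.upper()
    offset = c_to_t(reference).find(c_to_t(read))
    if offset < 0:
        return None
    calls = {}
    for i, base in enumerate(read):
        if reference[offset + i] == "C" and base in "CT":
            calls[offset + i] = base == "C"
    return offset, calls


reference = "GGACGTTCTCCAGTCAA"  # hypothetical reference fragment
read = "ACGTTTTTTAGTT"           # bisulfite-converted read
print(call_methylation(reference, read))
```

Comparing the read with the unconverted reference after placement is what turns a mapping result into methylation information.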
24
BSMAP
  • Based on a hash table seeding algorithm

Xi and Li, BMC Bioinformatics (2009)
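Hash table seeding in general works by indexing every short word (k-mer) of the reference and looking up words from each read to find candidate locations. The sketch below shows only that generic idea; it does not reproduce BSMAP's handling of bisulfite conversion, and the function names, the seed length, and the example sequences are assumptions.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i : i + k]].append(i)
    return index


def candidate_positions(read, index, k):
    """Return reference positions where the read could start,
    based on exact k-mer (seed) hits anywhere in the read."""
    candidates = set()
    for j in range(len(read) - k + 1):
        for ref_pos in index.get(read[j : j + k], ()):
            start = ref_pos - j
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


reference = "GGACGTTCTCCAGTCAA"  # hypothetical reference fragment
seed_index = build_seed_index(reference, k=4)
print(candidate_positions("TTCTCCAG", seed_index, k=4))
```

Each candidate position would then be checked with a full alignment, so the expensive comparison is only done where a seed already matches.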
25
Re-mapping of Lister's data using BSMAP
Lister et al. Cell (2008)
26
Methylation pattern throughout chromosomes
27
Partially / Unsequenced Genomes
  • Options for dealing with partial or unsequenced
    genomes
  • Wait for or generate the genome sequence
  • Borrow a reference genome from a phylogenetic
    neighbour
  • Take a deep breath and do de novo assembly
  • De novo genome
  • De novo transcriptome

Figure labels: DNA or RNA Sequence Data, Partial Assembly,
Partial Sequence Database, Gene Annotation, Genetic Variation,
Transcript Variation, Non-coding RNA
28
Plant Genomes: Haploid Size
Human, Arabidopsis, Rice, Potato, Sugarcane, Cotton, Barley, Wheat
Diameter proportional to haploid genome size
29
Plant Genomes: Total Size
Human, Cotton, Barley, Sugarcane
30
De novo RNASeq
  • Why transcriptome?
  • Large genome sizes with high repeat content are
    difficult to assemble
  • Transcriptomes are more constant in size
  • Enriched for functional content
  • Aims
  • Transcript discovery
  • Small / long non-coding RNA profiling
  • Analytical challenges
  • Assembly: ABySS, Velvet, Euler-SR
  • Comparisons between non-discrete, overlapping
    transcripts
  • Annotation
  • Ploidy
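The assemblers named above are built around de Bruijn graphs, in which reads are broken into overlapping k-mers and contigs correspond to paths through the graph. The sketch below only builds such a graph from a few reads; it omits error correction, reverse complements, and path finding, and the function name and example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer in a
    read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical overlapping reads from the same transcript.
graph = de_bruijn_graph(["ACGTT", "CGTTA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Nodes with more than one outgoing edge mark places where sequences diverge, which in a transcriptome can reflect alternative splicing, paralogous genes, or ploidy rather than sequencing error.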

31
Summary: Impacts and Challenges
  • RNASeq
  • Increased resolution
  • Increased power for transcript complexity and
    variation
  • Analytical challenges: transcript complexity,
    compositional bias
  • Large gains in small and long non-coding RNA
    profiling
  • Epigenomics
  • ChIPSeq and MethylSeq
  • Genome-wide profiling at high resolution
  • Robust event calling is challenging
  • De novo transcriptomics
  • Attractive option for large, repeat-rich genomes

32
Acknowledgements
CSIRO PI Bioinformatics Team: Andrew Spriggs, Stuart Stephen,
Emily Ying, Jose Robles, Michael James
CSIRO Biostatistics: David Lovell