Functional Genomics with Next-Generation Sequencing

About This Presentation

Description:

Functional Genomics with Next-Generation Sequencing. Jen Taylor, Bioinformatics Team, CSIRO Plant Industry. Capacity and Resolution. Next generation sequencing. Increasing ...

Slides: 33

Provided by: newtonAcU

Transcript and Presenter's Notes

1
Functional Genomics with Next-Generation
Sequencing

Jen Taylor
Bioinformatics Team
CSIRO Plant Industry

2
Capacity and Resolution

Next generation sequencing
Increasing capacity leads to increased resolution

Eric Lander, Broad Institute
3
How Does a Genome Work?

Parts Description
Function?
Interconnectedness?
Comparisons
Population-level
Between genomes

4
Application domains

Reference genome
No Reference Genome
Partially sequenced
UNsequenced
PUN Genomes

5
Impact of a Reference Genome
Sequence Data
Alignment
Genome
Read Density
Characterisation
6
Applications of Next Generation Sequencing

Profiling of Variation
Genetic variation
Transcript variation
Epigenetic variation
Metagenomic variation

Discovery
Novel genomes
Novel genes
Novel transcripts
Small / long non-coding RNA

7
RNASeq

Qualitative transcript diversity
Quantitative transcript abundance
Impact of NGS
Observation of transcript complexity
Transcript discovery
Small / long non-coding RNA
Analytical challenges
Transcript complexity
Compositional properties

8
RNASeq
Sample: Total RNA, PolyA RNA, Small RNA
Library Construction
Sequencing
Base calling, QC
Reference: Mapping to Genome
PUN: Assembly to Contigs
Analysis
Digital counts: reads per kilobase per million (RPKM)
Transcript structure
Secondary structure
Targets or Products
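
The RPKM measure named on this slide can be sketched as follows. This is a minimal illustration of the standard definition (reads per kilobase of transcript per million mapped reads); the function name and the example numbers are assumptions for illustration, not values from the presentation.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# in a library of 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)  # 25.0
```

Dividing by length and by library size is what lets counts be compared across transcripts of different lengths and across runs of different depth.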
9
RNASeq Transcript Complexity

Mapping
Reads with multiple locations
Conserved domains?
Sequencing error?
Reads Spanning Exons
Gapped alignments?
Sequencing error?

ERANGE pipeline: Mortazavi et al., Nature Methods 5(7), July 2008
10
RNASeq Compositional properties

Depth of Sequence
Sequence count is a proxy for transcript abundance
Majority of the data can be dominated by a small
number of highly abundant transcripts
Ability to observe transcripts of smaller
abundance is dependent upon sequence depth
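
The dependence on depth can be illustrated with a rough calculation. The sketch below assumes each read is drawn independently and that a transcript contributes a fixed fraction of all reads; both are simplifying assumptions, and the function name is hypothetical.

```python
def detection_probability(fraction_of_reads, depth):
    """Probability of seeing at least one read from a transcript.

    Assumes reads are independent draws and the transcript accounts
    for `fraction_of_reads` of the library (a simplification).
    """
    return 1.0 - (1.0 - fraction_of_reads) ** depth


# A transcript making up one read in a million:
p_at_1m = detection_probability(1e-6, 1_000_000)    # about 0.63
p_at_10m = detection_probability(1e-6, 10_000_000)  # close to 1
```

Under these assumptions, a tenfold increase in depth moves a one-in-a-million transcript from likely missed about a third of the time to almost always observed.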

11
RNASeq Compositional properties

Composition
Sequence counts are a composition of a fixed
number of total sequence reads
Therefore they are sum-constrained and not
independent
Large variations in component numbers and sizes
can produce artefacts
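
A toy example of the sum constraint, using made-up abundances: when one transcript increases and the total number of reads is fixed, the expected counts of unchanged transcripts fall. The names and numbers below are illustrative assumptions only.

```python
def expected_counts(abundances, total_reads):
    """Split a fixed number of reads in proportion to true abundance."""
    total = sum(abundances.values())
    return {name: total_reads * a / total for name, a in abundances.items()}


# Transcripts B and C do not change; only A increases tenfold.
before = expected_counts({"A": 100, "B": 100, "C": 100}, 3_000)
after = expected_counts({"A": 1_000, "B": 100, "C": 100}, 3_000)

# before["B"] is 1000.0 but after["B"] is 250.0, although the true
# abundance of B is the same in both conditions.
```

The apparent fourfold drop in B and C is an artefact of the composition, not a biological change.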

[Figure: true reads compared with RPKM]
12
RNASeq - Correspondence

Good correspondence with
Expression Arrays
Tiling Arrays
qRT-PCR
Range of up to 5 orders of magnitude
Better detection of low abundance transcripts
Greater power to detect
Transcript sequence polymorphism
Novel trans-splicing
Paralogous genes
Individual cell type expression

13
Reference Genome - RNASeq
14
Reference Genome - RNASeq
Human Exome
Number of exons targeted: 180,000 (CCDS database), plus 700 miRNA (Sanger v13) and 300 ncRNA
15
Epigenome

Protein-DNA interactions: ChIPSeq
Nucleosome positioning
Histone modification
Transcription factor interactions
Methylation: MethylSeq
Impact of NextGen
Whole genome profiling
Resolution
Analytical challenges
Systematic bias
Unambiguous mapping
Robust event calling

Image: ClearScience
16
ChIPSeq
MNase Linker Digest
17
ChIPSeq
MNase Digest
Remove Nucleosomes
18
ChIPSeq methods
Pepke et al., 2009
19
MethylSeq using Bisulfite conversion
20
Limited publications from BS-Seq

Mammals
Methylation predominantly occurs at CpG sites
Several publications in human
One publication in mouse
Plants
Methylation occurs at CG, CHH, CHG sites
Two publications in Arabidopsis

H = A, C, or T
21
Problems of mapping BS-seq reads

Reduced sequence complexity

>> A C G T T T T T T A G T T >>
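
The loss of complexity can be reproduced in silico. The sketch below applies the usual bisulfite rule (unmethylated C reads as T, methylated C is protected) to the 13 bases shown in this part of the deck, assuming the second base is the methylated cytosine. The function name and the position-set argument are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Convert unmethylated C to T; methylated C stays as C.

    `methylated_positions` holds 0-based indices of protected cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


watson = "ACGTTCTCCAGTC"
converted = bisulfite_convert(watson, methylated_positions={1})
# converted == "ACGTTTTTTAGTT": only one C remains, so the read is
# effectively written in a three-letter alphabet.
```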
22
Problems of mapping BS-seq reads

Increased search space

Watson >> A Cm G T T C T C C A G T C >>
Crick  << T G Cm A A G A G G T C A G <<
23
ELAND

Mapping reads to genome sequences
Mapping reads to two converted genome sequences
Cross match for reads mapping to multiple
positions in converted genomes
Mapping results were combined to generate
methylation information
ELAND allows only 2 mismatches.

Lister et al. Cell (2008)
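
The idea of mapping against two converted genomes can be sketched as below. This is a toy illustration only: exact string matching stands in for ELAND, the genome and read are made up, and all function names are hypothetical. It is not the pipeline used by Lister et al.

```python
def c_to_t(seq):
    return seq.replace("C", "T")


def g_to_a(seq):
    return seq.replace("G", "A")


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(haystack, needle):
    """All 0-based start positions of needle in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def map_bs_read(read, genome):
    """Map a bisulfite read against both converted genomes."""
    # Watson-strand reads: compare in C->T space.
    hits = [("+", p) for p in find_all(c_to_t(genome), c_to_t(read))]
    # Crick-strand reads: reverse-complement, then compare in G->A space.
    rc = revcomp(read)
    hits += [("-", p) for p in find_all(g_to_a(genome), g_to_a(rc))]
    return hits


def call_methylation(read, genome, pos):
    """For a Watson-strand hit: True where a genomic C is still C in the read."""
    return {
        pos + i: base == "C"
        for i, base in enumerate(read)
        if genome[pos + i] == "C"
    }


genome = "TTGACGTTCTCCAGTCGG"   # made-up reference
read = "ACGTTTTTTAGTT"          # bisulfite-converted Watson read
hits = map_bs_read(read, genome)             # [("+", 3)]
calls = call_methylation(read, genome, 3)    # only position 4 is methylated
```

Searching two converted genomes is what doubles the search space, and reads that hit more than one place after conversion are the ones that need the cross-match step listed above.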
24
BSMAP

Based on a hash-table seeding algorithm

Xi and Li BMC Bioinformatics (2009)
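
As a general illustration of hash-table seeding, the sketch below indexes every k-mer of a reference, looks up a read's first k-mer, and verifies each candidate position by counting mismatches. It is a generic seed-and-verify toy and does not reproduce BSMAP's actual algorithm or its handling of bisulfite mismatches; the function names, seed length and mismatch limit are all assumptions.

```python
from collections import defaultdict


def build_seed_index(genome, seed_len):
    """Hash table from each seed (k-mer) to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, genome, index, seed_len, max_mismatches=2):
    """Seed with the read's first k-mer, then verify the full alignment."""
    hits = []
    for pos in index.get(read[:seed_len], []):
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


genome = "TTGACGTTCTCCAGTCGG"          # made-up reference
index = build_seed_index(genome, 4)
hits = map_read("ACGTTCTGCAG", genome, index, 4)  # [(3, 1)]
```

The hash lookup narrows the search to a few candidate positions, so the slower base-by-base comparison runs only where a seed already matches.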
25
Re-mapping of Lister's data using BSMAP
Lister et al. Cell (2008)
26
Methylation pattern throughout chromosomes
27
Partially / Unsequenced Genomes

Options for dealing with partial or unsequenced
genomes
Wait for or generate the genome sequence
Borrow a reference genome from a phylogenetic
neighbour
Take a deep breath and do de novo
De novo Genome
De novo Transcriptome

Gene Annotation
DNA or RNA Sequence Data
Genetic Variation
Partial Assembly
Transcript Variation
Partial Sequence Database
Non-coding RNA
28
Plant Genomes Haploid Size
Human, Arabidopsis, Rice, Potato, Sugarcane, Cotton, Barley, Wheat
Diameter proportional to haploid genome size
29
Plant Genomes Total Size
Human Cotton
Barley Sugarcane
30
De novo RNASeq