Metagenome analysis - PowerPoint PPT Presentation

Title: Title goes here Author: Cheryl Ventimiglia Last modified by: nivanova Created Date: 4/28/2005 9:01:06 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 23
Provided by: CherylVen2
Transcript and Presenter's Notes

1
Metagenome analysis
Natalia Ivanova
MGM Workshop May 17, 2012
2
  • 1. Metagenome definitions
  • a refresher course

3
Metagenome definitions
  • A metagenome is the collective genome of a microbial
    community, AKA microbiome (native, enriched,
    sorted, etc.).
  • A metagenomic library (or libraries) is constructed
    from isolated DNA (native, enriched, etc.).
  • A metagenomic library can be single-end (AKA
    standard) or paired-end.

4
Metagenome definitions
  • A single-end (standard) metagenomic library will
    produce contigs upon assembly (i.e. longer
    sequences based on overlap between reads)
  • Any Ns found in contigs correspond to low-quality
    bases
  • A paired-end metagenomic library will produce
    scaffolds upon assembly (non-contiguous joining of
    reads based on read pair information)
  • Ns found in scaffolds correspond either to
    low-quality bases or to gaps of unknown size

[Figure: example sequences AGCAGGTT and TCGTCCAA separated by a run of Ns (NNNNNN)]
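The relationship between scaffolds and contigs can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `split_scaffold` and the `min_gap` parameter are hypothetical, and a real pipeline would also need to tell low-quality Ns apart from true gaps, which this sketch does not attempt.

```python
import re


def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of N at least min_gap long.

    Assumption: every run of N that long is treated as a gap between
    contigs; shorter runs stay inside the contig as low-quality bases.
    """
    gap_pattern = re.compile(r"N{%d,}" % min_gap, flags=re.IGNORECASE)
    # Drop empty pieces produced by leading or trailing runs of N.
    return [piece for piece in gap_pattern.split(scaffold) if piece]


# The toy scaffold from the slide: two stretches of sequence and a gap.
scaffold = "AGCAGGTT" + "NNNNNN" + "TCGTCCAA"
print(split_scaffold(scaffold))
```

Raising `min_gap` keeps isolated Ns inside a contig, which loosely mirrors the distinction the slide draws between low-quality bases and gaps of unknown size.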
5
Amplified and Unamplified Libraries
[Figure: side-by-side flowcharts of the Amplified Library and Unamplified Library protocols. The order of steps within each column is not recoverable from the transcript; the steps shown are:]
  • Fragmentation (1 ug)
  • End repair / Phosphorylation
  • A-tailing with Klenow exo-
  • Heat Inactivation
  • Adaptor Ligation
  • PCR 10-cycle Amplification
  • SPRI Clean and Double SPRI
  • DNA Chip
  • qPCR Quantification
6
Metagenome definitions (contd)
  • Unless the community has very low complexity (i.e.
    dominated by one or a few clonal populations),
    assembly at 100% nucleotide identity will be very
    fragmented.
  • What to do with k-mer based assemblies?
  • Use multiple k-mer settings, then combine the
    assemblies with an overlap-layout-consensus
    assembler like minimus2 using a minimal identity
    of 95%. There is a tradeoff between overlap length
    and identity.

overlap alignment of reads at x% identity
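The tradeoff between overlap length and identity can be made concrete with a toy suffix-prefix overlap search. This is a minimal sketch under simplifying assumptions (ungapped comparison, exhaustive search); it is not how minimus2 computes overlaps, and the names `best_overlap`, `min_length` and `min_identity` are hypothetical.

```python
def best_overlap(left, right, min_length=20, min_identity=0.95):
    """Find the longest suffix-prefix overlap meeting both thresholds.

    Compares the end of `left` with the start of `right` without gaps
    and returns (overlap_length, identity), or None if nothing qualifies.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    max_length = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for length in range(max_length, min_length - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


# Two toy contigs that share a 16-base stretch.
shared = "AAAACCCCGGGGTTTT"
left = "ACGTACGTACGT" + shared
right = shared + "ACGATCGA"
print(best_overlap(left, right, min_length=10))
```

Lowering `min_identity` lets more sequences be merged into longer contigs at the cost of collapsing variation, while raising `min_length` rejects short, possibly spurious overlaps.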
7
Reasoning behind combining multiple assemblies
8
Assembly Pipeline v.0.9
CPU-time intensive; no known metagenomic k-mer
prediction algorithm
A snapshot of an older (454-Illumina) metagenome
assembly pipeline
Picking the best k-mer is a manual process
9
Metagenome definitions (contd)
  • Assembly of sequences at less than 100% identity
    -> population contigs and scaffolds representing a
    consensus sequence of a species population
  • isolate contig vs. species population
    contigs

overlap alignment of reads at x% identity
10
2 more important definitions
  • 1. Sequence coverage (AKA read depth)
  • How many times each base has been sequenced ->
    needs to be considered when calculating protein
    family abundance
  • Per-contig average coverage
  • Per-base coverage -> per-gene coverage
  • 2. Bins
  • Scaffolds, contigs and unassembled reads can be
    binned into sets of sequences (bins) that likely
    originated from the same species population or a
    population from a broader taxonomic lineage
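The coverage definitions above can be sketched as follows. This is an illustrative example, not IMG/M code: it assumes read alignments are already available as 0-based, half-open `(start, end)` intervals on a contig, and the function names are hypothetical.

```python
def per_base_coverage(contig_length, alignments):
    """Return a list with the read depth at every base of a contig.

    `alignments` is an iterable of (start, end) intervals, 0-based and
    half-open, describing where each read aligns on the contig.
    """
    # Difference array: +1 where a read starts, -1 just past its end.
    delta = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, contig_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    coverage = []
    depth = 0
    for change in delta[:contig_length]:
        depth += change
        coverage.append(depth)
    return coverage


def mean_coverage(coverage, start=0, end=None):
    """Average depth over a region: whole contig by default, or one gene."""
    region = coverage[start:end]
    return sum(region) / len(region) if region else 0.0


# Three reads aligned to a 10-base toy contig.
coverage = per_base_coverage(10, [(0, 6), (2, 8), (4, 10)])
print(coverage)                       # per-base coverage
print(mean_coverage(coverage))        # per-contig average coverage
print(mean_coverage(coverage, 4, 8))  # per-gene coverage for a gene at 4..8
```

Per-gene coverage is simply the per-base coverage averaged over the gene's coordinates, which is why per-base values are the more informative of the two.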

11
What IMG does and doesn't do
  • Scaffolds and contigs are generated by assembly:
    not provided in IMG/M
  • Sequence coverage can be computed by the
    assembler based on the alignments it generates
    (preferable) or can be added later by aligning
    reads to contigs; the latter can be provided in
    IMG/M
  • Bins are generated by binning software: not
    provided in IMG/M
  • Scaffolds, contigs and unassembled reads are
    annotated with non-coding RNAs, repeats
    (CRISPRs), and protein-coding genes (CDSs); the
    latter are assigned to protein families (COGs,
    Pfams, TIGRfams, KEGG Orthology, EC numbers,
    internal clusters). This annotation is provided
    in IMG/M

12
What's the difference between IMG and MG-RAST,
IMG and CAMERA?
  • We prefer to assemble the data
  • longer sequences -> better quality of gene
    prediction and functional annotation
  • longer sequences -> chromosomal context and
    binning -> population-level analysis
  • But we don't provide assembly services except for
    metagenomes sequenced at the JGI
  • we may be able to help with assembly of 454 data
  • we're not equipped to assemble massive amounts
    of Illumina data
  • http://galaxy.jgi-psf.org
  • Contact person: Ed Kirton, ESKirton_at_lbl.gov
  • IMG does not provide tools for analysis of 16S
    data from the metagenome itself
  • we do assembly -> assembled 16S sequences are
    generally not very reliable
  • BLASTn of reads matching conserved regions is
    misleading
  • we do pyrotags or i-tags for every metagenome
    sequenced at the JGI
  • http://pyrotagger.jgi-psf.org

13
  • 2. IMG/M features
  • divide and conquer
  • (see also IMG/M -> Using IMG/M -> Using IMG/M ->
    IMG User Guide and IMG/M Addendum)
  • http://img.jgi.doe.gov/m
  • http://img.jgi.doe.gov/mer
  • username: public
  • password: public

14
IMG/M User Interface Map
About IMG/M -> Using IMG/M -> User Interface Map
15
Dividing the contigs by GC content or length
  • Statistics
  • Microbiome Details -> Genome Statistics -> DNA
    Scaffolds
  • Search
  • Microbiome Details -> Scaffold Search
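Outside the IMG/M interface, the same idea of dividing contigs by GC content or length can be sketched in Python. This is an illustration only and does not reproduce IMG/M behaviour; the function names, the 10% bin width and the toy contigs are assumptions made for the example.

```python
from collections import defaultdict


def gc_percent_floor(sequence):
    """Integer GC percentage (rounded down), ignoring Ns and other symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return None
    return (100 * gc) // acgt


def divide_contigs(contigs, min_length=0, bin_width=10):
    """Group contig names into GC bins, skipping contigs that are too short.

    `contigs` maps a contig name to its sequence. Keys of the result are
    the lower bound of each GC bin in percent. Assumes bin_width divides 100.
    """
    bins = defaultdict(list)
    for name, sequence in contigs.items():
        if len(sequence) < min_length:
            continue
        percent = gc_percent_floor(sequence)
        if percent is None:
            continue
        # Contigs at exactly 100% GC go into the last bin.
        bin_start = min(percent // bin_width * bin_width, 100 - bin_width)
        bins[bin_start].append(name)
    return dict(bins)


contigs = {
    "c1": "ATATATATGC",
    "c2": "GCGCGCGCAT",
    "c3": "GGGGCCCC",
    "c4": "ACGTNNNN",
}
print(divide_contigs(contigs))
print(divide_contigs(contigs, min_length=10))
```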

16
Dividing the genes phylogenetically: Phylogenetic
Distribution
  • Phylogenetic Distribution of Genes
  • Microbiome Details -> Phylogenetic Distribution
    of Genes
  • Components
  • histograms
  • Protein Recruitment Plots
  • summary statistics tables
  • lists of genes

17
Dividing the contigs: Scaffold Cart
  • Lists of contigs or genes in Gene Cart
  • E.g. Microbiome Details -> Genome Statistics ->
    DNA Scaffolds -> scaffold counts
  • Scaffold Cart
  • Features
  • Scaffold Export
  • Adding all genes to Gene Cart
  • Function Profile (against functions in Function
    Cart)
  • Histograms by GC content, length and gene count
  • Phylogenetic Distribution

18
All Carts in IMG are interconnected
Gene Cart
Scaffold Cart
Function Cart
19
Dividing the genes by abundance / by function
  • Abundance Profiles
  • Compare Genomes -> Abundance Profiles Tools
  • Components
  • Common parameters
  • Normalization (none/scale for size)
  • Type of count (raw counts/estimated gene copies)
  • Type of protein family (COG, Pfam, Enzyme,
    TIGRfam)
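The parameters listed above can be illustrated with a small sketch. It rests on two assumptions that the slides do not spell out: that "estimated gene copies" weights each gene by the read depth of its contig, and that "scale for size" divides each family's total by the grand total for the metagenome. The function name and the toy data are hypothetical, and IMG/M's own computation may differ.

```python
from collections import Counter


def family_abundance(genes, use_coverage=False, scale_for_size=False):
    """Summarize protein family abundance for one metagenome.

    `genes` is an iterable of (family_id, contig_coverage) pairs, one per
    gene. With use_coverage=False each gene counts once (raw counts);
    with use_coverage=True each gene counts as its contig's read depth
    (assumed meaning of "estimated gene copies").
    """
    totals = Counter()
    for family, coverage in genes:
        totals[family] += coverage if use_coverage else 1
    if scale_for_size:
        grand_total = sum(totals.values())
        if not grand_total:
            return {}
        # Assumed meaning of "scale for size": fraction of the total.
        return {family: value / grand_total for family, value in totals.items()}
    return dict(totals)


# Toy data: two COG0001 genes on contigs with different depth, one COG0002 gene.
genes = [("COG0001", 10.0), ("COG0001", 2.0), ("COG0002", 4.0)]
print(family_abundance(genes))
print(family_abundance(genes, use_coverage=True))
print(family_abundance(genes, use_coverage=True, scale_for_size=True))
```

The example shows why read depth matters for assembled data: a gene on a high-coverage contig stands for many more sequenced copies than its raw count of one suggests.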

20
Other tools
  • Phylogenetic Marker COGs
  • Find Functions -> Phylogenetic Marker COGs
  • SNP BLAST and SNP Vista
  • Gene Page -> SNP BLAST -> SNP VISTA
  • IMG/M exercises
  • http://genomebiology.jgi-psf.org/Content/MGM-11.Feb2012/agenda.html
  • The first 3 pages are questions without answers;
    the rest is a cheat sheet

21
Life outside IMG: binning tools
  • Alignment-based tools
  • MEGAN: BLAST + LCA
  • http://www-ab.informatik.uni-tuebingen.de/software/megan
  • MTR: BLAST + MTR
  • http://cs.ru.nl/gori/software/MTR.tar.gz
  • SOrt-ITEMS: processed BLAST best hit
  • http://metagenomics.atc.tcs.com/binning/SOrt-ITEMS
  • CARMA and Web-CARMA: MSA + neighbor-joining tree
  • http://webcarma.cebitec.uni-bielefeld.de
  • Compositional tools
  • PhyloPythia: 6-mers, SVM
  • http://cbcsrv.watson.ibm.com/phylopythia.html
  • TACOA: 2-6 mers, k-nearest neighbor classifier
  • http://www.cebitec.uni-bielefeld.de/brf/tacoa/tacoa.html
  • Phymm and PhymmBL: Interpolated Markov models
    (IMMs)
  • http://www.cbcb.umd.edu/software/phymm/
  • ClaMS: DOR, DBC
  • http://clams.jgi-psf.org
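The lowest common ancestor (LCA) idea behind alignment-based binning can be sketched as follows. This is a simplified illustration, not MEGAN's implementation: the input format, the function names and the `top_percent` filter on bit scores are assumptions made for the example, and the lineages are toy data.

```python
def lowest_common_ancestor(lineages):
    """Return the part of the lineage shared by all hits, root first."""
    shared = []
    if not lineages:
        return shared
    # Walk down the ranks in parallel until the lineages disagree.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_read(hits, top_percent=10.0):
    """Assign a read to the LCA of its best-scoring hits.

    `hits` is a list of (lineage, bit_score) pairs for one read. Only hits
    scoring within `top_percent` percent of the best score are kept.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [lineage for lineage, score in hits if score >= cutoff]
    return lowest_common_ancestor(kept)


escherichia = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Escherichia"]
pseudomonas = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Pseudomonadales", "Pseudomonas"]
bacillus = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales", "Bacillus"]

hits = [(escherichia, 200.0), (pseudomonas, 195.0), (bacillus, 90.0)]
print(assign_read(hits))                   # LCA of the two strong hits
print(assign_read(hits, top_percent=100))  # LCA of all three hits
```

A read whose good hits span several lineages is placed at a higher taxonomic rank, so an ambiguous read ends up with a less specific but safer assignment.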

22
Life outside IMG: statistical analysis tools
  • Comparison of 2 samples
  • MEGAN - http://www-ab.informatik.uni-tuebingen.de/software/megan
  • STAMP - http://kiwi.cs.dal.ca/Software/STAMP
  • Comparison of sets of samples
  • ShotgunFunctionalizeR: R package for statistical
    analysis - http://shotgun.zool.gu.se
  • METAREP: package from JCVI, includes
    multidimensional scaling, hierarchical
    clustering, etc. - http://www.jcvi.org/metarep
  • METASTATS: package for analysis of paired
    samples with replicates - http://metastats.cbcb.umd.edu/
  • LEfSe: package for comparison of multiple
    classes of samples with replicates -
    http://huttenhower.sph.harvard.edu/lefse/