AMOS tools for assembly validation - PowerPoint PPT Presentation

Slides: 28
Provided by: Michael2026

Transcript and Presenter's Notes


1
AMOS tools for assembly validation
  • Automatically scan an assembly to locate
    misassembly signatures for further analysis and
    correction
  • Load Assembly Data into Bank
  • Evaluate Mate Pairs Libraries
  • Evaluate Read Alignments
  • Evaluate Read Breakpoints
  • Analyze Depth of Coverage
  • Identify Surrogates
  • Load Misassembly Signatures into Bank

AMOS Bank
http://amos.sourceforge.net
2
Assembly QC: mate happiness
  • Evaluate mate happiness across assembly
  • Happy: correct orientation and distance
  • Finds regions with multiple:
  • Compressed Mates (too close together)
  • Expanded Mates (too far apart)
  • Invalid same orientation (→ →)
  • Invalid outie orientation (← →)
  • Missing Mates
  • Linking mates (mate in a different scaffold)
  • Singleton mates (mate is not in any contig)
  • Regions with high C/E statistic

3
Mate happiness
  • Excision: skip reads between flanking repeats
  • Truth
  • Misassembly: compressed mates, missing mates

4
Mate happiness
  • Insertion: additional reads between flanking
    repeats
  • Truth
  • Misassembly: expanded mates, missing mates

5
Mate happiness
  • Rearrangement: reordering of reads
  • Truth
  • Misassembly: misoriented mates

Note: if A and B are too far apart, mates may all be
happy
6
Compression/Expansion (C/E) Statistic
  • The presence of individual compressed or expanded
    mates is rare but expected
  • Do the inserts spanning a given position differ
    from the rest of the library?
  • Flag large differences as potential misassemblies
  • Even if each individual mate is happy
  • Compute the statistic at all positions
  • (Local Mean - Global Mean) / Scaling Factor
  • Introduced by Jim Yorke's group at UMD
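
The per-position computation described above can be sketched as follows. This is a minimal illustration, not the AMOS implementation: the function names, the (start, end) insert representation, and the sampling step are hypothetical, and the scaling factor is assumed to be the library standard deviation divided by the square root of the number of spanning inserts, which matches the worked examples on the slides that follow.

```python
import math

def ce_statistic(insert_sizes, global_mean, global_sd):
    """C/E statistic for the inserts spanning one position."""
    n = len(insert_sizes)
    if n == 0:
        return 0.0  # assumed convention: no spanning inserts, nothing to test
    local_mean = sum(insert_sizes) / n
    scaling = global_sd / math.sqrt(n)  # assumed scaling factor
    return (local_mean - global_mean) / scaling

def scan_ce(inserts, contig_length, global_mean, global_sd, step=100):
    """Evaluate the statistic every `step` bases along a contig.

    inserts: list of (start, end) contig coordinates, half-open,
    one tuple per correctly oriented mate pair (hypothetical format).
    """
    results = []
    for pos in range(0, contig_length, step):
        spanning = [end - start for start, end in inserts if start <= pos < end]
        results.append((pos, ce_statistic(spanning, global_mean, global_sd)))
    return results
```

Dividing by the standard error rather than the raw standard deviation is what lets many mildly short inserts add up to a strong signal even when each one is individually happy.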

7
Library size variation
[Figure: inserts drawn against a 0kb-6kb size axis]
8 inserts, 3kb-6kb; Local Mean = 4048
C/E Stat = (4048 - 4000) / (400 / √8) ≈ 0.33
Near 0 indicates overall happiness
8
C/E statistic: Compression
[Figure: inserts drawn against a 0kb-6kb size axis]
8 inserts, 3.2kb-4.8kb; Local Mean = 3488
C/E Stat = (3488 - 4000) / (400 / √8) = -3.62
C/E Stat ≤ -3.0 indicates compression
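
The arithmetic in the two worked examples can be checked with a few lines of Python. This is only a sketch: it assumes 4000 is the library mean and 400 the library standard deviation, the function names are hypothetical, and the +3.0 cutoff for expansion is an assumption made by symmetry with the -3.0 compression cutoff shown on the slide.

```python
import math

def ce_stat(local_mean, global_mean, global_sd, n_inserts):
    """(Local Mean - Global Mean) / (library sd / sqrt(number of inserts))."""
    return (local_mean - global_mean) / (global_sd / math.sqrt(n_inserts))

def interpret(stat, threshold=3.0):
    # The -3.0 compression cutoff is from the slide; the symmetric +3.0
    # expansion cutoff is an assumption made for this sketch.
    if stat <= -threshold:
        return "compression"
    if stat >= threshold:
        return "expansion"
    return "ok"

typical = ce_stat(4048, 4000, 400, 8)
compressed = ce_stat(3488, 4000, 400, 8)
print(round(typical, 2), interpret(typical))
print(round(compressed, 2), interpret(compressed))
```

The first value evaluates to roughly 0.34 (shown truncated as 0.33 on the slide) and is well inside the threshold; the second is about -3.62 and falls below -3.0.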
9
Read Alignment
  • Multiple reads with same conflicting base are
    unlikely
  • 1x QV 30: 1/1,000 base calling error
  • 2x QV 30: 1/1,000,000 base calling error
  • 3x QV 30: 1/1,000,000,000 base calling error
  • Correlated SNPs are likely to be assembly errors,
    usually collapsed repeats
  • AMOS Tools: analyzeSNPs, clusterSNPs
  • Locate regions with high rate of correlated SNPs
  • Parameterized thresholds
  • Multiple positions within 100bp sliding window
  • At least 2 conflicting reads
  • Cumulative QV > 40 (1/10,000 base calling error)
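
The thresholds above can be illustrated with a short sketch. It does not reproduce the interface or internals of analyzeSNPs or clusterSNPs: the function names and the input layout are hypothetical, and the clustering step chains neighbouring positions that are less than one window apart, which is a simplification of a true sliding window.

```python
def phred_to_error(qv):
    """Phred quality value to error probability, e.g. QV 30 -> 1/1000."""
    return 10 ** (-qv / 10)

def correlated_snp_positions(columns, min_reads=2, min_cum_qv=40):
    """columns: dict of position -> list of (base, qv), one entry per read
    that disagrees with the consensus at that position (hypothetical format)."""
    flagged = []
    for pos, conflicts in sorted(columns.items()):
        by_base = {}
        for base, qv in conflicts:
            by_base.setdefault(base, []).append(qv)
        # Independent errors multiply, so the QVs of agreeing reads add.
        if any(len(qvs) >= min_reads and sum(qvs) > min_cum_qv
               for qvs in by_base.values()):
            flagged.append(pos)
    return flagged

def cluster_positions(positions, window=100, min_positions=2):
    """Chain flagged positions whose neighbours are < window bp apart."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] >= window:
            if len(current) >= min_positions:
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_positions:
        clusters.append((current[0], current[-1]))
    return clusters
```

Two reads at QV 30 carrying the same conflicting base give a cumulative QV of 60, which passes the filter; two reads at QV 15 give 30 and do not.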

[Figure: aligned reads falling into two groups with correlated differences, AGC vs. CTA]
10
Read breakpoints: compression error
ribosomal RNA repeats, B. anthracis
  • QC METHOD
  • Align singleton reads to consensus assembly
  • Find any breakpoints shared by multiple reads
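
The second step of the QC method can be sketched as a simple count of where partial alignments end. This is an illustration under stated assumptions, not the AMOS tool: the tuple layout is hypothetical, alignments are taken to be on the forward strand with half-open coordinates, and breakpoints are only merged when they fall on exactly the same consensus position.

```python
from collections import Counter

def shared_breakpoints(alignments, min_reads=2):
    """Return consensus positions where at least `min_reads` alignments break off.

    alignments: list of (read_id, read_len, q_start, q_end, ref_start, ref_end),
    where q_* are coordinates in the read and ref_* in the consensus
    (hypothetical format).
    """
    counts = Counter()
    for read_id, read_len, q_start, q_end, ref_start, ref_end in alignments:
        if q_start > 0:        # alignment begins inside the read
            counts[ref_start] += 1
        if q_end < read_len:   # alignment stops before the read ends
            counts[ref_end] += 1
    return sorted(pos for pos, n in counts.items() if n >= min_reads)
```

A single read with a breakpoint may simply be chimeric; requiring several reads to agree on the position is what points to a misassembly.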

chimeric reads
mates
11
Uncompress by creating new repeat copy
Reference: B. anthracis 'Ames ancestor' strain
B. anthracis Ames Porton Down strain
Tandem duplication
12
Read Coverage
  • Find regions of contigs where the depth of
    coverage is unusually high
  • AMOS Tool: analyzeReadDepth
  • 2.5x mean coverage
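
A depth-of-coverage scan of this kind can be sketched as below. It is not the analyzeReadDepth implementation: the function names and the (start, end) read layout are hypothetical, the 2.5x factor is read here as a cutoff above which a region is reported, and the mean is taken over the whole contig.

```python
def depth_of_coverage(reads, contig_length):
    """Per-base depth from half-open (start, end) read placements,
    with 0 <= start < end <= contig_length."""
    diff = [0] * (contig_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:contig_length]:
        running += delta
        depth.append(running)
    return depth

def high_coverage_regions(depth, factor=2.5):
    """Half-open intervals where depth exceeds factor * mean depth."""
    threshold = factor * (sum(depth) / len(depth))
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d > threshold and start is None:
            start = pos
        elif d <= threshold and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions
```

A collapsed two-copy repeat would be expected to show roughly double the mean depth, so in practice the factor may need tuning to the repeat copy number of interest.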

[Figure: diagram labeled A, B, R1, R2]
13
Hawkeye assembly viewer and debugger
14
Launch Pad
15
Histograms & Statistics
Insert Size
Read Length
GC Content
Overall Statistics
  • Bird's eye view of data and assembly quality

16
Scaffold View
  • Statistical Plots
  • Scaffold
  • Features
  • Clone inserts
  • Overview
  • Control Panel
  • Details

17
Standard Feature Types
  • B (Breakpoint): alignment ends at this position
  • C (Coverage): location of unusual mate coverage (asmQC)
  • S (SNPs): location of correlated SNPs
  • U (Unitig): used to report location of surrogate unitigs in
    CA assemblies
  • X (Other): all other features

18
Insert (mate) Happiness
  • Happy: oriented correctly, and
    |Insert Size - Library.mean| < Happy-Distance * Library.sd
  • Stretched: oriented correctly, and
    Insert Size > Library.mean + Happy-Distance * Library.sd
  • Compressed: oriented correctly, and
    Insert Size < Library.mean - Happy-Distance * Library.sd
  • Misoriented: same orientation or outies
  • Linking: read's mate is in some other scaffold
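
The categories above translate directly into a small classifier. This is a sketch rather than Hawkeye's code: the function name, the orientation labels ("innie", "outie", "same"), and the default Happy-Distance of 2.0 standard deviations are assumptions chosen for illustration.

```python
def classify_insert(orientation, insert_size, lib_mean, lib_sd,
                    same_scaffold=True, happy_distance=2.0):
    """Classify one mate pair using the happiness rules from the slide.

    orientation: "innie" (correct), "outie", or "same" (hypothetical labels).
    happy_distance: allowed deviation in library standard deviations (assumed default).
    """
    if not same_scaffold:
        return "linking"
    if orientation != "innie":
        return "misoriented"
    limit = happy_distance * lib_sd
    if insert_size > lib_mean + limit:
        return "stretched"
    if insert_size < lib_mean - limit:
        return "compressed"
    # Sizes exactly on the boundary are treated as happy in this sketch.
    return "happy"

print(classify_insert("innie", 4100, 4000, 400))
print(classify_insert("innie", 3000, 4000, 400))
```

With a 4000 ± 400 library and a Happy-Distance of 2.0, any correctly oriented insert between 3200 and 4800 bases counts as happy.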

Both mates present
Only 1 read present
19
Contig View: detailed alignment of reads to
contigs
20
SNP View
SNP Sorted Reads
Polymorphism View
21
SNP Barcode
SNP Sorted Reads
Colored rectangles indicate the positions and
composition of the SNPs
22
Scaffold View
Coverage
CE Statistic
SNP Feature
Happy
Stretched
Compressed
Misoriented
Linking
23
Collapsed Repeat
Read Coverage Spike
-5.5 CE Dip
Compressed Mates Cluster
68 Correlated SNPs
24
Example 1: Compression in Prevotella intermedia
17 assembly, found by the CE statistic
  • Green inserts are < 2 standard deviations from
    the mean, and the orange inserts are compressed
    by > 2 standard deviations.
  • Vertical yellow line shows the most likely place
    of a compression misassembly.
  • Only one insert in this case is compressed by > 3
    standard deviations

25
Example 2: Compression in Prevotella intermedia
17 assembly, found by the CE statistic
26
Fixing collapsed repeats with AMOS
Original Contig
Compression Point
Before
Patch Contig
Resolved Stitched Contig
After
27
Assemblies can be preserved at NCBI's Assembly
Archive: http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi