Assembly by alignment

Slides: 25
Provided by: MichaelR211
Learn more at: https://www.cbcb.umd.edu

Transcript and Presenter's Notes


1
(No Transcript)
2
Assembly by alignment
  • Instead of
  • overlap-layout-consensus
  • we use
  • alignment-consensus

3
Alignment algorithm
  • AMOScmp uses MUMmer
  • MUMmer will be covered in detail by Adam
    Phillippy in a later lecture
  • MUMmer provides very fast alignment of
    closely-related sequences

4
Assembly of a close relative
5
AMOScmp algorithm
  • Read alignment: Each shotgun read is aligned to
    the reference genome using MUMmer.
  • Repetitive sequences and polymorphisms between
    the target and the reference cause some reads to
    align in a non-contiguous fashion.
  • We used a modified version of the Longest
    Increasing Subsequence (LIS) algorithm in order
    to generate chains of mutually consistent matches
    between each read and the reference.
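
The chaining step can be illustrated with a plain Longest Increasing Subsequence. This is a minimal sketch only: the slide says AMOScmp uses a modified LIS, and the modification is not described here, so the unmodified O(n^2) version is shown. The `(read_start, ref_start, length)` tuple layout and the function name `chain_matches` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical match layout: (read_start, ref_start, length)
Match = Tuple[int, int, int]


def chain_matches(matches: List[Match]) -> List[Match]:
    """Return a longest chain of matches whose read and reference
    start positions both strictly increase (plain O(n^2) LIS)."""
    if not matches:
        return []
    ordered = sorted(matches)            # by read_start, then ref_start
    best_len = [1] * len(ordered)        # longest chain ending at each match
    prev = [-1] * len(ordered)           # back-pointers for reconstruction
    for i, (r_i, g_i, _) in enumerate(ordered):
        for j in range(i):
            r_j, g_j, _ = ordered[j]
            if r_j < r_i and g_j < g_i and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain.
    i = max(range(len(ordered)), key=best_len.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# A read with one stray match far away in the reference (made-up numbers).
example = [(0, 1000, 50), (60, 5000, 40), (110, 1110, 80), (200, 1200, 30)]
print(chain_matches(example))
```

In this made-up example the match at reference position 5000 is inconsistent with the other three and is left out of the chain.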

6
Repeat resolution
  1. Check to see if the paired-end sequence (the
    mate) is uniquely anchored in the genome. If
    it is, we place the read in the location that
    satisfies the constraints imposed by the
    mate-pair information.
  2. If a read and its mate are both ambiguously
    placed, we attempt to find whether the mate-pair
    information allows us to place them both in the
    assembly. In some cases, there exists only one
    placement of both a read and its mate that
    satisfies the mate-pair constraints on distance
    and orientation.
  3. When the first two steps leave us with more than
    one placement for a pair of reads, we choose at
    random one of the possible placements that
    satisfy the mate-pair constraints.
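
The three steps above can be sketched as a single filter over candidate placements. This is an illustrative sketch, not AMOScmp's code: the `(position, strand)` representation, the opposite-strand convention, the distance bounds, and the behavior when no placement fits are all assumptions.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical placement: (reference position, strand "+" or "-")
Placement = Tuple[int, str]


def consistent(a: Placement, b: Placement, min_dist: int, max_dist: int) -> bool:
    """Assumed constraint: mates on opposite strands, within the distance range."""
    return a[1] != b[1] and min_dist <= abs(a[0] - b[0]) <= max_dist


def place_pair(read: List[Placement], mate: List[Placement],
               min_dist: int, max_dist: int,
               rng: random.Random) -> Optional[Tuple[Placement, Placement]]:
    """Choose one placement for a read and its mate."""
    pairs = [(r, m) for r in read for m in mate
             if consistent(r, m, min_dist, max_dist)]
    if not pairs:
        # The slides do not say what happens here; returning None is an assumption.
        return None
    if len(pairs) == 1:
        # Steps 1 and 2: a unique mate, or a unique consistent pair, fixes the placement.
        return pairs[0]
    # Step 3: several consistent placements remain, so pick one at random.
    return rng.choice(pairs)


rng = random.Random(0)
# Step 1: the mate is unique, the read has two candidate copies.
print(place_pair([(100, "+"), (9000, "+")], [(2100, "-")], 1500, 2500, rng))
```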

7
Repeat resolution example
  • Aligned all shotgun reads from Streptococcus
    agalactiae 2603 to the final, finished genome
  • 26,099 reads total
  • 25,310 uniquely anchored in genome
  • 314 placed with the help of a uniquely anchored
    mate
  • 22 were placed as unique pairs, with neither read
    being unique on its own
  • 442 had to be placed in a randomly chosen copy of
    a repeat
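
To put these counts in proportion, the short calculation below expresses each category as a percentage of the stated total. The counts are taken from the slide; the variable and function names are illustrative. The four listed categories sum to 26,088, so percentages are computed against the stated total of 26,099 rather than assuming the categories are exhaustive.

```python
total_reads = 26_099
categories = {
    "uniquely anchored": 25_310,
    "placed via uniquely anchored mate": 314,
    "placed as unique pairs": 22,
    "placed in randomly chosen repeat copy": 442,
}


def percent_of_total(count: int, total: int) -> float:
    """Share of the total, in percent."""
    return 100.0 * count / total


fractions = {name: percent_of_total(n, total_reads)
             for name, n in categories.items()}
listed_sum = sum(categories.values())

for name, pct in fractions.items():
    print(f"{name}: {pct:.2f}% of reads")
print(f"listed categories sum to {listed_sum} of {total_reads} reads")
```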

8
Read alignment anomalies
  • Reads don't always align properly
  • Certain alignment patterns are used by AMOScmp to
    detect differences in the new target genome
  • Many of these can be resolved

9
Mapping reads to the reference genome when the
target genome contains an insertion. The bottom
indicates the true layout of the reads (A, B, C)
along the target. The top indicates the
alignment of the reads to the reference. Slanted
lines depict portions of the read that do not
match; in the case of read B, the entire read
does not align to the reference.
10
The insertion in the target genome is shorter
than a single read. The "bubbles" identify the
portions of the two reads that do not align to
the reference.
11
Insertion into the reference. The alignment of
reads to the reference (top) indicates the
presence of the insertion. Dashed lines indicate
the stretch of the reads needed to align to the
reference.
12
Signature of a genome rearrangement
Regions II and III from the target appear in a
different order in the reference. Reads A, B,
and C match the reference in disjoint locations;
the dashed lines connect sections of a read that
are adjacent in the target genome.
13
Signature of a divergent region
The gray areas are divergent; they are not
recognizably similar. Portions of the reads not
matching the reference are shown at an angle.
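
The signatures in the preceding figures can be approximated by comparing, for two consecutive matches of one read, the gap left in the read with the gap left in the reference. The sketch below is a heuristic written for illustration, not AMOScmp's actual rules: the `(read_start, ref_start, length)` layout, the `tolerance` parameter, and the assumption that both matches lie on the same strand are all hypothetical.

```python
from typing import Tuple

# Hypothetical match layout: (read_start, ref_start, length), same strand
Match = Tuple[int, int, int]


def classify_junction(left: Match, right: Match, tolerance: int = 5) -> str:
    """Classify what lies between two consecutive matches of one read."""
    read_gap = right[0] - (left[0] + left[2])  # unmatched bases in the read
    ref_gap = right[1] - (left[1] + left[2])   # skipped bases in the reference
    if ref_gap < -tolerance:
        return "rearrangement"                 # second piece maps upstream of the first
    if abs(read_gap - ref_gap) <= tolerance:
        # Equal gaps: either nothing is missing, or both sides differ over the same span.
        return "contiguous" if read_gap <= tolerance else "divergent region"
    if read_gap > ref_gap:
        return "insertion in target"           # read holds bases the reference lacks
    return "insertion in reference"            # reference holds bases the read lacks


print(classify_junction((0, 1000, 100), (300, 1100, 100)))
print(classify_junction((0, 1000, 100), (100, 1600, 100)))
```

A read that does not align at all (read B in the first insertion figure) produces no matches and has to be handled separately, for example through its mate.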
14
Effect of short flanking repeats on the alignment
of a read to the reference in the case of an
insertion in the reference. The repeat is
shown in gray. The dashed lines connect sections
of read A that occur twice in the reference but
once in A and in the target genome.
15
Assembly of 1 Mb of S. agalactiae 2603
The rows correspond (top to bottom) to:
  • CA: scratch assembly contigs created by Celera
    Assembler
  • 2603: AMOS-Comp contigs created using strain
    2603 as a reference
  • NEM: AMOS-Comp contigs created using strain
    NEM 316 as a reference
  • nucmer: the alignment of strain NEM 316 to
    strain 2603
Stacked arrows in the bottom row correspond to
repeats.
16
Assemblies of strain 2603 produced by AMOScmp
17
Completeness of assembly (mapped back to finished
strain 2603)
The total gap size indicates the total number of
bases missing from the assembled contigs after
mapping them to the finished genome. The column
marked LW represents the theoretical estimate of
coverage based on Lander-Waterman [19]
statistics.
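
For orientation, the standard Lander-Waterman estimates can be computed as below. This is a sketch of the textbook formulas, not a reproduction of the LW column in the slide: the function name, its parameters, and the read length and genome size in the example call are hypothetical (only the read count comes from an earlier slide).

```python
import math


def lander_waterman(num_reads: int, read_len: float, genome_len: float,
                    min_overlap: float = 0.0) -> dict:
    """Textbook Lander-Waterman estimates for a random shotgun project."""
    c = num_reads * read_len / genome_len   # redundancy of coverage
    theta = min_overlap / read_len          # fraction of a read needed to detect overlap
    uncovered = math.exp(-c)                # probability a base is hit by no read
    return {
        "coverage": c,
        "covered_fraction": 1.0 - uncovered,
        "expected_gap_bases": genome_len * uncovered,
        "expected_contigs": num_reads * math.exp(-c * (1.0 - theta)),
    }


# 26,099 reads is from the slides; 800 bp reads and a 2.2 Mb genome are assumed values.
estimate = lander_waterman(26_099, 800, 2_200_000, min_overlap=40)
for key, value in estimate.items():
    print(f"{key}: {value:.4g}")
```

The covered fraction and gap size use the simple Poisson model and ignore the overlap threshold, which only enters the contig estimate.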
18
Limits on comparative assembly
19
(No Transcript)
20
Fishing in the Trace Archive
  • 2,772,509 reads (traces) for Drosophila ananassae
  • 2,214,248 traces for D. simulans
  • 2,445,065 traces for D. mojavensis

21
Discovery of fruit-fly bacterial endosymbionts in
published data
Wolbachia pipientis is an intracellular bacterial
endosymbiont of fruit flies (genus Drosophila) and
other insects, primarily found in the reproductive
organs of females. The endosymbiont is often
inadvertently sequenced as part of a fruit fly
genome project.
Assembly strategy: Use the completed sequence of
the Wolbachia endosymbiont of Drosophila
melanogaster (wMel) to extract Wolbachia reads from
Drosophila shotgun data deposited in the NCBI Trace
Archive.
22
Strategy 1: Identify reads matching wMel with nucmer, then assemble the extracted reads with Celera Assembler.
Strategy 2: Extract and assemble reads with the comparative assembler AMOScmp.
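
The read-extraction step of Strategy 1 amounts to filtering alignment hits against wMel. The sketch below shows one way such a filter could look. The `Hit` record, its fields, and both thresholds are hypothetical; the slides do not state the criteria used, and no nucmer output format is assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Hit:
    """Hypothetical summary of one read-to-reference alignment."""
    read_id: str
    read_len: int
    aligned_len: int
    identity: float  # percent identity over the aligned region


def select_reads(hits: Iterable[Hit], min_identity: float = 90.0,
                 min_covered: float = 0.5) -> Set[str]:
    """Keep reads whose alignment is similar enough and covers enough of the read.
    Both thresholds are illustrative assumptions."""
    return {h.read_id for h in hits
            if h.identity >= min_identity
            and h.aligned_len / h.read_len >= min_covered}


hits = [
    Hit("read_1", 800, 780, 97.5),   # long, high-identity match: kept
    Hit("read_2", 800, 120, 99.0),   # covers too little of the read: dropped
    Hit("read_3", 750, 700, 82.0),   # identity too low: dropped
]
print(sorted(select_reads(hits)))
```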
23
(No Transcript)
24
wAna wSim wWil
Molecule length 1,440,650 896,761 922,146
matching reads 32,720 3,727 2,291
contigs 464 388 485
scaffolds 329 84
genes 1,837 790
wAna: Wolbachia endosymbiont of D. ananassae
wSim: Wolbachia endosymbiont of D. simulans
wWil: Wolbachia endosymbiont of D. willistoni
(Note: D. mojavensis turned out to be an erroneous
submission; D. willistoni was discovered later.)