Title: MapView: visualization of short reads alignment on a desktop computer
1. MapView: visualization of short reads alignment on a desktop computer
InCoB 2009
Hua Bao, Sun Yat-sen University
2. Next-generation sequencing
- Sequencing by synthesis
- High-throughput (tens of millions of reads per lane)
- Read length is short (25-50 bp)
- Sequencing error rate is higher than that of Sanger sequencing
3. Statement of the problem
- 1. Alignment results (e.g., 50M reads)
  read1 TATCGCACATAGTTCGCG hhhhhhhllhhhhhA - Chr1 126609
  read2 CATACGACACTCATGTAG h,abhhhhhAhhda, Chr2 94
- 2. Reference genome (e.g., 500M bp)
  >Chr1
  CGATCGAGCGACAGACGAGCACACGTAGCACTGTGGGGGAA
- Visualization of large-scale alignment data with super-high computational efficiency.
4. Computational efficiency
- Memory usage
  - Data compressed
  - Fractional loading
- CPU time
  - Indexing
  - Pre-computing
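As an illustration of the "data compressed" point, here is a minimal sketch of 2-bit packing for DNA sequences. The slides do not say which encoding MVF actually uses, so the `CODE` table and the `pack`/`unpack` helpers are assumptions made for this example only; ambiguous bases such as N are not handled.

```python
# Hypothetical 2-bit encoding; the real MVF encoding is not described in the slides.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an ACGT string into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


read = "TATCGCACATAGTTCGCG"      # 18-base read from the example slide
packed = pack(read)              # 5 bytes instead of 18
restored = unpack(packed, len(read))
```

Under this scheme a read takes roughly a quarter of the space of its one-byte-per-base text form, which is one plausible way to lower memory usage.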
5. File format design
MapView format (MVF)
- Head: basic info of reference and reads; offsets of Data, Index and Statistics
- Data: compressed sequences; ordered alignments
- Index: the offset address of data is indexed by reference position
- Statistics: coverage information of reference sites
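The four-section layout can be sketched as a small container whose head records where the other sections start. This is only an illustration: the field widths, the magic bytes `b"MVFx"`, the physical order of the sections, and the helper names `build_container` and `read_section` are all hypothetical, not the real MVF specification.

```python
import struct

# Hypothetical head: 4-byte magic, then (offset, size) pairs for
# Data, Index and Statistics. The real MVF head layout is not given.
HEAD = struct.Struct("<4s6Q")


def build_container(data: bytes, index: bytes, stats: bytes) -> bytes:
    """Concatenate Head + Data + Index + Statistics, storing offsets in the head."""
    data_off = HEAD.size
    index_off = data_off + len(data)
    stats_off = index_off + len(index)
    head = HEAD.pack(
        b"MVFx",
        data_off, len(data),
        index_off, len(index),
        stats_off, len(stats),
    )
    return head + data + index + stats


def read_section(blob: bytes, name: str) -> bytes:
    """Return one section using only the offsets stored in the head."""
    _magic, d_off, d_len, i_off, i_len, s_off, s_len = HEAD.unpack_from(blob, 0)
    sections = {
        "data": (d_off, d_len),
        "index": (i_off, i_len),
        "stats": (s_off, s_len),
    }
    off, size = sections[name]
    return blob[off:off + size]


blob = build_container(b"DATA-BYTES", b"INDEX", b"STATS")
index_bytes = read_section(blob, "index")
```

Because the head holds the offsets, a reader can go straight to the Index or Statistics section without scanning the Data section first.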
6. Loading algorithms
Jump to different region
[Figure: the genomic position of the MapView window is looked up using the Index to obtain an offset address, and the Data at that offset are loaded from the MapView file.]
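The lookup from genomic position to offset address to data might look like the sketch below. It assumes fixed-size records sorted by start position and a sparse index with one entry per 1000-bp bin; `RECORD`, `BLOCK`, `build` and `load_window` are hypothetical names, and the in-memory `io.BytesIO` buffer stands in for the MapView file. For simplicity it considers only read start positions, so a read that starts before the window but overlaps it is not returned.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<I18s")  # hypothetical record: start position + 18-base read
BLOCK = 1000                     # hypothetical index granularity in bp


def build(alignments):
    """Write position-sorted records; index the first offset of each BLOCK-sized bin."""
    buf, index = io.BytesIO(), {}
    for pos, seq in sorted(alignments):
        index.setdefault(pos // BLOCK, buf.tell())
        buf.write(RECORD.pack(pos, seq.encode()))
    return buf, index


def load_window(buf, index, start, end):
    """Read only the records whose start position lies in [start, end)."""
    bins = sorted(index)
    i = bisect.bisect_left(bins, start // BLOCK)
    if i == len(bins):
        return []                      # no data at or after the window
    buf.seek(index[bins[i]])           # jump straight to the offset address
    hits = []
    while chunk := buf.read(RECORD.size):
        pos, seq = RECORD.unpack(chunk)
        if pos >= end:
            break                      # records are sorted, so stop early
        if pos >= start:
            hits.append((pos, seq.decode()))
    return hits


# Example positions are illustrative and placed on a single reference.
alignments = [
    (126609, "TATCGCACATAGTTCGCG"),
    (94, "CATACGACACTCATGTAG"),
    (126650, "TATCGCACATAGTTCGCG"),
]
buf, index = build(alignments)
window = load_window(buf, index, 126600, 126640)
```

Only the records near the requested window are read, so the cost of jumping to a new region depends on the window size rather than on the total number of reads.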
7. Efficiency of MapView
Computational efficiency comparison
Tool       Version  Memory usage  CPU time
Consed     18.0     12.06 GB      208 s
Hawkeye    2.0.8    14.14 GB      296 s
EagleView  2.2      3.91 GB       207 s
MapView    3.1      0.04 GB       2 s
The assessment data set consists of a reference of 43 million bp and 6 million Illumina 44-bp reads.
8-10. User-friendly Interface
11. Summary
- Super-high computational efficiency
  - Visualization of hundreds of millions of reads with 40 MB of memory in 2 seconds.
- Rich-featured and user-friendly
  - Compact alignment view for both single-end and paired-end short reads, multiple navigation and zoom modes.
12. Thank you!