Title: Mini Assembler
1 Mini Assembler: Software Project B, SeqAn
David Weese and Prof. Knut Reinert
2 SeqAn: The C++ Sequence Analysis Library
- alphabets
- scoring schemes
- file formats
- base pair probabilities
3 DNA Sequencing
Shotgun DNA Sequencing (Technology)
4 Shotgun DNA Sequencing
Avg. read length: 550 bp. Avg. error rate: 1-2%.
5 Project Mini Assembler
Input: Reads (mate pairs), generated by a simulator.
Goal: Construct large scaffolds to obtain a good assembly (N50 measure).
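The N50 measure named in the goal can be computed from the scaffold (or contig) lengths alone. The following is a minimal sketch in plain standard C++, not SeqAn code; the function name `n50` is an assumption made for illustration. It uses the common definition: the largest length L such that all pieces of length at least L together cover at least half of the total assembly length.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Returns the N50 of a set of scaffold or contig lengths.
// Hypothetical helper for illustration; returns 0 for an empty input.
std::size_t n50(std::vector<std::size_t> lengths)
{
    // Visit the pieces from longest to shortest.
    std::sort(lengths.begin(), lengths.end(), std::greater<std::size_t>());
    const std::size_t total =
        std::accumulate(lengths.begin(), lengths.end(), std::size_t(0));

    std::size_t covered = 0;
    for (std::size_t len : lengths)
    {
        covered += len;
        if (2 * covered >= total)  // at least half of the assembly is covered
            return len;
    }
    return 0;
}
```

For the lengths 2, 2, 2, 3, 3, 4, 8, 8 (total 32), the two pieces of length 8 already cover half of the total, so the N50 is 8.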
6 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
7 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
Mask known and de novo repeats.
Task: Build a repeat screener to help the Overlapper.
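One simple way to approach de novo repeat screening is to mask k-mers that occur unusually often across the reads. The sketch below assumes that approach; the slides do not prescribe a method. The function name `maskFrequentKmers`, the parameters `k` and `maxCount`, and the choice of hard-masking with 'N' are all assumptions for illustration. It also ignores reverse complements, which a real screener would have to handle.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Replaces by 'N' every base covered by a k-mer that occurs more than
// maxCount times across all reads (forward strand only).
std::vector<std::string> maskFrequentKmers(const std::vector<std::string>& reads,
                                           std::size_t k, std::size_t maxCount)
{
    // Pass 1: count every k-mer occurrence.
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& read : reads)
        for (std::size_t i = 0; i + k <= read.size(); ++i)
            ++counts[read.substr(i, k)];

    // Pass 2: mask the positions of over-represented k-mers.
    // The lookup uses the original reads, so earlier masking cannot hide a k-mer.
    std::vector<std::string> masked = reads;
    for (std::size_t r = 0; r < reads.size(); ++r)
        for (std::size_t i = 0; i + k <= reads[r].size(); ++i)
            if (counts[reads[r].substr(i, k)] > maxCount)
                masked[r].replace(i, k, k, 'N');
    return masked;
}
```

A threshold derived from the expected read coverage would be a natural choice for `maxCount`, since k-mers from unique sequence should appear roughly coverage-many times.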
8 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Find overlaps between reads.
Task: Construct an overlapper module to compute overlaps with affine gap costs.
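An overlap with affine gap costs can be scored with a Gotoh-style dynamic program in which the start of the first read and the end of the second read are free. The sketch below is plain standard C++ rather than SeqAn, and it only returns the score, not the alignment. The `Scoring` struct, its default values, and the name `overlapScore` are assumptions for illustration; a gap of length L is assumed to cost `gapOpen + (L - 1) * gapExtend`.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters (not from the slides).
struct Scoring
{
    int match = 1;
    int mismatch = -1;
    int gapOpen = 3;
    int gapExtend = 1;
};

// Best score of an alignment between some suffix of a and some prefix of b.
// Bases of a before the overlap and bases of b after it are not penalized.
int overlapScore(const std::string& a, const std::string& b,
                 const Scoring& sc = Scoring())
{
    const int NEG = INT_MIN / 2;  // stands in for minus infinity without overflow
    const std::size_t m = b.size();

    // One row per state: M ends in a (mis)match, X ends with a base of a
    // against a gap, Y ends with a base of b against a gap.
    std::vector<int> M(m + 1, NEG), X(m + 1, NEG), Y(m + 1, NEG);
    M[0] = 0;
    for (std::size_t j = 1; j <= m; ++j)
        Y[j] = std::max(M[j - 1] - sc.gapOpen, Y[j - 1] - sc.gapExtend);

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        std::vector<int> newM(m + 1, NEG), newX(m + 1, NEG), newY(m + 1, NEG);
        newM[0] = 0;  // the overlap may start at any position of a
        for (std::size_t j = 1; j <= m; ++j)
        {
            const int diag = std::max({M[j - 1], X[j - 1], Y[j - 1]});
            newM[j] = diag + (a[i - 1] == b[j - 1] ? sc.match : sc.mismatch);
            newX[j] = std::max(std::max(M[j], Y[j]) - sc.gapOpen,
                               X[j] - sc.gapExtend);
            newY[j] = std::max(std::max(newM[j - 1], newX[j - 1]) - sc.gapOpen,
                               newY[j - 1] - sc.gapExtend);
        }
        M.swap(newM);
        X.swap(newX);
        Y.swap(newY);
    }

    // The overlap must reach the end of a but may stop anywhere in b.
    int best = 0;  // the empty overlap scores 0
    for (std::size_t j = 1; j <= m; ++j)
        best = std::max({best, M[j], X[j], Y[j]});
    return best;
}
```

With the default parameters, "ACGTTGCA" and "TGCAGGT" share the exact overlap "TGCA" and score 4. Only three rows are kept in memory, so a traceback would need the full matrices or a divide-and-conquer scheme.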
9 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Compute consistent sub-assemblies (unitigs).
Task: Construct an overlap graph, then construct a spanning-tree-based layout.
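A spanning-tree-based layout can be sketched with Kruskal's algorithm on the overlap graph, preferring long overlaps. The slide does not say which spanning tree is meant, so the maximum-weight choice here is an assumption, as are the names `Overlap`, `findRoot`, and `spanningTreeLayout`. Read orientation and the actual read offsets derived from the tree are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// An edge of the overlap graph: reads readA and readB overlap by `length` bases.
struct Overlap
{
    std::size_t readA;
    std::size_t readB;
    int length;
};

// Union-find representative lookup with path halving.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t v)
{
    while (parent[v] != v)
    {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

// Kruskal's algorithm over overlaps sorted by descending length.
// Returns the edges of a maximum-weight spanning forest, one tree per
// connected component of the overlap graph.
std::vector<Overlap> spanningTreeLayout(std::size_t numReads,
                                        std::vector<Overlap> overlaps)
{
    std::stable_sort(overlaps.begin(), overlaps.end(),
                     [](const Overlap& x, const Overlap& y)
                     { return x.length > y.length; });

    std::vector<std::size_t> parent(numReads);
    std::iota(parent.begin(), parent.end(), std::size_t(0));

    std::vector<Overlap> tree;
    for (const Overlap& o : overlaps)
    {
        const std::size_t rootA = findRoot(parent, o.readA);
        const std::size_t rootB = findRoot(parent, o.readB);
        if (rootA == rootB)
            continue;  // this overlap would close a cycle
        parent[rootA] = rootB;
        tree.push_back(o);
    }
    return tree;
}
```

Each tree of the forest fixes the relative placement of its reads. Overlaps that were skipped can then be checked against that placement to detect inconsistencies.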
10 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Mated fragments
Build scaffolds.
Task: Construct a mate pair contig graph, then construct a scaffold with a greedy approach.
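The two steps of this task can be sketched as follows: count how many mate pairs link each pair of contigs, then greedily accept the best-supported links while every contig keeps at most two neighbours and no cycle is closed. This is only one possible reading of "greedy approach". All names (`MatePair`, `Link`, `buildContigGraph`, `greedyScaffold`) are assumptions for illustration, and contig orientation and mate distances are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

struct MatePair { std::size_t readA; std::size_t readB; };
struct Link { std::size_t contigA; std::size_t contigB; std::size_t support; };

// Builds the mate pair contig graph: one link per pair of distinct contigs,
// weighted by the number of mate pairs that connect them.
std::vector<Link> buildContigGraph(const std::vector<std::size_t>& contigOfRead,
                                   const std::vector<MatePair>& mates)
{
    std::map<std::pair<std::size_t, std::size_t>, std::size_t> support;
    for (const MatePair& mp : mates)
    {
        std::size_t a = contigOfRead[mp.readA];
        std::size_t b = contigOfRead[mp.readB];
        if (a == b)
            continue;  // both mates lie in the same contig
        if (a > b)
            std::swap(a, b);
        ++support[std::make_pair(a, b)];
    }
    std::vector<Link> links;
    for (const auto& entry : support)
        links.push_back({entry.first.first, entry.first.second, entry.second});
    return links;
}

// Greedily accepts links by descending support so that the accepted links
// form simple chains of contigs (degree at most 2, no cycles).
std::vector<Link> greedyScaffold(std::size_t numContigs, std::vector<Link> links)
{
    std::stable_sort(links.begin(), links.end(),
                     [](const Link& x, const Link& y)
                     { return x.support > y.support; });

    std::vector<std::size_t> degree(numContigs, 0);
    std::vector<std::size_t> chain(numContigs);  // chain id of every contig
    std::iota(chain.begin(), chain.end(), std::size_t(0));

    std::vector<Link> chosen;
    for (const Link& l : links)
    {
        if (degree[l.contigA] >= 2 || degree[l.contigB] >= 2)
            continue;  // a contig in a chain has at most two neighbours
        if (chain[l.contigA] == chain[l.contigB])
            continue;  // already in the same chain: would close a cycle
        const std::size_t oldId = chain[l.contigB];
        const std::size_t newId = chain[l.contigA];
        for (std::size_t& c : chain)
            if (c == oldId)
                c = newId;  // merge the two chains
        ++degree[l.contigA];
        ++degree[l.contigB];
        chosen.push_back(l);
    }
    return chosen;
}
```

The linear relabeling keeps the sketch short; a union-find structure would scale better for many contigs.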
11 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Fill gaps.
Task: Extend contigs using mates and overlaps.
12 Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
TTCGGGTTGAACGTGCATTAAATCGCGCAACTG
Compute consensus sequence and N50 score
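A consensus sequence can be sketched as a column-wise majority vote over reads that have already been placed in a layout. The sketch below assumes gapless placement, with each read given only by its start offset; a real consensus step would work on a multiple alignment with gaps. The names `PlacedRead` and `consensus` are assumptions for illustration. Ties are broken in favour of the first base in the order A, C, G, T, and uncovered columns are reported as 'N'.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// A read together with its start position in the layout (hypothetical type).
struct PlacedRead
{
    std::size_t offset;
    std::string bases;
};

// Column-wise majority vote over gaplessly placed reads.
std::string consensus(const std::vector<PlacedRead>& reads)
{
    static const std::string alphabet = "ACGT";

    std::size_t length = 0;
    for (const PlacedRead& r : reads)
        length = std::max(length, r.offset + r.bases.size());

    // counts[col][code] = number of reads showing alphabet[code] in column col.
    std::vector<std::array<std::size_t, 4>> counts(
        length, std::array<std::size_t, 4>{});
    for (const PlacedRead& r : reads)
        for (std::size_t i = 0; i < r.bases.size(); ++i)
        {
            const std::size_t code = alphabet.find(r.bases[i]);
            if (code != std::string::npos)  // skip characters outside ACGT
                ++counts[r.offset + i][code];
        }

    std::string result(length, 'N');
    for (std::size_t col = 0; col < length; ++col)
    {
        const auto best = std::max_element(counts[col].begin(), counts[col].end());
        if (*best > 0)
            result[col] = alphabet[static_cast<std::size_t>(best - counts[col].begin())];
    }
    return result;
}
```

For the reads "ACGTAC" at offset 0, "GTTCGG" at offset 2, and "GTACG" at offset 2, the single disagreeing T in column 4 is outvoted and the consensus is "ACGTACGG".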
13 Names to Tasks
14 Tasks
- Preferred prerequisites
  - C/C++ skills
  - "C++ für Fortgeschrittene" (C++ for Advanced Programmers) block seminar (6.4. - 9.4.)
- Your work
  - Collaborate and develop a module interface
  - Make use of SeqAn, the C++ Sequence Analysis Library (www.seqan.de)
  - Implement your module in C++
  - Document your code
  - Present your results
- Material
  - Assembly lecture notes of Alg. Bioinformatik, Reinert WS07/08
  - Links and documents on the homepage
15 Schedule
- Seminar block 1 (2.4.)
  - Introduction to principles of software design
  - Tools (IDE, debugger, profiler, bug tracker, SVN, ...)
- Seminar block 2 (14.4. - 17.4.)
  - SeqAn tutorial (sequences, alignments, graphs, indices)
  - Assign names to tasks
- Make and present your plan (until 27.4.)
  - Prepare a presentation
  - What are your data types and interfaces to other modules?
  - Which algorithms do you want to use?
  - What do you use from SeqAn, and what do you implement yourself?
- Start working
- Present your results, write a final report (until 8.6.)
16 End of Talk
Questions?
weese_at_inf.fu-berlin.de