Mini Assembler

Description:

SeqAn: The C++ Sequence Analysis Library. strings - structured sequences - gapped sequences ... Mask known and 'de novo' repeats. Project Mini Assembler. Task: ...

Slides: 17
Provided by: Cel47

1
Mini Assembler: Software Project B (SeqAn)
David Weese and Prof. Knut Reinert
2
SeqAn: The C++ Sequence Analysis Library
  • alphabets
  • scoring schemes
  • file formats
  • base pair probabilities

3
DNA Sequencing
Shotgun DNA Sequencing (Technology)
4
Shotgun DNA Sequencing
Avg. length: 550 bp. Avg. error: 1-2%
5
Project Mini Assembler
Input: Reads (mate pairs), generated by a simulator
Goal: Construct large scaffolds to obtain a good assembly (N50 measure)
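The N50 measure named in the goal can be sketched as follows. This is a minimal illustration in plain standard C++, not SeqAn code; the function name computeN50 is hypothetical. It uses the common definition: the largest length L such that contigs (or scaffolds) of length at least L together cover at least half of all assembled bases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Returns the N50 of a set of contig or scaffold lengths.
// Returns 0 for an empty input.
std::size_t computeN50(std::vector<std::size_t> lengths)
{
    if (lengths.empty())
        return 0;

    // Visit the longest sequences first.
    std::sort(lengths.begin(), lengths.end(), std::greater<std::size_t>());
    const std::size_t total =
        std::accumulate(lengths.begin(), lengths.end(), std::size_t{0});

    std::size_t running = 0;
    for (std::size_t len : lengths)
    {
        running += len;
        if (2 * running >= total)  // at least half of all bases are covered
            return len;
    }

    // Not reached: the running sum equals the total after the last element.
    assert(false);
    return 0;
}
```

For lengths 80, 70, 50, 40, 30 and 20 (290 bases in total), the two longest sequences already cover 150 bases, so the N50 is 70.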
6
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
7
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
Mask known and de novo repeats
Task: Build a repeat screener to help the Overlapper
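One simple way to approach de novo repeat masking is to count k-mers across all reads and mask those that occur suspiciously often. The sketch below assumes exactly that heuristic; it is not the method prescribed by the project. The function name maskFrequentKmers, the parameters k and maxCount, and the use of 'N' as the mask character are all assumptions made for illustration, and reverse complements and known repeat libraries are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Replaces with 'N' every position covered by a k-mer that occurs
// more than maxCount times across all reads.
std::vector<std::string> maskFrequentKmers(const std::vector<std::string>& reads,
                                           std::size_t k,
                                           std::size_t maxCount)
{
    assert(k > 0);

    // Pass 1: count every k-mer occurrence in every read.
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& r : reads)
        for (std::size_t i = 0; i + k <= r.size(); ++i)
            ++counts[r.substr(i, k)];

    // Pass 2: mask the positions covered by over-represented k-mers.
    std::vector<std::string> masked = reads;
    for (std::size_t n = 0; n < reads.size(); ++n)
    {
        const std::string& r = reads[n];
        for (std::size_t i = 0; i + k <= r.size(); ++i)
            if (counts[r.substr(i, k)] > maxCount)
                for (std::size_t j = i; j < i + k; ++j)
                    masked[n][j] = 'N';
    }
    return masked;
}
```

In a real data set the threshold would have to be chosen relative to the sequencing coverage, since every genomic k-mer is expected to appear roughly coverage-many times.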
8
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
Find overlaps between reads
Task: Construct an overlapper module to compute overlaps with affine gap costs.
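An overlap with affine gap costs can be computed with a Gotoh-style dynamic program in which a suffix of one read is aligned to a prefix of another. The following is a sketch in plain standard C++ under assumed scoring values. It does not use SeqAn's alignment interface, and the names OverlapResult, AffineScoring and bestOverlap are hypothetical. A production overlapper would also handle reverse complements and avoid the quadratic all-against-all comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

struct OverlapResult
{
    int score;           // best suffix-prefix alignment score (0 = no overlap)
    std::size_t lenInB;  // number of leading characters of b inside the overlap
};

struct AffineScoring     // assumed example values
{
    int match = 1;
    int mismatch = -1;
    int gapOpen = -3;    // score of the first character of a gap
    int gapExtend = -1;  // score of each further gap character
};

// Aligns a suffix of a to a prefix of b (Gotoh recurrences, three matrices).
OverlapResult bestOverlap(const std::string& a, const std::string& b,
                          const AffineScoring& sc)
{
    assert(sc.gapOpen <= 0 && sc.gapExtend <= 0);
    const int NEG = INT_MIN / 2;  // "minus infinity" that cannot overflow here
    const std::size_t n = a.size(), m = b.size();
    using Matrix = std::vector<std::vector<int>>;
    Matrix M(n + 1, std::vector<int>(m + 1, NEG));  // ends with a[i-1] ~ b[j-1]
    Matrix X = M;                                   // ends with a[i-1] ~ gap
    Matrix Y = M;                                   // ends with gap ~ b[j-1]

    for (std::size_t i = 0; i <= n; ++i)
        M[i][0] = 0;  // skipping a prefix of a is free
    for (std::size_t j = 1; j <= m; ++j)
        Y[0][j] = std::max(M[0][j - 1] + sc.gapOpen, Y[0][j - 1] + sc.gapExtend);

    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
        {
            const int diag = std::max({M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]});
            M[i][j] = diag + (a[i - 1] == b[j - 1] ? sc.match : sc.mismatch);
            X[i][j] = std::max({M[i - 1][j] + sc.gapOpen, Y[i - 1][j] + sc.gapOpen,
                                X[i - 1][j] + sc.gapExtend});
            Y[i][j] = std::max({M[i][j - 1] + sc.gapOpen, X[i][j - 1] + sc.gapOpen,
                                Y[i][j - 1] + sc.gapExtend});
        }

    // The overlap must end at the end of a; it may end anywhere in b.
    OverlapResult best{0, 0};
    for (std::size_t j = 1; j <= m; ++j)
    {
        const int s = std::max({M[n][j], X[n][j], Y[n][j]});
        if (s > best.score)
            best = OverlapResult{s, j};
    }
    return best;
}
```

With the default scores, the reads AAACGTACGT and ACGTACGTTT overlap in the eight characters ACGTACGT, which gives a score of 8.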
9
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
Compute consistent sub-assemblies (unitigs)
Task: Construct the overlap graph. Construct a spanning-tree-based layout.
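A spanning-tree-based layout could look like the sketch below, which assumes that overlaps are already known as (read, read, offset, score) tuples. It selects a maximum-weight spanning forest with Kruskal's algorithm and then derives read positions by walking the tree. The types Overlap and UnionFind and the function spanningTreeLayout are hypothetical names, the code is plain standard C++ rather than SeqAn's graph types, and read orientation is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

struct Overlap
{
    std::size_t a, b;  // read indices
    int offset;        // start of read b relative to the start of read a
    int score;         // overlap quality; higher is better
};

struct UnionFind
{
    std::vector<std::size_t> parent;
    explicit UnionFind(std::size_t n) : parent(n)
    {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x)
    {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]];  // path halving
        return x;
    }
    bool unite(std::size_t x, std::size_t y)
    {
        x = find(x);
        y = find(y);
        if (x == y)
            return false;
        parent[x] = y;
        return true;
    }
};

// Returns the start position of every read, relative to the lowest-numbered
// read of its connected component (which is placed at position 0).
std::vector<int> spanningTreeLayout(std::size_t numReads, std::vector<Overlap> overlaps)
{
    // Kruskal: consider the best overlaps first, keep those joining two trees.
    std::stable_sort(overlaps.begin(), overlaps.end(),
                     [](const Overlap& x, const Overlap& y) { return x.score > y.score; });
    UnionFind uf(numReads);
    std::vector<std::vector<std::pair<std::size_t, int>>> tree(numReads);
    for (const Overlap& o : overlaps)
    {
        assert(o.a < numReads && o.b < numReads);
        if (uf.unite(o.a, o.b))
        {
            tree[o.a].push_back({o.b, o.offset});
            tree[o.b].push_back({o.a, -o.offset});
        }
    }

    // Breadth-first traversal turns the tree edges into absolute positions.
    std::vector<int> pos(numReads, 0);
    std::vector<bool> placed(numReads, false);
    for (std::size_t root = 0; root < numReads; ++root)
    {
        if (placed[root])
            continue;
        placed[root] = true;
        std::queue<std::size_t> q;
        q.push(root);
        while (!q.empty())
        {
            const std::size_t u = q.front();
            q.pop();
            for (const auto& e : tree[u])
                if (!placed[e.first])
                {
                    placed[e.first] = true;
                    pos[e.first] = pos[u] + e.second;
                    q.push(e.first);
                }
        }
    }
    return pos;
}
```

Because only tree edges are used, a weak overlap that contradicts two stronger ones is simply ignored. Checking the discarded edges for consistency with the layout would be a natural next step.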
10
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Mated fragments
Screener
Build scaffolds
Task: Construct the mate pair contig graph. Construct scaffolds with a greedy approach.
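The greedy approach can be illustrated on a simplified mate pair contig graph in which each edge states that one contig directly precedes another and carries the number of supporting mate pairs. The sketch below accepts edges in order of decreasing support as long as they keep every scaffold a simple chain. The names MateLink, greedyScaffold and minSupport are assumptions, and contig orientation and gap size estimates are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct MateLink
{
    std::size_t from;  // contig placed first
    std::size_t to;    // contig placed directly after `from`
    int support;       // number of mate pairs supporting this order
};

// Returns scaffolds as ordered lists of contig indices.
std::vector<std::vector<std::size_t>> greedyScaffold(std::size_t numContigs,
                                                     std::vector<MateLink> links,
                                                     int minSupport)
{
    // Best-supported links first.
    std::stable_sort(links.begin(), links.end(),
                     [](const MateLink& x, const MateLink& y) { return x.support > y.support; });

    const std::size_t none = numContigs;  // sentinel for "no neighbour"
    std::vector<std::size_t> next(numContigs, none), prev(numContigs, none);
    std::vector<std::size_t> chain(numContigs);  // simple union-find forest
    std::iota(chain.begin(), chain.end(), std::size_t{0});
    auto find = [&chain](std::size_t x) {
        while (chain[x] != x)
            x = chain[x];
        return x;
    };

    for (const MateLink& l : links)
    {
        assert(l.from < numContigs && l.to < numContigs);
        if (l.support < minSupport)
            break;  // all remaining links are weaker
        if (next[l.from] != none || prev[l.to] != none)
            continue;  // one of the two contig ends is already used
        const std::size_t ra = find(l.from), rb = find(l.to);
        if (ra == rb)
            continue;  // would close a cycle (also covers from == to)
        next[l.from] = l.to;
        prev[l.to] = l.from;
        chain[ra] = rb;
    }

    // Every chain starts at a contig without a predecessor.
    std::vector<std::vector<std::size_t>> scaffolds;
    for (std::size_t head = 0; head < numContigs; ++head)
    {
        if (prev[head] != none)
            continue;
        scaffolds.emplace_back();
        for (std::size_t c = head; c != none; c = next[c])
            scaffolds.back().push_back(c);
    }
    return scaffolds;
}
```

Contigs that receive no accepted link come back as single-element scaffolds, so the result always covers every contig exactly once.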
11
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
Fill gaps
Task: Extend contigs using mates and overlaps
12
Project Mini Assembler
Whole Genome Shotgun Assembly Pipeline
Screener
TTCGGGTTGAACGTGCATTAAATCGCGCAACTG
Compute consensus sequence and N50 score
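A consensus sequence can be sketched as a per-column majority vote over the reads of a layout. The example below assumes an ungapped layout in which every read has a fixed start position; a real consensus module would first compute a multiple alignment with gaps. PlacedRead and majorityConsensus are hypothetical names, ties are broken in the order A, C, G, T, and columns without coverage are reported as 'N'.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct PlacedRead
{
    std::string seq;    // read sequence over A, C, G, T
    std::size_t start;  // position of the first base in the layout
};

// Returns the column-wise majority base over all placed reads.
std::string majorityConsensus(const std::vector<PlacedRead>& reads)
{
    const std::string alphabet = "ACGT";

    // The layout ends where the right-most read ends.
    std::size_t length = 0;
    for (const PlacedRead& r : reads)
        length = std::max(length, r.start + r.seq.size());

    // Count each base per column; characters outside ACGT are ignored.
    std::vector<std::array<int, 4>> counts(length, std::array<int, 4>{{0, 0, 0, 0}});
    for (const PlacedRead& r : reads)
        for (std::size_t i = 0; i < r.seq.size(); ++i)
        {
            const std::size_t idx = alphabet.find(r.seq[i]);
            if (idx == std::string::npos)
                continue;
            assert(r.start + i < counts.size());
            ++counts[r.start + i][idx];
        }

    std::string consensus(length, 'N');
    for (std::size_t col = 0; col < length; ++col)
    {
        const auto best = std::max_element(counts[col].begin(), counts[col].end());
        if (*best > 0)
            consensus[col] = alphabet[static_cast<std::size_t>(best - counts[col].begin())];
    }
    return consensus;
}
```

With reads ACGTAC at position 0, GTACGGA at position 2 and TTCGG at position 3, the single disagreeing T in the third read is outvoted and the consensus is ACGTACGGA.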
13
Names to Tasks
14
Tasks
  • Preferable requirements
      • C/C++ skills
      • "C++ für Fortgeschrittene" (C++ for Advanced Programmers) block seminar (6.4. - 9.4.)
  • Your work
      • Collaborate and develop a module interface
      • Make use of SeqAn, the C++ Sequence Analysis Library (www.seqan.de)
      • Implement your module in C++
      • Document your code
      • Present your results
  • Material
      • Assembly lecture notes from Alg. Bioinformatik (Algorithmic Bioinformatics), Reinert, winter semester 2007/08
      • Links and documents on the homepage

15
Schedule
  • Seminar block 1 (2.4.)
      • Introduction to principles of software design
      • Tools (IDE, debugger, profiler, bug tracker, SVN, ...)
  • Seminar block 2 (14.4. - 17.4.)
      • SeqAn tutorial (sequences, alignments, graphs, indices)
      • Assign names to tasks
  • Make and present your plan (until 27.4.)
      • Prepare a presentation
      • What are your data types and interfaces to other modules?
      • Which algorithms do you want to use?
      • What do you use from SeqAn, and what do you implement yourself?
  • Start working
  • Present your results and write a final report (until 8.6.)

16
End of Talk
Questions?
weese_at_inf.fu-berlin.de