Reconstruction of DNA sequencing by hybridization - PowerPoint PPT Presentation

Provided by: leo88
1
Reconstruction of DNA sequencing by hybridization
  • Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang
  • ZHANGroup_at_aporc.org
  • Institute of Applied Mathematics, AMSS, CAS

2
Bioinformatics
  • Human Genome Project
  • Large molecule data in biology, such as DNA and
    protein
  • Knowledge of mathematics, computer science,
    information science, physics, system science,
    management science as well as biology
  • Genomics
  • DNA sequencing
  • Gene prediction
  • Sequence alignment

3
DNA Sequencing
ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGAC
TACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG
ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT
4
DNA Sequencing (shotgun)
[Figure: target DNA cut many times at random; forward-reverse linked reads at a known distance; labels: 500 bp, 500 bp]
5
DNA Sequencing (SBH)
  • DNA array (DNA chip) with 4^3 = 64 probes
  • Target DNA AAATGCG
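
As a rough illustration of the array on this slide, the sketch below enumerates every possible 3-mer over {A, C, G, T} (4^3 = 64 probes) and lists those that occur in the target AAATGCG. It assumes a simplified model in which a probe "lights up" when it appears as a substring of the target, ignoring strand complementarity. The function names all_probes and hybridized are illustrative and do not come from the presentation.

```python
from itertools import product


def all_probes(k, alphabet="ACGT"):
    """Every possible k-mer over the alphabet, in the alphabet's order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def hybridized(target, k):
    """Probes that occur in the target (simplified hybridization model)."""
    present = {target[i:i + k] for i in range(len(target) - k + 1)}
    return [p for p in all_probes(k) if p in present]


probes = all_probes(3)
spots = hybridized("AAATGCG", 3)
print(len(probes), spots)
```

Under these assumptions only 5 of the 64 spots respond for this target.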

6
Sequencing by Hybridization
  • Hybridize target to array containing a spot for
    each possible k-tuple (k-mer)
  • The spectrum of a sequence
  • multi-set of all its k-long substrings (k-tuples)
  • Goal
  • reconstruct the sequence from its spectrum
  • Pevzner (1989): reconstruction is polynomial
  • But

7
Uniqueness of Reconstruction
  • Different sequences can have the same spectrum
  • Example: the spectrum {ACT, CTA, TAC} is shared by
    ACTAC and TACTA
  • Non-uniqueness Probability
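
The definition of the spectrum and the non-uniqueness example can be checked with a short sketch. It represents the spectrum as a multiset using collections.Counter; the function name spectrum is an illustrative choice.

```python
from collections import Counter


def spectrum(seq, k):
    """Multiset of all k-long substrings (k-tuples) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Two different sequences from the slide, compared through their 3-spectra.
same = spectrum("ACTAC", 3) == spectrum("TACTA", 3)
print(sorted(spectrum("ACTAC", 3)), same)
```

Both sequences yield the same three 3-tuples, so the spectrum alone cannot distinguish them.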

8
Experiment Errors
  • Hybridization experiments are error prone
  • False negative error
  • k-tuple appears in target DNA but does not appear
    in its measured spectrum
  • Repetition of k-tuple
  • False positive error
  • k-tuple does not appear in target DNA but does
    appear in its measured spectrum

9
Sequencing by Hybridization
  • Target DNA: TTTTACGC
  • ⇓
  • Spectrum
  • Ideal case: TTT TTT TTA TAC ACG CGC
  • With errors: TTT TTT TTA TAC ACG CGC TGA
  • Errors: Positive (misread) / Negative (missing,
    repetition)

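
The error types in this example can be made concrete with a small sketch. It assumes the chip reports only presence or absence of each k-tuple, so a repeated k-tuple is seen once. The measured list is a hypothetical reading built from the slide's example, and classify_errors is an illustrative name.

```python
from collections import Counter


def ideal_spectrum(seq, k):
    """Multiset of all k-tuples of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def classify_errors(target, measured, k):
    """Compare a measured set of k-tuples with the ideal spectrum."""
    ideal = ideal_spectrum(target, k)
    seen = set(measured)
    return {
        # reported by the chip but absent from the target
        "false_positive": sorted(seen - set(ideal)),
        # present in the target but not reported
        "false_negative": sorted(set(ideal) - seen),
        # occur more than once in the target; multiplicity is lost
        "repeated": sorted(t for t, n in ideal.items() if n > 1),
    }


report = classify_errors(
    "TTTTACGC", ["TTT", "TTA", "TAC", "ACG", "CGC", "TGA"], 3
)
print(report)
```

Here TGA is a positive error and TTT is a repetition, which the slide groups with the negative errors.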
10
(No Transcript)
11
SBH Reconstruction Problem
  • In the case of error-free SBH experiments
  • A desired solution of SBH is just a feasible
    solution including all k-tuples in the spectrum
  • For the general case
  • There is no additional information except
    spectrum and the length of target DNA
  • A feasible solution composed of a maximum-cardinality
    subset of the spectrum is a reasonable desired
    solution

12
SBH Reconstruction Problem
  • Ideal case (without repetitions and errors)
  • Equivalent to finding an Eulerian path in a
    corresponding graph (Pevzner, 1989)
  • A linear time algorithm (Fleischner, 1990)
  • General case is an NP-hard problem
  • Branch and bound
  • Heuristics
  • Extensions
  • PSBH (Positional SBH)
  • SBH with length error
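
The Eulerian-path view of the ideal case can be sketched as follows. Each k-tuple becomes an edge from its (k-1)-long prefix to its (k-1)-long suffix, and a Hierholzer-style traversal walks every edge once. This is a minimal sketch that assumes an error-free spectrum with known multiplicities. It is not the authors' implementation, and the name reconstruct is illustrative.

```python
from collections import defaultdict


def reconstruct(kmers):
    """Return one sequence whose k-tuple multiset equals kmers, or None."""
    if not kmers:
        return None
    graph = defaultdict(list)    # (k-1)-mer -> successor (k-1)-mers
    balance = defaultdict(int)   # out-degree minus in-degree per node
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [n for n, b in balance.items() if b == 1]
    ends = [n for n, b in balance.items() if b == -1]
    if any(abs(b) > 1 for b in balance.values()):
        return None
    if len(starts) > 1 or len(ends) > 1:
        return None
    start = starts[0] if starts else kmers[0][:-1]
    # Hierholzer-style traversal: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        return None              # some edges unreachable: no Eulerian path
    return path[0] + "".join(n[-1] for n in path[1:])


print(reconstruct(["AAA", "AAT", "ATG", "TGC", "GCG"]))
```

When several Eulerian paths exist, this sketch returns just one of them, which is exactly the ambiguity the uniqueness slide describes.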

13
Motivations
  • Give some criteria that determine the most
    likely k-tuples at both ends and in the middle
    of all possible reconstructions of the target DNA
  • These criteria greatly reduce ambiguities in
    the reconstruction of DNA
  • Transform the negative errors into positive
    errors
  • This enables us to handle both types of
    errors easily
  • Separate the repetitions from both types of errors

14
Methods
  • Estimate the number of k-tuples that do not
    occur in a solution
  • Adjacency matrix (connection matrix)
  • Give a lower bound on the number of k-tuples that
    do not occur, over all solutions from k-tuple i to j
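
One plausible way to build such a connection matrix is sketched below. The entry for a pair (u, v) is the smallest shift s at which the last k-s letters of u match the first k-s letters of v. A shift of 1 means v can follow u directly, and a shift of s means at least s-1 k-tuples are missing between them. This construction is an assumption for illustration. The slides do not define the matrix, and the names shift and shift_matrix are hypothetical.

```python
def shift(u, v):
    """Smallest s >= 1 such that the last k-s letters of u equal the first k-s of v."""
    k = len(u)
    for s in range(1, k):
        if u[s:] == v[:k - s]:
            return s
    return k  # no overlap at all


def shift_matrix(tuples):
    """Matrix of pairwise shifts; entry [i][j] is shift(tuples[i], tuples[j])."""
    return [[shift(u, v) for v in tuples] for u in tuples]


tuples = ["TTT", "TTA", "TAC", "ACG", "CGC"]
for name, row in zip(tuples, shift_matrix(tuples)):
    print(name, row)
```

Under this assumption, summing s-1 along a candidate ordering gives a count of k-tuples that must be missing from the spectrum for that ordering to be feasible.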

15
Methods
  • Determine the most likely k-tuples at both ends
  • Reconstruct from the most likely end pairs to
    get an upper bound for the SBH problem
  • Purge the end pairs that cannot yield a better
    solution than the current upper bound

16
Methods
  • Transform the negative errors into the positive
    errors
  • Artificial k-tuple
  • Fill in all the possible gaps due to false
    negative errors
  • Negative error level
  • The maximal number of allowed consecutively
    missing k-tuples
  • Reduce the number of artificial k-tuples
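
The idea of filling gaps with artificial k-tuples, limited by a negative error level, might look like the sketch below. For two k-tuples u and v that overlap with shift s, the s-1 k-tuples lying between them in the merged string are generated, but only if s-1 does not exceed the allowed level. This is an assumed construction for illustration, not the authors' procedure. The names shift and artificial_tuples are hypothetical.

```python
def shift(u, v):
    """Smallest s >= 1 such that the last k-s letters of u equal the first k-s of v."""
    k = len(u)
    for s in range(1, k):
        if u[s:] == v[:k - s]:
            return s
    return k  # no overlap at all


def artificial_tuples(u, v, level):
    """k-tuples bridging u -> v when at most `level` consecutive tuples are missing."""
    k = len(u)
    s = shift(u, v)
    if s < 2 or s - 1 > level:
        return []                 # adjacent already, or gap too long to allow
    merged = u + v[k - s:]        # u followed by the non-overlapping tail of v
    return [merged[i:i + k] for i in range(1, s)]


print(artificial_tuples("TTT", "TAC", 1))
print(artificial_tuples("TTT", "ACG", 1))
print(artificial_tuples("TTT", "ACG", 2))
```

A lower level generates fewer artificial k-tuples, which matches the slide's aim of keeping their number small.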

17
Computational Experiments
  • 109 DNA sequences from GenBank
  • Simulate the SBH experiments
  • Error models
  • Randomly (probabilistic model)
  • Systematically (one base mismatched model)
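
A simulation along these lines could be sketched as follows. False negatives drop each true k-tuple independently with probability p_neg. False positives are drawn either from all absent k-tuples (random model) or only from k-tuples that differ from a true one in exactly one base (systematic model). The parameters p_neg, n_pos, systematic and seed, and the exact sampling scheme, are assumptions for illustration. The slides do not specify them. Enumerating all 4^k k-tuples is only practical for small k.

```python
import random
from itertools import product


def mismatch_neighbors(kmer, alphabet="ACGT"):
    """All k-tuples that differ from kmer in exactly one base."""
    return {
        kmer[:i] + b + kmer[i + 1:]
        for i in range(len(kmer))
        for b in alphabet
        if b != kmer[i]
    }


def simulate_sbh(target, k, p_neg, n_pos, systematic=False, seed=0):
    """Return (ideal, measured) sets of k-tuples under a simple error model."""
    rng = random.Random(seed)
    ideal = {target[i:i + k] for i in range(len(target) - k + 1)}
    # False negatives: each true k-tuple is lost with probability p_neg.
    measured = {t for t in sorted(ideal) if rng.random() >= p_neg}
    # Candidate false positives depend on the chosen error model.
    if systematic:
        pool = set().union(*(mismatch_neighbors(t) for t in ideal)) - ideal
    else:
        pool = {"".join(p) for p in product("ACGT", repeat=k)} - ideal
    extras = rng.sample(sorted(pool), min(n_pos, len(pool)))
    return ideal, measured | set(extras)


ideal, measured = simulate_sbh("TTTTACGC", 3, p_neg=0.2, n_pos=2, systematic=True)
print(sorted(ideal - measured), sorted(measured - ideal))
```

Fixing the seed keeps a simulated experiment reproducible when comparing reconstruction methods.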

18
(No Transcript)
19
(No Transcript)
20
Conclusions
  • Ideal case (without repetitions and errors) can
    be solved in polynomial time (Pevzner, 1989)
  • General case is an NP-hard problem
  • Design efficient algorithms
  • Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A
    new approach to the reconstruction of DNA
    sequencing by hybridization. Bioinformatics, vol
    19(1), pages 14-21, 2003.
  • Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu.
    Combinatorial optimization problems in the
    positional DNA sequencing by hybridization and
    its algorithms. System Sciences and Mathematics,
    vol 3, 2002. (in Chinese)
  • Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang.
    Application of neural networks in the
    reconstruction of DNA sequencing by
    hybridization. In Proceedings of the 4th ISORA,
    2002.