Bioinformatics: Buzzword or Discipline (???)

About This Presentation
Slides: 14
Provided by: kim123
Learn more at: http://www.stat.rice.edu

Transcript and Presenter's Notes

1
Bioinformatics: Buzzword or Discipline (???)
2
Outline of the course
  • Analysis of one DNA sequence: shotgun sequencing,
    Markov-chain modeling, patterns and repeats.
  • Analysis of multiple DNA or protein sequences:
    dynamic programming alignments, substitution
    matrices.
  • BLAST: algorithm for sequence retrieval and
    comparison.
  • Refresher on Markov chains: capsule theory,
    Markov-chain Monte Carlo algorithms.
  • Hidden Markov models: Viterbi algorithm and its
    applications.
  • Evolutionary models: models of nucleotide
    mutation and substitution, recombination and
    genetic drift, with applications to genome
    evolution and gene mapping.
  • Molecular phylogenetics (tree making): distance
    matrix, maximum likelihood and parsimony.
  • Special topics: gene and protein networks,
    analysis of DNA-microarray data,

3
30,000
Genes make up only 3% of the genome
BCM-HGSC
4
Genome Sizes
Organism        Genome size (base pairs)
Human           3.0 × 10^9
Mouse           3.0 × 10^9
Drosophila      1.1 × 10^8
Worm            1.0 × 10^8
Dictyostelium   3.4 × 10^7
Yeast           1.2 × 10^7
Bacteria        1.0 - 5.0 × 10^6
5
Shotgun Sequencing
High-accuracy sequence: < 1 error per 10,000 bases
6
The Human Genome: 3 Billion Base Pairs
Whole Genome Shotgun Strategy
Genome: 3 billion bases
Libraries of clones: 3 kb, 10 kb, 50 kb
DNA sequence reads: 500 bases each
AGGCTCACTG
BCM-HGSC
7
Statistical issues in shotgun strategy
  • Model for the random fragments: Binomial/Poisson
    process
  • Coverage of sequence by random fragments
  • Mean number of contigs
  • Mean size of contigs
  • Coverage by anchored contigs

8
Binomial/Poisson Process
  • $N$ fragments, each of length $L$, randomly scattered
    in an interval of length $G$.
  • Coverage: $a = NL/G$
  • Contig: union of overlapping fragments. We want
    the contigs to cover as much of $G$ as possible.
  • $\Pr[\#\,\text{frags with left end in } (x-h, x) = k]$ is
    binomial$(N, h/G)$, or approximately Poisson$(Nh/G)$
    (when?).
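One way to explore the "(when?)" question numerically is to compare the two distributions directly. The sketch below is only an illustration: `G` matches the value used later in the slides, while `N = 500` and the window length `h = 400` are assumed values chosen so that the window probability h/G is small and N is large, which is the regime where the Poisson approximation is expected to be good.

```python
import math

def binomial_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

G = 100_000   # interval length
N = 500       # number of fragments (assumed for illustration)
h = 400       # window length (assumed for illustration)

p = h / G          # chance that one fragment's left end falls in the window
lam = N * h / G    # Poisson mean, equal to N * p

# Largest pointwise difference between the two distributions for k = 0..10.
max_gap = max(abs(binomial_pmf(k, N, p) - poisson_pmf(k, lam)) for k in range(11))

for k in range(5):
    print(k, round(binomial_pmf(k, N, p), 4), round(poisson_pmf(k, lam), 4))
print("largest pointwise gap for k = 0..10:", max_gap)
```

Increasing `h` toward `G` (so that h/G is no longer small) makes the gap grow, which suggests the approximation relies on many fragments with a small per-fragment probability.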

9
Mean number of contigs
  • $E[\#\,\text{contigs}] = N \cdot \Pr[\text{a frag is rightmost in a contig}]$
  • $= N \cdot \Pr[\text{frag does not include the left end of any other frag}]$
  • $= N \exp(-NL/G) = (aG/L) \exp(-a)$

L = 800, G = 100,000
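The formula N exp(-NL/G) can be checked against a small simulation. This is a minimal sketch under stated assumptions: left ends are drawn uniformly on [0, G], two fragments belong to the same contig when the gap between consecutive left ends is at most L, and `N = 250`, the trial count, and the random seed are arbitrary illustrative choices. `L` and `G` are the values quoted on the slide.

```python
import math
import random

def expected_contigs(N, L, G):
    """Formula from the slide: E[# contigs] = N * exp(-N * L / G)."""
    return N * math.exp(-N * L / G)

def simulate_contigs(N, L, G, rng):
    """Scatter N fragments of length L on [0, G] and count the contigs."""
    starts = sorted(rng.uniform(0, G) for _ in range(N))
    contigs = 1  # the last fragment always closes a contig
    for prev, cur in zip(starts, starts[1:]):
        if cur - prev > L:  # prev covers no other left end: it is rightmost
            contigs += 1
    return contigs

rng = random.Random(0)   # fixed seed so repeated runs agree
L, G = 800, 100_000      # values from the slide
N = 250                  # assumed; gives coverage a = NL/G = 2
trials = 2000

sim_mean = sum(simulate_contigs(N, L, G, rng) for _ in range(trials)) / trials
print("simulated mean:", sim_mean)
print("formula       :", expected_contigs(N, L, G))
```

The simulated mean should sit slightly above the formula, because the last fragment on the interval always ends a contig; this is one of the boundary effects that the closed-form expression leaves out.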
10
Mean contig size
  • $E[S] = E[\#\,\text{frags} - 1] \cdot E[\text{inter-epoch distance}] + L$
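The decomposition above can be carried through to a closed form. The derivation below is a sketch under assumptions that the slide does not state explicitly: left ends form a Poisson process with rate $\lambda = N/G$ (so $\lambda L = a$), the number of fragments in a contig is geometric with success probability $e^{-a}$, and the inter-epoch distance $X$ inside a contig is an exponential distance conditioned on $X < L$.

```latex
\begin{align*}
E[\#\,\text{frags}] &= e^{a}, \\
E[X \mid X < L] &= \frac{1}{\lambda} - \frac{L\,e^{-a}}{1 - e^{-a}}, \\
E[S] &= \left(e^{a} - 1\right)\left(\frac{1}{\lambda} - \frac{L\,e^{-a}}{1 - e^{-a}}\right) + L \\
     &= \frac{e^{a} - 1}{\lambda} - L + L
      = \frac{L\left(e^{a} - 1\right)}{a}.
\end{align*}
```

Under these assumptions the mean contig size approaches $L$ as $a \to 0$ (isolated fragments) and grows roughly like $L e^{a}/a$ for large coverage.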

11
Mean contig size
[Figure: mean contig size E(S) plotted against coverage a]
12
Number of anchored contigs
# anchors = $M$, # frags = $N$, $a = NL/G$, $b = ML/G$
$E[\#\,\text{anchored contigs}] = N b \, [\exp(-a) - \exp(-b)] / (b - a)$
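The anchored-contig formula is easy to evaluate for a range of anchor counts. The sketch below is illustrative only: the function name, `N = 250`, and the list of anchor counts `M` are assumptions, and the special case b = a uses the limiting value N a exp(-a), obtained by letting b tend to a in the formula.

```python
import math

def expected_anchored_contigs(N, a, b):
    """N * b * (exp(-a) - exp(-b)) / (b - a), with the b -> a limit handled."""
    if math.isclose(a, b):
        # Limit of the expression as b -> a.
        return N * a * math.exp(-a)
    return N * b * (math.exp(-a) - math.exp(-b)) / (b - a)

L, G = 800, 100_000   # fragment length and interval length
N = 250               # number of fragments (assumed), so a = 2
a = N * L / G

for M in (10, 50, 250, 1000, 5000):   # assumed anchor counts
    b = M * L / G
    print(M, round(b, 2), round(expected_anchored_contigs(N, a, b), 2))
```

With very many anchors the value approaches N exp(-a), the unanchored expectation, which is a useful consistency check on the formula.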
13
Conclusions
  • Expected number of contigs first increases, then
    decreases with coverage.
  • Expected size of contig increases with coverage.
  • Expected number of anchored contigs first
    increases, then decreases with anchor density.
  • Attention: computations do not account for boundary
    effects.
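The first two conclusions can be checked numerically by scanning the coverage a. This sketch uses the contig-count formula (aG/L) exp(-a) from the slides; the mean contig size L (exp(a) - 1) / a is the closed form that follows from the model under a Poisson-process assumption and is not written out on the slides, so treat it as an assumption here. The grid of coverages is arbitrary.

```python
import math

L, G = 800, 100_000   # fragment length and interval length

def expected_contigs(a):
    """Expected number of contigs at coverage a: (a * G / L) * exp(-a)."""
    return (a * G / L) * math.exp(-a)

def mean_contig_size(a):
    """Assumed closed form for the mean contig size: L * (exp(a) - 1) / a."""
    return L * (math.exp(a) - 1.0) / a

coverages = [0.25 * i for i in range(1, 25)]   # a = 0.25, 0.5, ..., 6.0
counts = [expected_contigs(a) for a in coverages]
sizes = [mean_contig_size(a) for a in coverages]

# Coverage on the grid at which the expected number of contigs is largest.
peak = coverages[counts.index(max(counts))]
print("contig count peaks at a =", peak)
print("contig size increasing:", all(s < t for s, t in zip(sizes, sizes[1:])))
```

On this grid the contig count rises up to a = 1 and falls afterwards, while the mean contig size grows throughout, in line with the two statements above.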