BioInformatics 2 - PowerPoint PPT Presentation

Slides: 14
Provided by: CCUSe

1
BioInformatics (2)
2
Physical Mapping - I
  • Low resolution
  • Megabase-scale
  • High resolution
  • Kilobase-scale or better
  • Methods for low resolution mapping
  • Somatic cell hybrids (human and mouse or hamster)
  • Fast chromosomal localisation of genes
  • Subchromosomal mapping possible
  • Fluorescence in situ hybridisation (FISH)
  • Chromosome painting
  • Fractionation of chromosomes by flow cytometry

3
Physical Mapping - II
  • Methods for high resolution mapping
  • Long-range restriction mapping
  • Pulsed-field gel electrophoresis (PFGE)
  • Assembly of clone contigs
  • The double digest problem
  • Ordering fragments from a two-enzyme restriction
    digest
  • Sequence Tagged Sites (STSs)
  • Sequence fragments in the genome described
    uniquely by a pair of PCR primers
  • Usually 200-300 bases
  • Very useful as landmarks on the physical map
  • Can be mapped to individual clones by FISH
  • Assembly of STS-content physical maps
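
The double digest problem named on this slide can be illustrated with a small brute-force search. This is a minimal sketch, not the method used by any particular mapping package: the function names `cut_positions` and `double_digest` are hypothetical, fragment lengths are assumed to be exact integers (real gel measurements carry error), and the exhaustive search is only practical for a handful of fragments.

```python
from itertools import permutations

def cut_positions(fragments):
    """Internal cut sites implied by an ordered sequence of fragment lengths."""
    positions, total = set(), 0
    for length in fragments[:-1]:
        total += length
        positions.add(total)
    return positions

def double_digest(frags_a, frags_b, frags_ab):
    """Find orderings of the A and B fragments whose combined cut sites
    reproduce the fragment lengths seen in the A+B double digest."""
    total = sum(frags_a)
    if total != sum(frags_b) or total != sum(frags_ab):
        raise ValueError("all three digests must sum to the same length")
    target = sorted(frags_ab)
    solutions = []
    for order_a in set(permutations(frags_a)):
        cuts_a = cut_positions(order_a)
        for order_b in set(permutations(frags_b)):
            # Merge both sets of cut sites and add the two molecule ends.
            cuts = sorted(cuts_a | cut_positions(order_b) | {0, total})
            pieces = sorted(right - left for left, right in zip(cuts, cuts[1:]))
            if pieces == target:
                solutions.append((order_a, order_b))
    return sorted(solutions)

# Enzyme A gives 2 + 8, enzyme B gives 4 + 6, both together give 2 + 2 + 6.
print(double_digest([2, 8], [4, 6], [2, 2, 6]))
# [((2, 8), (4, 6)), ((8, 2), (6, 4))]
```

The two solutions are mirror images of the same map, which is expected: fragment lengths alone cannot tell the two orientations of a molecule apart.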

4
Physical Mapping - III
  • Map units (human genome)
  • 1 cM ≈ 1 Mb
  • 1 cR ≈ 30 kb
  • 1 centiRay = 1% chance of a radiation-induced
    break between 2 markers
  • Major information resources
  • Stanford Human Genome Center (RH maps)
  • http://www-shgc.stanford.edu
  • Whitehead/MIT Genome Center (STS content maps)
  • http://www-genome.wi.mit.edu/
  • Centre d'Etude du Polymorphisme Humain - CEPH
    (YAC maps)
  • http://www.cephb.fr/bio/ceph-genethon-map.html
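
As a rough illustration of the map units on this slide, the sketch below converts genetic and radiation hybrid distances into approximate physical distances. It assumes the slide's figures are meant as genome-wide rules of thumb (1 cM ≈ 1 Mb, 1 cR ≈ 30 kb); the constants and function names are hypothetical, and real ratios vary by chromosomal region and by the radiation dose of the hybrid panel.

```python
# Rule-of-thumb ratios taken from the slide (human genome averages only).
MB_PER_CM = 1.0   # 1 centiMorgan ~ 1 megabase
KB_PER_CR = 30.0  # 1 centiRay ~ 30 kilobases

def cm_to_mb(centimorgans):
    """Approximate physical distance in Mb for a genetic distance in cM."""
    return centimorgans * MB_PER_CM

def cr_to_kb(centirays):
    """Approximate physical distance in kb for an RH map distance in cR."""
    return centirays * KB_PER_CR

print(cm_to_mb(2.5))  # 2.5
print(cr_to_kb(10))   # 300.0
```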

5
Physical Mapping - IV
  • Conclusions
  • The value of physical mapping
  • Confirmation of chromosomal location of clones
    and genes
  • Correction of genetic map errors
  • Correlation to genetic map reveals hot and
    cold regions of recombinational activity on
    chromosomes
  • Provides useful information for duplicated
    regions
  • High resolution mapping provides the framework
    necessary for high quality sequencing of large
    genomic regions

6
System for Assembling Markers (SAM)
7
(No Transcript)
8
DNA Sequencing
  • Ordered clone library
  • Sequencing of overlapping clones of known order
    as determined by restriction analysis
  • Advantage
  • Easy ordering of resulting sequence reads
  • Disadvantage
  • Detailed mapping is time-consuming
  • Shotgun sequencing
  • Partial digestion of DNA with a 4-cutter enzyme
  • Sequencing of randomly overlapping clones
  • Computer-aided assembly of reads
  • Advantage
  • Speed
  • Disadvantage
  • High data redundancy due to random sequencing
  • Not suitable for large genomes (>300 Mb)
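
The redundancy cost of shotgun sequencing can be estimated with a simple Poisson model of random read placement (the idea behind the Lander-Waterman analysis). The slide does not mention this model; it is brought in here only as an illustration. The sketch assumes uniformly random read starts and ignores repeats, cloning bias, and the minimum overlap needed to join reads. Function names are hypothetical.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_size

def expected_fraction_covered(c):
    """Expected fraction of bases hit by at least one read at depth c."""
    return 1.0 - math.exp(-c)

def reads_needed(genome_size, read_length, target_fraction):
    """Number of random reads needed to cover target_fraction of the genome."""
    c = -math.log(1.0 - target_fraction)
    return math.ceil(c * genome_size / read_length)

# A 40 kb clone sequenced with 500-base reads:
depth = coverage(num_reads=400, read_length=500, genome_size=40_000)
print(depth)                             # 5.0
print(reads_needed(40_000, 500, 0.99))   # 369
```

Under these assumptions, covering 99% of the target already requires roughly 4.6-fold redundancy, which is one way to see why the slide lists high data redundancy as the main disadvantage.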

9
Assembly of Sequence Contigs
  • The problem
  • Semi-automated assembly of a contiguous DNA
    sequence from overlapping gel readings
  • Steps
  • Base identification
  • Trimming of ends
  • Vector clipping
  • Assembly of fragments
  • Major software packages
  • Sequencher™ from GeneCodes Inc., Ann Arbor,
    Michigan
  • Platforms: PowerMac, Windows NT
  • Up to 70 kb contigs
  • The Staden package by Staden et al., MRC,
    Cambridge
  • PHRED/PHRAP by Green et al., University of
    Washington, Seattle
  • Platforms: Unix
  • Megabase range contigs
  • Mutation detection capabilities
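
The steps listed on this slide (trimming of ends, vector clipping, assembly of fragments) can be sketched as a toy pipeline. This is not how Sequencher, the Staden package, or PHRAP work internally: it assumes error-free reads, exact-match overlaps, a known vector prefix, and a naive greedy merge. All function names and the quality threshold are hypothetical.

```python
def trim_ends(read, quals, min_q=20):
    """Drop low-quality bases from both ends of a read."""
    start, end = 0, len(read)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return read[start:end]

def clip_vector(read, vector_prefix):
    """Remove a leading vector sequence if the read starts with it."""
    return read[len(vector_prefix):] if read.startswith(vector_prefix) else read

def overlap(a, b, min_len):
    """Longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

print(trim_ends("NNACGTNN", [5, 5, 30, 30, 30, 30, 5, 5]))  # ACGT
print(greedy_assemble(["ATGGCC", "GCCTTA", "TTAGGC"]))      # ['ATGGCCTTAGGC']
```

Real assemblers score overlaps with alignment and base-quality values rather than exact string matches, which is what lets them tolerate sequencing errors and reach the contig sizes quoted on the slide.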

10
Quality Control of Sequence Data
Source: US DOE Joint Genome Institute
  • Goals
  • Complete sequence continuity across a target
    region (both within and between clones)
  • No more than one gap in 200 kb
  • Size of all gaps no larger than 1% of the size of
    the total region
  • Allowable gaps include
  • regions unclonable/unstable in conventional
    cloning vectors
  • repetitive regions
  • regions with significant secondary structure or
    abnormally high GC content
  • Gap size measured by PCR or restriction digest
    analysis
  • Accuracy of finished sequence: 1 error in 10,000
    bases or better
  • At least 95% double-strand coverage
  • Assembly Verification
  • a minimum of three independent restriction
    digests
  • reassembly with an independent algorithm
  • re-sequencing of random clones
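
The numeric goals on this slide lend themselves to a simple automated check. The sketch below is one possible reading, not the Joint Genome Institute's own procedure: it assumes the bare numbers on the slide are percentages (1% total gap size, 95% double-strand coverage), interprets "one gap in 200 kb" as one allowed gap per started 200 kb, and takes the error count as an externally supplied estimate. The function name and arguments are hypothetical.

```python
import math

def check_finishing(region_bp, gap_sizes_bp, estimated_errors, double_strand_fraction):
    """Return the list of failed criteria (an empty list means all passed)."""
    failures = []
    # Assumption: one gap is allowed per started 200 kb of target region.
    if len(gap_sizes_bp) > math.ceil(region_bp / 200_000):
        failures.append("more than one gap per 200 kb")
    # Assumption: combined gap size is limited to 1% of the region.
    if sum(gap_sizes_bp) * 100 > region_bp:
        failures.append("gaps exceed 1% of region")
    if estimated_errors * 10_000 > region_bp:
        failures.append("error rate above 1 in 10,000")
    if double_strand_fraction < 0.95:
        failures.append("double-strand coverage below 95%")
    return failures

print(check_finishing(400_000, [1500, 2000], 30, 0.97))  # []
print(len(check_finishing(200_000, [1500, 1000], 30, 0.90)))  # 4
```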

11
Submission and Annotation of Sequence Data
Source: US DOE Joint Genome Institute
  • Size of the starting clone is minimum size of
    submission to public databases
  • 95% of the sequence represented on both strands
  • all ambiguities resolved or annotated
  • missing data from the end of a clone allowed if
    sequence overlap is detected with the adjacent
    clone in the tiling path
  • Level of annotation
  • all sequences annotated in a largely automated
    fashion
  • identification of putative or known genes,
    repetitive elements, EST matches and any other
    useful miscellaneous features
  • computationally-derived predictions must be
    indicated as such
  • Immediate release of finished annotated sequence
  • Global assembly of meta-contigs from previously
    submitted data will be performed periodically

12
International Strategy Meeting on Human Genome
Sequencing
Bermuda, 25th-28th February 1996
Sponsored by the Wellcome Trust
  • Summary of agreed principles
  • Primary genomic sequence should be in the public
    domain
  • Primary genomic sequence should be rapidly
    released
  • Assemblies of greater than 1 kb should be
    automatically released on a daily basis
  • Finished annotated sequence should be immediately
    submitted to the public databases
  • Coordination
  • Large-scale sequencing centres should inform HUGO
    of their intention to sequence particular regions
    of the human genome

13
Annotating the Human Genome Sequence
  • Identification of coding regions
  • Exon/intron prediction
  • High throughput comparison of genomic sequence to
    protein information
  • Full-length protein sequences
  • Databases of protein domains
  • How automated is automated annotation in reality?
  • Advantages
  • High speed
  • Good for tRNA genes, repetitive regions
  • Good for high-scoring matches in databases, but
  • Disadvantages
  • Error propagation can be detrimental
  • Domain recycling in evolution causes
    misinterpretation, e.g. in the case of
    transcription factors similar to peptidases
  • Very computer-intensive task!
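
The simplest computational signal for a coding region is an open reading frame. The sketch below scans the three forward frames for ATG...stop stretches. It is only a toy: it ignores the reverse strand, introns, and splice sites, so it is not exon/intron prediction in the sense of the slide, and the function name and `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

dna = "CCATGAAACCCGGGTAGTT"
print(find_orfs(dna))  # [(2, 17)]
print(dna[2:17])       # ATGAAACCCGGGTAG
```

A scan like this is fast but finds many spurious frames in real genomic sequence, which is one reason the slide pairs prediction with comparison against protein databases.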