Automated sequencing machines,

Title:

Automated sequencing machines,

Description:

Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.

Slides: 47
Provided by: Pev2

Transcript and Presenter's Notes


1
  • Automated sequencing machines, particularly those made by PE Applied
    Biosystems, use 4 colors, so they can read all 4
    bases at once.

2
All the Genes?
  • Any human gene can now be found in the genome by
    similarity searching with over 95% certainty.
  • However, the sequence still has many gaps:
  • it is unlikely that an uninterrupted genomic segment
    can be found for any given gene
  • pseudogenes still cannot be identified with certainty
  • This will improve as more sequence data
    accumulate.

3
Finding Genes in Genome Sequence Is Not Easy
  • About 2% of human DNA encodes functional genes.
  • Genes are interspersed among long stretches of
    non-coding DNA.
  • Repeats, pseudogenes, and introns confound
    matters.

4
Impact on Bioinformatics
  • Genomics produces high-throughput, high-quality
    data, and bioinformatics provides the analysis
    and interpretation of these massive data sets.
  • It is impossible to separate genomics laboratory
    technologies from the computational tools
    required for data analysis.

5
Completed genome projects
Eukaryotes: 9 completed
  • Anopheles gambiae
  • Arabidopsis thaliana
  • Caenorhabditis elegans
  • Drosophila melanogaster
  • Encephalitozoon cuniculi
  • Guillardia theta nucleomorph
  • Plasmodium falciparum
  • Saccharomyces cerevisiae (yeast)
  • Schizosaccharomyces pombe
Eukaryotes: in progress (partial)
  • Danio rerio (zebrafish)
  • Glycine max (soybean)
  • Hordeum vulgare (barley)
  • Leishmania major
  • Rattus norvegicus
Bacteria: 132
Archaea: 16
Viruses: 1413
6
Six basic questions about genomes
1 How is a genome sequenced?
2 When is the project finished?
3 Sequence one individual or many?
4 What information is in the DNA?
5 How many genes are in the genome?
6 How can whole genomes be compared?
7
1 Genome projects: sequencing strategies
Hierarchical shotgun method: assemble contigs from
various chromosomes, then sequence and assemble
them.
Contig: a set of overlapping clones or
sequences from which a sequence can be obtained.
The sequence may be draft or finished. A contig
is thus a chromosome map showing the locations of
those regions of a chromosome where
contiguous DNA segments overlap. Contig maps are
important because they provide the ability to
study a complete, and often large, segment of the
genome by examining a series of overlapping
clones, which then provide an unbroken succession
of information about that region.
Scaffold: an ordered set of contigs
placed on a chromosome.
Shotgun: an approach used to decode an organism's
genome by shredding it into smaller fragments of
DNA which can be sequenced individually. The
sequences of these fragments are then ordered,
based on overlaps in the genetic code, and
finally reassembled into the complete sequence.
The 'whole genome shotgun' method is applied to
the entire genome all at once, while the
'hierarchical shotgun' method is applied to
large, overlapping DNA fragments of known
location in the genome.
Source: http://www.genome.gov/glossary.cfm
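The idea of ordering fragments by their overlaps and merging them into contigs can be illustrated with a toy greedy assembler. This is only a minimal sketch under strong assumptions: reads are error-free, all come from the same strand, and the function names (overlap, greedy_assemble) and the min_len parameter are hypothetical, not taken from any real assembler such as ARACHNE.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the contigs that remain when no overlaps are left.
    More than one contig means the assembly still has gaps.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

Real assemblers must also handle sequencing errors, both strands, and repeats, which is where forward-reverse linked reads become important.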
8
3. Whole Genome Shotgun Sequencing
genome
forward-reverse linked reads
9
ARACHNE Whole Genome Shotgun Assembly
http://www-genome.wi.mit.edu/wga/
10
2 When is the project finished?
Get five- to ten-fold coverage.
Finished sequence: a clone insert is
contiguously sequenced to a high quality standard,
with an error rate of 0.01%. There are usually no gaps in
the sequence.
Draft sequence: clone sequences
may contain several regions separated by gaps.
The true order and orientation of the pieces may
not be known.
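To get a feel for why five- to ten-fold coverage is the target, coverage can be computed as total sequenced bases divided by genome size. The sketch below also uses the standard Poisson approximation for the fraction of bases hit by no read at all. That model is an assumption added here for illustration and is not stated on the slide; the function names are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage: total bases sequenced / genome size."""
    return num_reads * read_length / genome_size


def expected_fraction_unsequenced(cov: float) -> float:
    """Poisson approximation: chance that a given base is covered by zero reads.

    Assumes reads land uniformly at random, which real genomes
    (with repeats and cloning bias) only approximate.
    """
    return math.exp(-cov)


if __name__ == "__main__":
    for fold in (1, 5, 10):
        missing = expected_fraction_unsequenced(fold)
        print(f"{fold}-fold coverage: about {missing:.5%} of bases unsequenced")
```

Under this idealized model, five-fold coverage leaves well under 1% of bases unsequenced, and ten-fold coverage leaves far less.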
12
Repetitive DNA sequences: five classes
1 Interspersed repeats (transposon-derived
repeats): 45% of the human genome (LTR, SINE, LINE)
2 Processed pseudogenes
3 Simple sequence repeats (micro- and minisatellites)
  • ACAAACT, 11 million times in a Drosophila
  • Human genome has 50,000 CA dinucleotide repeats
4 Segmental duplications (about 5% of the human genome)
5 Tandem repeats (e.g. telomeres, centromeres)
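Simple sequence repeats such as CA dinucleotide runs are easy to locate with a regular expression. The following is a minimal sketch only; the function name and the min_copies threshold are hypothetical choices, and real repeat annotation tools also allow for imperfect repeats.

```python
import re


def find_tandem_repeats(seq: str, unit: str = "CA", min_copies: int = 3) -> list[tuple[int, int]]:
    """Return (start, copies) for each run of at least min_copies exact
    tandem copies of unit. Positions are 0-based."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    print(find_tandem_repeats("GGCACACACATTCACAGG"))
```

In the example sequence the first run has four CA copies and is reported; the second run has only two copies and falls below the default threshold.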
13
  • LINE and SINE repeats. A LINE (long interspersed
    nuclear element) encodes a reverse transcriptase
    (RT) and perhaps other proteins. Mammalian
    genomes contain an old LINE family, called LINE2,
    which apparently stopped transposing before the
    mammalian radiation, and a younger family, called
    L1 or LINE1, many of which were inserted after
    the mammalian radiation (and are still being
    inserted). A SINE (short interspersed nuclear
    element) generally moves using RT from a LINE.
    Examples include the MIR elements, which
    co-evolved with the LINE2 elements. Since the
    mammalian radiation, each lineage has evolved its
    own SINE family. Primates have Alu elements and
    mice have B1, B2, etc. The process of insertion
    of a LINE or SINE into the genome causes a short
    sequence (7-21 bp for Alus) to be repeated, with
    one copy (in the same orientation) at each end of
    the inserted sequence. Alus have accumulated
    preferentially in GC-rich regions, L1s in GC-poor
    regions.

14
What is the function of nongenic DNA?
Hypotheses:
  • Nongenic DNA performs essential functions, such
    as regulation of gene expression.
  • Nongenic DNA is inert, genetically and
    physiologically. Excess DNA is incidental and is
    called junk DNA.
  • Nongenic DNA is a functional parasite, or selfish
    DNA (retrotransposons).
  • Nongenic DNA has a structural function.

15
5 How many genes are in a genome?
This depends on how a gene is defined (e.g.
protein-coding versus noncoding). It also
depends on what methods are used to find genes, and
what criteria are applied to determine
whether they are real (functional).
16
Classification of DNA
  • FUNCTIONAL (sequences that serve a function)
  • - Coding (translated into proteins)
  • - Non-coding (not translated)
  • Transcribed (functions at the RNA level:
    ribosomal subunits)
  • Not transcribed (functions at the DNA level:
    intron, promoter,
    enhancer, etc.)
  • NON-FUNCTIONAL (sequences that serve no
    function: junk DNA)

17
Gene-finding algorithms
Homology-based searches (extrinsic): rely on
previously identified genes.
Algorithm-based searches (intrinsic): investigate nucleotide
composition, open reading frames, and other
intrinsic properties of genomic DNA.
18
DNA
RNA
intron
Mature RNA
protein
19
Homology-based searching: compare DNA to
expressed genes (ESTs)
DNA
RNA
intron
RNA
protein
20
DNA
RNA
Algorithm-based searching: compare DNA in
exons (unique codon usage) to introns (unique
splice sites) to noncoding DNA. Identify open
reading frames (ORFs).
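Identifying open reading frames can be sketched as a scan of all six reading frames for an ATG followed in frame by a stop codon. This is a simplified illustration, not a real gene finder: it ignores introns, splice sites, and codon usage, so it suits intron-free sequence far better than human genes. The function names and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int = 2) -> list[str]:
    """ORFs (ATG through the stop codon, inclusive) in the three frames of one strand.

    min_codons counts the codons before the stop codon, including the ATG.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append(seq[start:i + 3])
                start = None  # look for the next ORF in this frame
    return found


def find_orfs(seq: str, min_codons: int = 2) -> list[str]:
    """ORFs on the forward strand followed by ORFs on the reverse strand."""
    seq = seq.upper()
    return orfs_in_strand(seq, min_codons) + orfs_in_strand(reverse_complement(seq), min_codons)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))
```

In practice a much larger min_codons value would be used, since short ORFs occur by chance throughout noncoding DNA.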
23
5 How many genes are in the human genome?
One answer is about 30,000. But how many genes?
  • A lot more than a fungus (6,000)
  • Somewhat more than a fly (13,000) or a worm (19,000)
  • About the same as a plant (Arabidopsis, 25,000)
  • Two groups estimate 30,000 to 35,000,
    but there is only partial overlap in their
    gene lists!
  • One Drosophila gene potentially
    yields 38,000 distinct proteins by
    alternative splicing.
  • A microarray-based survey of chromosomes 21 and 22
    finds 10 times more transcripts than are annotated.
24
6 How can whole genomes be compared?
  • Molecular phylogeny
  • You can BLAST (or PSI-BLAST) all the DNA and/or
    protein in one genome against another
  • We looked at TaxPlot and COG for bacterial
    (and for some eukaryotic) genomes
  • PipMaker and other programs align large
    stretches of genomic DNA from multiple species
25
Resources to study the human genome
  • NCBI: www.ncbi.nlm.nih.gov
  • The Sanger Institute/European Bioinformatics
    Institute: www.ensembl.org
  • UCSC Genome Bioinformatics Site: http://genome.ucsc.edu/
26
Top ten challenges for bioinformatics
1 Precise models of where and when
transcription will occur in a genome
(initiation and termination)
2 Precise models of RNA splicing
3 Precise models of signal
transduction pathways: ability to predict
cellular responses to external stimuli
4 Determining protein-DNA, protein-RNA,
protein-protein recognition codes
5 Accurate ab initio protein structure prediction
27
Top ten challenges for bioinformatics (continued)
6 Rational design of small molecule inhibitors
of proteins
7 Mechanistic understanding of
protein evolution
8 Mechanistic understanding
of speciation
9 Development of effective gene
ontologies: systematic ways to describe
gene and protein function
10 Education: development of bioinformatics curricula
28
Comparative Genomics Using ACT: The Artemis
Comparison Tool
29
Artemis Comparison Tool (ACT)
  • Based on Artemis and coded in Java.
  • Allows visualisation of two or more sequences and
    a comparison file.
  • The comparison file can be BLASTn or tBLASTx.
  • Retains all the functionality of Artemis.

30
The ACT Display
genome1
Zoom scroll bar
Filter scroll bar
genome2
Genome2
Blast HSPs
genome3
31
Running ACT
Sequence 1
Sequence 2
BLASTn tBLASTx
MSPcrunch
Reformat
32
ACT
  • Designed for looking at complete bacterial
    genomes.

33
Knowlesi contigs
tblastx
Falciparum Chr 3
tblastx
Yoelii Contigs (TIGR)
35
Orthologue / Paralogue
  • Orthologue: homologous genes in different organisms,
    derived from a common ancestral gene by speciation
    and typically sharing the same function.
  • Paralogue: homologous genes in the same organism
    that originated from gene duplication.

36
Orthologue Paralogue
Gene A
37
Orthologue Paralogue
38
Orthologue Paralogue
39
Orthologue Paralogue
Species 1
Species 2
Gene A
Gene B
40
AG-FMVZ-USP
43
T. brucei vs L. major (cont.)
44
T. brucei vs T. cruzi
45
L. major has a break in synteny that is conserved
in T. brucei and T. cruzi
T. cruzi chr3
T. brucei chr1
L. major chr12
T. brucei chr6
46
Software
  • www.sanger.ac.uk/Software/Artemis
  • www.sanger.ac.uk/Software/ACT
  • www.genome.nghri.nih.gov/blastall
  • www.cgr.ki.se/cgr/goups/sonnhammer/MSPcrunch.html