Data Acquisition Tools - PowerPoint PPT Presentation

About This Presentation

Title:

Data Acquisition Tools

Slides: 37

Provided by: tvisw

Transcript and Presenter's Notes

Title: Data Acquisition Tools

1

Data Acquisition Tools and Techniques

2
In this presentation

Part 1: Sequencing Technology
Part 2: Genomic Databases

3
Part 1

Sequencing Technology

4
Principles of DNA sequencing

DNA sequencing is performed using an automated
version of the chain termination reaction, in
which limiting amounts of dideoxyribonucleotides
generate nested sets of DNA fragments with
specific terminal bases
Four reactions are set up, one for each of the
four bases in DNA, each incorporating a different
fluorescent label
The DNA fragments are separated by polyacrylamide
gel electrophoresis (PAGE) and the sequence is
read by a scanner as each fragment moves to the
bottom of the gel
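
The read-out logic can be sketched in a few lines of Python. This is a toy model, not the chemistry: it assumes one terminated fragment per position of the newly synthesized strand, and the function names and the example sequence are made up for illustration.

```python
import random

def nested_fragments(strand):
    """Return (length, terminal_base) for every chain-terminated product."""
    # Each fragment starts at the primer and ends at a labelled terminator base.
    return [(n, strand[n - 1]) for n in range(1, len(strand) + 1)]

def read_sequence(fragments):
    """Order fragments by length, as the gel does, and read the terminal labels."""
    return "".join(label for _, label in sorted(fragments))

strand = "GATTACAGGC"                      # hypothetical synthesized strand
fragments = nested_fragments(strand)
random.Random(0).shuffle(fragments)        # fragments are mixed in the reaction
decoded = read_sequence(fragments)
print(decoded)                             # GATTACAGGC
```

Sorting by length stands in for electrophoresis: the shortest fragment reaches the bottom of the gel first, so the order of labels passing the scanner spells out the sequence.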

5
Types of DNA sequence

DNA sequences come in three major forms:
Genomic DNA comes directly from the genome and
includes extragenic material as well as genes.
In eukaryotes, genomic DNA contains introns
cDNA is reverse-transcribed from mRNA and
corresponds only to the expressed parts of the
genome. It does not contain introns
Recombinant DNA comes from the laboratory and
comprises artificial DNA molecules such as
cloning vectors
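
The difference between genomic DNA and cDNA can be illustrated with a short Python sketch. The sequence and exon coordinates below are invented for the example, and the model ignores untranslated regions and strand orientation.

```python
def splice_to_cdna(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into an intron-free sequence."""
    return "".join(genomic[start:end] for start, end in exons)

# Toy gene: exons in upper case, introns in lower case (made-up sequence).
genomic = "ATGGCC" + "gtaagtttcag" + "GATTAC" + "gtgagtcag" + "TGA"
exons = [(0, 6), (17, 23), (32, 35)]       # hypothetical exon coordinates

cdna = splice_to_cdna(genomic, exons)
print(cdna)                                # ATGGCCGATTACTGA
```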

6
Genome sequencing strategies

Only short DNA molecules (about 800 bp) can be
sequenced in one read, so large DNA molecules,
such as genomes, must first be broken into
fragments. Genome sequencing can be approached in
two ways:
Shotgun sequencing involves the generation of
random DNA fragments, which are sequenced in
large numbers to provide genome-wide coverage
Clone contig sequencing involves the systematic
production and sequencing of subclones
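
A small Python simulation can show why shotgun sequencing needs large numbers of reads. The genome below is random, and the read count and seed values are arbitrary assumptions; only the 800 bp read length comes from the slide.

```python
import random

def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Sample reads of fixed length from random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]

def depth_per_base(genome_len, reads):
    """Count how many reads cover each position of the genome."""
    depth = [0] * genome_len
    for start, seq in reads:
        for i in range(start, start + len(seq)):
            depth[i] += 1
    return depth

rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))   # toy genome
reads = shotgun_reads(genome, n_reads=60, read_len=800)
depth = depth_per_base(len(genome), reads)
mean_depth = sum(depth) / len(genome)
print(f"mean depth {mean_depth:.1f}x, uncovered bases: {depth.count(0)}")
```

The mean depth is simply (number of reads x read length) / genome length, but because start positions are random, individual bases can be covered far more or far less often than the mean.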

7
Sequence quality control

High quality sequence data is generated by
performing multiple reads on both DNA strands
Preliminary trace data is then base called and
assessed for quality using a program such as
Phred
Vector sequences and repeated DNA elements are
masked off and then the sequence is assembled
into contigs using a program such as Phrap
Remaining inconsistencies must be addressed by
human curators
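
Phred reports base quality as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The Python sketch below converts between the two and masks low-quality calls. The threshold of 20 and the helper names are assumptions chosen for illustration, not Phred's own interface.

```python
import math

def phred_quality(p_error):
    """Quality score for an estimated base-calling error probability."""
    return -10 * math.log10(p_error)

def error_probability(q):
    """Inverse of phred_quality: error probability for a quality score."""
    return 10 ** (-q / 10)

def mask_low_quality(bases, quals, min_q=20):
    """Replace base calls whose quality is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))

print(round(phred_quality(0.001)))                 # 30
print(mask_low_quality("ACGT", [30, 10, 25, 5]))   # ANGN
```

A score of 20 corresponds to a 1 in 100 chance of error and a score of 30 to 1 in 1000.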

8
Single-pass sequencing

Sequence data of lower quality can be generated
by single reads (single-pass sequencing)
Although somewhat inaccurate, single-pass
sequences such as ESTs and GSSs can be generated
in large amounts very quickly and inexpensively

9
RNA sequencing

Most RNA sequences are deduced from the
corresponding DNA sequences but special methods
are required for the identification of modified
nucleotides. These include biochemical assays,
NMR spectroscopy and mass spectrometry (MS)

10
Protein sequencing

Most protein sequencing is nowadays carried out
by MS, a technique in which accurate molecular
masses are calculated from the mass/charge ratio
of ions in a vacuum
Soft ionization methods allow MS analysis of
large macromolecules such as proteins
Sequences can be deduced by comparing the masses
of tryptic peptide fragments to those predicted
from virtual digests of proteins in databases
Also, de novo sequencing can be carried out by
generating nested sets of peptide fragments in a
collision cell and calculating the difference in
mass between fragments differing in length by a
single amino acid residue
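
Both ideas, the virtual tryptic digest and reading a sequence from mass differences, can be sketched in Python. The cleavage rule used here (after K or R, but not before P) and the monoisotopic residue masses are standard values. The peptide, the 0.01 Da tolerance, the function names and the idealized ladder (no ion-type offsets, no noise) are assumptions for illustration.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Mass of a peptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def read_ladder(masses, tol=0.01):
    """Assign a residue to each mass step between successive fragments."""
    ordered = sorted(masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        step = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - step) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls

peptides = tryptic_digest("MKRPLSKGA")
print(peptides)                            # ['MK', 'RPLSK', 'GA']

# Idealized nested fragment ladder for the made-up peptide GLSK.
ladder, running = [0.0], 0.0
for aa in "GLSK":
    running += RESIDUE_MASS[aa]
    ladder.append(running)
calls = read_ladder(ladder)
print(calls)                               # ['G', 'L/I', 'S', 'K']
```

Leucine and isoleucine have identical masses, so a mass difference alone cannot tell them apart; the sketch reports both candidates.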

11
Importance of protein interactions

They underlie most cellular functions.
Protein-protein interactions result in formation
of transient or stable multi-subunit complexes
Understanding of these complexes is required for
functional annotation of proteins and is a step
towards the elucidation of molecular pathways
such as signaling cascades and regulatory
networks
Protein interactions with nucleic acids form an
important area of study, since such interactions
are required for replication, transcription,
recombination, DNA repair and many other
processes. Proteins also interact with small
molecules, which act as ligands, substrates,
cofactors and allosteric regulators

12
Methods for protein interactions

Genetic methods: suppressor mutant, synthetic
lethal effect, dominant negative mutations
Affinity methods: affinity chromatography,
co-immunoprecipitation
Molecular and atomic methods: X-ray
crystallography, NMR spectroscopy
Other methods: FRET, SPR spectroscopy, SELDI
Library-based methods: Y2H system

13
Other methods

For larger proteins that do not readily form
crystals, alternative analytical methods are
required to deduce structures
These include X-ray fiber diffraction, electron
microscopy and circular dichroism (CD)
spectroscopy

14
Protein structure determination

X-ray crystallography
NMR spectroscopy
Other methods: X-ray fiber diffraction, electron
microscopy, CD spectroscopy

15
X-ray crystallography

Involves determination of protein structure by
studying the diffraction pattern of X-rays
through a precisely orientated protein crystal
The way in which X-rays are scattered depends on
the electron density and spatial orientation of
the atoms in the crystal
A mathematical method called the Fourier
transform is used to reconstruct electron density
maps from the diffraction data allowing
structural models to be built
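
The role of the Fourier transform can be illustrated with a one-dimensional toy in Python. The density values are invented, and the sketch assumes that both amplitude and phase of every structure factor are available. A real diffraction experiment records only intensities, so phases must be recovered separately.

```python
import cmath

def structure_factors(density):
    """Forward transform: density samples along one axis -> structure factors."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n)
            for x, rho in enumerate(density))
        for h in range(n)
    ]

def electron_density(factors):
    """Inverse transform: structure factors -> reconstructed density."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * cmath.pi * h * x / n)
             for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]

density = [0.0, 1.0, 6.0, 1.0, 0.0, 0.0, 8.0, 0.0]   # two toy "atoms"
factors = structure_factors(density)
recovered = electron_density(factors)
print([round(v, 3) for v in recovered])
```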

16
NMR spectroscopy

NMR is a property of certain atoms that can
switch between magnetic states in an applied
magnetic field by absorbing electromagnetic
radiation
The nature of the absorbance spectrum is
influenced by the type of atom and its chemical
context, so that NMR spectroscopy can
discriminate between different chemical groups
NMR spectra are also modified by the proximity of
atoms in space
Analysis of NMR spectra allows 3D configuration
of atoms to be reconstructed, resulting in a
series of structural models
The technique is suitable only for the analysis
of small, soluble proteins

17
2-D gel electrophoresis

The current method for studying proteins consists
in part of a technique called two dimensional gel
electrophoresis, which separates proteins by
charge and size
In the technique, researchers squirt a solution
of cell contents onto a narrow polymer strip that
has a gradient of acidity. When the strip is
exposed to an electric current, each protein in
the mixture settles into a layer according to its
charge. Next, the strip is placed along the edge
of a flat gel and exposed to electricity again.
As the proteins migrate through the gel, they
separate according to their molecular weight.
What results is a smudgy pattern of dots, each
of which contains a different protein
In academic laboratories, scientists generally
use a tool similar to a hole puncher to cut the
protein spots from 2-D gels for individual
identification by another method, mass
spectrometry
Nowadays, companies have started using robots
to do it

18
Part 2

Genomic Databases

19
Types of databases

There are many types of databases available for
researchers in the field of biology
Primary sequence databases - for storage of raw
experimental data
Secondary databases - contain information on
sequence patterns and motifs
Organism specific databases
Other databases

20
Primary sequence databases

Three primary sequence databases are GenBank
(NCBI), the Nucleotide Sequence Database (EMBL)
and the DNA Databank of Japan (DDBJ)
These are repositories for raw sequence data, but
each entry is extensively annotated and has a
features table to highlight the important
properties of each sequence
The three databases exchange data on a daily basis

21
Subsidiary sequence databases

Particular types of sequence data are stored in
subsidiaries of the main sequence databases. For
instance, ESTs are stored in dbEST, a division of
GenBank
There are also subsidiary databases for GSSs and
unfinished genomic sequence data

22
Organism-specific resources

As well as general databases that serve the
entire biology community, there are many
organism-specific databases that provide
information and resources for those researchers
working on particular species
The number of such databases is growing as more
genome projects are initiated, and many can be
accessed from general genomics gateway sites such
as GOLD

23
Organism-specific genomic databases
Organism | Database/resource | URL
Escherichia coli | EcoGene | http://bmb.med.miami.edu/EcoGene/EcoWeb
Escherichia coli | EcoCyc (Encyclopedia of E. coli genes and metabolism) | http://ecocyc.pangeasystems.com/ecocyc/ecocyc.html
Escherichia coli | Colibri | http://genolist.pasteur.fr/Colibri
Bacillus subtilis | SubtiList | http://genolist.pasteur.fr/SubtiList
Saccharomyces cerevisiae | Saccharomyces Genome Database (SGD) | http://genome-www.stanford.edu/Saccharomyces
Plasmodium falciparum | PlasmoDB | http://PlasmoDB.org
Arabidopsis thaliana | MIPS Arabidopsis thaliana Database (MAtDB) | http://mips.gsf.de/proj/thal/db
Arabidopsis thaliana | The Arabidopsis Information Resource (TAIR) | http://www.arabidopsis.org
Drosophila melanogaster | FlyBase | http://flybase.bio.indiana.edu
Caenorhabditis elegans | A C. elegans DataBase (ACeDB) | http://www.acedb.org
Mouse | Mouse Genome Database (MGD) | http://www.informatics.jax.org
Human | OnLine Mendelian Inheritance in Man (OMIM) | http://www.ncbi.nlm.nih.gov/omim
24
Finding organism-specific databases

Organism specific databases are widely
distributed on the Internet
In order to find and interrogate databases on
specific organisms, it is necessary to use a
gateway site to access relevant databases and
information resources
Worked examples are provided, using GOLD as the
gateway and illustrated with Ebola virus, the
bacterium E. coli, the fruit fly Drosophila
melanogaster and the human genome

25
Useful gateway sites providing information on
multiple organism and genomic resources
Gateway site | URL
NCBI Genomic Biology | www.ncbi.nlm.nih.gov/Genomes/
GOLD (Genomes OnLine Database) | wit.integratedgenomics.com/GOLD
Organism specific genomic databases | www.unl.edu/stc-95/ResTools/biotools/biotools10.html
TIGR Microbial Database | www.tigr.org/tdb/mdb/mdbcomplete.html
Bacterial genomes | genolist.pasteur.fr
Yeast database | genome-www.stanford.edu/Saccharomyces/yeast_info.html
EnsEMBL genome database project | www.ensembl.org
MIPS (Munich Information Centre for Protein Sequences) | mips.gsf.de
26
[Images: nematode; baker's yeast cells]
27
Other databases

Specialized sequence databases - for storage and
analysis of particular types of sequences, e.g.
rRNA and tRNA, introns, promoters and other
regulatory elements
OMIM - for the study of human genetics and
molecular biology
Incyte and UniGene - provide gene sequences and
transcripts with expert annotation for use in
drug design and research
Structural databases - for protein structural
data from X-ray crystallography and NMR studies
(e.g. PDB, MMDB)
Protein and higher-order function databases -
store information on particular types of proteins
such as receptors, signal transduction
components, regulatory hierarchies and enzymes
Literature databases - store scientific articles
with a text search facility (e.g. MEDLINE and
PubMed)

28
Database tools for displaying and annotating
genomic sequence data
Viewer/format | URL
Artemis | www.sanger.ac.uk/Software/Artemis
ACeDB | www.acedb.org/Tutorial/brief-tutorial/shtml
Apollo | www.ensembl.org/apollo
EnsEMBL | www.ensembl.org
NCBI map viewer | www.ncbi.nlm.nih.gov
GoldenPath | genome.ucsc.edu
29
Database formats

There is no universally agreed format for genome
databases and several viewers and browsers have
been developed with graphical displays for
genomic sequence analysis and annotation
One of the most versatile formats is ACeDB
(originally designed for the nematode C.
elegans), which has an object-oriented database
architecture and is now used in many applications
outside the field of genomic bioinformatics

30
Common formats

There are several conventions for representing
nucleic acid and protein sequences, of which the
following are widely used
NBRF/PIR
FASTA
GDE
These formats have limited facilities for
comments, which must include a unique identifier
code and sequence accession number
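
Of these, FASTA is the simplest: a header line starting with ">" carries the identifier and an optional comment, and the lines that follow hold the sequence. A minimal Python reader might look like the sketch below. The record names and sequences are made up, and the parser deliberately ignores edge cases such as text before the first header.

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(header + ("".join(chunks),))
            ident, _, description = line[1:].partition(" ")
            header, chunks = (ident, description), []
        elif header is not None:
            chunks.append(line)          # sequence may span several lines
    if header is not None:
        records.append(header + ("".join(chunks),))
    return records

example = """>seq1 hypothetical toy record
ACGTACGT
ACGT
>seq2
MKRPLSK
"""
records = parse_fasta(example)
for ident, description, sequence in records:
    print(ident, len(sequence), description)
```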

31
Formats for multiple sequence alignment

There are separate formats for multiple sequence
alignment representation, of which the following
are popular
MSF
PHYLIP
ALN
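
As an illustration of an alignment format, the Python sketch below writes a small alignment in the sequential PHYLIP layout: a first line giving the number of sequences and the alignment length, then one line per sequence with the name padded to ten characters. The sequence names and the helper function are hypothetical, and interleaved PHYLIP and the relaxed name variants are not covered.

```python
def to_phylip(alignment):
    """Format {name: aligned_sequence} as sequential PHYLIP text."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    lines = [f" {len(names)} {length}"]
    for name in names:
        # Classic PHYLIP reserves exactly ten characters for the name.
        lines.append(f"{name[:10]:<10}{alignment[name]}")
    return "\n".join(lines)

alignment = {"seqA": "ACG-TTGA", "seqB": "AC-GTTGA", "seqC": "ACGGT-GA"}
phylip_text = to_phylip(alignment)
print(phylip_text)
```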

32
Files of structural data

Structural data are maintained as flat files
using the PDB format
Such files contain orthogonal atomic co-ordinates
together with annotations, comments and
experimental details
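
PDB coordinate records are fixed-width, so fields are read by column position, not by splitting on whitespace. The Python sketch below reads the main fields of an ATOM record using the published column layout. The sample line is built in code with invented values, and only a subset of the fields is extracted.

```python
def parse_atom_line(line):
    """Extract selected fixed-width fields from a PDB ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }

# Build a sample record with the same column widths (made-up coordinates).
example = "ATOM  {:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614
)
atom = parse_atom_line(example)
print(atom)
```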

33
Submission of sequences

Sequences may be submitted to any of the three
primary databases using the tools provided by the
database curators
Such tools include WebIn and BankIt, which can be
used over the Internet, and Sequin, a stand-alone
application

34
Database interrogation

All the databases discussed above can be searched
by sequence similarity
However, detailed text-based searches of the
annotations are also possible using tools such as
Entrez
The simplest way to cross-reference between the
primary nucleotide sequence databases and
SWISS-PROT is to search by accession number, as
this provides an unambiguous identifier of genes
and their products
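
Text searches and accession look-ups of this kind can also be scripted. The Python sketch below only builds request URLs for NCBI's E-utilities, the programmatic interface to Entrez, and sends nothing over the network. Using E-utilities at all is an assumption beyond the slide, and the search term and the accession U00096 are sample values.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(database, term, max_results=20):
    """URL for a text search of one Entrez database."""
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"

def efetch_url(database, accession, rettype="fasta"):
    """URL for retrieving a record by accession number."""
    query = urlencode(
        {"db": database, "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"

search = esearch_url("nucleotide", "Escherichia coli[Organism]", max_results=5)
fetch = efetch_url("nucleotide", "U00096")
print(search)
print(fetch)
```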

35
Databases covered by Entrez
Category | Database
Nucleic acid sequences | Entrez nucleotides: sequences obtained from GenBank, RefSeq and PDB
Protein sequences | Entrez protein: sequences obtained from SWISS-PROT, PIR, PRF, PDB and translations from annotated coding regions in GenBank and RefSeq
3D structures | Entrez Molecular Modeling Database (MMDB)
Genomes | Complete genome assemblies from many sources
PopSet | From GenBank, sets of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
OMIM | OnLine Mendelian Inheritance in Man
Taxonomy | NCBI Taxonomy Database
Books | Bookshelf
ProbeSet | Gene Expression Omnibus (GEO)
3D domains | Domains from the Entrez MMDB
Literature | PubMed
36
Databases covered by DBGET/LinkDB
Category | Database
Nucleic acid sequences | GenBank, EMBL
Protein sequences | SWISS-PROT, PIR, PRF, PDBSTR
3D structures | PDB
Sequence motifs | PROSITE, EPD, TRANSFAC
Enzyme reactions | LIGAND
Metabolic pathways | PATHWAY
Amino acid mutations | PMD
Amino acid indices | AAindex
Genetic diseases | OMIM
Literature | LITDB, MEDLINE
Organism-specific gene catalogs | E. coli, H. influenzae, M. genitalium, M. pneumoniae, M. jannaschii, Synechocystis, S. cerevisiae
