What is bioinformatics? - PowerPoint PPT Presentation

About This Presentation
Title:

What is bioinformatics?

Slides: 30
Provided by: imtechRes

Transcript and Presenter's Notes

Title: What is bioinformatics?


1
What is bioinformatics?
  • Long Definition: The study of the application of
    computer and statistical techniques to the
    management of biological information, including
    development of methods to search databases
    quickly, to analyze DNA sequence information, and
    to predict protein sequence and structure from
    DNA sequence data.
  • Short Definition: The management, analysis, and
    visualization of molecular, cellular, and genomic
    information.

2
Computational Biology
Molecular Biology
Bioinformatics
Genomics
Computer Science
3
  • Genomics: what is it?
  • Development and application of genetic mapping,
    sequencing, and computation (bioinformatics) to
    analyze the genomes of organisms.
  • Sub-fields of genomics:
  • Structural genomics: genetic and physical mapping
    of genomes.
  • Functional genomics: analysis of gene function
    (and non-genes).
  • Comparative genomics: comparison of genomes across
    species. Includes structural and functional
    genomics.
  • Evolutionary genomics.

4
COMPARATIVE GENOMICS
  • Brief Review

5
Definition
  • A comparison of gene numbers, gene locations,
    and biological functions of genes in the genomes
    of different organisms, one objective being to
    identify groups of genes that play a unique
    biological role in a particular organism.
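
One way to picture the objective above is as a set comparison. The sketch below is only an illustration: the organism names and gene-family labels are hypothetical placeholders, and it assumes each genome has already been reduced to a set of gene-family labels, a step the slides do not describe.

```python
def organism_specific(gene_families, organism):
    """Return the gene families found in `organism` and in no other organism.

    `gene_families` maps an organism name to a set of gene-family labels.
    """
    others = set().union(
        *(fams for name, fams in gene_families.items() if name != organism)
    )
    return gene_families[organism] - others


# Hypothetical placeholder data, not taken from any real genome.
genomes = {
    "organism_1": {"fam_A", "fam_B", "fam_C"},
    "organism_2": {"fam_A", "fam_B"},
    "organism_3": {"fam_B", "fam_D"},
}

unique_to_1 = organism_specific(genomes, "organism_1")
```

Families shared by every organism drop out, and what remains are candidates for an organism-specific role.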

6
A Few Terms
  • Homology: the relationship of any two characters
    (such as two proteins that have similar
    sequences) that have descended, usually through
    divergence, from a common ancestral character.
    Homologues are thus components or characters
    (such as genes/proteins with similar sequences)
    that can be attributed to a common ancestor of
    the two organisms during evolution.

7
Homologues can be orthologues, paralogues,
or xenologues.
  • Orthologues are homologues that have evolved from
    a common ancestral gene by speciation. They
    usually have similar functions.
  • Paralogues are homologues that are produced by
    duplication within a genome followed by
    subsequent divergence. They often have
    different functions.
  • Xenologues are homologues that are related by an
    interspecies (horizontal) transfer of the genetic
    material for one of the homologues. The functions
    of xenologues are quite often similar.

8
Analogues
  • Analogues are non-homologous genes/proteins that
    have arisen convergently from unrelated
    ancestors. They have similar functions although
    they are unrelated in either sequence or
    structure.

9
Comparative Genomics
Two very large problems are immediately apparent
in undertaking the sequencing of entire genomes.
First, the vast numbers of species and the much
larger size of some genomes make sequencing all
genomes in their entirety a non-optimal approach
for understanding genome structure. Second,
within a given species
most individuals are genetically distinct in a
number of ways. What does it actually mean, for
example, to "sequence a human genome"? The
genomes of two individuals who are genetically
distinct differ with respect to DNA sequence by
definition. These two problems, and the
potential for other novel applications, have
given rise to new approaches which, taken
together, constitute the field of comparative
genomics.
10
Because all modern genomes have arisen from
common ancestral genomes, the relationships
between genomes can be studied with this fact in
mind. This commonality means that information
gained in one organism can have application in
other, even distantly related, organisms.
Comparative genomics enables the application of
information gained from facile model systems to
agricultural and medical problems. The nature and
significance of differences between genomes also
provide a powerful tool for determining the
relationship between genotype and phenotype
through comparative genomics and morphological
and physiological studies.
11
The Role of Bioinformatics in Identification of
Drug Targets from Bacterial and Fungal Genomes
Dr. Andrew E. DePristo, Director of
Bioinformatics, Genome Therapeutics Corporation
Bacterial genomes are appearing at an
ever-increasing rate, with a September 1999
listing by NCBI indicating 16 completed, 10 being
annotated, and 55 being sequenced. Fungal genomes
and proteomes are less prevalent with one
complete, a few nearly complete, and large
collections of cDNA sequences available for about
five organisms. This presentation will discuss
use of this bacterial and fungal genomic
diversity, along with high-throughput
bioinformatics tools, to attach confidence to
certain functional predictions and to allow
identification and targeting of essential genes
that are unique to specific organisms.
12
Methods (WET)
Introduction: A DNA walk of a genome represents
how the frequency of each nucleotide of a pairing
nucleotide couple changes locally. This analysis
implies measurement of the local distribution of
Gs in the content of GC and of Ts in the content
of TA. Lobry was the first to propose this
analysis (1996, 1999). Two complementary
representations can be derived from the DNA walk:
the cumulative TA-skew and GC-skew analyses.
Aim: By reading this description of the
algorithm, a reader not trained in genomics
should be able to redraw our graphs, using the
basic genometric data file that is posted on our
web resource for each organism as a zip file
(.zip).
13
1) DNA walk
1.1) Drawing a DNA walk by reading a sequence
file nucleotide by nucleotide. A simple algorithm
draws a DNA walk by assigning a direction to each
nucleotide. We propose the following assignment,
slightly different from Lobry's: T, C, A, and G
correspond to the E(ast), S(outh), W(est), and
N(orth) directions, respectively (Lobry, 1999).
Reading the sequence nucleotide by nucleotide and
following this rule, a path clearly emerges on
the graph (Figure 1).
Figure 1: DNA walk of the sequence GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC. Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
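
The rule in 1.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own program: the function name `dna_walk` is made up here, and treating any symbol other than T, C, A, or G (for example N) as "no move" is an assumption the slide does not state.

```python
# Direction of one step per nucleotide, as (dx, dy):
# T = East, C = South, A = West, G = North (assignment from section 1.1).
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at (0, 0)."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        # Assumption: unknown symbols leave the pointer where it is.
        dx, dy = STEPS.get(base, (0, 0))
        x += dx
        y += dy
        path.append((x, y))
    return path


# Sequence quoted in the Figure 1 caption.
FIGURE_1_SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
figure_1_path = dna_walk(FIGURE_1_SEQ)
```

The resulting points can be handed to any plotting tool or spreadsheet. For the Figure 1 sequence the counts of T and A are equal, as are the counts of G and C, so the walk finishes on its starting point.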
14
1.2) Drawing a DNA walk by slicing a sequence
file into small windows. A quick way to draw
this kind of graph, suggested by Lobry (1996), is
to cut the genome into windows of equal length.
Figure 2: DNA walk of the same sequence as in Figure 1 (GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC). The sequence was sliced into 5-nucleotide windows. Only the fifth nucleotide of each window is plotted. We can also work with the mean values of the window.
Comment: this method is not as precise as the
first one. It can be used with spreadsheet
software without affecting the final resolution
of the curve at the genome level.
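
A possible sketch of the windowed variant follows. It assumes that "only the fifth nucleotide per window is plotted" means recording the pointer position at the end of each complete window, and that a trailing partial window is dropped; both are interpretations, and the name `windowed_dna_walk` is hypothetical.

```python
# Same direction assignment as in section 1.1: T = E, C = S, A = W, G = N.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def windowed_dna_walk(sequence, window=5):
    """Return the walk position recorded once per complete window."""
    x, y = 0, 0
    points = [(x, y)]
    for i, base in enumerate(sequence.upper(), start=1):
        dx, dy = STEPS.get(base, (0, 0))
        x += dx
        y += dy
        if i % window == 0:
            # End of a window: keep this point, skip the ones in between.
            points.append((x, y))
    return points


coarse = windowed_dna_walk("TTTTTGGGGG", window=5)
```

With a window of 5, a 60-nucleotide sequence yields 12 plotted points plus the origin, which is why this version is manageable in a spreadsheet.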
15
2) The cumulative TA- and GC-skew analyses
2.1) Drawing a cumulative TA- or GC-skew analysis
by reading a sequence file nucleotide by
nucleotide.
Cumulative TA-skew analysis: Assign to each
nucleotide the following direction: A, T, C, and
G correspond to the S, N, nd (no direction), and
nd directions, respectively. On the graph, after
reading one nucleotide, the pointer moves one
step eastward. If an A or a T is read, a further
step is added, southward or northward,
respectively.
16
Cumulative GC-skew analysis: Assign to each
nucleotide the following direction: A, T, C, and
G correspond to the nd, nd, S, and N directions,
respectively. On the graph, after reading one
nucleotide, the pointer moves one step eastward.
If a C or a G is read, a further step is added,
southward or northward, respectively.
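
Both cumulative skews follow the same rule and differ only in which nucleotides move the pointer north or south, so they can share one helper. The sketch below is illustrative; the function names are not from the source, and each stored point is the pointer position after the eastward step and any vertical step have both been applied.

```python
def cumulative_skew(sequence, north, south):
    """Return (x, y) points: x advances by 1 per nucleotide, y moves up on
    `north`, down on `south`, and stays put for anything else (nd)."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        x += 1  # one step eastward for every nucleotide read
        if base == north:
            y += 1
        elif base == south:
            y -= 1
        path.append((x, y))
    return path


def ta_skew(sequence):
    # TA-skew: T goes north, A goes south, C and G have no direction.
    return cumulative_skew(sequence, north="T", south="A")


def gc_skew(sequence):
    # GC-skew: G goes north, C goes south, A and T have no direction.
    return cumulative_skew(sequence, north="G", south="C")
```

For example, `ta_skew("ATTG")` falls one step on the A, rises on each T, and stays level on the G.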
17
Methods (dry)
  • Bioinformatics.
  • Its tools (software)

18
Computational analysis in drug target discovery
  • Shannon entropy is a measure of variation or
    change over a time series. Genes that exhibit
    significant changes are regarded as good target
    candidates.
  • Clustering is a method for grouping patterns by
    similarities in their shapes.
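
As a rough illustration of the entropy bullet, the sketch below computes the Shannon entropy H = -Σ p·log2(p) of one gene's expression time series. The slide does not say how expression levels are turned into discrete states, so the equal-width binning and the default of three bins are assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(series, n_bins=3):
    """Shannon entropy (in bits) of a numeric time series after binning."""
    if not series:
        raise ValueError("series must not be empty")
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a flat profile carries no variation
    width = (hi - lo) / n_bins
    # Assumption: equal-width bins; the maximum value goes in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in series]
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


flat = shannon_entropy([5, 5, 5, 5])
varied = shannon_entropy([0, 1, 2, 3, 4, 5])
```

A gene whose level never changes scores zero, while one that spreads evenly across all bins reaches the maximum of log2(n_bins) bits, so ranking genes by this score puts the most variable ones first.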

21
GCG History (tools)
Founded in 1982 as a service of the Department
of Genetics at the University of Wisconsin, GCG
became a private company in 1990 and was acquired
by Oxford Molecular Group in 1997. The company
was one of the pioneers of bioinformatics and its
Wisconsin Package sequence analysis tools are
widely used and well regarded throughout the
pharmaceutical and biotechnology industries and
in academia. To support enterprise bioinformatics
efforts, GCG developed SeqStore, its Oracle-based
data management system. Desktop solutions are
delivered to bench scientists through products
such as MacVector and OMIGA.
22
GCG Wisconsin Package
SeqLab, free with the Wisconsin Package, provides a graphical interface to the Package's analysis tools plus project management capabilities. SeqLab's Editor (shown above) enables you to enter sequences, view multiple sequence alignments, and select the sequence ranges to analyze.
  • Molecular biologists worldwide use the GCG
    Wisconsin Package as their software of choice
    for comprehensive sequence analysis. The
    Wisconsin Package meets research needs across
    disciplines, project teams, and labs to provide
    an enterprise-wide solution. Based on published
    algorithms from the fields of mathematical and
    computational biology, the Package includes tools
    for:
  • Comparison
  • Database Searching and Retrieval
  • DNA/RNA Secondary Structure
  • Editing and Publication
  • Evolution
  • Fragment Assembly
  • Gene Finding and Pattern Recognition
  • Importing and Exporting
  • Mapping
  • Primer Selection
  • Protein Analysis
  • Translation

23
PAUP version 4.0 is a major upgrade and new
release of the software package for inference of
evolutionary trees, for use in Macintosh,
Windows, UNIX/VMS, or DOS-based formats. The
influence of high-speed computer analysis of
molecular, morphological and/or behavioral data
to infer phylogenetic relationships has expanded
well beyond its central role in evolutionary
biology, now encompassing applications in areas
as diverse as conservation biology, ecology, and
forensic studies. The success of previous
versions of PAUP (Phylogenetic Analysis Using
Parsimony) has made it the most widely used
software package for the inference of
evolutionary trees.
24
Target Validation
  • Target validation involves taking steps to prove
    that a DNA, RNA, or protein molecule is directly
    involved in a disease process and is therefore a
    suitable target for development of a new
    therapeutic compound.
  • Genes that do not belong to an established
    family are critical to many disease processes and
    also need to be validated as potential drug
    targets.

25
Target validation and identification
  • Computer-based drug design: Beginning with the
    protein engineering and analysis tools, we can
    identify and evaluate the target. Then, with that
    information, we may attack the target with a
    variety of tools to identify new and novel drug
    candidates. The complete suite of software
    products provides a seamless environment in
    which to work more efficiently and quickly.

26
Target validation and identification
  • The computational component analyzes genomic
    sequences, resulting in 3D and functional
    annotations. Once annotated, sequences can be
    identified as potential drug targets for
    development.
  • X-ray crystallography has become a central tool
    in modern drug and target discovery.
  • These annotations, made from knowledge of
    predicted protein structure, are an important
    component in identifying potential targets,
    thereby facilitating successful and competitive
    drug discovery.

27
Outcomes/ Benefits
  • Provides first pass information on the function
    of the putative protein based on the existence of
    conserved protein sequence motifs.
  • Advancements in computer software technologies
    (bioinformatics) have made comparative analysis of
    genomes an extremely powerful approach for
    functional genomics too.
  • These studies can also reveal insights into the
    recruitment of enzymes in a pathway.
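
The first benefit above, a first-pass functional hint from conserved sequence motifs, can be illustrated with a regular-expression scan. The sketch uses the PROSITE-style ATP/GTP-binding "P-loop" pattern [AG]-x(4)-G-K-[ST] as an example motif; the peptide is invented for the demonstration, and a hit is only a hint of function, not a confirmed annotation.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein, pattern=P_LOOP):
    """Return (start_index, matched_text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Invented peptide used only to exercise the function.
hits = find_motif("MSGAPGSGKSTLAR")
```

Scanning every predicted protein of a genome this way gives a quick, coarse list of candidates that more careful analysis can then confirm or reject.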

28
Outcomes/ Benefits
  • It will help us to understand the genetic
    basis of diversity in organisms, both speciation
    and variation, events that are important aspects
    of evolutionary biology.
  • Comparative genomics provides a powerful way
    in which to analyze sequence data.
  • Indeed, there is already a long list of
    'model' organisms, which allow comparative
    analyses in a variety of ways.

29
Outcomes/ Benefits
  • The very small vertebrate genome of the
    pufferfish provides a simple and economical way
    of comparing sequence data from mammals and fish,
    representing a large evolutionary divergence and
    so permitting the identification of essential
    elements that are still present in both species.
  • These elements include genes and the associated
    machinery that controls their expression,
    elements that, in many cases, have survived the
    test of time.