Title: Bioinformatics Ch1. Introduction
www.ntut.edu.tw/yukijuan/lectures/bioinfo/Oct17.ppt
Outline
- A scenario
- Life in space and time
- Dogmas central and peripheral
- Observables and data archives
Traditional and Current Biology
- Traditionally, biology has been an observational science.
- Now, biology has been converted into a deductive science.
The Data of Bioinformatics
- Very large in amount
  - Nucleotide sequence databanks contain 16 × 10^9 bases.
  - Full three-dimensional coordinates of proteins (average length 400 residues): 16000 entries.
- Not only are the individual databanks large, but their sizes are increasing at a very high rate.
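To put the 16 × 10^9 figure in perspective, here is a back-of-the-envelope sketch in Python. For illustration only, it assumes that each base is packed into 2 bits (A, C, G and T only). Real databanks store sequences as text together with annotations, so their actual sizes are larger. `pack_2bit` and `storage_bytes` are hypothetical helpers, not part of any databank's software.

```python
# Illustrative assumption: only A, C, G, T occur, so 2 bits per base suffice.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)

def storage_bytes(n_bases: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store n_bases at bits_per_base, rounded up to whole bytes."""
    return (n_bases * bits_per_base + 7) // 8

bases_in_databanks = 16 * 10**9                  # figure quoted on the slide
print(storage_bytes(bases_in_databanks))         # 4000000000  (2 bits per base)
print(storage_bytes(bases_in_databanks, 8))      # 16000000000 (one ASCII character per base)
```

Even with the most compact encoding, the raw sequences alone would need about 4 × 10^9 bytes.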
GenBank
Goals
- To see life clearly and see it whole
- To interrelate sequence, three-dimensional structure, interactions, and function of individual proteins, nucleic acids and protein-nucleic acid complexes
- To understand integrative aspects of the biology of organisms
Goals
- To deduce events in evolutionary history.
- To support application to medicine, agriculture
and other scientific fields.
A Scenario
Imagine a Crisis (1)
- A new biological virus creates an epidemic of fatal disease in humans or animals.
- Laboratory scientists will isolate its genetic material, a molecule of nucleic acid, and determine its sequence.
- Computer programs will then take over.
Imagine a Crisis (2)
- Screening this new genome against a databank of all known genetic messages
- Developing antiviral therapies: viruses contain protein molecules that are suitable targets for drugs that interfere with viral structure or function
Imagine a Crisis (3)
Viral DNA sequence → computer program → protein sequence
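At its simplest, the step from a viral DNA sequence to a protein sequence is translation using the genetic code. The Python sketch below makes several assumptions: it uses the standard genetic code, it expects a sequence already in the correct reading frame starting at its first base, and it writes 'X' for any codon containing an unrecognized base. A real pipeline would first have to locate the protein-coding regions. `translate` is a hypothetical helper; libraries such as Biopython offer a fuller implementation.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str, to_stop: bool = False) -> str:
    """Translate a coding sequence (frame starts at position 0) into one-letter amino acids.

    '*' marks a stop codon; 'X' marks a codon with an unrecognized base.
    If to_stop is True, translation ends at the first stop codon.
    """
    seq = dna.upper().replace("U", "T")             # accept RNA as well as DNA
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))                # MAIVMGR*KGAR*
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", to_stop=True))  # MAIVMGR
```

Building the table from two 64-character strings avoids typing out 64 dictionary entries by hand.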
Imagine a Crisis (4)
Amino acid sequence → computer program → three-dimensional structure
Homology Modelling
The databank will be screened for related proteins of known structure → computer program → the structure will be predicted
Ab initio
If no related protein of known structure is found → ab initio prediction of the structure
Design Therapeutic Agents
Knowing the viral protein structure → design therapeutic agents
Life in Space and Time
In Space
- Biosphere
- Ecosystem
- Darwinian selection or genetic drift
- The generation of variants:
  - Natural mutation
  - The recombination of genes in sexual reproduction
  - Direct gene transfer
In Space
- Ecosystem
- Species
- Cell
- Nuclei, organelles and cytoskeleton
- Molecules
In Time
- A history of life spanning 3.5 billion years
Dogmas Central and Peripheral
Central Dogma
- 1957, Crick: DNA is transcribed into RNA, and RNA is translated into protein.
- ??????????,????????
- ???RNA??RNA?DNA?RNA?Protein
- ????
- ????RNA??????
- ??????????,??RNA??????
- ??????????????????????,?????RNA ,????????????
22????RNA??????
23??????????,??RNA??????
First identified in plant virus
Purines and Pyrimidines
The Strands in the Double Helix Are Antiparallel
(Figure: the two strands, with ends labeled 3' and 5', running in opposite directions)
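The strands are antiparallel, and A pairs with T and G with C (each pair joins one purine to one pyrimidine). The partner strand, read 5' to 3', is therefore the reverse complement of the given strand. The minimal Python sketch below assumes input made only of A/C/G/T, in upper or lower case. `reverse_complement` and `is_duplex` are hypothetical names chosen for illustration.

```python
# Watson-Crick pairing: A<->T, G<->C (each pair joins a purine to a pyrimidine).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input."""
    # Complement each base, then reverse, because the strands run in opposite directions.
    return strand.translate(COMPLEMENT)[::-1]

def is_duplex(top: str, bottom: str) -> bool:
    """Check that two strands, both written 5'->3', pair along their full length."""
    top, bottom = top.upper(), bottom.upper()
    # Antiparallel: position i of top faces position i of the *reversed* bottom strand.
    return len(top) == len(bottom) and all(
        (a, b) in PAIRS for a, b in zip(top, reversed(bottom)))

print(reverse_complement("ATGC"))     # GCAT
print(is_duplex("AAGCTT", "AAGCTT"))  # True: this sequence is its own reverse complement
```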
Paradigm
DNA sequence determines protein sequence, which determines protein structure, which determines protein function.
Most of the organized activity of bioinformatics
has been focused on the analysis of the data
related to these processes
Observables and Data Archives
A Databank
- An archive of information
- A logical organization, or structure, of that information
- Tools to gain access to it
A Databank in Molecular Biology
- Archival databanks of biological information
  - DNA and protein sequences
  - Nucleic acid and protein structures
  - Databanks of protein expression
A Databank in Molecular Biology
- Derived Databanks
  - Sequence motifs
  - Mutations and variants in DNA and protein sequences
  - Classification and relationships
- Bibliographic Databanks
- Databanks of web sites
- Databanks of databanks containing biological information
- Links between databanks
The Mechanism of Access to a Databank
- The mechanism of access is the set of tools for answering questions such as:
  - Does the databank contain the information I require?
  - How can I assemble information from the databank in a useful form?
- Indices of databanks are useful in asking: Where can I find some specific piece of information?
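The role of an index can be sketched as an inverted index. It maps each keyword in the entry descriptions to the set of entries that contain it. A question of the form "where can I find ...?" then becomes a dictionary lookup rather than a scan of the whole databank. The entry identifiers and descriptions below are invented for illustration. `build_index` and `lookup` are hypothetical helpers, not the interface of any real databank.

```python
from collections import defaultdict

def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word of a description to the IDs of entries containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry_id, description in entries.items():
        for word in description.lower().split():
            index[word].add(entry_id)
    return index

def lookup(index: dict[str, set[str]], *words: str) -> set[str]:
    """Return the IDs of entries whose descriptions contain every query word."""
    hits = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*hits) if hits else set()

# Hypothetical entries: identifiers and descriptions invented for this example.
entries = {
    "ENTRY1": "viral capsid protein",
    "ENTRY2": "viral RNA polymerase",
    "ENTRY3": "human hemoglobin protein",
}
index = build_index(entries)
print(sorted(lookup(index, "viral")))             # ['ENTRY1', 'ENTRY2']
print(sorted(lookup(index, "viral", "protein")))  # ['ENTRY1']
```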
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (1)
Given a sequence or fragment of a sequence, find sequences in the database that are similar to it.
This is a central problem in bioinformatics.
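One way to make "similar" concrete is a global alignment score. The Python sketch below scores a query against every entry using the Needleman-Wunsch dynamic-programming recurrence, then ranks the hits. The scoring values (match +1, mismatch -1, gap -2) and the tiny database are illustrative assumptions. Practical searches use substitution matrices such as BLOSUM and fast heuristic tools such as BLAST or FASTA rather than full dynamic programming against every entry.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) alignment score of a and b, linear gap penalty."""
    # At the start of iteration i, prev[j] is the best score aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,               # align a[i-1] with b[j-1]
                          prev[j] + gap,      # a[i-1] against a gap
                          curr[j - 1] + gap)  # b[j-1] against a gap
        prev = curr
    return prev[-1]

def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Rank database entries by alignment score against the query, best first."""
    hits = [(name, nw_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

# Hypothetical toy database.
database = {"seq1": "GATTACA", "seq2": "GATCACA", "seq3": "TTTTTTT"}
for name, score in search("GATTACA", database):
    print(name, score)
```

Keeping only the previous row of the dynamic-programming table is enough when only the score, not the alignment itself, is needed.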
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (2)
Given a protein structure or fragment, find protein structures in the database that are similar to it.
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (3)
Given the sequence of a protein of unknown structure, find structures in the database that adopt similar three-dimensional structures.
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (3)
Rationale: if two proteins have sufficiently similar sequences, they will have similar structures.
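"Sufficiently similar" is often quantified as percent sequence identity over an alignment. The sketch below computes it for two already-aligned sequences, with '-' marking a gap. It counts only columns where neither sequence has a gap, which is one common convention among several. The 30% cutoff in `likely_similar_structure` is a hypothetical illustrative value, not a figure from these slides; useful cutoffs depend on alignment length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences of equal length ('-' = gap).

    Only columns where neither sequence has a gap are counted (one convention of several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)

def likely_similar_structure(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Hypothetical rule of thumb: identity at or above threshold counts as 'sufficiently similar'."""
    return percent_identity(aligned_a, aligned_b) >= threshold

print(percent_identity("MKV-LA", "MKVQIA"))  # 80.0
```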
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (4)
Given a protein structure, find sequences in the databank that correspond to similar structures.
A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics
- (1) and (2) are solved problems
- (3) and (4) are active fields of research
Curation, Annotation and Quality Control
- Older data were limited by older techniques.
- Amino acid sequences of proteins used to be determined by peptide sequencing.
- Now, almost all are translated from DNA sequences.
Curation, Annotation and Quality Control
- Distributed error-correction and annotation
- Dynamic error-correction and annotation