Organization of Biological Data - PowerPoint PPT Presentation

Slides: 27
Provided by: drpvan
Transcript and Presenter's Notes

1
Organization of Biological Data and Databases
Pramod Wangikar, Dept. of Chemical Engineering, IIT Bombay
2
ORGANIZATION OF BIOLOGICAL DATA
Gene i
Genomics
m-RNA i
Transcriptomics
Protein Sequence / Proteomics
Protein i
Function (Enzyme, hormone etc.)
3-D Structural Database
3
Primary Structure of Deoxyribonucleic Acid (DNA)
OR
pApCpGpTpTpG
OR
ACGTTG
4
The Basic Principle of Transcription
RNA Polymerase
5
Double stranded DNA
RNA
Nucleotides
5
The Code
  • 64 ways of writing the codon
  • 20 amino acids

[Figure: tRNAs carrying M (Met, anticodon uac) and F (Phe, anticodon gaa) paired with the adjacent mRNA codons aug and uuu]
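The counts on this slide (64 codons, 20 amino acids) can be checked with a short sketch. It assumes the standard genetic code, written here as a compact 64-character string with codons ordered UUU, UUC, UUA, UUG, UCU, and so on; the variable names are illustrative and not from the slides.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (assumed), one letter per codon in the order
# UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Enumerate all 4 x 4 x 4 triplets in the same order as the string above.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

n_codons = len(codon_table)
n_stop = sum(1 for aa in codon_table.values() if aa == "*")
n_amino_acids = len(set(codon_table.values()) - {"*"})

print(n_codons, n_stop, n_amino_acids)  # 64 3 20
```

Because 61 sense codons map onto 20 amino acids, most amino acids are encoded by more than one codon.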
6
The Flow of Genetic Information
Sequence same as RNA
5' ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG 3'
3' TGACGTGGTACCCCGAGTCGCTGCCCCTTACCGTGAACCAC 5'
DNA
Sequence complementary to RNA
mRNA
5' ACUGCACCAUGGGGCUCAGCGACGGGGAAUGGCACUUGGUG 3'
Initiation signal (AUG), followed by codons
Protein
Met-Gly-Leu-Ser-Asp-Gly-Glu-Trp-His-Leu-Val
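The DNA to mRNA to protein flow on this slide can be reproduced with a minimal sketch. It assumes the standard genetic code and that translation starts at the first AUG; the function names are illustrative only. Note that the codon GAA encodes Glu under the standard code.

```python
BASES = "UCAG"
# Standard genetic code (assumed), codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def transcribe(coding_strand):
    """The mRNA has the same sequence as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(THREE_LETTER[aa])
    return peptide

coding = "ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG"
template = reverse_complement(coding)  # the slide shows this strand read 3' to 5'
mrna = transcribe(coding)
protein = "-".join(translate(mrna))
print(mrna)
print(protein)  # Met-Gly-Leu-Ser-Asp-Gly-Glu-Trp-His-Leu-Val
```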
7
Memory Requirements for Storing Genomes
2-bit encoding: a = 00, c = 01, g = 10, t = 11
Prokaryotic genomes: 0.5-7.0 Mbp
Eukaryotic genomes: 10 Mbp - 1000 Gbp
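A rough sketch of what the 2-bit code implies for storage, assuming four bases per byte and no compression or annotation overhead. The helper names pack and genome_bytes are hypothetical, chosen for this illustration.

```python
# 2-bit code from the slide: a = 00, c = 01, g = 10, t = 11
CODE = {"a": 0b00, "c": 0b01, "g": 0b10, "t": 0b11}

def pack(seq):
    """Pack a DNA string into bytes, 2 bits per base (padding bits lead)."""
    bits = 0
    for base in seq.lower():
        bits = (bits << 2) | CODE[base]
    return bits.to_bytes((2 * len(seq) + 7) // 8, "big")

def genome_bytes(base_pairs):
    """Bytes needed for one strand at 4 bases per byte, rounded up."""
    return (base_pairs + 3) // 4

print(pack("ACGTTG").hex())  # 01be  (00 01 10 11 11 10)

# Genome size ranges quoted on the slide.
for label, bp in [
    ("smallest prokaryote (0.5 Mbp)", 500_000),
    ("largest prokaryote (7.0 Mbp)", 7_000_000),
    ("smallest eukaryote (10 Mbp)", 10_000_000),
    ("largest eukaryote (1000 Gbp)", 1_000_000_000_000),
]:
    print(label, genome_bytes(bp), "bytes")
```

Under these assumptions even the largest prokaryotic genome fits in under 2 MB, while the upper end of the eukaryotic range needs about 250 GB.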
8
(No Transcript)
9
How Much Data Does a Bacterium (E. coli) Contain?
10
E. coli and Data size
Numbers are approximate. The data size increases by roughly
three orders of magnitude for the human system.
11
Minimal Life: Self-assembly, Catalysis,
Replication, Mutation, Selection
Environment
Cell Boundary
Monomers
RNA
Growth rate
12
Maximal Life: Self-assembly, Catalysis,
Replication, Mutation, Selection, Regulatory and
Metabolic Networks
Environment
Metabolites
Interactions
RNA
DNA
Protein
Growth rate
Expression
stem cells cancer cells microbes
13
Regulation: More Biological Data
What is regulation? A catalogue of possible
scenarios and the respective courses of action.
  • The information for regulation can be stored in
    the form of:
  • Protein-protein interaction
  • Protein-DNA interaction
  • Protein-metabolite interaction
  • Molecular switches, controls, set-points, etc.

Genome + Environment: Input file
Biological Machinery: Executable program
Observations: Output file
Can we crack the executable program?
14
Some useful regulatory signals on Genes
Upstream activating sequences (UAS)
m-RNA expression start end
TATA box
DNA
x
x
mRNA
Ribosomal binding site
protein
Protein synthesis stops
Protein synthesis starts
15
Minimal Gene Complement of Mycoplasma genitalium
16
DESCRIPTION OF A LIVING CELL / VIRUS
Genome / Genomics
General Capability of the Cell
Readiness of the Cell
Transcriptomics
Proteomics / Protein Map
Physiological state of the cell
17
Paradigm Shift in the Bioinformatics Age
Conventional Path
Structure
Gene
Function
  • Bioinformatics Age

Functional Genomics
Gene sequence
Structure of Protein
Function
Protein Map 2D-PAGE, pI, mol. wt.
Proteomics
18
Possible Relationships Between Databases
Genome Sequence
Protein Sequence
Proteomics
Transcriptomics Expression Profile
Protein Structure
Protein Profile
Protein-DNA interactions
Protein-Protein Interaction
Protein Function
Metabolome
Phenotype
19
Combinatorial Problems in Biology
  • Prediction of ORFs (gene finding)
  • Prediction of DNA regulatory sites
  • DNA regulatory Proteins
  • Protein-Protein interactions
  • Protein Function
  • Prediction of Metabolic capability
  • Prediction of Genetic Regulatory Circuits
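As an illustration of the first item, a naive ORF scan can be sketched as follows. This is a deliberately simplified assumption of how gene finding starts (ATG to the next in-frame stop codon, forward strand only, three reading frames); real gene finders also use the reverse strand, codon statistics, and regulatory signals. The function name and min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts codons before the stop (including ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning this frame
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 2)]
```

The combinatorial flavour comes from scale: every position in every frame of a multi-megabase genome is a candidate, and most candidates are not genes.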

20
Biological Databases
  • Raw databases
  • Processed databases
  • Querying in databases.

21
Raw Databases
Conventional Ones
DNA / Gene / Genome Sequence Databases: EMBL,
GenBank, GSDB, etc. > 10^6 genes, doubles every 18
months.
Genome Projects: E. coli, plants, human, mouse, etc.
Protein Sequence Databases: PIR, SwissProt,
GenBank, etc. > 10^5 protein sequences, doubles
every 21 months.
Three-Dimensional Structure Database: Brookhaven
Protein Databank (PDB). > 20,000 structures,
doubles every 24 months.
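The doubling times quoted above imply exponential growth, N(t) = N0 * 2^(t / T). The sketch below only does that arithmetic; the projections are illustrative extrapolations that assume the doubling time stays constant, not measured database sizes, and the function name is hypothetical.

```python
def projected_size(current, doubling_months, elapsed_months):
    """Size after elapsed_months, assuming a constant doubling time."""
    return current * 2 ** (elapsed_months / doubling_months)

# Starting values and doubling times as quoted on the slide.
databases = {
    "gene sequences": (1e6, 18),
    "protein sequences": (1e5, 21),
    "3-D structures": (2e4, 24),
}

for name, (size, doubling) in databases.items():
    # Hypothetical projection six years (72 months) ahead.
    print(name, round(projected_size(size, doubling, 72)))
```

With an 18-month doubling time, 72 months is four doublings, so the sequence count grows 16-fold, while the structure database with its 24-month doubling time grows 8-fold over the same period.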
22
Proteomics Database (SwissProt)
  • Each Protein Identified by pI, mol wt., mass
    spectra, microsequencing, peptide mass
    fingerprint, etc.
  • Entries for E.coli, yeast, human etc.

Hoogland et al, Nucl. Acids Res. (2000) 28, 286
23
Cluster of Orthologous Groups (COG) of Proteins
A Processed Database
  • Compares genes from different genomes.
  • Forms clusters with similar sequences.
  • Each COG contains genes connected through
    vertical evolutionary descent.
  • 30 genomes (68,571 genes), 2,791 COGs with 45,350
    genes
  • Assignment of function for genes based on known
    functions for some members of the cluster.
  • Highly useful for functional assignments for
    newly sequenced genomes.

24
EcoCyc Database: Encyclopedia of E. coli Genes
and Metabolism
4300 genes, 695 enzymes, 595 reactions, 123
pathways
Blue: E. coli only. Green: both E. coli
and H. influenzae.
Karp et al, Nucl. Acids Res. (1998) 26, 50
25
Querying in Databases
  • Based on sequence similarity: returns similar
    sequences together with a similarity score or
    expectation value.
  • Normally a BLAST or FASTA search (local alignment).
    Can also look for a sequence motif.
  • Gene names, biological source, functional
    category, cellular location / role.
  • Structural features (for known 3-D structures).
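To make the idea of a local-alignment similarity score concrete, here is a minimal Smith-Waterman sketch. This is the exact dynamic-programming method, not BLAST or FASTA themselves, which are faster heuristics for the same task; the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

query = "TTACGTTGCC"
print(smith_waterman(query, "GGACGTTGAA"))  # 12: shared region ACGTTG
print(smith_waterman(query, "AAAAAAAAAA"))  # 2: a single matching base
```

A database search ranks entries by such a score; the expectation value then estimates how many hits with that score would be expected by chance in a database of that size.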

26
Bioinformatics: A Multidisciplinary Effort Is
Required
  • Generation of biological data
  • Storage and Retrieval of Data
  • Conversion of known biological hypotheses into
    mathematical/statistical models
  • Building models from data
  • Fitting new data to existing models.
  • Searching for patterns in data
  • Deriving new biological knowledge from data