Title: Applications of Bioinformatics
1Applications of Bioinformatics
2Systems Biology
- The Human Genome Project led to a new view of biology
- Genes are no longer investigated one at a time
- Systems biology investigates the behavior and relationships of all the elements in a particular biological system
- Integration/Display/Modeling/Simulation
3Discovery Science
- Complete characterization of genes and proteins in human and model organisms
- Information science
- High-throughput perturbation and monitoring of biological systems
- Simulation with computational methods
- In contrast to hypothesis-driven science
4Genomic Sequence
- Genes, transcription regulatory elements, motifs, functional domains
- Comparative genomics
- Polymorphism (SNP)
- Model organisms provide hints for deciphering the human genome
6Biological Science Is Full of Information
- DNA → mRNA → protein → protein interactions → information pathways → information networks → cells → tissues → organism → populations → ecologies
- Other molecules (metabolites)
- Driven by genes and their interactions with the environment
7Biological Information
- Multiple hierarchical levels of organization
- Processed in complex networks
- Biological networks are robust and tolerant of small perturbations
- Key components have profound effects and offer targets for understanding/manipulating the system
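The contrast between small perturbations and key components can be illustrated with a toy graph. The sketch below is only an illustration: the network, the node names, and the choice of "largest connected component" as a robustness measure are all assumptions, not something taken from the slides.

```python
from collections import deque

def largest_component_size(edges, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes.

    Nodes left without any edge are ignored, which is enough for this toy case.
    """
    adjacency = {}
    for a, b in edges:
        if a in removed or b in removed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one component.
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        best = max(best, size)
    return best

# Hypothetical network: one hub "H" with five partners, plus one peripheral link.
TOY_EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("H", "E"), ("E", "F")]

intact = largest_component_size(TOY_EDGES)
without_leaf = largest_component_size(TOY_EDGES, removed={"A"})
without_hub = largest_component_size(TOY_EDGES, removed={"H"})
print(intact, without_leaf, without_hub)
```

In this toy case, deleting a peripheral node barely changes the network, while deleting the hub fragments it.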
8Yeast genetic perturbation with YAC
9Applications of YAC
- Gene knockout
- Promoter fusion
- Protein fusion
- Epitope tags
10Mammalian genetic perturbation with RNA interference
11High-Throughput Tools
- DNA sequencing
- Microarray
- Protein Chip
- Proteomics
- Each tool matures through the following stages
- Proof of principle
- Creation of a reliable instrument
- Development of automated procedures
12DNA sequencing
13Microarray
14Microarray
- cDNA array
- Double-stranded cDNA or PCR products
- Oligonucleotide array
- More specific than cDNA array
- Possible to distinguish single-nucleotide differences (SNP)
- Not as mature as sequencing
15High Density Protein Chip
- Dimension 1 in. x 3 in.
- More than 4000 proteins
- More than 7000 proteins by the end of the year
17Proteomics
18Information revealed in Proteomics
- Identity
- Abundance
- Processing
- Modification
- Interaction
- Localization
- Turnover rate
19Identification of proteins
- Mass spectrometry
- Quantitative
- Determine protein sequence
- Detect differential protein expression in different cell types
20Mass Spectrometry
21Mass Spectrometry
22Cell Sorter
23Computational Databases
- Protein-protein interaction
- DIP, BIND, MIPS, MINT, IntAct, POINT
- Protein-DNA interaction
- TRANSFAC, SCPD
- Metabolic pathways
- KEGG, EcoCyc, WIT, Reactome
- Gene Expression
- GEO, GNF, NCI60, commercial
- Gene Ontology
24Protein-protein interaction
25Gene Regulatory Network
26Metabolic Pathways
29Gene Ontology
- The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism
- Annotations
- Molecular Function
- Cellular Component
- Biological Process
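One way to picture an annotation is as a record that ties a gene to a vocabulary term in one of the three aspects. The sketch below is a minimal illustration, not the Gene Ontology project's own data format: the class, the function, and the gene name `geneA` are hypothetical, and the pairing of that gene with these terms is made up for the example.

```python
from dataclasses import dataclass

ASPECTS = ("molecular_function", "cellular_component", "biological_process")

@dataclass(frozen=True)
class GOAnnotation:
    gene: str        # hypothetical gene identifier
    term_id: str     # GO accession
    term_name: str   # controlled-vocabulary label
    aspect: str      # one of ASPECTS

    def __post_init__(self):
        # Reject anything outside the three GO aspects.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect}")

def group_by_aspect(annotations):
    """Collect term names per aspect."""
    grouped = {aspect: [] for aspect in ASPECTS}
    for ann in annotations:
        grouped[ann.aspect].append(ann.term_name)
    return grouped

# Illustrative annotations for a made-up transcription factor.
annotations = [
    GOAnnotation("geneA", "GO:0003677", "DNA binding", "molecular_function"),
    GOAnnotation("geneA", "GO:0005634", "nucleus", "cellular_component"),
    GOAnnotation("geneA", "GO:0006355",
                 "regulation of DNA-templated transcription", "biological_process"),
]
print(group_by_aspect(annotations))
```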
32Challenges of Databases
- Provide information beyond simple entries (e.g. PPI with functional annotation or binding strength)
- Data maintenance and updates
- Integration with other databases
33Importance of Global Analysis
- Using gene-expression data to identify genes involved in cancer, development, aging, cellular responses
- Clustering
- Other analysis
- Regulatory elements
- Protein-protein interaction
- Phylogenetic profiles
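As a rough idea of what clustering expression data means, the sketch below groups genes whose profiles across conditions are strongly correlated. It assumes Pearson correlation with a fixed threshold and single-linkage grouping; the slides do not name a specific method, and the profiles are made-up numbers.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def cluster_by_correlation(profiles, threshold=0.9):
    """Single-linkage grouping: genes correlated above `threshold` share a cluster."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical expression levels over four conditions.
TOY_PROFILES = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.2, 6.0, 4.1, 2.0],
}
clusters = cluster_by_correlation(TOY_PROFILES)
print(clusters)
```

Genes that rise together and genes that fall together end up in separate groups, which is the kind of pattern used to suggest shared regulation.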
36Importance of Computer Models
- Interactions in the cell are too complex to handle with pen and paper
- With high-throughput tools, biology shifts from descriptive to predictive
- Computers are required to store, process, assemble, and model all high-throughput data into networks
37Tools for Simulation
- E-cell
- Cell Illustrator
- Virtual Cell
- Standardizing efforts
- BioJake
- SBML (systems biology markup language)
- Facilitate the exchange of models
38E-Cell System
- Software for constructing object models equivalent to a cell system or a part of the cell system
- Employs the Structured Variable-Process model (previously called the Substance-Reactor model, or SRM)
- Objects
- Variables, Processes, Systems
39Cell Illustrator
40Types of Computer Models
- Chemical Kinetic Model
- Defined by concentrations of different molecular species in the cell
- Represented with a number of equations
- Some processes may be stochastic
- Simplified Discrete Circuit
- Network with nodes and arrows
- Nodes represent quantity or other attributes
- Directed edges represent the effect of nodes on other nodes
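The two model types can be contrasted in a few lines. The sketch below is a toy under stated assumptions: the kinetic part is a two-equation mRNA/protein model integrated with forward Euler, and the discrete part is a three-node Boolean circuit with synchronous updates. The rate constants, wiring, and function names are hypothetical and do not describe any real pathway.

```python
def simulate_expression(k_tx, d_m, k_tl, d_p, t_end=400.0, dt=0.01):
    """Chemical kinetic model, integrated with forward Euler:

        dm/dt = k_tx - d_m * m      (transcription, mRNA decay)
        dp/dt = k_tl * m - d_p * p  (translation, protein decay)
    """
    m, p = 0.0, 0.0
    for _ in range(round(t_end / dt)):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
    return m, p

def step_circuit(state):
    """One synchronous update of a hypothetical three-node Boolean circuit."""
    return {
        "A": not state["C"],              # C represses A
        "B": state["A"],                  # A activates B
        "C": state["A"] and state["B"],   # A and B together activate C
    }

# Kinetic model: concentrations approach the steady state k_tx/d_m and k_tl*m/d_p.
m, p = simulate_expression(k_tx=2.0, d_m=0.1, k_tl=0.5, d_p=0.05)
print(round(m, 3), round(p, 3))

# Discrete circuit: follow the on/off states for a few updates.
state = {"A": True, "B": False, "C": False}
for _ in range(5):
    state = step_circuit(state)
    print(state)
```

The kinetic model tracks quantities over continuous time, while the circuit only records which nodes are on or off and how they influence one another.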
41Different Mathematical Formulations
- Differential Equations
- Linear (ordinary)
- Partial
- Stochastic
- S-Systems
- Power-law formulation
- Captures complicated dynamics
- Parameter estimation is computationally intensive
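For reference, the power-law formulation of an S-system is usually written as below. The symbols are the conventional ones and are not defined in the slides: X_i are the state variables (e.g. concentrations), alpha_i and beta_i are non-negative rate constants for production and degradation, and g_ij and h_ij are real-valued kinetic orders.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}},
\qquad i = 1, \dots, n
```

Each variable contributes 2 rate constants and 2n kinetic orders, which helps explain why parameter estimation becomes expensive as n grows.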
42Model details
- Selection of genes, gene products, and other molecules to be included
- Cellular compartments: nucleus, Golgi, or other organelles
- Too much detail may introduce more noise
- A minimal model able to predict system properties (mRNA level, growth rate, etc.) is sufficient
43Construct Model from Global Patterns
- Microarray gene expression patterns: up-regulated/down-regulated
- Gene expression profiles under different conditions: tumor/normal, cell cycle, drug treatment
- Methods
- Bayesian inference
- Machine learning (clustering, classification)
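Before any of the listed methods are applied, expression values are typically reduced to up-regulated/down-regulated calls. The sketch below shows that simpler preprocessing step, not Bayesian inference or machine learning. It assumes a log2 ratio with a twofold cutoff; the cutoff, the function name, and the intensity values are illustrative assumptions.

```python
from math import log2

def call_regulation(treated, control, threshold=1.0):
    """Label each gene by its log2(treated/control) ratio.

    threshold=1.0 corresponds to a twofold change (an assumed cutoff).
    """
    calls = {}
    for gene, treated_value in treated.items():
        ratio = log2(treated_value / control[gene])
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical intensities for three genes in tumor and normal samples.
TUMOR = {"g1": 800.0, "g2": 100.0, "g3": 210.0}
NORMAL = {"g1": 200.0, "g2": 400.0, "g3": 200.0}

calls = call_regulation(TUMOR, NORMAL)
print(calls)
```

The resulting labels, collected over many conditions, form the global patterns from which a model can then be inferred.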
44Framework for Systems Biology
45Steps in the Systems Biology Framework
- Define all components of the system
- Systematically perturb and monitor system components
- Refine models to reflect experimental observations as closely as possible
- Design and perform new perturbation experiments
46Examples
- Galactose system
- Sea urchin cis-regulatory network
51Summary
- High throughput experimental data
- High throughput perturbation
- Data integration