Title: Bioinformatics
1Bioinformatics
Predrag Radivojac Indiana University
2Basics of Molecular Biology
Eukaryotic cell
- Can we understand how cells function?
3Bioinformatics is multidisciplinary!
- What is Bioinformatics?
- Integrates computer science, statistics, chemistry, physics, and molecular biology
- Goal: organize and store huge amounts of biological data and extract knowledge from it
- Major areas of research
- Genomics
- Proteomics
- Databases
- Practical discipline
Some major applications:
- Drug design
- Evolutionary studies
- Genome characterization
4Interesting Problems
5Interesting Problems
6Interesting Problems
Goal: solve the puzzle, i.e., connect the pieces into one genomic sequence
7Interesting Problems
Mass spectrometry
8Interesting Problems
9Interesting Problems
10Diseases are interconnected
Goh et al. PNAS 104: 8685 (2007).
11Disease
- Development of tools that can be used to understand and treat human disease
- Prediction of disease-associated genes
- Important from
- biological standpoint
- medical standpoint
- computational standpoint
- Background
- human genome
- low-throughput data
- high-throughput data
- ontologies for protein function at multiple levels
The Time is Right!
www.cancer.gov
12Alzheimer's disease
Top PhenoPred hits: 1) CDK5, 2) NTN1
AUC: 77.5%
13Loss/Gain of function and disease
E6V
4hhb
2hbs
Sickle Cell Disease
- Autosomal recessive disorder
- E6V in HBB causes interaction with F85 and L88
- Formation of amyloid fibrils
- Abnormally shaped red blood cells, leading to sickle cell anemia
- Manifestation of the disease varies widely across patients
Pauling et al. Science 110: 543 (1949).
Chui & Dover. Curr Opin Pediatr 13: 22 (2001).
http://gingi.uchicago.edu/hbs2.html
14Lipitor (ATORVASTATIN)
E6V
15Proteins: chains of amino acids
- biomolecule, macromolecule
- more than 50% of the dry weight of cells is proteins
- polymer of amino acids connected into linear chains
- strings of symbols
- machinery of life
- play a central role in the structure and function of cells
- regulate and execute many biological functions
a) amino acid b) amino acid chain
Introduction to Protein Structure by Branden and
Tooze
16Protein structure
- peptide bonds are planar and strong
- by rotating about the backbone at each amino acid, proteins adopt their structure
Introduction to Protein Structure by Branden and
Tooze
17Protein function
- Multi-level phenomenon
- biochemical function
- biological function
- phenotypical function
- Example: kinase
- biochemical function: transferase
- biological function: cell cycle regulation
- phenotypical function: disease
- Function is everything that happens to or through a protein (Rost et al. 2003)
18Protein contact graph
Cα-Cα distance < 6 Å
- Myoglobin, 1.4 Å X-ray structure, PDB 2jho, 153 residues
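A minimal sketch of how such a contact graph could be built: one vertex per residue, with an edge whenever two Cα atoms lie within the distance cutoff. The function name `contact_graph` and the toy coordinates are hypothetical illustrations; they are not taken from the slides or from PDB entry 2jho.

```python
import math

def contact_graph(ca_coords, cutoff=6.0):
    """Return adjacency sets: residues i and j are linked when the
    distance between their C-alpha atoms is below `cutoff` (angstroms)."""
    n = len(ca_coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# Toy C-alpha coordinates (hypothetical, in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
graph = contact_graph(coords)
print(graph)
```

The pairwise loop is quadratic in the number of residues, which is fine for a protein of a few hundred residues; a spatial index would be a reasonable swap for larger structures.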
19Protein contact graph
20Protein contact graph
21Residue neighborhood
Notation
S113 of isocitrate dehydrogenase
G = (V, E)
f: V → A, A = {A, C, D, ..., W, Y}
g: V → {-1, +1}
22S
Graphlets are small non-isomorphic connected graphs. Different positions of the pivot vertex
with respect to the graphlet correspond to the graph-theoretical concept of automorphism orbits,
or orbits for short.
Pržulj et al. Bioinformatics 20: 3508 (2004).
23Results
24Key insight: efficient combinatorial enumeration of graphlets / orbits over 7 disjoint cases (breadth-first search)
- 2-graphlets: 01
- 3-graphlets: 011, 012
- 4-graphlets: 0111, 0112, 0122, 0123
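One way to read the seven strings above, offered as an assumption rather than the authors' stated definition, is as sorted breadth-first distances from the pivot to each vertex of a connected graphlet. Such a sequence starts at 0, never decreases, and never skips a distance level. The sketch below generates all sequences with that property; `distance_signatures` is a hypothetical helper name.

```python
def distance_signatures(n):
    """Sorted BFS-distance strings for n vertices: the pivot is at distance 0
    and each further vertex repeats the last level or opens the next one."""
    sigs = ["0"]
    for _ in range(n - 1):
        extended = []
        for s in sigs:
            last = int(s[-1])
            for d in (last, last + 1):
                if d >= 1:  # only the pivot itself sits at distance 0
                    extended.append(s + str(d))
        sigs = extended
    return sigs

cases = {n: distance_signatures(n) for n in (2, 3, 4)}
print(cases)
print(sum(len(v) for v in cases.values()))
```

Under this reading the generator reproduces the 1 + 2 + 4 = 7 disjoint cases listed on the slide.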
2502
01
01 A o2 A2 o5, o6, o11 A3 o3, o4
A = {0, 1}: 00, 01 = 10, 11 (3)
A = {0, 1, 2}: 00, 11, 22, 01 = 10, 02 = 20, 12 = 21 (6)
binomial (multinomial) coefficients
|A| = 20: dimensionality 1,062,420
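The dimensionality can be checked with a short calculation. The sketch assumes that the pivot label is free and that each group of vertices interchangeable under an automorphism fixing the pivot is labeled as a multiset, which is where the binomial coefficients come in. The symmetry table is my own enumeration of rooted graphlets on up to 4 vertices, not something given on the slide. Including the single-vertex case is likewise an assumption, chosen because it reproduces the stated total.

```python
from math import comb

def multiset(n, k):
    """Number of size-k multisets over n labels: C(n + k - 1, k)."""
    return comb(n + k - 1, k)

# For each orbit: sizes of the groups of interchangeable non-pivot vertices.
ORBIT_SYMMETRY = [
    (),         # single vertex (pivot only)
    (1,),       # edge
    (1, 1),     # 3-path, pivot at an end
    (2,),       # 3-path, pivot in the middle
    (2,),       # triangle
    (1, 1, 1),  # 4-path, pivot at an end
    (1, 1, 1),  # 4-path, pivot at an interior vertex
    (1, 2),     # star, pivot at a leaf
    (3,),       # star, pivot at the center
    (2, 1),     # 4-cycle
    (1, 2),     # tailed triangle, pivot at the tail
    (1, 1, 1),  # tailed triangle, pivot on the triangle, away from the tail
    (1, 2),     # tailed triangle, pivot where the tail attaches
    (2, 1),     # diamond, pivot at a degree-2 vertex
    (1, 2),     # diamond, pivot at a degree-3 vertex
    (3,),       # 4-clique
]

def labeled_orbit_count(alphabet_size):
    total = 0
    for groups in ORBIT_SYMMETRY:
        ways = alphabet_size  # label of the pivot
        for size in groups:
            ways *= multiset(alphabet_size, size)
        total += ways
    return total

print(multiset(2, 2), multiset(3, 2))  # 3 6
print(labeled_orbit_count(20))         # 1062420
```

The first printed pair matches the (3) and (6) counts for the two small alphabets, and the total for 20 amino acid labels matches the 1,062,420 quoted above.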
26Graphlet kernel
- Inner product between vectors of counts of labeled orbits
- K(x, y) = ⟨φ(x), φ(y)⟩ = Σ_i φ_i(x) φ_i(y), where φ_i(x) is the number of times labeled orbit i occurs in the graph
- K is a kernel because matrices of inner products are symmetric and positive semidefinite (proof due to David Haussler).