ESI 2004 OPTIMIZATION AND DATA MINING

1
ESI 2004 OPTIMIZATION AND DATA MINING
  • CLASSIFICATION OF FOLDING TYPES IN PROTEINS USING MILP
  • FADIME ÜNEY
  • (funey_at_ku.edu.tr)
  • Supervised by METIN TÜRKAY
  • (mturkay_at_ku.edu.tr)
  • KOÇ UNIVERSITY, ISTANBUL

2
AGENDA
  • Proteins
  • Structures of proteins
  • Classification of folding types
  • Folding type prediction
  • Propositional Logic
  • Model (MILP Formulation)
    • Parameters
    • Variables
    • Objective function
    • Constraints
  • Illustrative Examples
  • Training Set Results
  • Future Research

3
PROTEINS
  • Found in the bones, muscles, skin and hair of organisms
  • Used in the structure of cells
  • Required for the proper functioning and regulation of
    organisms, e.g., as enzymes, hormones and antibodies
  • Amino acids → PROTEINS
  • Molecules of life

4
CHEMICAL STRUCTURE OF AMINO ACIDS
  • Distinguishing feature: different R groups (side chains)
[Figure: general structure of an amino acid, labeled amino group, carboxyl group and side chain]
5
CLASSIFICATION OF AMINO ACIDS
6
PEPTIDE BOND
Repeating units: N-Cα-C(=O)-N-Cα-C(=O)
7
PROTEINS
  • Since part of each amino acid (a water molecule) is lost during
    dehydration synthesis, the units of a protein are called amino acid residues
  • A typical protein has 200-300 residues; some have up to 27,000
  • Residue content and order are unique to each protein
  • The sequence and types of side chains determine the 3D shape and
    the chemical and biological functions

8
STRUCTURES OF PROTEINS
9
PRIMARY STRUCTURE
  • Sequence of amino acids, e.g., A C M V I I C E V
  • No information on the arrangement of peptide bonds
  • No information on the angles between chemical bonds
  • No information on interactions between parts of residues
  • Amino acid content and order dictate the shape of the protein
    molecule and its spatial and biochemical properties

10
SECONDARY STRUCTURE
  • Local spatial arrangement of a polypeptide's main-chain atoms
  • Without regard to the conformation of its side chains
  • Without regard to its relationship with other segments
  • Types of secondary structures
    • α-helices
    • β-sheets
    • Loops, turns and coils

11
CLASSIFICATION OF FOLDING TYPES IN PROTEINS
  • ALL-ALPHA (α): α-helices ≥ 40% and β-sheets ≤ 5%
  • ALL-BETA (β): α-helices ≤ 5% and β-sheets ≥ 40%
  • ALPHABETA (αβ): α-helices ≥ 15% and β-sheets ≥ 15%, with more than 60% antiparallel β-sheets
  • ALPHA/BETA (α/β): α-helices ≥ 15% and β-sheets ≥ 15%, with more than 60% parallel β-sheets
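The thresholds above can be read as a small rule set. A minimal sketch, assuming the values are percentages of residues, that the dominant element uses ≥ and the minor one ≤, and that a hypothetical input `parallel_frac` (fraction of β-sheet content that is parallel) separates the two mixed classes:

```python
from typing import Optional


def classify_fold(alpha_pct: float, beta_pct: float,
                  parallel_frac: Optional[float] = None) -> Optional[str]:
    """Assign a folding type from secondary-structure content.

    alpha_pct, beta_pct: percentage of residues in alpha-helices / beta-sheets.
    parallel_frac: hypothetical input, the fraction (0..1) of beta-sheet
    content that is parallel; only used to separate the two mixed classes.
    Returns None when no rule applies.
    """
    if alpha_pct >= 40 and beta_pct <= 5:
        return "ALL-ALPHA"
    if alpha_pct <= 5 and beta_pct >= 40:
        return "ALL-BETA"
    if alpha_pct >= 15 and beta_pct >= 15 and parallel_frac is not None:
        if parallel_frac > 0.6:
            return "ALPHA/BETA"   # more than 60% parallel beta-sheets
        if 1 - parallel_frac > 0.6:
            return "ALPHABETA"    # more than 60% antiparallel beta-sheets
    return None
```

Proteins that match no rule (e.g., 10% α and 10% β) return None, i.e., they fall outside the four classes.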
12
FOLDING TYPE PREDICTION
  • Functions of proteins → study of fundamental
    biological processes
  • Genetic engineering
  • In humans → DESIGN OF DRUGS
  • Experimental methods → slow, require large
    amounts of resources
  • A focal research subject in computational biology
    and bioinformatics

13
FOLDING TYPE PREDICTION
  • The folding type of a protein depends on its amino acid
    composition (Nakashima et al., 1986)
  • Several methods studied (reported accuracy)
  • Chou, 1995 (component-coupled, 95.3%)
  • Bahar et al., 1997 (singular value decomposition, 81%)
  • Cai & Zhou, 200 (neural network, 89.2%)
  • Cai et al., 2001 (support vector machines, 93.2%)
  • Reported accuracies depend on the properties of the training and test sets

14
FOLDING TYPE PREDICTION
[Figure: comparison of SVM and MIP classification]
15
PROPOSITIONAL LOGIC
  • Express relationships among Boolean variables
  • Boolean variables (True or False)
  • Operators
  • OR
  • AND
  • IMPLICATION
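In MILP models, propositions built from these operators are usually converted into linear inequalities on 0-1 variables. The sketch below uses the standard textbook translations (not necessarily the exact constraints of this model) and checks them against the truth tables by enumeration:

```python
from itertools import product

# Each Boolean variable Y is represented by a binary variable y in {0, 1}
# (True -> 1, False -> 0). Each operator maps to a linear inequality that
# is satisfied exactly when the proposition is True.
RULES = {
    "OR":          (lambda a, b: a or b,       lambda y1, y2: y1 + y2 >= 1),
    "AND":         (lambda a, b: a and b,      lambda y1, y2: y1 >= 1 and y2 >= 1),
    "IMPLICATION": (lambda a, b: (not a) or b, lambda y1, y2: y1 <= y2),
}


def check_translation(name: str) -> bool:
    """True if the linear form agrees with the truth table on all inputs."""
    logic, linear = RULES[name]
    return all(
        bool(logic(bool(y1), bool(y2))) == linear(y1, y2)
        for y1, y2 in product((0, 1), repeat=2)
    )


for name in RULES:
    print(name, check_translation(name))
```

For example, Y1 ⇒ Y2 becomes y1 ≤ y2, which rules out only the assignment y1 = 1, y2 = 0.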

16
MIXED-INTEGER LINEAR PROGRAMMING (MILP) FORMULATION
  • Indices
    • i: protein
    • j: chain of the protein (A, B, C, ...)
    • k: folding type of the protein (α, β, αβ, α/β)
    • l: box that encloses a number of data points
      belonging to a type (1, 2, ..., L)
    • m: amino acid (1, 2, ..., 20)
    • n: bound (lower, upper)

17
MILP FORMULATION (cont.)
  • Parameters
    • comp_ijm: composition of amino acid m in subunit j of protein i
    • foldtype_ijk: folding type k of subunit j of protein i
    • compU: a sufficiently large parameter
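The slides do not state how comp_ijm is computed. A common choice in composition-based fold prediction is the percentage of each of the 20 amino acids in the chain; a minimal sketch under that assumption, applied to the example sequence from the PRIMARY STRUCTURE slide:

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (index m = 1..20).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict:
    """Percentage of each standard amino acid in one chain (subunit)."""
    seq = sequence.replace(" ", "").upper()
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)  # ignore non-standard letters
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


comp = composition("A C M V I I C E V")
print(round(comp["C"], 2), comp["W"])
```

Each subunit then becomes a point in a 20-dimensional composition space, which is where the boxes of the model live.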

18
MILP FORMULATION (cont.)
  • Variables (binary)
    • YB_l: existence of box l
    • YBC_lk: assignment of folding type k to box l
    • YPB_ijl: assignment of subunit j of protein i to box l
    • YPC_ijk: assignment of subunit j of protein i to class k
    • YL_lsm: lower bound of box s lies between the bounds of box l for amino acid m
    • YU_lsm: upper bound of box s lies between the bounds of box l for amino acid m
    • YC_ls: intersection of box l and box s
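The YL, YU and YC variables encode a geometric test. As an illustration only (plain Python, not the MILP constraints themselves), a sketch of that test, assuming each box is stored as a list of (lower, upper) pairs, one per amino acid; the helper names are hypothetical:

```python
from typing import List, Tuple

Box = List[Tuple[float, float]]  # (lower, upper) bound for each amino acid m


def lower_inside(box_l: Box, box_s: Box, m: int) -> bool:
    """Like YL_lsm: lower bound of box s lies within box l along axis m."""
    lo, hi = box_l[m]
    return lo <= box_s[m][0] <= hi


def upper_inside(box_l: Box, box_s: Box, m: int) -> bool:
    """Like YU_lsm: upper bound of box s lies within box l along axis m."""
    lo, hi = box_l[m]
    return lo <= box_s[m][1] <= hi


def boxes_intersect(box_l: Box, box_s: Box) -> bool:
    """Like YC_ls: boxes intersect only if they overlap along every axis.

    Along one axis, two intervals overlap when a bound of s lies inside l
    or a bound of l lies inside s (the second case covers s enclosing l).
    """
    return all(
        lower_inside(box_l, box_s, m) or upper_inside(box_l, box_s, m)
        or lower_inside(box_s, box_l, m) or upper_inside(box_s, box_l, m)
        for m in range(len(box_l))
    )
```

Overlap on some axes is not enough: a single separating axis makes YC_ls zero.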

19
MILP FORMULATION (cont.)
  • Variables (continuous)
    • X_lmn: bound n for amino acid m in box l
    • XD_lkmn: bound n for amino acid m in box l for class k
    • XP1_ijk: models misallocation of subunit j of protein i to class k
    • XP2_ijk: models misallocation of subunit j of protein i to class k

20
MILP FORMULATION (cont.)

OBJECTIVE FUNCTION
  • Minimize intersection (of boxes) and misallocation (of protein subunits)
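The equation itself is not part of this transcript. One plausible form, assuming the intersection and misallocation terms are simply added with equal weights and each pair of boxes is counted once, would be:

```latex
\min \; \sum_{l} \sum_{s > l} YC_{ls}
  \;+\; \sum_{i} \sum_{j} \sum_{k} \left( XP1_{ijk} + XP2_{ijk} \right)
```

An objective value of 0 under this form would mean no overlapping boxes and no misallocated subunits.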
21
MILP FORMULATION (cont.)
CONSTRAINTS
  • Bounds for boxes

22
MILP FORMULATION (cont.)

  • Relationship between protein-box and protein-class
  • Relationship between box-class
23
MILP FORMULATION (cont.)
  • Relationship between protein-box-class
  • Misallocation
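Once the boxes are fixed, a subunit's composition vector can be allocated to the class of a box that encloses it; under that reading, a misallocation is a subunit not enclosed by any box of its own class. A minimal sketch with hypothetical helper names, which counts misallocations directly instead of modeling them with XP1/XP2 as the MILP does:

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = List[Tuple[float, float]]  # (lower, upper) bound for each amino acid


def in_box(point: Sequence[float], box: Box) -> bool:
    """True if the composition vector lies within the box on every axis."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


def allocate(point: Sequence[float], boxes: Dict[str, Box],
             box_class: Dict[str, str]) -> Optional[str]:
    """Class of the first box that encloses the point, or None."""
    for name, box in boxes.items():
        if in_box(point, box):
            return box_class[name]
    return None


def count_misallocated(points: List[Sequence[float]], labels: List[str],
                       boxes: Dict[str, Box], box_class: Dict[str, str]) -> int:
    """Number of points not enclosed by any box of their own class."""
    return sum(
        not any(in_box(p, b) for name, b in boxes.items()
                if box_class[name] == label)
        for p, label in zip(points, labels)
    )
```

With non-intersecting boxes, each enclosed point falls into exactly one box, so the allocation is unambiguous.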
24
MILP FORMULATION (cont.)
  • Intersection
[Figure: boxes l and s]
25
MILP FORMULATION (cont.)
  • Intersection
26
MODELING ENVIRONMENT
27
ILLUSTRATIVE EXAMPLES
[Figure: illustrative example; legend: ALP, BET, APB, ASB]
28
ILLUSTRATIVE EXAMPLES
[Figure: illustrative example; legend: ALP, BET, APB, ASB]
29
TRAINING SET
  • A better training database requires
    • Good-quality structures
    • As many nonhomologous structures as possible
    • A typical or distinguishable feature for each class
  • PDB (Protein Data Bank)
    • http://www.rcsb.org/pdb/
  • SCOP (Structural Classification of Proteins)
    • www.scop.mrc-lmb.cam.ac.uk/scop/
  • 30 from each class

30
PDB codes (Brookhaven National Laboratory): 1ABA, 8ATC, etc.;
a 5th letter indicates the chain of the protein
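Under the naming scheme described above (a four-character PDB code, optionally followed by a chain letter), a hypothetical helper for splitting the identifiers used in the training set could look like this sketch:

```python
from typing import Optional, Tuple


def split_pdb_chain(code: str) -> Tuple[str, Optional[str]]:
    """Split an entry such as '8ATCA' into ('8ATC', 'A').

    A 4-character code carries no chain letter, so the chain is None.
    """
    code = code.strip()
    if len(code) not in (4, 5):
        raise ValueError(f"unexpected PDB code: {code!r}")
    pdb_id = code[:4].upper()              # the four-character PDB code
    chain = code[4] if len(code) == 5 else None  # the 5th letter, if any
    return pdb_id, chain


print(split_pdb_chain("8ATCA"))
```

Keeping the chain separate matters because the model works on subunits (index j), not whole proteins.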
31
RESULTS
  • Very big problem → Dual Simplex Method
  • Iterative solution procedure → Objective function = 0
  • 14 boxes
    • 4 ALL-ALPHA, 4 ALPHABETA
    • 3 ALL-BETA, 3 ALPHA/BETA
  • Training accuracy: 100%

32
FUTURE RESEARCH
  • Test set
    • 1600 proteins
    • Jack-knife test
    • Prediction accuracy
  • Model
    • Distance-based classification
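A jack-knife test holds out each protein in turn, trains on the rest, and predicts the held-out one; accuracy is the fraction predicted correctly. A minimal sketch, where `nearest_centroid` is a hypothetical stand-in classifier (a simple distance-based rule, NOT the MILP box model) and any `fit_predict` function could be plugged in instead:

```python
from typing import Callable, List, Sequence


def jackknife_accuracy(points: List[Sequence[float]], labels: List[str],
                       fit_predict: Callable) -> float:
    """Leave-one-out accuracy: each sample is predicted by a model
    trained on all the other samples."""
    correct = 0
    for i in range(len(points)):
        train_x = points[:i] + points[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, points[i]) == labels[i]:
            correct += 1
    return correct / len(points)


def nearest_centroid(train_x, train_y, query):
    """Stand-in classifier: predict the class whose mean composition
    vector is closest in squared Euclidean distance."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2
                                 for a, b in zip(query, centroids[c])))
```

For 1600 proteins this means 1600 training runs, which is costly if each run solves a large MILP.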
