Title: A dimensionality reduction approach to modeling protein flexibility
1A dimensionality reduction approach to modeling
protein flexibility
- By Miguel L. Teodoro, George N. Phillips Jr.,
and Lydia E. Kavraki
- Rice University and University of
Wisconsin-Madison
- Presented by Zhang Jingbo
2Outline
- Motivation, Background and Our goal
- Protein flexibility
- The problems in current methods and the benefit
of our methods in this paper
- Dimensionality reduction techniques
- Obtaining conformational Data
- Application to Specific Systems
- Summary
3Motivation
- Introduce a method to obtain a reduced basis
representation of protein flexibility.
4Background
- Proteins are involved either directly or
indirectly in all biological processes in living
organisms.
- Conformational changes of proteins can critically
affect their ability to bind other molecules.
- Any progress in modeling protein motion and
flexibility will contribute to the understanding
of key biological functions.
- Today there is a large body of knowledge
available on protein structure and function and
this amount of information is expected to grow
even faster in the future.
5Our method and goal
- Method
- A dimensionality reduction technique:
Principal Component Analysis
- Goal
- Transform the original high dimensional
representation of protein motion into a lower
dimensional representation that captures the
dominant modes of motion of the protein.
- Obtain conformations that have been observed in
laboratory experiments.
6The focus of this paper
- How to obtain a reduced representation of protein
flexibility from raw protein structural data
7What is protein flexibility?
- Definition: a crucial aspect of the relation
between protein structure and function.
- Proteins change their three-dimensional shapes
when binding to or unbinding from other molecules.
10Why do we want to model protein flexibility?
- Several applications for our work
- 1. Pharmaceutical drug development
- 2. Modeling the conformational changes that
occur during protein-protein and protein-DNA/RNA
interactions.
11RII molecular "handshake" (donut with two holes).
Models for the binding of RII to the glycophorin
A receptor on red blood cells (erythrocytes).
Backbone of the RII dimer showing glycan binding
sites.
12The problems in current methods
- The computational complexity of explicitly
modeling all the degrees of freedom of a protein
is too high.
- Modeling proteins as rigid structures limits the
effectiveness of currently used molecular docking
methods.
13The benefit of our method in this paper (1)
- Using the approximation
- Makes it computationally efficient to include
protein flexibility in the drug development process.
14The two most common structural biology experimental
methods in use today
- Protein X-ray crystallography
- Nuclear magnetic resonance (NMR)
- Limits
15An alternative to experimental methods
- Computational methods based on classical or
quantum mechanics to approximate protein
flexibility.
- Limits
16The benefit of our method in this paper (2)
- Transform the basis of representation of
molecular motion.
- The new degrees of freedom will be linear
combinations of the original variables.
- Some degrees of freedom are significantly more
representative of protein flexibility than
others.
- Consider only the most significant degrees of
freedom; the transformed degrees of freedom are
collective motions affecting the entire
configuration of the protein.
- There is a tradeoff between the loss of
information and effectively modeling protein
flexibility in a greatly reduced dimensionality
subspace.
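The tradeoff between information loss and dimensionality can be made concrete by asking how many leading components are needed to retain a chosen share of the total variance. The following is a minimal sketch, assuming the singular values of the conformational data are already available and sorted in decreasing order. The function name `components_needed`, the example singular values, and the 95% threshold are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def components_needed(singular_values, target_fraction):
    """Return the smallest number of leading components whose combined
    share of the total variance reaches target_fraction (between 0 and 1)."""
    variances = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # First index at which the cumulative share reaches the target.
    index = int(np.searchsorted(cumulative, target_fraction))
    # Guard against floating-point round-off when the target is 1.0.
    return min(index + 1, len(variances))


# Hypothetical singular values in decreasing order (not data from the paper).
example_singular_values = [10.0, 5.0, 2.0, 1.0, 0.5]
k = components_needed(example_singular_values, 0.95)
print(k)
```

A stricter threshold keeps more components and loses less information; a looser one gives a smaller subspace at the cost of a coarser description of the motion.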
17What do we actually do in this paper?
- Start from initial coordinate information from
different data sources.
- Apply the principal component analysis method of
dimensionality reduction.
- Obtain a new structural representation using
collective degrees of freedom.
- Here, we will focus on
- a. the interpretation of the principal
components as biologically relevant motions
- b. how combinations of a reduced number of
these motions can approximate alternative
conformations of the protein.
18Dimensionality reduction techniques
- Aim: find a mapping from the data in a high
dimensional space to a lower dimensional subspace.
- Two methods
- a. Multidimensional scaling (MDS)
- b. Principal component analysis (PCA)
- Merits
- Limits
19PCA of conformational data
- Merits
- 1). the most established method
- 2). the most efficient algorithms
- 3). guaranteed convergence of the computation
- 4). an upper bound on how much we can reduce
the representation of conformational
flexibility in proteins.
- 5). the principal components have a direct
physical interpretation.
- 6). high dimensional data can readily be
projected to a low dimensional space, and the
projection can be inverted to recover a
representation of the original data with
minimal reconstruction error.
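The last merit, projecting to a low dimensional space and mapping back, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the data are synthetic (two hidden collective motions plus small noise), the matrix layout with one conformation per column is an assumption, and the names `reconstruct` and `rms_error` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for conformational data (assumption, not real protein data):
# 30 coordinates (3N with N = 10 atoms), 50 conformations, one per column.
n_coords, n_conf, n_hidden = 30, 50, 2
modes = rng.normal(size=(n_coords, n_hidden))
amplitudes = rng.normal(size=(n_hidden, n_conf))
A = modes @ amplitudes + 0.01 * rng.normal(size=(n_coords, n_conf))

# Center the data on the mean conformation before PCA.
mean = A.mean(axis=1, keepdims=True)
centered = A - mean

# Columns of U are the principal components (collective motions).
U, s, Vt = np.linalg.svd(centered, full_matrices=False)


def reconstruct(k):
    """Project onto the first k principal components and map back."""
    basis = U[:, :k]
    low_dim = basis.T @ centered  # k x M reduced coordinates
    return basis @ low_dim + mean


def rms_error(k):
    """Root-mean-square coordinate error of the k-component reconstruction."""
    return float(np.sqrt(np.mean((A - reconstruct(k)) ** 2)))


errors = [rms_error(k) for k in (1, 2, 5)]
print(errors)
```

Because the synthetic data contain only two real motions, the error drops sharply at two components and then decreases slowly as further components only fit noise.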
20PCA of conformational data (continued)
21PCA of conformational data (continued)
- Conformational Data
- 1. The input data for PCA: several atomic
displacement vectors (of dimension 3N) corresponding
to different structural conformations, which have
the form $\vec{a} = [x_1, y_1, z_1, \ldots, x_N, y_N, z_N]^T$,
where $(x_i, y_i, z_i)$ corresponds to Cartesian
coordinate information for the ith atom.
- 2. All atomic displacement vectors together
constitute the conformational vector set.
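Assembling the conformational vector set can be sketched as follows. This is a minimal sketch under stated assumptions: the conformations are assumed to be already superimposed on a common frame, displacements are taken relative to the mean structure, and each conformation is stored as one column of the resulting matrix. The function name `conformation_matrix` and the toy coordinates are hypothetical.

```python
import numpy as np


def conformation_matrix(coords):
    """Build a 3N x M matrix of displacement vectors.

    coords: array-like of shape (M, N, 3), M conformations of N atoms,
    assumed to be pre-aligned. Returns (matrix, mean_structure), where
    column j of matrix is conformation j minus the mean structure.
    """
    coords = np.asarray(coords, dtype=float)
    n_conf, n_atoms, _ = coords.shape
    # Each row of flat is [x1, y1, z1, ..., xN, yN, zN] for one conformation.
    flat = coords.reshape(n_conf, 3 * n_atoms)
    mean_structure = flat.mean(axis=0)
    return (flat - mean_structure).T, mean_structure


# Toy example (not real data): 3 conformations of a 2-atom system in which
# only the x coordinate of the second atom changes.
toy = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
]
A, mean_structure = conformation_matrix(toy)
print(A.shape)
```

Only the row for the moving coordinate is non-zero after centering, which is the kind of redundancy that PCA exploits.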
22Singular value decomposition (SVD)
- We use the singular value decomposition (SVD) as
an efficient computational method to calculate
the principal components.
- The SVD of a matrix, A, is defined as
$A = U \Sigma V^T$
- where U and V are orthonormal matrices and
$\Sigma$ is a nonnegative diagonal matrix whose
diagonal elements are the singular values of A.
- the columns of matrices U and V are called
the left and right singular vectors,
respectively.
- the square of each singular value corresponds
to the variance of the data in A along the
corresponding principal component.
- The SVD of matrix A was computed using the ARPACK
library.
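The properties listed on this slide can be checked numerically. The sketch below uses NumPy's dense `numpy.linalg.svd` routine as a stand-in for ARPACK (SciPy's `scipy.sparse.linalg.svds` is an ARPACK-backed alternative when only the leading singular vectors of a large matrix are needed). The random matrix, its size, and the one-conformation-per-column layout are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random stand-in for a centered conformational matrix (assumption):
# 12 coordinates, 40 conformations stored as columns.
A = rng.normal(size=(12, 40))
A -= A.mean(axis=1, keepdims=True)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors reproduce A.
reconstructed = U @ np.diag(s) @ Vt

# Sample variance of the data along each left singular vector ...
variance_along_pcs = np.var(U.T @ A, axis=1, ddof=1)
# ... equals the squared singular value divided by (M - 1).
variance_from_singular_values = s**2 / (A.shape[1] - 1)

print(np.allclose(variance_along_pcs, variance_from_singular_values))
```

With this layout the squared singular values give the variances up to the constant factor 1/(M - 1), so ranking components by singular value is the same as ranking them by variance.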
23Obtaining conformational Data
- The most common data sources
- 1. experimental laboratory methods
- a. X-ray crystallography
- b. NMR
- 2. computational sampling methods based on a
force field, such as molecular dynamics.
- Laboratory methods vs. computational methods
- laboratory methods generate less data
- computational methods have lower
accuracy.
24Application to Specific Systems
25HIV-1 Protease
26The advantages of using the PCA methodology to
analyze protein flexibility
- Can be used at different levels of detail
- 1. the overall motion of the backbone.
- 2. the simplified flexibility of the
protein as a whole.
- 3. include only the atoms that
constitute the binding site.
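Choosing a level of detail amounts to choosing which atoms contribute coordinates to the conformation vector before PCA is applied. The sketch below is illustrative only: the `Atom` record, the selector functions, the toy coordinates, and the binding-site residue numbers are hypothetical, and the backbone is identified by the usual PDB atom names (N, CA, C, O), which is an assumption about the input data.

```python
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    name: str                        # atom name, e.g. "CA"
    residue: int                     # residue number
    xyz: Tuple[float, float, float]  # Cartesian coordinates


BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumed PDB naming convention


def select_backbone(atoms):
    """Keep only backbone atoms (overall backbone motion)."""
    return [a for a in atoms if a.name in BACKBONE_NAMES]


def select_binding_site(atoms, site_residues):
    """Keep only atoms belonging to the given residues."""
    return [a for a in atoms if a.residue in site_residues]


def to_vector(atoms):
    """Flatten the selected atoms into [x1, y1, z1, ..., xK, yK, zK]."""
    return [c for a in atoms for c in a.xyz]


# Toy two-residue fragment with made-up coordinates.
atoms = [
    Atom("N", 1, (0.0, 0.0, 0.0)),
    Atom("CA", 1, (1.5, 0.0, 0.0)),
    Atom("CB", 1, (2.0, 1.2, 0.0)),
    Atom("C", 1, (2.1, -1.2, 0.0)),
    Atom("O", 1, (1.6, -2.3, 0.0)),
    Atom("N", 2, (3.4, -1.0, 0.0)),
    Atom("CA", 2, (4.2, -2.2, 0.0)),
    Atom("C", 2, (5.7, -1.9, 0.0)),
    Atom("O", 2, (6.1, -0.7, 0.0)),
]

backbone_vector = to_vector(select_backbone(atoms))
all_atom_vector = to_vector(atoms)
site_vector = to_vector(select_binding_site(atoms, {2}))
print(len(backbone_vector), len(all_atom_vector), len(site_vector))
```

The same selection has to be applied to every conformation so that all vectors have the same length and the same atom ordering.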
27The first situation
28The second situation
29The last situation
- As a validation of our method.
30Another application: Aldose Reductase
31Summary