Protein - PowerPoint PPT Presentation

Provided by: Sim484
Title: Protein


1
Protein-Protein Interactions
  • Lisa Chargualaf
  • Simon Kanaan
  • Keefe Roedersheimer
  • Others: Dr. Izaguirre, Dr. Chen, Dr. Wuchty,
    ChengBang Huang

2
What are proteins?
  • Basis of most living functions
  • Building blocks of life
  • Substrates
  • Products
  • Enzymes
  • One cell contains thousands of different
    proteins; the human body contains 50 to 100
    thousand proteins!

3
Proteins
  • Composed of sequences of amino acids
  • Variations of 20 primary/basic amino acids
  • Rules governing structure
  • AAs close in the folded structure may or may not
    be close in the primary sequence
  • Hydrophobic residues are generally buried in the
    core; hydrophilic residues are usually exposed
  • Protein chains generally do not form knots
  • Related proteins generally have similar
    structures
  • Similar structures can exist without having
    similar sequences

4
What is a protein-protein (P-P) interaction and
why is it important?
  • Encoded by the genetic material within a cell,
    proteins fold and interact in intricate
    arrangements that provide functionality to the
    components of a cell, which in turn work
    cooperatively to form whole body systems.
  • Protein-protein interactions serve as the
    chemical basis of all living organisms.
  • Understanding protein interactions helps us
    understand the protein network.

5
What causes P-P interactions?
  • Several hypotheses exist about the driving force
    behind proteins interacting with each other:
  • The primary sequence dictates interactions
    between attached functional groups.
  • Protein domains drive proteins to fold and
    interact as they do.

6
What are protein domains?
  • Significant portions of proteins
  • Composed of distinct peptides
  • The key to intricate arrangements

7
Domains and Proteins
  • A single protein molecule can possess multiple
    domains, causing difficulty in discovering a
    simple formula that dictates the manner by which
    protein-protein interactions occur.
  • Yet, certain affinities exist between certain
    protein domains and are frequently seen in living
    organisms.
  • This motivates our research, which seeks to
    explain protein-protein interactions in terms of
    domain-domain interactions.
  • The model system is the yeast cell, with several
    of its proteins serving as the test cases. The
    work uses a protein family data bank available
    online.

8
Our Formula Dictating Which P-P Interactions
Occur
  • A data bank gives a list of protein interactions.
  • A protein interaction (P1, P2) is explained by a
    domain pair (D1, D2) if P1 includes one domain
    and P2 includes the other.
  • Find the minimum number of domain pairs that
    explains every interaction in the data bank.
    This is equivalent to the Minimum Set Cover
    (MSC) problem.
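
The "explained by" relation above can be sketched in Python as follows. This is a minimal illustration rather than the presenters' code: the function name `explains`, the dictionary `protein_domains`, and the sample domain assignments are all hypothetical.

```python
def explains(domain_pair, interaction, protein_domains):
    """Return True if the domain pair (d1, d2) explains the interaction (p1, p2).

    A pair explains an interaction when one protein contains one domain
    and the other protein contains the other domain, in either order.
    """
    d1, d2 = domain_pair
    p1, p2 = interaction
    doms1, doms2 = protein_domains[p1], protein_domains[p2]
    return (d1 in doms1 and d2 in doms2) or (d2 in doms1 and d1 in doms2)


# Hypothetical data: the set of domains each protein contains.
protein_domains = {"P1": {"D1", "D2"}, "P2": {"D2", "D3"}}

print(explains(("D2", "D3"), ("P1", "P2"), protein_domains))  # True
print(explains(("D1", "D1"), ("P1", "P2"), protein_domains))  # False
```

Checking both orders keeps the relation symmetric, so (D2, D3) and (D3, D2) are treated as the same pair.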

9
Minimum Set Cover Problem
  • Given a collection of sets, find the smallest
    sub-collection whose union equals the union of
    all the sets.
  • The decision version is NP-complete; finding the
    minimum cover is NP-hard.

10
Why the Minimum Set of Domains?
  • Let's look at the following case:
  • P1 contains domain D2
  • P2 contains domains D2 and D3
  • P3 contains domains D2 and D4
  • P4 contains domains D2 and D5
  • And let's assume the protein interactions are:
  • P1 - P1
  • P1 - P2
  • P1 - P3
  • P1 - P4
  • The P-P interactions are explained by the four
    pairs:
  • (D2 - D2)
  • (D2 - D3)
  • (D2 - D4)
  • (D2 - D5)
  • Or by the single pair:
  • (D2 - D2)

11
Mapping to MSC
  • Number the interactions:
  • P1 - P1 = 0
  • P1 - P2 = 1
  • P1 - P3 = 2
  • P1 - P4 = 3
  • Interactions explained by each domain pair:
  • D2-D2: {0,1,2,3}
  • D2-D3: {1}
  • D2-D4: {2}
  • D2-D5: {3}
  • This maps to the integer MSC problem with a
    global set of {0,1,2,3} and subsets
    {0,1,2,3}, {1}, {2}, {3}.
  • The solution is D2-D2; larger problems are more
    difficult.
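
The mapping on this slide can be written out as a small Python sketch. The names `subsets`, `universe`, and `is_cover` are illustrative assumptions. The pair for interaction 3 is written as D2-D5 here, on the assumption that P1 contains only D2 and P4 contains D2 and D5.

```python
# Interactions numbered as on the slide.
interaction_ids = {("P1", "P1"): 0, ("P1", "P2"): 1, ("P1", "P3"): 2, ("P1", "P4"): 3}

# For each domain pair, the set of interaction numbers it explains.
# D2-D5 is assumed for interaction 3 (P1 holds D2, P4 holds D2 and D5).
subsets = {
    ("D2", "D2"): {0, 1, 2, 3},
    ("D2", "D3"): {1},
    ("D2", "D4"): {2},
    ("D2", "D5"): {3},
}

# The global set: every interaction that must be explained.
universe = set(interaction_ids.values())


def is_cover(chosen_pairs):
    """True if the chosen domain pairs together explain every interaction."""
    covered = set()
    for pair in chosen_pairs:
        covered |= subsets[pair]
    return covered == universe


print(is_cover([("D2", "D2")]))                              # True
print(is_cover([("D2", "D3"), ("D2", "D4"), ("D2", "D5")]))  # False: interaction 0 is not covered
```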

12
Implementation/Algorithm
  • The base algorithm consists of functions that
    record the protein structure and interaction
    information and store them in different data
    structures.
  • It also builds a domain-domain matrix.
  • This matrix holds information about interacting
    domains. Each entry (i, j) is the number of
    protein-protein interactions in which the domain
    pair (Di, Dj) was observed as a possible cause.
  • Example:
  • P1 = {D1, D2, D3} and P2 = {D1, D5} interact.
  • The possible causes are (D1, D1), (D1, D5),
    (D2, D1), (D2, D5), (D3, D1) and (D3, D5).
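
One way to build such a matrix is sketched below. It assumes a sparse representation (a `Counter` keyed by domain pair instead of a 2-D array) and treats pairs as unordered, so (D2, D1) is stored as (D1, D2). The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def candidate_pairs(p1, p2, protein_domains):
    """All domain pairs (one domain from each protein) that could explain p1-p2."""
    return {tuple(sorted(pair))
            for pair in product(protein_domains[p1], protein_domains[p2])}


def build_matrix(interactions, protein_domains):
    """Count, for each domain pair, how many interactions it could explain."""
    matrix = Counter()
    for p1, p2 in interactions:
        # Each pair is counted once per interaction.
        matrix.update(candidate_pairs(p1, p2, protein_domains))
    return matrix


# The example from the slide.
protein_domains = {"P1": {"D1", "D2", "D3"}, "P2": {"D1", "D5"}}
matrix = build_matrix([("P1", "P2")], protein_domains)
print(sorted(matrix.items()))
```

For the slide's example this yields six entries, each with a count of 1.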

13
Exact Problems
  • In the worst case there are (# of domains)^2
    domain interactions, each corresponding to a
    subset.
  • The large number of protein interactions
    corresponds to the global set.
  • MSC is NP-complete; the exact solution requires
    considering all combinations of subsets.
  • Computationally expensive and impractical for
    more than 10 domains. There are thousands in a
    real problem.
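
To make the cost concrete, here is a brute-force exact solver as a sketch. It is one straightforward way to "consider all combinations of subsets" and is not necessarily how the presenters implemented it; `exact_min_cover` is a hypothetical name, and the sample subsets assume the small example with pairs D2-D2, D2-D3, D2-D4 and D2-D5.

```python
from itertools import combinations


def exact_min_cover(universe, subsets):
    """Try all combinations of subsets, smallest first; return the first cover found."""
    names = list(subsets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(subsets[name] for name in combo))
            if covered == universe:
                return list(combo)
    return None  # no combination covers the universe


subsets = {
    "D2-D2": {0, 1, 2, 3},
    "D2-D3": {1},
    "D2-D4": {2},
    "D2-D5": {3},
}
print(exact_min_cover({0, 1, 2, 3}, subsets))  # ['D2-D2']

# With m subsets there are 2**m - 1 non-empty combinations in the worst case.
# Using the bound of (# of domains)^2 subsets for 10 domains:
n_domains = 10
print(2 ** (n_domains ** 2) - 1)
```

The count grows exponentially in the number of domain pairs, which is why an approximation is needed for real data.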

14
Implementation/Algorithm
  • The algorithm approximates the minimum set of
    domain pairs.
  • It needs to choose d-d pairs in an educated, not
    a randomized, fashion.
  • This can be done using weight functions: each
    domain pair is given a weight, and the pair with
    the largest weight is chosen.

15
Different Functions
  • Different weight functions were considered.
  • We decided to look at two for now:
  • MSC
  • MSC by probability
  • We also looked at running MSC twice, with the
    addition of pairs that have a high probability
    of interacting.

16
MSC
  • Assumption:
  • The most commonly observed interacting domain
    pair among the protein interactions is probably
    the cause of those interactions.
  • While there are P-P interactions left to be
    explained:
  • Choose the most commonly observed interacting
    domain pair Di-Dj.
  • Remove Di-Dj.
  • Remove all P-P interactions explained by Di-Dj
    from the data being observed.
  • Undo those P-P interactions' effect on the
    matrix.
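
The loop described above can be sketched as a greedy set-cover routine. This is an illustrative reading of the slide, not the presenters' code: the function names, the sparse `Counter` matrix, the unordered treatment of pairs, and the alphabetical tie-break are all assumptions.

```python
from collections import Counter
from itertools import product


def candidate_pairs(p1, p2, protein_domains):
    """All unordered domain pairs that could explain the interaction p1-p2."""
    return {tuple(sorted(pair))
            for pair in product(protein_domains[p1], protein_domains[p2])}


def greedy_msc(interactions, protein_domains):
    """Repeatedly pick the domain pair that explains the most remaining interactions."""
    remaining = {i: candidate_pairs(p1, p2, protein_domains)
                 for i, (p1, p2) in enumerate(interactions)}
    # Interactions with no candidate pair can never be explained; drop them.
    remaining = {i: pairs for i, pairs in remaining.items() if pairs}

    matrix = Counter()
    for pairs in remaining.values():
        matrix.update(pairs)

    chosen = []
    while remaining:
        # Highest count wins; ties are broken alphabetically (an assumption).
        best = min(matrix, key=lambda pair: (-matrix[pair], pair))
        chosen.append(best)
        explained = [i for i, pairs in remaining.items() if best in pairs]
        for i in explained:
            # Undo the explained interaction's effect on the matrix.
            matrix.subtract(remaining.pop(i))
        del matrix[best]
    return chosen


# Assumed small example: P1 holds only D2; P2, P3, P4 hold D2 plus one other domain.
protein_domains = {"P1": {"D2"}, "P2": {"D2", "D3"},
                   "P3": {"D2", "D4"}, "P4": {"D2", "D5"}}
interactions = [("P1", "P1"), ("P1", "P2"), ("P1", "P3"), ("P1", "P4")]
print(greedy_msc(interactions, protein_domains))  # [('D2', 'D2')]
```

Greedy selection does not guarantee a minimum cover in general; it is an approximation that happens to be optimal on this small example.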

17
MSC by Probability
  • Assumption:
  • Incorporate the absence of P-P interactions.
  • Initialize the matrix just like MSC.
  • Go through every element in the matrix and divide
    that entry by the number of proteins that
    contain the first domain times the number of
    proteins that contain the second domain.
  • Each element now represents the probability
    that domains i and j interact.
  • The weight function then chooses the highest
    probability in the matrix, finds which protein
    interactions this domain pair explains, removes
    their influence from the data, and repeats.
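
A sketch of this weight function follows. It assumes the denominators (proteins per domain) stay fixed while the counts are updated after each choice, which the slide does not state explicitly. All names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import product


def candidate_pairs(p1, p2, protein_domains):
    """All unordered domain pairs that could explain the interaction p1-p2."""
    return {tuple(sorted(pair))
            for pair in product(protein_domains[p1], protein_domains[p2])}


def greedy_msc_by_probability(interactions, protein_domains):
    """Greedy cover where a pair's weight is count / (n_i * n_j)."""
    # Number of proteins that contain each domain.
    domain_count = Counter(d for doms in protein_domains.values() for d in doms)

    remaining = {i: candidate_pairs(p1, p2, protein_domains)
                 for i, (p1, p2) in enumerate(interactions)}
    remaining = {i: pairs for i, pairs in remaining.items() if pairs}

    counts = Counter()
    for pairs in remaining.values():
        counts.update(pairs)

    def score(pair):
        di, dj = pair
        return counts[pair] / (domain_count[di] * domain_count[dj])

    chosen = []
    while remaining:
        # Highest score wins; ties are broken alphabetically (an assumption).
        best = min(counts, key=lambda pair: (-score(pair), pair))
        chosen.append(best)
        for i in [i for i, pairs in remaining.items() if best in pairs]:
            counts.subtract(remaining.pop(i))
        del counts[best]
    return chosen


# Hypothetical data: domain x is common, domains y and z are rare.
protein_domains = {"A": {"x", "y"}, "B": {"x", "z"}, "C": {"x"}, "D": {"x"}}
print(greedy_msc_by_probability([("A", "B")], protein_domains))  # [('y', 'z')]
```

In this example every candidate pair has a raw count of 1, so a count-based choice cannot separate them. Dividing by the number of proteins that contain each domain favors the rare pair (y, z) over pairs involving the common domain x, which also occurs in proteins that do not interact.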

18
Prediction
  • Input: a set of proteins with known structure.
  • The set of domain pairs obtained from the
    algorithm being observed.
  • Go through each interacting domain pair (Di, Dj):
  • Every protein containing domain Di is considered
    to interact with every protein containing Dj.
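
The prediction step can be sketched as below. The function name and sample data are hypothetical. Self-interactions such as P2-P2 are included here, which is an assumption the slide does not spell out.

```python
from itertools import combinations_with_replacement


def predict_interactions(domain_pairs, protein_domains):
    """Predict p1-p2 whenever some chosen pair (Di, Dj) is split across the two proteins."""
    predicted = set()
    proteins = sorted(protein_domains)
    # combinations_with_replacement also yields (p, p), i.e. self-interactions.
    for p1, p2 in combinations_with_replacement(proteins, 2):
        doms1, doms2 = protein_domains[p1], protein_domains[p2]
        for di, dj in domain_pairs:
            if (di in doms1 and dj in doms2) or (dj in doms1 and di in doms2):
                predicted.add((p1, p2))
                break
    return predicted


# Hypothetical input: one interacting domain pair and three proteins.
protein_domains = {"P1": {"D2"}, "P2": {"D2", "D3"}, "P3": {"D4"}}
print(sorted(predict_interactions([("D2", "D3")], protein_domains)))
# [('P1', 'P2'), ('P2', 'P2')]
```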

19
Testing
  • Running the MSC approximation vs. the exact MSC
    on very small sets to see how close the
    approximation is to the exact solution.

20
Testing
  • Building training data of different sizes using
    the Swiss Pfam A database, among others.
  • Running the approximation algorithms on these
    sets.
  • Running AM on the same sets.
  • Attempting to use sets similar in size to those
    used for MLE, for comparison's sake.

21
Testing
  • Compare calculated P-P interactions with
    observed interactions (number of matches, false
    positives, and false negatives).
  • Calculate fold, specificity, and sensitivity in
    order to compare with previous research.
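
The comparison can be sketched as follows. The slides do not define the measures, so this assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and leaves out "fold", which is not defined in the transcript. The function name and sample data are hypothetical, and each protein pair is assumed to be stored in one consistent order.

```python
def evaluate(predicted, observed, all_pairs):
    """Compare predicted and observed interactions over a universe of protein pairs."""
    tp = len(predicted & observed)               # matches
    fp = len(predicted - observed)               # false positives
    fn = len(observed - predicted)               # false negatives
    tn = len(all_pairs - predicted - observed)   # correctly predicted non-interactions
    return {
        "matches": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical data.
all_pairs = {("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")}
predicted = {("P1", "P2"), ("P1", "P3")}
observed = {("P1", "P2"), ("P2", "P3")}
print(evaluate(predicted, observed, all_pairs))
```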

22
Results

23
Results

24
Results

25
Future Work
  • Finish testing and comparing different weight
    functions.
  • Gather statistics by running the different
    algorithms multiple times on data sets of
    different sizes.
  • Test exact MSC vs. the different weight functions.