Extracting hidden information from knowledge networks

1
Extracting hidden information from knowledge networks
  • Sergei Maslov
  • Brookhaven National Laboratory, New York, USA

2
Outline of the talk
  • What is a knowledge network, and how is it
    different from an ordinary graph or network?
  • Knowledge networks on the internet: matching
    products to customers
  • Knowledge networks in biology: large ensembles of
    interacting biomolecules
  • Empirical study of correlations in the network of
    interacting proteins
  • Collaborators: Y.-C. Zhang and K. Sneppen

3
Networks in complex systems
  • A network is the backbone of a complex system
  • It answers the question: who interacts with whom?
  • Examples:
  • Internet and WWW
  • Interacting biomolecules (metabolic, physical,
    regulatory)
  • Food webs in ecosystems
  • Economics: customers and products. Social:
    people and their choice of partners

4
Predicting tastes of customers based on their
opinions on products
  • Each of us has personal tastes
  • These tastes are sometimes unknown even to
    ourselves (hidden wants)
  • Information about them is contained in our
    opinions on products
  • Matching customers with similar tastes makes it
    possible to predict future opinions
  • The internet allows this to be done on a large scale

5
Types of networks
  • Plain network
  • Knowledge or opinion network
[Figure: a plain network contrasted with an opinion network of readers]
6
Storing opinions
  • Matrix of opinions Ω_IJ
  • Network of opinions

    X X X | 2 9 ? ?
    X X X | ? 8 ? 8
    X X X | ? ? 1 ?
    ------+--------
    2 ? ? | X X X X
    9 8 ? | X X X X
    ? ? 1 | X X X X
    ? 8 ? | X X X X

  (Numbers are known opinions, ? marks unknown opinions, and X marks
  reader-reader or book-book pairs, which carry no opinion.)
7
Using correlations to reconstruct customers'
tastes
  • Similar opinions → similar tastes
  • Simplest model:
  • Readers → M-dimensional vector of tastes r_I
  • Books → M-dimensional vector of features b_J
  • Opinions → scalar product
  • Ω_IJ = r_I · b_J
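The simplest model above can be sketched numerically; all sizes, the seed, and the variable names below are illustrative assumptions, not from the talk:

```python
import numpy as np

# Sketch of the simplest model: readers and books are hidden
# M-dimensional Gaussian vectors, and each opinion is their scalar
# product Omega_IJ = r_I . b_J.
rng = np.random.default_rng(42)
M = 3                                     # hidden taste dimensions
n_readers, n_books = 5, 4

r = rng.standard_normal((n_readers, M))   # tastes r_I
b = rng.standard_normal((n_books, M))     # features b_J

omega = r @ b.T                           # Omega_IJ = r_I . b_J
print(omega.shape)                        # (5, 4): one opinion per pair
```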

8
[Figure: bipartite network of customers and books, with numeric opinions on the edges]
Loop correlation
  • predictive power ~ 1/M^((L-1)/2)
  • one needs many loops to completely freeze
    the mutual orientation of the vectors

9
Field Theory Approach
  • If all components of the vectors are Gaussian and
    uncorrelated
  • The generating functional is det(1 + iλ)^(-M/2)
  • All irreducible correlations are proportional to
    M
  • All loop correlations ⟨Ω_12 Ω_23 Ω_34 … Ω_L1⟩ ~ M
  • Since each Ω_IJ ~ √M, the sign correlation scales
    as M^(-(L-1)/2)
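The scaling claims above can be checked by Monte Carlo for the shortest loop, a triangle of three vectors. This sketch is my own illustration; for uncorrelated standard Gaussian vectors the triangle loop correlation equals M exactly, while each individual Ω is only of order √M:

```python
import numpy as np

# Monte Carlo estimate of the triangle loop correlation
# <Omega_12 Omega_23 Omega_31> for Gaussian, uncorrelated vectors.
# The exact expectation is M; each Omega alone has variance M.
rng = np.random.default_rng(1)
M, n_samples = 5, 200_000

v = rng.standard_normal((3, n_samples, M))    # triples of taste vectors
o12 = np.sum(v[0] * v[1], axis=1)             # Omega_12 = v1 . v2
o23 = np.sum(v[1] * v[2], axis=1)
o31 = np.sum(v[2] * v[0], axis=1)

loop_corr = np.mean(o12 * o23 * o31)          # approaches M = 5
print(loop_corr)
```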

10
Main parameter: density of edges
  • The larger the density of edges p, the easier
    the prediction
  • At p_1 ~ 1/N (N = N_readers + N_books) macroscopic
    prediction becomes possible: nodes are connected,
    but the vectors r_I, b_J are not yet fixed (ordinary
    percolation threshold)
  • At p_2 ~ 2M/N > p_1 all tastes and features (r_I
    and b_J) can be uniquely reconstructed (rigidity
    percolation threshold)

11
Spectral properties of Ω
  • For M < N the matrix Ω_IJ has N − M zero eigenvalues
    and M positive ones: Ω = R · R^T
  • Using SVD one can diagonalize R = U · D · V^T,
    such that the matrices V and U are orthogonal (V^T · V =
    1, U^T · U = 1) and D is diagonal. Then Ω = U ·
    D² · U^T
  • The amount of information contained in Ω is
    NM − M(M−1)/2 << N(N−1)/2, the number of off-diagonal
    elements
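A quick numerical illustration of these spectral claims (the sizes and seed are my own choices):

```python
import numpy as np

# If Omega = R . R^T with R the N x M matrix of taste vectors (M < N),
# then Omega has M positive and N - M (numerically) zero eigenvalues,
# and the SVD R = U D V^T gives Omega = U D^2 U^T.
rng = np.random.default_rng(7)
N, M = 10, 3
R = rng.standard_normal((N, M))
omega = R @ R.T

eigenvalues = np.linalg.eigvalsh(omega)
print(np.sum(eigenvalues > 1e-10))            # prints 3 (= M)

U, D, Vt = np.linalg.svd(R, full_matrices=False)
assert np.allclose(Vt @ Vt.T, np.eye(M))      # V is orthogonal
assert np.allclose(omega, (U * D**2) @ U.T)   # Omega = U D^2 U^T
```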

12
Practical recursive algorithm for predicting
unknown opinions
  1. Start with Ω_0, in which all unknown elements are
    filled with ⟨Ω⟩ (zero in our case)
  2. Diagonalize and keep only the M largest eigenvalues
    and eigenvectors
  3. In the resulting truncated matrix replace
    all known elements with their exact values and go
    to step 2
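A minimal sketch of this recursive algorithm (the function name and sizes are my own; the truncation is done by SVD on the reader-by-book block rather than by diagonalizing the full symmetric matrix, which keeps the same rank-M content):

```python
import numpy as np

def predict_opinions(omega, known_mask, M, n_iter=300):
    """Recursive prediction of unknown opinions: fill unknowns,
    truncate to rank M, restore known values, and repeat."""
    # Step 1: fill unknown elements with <Omega> (zero in our case).
    current = np.where(known_mask, omega, 0.0)
    for _ in range(n_iter):
        # Step 2: keep only the M largest singular values and vectors.
        U, s, Vt = np.linalg.svd(current, full_matrices=False)
        truncated = (U[:, :M] * s[:M]) @ Vt[:M, :]
        # Step 3: restore known elements to their exact values; repeat.
        current = np.where(known_mask, omega, truncated)
    return current
```

With an edge density above the rigidity threshold, the unknown entries converge to the exact scalar products, which is the convergence behavior the next slide describes.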

13
Convergence of the algorithm
  • Above p_2 the algorithm converges exponentially
    to the exact values of the unknown elements
  • The rate of convergence scales as (p − p_2)²

14
Reality check: sources of errors
  • Customers are not rational! Ω_IJ = r_I · b_J +
    ε_IJ (idiosyncrasy)
  • Opinions are delivered to the matchmaker through
    a narrow channel:
  • Binary channel: S_IJ = sign(Ω_IJ), 1 or 0 (liked or
    not)
  • Experience rated on a scale of 1 to 5, or 1 to 10 at
    best
  • If the number of edges K and the size N are large,
    while M is small, these errors can be reduced

15
How to determine M?
  • In real systems M is not fixed: there are always
    finer and finer details of tastes
  • Given the number of known opinions K, one should
    choose M_eff ~ K/(N_readers + N_books) so that the system
    stays below the second transition p_2 → tastes
    should be determined hierarchically

16
Avoid overfitting
  • Divide the known votes into a training set and a test set
  • Select M_eff so as to avoid overfitting
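The train/test recipe above can be sketched as follows (all names, sizes, and the noise level are my own assumptions): hold out part of the known votes, reconstruct with several trial values of M, and pick the M with the smallest error on the held-out votes rather than on the training set.

```python
import numpy as np

def low_rank_fill(omega, mask, M, n_iter=300):
    # Iteratively fill unknown entries with a rank-M SVD truncation.
    current = np.where(mask, omega, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(current, full_matrices=False)
        current = np.where(mask, omega, (U[:, :M] * s[:M]) @ Vt[:M, :])
    return current

rng = np.random.default_rng(3)
r = rng.standard_normal((40, 2))
b = rng.standard_normal((40, 2))
omega = r @ b.T + 0.1 * rng.standard_normal((40, 40))  # noisy opinions

known = rng.random(omega.shape) < 0.5           # votes we actually have
test = known & (rng.random(omega.shape) < 0.2)  # held-out known votes
train = known & ~test

errors = {M: np.mean((low_rank_fill(omega, train, M)[test]
                      - omega[test]) ** 2)
          for M in (1, 2, 4, 8)}
print(errors)   # the minimum should sit near the true rank, not at large M
```

Too small an M underfits the taste space; too large an M fits the idiosyncratic noise in the training votes, so the test error rises again.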

17
Knowledge networks in biology
  • Interacting biomolecules: the key-and-lock principle
  • Matrix of interactions (binding energies): Ω_IJ =
    k_I · l_J + l_I · k_J
  • The matchmaker (a bioinformatics researcher) tries to
    guess yet-unknown interactions based on the
    pattern of known ones
  • Many experiments measure S_IJ = Θ(Ω_IJ − Ω_th)

[Figure: two proteins with key vectors k(1), k(2) and lock vectors l(1), l(2)]
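The key-and-lock model can be sketched in a few lines; the sizes and the threshold value below are illustrative assumptions:

```python
import numpy as np

# Key-and-lock sketch: each protein I carries a key vector k_I and a
# lock vector l_I; the binding energy is
# Omega_IJ = k_I . l_J + l_I . k_J (symmetric), and an experiment only
# reports whether it exceeds a detection threshold.
rng = np.random.default_rng(11)
n_proteins, M = 6, 3
k = rng.standard_normal((n_proteins, M))   # keys
l = rng.standard_normal((n_proteins, M))   # locks

omega = k @ l.T + l @ k.T                  # symmetric binding energies
omega_th = 1.0                             # detection threshold (assumed)
S = (omega > omega_th).astype(int)         # S_IJ = Theta(Omega_IJ - Omega_th)
print(S)
```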
18
Real systems
  • Internet commerce: the dataset of opinions on
    movies collected by the Compaq Systems Research
    Center
  • 72,916 users entered a total of 2,811,983 numeric
    ratings ( to ) for 1,628 different movies;
    M_eff ≈ 40
  • The default dataset for collaborative filtering research
  • Biology: the table of interactions between yeast
    proteins from the Ito et al. high-throughput
    two-hybrid experiment
  • 6,000 proteins (3,300 have at least one
    interaction partner) and 4,400 known interactions
  • Binary (interact or not)
  • M_eff ≈ 1: too small!
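The M_eff values quoted above are consistent with the estimate M_eff ~ K/N from the "How to determine M?" slide; a quick back-of-the-envelope check (my arithmetic, using the dataset sizes quoted above):

```python
# M_eff ~ K / N, with N the total number of vertices.

# Movie ratings: 2,811,983 ratings by 72,916 users of 1,628 movies,
# so N = N_readers + N_books.
m_eff_movies = 2_811_983 / (72_916 + 1_628)
print(round(m_eff_movies))      # ~ 38, consistent with M_eff ≈ 40

# Yeast two-hybrid data: 4,400 interactions among ~6,000 proteins
# (one vector per protein, so N is just the number of proteins).
m_eff_yeast = 4_400 / 6_000
print(round(m_eff_yeast, 2))    # ~ 0.73, i.e. of order 1 — too small
```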

19
Yeast Protein Interaction Network
  • Data from T. Ito, et al. PNAS (2001)
  • Full set contains 4549 interactions among 3278
    yeast proteins
  • Here are shown only nuclear proteins interacting
    with at least one other nuclear protein

20
Correlations in connectivities
  • Basic design principles of the network can be
    revealed by comparing the frequency of a pattern
    in the real and in randomized networks
  • P(k0,k1): the probability that nodes with
    connectivities k0 and k1 directly interact
  • It should be normalized by P_r(k0,k1), the same
    property in a randomized network in which:
  • Each node keeps the same number of neighbors
    (connectivity)
  • These neighbors are randomly selected
  • The whole ensemble of such random networks can be
    generated
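The randomization step above can be sketched with degree-preserving edge swaps (all code below is my own illustration on a toy network): count how often nodes of connectivities (k0, k1) are directly linked in the real network, then in networks rewired by repeated edge-pair swaps, which keep every node's connectivity.

```python
import random
from collections import Counter

def degree_pair_counts(edges, degree):
    # Count linked connectivity pairs (k0, k1), unordered.
    counts = Counter()
    for a, b in edges:
        counts[tuple(sorted((degree[a], degree[b])))] += 1
    return counts

def rewire(edges, n_swaps, rng):
    # Randomize by partner swaps (a,b),(c,d) -> (a,d),(c,b), skipping
    # swaps that would create self-loops or duplicate edges; every
    # node's degree is preserved.
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
degree = Counter(n for e in edges for n in e)
rng = random.Random(0)
real = degree_pair_counts(edges, degree)
randomized = degree_pair_counts(rewire(edges, 100, rng), degree)
print(real)
print(randomized)   # same degrees and edge count, reshuffled pairings
```

Averaging `randomized` over many rewired copies gives P_r(k0,k1), the denominator of the correlation profile.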

21
Correlation profile of the protein interaction
network
  • P(k0,k1)/P_r(k0,k1)
  • Z(k0,k1) = (P(k0,k1) − P_r(k0,k1))/σ_r(k0,k1)
22
Correlation profile of the internet
23
What might it mean?
  • Hubs avoid each other (as in the internet: R.
    Pastor-Satorras et al., Phys. Rev. Lett. (2001))
  • Hubs prefer to connect to terminal ends (nodes
    of low connectivity)
  • Specificity: the network is organized into modules
    clustered around individual hubs
  • Stability: the number of second-nearest neighbors
    is suppressed → it is harder to propagate deleterious
    perturbations

24
Conclusion
  • Studying networks is similar to paleontology:
    learning about an organism from its backbone
  • You can learn a lot about a complex system from
    its network! But not everything

25
THE END
26
Entropy of unknown opinions
[Figure: entropy of the unknown opinions vs. the density of known opinions p, with transitions at p1 and p2]
27
How to determine p2?
  • K known elements of an N×N matrix Ω_IJ = r_I · b_J
    (N = N_r + N_b)
  • Approximately N × M degrees of freedom (minus
    M(M−1)/2 gauge parameters)
  • For K > MN all missing elements can be
    reconstructed → p_2 = K_2/(N(N−1)/2) ≈ 2M/N,
    with K_2 = MN

28
What is a knowledge network?
  • An undirected graph with N vertices and K edges
  • Each vertex has a (hidden) M-dimensional vector
    of tastes/features
  • Each edge carries the scalar product (opinion) of the
    vectors on the vertices it connects
  • The centralized matchmaker tries to guess the
    vectors (tastes) from their scalar products
    (opinions) and to predict unknown opinions

29
Versions of knowledge networks
  • Regular graph: every link is allowed. Example:
    recommending people to other people according to
    their areas of interest
  • Bipartite graphs. Example: customers and products
  • Non-reciprocal opinions: each vertex has two
    vectors d_I, q_I, so that Ω_IJ = d_I · q_J. Example: a real
    matchmaker recommending men to women.