Extracting hidden information from knowledge networks - PowerPoint PPT Presentation

About This Presentation

Title:

Extracting hidden information from knowledge networks

Description:

Extracting hidden information from knowledge networks. Sergei Maslov, Brookhaven National Laboratory, New York, USA

Slides: 30

Provided by: Serge177

Transcript and Presenter's Notes

1
Extracting hidden information from knowledge
networks

Sergei Maslov
Brookhaven National Laboratory, New York, USA

2
Outline of the talk

What is a knowledge network and how is it different from an ordinary graph or network?
Knowledge networks on the internet: matching products to customers
Knowledge networks in biology: large ensembles of interacting biomolecules
Empirical study of correlations in the network of interacting proteins
Collaborators: Y.-C. Zhang and K. Sneppen

3
Networks in complex systems

The network is the backbone of a complex system
It answers the question: who interacts with whom?
Examples:
Internet and WWW
Interacting biomolecules (metabolic, physical, regulatory)
Food webs in ecosystems
Economics: customers and products
Social: people and their choice of partners

4
Predicting tastes of customers based on their
opinions on products

Each of us has personal tastes
These tastes are sometimes unknown even to ourselves (hidden wants)
Information about them is contained in our opinions on products
Matchmaking customers with similar tastes can be used to predict future opinions
The internet makes it possible to do this on a large scale

5
Types of networks
[Figure: two panels, "Plain network" and "Knowledge or opinion network"; one group of nodes is labeled "readers"]
6
Storing opinions
Network of opinions
Matrix of opinions $\Omega_{IJ}$:
X X X 2 9 ? ?
X X X ? 8 ? 8
X X X ? ? 1 ?
2 ? ? X X X X
9 8 ? X X X X
? ? 1 X X X X
? 8 ? X X X X
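
A minimal sketch of how the example matrix above could be stored, assuming NumPy and reading X as a reader-reader or book-book entry that never holds an opinion, and ? as an allowed opinion that is not yet known. The names `ratings`, `allowed`, and `known` are illustrative and do not come from the slides.

```python
import numpy as np

n_readers, n_books = 3, 4
n = n_readers + n_books

# (reader, book) -> opinion, copied from the example matrix
ratings = {(0, 0): 2, (0, 1): 9, (1, 1): 8, (1, 3): 8, (2, 2): 1}

omega = np.full((n, n), np.nan)          # NaN means "no value stored"
allowed = np.zeros((n, n), dtype=bool)   # True only for reader-book pairs
allowed[:n_readers, n_readers:] = True
allowed[n_readers:, :n_readers] = True

for (reader, book), value in ratings.items():
    i, j = reader, n_readers + book      # books are indexed after the readers
    omega[i, j] = omega[j, i] = value    # the matrix is kept symmetric

known = ~np.isnan(omega)
density = known.sum() / allowed.sum()    # fraction of allowed entries that are known
print(density)
```

Keeping `allowed` separate from `known` distinguishes entries that can never be filled from entries that are merely missing.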
7
Using correlations to reconstruct customers' tastes

Similar opinions → similar tastes
Simplest model:
Readers → M-dimensional vector of tastes $\mathbf{r}_I$
Books → M-dimensional vector of features $\mathbf{b}_J$
Opinions → scalar product
$\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$
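
The scalar-product model above can be written in a few lines. This is only a sketch, assuming NumPy and Gaussian random vectors; the sizes and the names `readers`, `books`, and `opinions` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_readers, n_books, m = 5, 4, 3

readers = rng.standard_normal((n_readers, m))  # row I is the taste vector r_I
books = rng.standard_normal((n_books, m))      # row J is the feature vector b_J

opinions = readers @ books.T                   # opinions[I, J] = r_I . b_J
```

However many readers and books there are, the opinion matrix built this way has rank at most M, which is what makes prediction from a partial set of opinions possible.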

[Figure: network of customers and books with opinions written on the edges]

8
Loop correlation

Predictive power scales as $1/M^{(L-1)/2}$
One needs many loops to completely freeze the mutual orientation of the vectors

9
Field Theory Approach

If all components of the vectors are Gaussian and uncorrelated:

The generating functional is $\det(1 + i\,?)^{-M/2}$
All irreducible correlations are proportional to M
All loop correlations: $\langle \Omega_{12}\,\Omega_{23}\,\Omega_{34} \cdots \Omega_{L1} \rangle = M$
Since each $\Omega_{IJ} \sim \sqrt{M}$, the sign correlation scales as $M^{-(L-1)/2}$
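
The statement that every loop correlation equals M can be checked numerically. The sketch below assumes NumPy and independent Gaussian components with unit variance; the function name `mean_loop_product` and the sample size are choices made for illustration. It checks only the value of the loop average, not the scaling of the sign correlation.

```python
import numpy as np

def mean_loop_product(m, loop_len, n_samples, rng):
    """Monte Carlo estimate of <Omega_12 Omega_23 ... Omega_L1>."""
    # vectors[s, v] is the M-dimensional vector on vertex v of sample s
    vectors = rng.standard_normal((n_samples, loop_len, m))
    # shift by one vertex so that the last vertex is paired with the first
    neighbours = np.roll(vectors, -1, axis=1)
    edges = np.sum(vectors * neighbours, axis=2)   # one scalar product per edge
    return np.prod(edges, axis=1).mean()

rng = np.random.default_rng(0)
estimate = mean_loop_product(m=2, loop_len=3, n_samples=400_000, rng=rng)
print(estimate)   # expected to be close to M = 2, up to sampling noise
```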

10
Main parameter: density of edges

The larger the density of edges p, the easier the prediction
At $p_1 \approx 1/N$ ($N = N_{\mathrm{readers}} + N_{\mathrm{books}}$) macroscopic prediction becomes possible. Nodes are connected but the vectors $\mathbf{r}_I$, $\mathbf{b}_J$ are not fixed: ordinary percolation threshold
At $p_2 \approx 2M/N > p_1$ all tastes and features ($\mathbf{r}_I$ and $\mathbf{b}_J$) can be uniquely reconstructed: rigidity percolation threshold
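
As a small worked example of the two thresholds, the sketch below evaluates 1/N and 2M/N for a hypothetical system of 600 readers, 400 books, and M = 10. The function names and the regime labels are illustrative only.

```python
def thresholds(n_readers, n_books, m):
    """Return (p1, p2): ordinary and rigidity percolation estimates."""
    n = n_readers + n_books
    return 1.0 / n, 2.0 * m / n

def regime(p, p1, p2):
    """Classify an edge density p relative to the two thresholds."""
    if p < p1:
        return "no macroscopic prediction"
    if p < p2:
        return "partial prediction"
    return "unique reconstruction"

p1, p2 = thresholds(n_readers=600, n_books=400, m=10)
print(p1, p2, regime(0.01, p1, p2))
```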

11
Spectral properties of $\Omega$

For $M < N$ the matrix $\Omega_{IJ}$ has $N-M$ zero eigenvalues and $M$ positive ones: $\Omega = R \cdot R^{\dagger}$.
Using SVD one can decompose $R = U \cdot D \cdot V^{\dagger}$ such that the matrices $V$ and $U$ are orthogonal, $V^{\dagger} \cdot V = 1$, $U^{\dagger} \cdot U = 1$, and $D$ is diagonal. Then $\Omega = U \cdot D^2 \cdot U^{\dagger}$
The amount of information contained in $\Omega$ is $NM - M(M-1)/2 \ll N(N-1)/2$, the number of off-diagonal elements
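
These spectral statements are easy to verify on a small random example. The sketch assumes NumPy, real vectors (so the dagger is a plain transpose), and hypothetical sizes N = 8, M = 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
r = rng.standard_normal((n, m))        # row I is the hidden vector of vertex I
omega = r @ r.T                        # Omega = R R^T, diagonal included

eigenvalues = np.linalg.eigvalsh(omega)
n_positive = int(np.sum(eigenvalues > 1e-9))       # the rest vanish numerically

u, d, vt = np.linalg.svd(r, full_matrices=False)   # R = U D V^T
omega_from_svd = (u * d**2) @ u.T                  # U D^2 U^T

n_parameters = n * m - m * (m - 1) // 2            # information content of Omega
n_off_diagonal = n * (n - 1) // 2
print(n_positive, n_parameters, n_off_diagonal)
```

For sizes this small the two counts are comparable; the strong inequality on the slide applies when N is much larger than M.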

12
Practical recursive algorithm for predicting unknown opinions

Start with $\Omega_0$, in which all unknown elements are filled with $\langle \Omega \rangle$ (zero in our case)
Diagonalize and keep only the M largest eigenvalues and their eigenvectors
In the resulting truncated matrix $\Omega'_0$ replace all known elements with their exact values, then repeat from the diagonalization step
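
The steps above can be sketched as follows, assuming NumPy, a symmetric opinion matrix, and a boolean mask `known` marking the measured entries. The function name `reconstruct`, the fixed iteration count `n_iter`, and the synthetic test data are assumptions made for illustration rather than details from the talk; the diagonal is treated as unknown.

```python
import numpy as np

def reconstruct(omega, known, m, n_iter=200):
    """Fill the unknown entries of a symmetric matrix by repeated rank-m truncation."""
    estimate = np.where(known, omega, 0.0)            # unknowns start at <Omega> = 0
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(estimate)         # symmetric eigendecomposition
        top = np.argsort(vals)[-m:]                   # indices of the m largest eigenvalues
        truncated = (vecs[:, top] * vals[top]) @ vecs[:, top].T
        estimate = np.where(known, omega, truncated)  # restore the exact known values
    return estimate

# Synthetic example: hidden 2-dimensional vectors on 30 vertices
rng = np.random.default_rng(0)
n, m, p = 30, 2, 0.7
r = rng.standard_normal((n, m))
omega = r @ r.T

upper = np.triu(rng.random((n, n)) < p, k=1)          # random known pairs with I < J
known = upper | upper.T                               # symmetric mask, diagonal unknown

estimate = reconstruct(omega, known, m)
unknown = ~known
err_start = np.sqrt(np.mean(omega[unknown] ** 2))     # error of the zero-filled guess
err_final = np.sqrt(np.mean((estimate - omega)[unknown] ** 2))
print(err_start, err_final)
```

In this example the density of known opinions, 0.7, is well above the estimate $2M/N \approx 0.13$ for the second threshold, so the error on the unknown entries is expected to shrink as the iteration proceeds.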

13
Convergence of the algorithm

Above $p_2$ the algorithm converges exponentially to the exact values of the unknown elements
The rate of convergence scales as $(p - p_2)^2$

14
Reality check: sources of errors

Customers are not rational! $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J + \epsilon_{IJ}$ (idiosyncrasy)
Opinions are delivered to the matchmaker through a narrow channel:
Binary channel: $S_{IJ} = \mathrm{sign}(\Omega_{IJ})$, recorded as 1 or 0 (liked or not)
Experience rated on a scale of 1 to 5, or 1 to 10 at best
If the number of edges K and the size N are large while M is small, these errors can be reduced
15
How to determine M?

In real systems M is not fixed: there are always finer and finer details of tastes
Given the number of known opinions K, one should choose $M_{\mathrm{eff}} \approx K/(N_{\mathrm{readers}} + N_{\mathrm{books}})$ so that systems are below the second transition $p_2$ → tastes should be determined hierarchically

16
Avoid overfitting

Divide known votes into training and test sets
Select $M_{\mathrm{eff}}$ so as to avoid overfitting!
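
One way to carry out this selection is sketched below, assuming NumPy, synthetic opinions with a small noise term, and an 80/20 split of the known votes. The reconstruction routine is a rank-truncation iteration written for this example; the helper names `reconstruct` and `pair_mask` and all numerical settings are hypothetical.

```python
import numpy as np

def reconstruct(omega, known, m, n_iter=200):
    """Fill unknown entries by repeated rank-m truncation (illustrative sketch)."""
    estimate = np.where(known, omega, 0.0)
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(estimate)
        top = np.argsort(vals)[-m:]
        truncated = (vecs[:, top] * vals[top]) @ vecs[:, top].T
        estimate = np.where(known, omega, truncated)
    return estimate

def pair_mask(n, rows, cols):
    """Symmetric boolean mask that is True at the given vertex pairs."""
    mask = np.zeros((n, n), dtype=bool)
    mask[rows, cols] = True
    mask[cols, rows] = True
    return mask

rng = np.random.default_rng(1)
n, true_m = 30, 2
r = rng.standard_normal((n, true_m))
noise = 0.1 * rng.standard_normal((n, n))
omega = r @ r.T + (noise + noise.T) / 2           # opinions with small idiosyncrasy

rows, cols = np.triu_indices(n, k=1)              # all vertex pairs with I < J
voted = np.flatnonzero(rng.random(rows.size) < 0.7)   # pairs with a known vote
rng.shuffle(voted)
n_test = voted.size // 5
test, train = voted[:n_test], voted[n_test:]      # hold out 20 % of the votes

train_mask = pair_mask(n, rows[train], cols[train])
errors = {}
for m in range(1, 5):                             # candidate values of M_eff
    estimate = reconstruct(omega, train_mask, m)
    diff = estimate[rows[test], cols[test]] - omega[rows[test], cols[test]]
    errors[m] = float(np.sqrt(np.mean(diff ** 2)))

best_m = min(errors, key=errors.get)              # smallest error on held-out votes
print(errors, best_m)
```

The held-out votes are never shown to the reconstruction, so a value of M that merely memorizes the training votes is penalized by a larger test error.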

17
Knowledge networks in biology

Interacting biomolecules: key and lock principle
Matrix of interactions (binding energies): $\Omega_{IJ} = \mathbf{k}_I \cdot \mathbf{l}_J + \mathbf{l}_I \cdot \mathbf{k}_J$
The matchmaker (a bioinformatics researcher) tries to guess yet unknown interactions based on the pattern of known ones
Many experiments measure $S_{IJ} = \theta(\Omega_{IJ} - \Omega_{\mathrm{th}})$
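
A short sketch of the key-and-lock model and of a thresholded measurement, assuming NumPy and random key and lock vectors. The threshold value `omega_th = 1.0` and the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
keys = rng.standard_normal((n, m))        # row I is the key vector k_I
locks = rng.standard_normal((n, m))       # row I is the lock vector l_I

omega = keys @ locks.T + locks @ keys.T   # Omega_IJ = k_I.l_J + l_I.k_J
omega_th = 1.0                            # hypothetical detection threshold
detected = omega > omega_th               # S_IJ = theta(Omega_IJ - Omega_th)
```

Adding the two terms makes the binding matrix symmetric, so the detected interactions form an undirected network.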

[Figure: two biomolecules with keys k(1), k(2) and locks l(1), l(2)]
18
Real systems

Internet commerce: the dataset of opinions on movies collected by the Compaq Systems Research Center
72916 users entered a total of 2811983 numeric ratings ( to ) for 1628 different movies
$M_{\mathrm{eff}} \approx 40$
Default set for collaborative filtering research
Biology: table of interactions between yeast proteins from the Ito et al. high-throughput two-hybrid experiment
6000 proteins (3300 have at least one interaction partner) and 4400 known interactions
Binary (interact or not)
$M_{\mathrm{eff}} \approx 1$: too small!
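
The two quoted values of the effective dimension can be reproduced from the figures above, assuming the estimate is the number of known opinions K divided by the total number of vertices. The function name `effective_dimension` is illustrative.

```python
def effective_dimension(n_known, n_vertices):
    """Rough estimate of M_eff as known opinions per vertex."""
    return n_known / n_vertices

movies = effective_dimension(2811983, 72916 + 1628)   # users plus movies
yeast_all = effective_dimension(4400, 6000)           # all proteins
yeast_connected = effective_dimension(4400, 3300)     # proteins with a partner
print(movies, yeast_all, yeast_connected)
```

The movie dataset gives a value slightly below 40, while the yeast data give a value of order 1 whichever protein count is used.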

19
Yeast Protein Interaction Network

Data from T. Ito et al., PNAS (2001)
The full set contains 4549 interactions among 3278 yeast proteins
Shown here are only nuclear proteins interacting with at least one other nuclear protein

20
Correlations in connectivities

Basic design principles of the network can be revealed by comparing the frequency of a pattern in real and random networks
$P(k_0,k_1)$: probability that nodes with connectivities $k_0$ and $k_1$ directly interact
It should be normalized by $P_r(k_0,k_1)$: the same property in a randomized network such that:
Each node has the same number of neighbors (connectivity)
These neighbors are randomly selected
The whole ensemble of random networks can be generated
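
One common way to generate such a randomized network is repeated edge swapping, which keeps every node's connectivity fixed. The slides do not say which procedure was used, so the sketch below is an assumed implementation in plain Python; the function names and the small example graph are hypothetical.

```python
import random
from collections import Counter

def randomize(edges, n_swaps, seed=0):
    """Rewire an undirected simple graph while preserving all node degrees."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                       # pick one of the two possible pairings
        if a == d or c == b:
            continue                          # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue                          # the swap would create a double edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)   # a-b, c-d become a-d, c-b
        done += 1
    return edges

def degrees(edges):
    """Connectivity of every node."""
    return Counter(node for edge in edges for node in edge)

original = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 5), (4, 6), (5, 6), (6, 7)]
shuffled = randomize(original, n_swaps=20)
print(shuffled)
```

Calling `randomize` with different seeds produces different members of the random ensemble, all with the same connectivities as the original network.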

21
Correlation profile of the protein interaction
network
$P(k_0,k_1)/P_r(k_0,k_1)$
$Z(k_0,k_1) = (P(k_0,k_1) - P_r(k_0,k_1))/\sigma_r(k_0,k_1)$
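
A minimal sketch of the two quantities for a single pair of connectivities, assuming NumPy and working with edge counts instead of probabilities (the two differ only by a common normalization when all networks have the same number of edges). The counts in the example are made up for illustration.

```python
import numpy as np

def correlation_profile(real_count, random_counts):
    """Return the ratio P/P_r and the Z-score (P - P_r)/sigma_r."""
    random_counts = np.asarray(random_counts, dtype=float)
    mean_r = random_counts.mean()             # average over the random ensemble
    sigma_r = random_counts.std()             # spread over the random ensemble
    return real_count / mean_r, (real_count - mean_r) / sigma_r

# Hypothetical counts: 12 edges in the real network, four randomized samples
ratio, z = correlation_profile(12, [20, 22, 18, 20])
print(ratio, z)
```

A ratio below 1 together with a strongly negative Z-score indicates a pattern that is suppressed in the real network relative to the random ensemble.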
22
Correlation profile of the internet
23
What might it mean?

Hubs avoid each other (as in the internet: R. Pastor-Satorras et al., Phys. Rev. Lett. (2001))
Hubs prefer to connect to terminal ends (low-connectivity nodes)
Specificity: the network is organized in modules clustered around individual hubs
Stability: the number of second-nearest neighbors is suppressed → harder to propagate deleterious perturbations

24
Conclusion

Studies of networks are similar to paleontology: learning about an organism from its backbone
You can learn a lot about a complex system from its network! But not everything

25
THE END
26
Entropy of unknown opinions
[Figure: entropy (vertical axis) versus density of known opinions p (horizontal axis), with p1 and p2 marked]
27
How to determine p2?

K known elements of an N×N matrix $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$ ($N = N_r + N_b$)
Approximately $N \times M$ degrees of freedom (minus $M(M-1)/2$ gauge parameters)
For $K > MN$ all missing elements can be reconstructed → $p_2 = K_2/(N(N-1)/2) \approx 2M/N$
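
Written out step by step, and assuming that $K_2$ stands for the number of known elements at the threshold, $K_2 = MN$, the counting argument gives:

```latex
p_2 \;=\; \frac{K_2}{N(N-1)/2}
    \;=\; \frac{MN}{N(N-1)/2}
    \;=\; \frac{2M}{N-1}
    \;\approx\; \frac{2M}{N}
    \qquad (N \gg 1)
```

The small correction from the $M(M-1)/2$ gauge parameters is neglected here, as in the estimate above.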

28
What is a knowledge network?

Undirected graph with N vertices and K edges
Each vertex has a (hidden) M-dimensional vector of tastes/features
Each edge carries a scalar product (opinion) of the vectors on the vertices it connects
The centralized matchmaker tries to guess the vectors (tastes) based on their scalar products (opinions) and to predict unknown opinions

29
Versions of knowledge networks

Regular graph: every link is allowed. Example: recommending people to other people according to their areas of interest
Bipartite graphs. Example: customers to products
Non-reciprocal opinions: each vertex has two vectors $\mathbf{d}_I$, $\mathbf{q}_I$ so that $\Omega_{IJ} = \mathbf{d}_I \cdot \mathbf{q}_J$. Example: a real matchmaker recommending men to women.
