Title: Modeling Distances in Large-Scale Networks by Matrix Factorization
1Modeling Distances in Large-Scale Networks by
Matrix Factorization
- Yun Mao
- CIS dept, University of Pennsylvania
- October, 2004
- Joint work with Prof. Lawrence Saul
2Motivation
- Network distances are useful
- Content distribution networks
- Peer-to-peer networks
- Overlay routing, multicast
- Network games
- Measurement comes at a cost
- Time, bandwidth
- Is there a map, or something similar, that lets us estimate distances without measurement?
3Sorry, no global routing maps (yet)
4Our goal
Node1 location1
Node2 location2
Node3 location3
5Outline
- What has been done?
- Matrix factorization model
- Internet Distance Estimation Service
- Evaluation
- Conclusions
6What is network distance?
- Round trip latency
- Symmetric
- Relatively stable
- Triangle inequality violation
- Bandwidth, loss rate
- Not really distance, but useful
- Asymmetric
- This talk: RTT (unless otherwise specified)
7State of the art
- Euclidean Embedding
- Each host has a coordinate
- Network distances are estimated as Euclidean distances
- Known systems
- GNP (IMW01, INFOCOM02)
- Simplex downhill
- ICS, Virtual Landmark (IMC03)
- Lipschitz PCA
- Vivaldi (HotNets03, SIGCOMM04)
- Spring energy simulation, height extension
- Many others...
8Shared limitations
- Triangle inequality violations in RTT metrics
- Symmetric constraint
- Not suitable for complicated metrics
- Even if the triangle inequality and symmetry properties hold, increasing dimensionality doesn't improve accuracy in many cases
9Simple topologies that don't have exact embeddings
[Figure: four hosts H1-H4 with link lengths of 1, next to one possible 2-D embedding that places the hosts at (0.5, 0.5), (-0.5, 0.5), (0.5, -0.5) and (-0.5, -0.5) around the origin (0, 0).]
The estimated distance between H1 and H4 is 1.414, while the real distance is 2.0. Extra dimensions don't help.
Is there a better model?
[Figure: another tree topology example.]
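The 1.414 figure on this slide can be checked by hand or with a few lines of code. The sketch below assumes H1 and H4 sit at opposite corners of the unit square used in the embedding; the assignment of hosts to coordinates and the name `embedding` are assumptions for illustration, since the slide text does not say which host gets which corner.

```python
import math

# Assumed assignment of hosts to the four corner coordinates from the slide.
embedding = {
    "H1": (-0.5, 0.5),
    "H2": (0.5, 0.5),
    "H3": (-0.5, -0.5),
    "H4": (0.5, -0.5),
}


def euclidean(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


estimated = euclidean(embedding["H1"], embedding["H4"])
real = 2.0  # real H1-H4 distance as stated on the slide
print(f"estimated={estimated:.3f}, real={real}")
```

The estimate is the diagonal of a unit square, which is the square root of 2, so it falls short of the real distance of 2.0.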
10An algebraic perspective
- Distance matrix: $D_{ij}$ is the distance from host i to host j
- Each host i has two vectors, $X_i$ and $Y_i$
- The distance function is the dot product: $D_{ij} \approx X_i \cdot Y_j$
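As a rough illustration of the dot-product model, the sketch below estimates the distance from host i to host j as the dot product of host i's first vector with host j's second vector. The vector values and the dimension (2) are made up for the example; only the dot-product rule comes from the slide.

```python
import numpy as np

# Hypothetical vectors for 3 hosts in 2 dimensions; the numbers are made up.
X = np.array([[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]])  # X_i, one row per host
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # Y_i, one row per host


def estimate(i, j):
    """Estimated distance from host i to host j: dot product of X_i and Y_j."""
    return float(X[i] @ Y[j])


D_hat = X @ Y.T  # all pairwise estimates at once

print(estimate(0, 1), estimate(1, 0))
```

With these made-up vectors the estimate from host 0 to host 1 differs from the estimate in the opposite direction, so the model is not forced to be symmetric.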
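11Comparison
- Euclidean embedding: each host has a coordinate; the distance function is the Euclidean distance
- Matrix factorization: each host has an outgoing vector and an incoming vector; the distance function is the dot product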
12Questions
- How to factorize a matrix (D) into two (much) smaller matrices (X, Y)?
- Accuracy? Dimension d?
- How to build a system to assign outgoing/incoming vectors to Internet-scale networks?
13Algorithm 1: Singular Value Decomposition (SVD)
Simple, and can find a global minimum of the squared error $\sum_{i,j} (D_{ij} - X_i \cdot Y_j)^2$
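A minimal sketch of factorizing a distance matrix with a truncated SVD follows. Splitting the singular values evenly between the two factors is one possible convention and is an assumption here, as are the function name `svd_factorize` and the 4-host distance matrix, which assumes a topology where neighbouring hosts are 1 apart and H1 and H4 are 2 apart.

```python
import numpy as np


def svd_factorize(D, d):
    """Return X, Y (n x d each) such that X @ Y.T is the best rank-d fit to D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.sqrt(s[:d])      # split each singular value between X and Y
    X = U[:, :d] * root        # rows act as the first (outgoing) vectors
    Y = Vt[:d, :].T * root     # rows act as the second (incoming) vectors
    return X, Y


# Assumed distances among H1..H4: neighbours are 1 apart, H1-H4 and H2-H3 are 2 apart.
D = np.array([
    [0, 1, 1, 2],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [2, 1, 1, 0],
], dtype=float)

X, Y = svd_factorize(D, d=3)
max_err = np.abs(D - X @ Y.T).max()
print(f"max reconstruction error with d=3: {max_err:.2e}")
```

This assumed matrix has rank 3, so three dimensions reproduce it up to floating-point error, including the H1 to H4 distance of 2.0.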
14Algorithm 2: Nonnegative Matrix Factorization (NMF)
- Can handle a distance matrix with missing elements, using our simple extension
- Non-negative constraint
- Iterative method; converges to local minima quickly
- See paper for details
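To make the idea of factorizing a matrix with missing elements concrete, here is a sketch of NMF with multiplicative updates restricted to the observed entries. This is a generic masked variant and not necessarily the extension described in the paper; the function name `masked_nmf`, the mask `M`, the iteration count and the example matrix are all assumptions for illustration.

```python
import numpy as np


def masked_nmf(D, M, d, iters=200, seed=0, eps=1e-9):
    """Factor D ~ X @ Y.T with X, Y >= 0, fitting only entries where M == 1."""
    rng = np.random.default_rng(seed)
    n, m = D.shape
    X = rng.random((n, d)) + 0.1   # strictly positive random start
    Y = rng.random((m, d)) + 0.1
    MD = M * D                     # zero out the unobserved entries
    errors = []
    for _ in range(iters):
        # Multiplicative updates keep every entry of X and Y non-negative.
        X *= (MD @ Y) / ((M * (X @ Y.T)) @ Y + eps)
        Y *= (MD.T @ X) / ((M * (X @ Y.T)).T @ X + eps)
        errors.append(float(np.sum(M * (D - X @ Y.T) ** 2)))
    return X, Y, errors


# Assumed 4-host distance matrix; the H1-H4 pair is treated as unmeasured.
D = np.array([
    [0, 1, 1, 2],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [2, 1, 1, 0],
], dtype=float)
M = np.ones_like(D)
M[0, 3] = M[3, 0] = 0.0

X, Y, errors = masked_nmf(D, M, d=3)
print(f"masked squared error: first={errors[0]:.4f}, last={errors[-1]:.4f}")
```

Because the method only reaches a local minimum, the final error depends on the random start.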
15Do they work?
- Data sets
- NLANR AMP
- 100x100. Hosts were mostly in the US; distance is minimum RTT ping time.
- P2PSim
- 1000x1000. Hosts were obtained from an Internet-scale Gnutella trace. Distances were collected by the King method (IMW02).
- Comparison
- SVD and NMF, based on the matrix factorization model
- Lipschitz PCA, based on the Euclidean embedding model (used in ICS and VL)
- Error function
16Accuracy comparison
17So far
- Matrix factorization model seems promising
- Accurate, flexible
- How to build a scalable system on this model?
18IDES: Internet Distance Estimation Service
(Xnew,Ynew)?
(Xnew2,Ynew2)?
The distances are not measured, but can be
predicted
19Vectors of ordinary hosts
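- We hope $D_{i,\mathrm{new}} \approx X_i \cdot Y_{\mathrm{new}}$ and $D_{\mathrm{new},i} \approx X_{\mathrm{new}} \cdot Y_i$ (for all landmarks i)
- So, minimize the least-squares error over the landmarks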
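The least-squares step for an ordinary host can be sketched as two small linear least-squares problems, one per vector. The sketch assumes the host measures its distances to and from every landmark and that the landmark vectors are already known; the function name `place_new_host` and all the test data are hypothetical.

```python
import numpy as np


def place_new_host(X_land, Y_land, d_out, d_in):
    """Fit the new host's two vectors from its distances to/from the landmarks.

    d_out[i]: measured distance from the new host to landmark i (~ X_new . Y_i)
    d_in[i]:  measured distance from landmark i to the new host (~ X_i . Y_new)
    """
    X_new, *_ = np.linalg.lstsq(Y_land, d_out, rcond=None)
    Y_new, *_ = np.linalg.lstsq(X_land, d_in, rcond=None)
    return X_new, Y_new


# Made-up example: 6 landmarks, 3 dimensions, noise-free measurements.
rng = np.random.default_rng(0)
X_land = rng.random((6, 3))
Y_land = rng.random((6, 3))
x_true, y_true = rng.random(3), rng.random(3)
d_out = Y_land @ x_true   # new host -> each landmark
d_in = X_land @ y_true    # each landmark -> new host

X_new, Y_new = place_new_host(X_land, Y_land, d_out, d_in)
print(np.allclose(X_new, x_true), np.allclose(Y_new, y_true))
```

Once two ordinary hosts each have their vectors, the distance between them can be predicted with a dot product, without measuring it directly.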
20Practical concerns
- What if some landmarks fail?
- How to reduce the load on the landmarks?
- How many dimensions?
- How scalable, robust is IDES?
- Answered in our paper
21Evaluation: efficiency and accuracy
- Datasets
- Landmarks were selected randomly
- Comparison
- IDES/SVD and IDES/NMF
- Euclidean embedding
- GNP
- ICS
22Efficiency
Environment: Pentium 4 3.2 GHz CPU, 2 GB memory. GNP is obtained from the original release, written in C. IDES and ICS are implemented in MATLAB.
23Accuracy
P2PSim dataset (d = 8)
AMP dataset (d = 8)
24Conclusions
- A new model based on matrix factorization
- Simple to implement
- No constraints on network distances
- Two algorithms: SVD, NMF
- Internet Distance Estimation Service
- Efficient, accurate, robust.
25Thank you! Questions?