Title: I2.2: Analysis of significant substructures in time-varying networks
I2.2 Analysis of significant substructures in time-varying networks
Ambuj Singh (in collaboration with P. Bogdanov, M. Mongiovi, X. Yan)
NS-CTA INARC Mid-Year Review, March 2011
03/22/11
Dynamic networks
- Dynamic networks are commonplace
- online interaction networks
- Twitter, Wikipedia, LinkedIn, Facebook, ..
- mobile networks
- Cyber-physical scenario (EDIN, INARC)
- virus propagation (E2.1)
- Generative models to explain the network structure
- preferential attachment (Barabasi '99)
- forest-fire (Leskovec '09)
- Markov chain models (discrete, continuous)
- when, where, and what changes (Avin '08, Clementi '09)
- Latent space / context models (Zheng '05)
- Network flow/traffic (Daganzo '94, Bickel '01, Stoev '09)
- Disease propagation, blog cascades, SIS (Leskovec '07)
- Stochastic actor-based models (Snijders '09)
Our focus
- Dynamic edge attributes
- Simplest case: each edge value is 1 or -1
- 1 marks a flow of interest, e.g., congestion or flow above a historical threshold
- Real values are the general case and can also be handled
- Query: find the highest-scoring substructures in the graph over time
- Combines graph structure and time
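As a minimal sketch of the simplest case, assuming the "historical threshold" is each edge's historical mean measurement (the slides do not say how the threshold is chosen), raw edge measurements can be mapped to F_t(e) in {-1, 1} as follows. The function `binarize` and the sample numbers are hypothetical.

```python
from statistics import mean

def binarize(flow, history):
    """Map raw per-edge measurements to F_t(e) in {-1, 1}.

    flow:    edge -> list of measurements, one per time step
    history: edge -> list of past measurements used to set the threshold
    Assumption: the threshold is the edge's historical mean.
    """
    F = {}
    for e, series in flow.items():
        threshold = mean(history[e])
        # 1 marks a "flow of interest" (strictly above threshold), -1 otherwise
        F[e] = [1 if x > threshold else -1 for x in series]
    return F

flow = {("a", "b"): [15, 25, 20], ("b", "c"): [5, 9, 12]}
history = {("a", "b"): [10, 20, 30], ("b", "c"): [8, 8, 8]}
F = binarize(flow, history)
print(F)  # {('a', 'b'): [-1, 1, -1], ('b', 'c'): [-1, 1, 1]}
```

For the real-valued variant, one option would be to keep `x - threshold` instead of its sign.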
Motivation: traffic congestion
Re-tweet rate of music on Twitter
- Motivation
- Problem definition
- Solving for a fixed time interval
- Heuristic for multiple time intervals
- Path forward
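- A time-evolving graph G = (V, E, F_t(e))
- V: set of nodes
- E: set of edges
- F_t(e): mapping of edges to {-1, 1} at time t
- Score of an edge e in the interval [t1, t2]: $\mathrm{score}(e) = \sum_{t=t_1}^{t_2} F_t(e)$
- Score of a subgraph S in the interval: $\mathrm{score}(S) = \sum_{e \in S} \mathrm{score}(e)$, summed over all edges e in the subgraph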
[Figure: example time-evolving graph; each edge is labeled 1 or -1 at time steps t1, t2, t3, t4]
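The interval and subgraph scores defined above can be sketched in a few lines of Python. Edges are keyed by endpoint pairs and `F[e][t]` holds F_t(e); time steps are 0-based indices here, and the edge values are made up for illustration rather than taken from the figure.

```python
def edge_score(F, e, t1, t2):
    """score(e) over [t1, t2]: sum of F_t(e) for t1 <= t <= t2 (inclusive)."""
    return sum(F[e][t1:t2 + 1])

def subgraph_score(F, edges, t1, t2):
    """score(S): sum of the interval scores of all edges in the subgraph."""
    return sum(edge_score(F, e, t1, t2) for e in edges)

# Hypothetical edge values F_t(e) for four time steps
F = {
    ("a", "b"): [1, 1, -1, 1],
    ("b", "c"): [1, -1, 1, 1],
    ("c", "d"): [-1, -1, -1, 1],
}
print(edge_score(F, ("a", "b"), 0, 1))                    # 2
print(subgraph_score(F, [("a", "b"), ("b", "c")], 0, 3))  # 2 + 2 = 4
print(subgraph_score(F, F.keys(), 3, 3))                  # 3
```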
Prize-collecting Steiner Tree (PCST)
- Given a graph G(V, E) with positive node weights (prizes) p(v) and nonnegative edge costs c(e), find a subtree T = (V_T, E_T) that optimizes one of two objectives:
- Goemans-Williamson minimization (GW-PCST): $GW(T) = \sum_{e \in E_T} c(e) + \sum_{v \notin V_T} p(v)$
- Net Worth maximization (NW-PCST): $NW(T) = \sum_{v \in V_T} p(v) - \sum_{e \in E_T} c(e)$
- Both are NP-hard and the objectives are equivalent: the same tree is optimal for both (Johnson00)
- GW-PCST has an approximation factor of 2 - 1/(n-1)
- The rooted version of NW-PCST is NP-hard to approximate within any constant factor (Feig01)
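The two objectives can be compared on a toy tree. For any subtree T, adding them gives $GW(T) + NW(T) = \sum_{v \in V} p(v)$, a constant, which is why minimizing GW and maximizing NW select the same optimal tree. The sketch below checks this identity; the graph, prizes, and costs are made up for illustration, and `gw_cost` / `nw_value` are hypothetical names.

```python
def gw_cost(tree_nodes, tree_edges, prize, cost):
    """GW objective: costs of the tree's edges plus prizes of nodes left out."""
    paid = sum(cost[e] for e in tree_edges)
    lost = sum(p for v, p in prize.items() if v not in tree_nodes)
    return paid + lost

def nw_value(tree_nodes, tree_edges, prize, cost):
    """NW objective: prizes of the tree's nodes minus costs of its edges."""
    return sum(prize[v] for v in tree_nodes) - sum(cost[e] for e in tree_edges)

prize = {"a": 5, "b": 1, "c": 4, "d": 2}
cost = {("a", "b"): 2, ("b", "c"): 1, ("c", "d"): 6}
nodes, edges = {"a", "b", "c"}, [("a", "b"), ("b", "c")]

gw = gw_cost(nodes, edges, prize, cost)   # (2 + 1) + 2 = 5
nw = nw_value(nodes, edges, prize, cost)  # (5 + 1 + 4) - 3 = 7
assert gw + nw == sum(prize.values())     # 12, whichever tree is chosen
```

Because the gap between GW and NW is a constant, an additive error is the same for both objectives, but a multiplicative guarantee for one does not carry over to the other.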
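Why doesn't the same guarantee hold for NW?
- In this specific example:
- GW-PCST: APX = 3(k-1), OPT = 2k, ratio OPT/APX = 2k/(3(k-1)), which tends to the constant 2/3
- NW-PCST: OPT = k, APX = 3, ratio OPT/APX = k/3, which grows without bound as k increases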
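[Figure: example graph with node prizes and edge costs (values 3, 2, 0, and k); OPT and APX mark the optimal and approximate solutions. The optimal solution is the whole graph.]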
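The numbers in this example are consistent with a total prize of 3k (2k + k for OPT, 3(k-1) + 3 for APX), as the identity GW(T) + NW(T) = sum of all prizes requires. The short calculation below, assuming that total, shows how the same absolute gap of k - 3 is small relative to the GW values but large relative to the NW optimum of k. `example_ratios` is a hypothetical helper.

```python
from fractions import Fraction

def example_ratios(k):
    """OPT/APX ratios for the slide's example, assuming a total prize of 3k (k > 1)."""
    total_prize = 3 * k
    gw_opt, gw_apx = 2 * k, 3 * (k - 1)
    # NW(T) = total prize - GW(T) for any tree T
    nw_opt, nw_apx = total_prize - gw_opt, total_prize - gw_apx
    return nw_opt, nw_apx, Fraction(gw_opt, gw_apx), Fraction(nw_opt, nw_apx)

for k in (10, 100, 1000):
    nw_opt, nw_apx, gw_ratio, nw_ratio = example_ratios(k)
    # the GW ratio approaches 2/3, the NW ratio grows like k/3
    print(k, nw_opt, nw_apx, float(gw_ratio), float(nw_ratio))
```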
Merge-and-refine approximation
- Merge nodes into clusters in a bottom-up fashion
- Build a shortest-path metric graph from the edge costs
- Merge triangle and star structures, considering both node values and interconnection cost
- Multiple refinement iterations
- Approximation quality
- $OPT < APX + c \cdot N(OPT)$, where N is the cost of interconnection
- Good approximation for instances with cheaply connected clusters of high-prize nodes
- Challenges
- Relatively high computational cost due to the all-pairs shortest-path computation
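The shortest-path metric graph that merge-and-refine starts from requires all-pairs shortest paths (APSP). As a minimal sketch, assuming undirected edges with nonnegative costs, the Floyd-Warshall algorithm below computes it in O(n^3) time, which illustrates why this step dominates the running time. The slides do not say which APSP algorithm is used, and `metric_closure` is a hypothetical name; this is only the APSP step, not merge-and-refine itself.

```python
import math

def metric_closure(nodes, cost):
    """All-pairs shortest-path distances (Floyd-Warshall) over an undirected graph."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), c in cost.items():
        i, j = idx[u], idx[v]
        d[i][j] = d[j][i] = min(d[i][j], c)
    for k in range(n):          # allow paths through intermediate node k
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == math.inf:
                continue
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return {(u, v): d[idx[u]][idx[v]] for u in nodes for v in nodes}

dist = metric_closure(["a", "b", "c", "d"],
                      {("a", "b"): 2, ("b", "c"): 1, ("c", "d"): 6, ("a", "c"): 5})
print(dist[("a", "c")], dist[("a", "d")])  # 3.0 9.0
```

For sparse graphs, running Dijkstra from every node is the usual alternative; either way the cost grows quickly with n, which matches the running-time observation on the next slide.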
An example
- Aggregate edge values within the interval
- Transform the edge-weighted graph into an NW-PCST instance
- Apply the merge-and-refine approximation
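The slides do not spell out the transformation to NW-PCST, so the following is only one plausible construction, labeled as an assumption: subdivide each edge with a new node that carries the edge's positive aggregate score as its prize, and split a negative aggregate score evenly as cost over the two half-edges. Under this hypothetical transform, a tree collects a positive edge's score as a prize and pays a negative edge's full magnitude when it crosses that edge. `aggregate` and `to_nw_pcst` are hypothetical names.

```python
def aggregate(F, t1, t2):
    """Aggregate each edge's values over the interval [t1, t2] (inclusive)."""
    return {e: sum(vals[t1:t2 + 1]) for e, vals in F.items()}

def to_nw_pcst(scores):
    """Hypothetical edge-to-node transform (an assumption, not the slides' method).

    Each edge (u, v) is subdivided by a new node m = ("mid", u, v):
    a positive aggregate score becomes the prize of m, a negative one
    becomes a cost split evenly over the half-edges (u, m) and (m, v).
    """
    prize, cost = {}, {}
    for (u, v), s in scores.items():
        m = ("mid", u, v)
        prize[m] = max(s, 0)
        prize.setdefault(u, 0)  # original nodes carry no prize
        prize.setdefault(v, 0)
        half = max(-s, 0) / 2
        cost[(u, m)] = half
        cost[(m, v)] = half
    return prize, cost

F = {("a", "b"): [1, 1, -1], ("b", "c"): [-1, -1, 1]}
scores = aggregate(F, 0, 1)        # {('a', 'b'): 2, ('b', 'c'): -2}
prize, cost = to_nw_pcst(scores)
```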
Running time of merge-and-refine
- APSP accounts for 90% of the approximation running time
- Takes more than a second for N = 360 for a single interval
Baseline solution across time
- Find the best subgraph over time by exhaustive enumeration
- Consider all O(T^2) intervals
- Apply the fixed-interval solution to each
- Take the best subgraph found over all intervals
- Polynomial cost, but impractical for real-world problems
- The highway system of Southern California has 4k edges with live-traffic measurements
- The Autonomous Systems (AS)-level Internet backbone has hundreds of thousands of links
- The baseline solution would not be practical for networks of this scale
- Need for scalable solutions of acceptable quality
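The exhaustive baseline can be sketched as two nested loops over all T(T+1)/2 intervals. Per-edge prefix sums (an implementation choice made here, not stated in the slides) make each interval's aggregation O(|E|). The fixed-interval solver is passed in as a function; `best_single_edge` is a hypothetical placeholder standing in for merge-and-refine, used only to make the example runnable.

```python
def best_over_all_intervals(F, solve):
    """Exhaustive baseline: solve every interval [t1, t2] and keep the best.

    F: edge -> list of T values in {-1, 1}
    solve: stand-in for the fixed-interval solver; takes edge -> interval score
           and returns (value, subgraph)
    """
    T = len(next(iter(F.values())))
    # prefix sums: interval score of e is prefix[e][t2 + 1] - prefix[e][t1]
    prefix = {}
    for e, vals in F.items():
        p = [0]
        for x in vals:
            p.append(p[-1] + x)
        prefix[e] = p
    best = (float("-inf"), None, None)
    for t1 in range(T):               # T * (T + 1) / 2 intervals in total
        for t2 in range(t1, T):
            scores = {e: p[t2 + 1] - p[t1] for e, p in prefix.items()}
            value, sub = solve(scores)
            if value > best[0]:
                best = (value, (t1, t2), sub)
    return best

def best_single_edge(scores):
    """Hypothetical placeholder solver: the single highest-scoring edge."""
    e = max(scores, key=scores.get)
    return scores[e], [e]

F = {("a", "b"): [-1, 1, 1, -1], ("b", "c"): [1, -1, 1, 1]}
print(best_over_all_intervals(F, best_single_edge))  # (2, (0, 3), [('b', 'c')])
```

The number of solver calls grows quadratically in T, which is what makes this baseline impractical at the scales listed above.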
Best-first approach using bounds
- Idea: reduce the number of calls to merge-and-refine
- Estimate solutions for different intervals
- Evaluate the most "promising" intervals first
- Prune intervals that cannot contain the best solution
- Bound the solution in an interval
- Cheap to compute
- Effective in terms of pruning power
- Best-first procedure
- Order intervals by their upper bound
- Prune infeasible intervals using the lower bound
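A minimal sketch of the best-first procedure, assuming cheap functions `upper` and `lower` that bound the best score in an interval and an expensive `solve` (a stand-in for merge-and-refine; all three names are hypothetical). Intervals whose upper bound falls below the largest lower bound are pruned up front, the rest are visited in decreasing upper-bound order, and the search stops once no remaining upper bound can beat the best value found. With an approximate solver this pruning is a heuristic rather than an exactness guarantee.

```python
import heapq

def best_first(intervals, upper, lower, solve):
    """Best-first search over intervals using cheap bounds.

    Returns (best value, best interval, number of solver calls).
    """
    ub = {iv: upper(iv) for iv in intervals}
    best_lb = max(lower(iv) for iv in ub)
    # prune: an interval whose UB is below some interval's LB cannot hold the best solution
    heap = [(-u, iv) for iv, u in ub.items() if u >= best_lb]
    heapq.heapify(heap)
    best_val, best_iv, calls = float("-inf"), None, 0
    while heap:
        neg_u, iv = heapq.heappop(heap)
        if -neg_u <= best_val:
            break  # remaining intervals cannot beat the incumbent
        val = solve(iv)
        calls += 1
        if val > best_val:
            best_val, best_iv = val, iv
    return best_val, best_iv, calls

# Made-up bounds and exact values with LB <= exact <= UB for every interval
UB = {(0, 0): 3, (0, 1): 8, (1, 1): 6, (0, 2): 10, (1, 2): 7, (2, 2): 2}
LB = {(0, 0): 1, (0, 1): 5, (1, 1): 4, (0, 2): 6, (1, 2): 5, (2, 2): 1}
EXACT = {(0, 0): 2, (0, 1): 7, (1, 1): 5, (0, 2): 9, (1, 2): 6, (2, 2): 1}
print(best_first(UB, UB.get, LB.get, EXACT.get))  # (9, (0, 2), 1)
```

Here a single solver call suffices: the exact value 9 for (0, 2) already exceeds the next-largest upper bound, 8.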
Upper bound (UB)
- Offline
- Consider a hypergraph in which original edges become nodes and original nodes become hyperedges
- Split the original edges into k partitions via hypergraph partitioning
- Maintain the edges at partition boundaries
- Online UB estimation for a fixed interval
- The UB of a partition is the aggregate of its positive edges
- Edges between partitions:
- 0 cost if there is at least one positive boundary edge
- cost of the cheapest boundary edge otherwise
- Solve NW-PCST on the resulting coarse-level graph
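The online coarse-graph construction can be sketched as follows, under two assumptions not stated in the slides: a "boundary edge" between partitions p and q is an edge of p that shares an endpoint with an edge of q, and the cheapest boundary edge costs the magnitude of its (non-positive) aggregate score. `coarse_graph` is a hypothetical name; solving NW-PCST on the resulting coarse instance is left out.

```python
def coarse_graph(scores, part):
    """Build the coarse NW-PCST instance for one interval.

    scores: edge -> aggregated score in the interval
    part:   edge -> partition id (from the offline hypergraph partitioning)
    Returns (prize per partition, cost per pair of adjacent partitions).
    """
    prize = {}
    for e, s in scores.items():
        # partition prize: aggregate of its positive edges
        prize[part[e]] = prize.get(part[e], 0) + max(s, 0)
    incident = {}
    for e in scores:
        for v in e:
            incident.setdefault(v, set()).add(e)
    boundary = {}  # (p, q) -> scores of boundary edges between p and q
    for es in incident.values():
        parts = {part[e] for e in es}
        if len(parts) < 2:
            continue  # node is interior to one partition
        for e in es:
            for q in parts - {part[e]}:
                key = tuple(sorted((part[e], q)))
                boundary.setdefault(key, []).append(scores[e])
    cost = {}
    for key, vals in boundary.items():
        # 0 if some boundary edge is positive, else the cheapest boundary edge
        cost[key] = 0 if any(s > 0 for s in vals) else min(-s for s in vals)
    return prize, cost

scores = {("a", "b"): 3, ("b", "c"): -1, ("c", "d"): -2, ("d", "e"): 4, ("c", "f"): -3}
part = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 1, ("d", "e"): 1, ("c", "f"): 2}
prize, cost = coarse_graph(scores, part)
print(prize, cost)
```

Fewer partitions give a cheaper but coarser bound, which is the efficiency/effectiveness trade-off discussed on the following slides.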
Upper bound example
Upper bound effectiveness
- The upper bound is more effective if:
- Partitions are well connected (small diameter)
- Edges within partitions are correlated
- Boundary edges are few and have expected value closer to -1 than within-partition edges
- The upper bound is a coarse aggregation of the original graph
- Coarseness is controlled by the partitions
- Trade-off between efficiency and effectiveness
Upper bound quality
- Random Markovian graph (N = 150, M = 180, T = 300)
- Number of partitions: 2-64
- "Random 64" is a random partitioning of the edges into 64 parts
Lower bound
- Local iterative search in the solution space within an interval
- Simulated annealing (SA) procedure that grows/shrinks a subgraph within an interval
- Possible moves: add/remove an edge from the current solution
- Allow sub-optimal moves according to an annealing schedule
- Better quality than a simple greedy algorithm
- Due to sub-optimal moves, high-score clusters can be joined even if they are more than 2 hops apart
- Better running time than merge-and-refine
- No all-pairs shortest-path computation
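A minimal simulated-annealing sketch of the lower-bound search, assuming a connected edge set as the solution, a start at the best single edge, and a geometric cooling schedule; the slides do not give the schedule or parameters, so `temp0`, `cooling`, `steps`, and the function names are hypothetical. Any solution visited is feasible, so its score is a valid lower bound for the interval. Edge keys must be sortable so that move order is reproducible for a fixed seed.

```python
import math
import random

def line_graph(edges):
    """Edge -> set of edges sharing an endpoint with it."""
    adj = {e: set() for e in edges}
    for e in edges:
        for f in edges:
            if e != f and set(e) & set(f):
                adj[e].add(f)
    return adj

def is_connected(sub, adj):
    """True if the edge set `sub` is connected via shared endpoints."""
    if not sub:
        return False
    start = next(iter(sub))
    seen, stack = {start}, [start]
    while stack:
        for f in adj[stack.pop()] & sub:
            if f not in seen:
                seen.add(f)
                stack.append(f)
    return len(seen) == len(sub)

def anneal(scores, steps=5000, temp0=2.0, cooling=0.999, seed=0):
    """SA over connected edge sets; returns (best score, best edge set)."""
    rng = random.Random(seed)
    adj = line_graph(list(scores))
    cur = {max(scores, key=scores.get)}  # start from the best single edge
    cur_val = scores[next(iter(cur))]
    best_val, best = cur_val, set(cur)
    temp = temp0
    for _ in range(steps):
        temp = max(temp * cooling, 1e-9)
        frontier = set().union(*(adj[e] for e in cur)) - cur
        moves = [("add", e) for e in sorted(frontier)]
        if len(cur) > 1:
            moves += [("remove", e) for e in sorted(cur)]
        if not moves:
            break
        kind, e = rng.choice(moves)
        new = cur | {e} if kind == "add" else cur - {e}
        if kind == "remove" and not is_connected(new, adj):
            continue  # keep the solution connected
        delta = scores[e] if kind == "add" else -scores[e]
        # always accept improvements; accept worse moves with prob exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur, cur_val = new, cur_val + delta
            if cur_val > best_val:
                best_val, best = cur_val, set(cur)
    return best_val, best

path = {("a", "b"): 3, ("b", "c"): -1, ("c", "d"): 4, ("d", "e"): -5}
print(anneal(path))
```

On this path, a greedy search starting at ("c", "d") would stop at 4, while accepting the -1 edge temporarily lets SA join the two positive edges for a total of 6, the kind of cluster-joining described above.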
Summary
- Dynamic graphs with changing edge attributes
- Simplest query: find the highest-scoring substructure
- Heuristics under development
- Approximation guarantee
- Empirical validation on:
- traffic networks
- Twitter messages
Path forward
- The maximal-scoring subgraph is a building block for richer queries and analyses
- What is the structure of congestion? Global (short and large), longitudinal (prolonged and localized), or a combination of both?
- What characterizes the evolution of a network?
- How do different network regions compare?
- Is evolution similar across networks of different genres?
- Index structures
- Use statistical models for indexing real-world networks
- Exploit locality within the network and locality in time
- Represent the network at different levels of coarseness
- Queries constrained by
- Time
- Neighborhood
- Similarity queries
Connections
- Queries/analysis of information flow (E2.1)
- Queries on mobile networks (E2.2, E2.3)
- Formal modeling of time (E1.1)
- Dynamic network models (E2.1)
Army relevance
- Query/analysis of mobility networks
- Cyber-physical scenario
- Query/analysis of evolving networks
- Patterns of behavior in composite networks
- Find terrorist groups using temporal interactions
Publications
- P. Bogdanov, B. Baumer, P. Basu, A. Singh, and A. Bar-Noy, "Discovering Influential Groups of Agents Using Composite Network Analysis," submitted to NetSci 2011.
- P. Bogdanov, Nicholas D. Larusso, and Ambuj K. Singh, "Towards Community Discovery in Signed Collaborative Interaction Networks," published in SIASP at the 2010 IEEE International Conference on Data Mining, 2010.
- K. Macropol and A. Singh, "Content-based Modeling and Prediction of Information Dissemination," submitted to ASONAM 2011.
- M. Mongiovi, A. Singh, X. Yan, B. Zong, and K. Psounis, "An Indexing System for Mobility-aware Information Management," submitted to VLDB.
- Ziyu Guan, Jian Wu, Zheng Yun, Ambuj K. Singh, and Xifeng Yan, "Assessing and Ranking Structural Correlations in Graphs," to appear at SIGMOD 2011.
- Nicholas D. Larusso and Ambuj K. Singh, "Synopses for Probabilistic Data over Large Domains," in EDBT 2011.
THANK YOU!
- Markovian: the graph state is a Markov chain
- Fixed set of nodes
- Edges at time t depend on edges at time t-1
- Cover time of dynamic graphs (Avin et al. '08)
- Introduction of Markovian dynamic graphs
- Exponential cover time
- Lazy random walks
- Information spread in Markovian graphs (Clementi '09)
- Edge-Markovian
- Geometric Markovian: node mobility
- Evolving range-dependent graphs (Grindrod '09)
- Edge dynamics as a birth/death process
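In an edge-Markovian graph each edge evolves as an independent two-state Markov chain. The cited works model edges appearing and disappearing; mapping the two states to the edge attributes -1 and 1 is an assumption made here to match the {-1, 1} attributes of the problem definition. The slides do not specify how their random Markovian test graphs were generated, so `edge_markovian` and its parameters are hypothetical.

```python
import random

def edge_markovian(edges, T, p_up, p_down, seed=0):
    """Generate F_t(e) in {-1, 1} with an independent two-state Markov chain per edge.

    p_up:   probability of switching -1 -> 1 between consecutive steps
    p_down: probability of switching 1 -> -1
    Each edge starts in state -1 (an assumption).
    """
    rng = random.Random(seed)
    F = {}
    for e in edges:
        state, series = -1, []
        for _ in range(T):
            flip = p_up if state == -1 else p_down
            if rng.random() < flip:
                state = -state  # the state at t depends only on the state at t-1
            series.append(state)
        F[e] = series
    return F

F = edge_markovian([("a", "b"), ("b", "c")], T=300, p_up=0.05, p_down=0.2)
```

Small values of `p_up` and `p_down` make edge states persist, producing the temporally correlated runs of 1s that the interval queries look for.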
Dynamic models of traffic
- The cell transmission model (CTM) (Daganzo '94)
- Dynamic model of highway traffic
- Inspired by hydrodynamic theory
- Traffic flow on a freeway network (Bickel '01)
- Time- and context-Markovian model of the traffic flow
- The state of a segment at time t depends on its own state and the state of its neighbors at time t-1
- Model of a single highway: how about junctions?
- Computer network traffic (Stoev '09)
- Statistical model of traffic flow across all links
- Applied to traffic prediction
- Avin '08: Chen Avin and Zvi Lotker. "How to Explore a Fast-Changing World." 2008.
- Bickel '01: Peter Bickel, Chao Chen, Jaimyoung Kwon, and John Rice. "Traffic Flow on a Freeway Network." Electrical Engineering, 2001.
- Clementi '09: Andrea Clementi, Angelo Monti, Francesco Pasquale, and Riccardo Silvestri. "Information Spreading in Stationary Markovian Evolving Graphs." Informatica, 2009.
- Feig01: J. Feigenbaum, C. Papadimitriou, and S. Shenker. "Sharing the Cost of Multicast Transmissions." JCSS, 63, 21-41, 2001.
- Grindrod '09: Peter Grindrod and Desmond J. Higham. "Evolving Graphs: Dynamical Models, Inverse Problems and Propagation." 2009.
- Johnson00: D. Johnson, M. Minkoff, and S. Phillips. "The Prize Collecting Steiner Tree Problem: Theory and Practice." ACM SODA, 2000.
- Leskovec '07: Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, and Matthew Hurst. "Cascading Behavior in Large Blog Graphs: Patterns and a Model." SDM, 2007.
Background literature
- Ribeiro '11: B. Ribeiro, D. Figueiredo, E. de Souza e Silva, and D. Towsley. "Characterizing Dynamic Graphs with Continuous-time Random Walks." SIGMETRICS 2011.
- Snijders '09: Tom A.B. Snijders, Gerhard G. van de Bunt, and Christian E.G. Steglich. "Introduction to Stochastic Actor-Based Models for Network Dynamics." Social Networks, 2009.
- Stoev '09: Stilian A. Stoev, George Michailidis, and Joel Vaughan. "Global Modeling and Prediction of Computer Network Traffic." arXiv, 2009.
- Zheng '05: A. X. Zheng and A. Goldenberg. "A Generative Model for Dynamic Contextual Friendship Networks." Learning, 2005.