Title: Mining Networks through Visual Analytics
1 Mining Networks through Visual Analytics
- Incremental Hypothesis Building and Validation
David Auber, Romain Bourqui, Guy Melançon
CNRS, LaBRI UMR 5800, INRIA Futurs
GRAVITÉ, Bordeaux, France
2 peacokmaps.com
3 InfoVis CyberInfraStructure, Pajek
- A picture is worth a thousand words
- Chinese proverb (?)
4 Tulip BubbleTree
5 Graph Viz Framework Tulip
- It's all visual
- R. Feynman (Nobel Prize in Physics)
6 Internet traffic
7 Voronoï Treemaps
- The purpose of computing is insight, not numbers
- R. Hamming (1973)
8 Cushion Treemaps
9
- Visualization uses computer graphics to help
provide insight on complicated problems, models or
systems
- Scientific visualization is exploring data and
information graphically, gaining understanding
and insights into the data
- R.A. Earnshaw (a pioneer in computer graphics,
1973)
Munzner's Hyperbolic Browser
10 Tulip Sugiyama Layout
11 Visualize?
- Inselberg (creator of parallel coordinates)
- Insight through images
- Goal: a visual model to help our intuition
- Involves geometry, cognition, art?
12 Visualize?
13 Visual graph mining related to security issues
- Recognize structural properties
- Identify key actors
- Identify their neighborhood
- Community structure
- Connectivity between communities
Chess players recognize patterns
14 Example from NCTC data
- Extracted about 8000 incidents from WITS
- Identified terrorist groups when possible
(directly or through AFP)
- Identified countries where incidents took place
- Added territorial information (continents, world
regions) to help organize the overall map
15 Example from NCTC data
- About 8000 incidents
- 9419 nodes
- 18486 edges
- Layout is time consuming
- Does not provide a clue about structure
- Filter out incidents with no identified group
16 Example from NCTC data
- Interactivity
- Play with network
- Apply various metrics
- Attribute-based node filtering
- Tulip Graph Viz Framework
- Open source
- Plug-in architecture
- www.tulip-software.org
17 Massive data
- Information big bang
- "How Much Information" project, Berkeley University
- In 2001, about 1 exabyte (1 million terabytes) of
data was generated annually worldwide, of which
99.997% was available only in digital form
- In 2003, each individual produced about 800
megabytes per year
18 Massive data
- 100 million FedEx transactions / day
- 150 million VISA transactions / day
- 300 million long-distance calls / day over AT&T's
network
- 35 billion e-mails / day worldwide
- 600 billion IP packets / day over the DE-CIX backbone
Keim, VIEW Workshop 2006
19 Visualization and Moore's law
Daniel Keim - Keynote Address, VIEW 2006
20 Visualization and Moore's law
- Issues that won't be solved by hardware alone
- Design interaction together with visualization
- Understand how and why visualization pays
- Collaborate with other fields
- Integrate visualization with other technologies
NIH-NSF Visualization Research Challenges Report,
2006
21 Added value of visual and interactive mining
- KDD panel "The Perfect Data Mining Tool"
(Ankerst 2002)
- The human eye is an excellent tool for spotting
natural patterns
- Getting rid of the human in the loop? Wrong
decision!
- Increase human participation through
visualization in the data exploration and
knowledge discovery processes
22 Sense making loop
J. Thomas, Visual Analytics Initiative
23 Visualization mantras
- Visual Information Seeking Mantra
- Overview, Zoom-in / Filter, and Details on Demand
(Shneiderman, 1996)
- Visual Analytics Mantra
- Analyse first, Show the Important, Zoom, filter
and analyse, Details on demand (Keim 2006)
24 Visualization pipeline
- A designer's view on the visualization process
25 Visualize?
Protein interaction network (yeast), Barabási 2000
26 Organize data prior to visualization
- Layer or hierarchize data based on
- node/edge metrics (eigenvalues, centralities, ...)
- topological feature detection
- Use relevant drawing methods
- Combine with interaction
27 Case study: ITA 2000 passenger air traffic
- Cities connect through direct flights
- Edge weights: number of passengers
- Questions
- Read motivations of carriers through organization
of the network?
- Territorial logic?
- Political? Economic?
29 TopoLayout: (Topological) Feature-based
Hierarchization
- Search the graph for components of growing
complexity
- Subtrees
- Biconnected components ("blocks")
- Grid-like
- Clusters
32 TopoLayout: (Topological) Feature-based
Hierarchization
- Search the graph for components of growing
complexity
- Subtrees
- Biconnected components
- Grid-like
- Clusters
- Need to identify articulation points (pivots)
- The graph builds into a tree of biconnected
components
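The articulation points mentioned on this slide can be found with a single depth-first search. The following is a minimal sketch, assuming an undirected graph given as an edge list; the function name articulation_points and the toy graph are illustrative and are not taken from TopoLayout or Tulip.

```python
from collections import defaultdict

def articulation_points(edges):
    """Return the set of articulation points (pivots) of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    depth = {}      # DFS discovery depth of each visited node
    low = {}        # lowest depth reachable from the node's DFS subtree
    pivots = set()

    def dfs(node, parent, d):
        depth[node] = low[node] = d
        children = 0
        for nb in adj[node]:
            if nb == parent:
                continue
            if nb in depth:
                # Back edge to an already visited node.
                low[node] = min(low[node], depth[nb])
            else:
                children += 1
                dfs(nb, node, d + 1)
                low[node] = min(low[node], low[nb])
                # Non-root node: nb's subtree cannot reach above node.
                if parent is not None and low[nb] >= depth[node]:
                    pivots.add(node)
        # The DFS root is a pivot only if it has several DFS children.
        if parent is None and children > 1:
            pivots.add(node)

    for start in list(adj):
        if start not in depth:
            dfs(start, None, 0)
    return pivots

# Toy graph: triangles a-b-c and c-d-e share node "c"; "f" hangs off "e".
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("c", "e"),
         ("e", "f")]
pivots = articulation_points(edges)
```

Removing any node in pivots disconnects the graph, which is why the biconnected components hang together as a tree around these nodes. The recursive search is fine for small examples; a large graph would need an iterative variant.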
33 TopoLayout: (Topological) Feature-based
Hierarchization
- Search the graph for components of growing
complexity
- Subtrees
- Biconnected components ("blocks")
- Grid-like (eigenvalues)
- Clusters
35 TopoLayout
- Components naturally organize as a hierarchy
through the search process
36 TopoLayout interaction: Grouse
- Explore the graph by unfolding/folding the
hierarchy
- The user's navigation triggers layout of
components
- Higher level graphs (quotient graphs) are built
from metanodes
- Improved readability / fewer visual elements
- Faster layout, based on topology of quotient
graph
- Grouse
37 TopoLayout interaction: Grouse
- Multilevel hierarchy: recursive grouping of
metanodes
39 TopoLayout interaction: Grouse
- Multilevel Hierarchy for Abstraction Cut
40 Multilevel navigation of small world networks
- Small world networks: social networks, web
graphs, transportation networks (ITA), ...
- Small world networks organize into several levels
(hierarchy) (Adamic, Huberman)
- Idea: capture the hierarchy and use it as a
navigation paradigm
41 Small world networks
- Centralities
- Bottleneck passageways
- Network organizes around those pivot nodes
42 Small world networks
- Centralities
- Betweenness centrality has high computational
cost (global)
- Betweenness centrality
- Eigenvalue centrality
- Prefer local index
- Degree
- Edge strength
43 Small world networks
- Edge strength: proportion of cycles of length 3
and 4 that contain the edge
(Jaccard 1912) (Tanimoto 1958) (Auber et al.
2003) (Radicchi et al. 2004)
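To make the idea of a local edge index concrete, here is a simplified stand-in for the strength metric. It is an assumption of this sketch, not the exact formula of Auber et al.: it only looks at cycles of length 3, scoring an edge by the Jaccard index of its endpoints' neighbourhoods. Cycles of length 4 are left out, and the names build_adjacency and jaccard_strength are illustrative.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Adjacency sets of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def jaccard_strength(adj, u, v):
    """Fraction of the neighbours of u or v (other than u and v) that are
    adjacent to both, i.e. that close a triangle through edge (u, v).
    Simplified proxy: cycles of length 4 are ignored."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)

# Toy graph: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = build_adjacency(edges)
strength = {(u, v): jaccard_strength(adj, u, v) for u, v in edges}
```

Only the neighbourhoods of the two endpoints are read, so the index is local: changing an edge somewhere else in the graph leaves most values untouched. In the toy graph the bridge c-d gets strength 0, while every triangle edge scores at least 0.5.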
44 Small world networks
- Edge strength
- Costs linear time if degree is bounded, otherwise
quadratic
45 Small world networks
- Edge strength
- Still cheaper than most centralities (local
versus global indices)
- Incremental local modifications of the graph
require only local recomputation
46 Community structure of small world networks
- Filter out weak edges
- Capture components
- Infer quotient graph (metanodes)
- Recurse over each component
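The first three steps can be sketched as follows. This is an illustrative outline under stated assumptions: edges carry a precomputed strength value, the threshold is supplied by the caller, and the helper names components_after_filter and quotient_graph are hypothetical rather than part of Tulip.

```python
def components_after_filter(nodes, weighted_edges, threshold):
    """Connected components once edges weaker than threshold are dropped."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v, w in weighted_edges:
        if w >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def quotient_graph(components, weighted_edges):
    """One metanode per component; each metaedge counts the original
    edges running between the two components it joins."""
    owner = {n: i for i, comp in enumerate(components) for n in comp}
    metaedges = {}
    for u, v, _ in weighted_edges:
        i, j = sorted((owner[u], owner[v]))
        if i != j:
            metaedges[(i, j)] = metaedges.get((i, j), 0) + 1
    return metaedges

# Toy graph: two triangles joined by a weak bridge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
weighted_edges = [("a", "b", 1.0), ("b", "c", 0.5), ("a", "c", 0.5),
                  ("d", "e", 0.5), ("e", "f", 1.0), ("d", "f", 0.5),
                  ("c", "d", 0.0)]
comps = components_after_filter(nodes, weighted_edges, threshold=0.25)
meta = quotient_graph(comps, weighted_edges)
```

The recursion step would apply the same two functions to the subgraph induced by each component, which yields one level of the hierarchy per pass.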
48 Community structure of small world networks
- Filter out weak edges
- Q. What threshold to choose?
- A. Best possible one (!)
- Use quality criteria
- MQ (modularity quality)
49 Quality criteria: MQ
- C = (C1, C2, ..., Cp) is a clustering of a graph G
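The MQ formula itself did not survive in this transcript. The sketch below therefore assumes one commonly used definition for undirected graphs: the mean edge density inside clusters minus the mean edge density between pairs of clusters. Treat this definition, the function name mq and the convention that a singleton cluster has intra-density 0 as assumptions, not as the authors' exact formula.

```python
from itertools import combinations

def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density."""
    owner = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra = [0] * p          # edge count inside each cluster
    inter = {}               # edge count between each pair of clusters
    for u, v in edges:
        i, j = sorted((owner[u], owner[v]))
        if i == j:
            intra[i] += 1
        else:
            inter[(i, j)] = inter.get((i, j), 0) + 1

    def intra_density(i):
        n = len(clusters[i])
        possible = n * (n - 1) // 2
        # Assumption: a singleton cluster contributes density 0.
        return intra[i] / possible if possible else 0.0

    positive = sum(intra_density(i) for i in range(p)) / p
    if p < 2:
        return positive
    pairs = list(combinations(range(p), 2))
    negative = sum(
        inter.get((i, j), 0) / (len(clusters[i]) * len(clusters[j]))
        for i, j in pairs
    ) / len(pairs)
    return positive - negative

# Toy graph: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_triangles = mq([{"a", "b", "c"}, {"d", "e", "f"}], edges)
one_block = mq([{"a", "b", "c", "d", "e", "f"}], edges)
```

Under this definition both terms lie in [0, 1], so the score stays within [-1, 1]. On the toy graph, splitting into the two triangles scores 1 - 1/9, against 7/15 for keeping everything in one cluster.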
50 MQ / Nice properties
- MQ varies over a bounded interval [-1, 1]
- MQ behaves like a Gaussian distribution
52 Challenge: find the best possible clustering
(according to MQ)
- Exhaustive search: intractable
- Optimization, search algorithms (hill climbing,
genetic algorithms, bio-mimetics, ...): costly
- Heuristic: exploit node/edge centralities
- Filter out weak edges
- Tickmark possible values for edges
- Find threshold with best MQ
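The heuristic in the last three bullets amounts to a one-dimensional sweep. The sketch below is a minimal illustration under assumptions: edge strengths are given as a dictionary, the candidate thresholds are the distinct strength values, and MQ is taken to be mean intra-cluster density minus mean inter-cluster density (an assumed definition, since the formula is not reproduced in this transcript). All function names are hypothetical.

```python
from itertools import combinations

def components(nodes, edges):
    """Connected components via iterative depth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result

def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density."""
    owner = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    count = {}
    for u, v in edges:
        key = tuple(sorted((owner[u], owner[v])))
        count[key] = count.get(key, 0) + 1

    def density(i, j):
        a, b = len(clusters[i]), len(clusters[j])
        possible = a * (a - 1) / 2 if i == j else a * b
        return count.get((i, j), 0) / possible if possible else 0.0

    positive = sum(density(i, i) for i in range(p)) / p
    pairs = list(combinations(range(p), 2))
    negative = sum(density(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0
    return positive - negative

def best_threshold(nodes, strength):
    """Try each distinct strength value as a threshold; keep the best MQ."""
    edges = list(strength)
    best = None
    for t in sorted(set(strength.values())):
        kept = [e for e in edges if strength[e] >= t]
        clusters = components(nodes, kept)
        score = mq(clusters, edges)   # MQ is scored on the full graph
        if best is None or score > best[1]:
            best = (t, score, clusters)
    return best

# Toy graph: two triangles joined by a weak bridge c-d.
strength = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "c"): 0.5,
            ("d", "e"): 0.5, ("e", "f"): 1.0, ("d", "f"): 0.5,
            ("c", "d"): 0.0}
threshold, score, clusters = best_threshold(list("abcdef"), strength)
```

On the toy graph the sweep tries three thresholds (0.0, 0.5 and 1.0) and retains 0.5, which removes only the bridge. The number of MQ evaluations is bounded by the number of distinct strength values rather than by the number of possible clusterings.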
53 Filter / Threshold
56 Hierarchical organization of the network
- The procedure can be iterated to produce a
hierarchy of clusters
- Strength of edges is recomputed at each stage
- Threshold is locally chosen for each component
57 MQ / Extension
- To take into account the relative size of
clusters
- (MQ also naturally extends to fuzzy clustering)
58 MQ / Extension
- Extend to various classes of graphs (where F
stands for any adequate edge density function)
59 Conclusion / Future work
- MQ / Extension to graph hierarchies
60 MQ / Extension to graph hierarchies
- Inspired by attribute grammars
61 Conclusion / Future work
- Study dynamic networks
- Streamed / time-stamped networks
- Incremental/local computation/adjustment of
- edge metrics (local metrics)
- MQ (or other possible quality criteria)
62 Conclusion
- Interaction is the real added value of
visualization
- Must combine with other mining techniques
- Insert combination in sense making loop
63 Conclusion
- We are open to, and interested in, collaborating
with colleagues from other areas who bring
different perspectives
- Learning / Mining / ...
- Experts / Corporate organizations / End users
- Any ideas for a different multilevel clustering
criterion/approach?
64 Conclusion
- We are open to, and interested in, collaborating
with colleagues from other areas who bring other
perspectives
- Learning / Mining / ...
- Experts / Corporate organizations / End users
- Go visit Tulip's website and download the
software (I'm here until Friday if you need a
coach!)
- www.tulip-software.org
- Guy.Melancon_at_labri.fr
65 Credits
- LaBRI UMR 5800, Bordeaux -- GRAVITÉ team /
INRIA Futurs
- Guy Melançon
- Maylis Delest
- David Auber
- Patrick Mary
- Tulip Graph Viz Framework
- www.tulip-software.org
- R. Bourqui, U Bx, FR
- D. Archambault, UBC, CA
- T. Munzner, UBC, CA