Large-Scale Network Analysis with the Boost Graph Libraries

1
Large-Scale Network Analysis with the Boost Graph
Libraries
  • Douglas Gregor
  • Open Systems Lab
  • Indiana University
  • dgregor_at_osl.iu.edu

2
What are the BGLs?
  • A collection of libraries for computation on
    graphs/networks.
  • Graph data structures
  • Graph algorithms
  • Graph input/output
  • Common design
  • Flexibility/customizability throughout
  • Obsessed with performance
  • Common interfaces throughout the collection
  • All open source, freely available online

3
The BGL Family
  • The Original (sequential) BGL
  • BGL-Python
  • The Parallel BGL
  • Parallel BGL-Python

4
The Original BGL
  • The largest and most mature BGL
  • 7 years of research and development
  • Many users, contributors outside of the OSL
  • Steadily evolving
  • Written in C++
  • Generic
  • Highly customizable
  • Efficient (both storage and execution)

5
BGL Graph Data Structures
  • Graphs
  • adjacency_list: highly configurable, with
    user-specified containers for vertices and edges
  • adjacency_matrix
  • compressed_sparse_row
  • Adaptors
  • subgraphs, filtered graphs, reverse graphs
  • LEDA and Stanford GraphBase
  • Or, use your own

6
Original BGL Algorithms
  • Searches (breadth-first, depth-first, A*)
  • Single-source shortest paths (Dijkstra,
    Bellman-Ford, DAG)
  • All-pairs shortest paths (Johnson,
    Floyd-Warshall)
  • Minimum spanning tree (Kruskal, Prim)
  • Components (connected, strongly connected,
    biconnected)
  • Maximum cardinality matching
  • Max-flow (Edmonds-Karp, push-relabel)
  • Sparse matrix ordering (Cuthill-McKee, King,
    Sloan, minimum degree)
  • Layout (Kamada-Kawai, Fruchterman-Reingold,
    Gursoy-Atun)
  • Betweenness centrality
  • PageRank
  • Isomorphism
  • Vertex coloring
  • Transitive closure
  • Dominator tree

7
Task: Biconnected Components
[Figures: Input Graph, Output Graph]
Articulation points: B G A
8
Define a Graph Type
  • Determine vertex/edge properties:
      struct Vertex { string name; };
      struct Edge { int bicomponent; };
  • Determine the graph type:
      typedef adjacency_list</*OutEdgeListS*/ vecS,
                             /*VertexListS*/ vecS,
                             /*DirectedS*/ undirectedS,
                             /*VertexProperty*/ Vertex,
                             /*EdgeProperty*/ Edge> Graph;

9
Read in a GraphViz DOT File
  • Build an empty graph:
      Graph g;
  • Map vertex properties:
      dynamic_properties dyn;
      dyn.property("node_id", get(&Vertex::name, g));
  • Read in the GraphViz graph:
      ifstream in("biconnected_components.dot");
      read_graphviz(in, g, dyn);

10
Run Biconnected Components
  • Keep track of the articulation points:
      vector<Graph::vertex_descriptor> art_points;
  • Compute biconnected components:
      biconnected_components(g, get(&Edge::bicomponent, g),
                             back_inserter(art_points));

11
Output Results
  • Attach the bicomponent number to the "label" property
    of edges:
      dyn.property("label", get(&Edge::bicomponent, g));
  • Write results to another GraphViz file (newer Boost
    releases name this dynamic_properties overload write_graphviz_dp):
      ofstream out("bc_out.dot");
      write_graphviz(out, g, dyn);
  • Show articulation points:
      cout << "Articulation points: ";
      for (size_t i = 0; i < art_points.size(); ++i)
        cout << g[art_points[i]].name << ' ';

13
Original BGL Summary
  • The original BGL is large, stable, efficient
  • Lots of algorithms, graph types
  • Peer-reviewed code with many users, nightly
    regression testing, etc.
  • Performance comparable to FORTRAN.
  • Who should use the BGL?
  • Programmers comfortable with C++
  • Users with graph sizes from tens of vertices to
    millions of vertices

14
BGL-Python
  • Python is ideal for rapid prototyping
  • It's a scripting language (no compiler)
  • Dynamically typed means less typing for you
  • Easy to use: you already know Python
  • BGL-Python provides access to the BGL from within
    Python
  • Similar interfaces to C++ BGL
  • Easier to learn than C++
  • Great for scripting, GUI applications
  • help(bgl.dijkstra_shortest_paths)

15
Example: Biconnected Components
  import boost.graph as bgl   # Pull in the BGL bindings
  g = bgl.Graph.read_graphviz("biconnected_components.dot")

  # Compute biconnected components and articulation points
  bicomponent = g.edge_property_map('int')
  art_points = bgl.biconnected_components(g, bicomponent)

  # Save results with bicomponent numbers as edge labels
  g.edge_properties['label'] = bicomponent
  g.write_graphviz("biconnected_components_out.dot")

  print "Articulation points: ",
  node_id = g.vertex_properties['node_id']
  for v in art_points:
      print node_id[v], ' ',
  print ""
16
Wrapping the BGL in Python
  • BGL-Python is not a
  • port
  • reimplementation
  • BGL-Python wraps the C++ BGL
  • Python calls translate to C++ calls
  • C++ can call back into Python
  • Most of the speed of C++
  • Most of the flexibility of Python

17
Performance: Shortest Paths
18
BGL-Python Summary
  • BGL-Python is all about tradeoffs
  • More gradual learning curve
  • Faster time-to-solution
  • Lower performance
  • Our typical approach
  • Prototype in Python to get your ideas down
  • Port to C++ when performance matters

19
(No Transcript)
20
The Parallel BGL
  • A version of the C++ BGL for computational
    clusters
  • Distributed memory for huge graphs
  • Parallel processing for improved performance
  • An active research project
  • Closely related to the original BGL
  • Parallelizing BGL programs should be easy

21
Parallel BGL: Distributed Graphs
[Figure: a graph distributed across 3 processors]
22
Parallel Graph Algorithms
  • Breadth-first search
  • Eager Dijkstra's single-source shortest paths
  • Crauser et al. single-source shortest paths
  • Depth-first search
  • Minimum spanning tree (Boruvka, Dehne & Götz)
  • Connected components
  • Strongly connected components
  • Biconnected components
  • PageRank
  • Graph coloring
  • Fruchterman-Reingold layout
  • Max-flow (Dinic's)

23
Performance: Sparse Graphs
24
Scalability (547k vertices/node)
[Chart: small-world graph, up to 70M vertices and 1B edges]
25
Performance vs. CGMgraph
[Chart: Erdős-Rényi graph, 96k vertices, 10M edges; bars annotated 17x and 30x]
26
Parallel BGL Summary
  • The Parallel BGL is built for huge graphs
  • Millions to hundreds of millions of nodes
  • Distributed-memory parallel processing on
    clusters
  • Future work will permit larger graphs
  • Parallel programming has a learning curve
  • Parallel graph algorithms are much harder to write
  • Distributed graph manipulation can be tricky
  • Parallel BGL is an active research library

27
Distributed Graph Layout
28
Parallel BGL in Python
  • Preliminary support for the Parallel BGL in
    Python
  • Just import boost.graph.distributed
  • Similar interface to sequential BGL-Python
  • Several options for usage with MPI
  • Straight MPI: mpirun -np 2 python script.py
  • pyMPI allows interactive use of the interpreter
  • Initially used to prototype our distributed
    Fruchterman-Reingold implementation.

29
Porting for Performance
30
Which BGL is Right for You?
  • Is any BGL right for you?
  • Depends on how large your networks are
  • Up to 1/2 million vertices, any BGL will do
  • C++ BGL can push to a couple million vertices
  • For tens of millions or larger, Parallel BGL only
  • Other considerations
  • You can prototype in Python, port to C++
  • Algorithm authors might prefer the original BGL
  • Parallelism is very hard to manage

31
Conclusion
  • The Boost Graph Library family is a collection of
    full-featured graph libraries
  • All are flexible, customizable, efficient
  • Easy to port from Python to C++
  • Can port from sequential to parallel
  • Always growing, improving
  • Is one of the BGLs right for you?
  • A typical build-or-buy decision

32
For More Information
  • (Original) Boost Graph Library: http://www.boost.org/libs/graph/doc
  • Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl
  • Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python
  • Contact us!
  • Douglas Gregor <dgregor_at_osl.iu.edu>
  • Andrew Lumsdaine <lums_at_osl.iu.edu>

33
Other BGL Variants
  • QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp
  • Ruby Graph Library: http://rubyforge.org/projects/rgl/
  • Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/
  • RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
  • Disclaimer: These are all separate projects. We do not maintain them.

34
Comparative Performance