Title: Graphing to visualize data
1Graphing to visualize data
- Satish Raghunath
- rsatish_at_alum.rpi.edu
- Shiv Kalyanaraman
- Google: "Shiv RPI"
- shivkuma_at_ecse.rpi.edu
- http://www.ecse.rpi.edu/Homepages/shivkuma
2Overview
- Issues with graphing
- Types of graphs
- Examples of graph usage: what you get out of them
- Art: how to choose what graph to use?
- Graphing tools
- Pitfalls and mistakes in graphing
- Advanced visualization
- In-class work: reviewing graphing use in selected technical papers
3Thoughts on Presentation Styles
- Primary purpose: illustrate to help understand
- "The goal of simulation is intuition, not numbers," - R.W. Hamming
- Corollary: don't dump data on the reader.
- Distill it into presentations that give insight instead
4Descriptive Statistics
- Involves
- Collecting Data
- Presenting Data
- Characterizing Data
- Understanding data: distill insights!
[Figure: chart of quarterly data (Q1 to Q4), vertical axis 0 to 50]
Statistics obtained from data: sample mean X̄ = 30.5, sample variance S² = 113
Insights: somewhat skewed bell shape; perhaps a Poisson distribution would fit?
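As a rough sketch of how statistics such as X̄ and S² are obtained from raw data, the snippet below uses Python's standard `statistics` module. The sample values are hypothetical and are not the data behind the slide's figure.

```python
import statistics

# Hypothetical sample (NOT the data behind the slide's figure).
samples = [12, 18, 25, 25, 31, 40, 47]

mean = statistics.mean(samples)          # sample mean, X-bar
variance = statistics.variance(samples)  # sample variance, S^2 (n - 1 denominator)
median = statistics.median(samples)      # middle value of the sorted sample

print(f"mean={mean:.2f}  variance={variance:.2f}  median={median}")
```

Reporting a handful of such numbers next to a plot is often enough to characterize the data without dumping every observation on the reader.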
5To graph or not to graph
- Use graphs when:
- Trends in data are not obvious
- It is hard to explain the X-Y relationship in words
- Consider tables if:
- The number of data points is small
- The reader might find the exact values of data points useful
6Summary Table Frequencies
- 1. Lists categories and the number of elements in each category
- 2. Obtained by tallying responses in each category
- 3. May show frequencies (counts), percentages, or both
Each row is a category; the count column is the tally.
Major         Count
Accounting      130
Economics        20
Management       50
Total           200
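The tallying step can be sketched in Python with `collections.Counter`. This is a minimal illustration: the list of responses is rebuilt from the counts in the table above, and the variable names are made up for the example.

```python
from collections import Counter

# Rebuild one response per student from the table's counts.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

counts = Counter(responses)          # tally responses per category
total = sum(counts.values())

# Each row: category, frequency (count), and percent of the total.
rows = [(major, n, 100 * n / total) for major, n in sorted(counts.items())]

for major, n, pct in rows:
    print(f"{major:<12}{n:>6}{pct:>8.1f}%")
print(f"{'Total':<12}{total:>6}{100.0:>8.1f}%")
```

Showing both the count and the percent column covers the "or both" case in item 3.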
7Example Tables from Networking
8What kind of graph?
- Pie charts: to depict fractions of a whole
- Bar charts: when data points are few and a table is not suitable
- Line plots: when there are a lot of data points
- Box plots: if statistical inference is drawn; shows the 1st, 2nd, and 3rd quartile for each point
- Scatter plots, 3-D plots: only if necessary
AVOID complex graphs
9Pie Chart
- 1. Shows breakdown of a quantity into categories
- 2. Useful for showing relative differences
- 3. Angle size = (360°) x (percent)
- Example: (360°) x (10%) = 36°
[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10% (a 36° slice)]
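The angle rule can be checked with a few lines of Python. This is a minimal sketch: the counts come from the majors frequency table earlier in the slides, and the helper name `slice_angle` is made up for illustration.

```python
# Counts from the majors frequency table earlier in the slides.
counts = {"Acct.": 130, "Econ.": 20, "Mgmt.": 50}
total = sum(counts.values())

def slice_angle(count, total):
    """Angle in degrees of a pie slice: 360 x (fraction of the whole)."""
    return 360 * count / total

angles = {major: slice_angle(n, total) for major, n in counts.items()}

for major, angle in angles.items():
    percent = 100 * counts[major] / total
    print(f"{major:<6} {percent:5.1f}%  {angle:6.1f} degrees")
```

The Econ. slice comes out at 36 degrees, matching the worked example, and the three angles add up to a full circle.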
10Pie Chart Networking Example
Source: http://www.caida.org/bhuffake/papers/skitviz/
11Another Example: VPN Classification
12Bar Chart
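- Horizontal bars for categorical variables
- Bar length shows frequency or percent
- Equal bar widths
- Gaps between bars: 1/2 to 1 bar width
- Frequency axis starts at the zero point
[Figure: horizontal bar chart of Major (Mgmt., Econ., Acct.) against Frequency, axis ticks 0, 50, 100, 150]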
13Networking Example Bar Chart
14Example Analysis with Bar Charts
- LT-TCP is able to:
- reduce timeouts drastically
- keep the queue non-empty, maximizing throughput and capacity utilization
- minimize use of FEC to the level needed
15Histogram for distributions
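Class            Freq.
15 but < 25        3
25 but < 35        5
35 but < 45        2
[Figure: histogram of the table above. The x-axis marks the lower class boundaries (0, 15, 25, 35, 45, 55); the y-axis shows the count (0 to 5) and could equally show frequency, relative frequency, or percent. Bars touch.]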
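The binning behind such a histogram can be sketched as follows. The raw values are hypothetical, chosen only so that the class counts match the table (3, 5, 2); the class boundaries are the ones from the slide.

```python
# Hypothetical raw values chosen so the class counts match the slide (3, 5, 2).
values = [17, 21, 24, 26, 27, 29, 30, 32, 38, 41]

# Class boundaries: each class is "lower bound but less than upper bound".
boundaries = [15, 25, 35, 45]

counts = [0] * (len(boundaries) - 1)
for v in values:
    for i in range(len(counts)):
        if boundaries[i] <= v < boundaries[i + 1]:
            counts[i] += 1
            break

n = len(values)
relative = [c / n for c in counts]   # relative frequency per class

# Print a small text histogram: class, count, percent, and a bar.
for i, c in enumerate(counts):
    label = f"{boundaries[i]} but < {boundaries[i + 1]}"
    print(f"{label:<14}{c:>3}{relative[i]:>7.0%}  {'#' * c}")
```

Because each class includes its lower boundary and excludes its upper one, every value falls into exactly one class and the bars can touch without ambiguity.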
16Recall Real Example Histogram
- What is the fairness between TCP goodputs when we use different queuing policies?
- What is the confidence interval around your estimates of mean file size?
- Note: a distribution need not be just a probability/frequency distribution
17Dot Chart or Scatterplots
- Like a horizontal bar chart
- Horizontal lines for categorical variables
- Line length shows frequency or percent
- Equal spacing between lines
- Frequency axis starts at the zero point
[Figure: dot chart of Major (Mgmt., Econ., Acct.) against Frequency, axis ticks 0, 50, 100, 150]
18Scatter Plots
19Scatter plots with trends
20WiFi Analysis Scatter Plots
- http://www.sigcomm.org/sigcomm2004/papers/p442-aguayo1111.pdf
21Line Charts Example: Comparative Performance
Note: also plots confidence intervals!
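One way to obtain the confidence interval drawn around each plotted point is sketched below. It assumes each point is the mean of several independent runs and uses a normal approximation; for a handful of runs a Student-t quantile would be more appropriate. The sample values and the helper name `mean_ci` are hypothetical.

```python
import statistics
from statistics import NormalDist

def mean_ci(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    m = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    # Two-sided quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * sem, m + z * sem

# Hypothetical throughput (Mbps) from six repeated simulation runs.
runs = [9.1, 9.4, 8.8, 9.6, 9.0, 9.3]
low, high = mean_ci(runs)
print(f"mean={statistics.mean(runs):.2f}  95% CI=({low:.2f}, {high:.2f})")
```

When two curves are compared, overlapping intervals are a hint that the apparent difference may not be significant.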
22Line Plots for Distributions Example
- Hop count and RTT distributions
Source: http://www.caida.org/bhuffake/papers/skitviz/
23Recall Distribution Shape
- 1. Describes how data are distributed
- 2. Measures of shape: skew and symmetry
- Left-skewed: mean < median < mode
- Symmetric: mean = median = mode
- Right-skewed: mode < median < mean
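A quick way to guess the shape from data is to compare the mean with the median, since a long tail pulls the mean toward it. The sketch below is a rough heuristic, not a formal skewness measure; the samples and the function name `shape` are hypothetical.

```python
import statistics

def shape(data):
    """Classify skew by comparing mean and median (a rough heuristic)."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"

# Hypothetical samples, for illustration only.
right = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail to the right pulls the mean up
left = [1, 7, 8, 8, 8, 9, 9, 10]    # long tail to the left pulls the mean down
symmetric = [1, 2, 3, 3, 3, 4, 5]

for name, data in [("right", right), ("left", left), ("symmetric", symmetric)]:
    print(name, "->", shape(data))
```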
24Box Plot
- Graphical display of data using the 5-number summary: X_smallest, Q1, Median, Q3, X_largest
[Figure: box plot drawn on an axis with ticks 4, 6, 8, 10, 12]
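The 5-number summary can be computed with Python's standard library (3.8 or later), as sketched below. The sample is hypothetical, and the choice of `method="inclusive"` is an assumption: other quartile conventions give slightly different Q1 and Q3.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [4, 5, 6, 7, 8, 9, 10, 11, 12]

# quantiles(..., n=4) returns the three quartile cut points Q1, median, Q3.
# method="inclusive" interpolates between the smallest and largest values;
# other quartile conventions give slightly different Q1 and Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, median, q3, max(data))
print("X_smallest, Q1, Median, Q3, X_largest =", five_number)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.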
253D Graphs Example
- Illustrates a complex parameter response surface
...
263D Plots N/w Example Code Red Worm Analysis
- http://www.prism.uvsq.fr/users/qst/Tomography/Articles_jmf/renesys_bgp_instabilities2001.pdf
- http://www.caida.org/outreach/isma/0112/talks/andyo/index.pdf
- http://www.renesys.com/resource_library/Renesys-NANOG23.pdf
27Contd
28Tools Gnuplot
- To use with data-generating programs for repetitive plotting
- E.g., generate the plot of throughput for every 1-hour interval in the last week.
- http://www.gnuplot.info
- TIP: Export gnuplot plots as a .fig file and edit it in xfig for greater flexibility
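Repetitive plotting of this kind is usually driven by a small script that emits one gnuplot command file per plot. The sketch below generates such command files in Python. The data file name `throughput.dat`, its two-column layout (time in seconds, throughput in Mbps), and the output naming scheme are all assumptions made for illustration.

```python
def hourly_plot_script(data_file, hour, out_prefix="throughput"):
    """Return a gnuplot script that plots one 1-hour window of the data.

    Assumes data_file has two whitespace-separated columns:
    time in seconds since the start of the trace, and throughput in Mbps.
    """
    start, end = hour * 3600, (hour + 1) * 3600
    return "\n".join([
        "set terminal png",
        f'set output "{out_prefix}_{hour:03d}.png"',
        'set xlabel "Time (s)"',
        'set ylabel "Throughput (Mbps)"',
        f"set xrange [{start}:{end}]",
        f'plot "{data_file}" using 1:2 with lines title "hour {hour}"',
    ]) + "\n"

# One script per hour of the last week (7 days x 24 hours).
scripts = [hourly_plot_script("throughput.dat", h) for h in range(7 * 24)]
print(scripts[0])
```

Each generated script can be written to a file and passed to the `gnuplot` command, so regenerating all 168 plots after a new measurement run takes no manual work.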
29Tools XmGrace
- For more intricate details (e.g., creating error bars, different shades for bar charts). GUI-driven, very user friendly.
- http://plasma-gate.weizmann.ac.il/Grace/
- Exports images to EPS (good for LaTeX documents), PNG (good for PowerPoint), etc.
- Can also run on Windows on top of Cygwin!
30Tools MATLAB
- For complex 3-D and other statistical plots like box plots and scatter plots, and in general if enormous quantities of data are involved.
- http://www.mathworks.com
31Tools Excel Data Presentations
- Open up Excel to a new Worksheet.
- Code a data set as below
- Blue 34
- White 68
- Red 25
- Green 50
- Explore simple data presentation possibilities
32Graphs things to watch out
- Purpose: illustrate entire time-series or response distribution
- Label the x- and y-axes
- Check what units the x- and y-axes are in (not goats or sheep!)
- Check if either scale is logarithmic (changes meaning)
- Check where the origin (or zero point) is for each axis!
- After understanding WHAT is being plotted, close your eyes and ask: what will different patterns on this graph imply (relative to what I want to understand)?
- See if the relative performance is over- or under-emphasized (if two systems are being compared)
- Several examples in the Jain textbook
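To see why a logarithmic scale changes the meaning of a graph, compare the spacing of the same values on a linear and a log axis. The values below are hypothetical; the point is that equal distances on a log axis mean equal ratios, not equal differences.

```python
import math

# Hypothetical measurements spanning several orders of magnitude.
values = [10, 100, 1000]

# Distance between neighbors on a linear axis: proportional to the difference.
linear_gaps = [b - a for a, b in zip(values, values[1:])]

# Distance between neighbors on a log10 axis: proportional to the ratio.
log_gaps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print("linear gaps:", linear_gaps)
print("log10 gaps: ", log_gaps)
```

On the log axis the two gaps are the same size, so a straight line there indicates multiplicative growth rather than a constant increment.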
33Errors in Presenting Data
- 1. Using Chart Junk
- 2. No Relative Basis in Comparing Data Batches
- 3. Compressing the Vertical Axis
- 4. No Zero Point on the Vertical Axis
34Chart Junk
Bad presentation: "Minimum Wage" cluttered with chart junk, labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
Good presentation: "Minimum Wage" as a plain chart, vertical axis $0 to $4, horizontal axis 1960 to 1990
35No Relative Basis
Good presentation: "A's by Class" as a percentage of each class (axis 0% to 30%) for FR, SO, JR, SR
Bad presentation: "A's by Class" as a raw frequency (axis 0 to 300) for FR, SO, JR, SR
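The effect of a missing relative basis can be illustrated with a small calculation. The class sizes and numbers of A's below are hypothetical, not the data behind the slide: raw counts favor the largest class, while percentages put the classes on a common basis.

```python
# Hypothetical class sizes and number of A's (not the slide's data).
class_size = {"FR": 1000, "SO": 800, "JR": 600, "SR": 400}
a_grades = {"FR": 200, "SO": 180, "JR": 150, "SR": 120}

# Relative basis: A's as a percentage of each class.
percent_a = {c: 100 * a_grades[c] / class_size[c] for c in class_size}

most_by_count = max(a_grades, key=a_grades.get)
most_by_percent = max(percent_a, key=percent_a.get)

for c in class_size:
    print(f"{c}: {a_grades[c]:>4} A's  {percent_a[c]:5.1f}%")
print("most A's by count:  ", most_by_count)
print("most A's by percent:", most_by_percent)
```

With these numbers the freshmen lead by raw count while the seniors lead by percentage, which is exactly the distortion a frequency-only chart would hide.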
36Compressing Vertical Axis
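Good presentation: "Quarterly Sales" for Q1 to Q4, vertical axis from 0 to 50
Bad presentation: "Quarterly Sales" for Q1 to Q4, vertical axis from 0 to 200, which squeezes the data into the bottom of the plot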
37No Zero Point on Vertical Axis
Good presentation: "Monthly Sales" for J, M, M, J, S, N, vertical axis from 0 to 60
Bad presentation: "Monthly Sales" for J, M, M, J, S, N, vertical axis from 36 to 45 (no zero point)
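How much a missing zero point exaggerates a difference can be quantified by comparing drawn bar heights with the true ratio of the values. The sales figures and the helper name `drawn_height_ratio` below are hypothetical; the axis minimum of 36 is the one from the bad presentation.

```python
def drawn_height_ratio(a, b, axis_min):
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

# Hypothetical monthly sales figures.
jan, nov = 39, 45

true_ratio = nov / jan                                  # real relative difference
with_zero = drawn_height_ratio(jan, nov, axis_min=0)    # axis includes the zero point
truncated = drawn_height_ratio(jan, nov, axis_min=36)   # axis starts at 36

print(f"true ratio:        {true_ratio:.2f}")
print(f"drawn, from zero:  {with_zero:.2f}")
print(f"drawn, from 36:    {truncated:.2f}")
```

With the axis starting at 36, November's bar is drawn three times as tall as January's although sales grew by only about 15 percent.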
38Graphing Practices In pictures ?
39Graphing Practices
40Graphing Practices
41Graphing Practices.
42Checklist In textbook
43More Complex Visualizations
- Internet topology aspects
- CAIDA skitter project: http://www.caida.org/tools/measurement/skitter/visualizations.xml
44More
45The End