Title: RIS 2004: Data Analysis and Visualization (DAV)
1. RIS 2004: Data Analysis and Visualization (DAV)
- To envision information, and what bright and
splendid visions can result, is to work at the
intersection of image, word, number, art.
- The instruments of DAV are those of writing
and typography, of managing large data sets and
statistical analysis, of line and layout and
color.
- Exploratory Data Analysis (EDA) is central to DAV.
Edward R. Tufte. 1990. Envisioning Information.
Graphics Press.
2. Exploratory Data Analysis (EDA)
- EDA is:
  - a state of mind
  - a way to think about DAV
  - a way of doing DAV
- Underlying assumption of EDA:
  - the more one knows about the data, the more
  effectively the data can be used to develop,
  test, and refine theory.
Hartwig, F. and B. E. Dearing. 1975. Exploratory
Data Analysis. Sage Publications, Inc.
3. Exploratory Data Analysis (EDA)
- EDA is a process:
  - the breakdown of data into its important
  components
  - not the analysis of data by means of statistics
  alone (i.e., by numerical summaries alone to the
  exclusion of other methods).
- EDA is not just statistical analysis.
Hartwig, F. and B. E. Dearing. 1975. Exploratory
Data Analysis. Sage Publications, Inc.
4. EDA and Statistics
- Consequences of considering EDA = Statistics:
  - The importance of visual displays of data is
  downgraded.
  - Statistics becomes more important than the
  graphical representation of the data.
- Statistical analysis is confirmatory (rather than
exploratory).
  - Statistics usually considers only two
  alternatives (i.e., H0 and Ha).
- Statistical analysis lacks the openness required
of EDA.
Hartwig, F. and B. E. Dearing. 1975. Exploratory Data
Analysis. Sage Publications, Inc.
5. Exploratory Data Analysis
EDA seeks to maximize what can be learned from
the data.
- EDA adheres to two basic principles:
  - Skepticism: One should be skeptical of measures that
  summarize data because they can sometimes conceal
  or misrepresent information.
  - Openness: One should be open to unanticipated patterns in
  the data because these patterns can be the most
  revealing outcomes of the analysis.
Hartwig, F. and B. E. Dearing. 1975. Exploratory
Data Analysis. Sage Publications, Inc.
6. EDA Summary
- When applied to data analysis, the skepticism and
openness principles of EDA imply a flexible,
data-centered approach that is open to
alternative models of relationships and
alternative scales for expressing variables, and
that emphasizes visual representation of the
data.
Hartwig, F. and B. E. Dearing. 1975. Exploratory
Data Analysis. Sage Publications, Inc.
7. EDA Principle 1
Skepticism: One should be skeptical of measures
that summarize data because they can sometimes
conceal or misrepresent information.
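A minimal sketch of why summary measures deserve skepticism. The numbers are made up for illustration and are not the beetle data from Rossi et al.; the names `gradient`, `shuffled`, and `lag1_autocorrelation` are hypothetical. Both transects hold the same counts in a different spatial order, so their mean and standard deviation are identical, yet only one shows a spatial trend.

```python
from statistics import mean, pstdev

def lag1_autocorrelation(values):
    """Correlation between each value and its right-hand neighbor."""
    m = mean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den

# Hypothetical counts along a transect (illustrative only).
gradient = [1, 2, 3, 4, 5, 6, 7, 8]   # smooth spatial trend
shuffled = [5, 1, 8, 3, 6, 2, 7, 4]   # same values, different order

# The summary statistics cannot tell the two transects apart...
summary_gradient = (mean(gradient), pstdev(gradient))
summary_shuffled = (mean(shuffled), pstdev(shuffled))

# ...but a measure that uses position can.
r_gradient = lag1_autocorrelation(gradient)
r_shuffled = lag1_autocorrelation(shuffled)

print(summary_gradient == summary_shuffled)
print(r_gradient, r_shuffled)
```

The design point is that mean and standard deviation ignore where each value sits, so any pattern that lives in the ordering is invisible to them.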
8. Skepticism, an Example: Carabid Beetle Distribution
Rossi et al. 1992. Geostatistical Tools for
Modeling and Interpreting Ecological Spatial
Dependence. Ecol. Monographs 62: 277-314.
9. Summary Statistics: Carabid Beetle Distribution
Rossi et al. 1992. Geostatistical Tools for
Modeling and Interpreting Ecological Spatial
Dependence. Ecol. Monographs 62: 277-314.
10. EDA Principle 2
Openness: One should be open to
unanticipated patterns in the data because these
patterns can be the most revealing outcomes of
the analysis.
11. Openness, an Example: The Lorenz Attractor and Chaos
- Edward Lorenz attempted to model and forecast
weather patterns.
- Chaos:
  - Sensitive dependence on initial conditions
  - Orderly disorder
  - Bounded randomness
Gleick, J. 1987. Chaos: Making a New Science.
Penguin Books.
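A rough sketch of sensitive dependence on initial conditions, assuming the standard Lorenz equations with the commonly used parameters sigma = 10, rho = 28, beta = 8/3. The step size, starting points, and function names are illustrative choices, not taken from the slides. Two trajectories that start one part in 10^8 apart end up far from each other, yet both stay within a bounded region.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one step with 4th-order Runge-Kutta."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shift(state, k1, dt / 2))
    k3 = deriv(shift(state, k2, dt / 2))
    k4 = deriv(shift(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation(p, q):
    """Euclidean distance between two states."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

p = (1.0, 1.0, 1.0)
q = (1.0, 1.0, 1.0 + 1e-8)   # nearly identical starting point
initial_gap = separation(p, q)

max_gap = 0.0    # largest distance seen between the two runs
max_abs = 0.0    # largest coordinate magnitude seen (boundedness check)
for _ in range(4000):        # 40 time units at dt = 0.01
    p, q = lorenz_step(p), lorenz_step(q)
    max_gap = max(max_gap, separation(p, q))
    max_abs = max(max_abs, *(abs(v) for v in p))

print(initial_gap, max_gap, max_abs)
```

Plotting x against z for either trajectory would show the familiar butterfly shape, which is the kind of unanticipated pattern the openness principle asks the analyst to look for.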
12. The Power of Visual Representation
Because of skepticism for statistical summaries
of data, major emphasis in EDA is placed on
visual representation (Hartwig and Dearing 1975).
Of all the methods for analyzing and
communicating statistical information,
well-designed data graphics (figures, charts,
etc.) are usually the simplest, and at the same
time the most powerful (Tufte 1983).
Hartwig, F. and B. E. Dearing. 1975. Exploratory Data
Analysis. Sage Publications, Inc.
Tufte, E. R. 1983. The Visual Display of Quantitative
Information. Graphics Press.
13. The Power of a Visual Display: John Gotti, the
Teflon Don
14. Gotti Trial: The Defense's Chart
15. 10 Rules for Drawing Graphs (1975)
- Center the graph on the page.
- Graph axes should be labeled clearly with both
the variables being measured and the units of
measurement.
- Axis labels should be parallel to the proper axis
and centered.
- Grid marks should be drawn inside the axes and
equidistant from each other.
- Assign numerical values to each grid mark.
- Plot data points at appropriate intervals.
- Connect the plot points sequentially (from left
to right).
- If there is more than one dependent variable, draw
a legend whenever possible in the upper right
hand corner and within the axes boundaries.
- When you have more than one dependent variable,
assign a distinct geometric form to each of the
dependent variables.
- There should be no more than one graph on a sheet
of paper.
Katzenberg, A. C. 1975. How to Draw Graphs.
Donnelley & Sons, Co.
16. Example: Keeping Fit
What does this graph tell you?
17. Additional Rules for Graphical Excellence (1983)
- Induce the reader to think about the substance
rather than about methodology, graphics design,
or the technology of graphics production.
- Avoid distorting what the data have to say.
- Present many numbers in a small space.
- Encourage the eye to compare different pieces of
data.
- Reveal the data at several levels of detail, from
broad overview to fine structure.
- Graphics must be closely integrated with the
statistical and verbal descriptions of the data
set.
Tufte, E. R. 1983. The Visual Display of
Quantitative Information. Graphics Press.
18. Classified Satellite Image: Imperial Valley, CA
Induce the reader to think about the substance
rather than about methodology, graphics design,
or the technology of graphics production.
19. Diamonds Are a Girl's Best Friend
- Induce the reader to think about the substance.
- Avoid distorting the data (don't insult the reader).
- Graph axes should be labeled clearly.
- Grid marks should be drawn inside the axes.
The graph is chockablock with cliché and
stereotype, coarse humor, and a content-empty
third dimension. It shows a contempt both for
information and for the audience (Tufte 1990).
20. Distorting What the Data Have to Say:
Gee-Whiz Graphics
The graph on the left makes an increase of under
4% look like an increase of >400%.
Huff, D. and I. Geis. 1954. How to Lie with
Statistics. W.W. Norton & Co., Inc.
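A small sketch of how a truncated axis can produce this kind of exaggeration. The values and the baseline below are hypothetical, chosen only to mimic the scale of the effect described on the slide; `drawn_increase` is an illustrative helper, not something from Huff and Geis.

```python
def drawn_increase(old, new, baseline):
    """Relative growth in drawn height when the axis starts at `baseline`."""
    return (new - old) / (old - baseline)

old, new = 100.0, 103.9   # hypothetical values: a 3.9% rise in the data

honest = drawn_increase(old, new, baseline=0.0)     # axis starts at zero
gee_whiz = drawn_increase(old, new, baseline=99.2)  # axis chopped just below the data

print(f"zero-based axis: {honest:.1%}")
print(f"truncated axis:  {gee_whiz:.1%}")
```

The data are identical in both cases; only the axis origin changes, which is why the axis range deserves the same scrutiny as the plotted values.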
21. Another Gee-Whiz Graph
Huff, D. and I. Geis. 1954. How to Lie with
Statistics. W.W. Norton & Co., Inc.
22. More Distortions
Monmonier, M. 1996. How to Lie with Maps. Univ.
of Chicago Press.
23. The Lie Factor
Lie Factor = 2.8
Tufte, E. R. 1983. The Visual Display of
Quantitative Information. Graphics Press.
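Tufte (1983) defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch of that ratio follows. The measurements are hypothetical, picked so the result comes out at 2.8; they are not the numbers behind the figure on this slide.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(
        data_first, data_second
    )

# Hypothetical example: the data rise 50% (10 -> 15), but the drawn
# lengths rise 140% (2.0 cm -> 4.8 cm).
factor = lie_factor(2.0, 4.8, 10.0, 15.0)
print(round(factor, 2))
```

A Lie Factor of 1 means the graphic represents the change in the data faithfully; values well above or below 1 indicate distortion.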
24. Graphics Must Not Quote Data out of Context
Tufte, E. R. 1983. The Visual Display of
Quantitative Information. Graphics Press.
25. Present Many Numbers (Pieces of Data) in a Small
Space
Wind-Rose Diagrams
A single figure of 4 graphs: a, b, and c show
data on wind directions from 3 weather stations.
Graph d presents the mean of the three stations
(6-10 AM, July 1, 1994 to June 30, 1995).
26. Playfair's Graph: Many Pieces of Data
[Figure: Playfair's chart; axis labels: Price, Wages, Year]
The graph shows 3 parallel time series (prices,
wages, and the reigns of British kings and
queens).
27. Joseph Minard's Map of Napoleon's March on Russia (1812)
[Figure: labels on the graphic read 100,000; 422,000; 10,000]
28. Florence Nightingale's Coxcomb Diagram
Nightingale's Coxcomb (1858) is notable for its
display of frequency by area, like the pie chart.
But, unlike the pie chart, the Coxcomb keeps
angles constant and varies radius.
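A brief sketch of the geometry this implies. A wedge with angle theta and radius r has area (theta / 2) * r^2, so if every wedge has the same angle and area is to encode frequency, the radius must grow with the square root of the frequency. The counts and function names below are hypothetical.

```python
import math

def coxcomb_radii(values):
    """Radius of each equal-angle wedge so that wedge area equals the value."""
    angle = 2 * math.pi / len(values)   # every wedge gets the same angle
    return [math.sqrt(2 * v / angle) for v in values]

def wedge_area(radius, angle):
    """Area of a circular sector."""
    return 0.5 * angle * radius ** 2

counts = [100, 400, 900]                # hypothetical frequencies
radii = coxcomb_radii(counts)
angle = 2 * math.pi / len(counts)
areas = [wedge_area(r, angle) for r in radii]

print([round(r, 2) for r in radii])
print([round(a, 1) for a in areas])
```

Scaling the radius directly with the frequency instead would make the areas grow with the square of the frequency and overstate the larger values.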
29. Visual Delights
30. Many Pieces of Data, Having Fun: Maps and
Football
Monmonier, M. 1996. How to Lie with Maps. Univ.
of Chicago Press.
31. Integration of Graphics and Statistical and
Verbal Descriptions of the Data
Objective: To determine the relationship between
the density of an insect on basal leaf 3 and the
density on the whole plant in corn.
32. Worst Graph Ever?
Five colors report, almost by happenstance, only five
pieces of data (Tufte 1983).
33. Escaping the Flatland
- Even though we navigate daily through a
perceptual world of 3 spatial dimensions and
reason occasionally about higher dimensional
arenas with mathematical ease, the world
portrayed on our information displays is caught
up in the 2-dimensionality of the flatlands of paper
and video screen.
Edward R. Tufte. 1990. Envisioning Information.
Graphics Press.
34. A Solution: Variography, Geostatistics, and Kriging
- Geostatistics: a branch of applied statistics
that focuses on the detection, modeling, and
estimation of spatial pattern.
- Kriging: an interpolation procedure that provides
estimates of variables at unsampled locations.
- Tobler's First Law of Geography: everything is
related to everything else, but near things are
more related than distant things.
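A minimal sketch of the variography step, assuming a regularly spaced one-dimensional transect and the usual empirical semivariance: half the mean squared difference between all pairs of samples separated by a given lag. The transect values and the function name are hypothetical. Kriging itself is not shown; it would use a model fitted to such a variogram to weight nearby samples when estimating at unsampled locations.

```python
def semivariance(values, lag):
    """Empirical semivariance of a regularly spaced transect at one lag."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Hypothetical measurements at equally spaced sample points.
transect = [1, 2, 3, 4, 5, 6, 7, 8]

# Semivariance at increasing separation distances.
variogram = {lag: semivariance(transect, lag) for lag in (1, 2, 3)}
print(variogram)
```

Semivariance that rises with lag is Tobler's law in numerical form: samples close together differ less than samples far apart.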
35. Variography, Geostatistics, and Kriging
36. 3-D Graphics