Small multiples, or the science and art of combining graphs

1
Small multiples, or the science and art of
combining graphs
  • Nicholas J. Cox
  • Department of Geography
  • Durham University, UK

2
Small multiples
  • Good graphics often exploit one simple design
    that is repeated for different parts of the data.
  • Edward Tufte called this the use of small
    multiples.
  • Well-designed small multiples are inevitably
    comparative, deftly multivariate, shrunken,
    high-density graphics.
  • Edward Rolf Tufte (1942– )

3
in Stata
  • In Stata, small multiples are supported for
    different subsets of the data with by() or over()
    options of many graph commands.
  • Users can emulate this in their own programs by
    writing wrapper programs that call twoway or 
    graph bar and its siblings.
  • Otherwise, specific machinery offers repetition
    of a design for different variables, such as the
    graph matrix command. 
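A minimal sketch of the by() and over() options mentioned above, assuming the auto dataset shipped with Stata. The choice of variables and the graph names by_demo and over_demo are illustrative only.

```stata
* Illustrative only: the auto dataset and these variables are assumptions
sysuse auto, clear

* by(): the same scatter plot design repeated for each level of foreign
scatter mpg weight, by(foreign) name(by_demo, replace)

* over(): one box per level of rep78 within a single graph
graph box mpg, over(rep78) name(over_demo, replace)
```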

4
  • Users can always put together their own composite
    graphs by saving individual graphs and then
    combining them using graph combine.
  • This presentation offers further modest
    automation of the same design repeated for
    different data.
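As a rough illustration of the manual route, the following sketch keeps two graphs in memory and then puts them side by side with graph combine. The auto dataset, the variables and the graph names g1, g2 and combined are assumptions made for the example.

```stata
* Illustrative only: dataset, variables and graph names are assumptions
sysuse auto, clear

* Create each graph in memory without drawing it
scatter mpg weight, name(g1, replace) nodraw
scatter mpg length, name(g2, replace) nodraw

* Combine the named graphs into one composite graph
graph combine g1 g2, cols(2) name(combined, replace)
```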

5
Original programs discussed
  • sparkline
  • crossplot
  • combineplot
  • designplot
  • and with cameo roles
  • aaplot
  • sepscatter
  • All may be installed from SSC.
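Installation from SSC can be scripted in one loop. This is a sketch that assumes a working internet connection; the replace option simply overwrites any older copy.

```stata
* Install each program named on this slide from SSC
foreach pkg in sparkline crossplot combineplot designplot aaplot sepscatter {
    ssc install `pkg', replace
}
```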

6
What's in a name? roseplot by any other name
  • A minor theme here is that definite names are
    needed for programs, even if kinds of graphs do
    not have distinct agreed names.
  • As in advertising, a good name attracts and keeps
    users.
  • As in politics, a bad name can be fatal.

7
  • sparkline
  • The purpose of visualization is insight, not
    pictures
  • Ben Shneiderman (1947– )

8
Sparklines
  • The name sparkline was suggested by Edward
    Tufte for intense text-like graphics.
  • Sparklines are typically simple in design,
    sparing of space and rich in data, but they
    include several quite different kinds of graph
    otherwise.
  • The most common kind shows several time series
    stacked vertically.
  • sparkline is a Stata implementation.

9
  • Sparklines have long been standard in several
    fields, including physics and chemistry
    (spectroscopy), seismology, climatology, ecology,
    archaeology and physiology (notably
    encephalography and cardiography).
  • Tufte provided a memorable and evocative new
    name and an excellent provocative discussion.
  • The Grunfeld data (webuse grunfeld) are a classic
    dataset in panel-based economics. Ten companies
    were monitored for 1935–54. They give us a
    simple sandpit.

10
What are we doing here? The problem of time
series graphics
  • Comparisons of time series are a rich and
    challenging area of statistical graphics.
  • The widespread term spaghetti plot hints
    immediately at the difficulties.
  • As always, we want to combine a grasp of general
    patterns with access to individual details.
  • With this in mind, we look at some sparklines of
    the Grunfeld dataset.
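To make the contrast concrete without relying on sparkline itself, here is a sketch using official commands only. It draws a spaghetti plot and then a stacked, one-column set of small panels for the Grunfeld data. This is an approximation of the idea, not the sparkline command, and the options chosen are assumptions.

```stata
* Approximation with official commands; this is not sparkline itself
webuse grunfeld, clear
xtset company year

* Spaghetti plot: all ten companies overlaid on one pair of axes
xtline invest, overlay

* Stacked small multiples: one panel per company in a single column,
* each with its own y scale, so that shapes can be compared
twoway line invest year, sort                     ///
    by(company, cols(1) yrescale note(""))        ///
    ylabel(none) ytitle("") name(stacked, replace)
```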

11
(No Transcript)
12
13
14
Vertical and horizontal
  • By default sparkline stacks small graphs
    vertically.
  • If several graphs are combined, it is typical to
    cut down on axis labels and rely on differences
    in shape to convey information.
  • Horizontal stacking is also supported, which can
    be useful for archaeological or environmental
    problems focused on variations with depth or
    height.
  • Here is an archaeological dataset as example.

15
(No Transcript)
16
Nightingale's data
  • Florence Nightingale (1820–1910) is well
    remembered for her nursing in the Crimean war and
    (within statistical science) for use of
    quantitative arguments.
  • Her most celebrated dataset is often reproduced
    using her polar diagram, but is easier to think
    about as time series.
  • Zymotic (loosely, infectious) disease mortality
    dominates other kinds, so much so that a square
    root scale helps comparison. (A logarithmic scale
    over-transforms here.)

17
(No Transcript)
18
  • Source of image
  • http://understandinguncertainty.org/coxcombs

19
20
21
Would sparkline help?
  • A sparkline display is useful to show relative
    shape, such as times of peaks.
  • We see that seasonality is only part of what is
    being seen.
  • The harsh winter of 1854–5 coincided with some of
    the hardest battles of the war, but 1855–6 was
    quite different.
  • But, as often happens, no one graph dominates
    others here.

22
23
  • crossplot
  • The scatter plot is the workhorse of statistical
    graphics.
  • John McKinley Chambers (1941– )

24
crossplot
  • crossplot is designed as a quick-and-easy way to
    combine scatter plots.
  • The basic syntax is crossplot (yvarlist)
    (xvarlist) and the idea is to plot every y in
    yvarlist against every x in xvarlist.
  • The use of two varlists gives greater flexibility
    than does graph matrix, which produces every
    possible scatter plot for a single varlist.

25
Scatter plot matrices
  • Scatter plot matrices are great, but they can be
    excessive.
  • Their main feature is also a limitation.
  • p variables mean p² plots all at once, so 10
    means 100, and so forth.
  • (The half option just controls which plots you
    see.)
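A small sketch of the point about graph matrix, assuming the auto dataset and four arbitrarily chosen variables. Four variables give a 4 × 4 array of cells; the half option shows only the lower triangle. The graph names full and lower are illustrative.

```stata
* Illustrative only: dataset and variable choice are assumptions
sysuse auto, clear

* Full scatter plot matrix: 4 variables, 4 x 4 cells
graph matrix price mpg weight length, name(full, replace)

* half: the same plots, showing the lower triangle only
graph matrix price mpg weight length, half name(lower, replace)
```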

26
crossplot design
  • crossplot was developed in teaching, especially
    of regression, with the aim of encouraging
    focused comparisons.
  • Originally (1999) crossplot was called cpyxplot,
    cp meaning Cartesian product, but the name was
    ugly, cryptic and easily forgotten.
  • The syntax had to be as simple as possible.

27
crossplot examples
  • Versions of a response variable versus a key
    predictor.
  • A response variable versus versions of a key
    predictor.
  • Each output versus each input.
  • Principal components versus original variables.
  • First, let us look at four versions of mpg versus
    weight in the auto dataset.

28
(No Transcript)
29
  • Next we look at an audiometric dataset used as a
    multivariate example in the Stata manuals.
  • There are 8 response variables, 4 for left ears
    and 4 for right ears. Here we just focus on the
    16 plots pairing left and right.
  • Another graph could be the 4 plots comparing left
    and right ears at the same frequency, the
    diagonal here.

30
(No Transcript)
31
crossplot syntax for examples
  • crossplot (mpg rt_mpg ln_mpg rec_mpg) weight,
    combine(imargin(small))
  • crossplot (lft*) (rght*), jitter(1)

32
crossplot syntax extras
  • By default, crossplot is just calling twoway
    scatter followed by graph combine.
  • It follows that recast() is available to recast
    to twoway line or twoway connected.
  • crossplot has an extra sequence() option to label
    graphs to ease preparation of graphics for papers
  • e.g. sequence(a b c d)
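Since crossplot is described as twoway scatter followed by graph combine, its default behaviour can be imitated by hand. The sketch below is an emulation with official commands, not the crossplot source code. The definitions of rt_mpg, ln_mpg and rec_mpg as root, logarithm and reciprocal of mpg are assumptions based on the variable names.

```stata
* Emulation with official commands; not the crossplot code itself
sysuse auto, clear

* Assumed definitions of the transformed versions of mpg
generate rt_mpg  = sqrt(mpg)
generate ln_mpg  = ln(mpg)
generate rec_mpg = 1/mpg

* Plot every y against every x, keeping each graph in memory
local graphs
foreach y of varlist mpg rt_mpg ln_mpg rec_mpg {
    foreach x of varlist weight {
        twoway scatter `y' `x', name(`y'_`x', replace) nodraw
        local graphs `graphs' `y'_`x'
    }
}

* Put the individual graphs together
graph combine `graphs', imargin(small)
```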

33
  • combineplot
  • The greatest value of a picture is when it forces
    us to notice what we never expected to see.
  • John Wilder Tukey (1915–2000)

34
combineplot
  • combineplot is a generalisation of crossplot,
    more flexible and inevitably more complicated in
    syntax.
  • The general problem of combining plots of similar
    kind reduces to a loop producing individual plots
    and a call to graph combine. That is bound to be
    a challenge to beginning users.
  • The idea is to avoid that by encapsulating the
    predictable syntax within one command.

35
combineplot examples
  • We will look at a series of univariate examples
    followed by a series of bivariate examples.
  • A great variety is possible, as we can loop over
    user-written graphics commands as well as
    official commands.

36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
A digression on sepscatter
  • The last example used sepscatter, a program
    automating separation of data points on a scatter
    plot by a categorical variable.
  • The repetition of the legend needs some kind of
    fix. In this and similar examples, the legend
    could be deleted and explaining symbols left as a
    task for the text caption.
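For readers without sepscatter installed, a similar display can be approximated with the official separate command. This is a sketch of the general idea, not of how sepscatter is implemented, and it assumes the auto dataset.

```stata
* Approximation with official commands; not the sepscatter code
sysuse auto, clear

* Split price into price0 and price1 according to foreign (0/1)
separate price, by(foreign) veryshortlabel

* Each new variable gets its own marker and legend entry
scatter price0 price1 mpg, ytitle("Price (USD)")
```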

42
sepscatter and scatter plot matrices
  • combineplot with sepscatter meets a felt need,
    scatter plot matrices with categorisation of data
    points.
  • Here is an example with size variables from the
    auto dataset. The diagonal scatter plots have
    meaning, yet are not conventional. But not every
    graph need be immediately publishable.

43
(No Transcript)
44
(No Transcript)
45
A digression on aaplot
  • The last example used aaplot.
  • aaplot customises automatic annotation of scatter
    plots with fitted regressions with text for key
    results.
  • Originally, it was written following a request by
    my Ph.D. student Alona Armstrong.

46
Back to combineplot
  • Some examples of its syntax will make clearer how
    it works. First look at a univariate example:
  • combineplot mpg price weight headroom: graph box
    @y, over(rep78)
  • Here we have one varlist, and the syntax element
    @y is a placeholder for the variable name.

47
  • Next look at a bivariate example:
  • combineplot price (mpg weight length
    displacement): sepscatter @y @x, ytitle("Price
    (USD)") sep(foreign)
  • Here we have two varlists, and the syntax elements
    @y and @x are placeholders for the variable
    names.

48
  • The two varlists may each contain a single
    variable and they may be identical.
  • When both are presented, the combination is the
    Cartesian product of the varlists.
  • Naturally, you can reach through to control the
    options of graph combine as well as those of the
    particular graph command used.

49
Quirk or quick?
  • The quirky syntax of combineplot might cause some
    queasiness.
  • Some might recall the obsolete for command.
  • Confident users would (should) be happy to write
    their own loops, topped by graph combine, and
    that is fine too.
  • The justification for combineplot is just
    convenience: it can be quicker than writing your
    own script.

50
  • designplot
  • Real life is both complicated and short,
    and we make no mockery of honest adhockery.
  • Irving John Good (1916–2009)

51
designplot
  • Here more than anywhere arbitrariness of names
    can bite.
  • If you have used S or S-Plus or R much, you may
    have come across design plots.
  • But as implemented there they do not look much
    like the graphs you are going to see. Nor are
    they plots showing fitted results nor do they
    imply experimental design.
  • To understand designplot, we need to creep up on
    it step by step.

52
(No Transcript)
53
(No Transcript)
54
designplot syntax
  • Minimal syntax specifies a response first, then
    one or more predictors.
  • The predictors should in practice be categorical,
    meaning taking on only a small or moderate number
    of distinct levels (factors, if you like).
  • The examples were
  • designplot mpg rep78
  • designplot mpg rep78 foreign

55
designplot default
  • The statistics shown are means.
  • Given one, two, ... predictors, the means are shown
    for all the data, each one-way breakdown, each
    two-way breakdown, ...
  • designplot uses a syntax of way being 0, 1, 2, ...
  • graph dot is the default vehicle.
  • statsby underpins calculations.
  • In essence, we can get a multiscale breakdown.
  • In practice, we might want to restrict what is
    shown.

56
(No Transcript)
57
Restricting designplot
  • Here we restricted the scope by
  • designplot mpg foreign rep78, maxway(1)
  • Let us look at a different dataset. The response
    variable for these data on the Titanic is a
    binary variable survived, so its mean is the
    fraction survived.
  • We restrict using maxway(2).

58
(No Transcript)
59
  • So we have here
  • the overall mean
  • one-way breakdowns for three predictors: class,
    adult, male
  • two-way breakdowns for combinations:
    class × adult, class × male, adult × male
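The number of breakdowns grows quickly with the number of predictors, which is why restricting the maximum way matters. As a check on the list above, this sketch counts the breakdowns for three predictors up to way 2, using Stata's comb() function for binomial coefficients.

```stata
* Breakdowns for 3 predictors: way 0 + way 1 + way 2
display comb(3, 0) + comb(3, 1) + comb(3, 2)

* Without any restriction, way 3 would be included as well
display comb(3, 0) + comb(3, 1) + comb(3, 2) + comb(3, 3)
```

The first line counts one overall mean, three one-way breakdowns and three two-way breakdowns.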

60
  • This kind of graph is for detailed scrutiny,
    rather than delivering shock.
  • Logically similar displays are often used for
    reporting opinion poll or electoral results.

61
That reminds us of
  • The structure echoes analysis of variance, used
    descriptively.
  • Similar ideas appear in ANOVA and other
    literature going back to J.W. Tukey in 1977.
  • It also echoes the little used official command
    grmeanby.
  • By default, grmeanby also shows means.
  • (Medians are allowed.)
  • It allows one-way breakdowns only.
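For comparison, here is a sketch of grmeanby in use, assuming the auto dataset. The first call plots means of mpg for each level of each listed variable; the second uses the median option.

```stata
* Illustrative only: dataset and variables are assumptions
sysuse auto, clear

* Means of mpg by levels of foreign and of rep78 (one-way only)
grmeanby foreign rep78, summarize(mpg)

* The same display with medians
grmeanby foreign rep78, summarize(mpg) median
```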

62
(No Transcript)
63
(No Transcript)
64
grmeanby
  • In these examples, grmeanby shows different means
    distinctly, but that is not guaranteed.
  • Using graph dot as a default within designplot
    ensures more readability, although that too has
    its limits.

65
designplot can show other statistics
  • You can show any summarize result.
  • In practice, you would only want to plot results
    sharing the same units of measurement (including
    none at all, as with skewness and kurtosis).
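The point about units can be seen from the saved results of summarize. This sketch, assuming the auto dataset, displays two results that share the units of mpg and two that are unit-free.

```stata
* Illustrative only: dataset and variable are assumptions
sysuse auto, clear
summarize mpg, detail

* Mean and median are in the units of mpg
display "mean " r(mean) "  median " r(p50)

* Skewness and kurtosis have no units
display "skewness " r(skewness) "  kurtosis " r(kurtosis)
```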

66
(No Transcript)
67
More to say
  • Although based on graph dot by default,
    designplot can be recast to graph bar or graph
    hbar.
  • Although based on summarizing single variables,
    what could be simpler than putting different
    designplots side-by-side?

68
(No Transcript)
69
Is this just a reinvention of graph dot?
  • No.
  • graph dot and its siblings are restricted in
    offering only one-way or two-way or three-way
    breakdowns given, respectively, one or two or
    three factors.
  • designplot gives scope for saving results for
    separate graphing or tabulation.
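To see the restriction, here is what graph dot itself offers, sketched with the auto dataset as an assumed example. Each call gives exactly one level of breakdown, fixed by the number of over() options.

```stata
* Illustrative only: dataset and variables are assumptions
sysuse auto, clear

* One factor: a one-way breakdown of mean mpg
graph dot (mean) mpg, over(foreign)

* Two factors: a two-way breakdown only, with no overall or
* one-way means shown alongside
graph dot (mean) mpg, over(rep78) over(foreign)
```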

70
The main players again
  • sparkline
  • crossplot
  • combineplot
  • designplot
  • All may be installed from SSC.
  • Our attraction to images as a source of
    understanding is both primal and pervasive.
  • Stephen Jay Gould (1941–2002)
