ARCH 2126/6126 - PowerPoint PPT Presentation

Provided by: anu9

1
ARCH 2126/6126
  • Session 3: Summarizing data visually

2
To recap, the purpose of statistics...
  • To provide insight into situations and problems
    by means of numbers
  • How is this provided?
  • Data are available or are collected
  • Data are organized, summarized, analysed and
    results presented
  • Conclusions are drawn, in context
  • Whole process is often guided by critical
    appraisal of similar work already done

3
Data sets
  • Usually data do not come singly: they come in,
    or are collected in, sets
  • We collect them because we want to test some idea
    fairly against them
  • E.g. we might want to test whether the stone
    artefacts from one site differ in size from stone
    artefacts from another
  • For this, we measure artefact sizes
    systematically & consistently

4
Some issues implied by this
  • Definition of data-set: what belongs in it and
    what does not
  • Performance of each measurement: accurate and
    repeatable
  • Methods of summarizing and analysing patterns in
    the data-set as a whole: visual & numerical
  • Today: mainly defining data-sets, measurement &
    visual summary

5
What belongs in a data-set?
  • 'We have considered it prudent to adopt the years
    1919-1925, excluding the drought year of 1926, as
    a fair standard for the future' (Queensland Land
    Settlement Advisory Board, 1927)
  • 'The tacit assumption that drought is an
    exceptional visitation to the inland country has
    shaped and infected public thought and official
    policy alike' (Francis Ratcliffe, 1937)

6
Making a measurement
  • A variable is a measured property of a case;
    measuring assigns numbers representing each
    case's value for that variable
  • Variables must be exactly defined & measurements
    reliably carried out
  • Some variables are relatively simple but still
    need explicit specification, e.g. length
  • Some are more complex and/or depend on
    non-obvious definitions, e.g. unemployment

7
Measurement is never perfectly accurate, but...
  • Our measurement of scraper length is valid, to
    the extent that it measures what it is supposed
    to measure
  • Our measurement is reliable, to the extent that
    repetitions of the same measurement give the same
    result
  • Our measurement is unbiased, to the extent that
    it does not tend to under-state or over-state the
    true value of the variable
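Reliability and bias can be given rough numbers once a measurement has been repeated. A minimal sketch in Python, assuming one scraper has been measured several times and that a trusted reference length is available; the readings, the reference value and the function name `summarize_repeats` are all hypothetical. The spread of the repeated readings speaks to reliability and the average departure from the reference speaks to bias; validity (whether length is the right thing to measure at all) cannot be computed this way.

```python
from statistics import mean, stdev

def summarize_repeats(readings, reference):
    """Summarize repeated measurements of ONE case (hypothetical helper).

    readings:  repeated measurements of the same object, e.g. scraper length in mm
    reference: an assumed 'true' value, e.g. from a more accurate instrument
    """
    spread = stdev(readings)            # smaller spread -> more reliable
    bias = mean(readings) - reference   # positive over-states, negative under-states
    return spread, bias

# Hypothetical repeated readings of one scraper (mm) and an assumed reference value
readings = [52.1, 52.3, 52.2, 52.4, 52.0]
spread, bias = summarize_repeats(readings, reference=51.8)
print(f"spread (SD): {spread:.2f} mm, bias: {bias:+.2f} mm")
```

Here the readings agree closely with each other (fairly reliable) but all sit above the reference (biased upwards), showing that the two properties are separate.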

8
Recording a measurement
  • Rare important observations deserve recording as
    insight-giving anecdotes
  • But in many fields the bread & butter of research
    are common observations where the issue is
    varying frequency
  • Importance of a recording system
  • Unsystematic recording is likely to lead to
    omissions or inconsistencies
  • Limits to the benefits of precision

9
Recording technology
  • Pen & paper still have their place
  • Complex technology has its traps: its
    vulnerability, your dependence
  • But early, direct or automatic data entry into
    computers can bring big benefits: efficient
    use of time & labour, error reduction &
    cross-checks
  • Importance of duplicates & back-ups

10
How much data to collect?
  • Limits to the benefit from measuring variables to
    many significant figures
  • Limits to the benefit from increasing sample size
    indefinitely
  • Limits to the benefit from increasing number of
    variables: how many will you analyse?
  • Attention to limits can save lots of time
  • Limits not fixed, but depend on the situation
    under study & the ideas under test
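One reason for diminishing returns from ever-larger samples is that, for independent observations, the standard error of a sample mean shrinks only with the square root of the sample size: quadrupling n roughly halves the uncertainty. A minimal Python sketch, assuming a standard deviation of 10 units chosen purely for illustration; `standard_error` is an illustrative name.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean for n independent observations."""
    return sd / math.sqrt(n)

SD = 10.0  # hypothetical standard deviation of the variable being measured

for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  standard error = {standard_error(SD, n):.2f}")
```

Each fourfold increase in collecting effort only halves the standard error, which is one way to see where the limits lie; how much precision is worth paying for still depends on the question under test.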

11
Spreadsheets (e.g. Excel) & databases (e.g. Access)
  • End point of data collection is often a matrix or
    table: a column for each variable, a row for each
    case
  • Often convenient to enter these into a
    spreadsheet or database (linkable, searchable)
  • These can store, check, transform, calculate,
    apply conditions, select, test statistically,
    output to statpack
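The operations listed above (store, check, transform, select) can also be sketched outside a spreadsheet. A minimal Python illustration, assuming the case-by-variable table has been exported as CSV; the column names, the values and the "length must be positive" rule are hypothetical, and the code uses only Python's standard `csv` module, not any Excel or Access feature.

```python
import csv
import io

# Hypothetical case-by-variable table: one row per artefact (case),
# one column per variable, as it might be exported from a spreadsheet as CSV.
RAW = """artefact_id,site,length_mm,weight_g
A1,Site1,52.1,14.0
A2,Site1,48.7,11.5
A3,Site2,61.3,19.2
A4,Site2,-5.0,12.8
"""

def load_cases(text):
    """Read CSV text into a list of dicts, one dict per case."""
    return list(csv.DictReader(io.StringIO(text)))

def check(cases):
    """Return IDs of cases failing a simple validity rule (length must be positive)."""
    return [c["artefact_id"] for c in cases if float(c["length_mm"]) <= 0]

def select(cases, site):
    """Select one site's cases and transform length from text to a number."""
    return [float(c["length_mm"]) for c in cases if c["site"] == site]

cases = load_cases(RAW)
print("suspect rows:", check(cases))      # A4 should be re-checked at source
print("Site1 lengths:", select(cases, "Site1"))
```

A check like this catches impossible values at entry time, when the original record is still at hand to correct them.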

12
Study design: experiment versus observation
  • How do we define them? Variously, but an element
    of control is often the key
  • For practical, ethical etc. reasons, experiments
    are rare in our subjects
  • But experimental design is important
  • Dependent variable: response variable under study
  • Independent variable: explanatory variable or
    factor

13
Contexts and confounds
  • Treatment: a combination of specific conditions
    (levels of experimental factors)
  • Extraneous variables: ones not being studied but
    which may influence the dependent variable, and
    are thus part of the relevant context
  • Effects of different (independent or extraneous)
    variables are said to be confounded if they
    cannot be distinguished
  • Good study design requires data on context

14
Observational studies: the risks of confounding
  • Well-designed experiments minimize confounding by
    appropriate choice of variables, cases and
    treatments, random sequence of treatments &
    random allocation of cases to groups
  • Observational surveys lack this control
  • Groups may be self-selected
  • Differences in groups may have causes other than
    the variables under study
  • But much can be done despite limitations
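Random allocation of cases to groups, one of the controls listed above, can be sketched as shuffling the cases and then dealing them out in turn. A minimal Python sketch using the standard `random` module; the plot names, the group labels and the function name `allocate` are hypothetical, and the fixed seed is there only so that the example repeats.

```python
import random

def allocate(cases, groups, seed=None):
    """Randomly allocate cases to groups of (near-)equal size.

    Shuffling first, then dealing cases out in turn, means that only chance
    (not any property of the cases) decides which group a case joins.
    """
    rng = random.Random(seed)          # seed only to make the example repeatable
    shuffled = list(cases)
    rng.shuffle(shuffled)
    allocation = {g: [] for g in groups}
    for i, case in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(case)
    return allocation

# Hypothetical: 10 plots allocated to a treatment and a control group
plots = [f"plot{i}" for i in range(1, 11)]
print(allocate(plots, ["treatment", "control"], seed=1))
```

In an observational survey there is no equivalent step, which is why self-selected groups can differ in ways other than the variables under study.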

15
Examples of presentation
  • Even the simplest forms of stating findings
    numerically (percentages, averages), and the
    simplest graphical presentations, emphasize
    selected aspects
  • This can be legitimate, but can also be
    misleading: much depends on the honesty & clarity
    with which the procedure is described
  • What as a percentage of what?
  • Please bring in examples yourselves
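"What as a percentage of what?" matters because the same absolute change gives different percentages depending on the base, and because a change between two percentages can be reported either in percentage points or as a relative percentage. A minimal Python sketch with hypothetical figures and illustrative function names:

```python
def percent_change(old, new):
    """Change from old to new, expressed as a percentage of the OLD value."""
    return 100 * (new - old) / old

def point_change(old_pct, new_pct):
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct

# Hypothetical figures: a count rising from 20 to 25, then falling back to 20
print(percent_change(20, 25))   # rise, as a percentage of 20
print(percent_change(25, 20))   # fall, as a percentage of 25

# Hypothetical share of a category rising from 20% to 25%
print(point_change(20, 25), "percentage points;", percent_change(20, 25), "% relative")
```

The rise from 20 to 25 is +25% of the starting value, the return from 25 to 20 is -20% of the new base, and a share moving from 20% to 25% has risen by 5 percentage points, or by 25% in relative terms. Stating the base removes the ambiguity.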

16
(No Transcript)
17
Value of examining data visually first
  • Even if you will eventually do sophisticated
    statistical testing
  • Start clear and simple
  • This familiarizes the researcher with the
    characteristics of the data set
  • At the end of the process, it also helps the
    researcher to communicate the patterns found

18
(No Transcript)
19
Graphical displays should
  • Show the data
  • Lead viewer to think about content, not graphic
    technology itself
  • Avoid distorting data
  • Present much info concisely & coherently
  • Encourage the eye to compare
  • Show both overview & detail
  • Serve clear purpose
  • Be integrated with text and/or numerical
    descriptions

20
Graphical depictions include
  • line graphs
  • bar charts
  • histograms
  • pie charts
  • stem-and-leaf plots
  • scatterplots
  • An important principle is data density

21
Line graphs
  • Usually used to plot a variable against time (on
    horizontal axis)
  • Shows seasonal patterns & trends
  • Does the graph have linear scales? A zero?
  • Different scales give different impressions, e.g.
    non-zero base to vertical axis, unequal units,
    log scale
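A small calculation shows why a non-zero base changes the impression. A minimal Python sketch, assuming the viewer compares heights measured up from the bottom of the vertical axis; the function name `apparent_ratio` and the values 50 and 55 are hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn heights b/a when the vertical axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Hypothetical values: 50 and 55, a 10% difference in the data
print(apparent_ratio(50, 55))               # 1.1 with a zero-based axis
print(apparent_ratio(50, 55, baseline=45))  # 2.0 with the axis starting at 45
```

With the axis starting at 45, the second value is drawn twice as high as the first although it is only 10% larger. A non-zero base is not always wrong, but it should be clearly marked.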

22
(No Transcript)
23
Bar charts and histograms
  • Bar charts compare the values of different
    variables, often categorical
  • Histograms display frequency or relative
    frequency distributions of one variable at a time
  • Width of histogram bars has meaning
  • Eyes respond to impressions of area: symbols,
    unequal widths & pseudo-3D can give a misleading
    impression
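Because bar width has meaning, one common convention for histograms with unequal bin widths is to plot frequency density (relative frequency divided by bin width), so that each bar's area, rather than its height, is proportional to the number of cases. A minimal Python sketch; the lengths, the bin edges and the function name `histogram` are hypothetical.

```python
def histogram(values, edges):
    """Frequency, relative frequency and density for bins defined by `edges`.

    Bins are half-open [left, right), except the last, which includes its right edge.
    Density = relative frequency / bin width, so bar AREA is proportional to frequency.
    """
    n = len(values)
    rows = []
    for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        count = sum(1 for v in values
                    if left <= v < right or (last and v == right))
        rel = count / n
        rows.append((left, right, count, rel, rel / (right - left)))
    return rows

# Hypothetical artefact lengths (mm) with deliberately unequal bin widths
lengths = [12, 15, 18, 22, 24, 25, 27, 31, 38, 45]
for left, right, count, rel, dens in histogram(lengths, [10, 20, 25, 30, 50]):
    print(f"{left}-{right}: n={count}  rel={rel:.2f}  density={dens:.3f}")
```

The first and last bins hold the same number of cases, but the last is twice as wide, so its density (and drawn height) is halved; plotting raw counts instead would make that wide bin look twice as important as it is.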

24
(No Transcript)
25
(No Transcript)
26
Pie charts
  • Circular symbols divided into sections according
    to divisions of a category into sub-categories
  • Pies sometimes also vary in size
  • And sometimes are presented in pseudo-3D manner
  • Will return to pie charts a bit later
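Each sector's angle is simply its sub-category's share of the total multiplied by 360 degrees. A minimal Python sketch with hypothetical raw-material counts; `pie_angles` is an illustrative name, not a function from any charting package.

```python
def pie_angles(counts):
    """Sector angle in degrees for each sub-category: 360 x its share of the total."""
    total = sum(counts.values())
    return {name: 360 * c / total for name, c in counts.items()}

# Hypothetical counts of artefacts by raw material
print(pie_angles({"chert": 45, "quartz": 30, "basalt": 15, "other": 10}))
```

Reading the finished chart means judging these angles and areas by eye, which is the kind of comparison the later slides on pie charts question.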

27
(No Transcript)
28
Stem-and-leaf plots
  • A simple way of showing the pattern in a set of
    numbers
  • Truncate the numbers at an appropriate point and
    write out the truncated numbers in an even,
    systematic column (forming the stem)
  • For each number, write the remaining digit next
    to its truncated value (forming the leaf)
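The procedure above can be sketched in a few lines of Python, assuming non-negative whole numbers truncated at the tens digit (so the stem is the tens and the leaf is the units); the lengths are hypothetical and `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units).

    Every stem between the smallest and largest is written out, even if empty,
    so the plot's shape reflects the spacing of the data.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{s:>3} | {''.join(str(d) for d in leaves[s])}" for s in stems]

# Hypothetical artefact lengths (mm)
for line in stem_and_leaf([23, 27, 31, 34, 34, 38, 42, 45, 61]):
    print(line)
```

Keeping the empty stem for the 50s is what makes the gap before 61 visible, which is the point of writing the stems out evenly.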

29
(No Transcript)
30
Scatterplots
  • Show the distributions of two variables at once
    (i.e. bivariate data)
  • If one variable is independent and one dependent,
    independent goes on horizontal axis
  • The essence of any relationship between them is
    apparent visually: +ve or -ve, strong or weak,
    simple or complex
  • This can affect future statistical testing
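Once a scatterplot has been inspected, a summary number such as Pearson's correlation coefficient r can record the direction (+ve or -ve) and strength of a straight-line relationship. A minimal Python sketch with hypothetical data; the second example is a simple curved relationship for which r is zero, one reason to look at the plot before relying on the number.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sign gives direction, size gives strength
    of the LINEAR part of the relationship only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a moderately strong positive relationship
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
# Hypothetical data: a perfect but curved (U-shaped) relationship gives r = 0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

A U-shaped pattern like the second one is obvious on a scatterplot but invisible to r, which is how a visual first look can change the choice of later statistical test.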

31
(No Transcript)
32
Graphics can also mislead or even deliberately
deceive
  • The principle should be: show data variation, not
    design variation
  • Perception depends on individual, experience,
    context
  • Perceived circle area grows more slowly than
    actual physical area
  • Computer packages make it easier to produce both
    good & bad images

33
So
  • Beware irregular scales
  • Beware symbols with changing area or volume,
    including pie charts
  • Beware pseudo-3-dimensional depictions
  • Beware excessively busy images which distract
    attention from the information to be conveyed
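The warning about symbols with changing area can be made concrete with a little arithmetic: if a circle's radius is scaled in proportion to the value it represents, its area grows with the square of that value, so a doubling in the data is drawn as a fourfold increase in area. A minimal Python sketch with hypothetical function names and values; scaling the area rather than the radius keeps the physical areas proportional to the data, although, as noted above, perceived area may still not track physical area.

```python
import math

def circle_area(radius):
    """Physical area of a circular symbol."""
    return math.pi * radius ** 2

def radius_for(value, scale=1.0, by_area=True):
    """Radius of a circle symbol for `value` (hypothetical helper).

    by_area=True : area proportional to value (radius ~ sqrt(value))
    by_area=False: radius proportional to value, which inflates area by value**2
    """
    return scale * (math.sqrt(value) if by_area else value)

# Hypothetical values: one quantity is twice the other
small, large = 1.0, 2.0
for by_area in (True, False):
    ratio = (circle_area(radius_for(large, by_area=by_area))
             / circle_area(radius_for(small, by_area=by_area)))
    print(f"by_area={by_area}: drawn area ratio = {ratio:.1f} (data ratio = 2.0)")
```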

34
Ed Tufte (1983) on pie charts
  • 'The only worse design than a pie chart is
    several of them, for then the viewer is asked to
    compare quantities in spatial disarray both
    within and between pies ... Given their low
    data-density and failure to order numbers along a
    visual dimension, pie charts should never be
    used.'

35
(No Transcript)
36
Lie factor (effect in graphic / effect in data) =
14.8 (should not be less than 0.95 or greater
than 1.05)
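Tufte defines the size of an effect as the change between two values divided by the first value, and the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch of that calculation in Python; the function names and the example numbers (a 10% rise drawn as bars that grow by 50%) are hypothetical and are not the data behind the slide's figure.

```python
def effect_size(first, second):
    """Tufte-style size of effect: |second - first| / first."""
    return abs(second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of effect shown in the graphic divided by size of effect in the data.

    data_*  : the numbers being depicted
    shown_* : the corresponding physical sizes in the graphic (e.g. bar heights in cm)
    """
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)

# Hypothetical: data rise 10% (100 -> 110) but bars are drawn 1.0 cm -> 1.5 cm (50% taller)
lf = lie_factor(100, 110, 1.0, 1.5)
print(f"lie factor = {lf:.1f}, acceptable: {0.95 <= lf <= 1.05}")
```

A faithful drawing (bars growing from 2.0 cm to 2.2 cm for the same data) would give a lie factor of 1.0, inside the 0.95 to 1.05 band.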
37
(No Transcript)
38
5 different vertical scales, 2 different
horizontal scales; Lie factor 15.1
39
(No Transcript)
40
Lie factor of 2.8, plus effects of perspective and
horizontal spacing
41
The pie chart problem