Title: II. Graphical Displays of Data
Like many other things, statistical analysis can suffer from "garbage in, garbage out." This often happens because no one bothered to look at the data. Simple data displays can convey a lot of information.

A. Stem-and-Leaf Displays

Purpose: To provide a basis for evaluating the shape of the data without the loss of any information.

1. Basic Stem-and-Leaf Display

This technique is best illustrated by an example. Pencil lead is actually a ceramic matrix filled with graphite. A measure of the quality of many ceramic bodies is the porosity, which measures the void space in the body. The following data set represents the results of a porosity test on good pencil lead.

Let the numbers to the left of the decimal point be the stems. Let the numbers to the right of the decimal point be the leaves.

Stem  Leaves
11
12
13
14

Representing the value 12.1:

Stem  Leaves
11
12    1
13
14

Representing the first row of data:

Stem  Leaves
11    7
12    1557
13    5
14

The entire data set:

Stem  Leaves
11.   7
12.   1557375836
13.   538552796633027
14.   3221

This display is the raw stem-and-leaf display.

Usually, we refine the stem-and-leaf display. First, we order the leaves on each stem.

Stem  Leaves
11.   7
12.   1335556778
13.   022333555667789
14.   1223
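
The raw and ordered displays above can be reproduced with a few lines of code. The following is only a sketch: the function name `stem_and_leaf` and the list `POROSITY` are illustrative, and the values are read off the raw display stem by stem, so the row order of the original data table is not preserved.

```python
from collections import defaultdict

# Porosity values read off the raw display, stem by stem
# (the original row order of the data table is not preserved here).
POROSITY = [
    11.7,
    12.1, 12.5, 12.5, 12.7, 12.3, 12.7, 12.5, 12.8, 12.3, 12.6,
    13.5, 13.3, 13.8, 13.5, 13.5, 13.2, 13.7, 13.9, 13.6, 13.6,
    13.3, 13.3, 13.0, 13.2, 13.7,
    14.3, 14.2, 14.2, 14.1,
]


def stem_and_leaf(values, ordered=True):
    """Map each stem to its string of leaves, for data with one decimal place."""
    leaves = defaultdict(list)
    for v in values:
        # Work in tenths so that 12.1 becomes stem 12, leaf 1.
        stem, leaf = divmod(round(v * 10), 10)
        leaves[stem].append(leaf)
    return {
        stem: "".join(str(d) for d in (sorted(ls) if ordered else ls))
        for stem, ls in sorted(leaves.items())
    }


for stem, ls in stem_and_leaf(POROSITY).items():
    print(f"{stem}. {ls}")
```

Calling `stem_and_leaf(POROSITY, ordered=False)` gives the raw display instead, with the leaves in the order they were entered.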

Next, we add the depth information. The depth represents how far a particular point is from the closest end of the data set. For example, the data value 11.7 is the smallest observation; thus, it has a depth of 1. What is the depth of the data value 14.3? What is the depth of the data value 12.6?

The completed stem-and-leaf display gives the depth of the last value on each stem for the top part of the display. It gives the depth of the first value on each stem for the bottom part of the display.

We do not give the depth for the stem that contains the middle value of the data set. In this case, the depth information would be ambiguous.

An aid for finding the depth is to report the number of leaves on each stem. Until we reach the middle stem, the depth for any stem is just the depth reported for the previous stem plus the number of leaves on the stem.

Stem  Leaves            No.  Depth
11.   7                   1      1
12.   1335556778         10     11
13.   022333555667789    15
14.   1223                4      4
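
The depth bookkeeping can be checked mechanically. The sketch below is one way to do it, not a prescribed method: `stem_depths` and `value_depth` are illustrative names, and `value_depth` assumes the value occurs only once in the data.

```python
# Ordered leaves per stem, copied from the display in the text.
LEAVES = {11: "7", 12: "1335556778", 13: "022333555667789", 14: "1223"}


def stem_depths(counts):
    """Depth column for stems holding `counts` leaves, smallest stem first.

    Returns None for the stem holding the middle of the data and for empty stems.
    """
    n = sum(counts)
    middle = (n + 1) / 2
    depths, below = [], 0
    for c in counts:
        above = n - below - c
        if c == 0:
            depths.append(None)        # nothing to report on an empty stem
        elif below + c < middle:
            depths.append(below + c)   # top part: depth of the stem's last value
        elif above + c < middle:
            depths.append(above + c)   # bottom part: depth of the stem's first value
        else:
            depths.append(None)        # middle stem: depth would be ambiguous
        below += c
    return depths


def value_depth(ordered, x):
    """Depth of x: its distance from the nearer end (assumes x occurs once)."""
    i = ordered.index(x)
    return min(i + 1, len(ordered) - i)


data = [float(f"{stem}.{leaf}") for stem, leaves in LEAVES.items() for leaf in leaves]
print(stem_depths([len(ls) for ls in LEAVES.values()]))    # [1, 11, None, 4]
print(value_depth(data, 14.3), value_depth(data, 12.6))    # 1 8
```

The first line of output matches the Depth column of the display. The second answers the two questions posed earlier: 14.3 is the largest value, so its depth is 1, and 12.6 is the eighth value from the bottom, so its depth is 8.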

2. Stretched Stem-and-Leaf

- Consider a stretched stem-and-leaf display.
- Basically, it splits each simple stem into two.
- Let the first stem for X hold the leaves X0 to X4.
- Let the second stem for X hold the leaves X5 to X9.

Stem  Leaves      No.  Depth
11
11    7             1      1
12    133           3      4
12    5556778       7     11
13    022333        6
13    555667789     9     13
14    1223          4      4
14
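
Splitting each stem in two is simple to automate. A minimal sketch, assuming the ordered leaves are already available as strings of digits; the name `stretched_stem_and_leaf` is illustrative.

```python
# Ordered leaves per stem, copied from the simple display in the text.
LEAVES = {11: "7", 12: "1335556778", 13: "022333555667789", 14: "1223"}


def stretched_stem_and_leaf(leaves_by_stem):
    """Split every stem into a 0-4 row followed by a 5-9 row."""
    rows = []
    for stem, leaves in sorted(leaves_by_stem.items()):
        low = "".join(d for d in leaves if d <= "4")    # leaves 0 through 4
        high = "".join(d for d in leaves if d >= "5")   # leaves 5 through 9
        rows.append((stem, low))
        rows.append((stem, high))
    return rows


for stem, leaves in stretched_stem_and_leaf(LEAVES):
    print(f"{stem}  {leaves}")
```

The empty first row for stem 11 and the empty last row for stem 14 appear because every stem is split, whether or not both halves hold data.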

3. Reading a Data Display

- Goal of a data display: let the data speak to you.
- Like any conversation, some points are obvious; others come only from questioning the data.
- Some obvious questions:
  - What is the "center" of the data?
  - What is the "spread" of the data?
- More subtle questions:
  - Do the data follow some pattern?
  - Is the pattern symmetric?
  - If the pattern is not symmetric, is it right or left tailed?
    - A right-tailed or right-skewed pattern has its long tail toward the larger values.
    - A left-tailed or left-skewed pattern has its long tail toward the smaller values.
  - Are there multiple peaks?
  - What do multiple peaks suggest?
  - Are there outliers?

B. Box Plots

- Purpose: To give a quick display of some important features of the data.
- Note: The box plot represents a distillation of the data.
- The stem-and-leaf display loses only the time order of the data.
- The box plot loses some of the information in the data.
- However, under several very reasonable assumptions, the information lost is of little or no value.

1. Preliminaries

- The box plot is based upon
  - the median
  - the quartiles

Let y_1, y_2, \cdots, y_n denote our data set. Rearrange the data in ascending order, and let the new data set be denoted by y_{(1)}, y_{(2)}, \cdots, y_{(n)}, where y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}. Note: the stem-and-leaf display with ordered leaves is such an ordered data set.

a. The median

The median is the middle value of the ordered data set and is a measure of the center. Literally, the median splits the data set into two equal parts.

The location of the median in the ordered data set is (n + 1)/2. If n is odd, then (n + 1)/2 is an integer; thus, the median is the value at that location. If n is even, then (n + 1)/2 contains the fraction 1/2. In such a case, the median is the average of the two values closest to the center, at locations n/2 and n/2 + 1.

First example: The following five values represent the ash content of pencil lead.

Second example: the porosities of good pencil lead. Note: the stem-and-leaf display is an ordered data set.

Stem  Leaves      No.  Depth
11
11    7             1      1
12    133           3      4
12    5556778       7     11
13    022333        6
13    555667789     9     13
14    1223          4      4
14
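
For the porosity data, n = 30, so the median location is (30 + 1)/2 = 15.5 and the median is the average of the 15th and 16th ordered values. The sketch below applies that location rule; the function name `median_by_location` is illustrative, and the data are rebuilt from the ordered leaves of the display.

```python
# Ordered leaves per stem, copied from the display in the text.
LEAVES = {11: "7", 12: "1335556778", 13: "022333555667789", 14: "1223"}
data = [float(f"{stem}.{leaf}") for stem, leaves in LEAVES.items() for leaf in leaves]


def median_by_location(ordered):
    """Median of data already in ascending order, found via the location (n + 1)/2."""
    n = len(ordered)
    location = (n + 1) / 2
    if location.is_integer():
        # n odd: the median is the value at that location (1-based).
        return ordered[int(location) - 1]
    # n even: average the two values on either side of the location.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


print(median_by_location(data))   # 13.3
```

Both the 15th and 16th ordered values are 13.3, so the median of the porosity data is 13.3.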
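
b. The upper and lower quartiles

While the median divides the data into two parts of equal numbers, the quartiles (Q1, Q3) divide the data into four parts. Note: the second quartile (Q2) is the median.

The first and third quartiles share a common location: the first quartile is found by counting that many values up from the smallest observation, and the third by counting that many values down from the largest. If the location is an integer, then the quartile is the data value at that location. If the location is not an integer, then the quartile is the average of the two values closest to it.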
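
The formula for the quartile location is not reproduced in this text. The sketch below therefore assumes a common depth-based rule: location = (floor(median location) + 1)/2, counted up from the smallest value for Q1 and down from the largest for Q3. This rule is an assumption, not a quotation, but it does reproduce the values Q1 = 12.6 and Q3 = 13.7 used for the good pencil lead data in the box plot section. The function name `quartiles_by_depth` is illustrative.

```python
import math

# Ordered leaves per stem, copied from the display in the text.
LEAVES = {11: "7", 12: "1335556778", 13: "022333555667789", 14: "1223"}
data = [float(f"{stem}.{leaf}") for stem, leaves in LEAVES.items() for leaf in leaves]


def quartiles_by_depth(ordered):
    """Return (Q1, Q3) for ascending data using an ASSUMED depth rule."""
    n = len(ordered)
    median_location = (n + 1) / 2
    # Assumed rule: quartile location = (floor(median location) + 1) / 2.
    q_location = (math.floor(median_location) + 1) / 2

    def value_at_depth(seq):
        if q_location.is_integer():
            return seq[int(q_location) - 1]
        # Not an integer: average the two values closest to the location.
        return (seq[int(q_location) - 1] + seq[int(q_location)]) / 2

    q1 = value_at_depth(ordered)          # count up from the smallest value
    q3 = value_at_depth(ordered[::-1])    # count down from the largest value
    return q1, q3


print(quartiles_by_depth(data))   # (12.6, 13.7)
```

Software packages use several different quartile conventions, so a package may report slightly different quartiles for the same data.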

First example: the good pencil lead data.

Second example: breaking strength of yarn.

2. The Box Plot Itself

- We shall illustrate this technique through the porosity data for the good pencil lead.
- Construct a horizontal scale, marked conveniently, which covers at least the range of the data.
- Find the median, Q1, and Q3.
- Use Q1 and Q3 to make a rectangular box above the scale. Draw a vertical line across the box for the median.
- Determine the step.
  - The interquartile range is a measure of variability or spread defined by Q3 - Q1.
  - We define the step size by Step = (1.5)(Q3 - Q1).
  - For the good pencil lead data,
    Q3 - Q1 = 13.7 - 12.6 = 1.1
    Step = 1.5(1.1) = 1.65
- Determine the inner fences.
  - The fences help us isolate possible outliers.
  - The inner fences define the bounds for the unquestionably good data.
  - The Upper Inner Fence (UIF) is UIF = Q3 + Step.
  - The Lower Inner Fence (LIF) is LIF = Q1 - Step.
  - For the good pencil lead data,
    UIF = 13.7 + 1.65 = 15.35
    LIF = 12.6 - 1.65 = 10.95

5. Locate the most extreme data points which are on or within the inner fences. These data values are called the adjacents. Draw vertical lines at these points, and connect these points to the box with a horizontal line. This line is called a whisker. For the good pencil lead data, all of the values fall within the inner fences. Thus, the adjacents are 11.7 and 14.3.
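
The step, the inner fences, and the adjacents follow directly from Q1 and Q3. A minimal sketch using the values Q1 = 12.6 and Q3 = 13.7 from the text; the function names `inner_fences` and `adjacents` are illustrative.

```python
# Ordered leaves per stem, copied from the display in the text.
LEAVES = {11: "7", 12: "1335556778", 13: "022333555667789", 14: "1223"}
data = [float(f"{stem}.{leaf}") for stem, leaves in LEAVES.items() for leaf in leaves]


def inner_fences(q1, q3):
    """Return (LIF, UIF), one step of 1.5 * (Q3 - Q1) outside the quartiles."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def adjacents(values, q1, q3):
    """Most extreme data values lying on or within the inner fences."""
    lif, uif = inner_fences(q1, q3)
    inside = [v for v in values if lif <= v <= uif]
    return min(inside), max(inside)


lif, uif = inner_fences(12.6, 13.7)
print(f"LIF = {lif:.2f}, UIF = {uif:.2f}")    # LIF = 10.95, UIF = 15.35
print(adjacents(data, q1=12.6, q3=13.7))      # (11.7, 14.3)
```

Because every porosity value lies inside the fences, the adjacents are simply the smallest and largest observations.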

6. Calculate the outer fences. The outer fences allow us to discriminate between mild and extreme outliers. Data values between the inner and outer fences are considered mild. Data values beyond the outer fences are considered extreme.

The Upper Outer Fence (UOF) is UOF = Q3 + 2(Step).
The Lower Outer Fence (LOF) is LOF = Q1 - 2(Step).

For the good pencil lead data,
UOF = 13.7 + 2(1.65) = 17.0
LOF = 12.6 - 2(1.65) = 9.3

7. Mark possible outliers. We use one symbol to denote the mild outliers and a different symbol to denote the extreme outliers. Note: No outliers occur in our example.
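
The fence rules translate into a small classification routine. The sketch below uses Q1 = 12.6 and Q3 = 13.7 from the text. The function name `classify` and the labels are illustrative, and the test values 16.0, 18.0, 10.0, and 9.0 are hypothetical points chosen to land in each region, not data from the porosity test. Treating a value that sits exactly on an outer fence as mild is also an assumption.

```python
def classify(value, q1, q3):
    """Label a value 'inside', 'mild', or 'extreme' relative to the fences."""
    step = 1.5 * (q3 - q1)
    lif, uif = q1 - step, q3 + step              # inner fences
    lof, uof = q1 - 2 * step, q3 + 2 * step      # outer fences
    if lif <= value <= uif:
        return "inside"     # on or within the inner fences
    if lof <= value <= uof:
        return "mild"       # between the inner and outer fences
    return "extreme"        # beyond the outer fences


# 11.7 and 14.3 are the extremes of the porosity data; the rest are hypothetical.
for v in (11.7, 14.3, 16.0, 18.0, 10.0, 9.0):
    print(v, classify(v, q1=12.6, q3=13.7))
```

The two real extremes, 11.7 and 14.3, are both labeled inside, which agrees with the note that no outliers occur in the example.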

Parallel box plots allow us to compare two or more sets of data. The key: we must use a common scale. Place the box plots above each other or side by side.

[Figure: three box plots stacked above a common horizontal scale, with outlying points marked o.]
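
The common-scale requirement is easy to see in a text rendering. The sketch below draws each box plot as one row of characters on a shared scale. The function name `ascii_box` is illustrative, the first summary uses the porosity values from the text, and the second summary ("batch B") is entirely hypothetical and included only so there is something to compare against.

```python
def ascii_box(low, q1, med, q3, high, lo, hi, width=41):
    """Draw one box plot as a text row on the common scale [lo, hi]."""
    def col(x):
        # Map a data value to a character column on the shared scale.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(low), col(high) + 1):
        row[c] = "-"            # whiskers, from adjacent to adjacent
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="            # the box, from Q1 to Q3
    row[col(med)] = "|"         # the median
    return "".join(row)


# (lower adjacent, Q1, median, Q3, upper adjacent)
summaries = {
    "good lead": (11.7, 12.6, 13.3, 13.7, 14.3),                # from the text
    "batch B (hypothetical)": (12.4, 13.1, 13.6, 14.2, 14.9),   # made up
}
lo, hi = 11.0, 15.0   # one scale shared by every row
for name, summary in summaries.items():
    print(ascii_box(*summary, lo, hi), name)
```

Because both rows use the same `lo` and `hi`, a shift in center or a change in spread shows up as a visible offset between rows. Drawing each plot on its own scale would hide exactly that comparison.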

Box plots can also be used to analyze designed experiments. When there are categorical factors, the data can be separated by factor-level combination and analyzed using parallel box plots.

Example: Consider an experiment to study the influence of operating temperature and glass type on light output.

The resulting box plot is given below.

The box plots show that a higher temperature yields higher light output. Also, at the low temperature, glass type does not affect light output, but at the high temperature, glass type A produces higher light output.

Importance of box plots: Box plots allow us to tell at a glance
1. center
2. spread
3. outliers

- Other important data displays:
  - histograms
  - time plots
- We generally use software to generate all data displays.
- The instructor should do a class demonstration using the software selected by the instructor.