Does the distribution have one or more peaks (modes) or is it unimodal? - PowerPoint PPT Presentation

Chapter 1 - The Practice of Statistics 3ed – PowerPoint PPT presentation

Provided by: Erin197
Transcript and Presenter's Notes

10
  • Does the distribution have a single peak
    (unimodal) or several peaks (modes)?
  • Is the distribution approximately symmetric or is
    it skewed in one direction? Is it skewed to the
    right (right tail longer) or left?

12
Example Description
  • Shape: The distribution is roughly symmetric
    with a single peak in the center.
  • Center: You can see from the histogram that the
    midpoint is not far from 110. The actual data
    show that the midpoint is 114.
  • Spread: The data range from 80 to about 150.
    There are no outliers or other strong deviations
    from the symmetric, unimodal pattern.

14
Calculator Example
(To save data for later use, on the home screen
type L1 -> Prez)
15
Calc continued
  • Frequency shortcut: If you have a dataset made up
    of 75 3s and 35 4s, for example, you can enter
    the values in list 1 and the frequencies in
    list 2, then run 1-Var Stats.
  • STAT > EDIT: L1 = 3, 4 and L2 = 75, 35. Then
    STAT > CALC > 1-Var Stats L1, L2, ENTER.
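
The frequency shortcut can also be checked off the calculator. The following is a minimal Python sketch, not the calculator's own routine: it expands the values and frequencies from the slide into the full dataset and computes the count, mean, and sample standard deviation. The variable names are illustrative.

```python
from statistics import stdev

# Values and frequencies from the slide: 75 threes and 35 fours.
values = [3, 4]    # what would go in L1
freqs = [75, 35]   # what would go in L2

# Expand into the full dataset that the frequency list stands for.
data = [v for v, f in zip(values, freqs) for _ in range(f)]

n = len(data)
xbar = sum(data) / n
sx = stdev(data)   # sample standard deviation (divides by n - 1)

print(n, round(xbar, 4), round(sx, 4))
```

The mean here is simply (75 x 3 + 35 x 4) / 110, which is what the frequency list computes without typing 110 entries.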

16
Relative frequency/Cumulative Frequency
  • A histogram does a good job of displaying the
    distribution of values of a quantitative
    variable, but tells us little about the relative
    standing of an individual observation.
  • So, we construct an ogive (Oh-Jive) aka a
    relative cumulative frequency graph.

17
Step 1- Construct table
  • Decide on intervals and make a frequency table
    with 4 columns: frequency, relative frequency,
    cumulative frequency, and relative cumulative
    frequency.
  • To get the values in the rel. freq. column,
    divide the count in each class interval by the
    total number of observations. Multiply by 100 to
    convert to a percent.
  • In the cum. freq. column, add the counts that
    fall in or below the current class interval.
  • For the rel. cum. freq. column, divide the entries
    in the cum. freq. column by the total number of
    individuals.
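
The table-building steps above can be sketched in Python. The slide's own table is not in the transcript, so this sketch uses the ten test scores from the Sample Data slide of this presentation. The class intervals of width 10 (40-49 through 90-99) are an assumption chosen for illustration.

```python
scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]

# Assumed class intervals of width 10: 40-49, 50-59, ..., 90-99.
width = 10
lower_bounds = list(range(40, 100, width))
n = len(scores)

rows = []
cum = 0
for lo in lower_bounds:
    freq = sum(1 for s in scores if lo <= s < lo + width)
    cum += freq  # counts in or below the current interval
    rows.append({
        "interval": f"{lo}-{lo + width - 1}",
        "freq": freq,
        "rel_freq": 100 * freq / n,       # percent of all observations
        "cum_freq": cum,
        "rel_cum_freq": 100 * cum / n,    # percent at or below this interval
    })

for r in rows:
    print(r["interval"], r["freq"], r["rel_freq"], r["cum_freq"], r["rel_cum_freq"])
```

The last row's relative cumulative frequency should always be 100, which is a quick check that no observation fell outside the chosen intervals.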

19
Steps 2 and 3
  • Label and scale your axes and title your graph.
    The vertical axis is always relative cumulative
    frequency. Scale the horizontal axis according to
    your choice of class intervals and the vertical
    axis from 0% to 100%.
  • Plot a point corresponding to the rel. cum. freq.
    in each class interval at the LEFT ENDPOINT of
    the NEXT class interval. (For example, for the 40
    to 44 interval, plot a point at a height of 4.7%
    above the age value of 45.)
  • Begin at 0%; you should end at 100%. Connect
    the dots.

20
To Locate an individual within distribution
What about Clinton? He was 46. To find his
relative standing, draw a vertical line up from
his age (46) on the horizontal axis until it
meets the ogive. Then draw a horizontal line
from this point of intersection to the vertical
axis. Based on our graph, his age places him at
the 10% mark, which tells us that about 10% of all
US presidents were the same age as or younger
than Bill Clinton when they were inaugurated.
To locate the value corresponding to a
percentile, do the opposite. Example: the 50th
percentile is about 55 years old.
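
Reading an ogive in either direction amounts to linear interpolation between plotted points. The presidents' table is not in the transcript, so this sketch uses an ogive for the ten test scores from the Sample Data slide, with assumed class intervals of width 10. Each point sits at the left endpoint of the next interval, as described in the steps. The function names are illustrative.

```python
# Ogive points (value, relative cumulative frequency in %) for the test
# scores, assuming class intervals 40-49, 50-59, ..., 90-99.
points = [(40, 0), (50, 10), (60, 10), (70, 20), (80, 50), (90, 70), (100, 100)]

def percentile_of(value, pts):
    """Go up from the horizontal axis to the ogive, then across."""
    if value <= pts[0][0]:
        return pts[0][1]
    if value >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

def value_at(percent, pts):
    """Go across from the vertical axis to the ogive, then down."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 > y0 and y0 <= percent <= y1:
            return x0 + (x1 - x0) * (percent - y0) / (y1 - y0)
    raise ValueError("percent must be between 0 and 100")

print(percentile_of(75, points))  # 35.0
print(value_at(50, points))       # 80.0
```

A score of 75 sits at about the 35% mark on this ogive, and the 50th percentile corresponds to a score of about 80. Both readings are graph estimates based on the chosen intervals, not exact statistics of the raw data.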
21
  • Whenever data are collected over time, plot
    observations in time order. Displays of
    distributions such as stemplots and histograms
    which ignore time order can be misleading when
    there is systematic change over time.

22
Shows change in gas price over time. Shows TRENDS
23
Exploring Data
  • 1.2 Describing Distributions with Numbers
  • YMS3e
  • AP Stats at CSHNYC
  • Ms. Namad

24
Sample Data
  • Consider the following test scores for a small
    class

75 76 82 93 45 68 74 82 91 98
Plot the data and describe the SOCS
What number best describes the center? What
number best describes the spread?
25
Measures of Center
  • Numerical descriptions of distributions begin
    with a measure of its center.
  • If you could summarize the data with one number,
    what would it be?

Mean: The average value of a dataset.
Median (Q2 or M): The middle value of a
dataset. Arrange the observations in order from
min to max, then locate the middle observation.
Average the two middle values if needed.
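
As a minimal sketch, both measures of center can be computed for the test scores on the Sample Data slide. The helper names are illustrative.

```python
scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]

def mean(data):
    """Sum of the observations divided by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data; average the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(mean(scores))    # 78.4
print(median(scores))  # 79.0
```

The median (79) is slightly higher than the mean (78.4) because the low score of 45 pulls the mean down but has no effect on the middle of the ordered list.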
26
Mean vs. Median
  • The mean and the median are the most common
    measures of center.
  • If a distribution is perfectly symmetric, the
    mean and the median are the same.
  • The mean is not resistant to outliers.
  • You must decide which number is the most
    appropriate description of the center...

27
Measures of Spread
  • Variability is the key to Statistics. Without
    variability, there would be no need for the
    subject.
  • When describing data, never rely on center alone.
  • Measures of Spread
  • Range - rarely used...why?
  • Quartiles - InterQuartile Range: IQR = Q3 - Q1
  • Variance and Standard Deviation: var and sx
  • Like Measures of Center, you must choose the most
    appropriate measure of spread.

28
Quartiles
  • Quartiles Q1 and Q3 represent the 25th and 75th
    percentiles.
  • To find them, order data from min to max.
  • Determine the median - average if necessary.
  • The first quartile is the middle of the bottom
    half.
  • The third quartile is the middle of the top
    half.

19 22 23 23 23 26 26 27 28 29 30 31 32
45 68 74 75 76 82 82 91 93 98
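
The quartile steps can be sketched in Python for the two datasets listed above. This sketch assumes the convention that when the number of observations is odd, the median is left out of both halves. Other software may use a different rule and give slightly different quartiles.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the median itself when n is odd (assumed convention).
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

first = [19, 22, 23, 23, 23, 26, 26, 27, 28, 29, 30, 31, 32]
second = [45, 68, 74, 75, 76, 82, 82, 91, 93, 98]

print(quartiles(first))   # (23.0, 26, 29.5)
print(quartiles(second))  # (74, 79.0, 91)
```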
29
5-Number Summary, Boxplots
  • The 5 Number Summary provides a reasonably
    complete description of the center and spread of
    a distribution.
  • We can visualize the 5 Number Summary with a
    boxplot.

MIN Q1 MED Q3 MAX
min = 45, Q1 = 74, med = 79, Q3 = 91, max = 98
30
Determining Outliers
1.5 IQR Rule
  • InterQuartile Range (IQR): Distance between Q1
    and Q3. Resistant measure of spread...only
    measures the middle 50% of the data.
  • IQR = Q3 - Q1 = width of the box in a boxplot
  • 1.5 IQR Rule: If an observation falls more than
    1.5 IQRs above Q3 or below Q1, it is an outlier.

Why 1.5? According to John Tukey, 1 IQR seemed
like too little and 2 IQRs seemed like too much...
31
1.5 IQR Rule
  • To determine outliers:
  • Find the 5 Number Summary
  • Determine the IQR
  • Multiply 1.5 x IQR
  • Set up fences: Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR)
  • Observations outside the fences are outliers.
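
The outlier steps can be sketched as follows for the test scores used earlier in this presentation, taking Q1 = 74 and Q3 = 91 from their five-number summary. The variable names are illustrative.

```python
scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]

# Quartiles from the five-number summary given for these scores.
q1, q3 = 74, 91

iqr = q3 - q1                # 17
step = 1.5 * iqr             # 25.5
lower_fence = q1 - step      # 48.5
upper_fence = q3 + step      # 116.5

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # 48.5 116.5 [45]
```

Under this rule, the score of 45 falls below the lower fence and is flagged, while no score comes close to the upper fence.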

32
Outlier Example
All data on p. 48.
1.5 x IQR = 1.5(26.66) = 39.99
33
Standard Deviation
  • Another common measure of spread is the Standard
    Deviation: a measure of the average deviation
    of all observations from the mean.
  • To calculate the Standard Deviation:
  • Calculate the mean.
  • Determine each observation's deviation (x -
    xbar).
  • Square each deviation, then average the squared
    deviations by dividing the total squared
    deviation by (n-1).
  • This quantity is the Variance.
  • Take the square root of the result to determine
    the Standard Deviation.

34
Standard Deviation
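  • Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Standard Deviation: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$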
  • Example 1.16 (p.85) Metabolic Rates

1792 1666 1362 1614 1460 1867 1439
35
Standard Deviation
1792 1666 1362 1614 1460 1867 1439
Metabolic Rates: mean = 1600
   x      (x - xbar)   (x - xbar)^2
 1792        192          36864
 1666         66           4356
 1362       -238          56644
 1614         14            196
 1460       -140          19600
 1867        267          71289
 1439       -161          25921
Totals         0         214870
Total Squared Deviation = 214870
Variance: var = 214870/6 = 35811.67
Standard Deviation: s = sqrt(35811.67) = 189.24 cal
What does this value, s, mean?
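
The hand calculation above can be reproduced with a short Python sketch that follows the same steps: mean, deviations, total squared deviation, variance, square root. As a rough interpretation, s describes the typical distance of the observations from the mean, here about 189 calories from 1600.

```python
import math

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n                       # mean
deviations = [x - xbar for x in rates]      # x - xbar for each observation
total_sq = sum(d ** 2 for d in deviations)  # total squared deviation
variance = total_sq / (n - 1)               # divide by n - 1, not n
s = math.sqrt(variance)                     # standard deviation

print(xbar, total_sq, round(variance, 2), round(s, 2))
```

The deviations always sum to zero, which is why they are squared before averaging.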
36
Linear Transformations
  • Variables can be measured in different units
    (feet vs meters, pounds vs kilograms, etc)
  • When converting units, the measures of center and
    spread will change.
  • Linear Transformations ($x_{new} = a + bx$) do not
    change the shape of a distribution.
  • Multiplying each observation by b multiplies both
    the measure of center and spread by b.
  • Adding a to each observation adds a to the
    measure of center, but does not affect spread.
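
These rules can be checked numerically. The sketch below applies a linear transformation to the test scores from earlier in this presentation. The constants a = 5 and b = 2 are hypothetical values chosen only for illustration. For a negative b, the spread would be multiplied by the absolute value of b.

```python
from statistics import mean, median, stdev

scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]

# Hypothetical transformation x_new = a + b * x.
a, b = 5, 2
new_scores = [a + b * x for x in scores]

# Center: multiplied by b, then shifted by a.
print(mean(scores), mean(new_scores))
print(median(scores), median(new_scores))

# Spread: multiplied by |b|, unaffected by a.
print(stdev(scores), stdev(new_scores))
```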

37
Data Analysis Toolbox
  • To answer a statistical question of interest:
  • Data: Organize and Examine
  • Who are the individuals described?
  • What are the variables?
  • Why were the data gathered?
  • When, Where, How, By Whom were the data gathered?
  • Graph: Construct an appropriate graphical display
  • Describe SOCS
  • Numerical Summary: Calculate appropriate center
    and spread (mean and s or 5 number summary)
  • Interpretation: Answer the question in context!

38
Chapter 1 Summary
  • Data Analysis is the art of describing data in
    context using graphs and numerical summaries.
    The purpose is to describe the most important
    features of a dataset.