Title: Describing Data: Frequency Distributions and Graphic Presentation
Frequency Distribution
- A frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.
- Explanation: you are just developing categories, or classes, based on a characteristic and then putting each element into a category based on that characteristic. No element appears in more than one class.
Frequency Distribution (continued)
- Here is an analogy: we divide clothes to wash into WHAT THREE CATEGORIES???!!!
- Whites, Lights/Colors, and Darks, right?
- The freq. dist. of clothes is developed by counting how many articles of clothing are in each laundry bin.
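The laundry analogy maps directly onto counting. A minimal Python sketch, assuming a made-up load of laundry (the article labels below are hypothetical): each article carries exactly one bin label, so the bins are mutually exclusive, and `collections.Counter` does the counting per bin.

```python
from collections import Counter

# Hypothetical load of laundry: each article gets exactly one bin label,
# so no article can be counted in two bins.
laundry = ["Whites", "Darks", "Lights/Colors", "Darks", "Whites", "Darks"]

# The frequency distribution is just the number of articles in each bin.
bin_counts = Counter(laundry)

for bin_name in ["Whites", "Lights/Colors", "Darks"]:
    print(f"{bin_name}: {bin_counts[bin_name]}")
```

Because every article is counted once, the bin counts always add up to the size of the load.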
Rule of Thumb for Developing a Frequency Dist.
- Step 1: Decide on the number of classes (k), or containers (hint: must be more than 1, but less than a million). Choose k so that 2^k > n, where k = number of classes and n = number of observations.
- If n = 50, then 2^6 = 64 > 50, so we should use at least 6 classes.
- Step 2: Determine the class interval width (i).
- It should be the same for all classes, and
- It should cover the lowest (L) to the highest (H) observation value:
- i ≥ (H - L)/k
- This is a rule of thumb; folks typically round up to the next convenient number for i, e.g., 8.9 becomes 10 and 94 becomes 100.
- Step 3: Set the individual class limits. Don't overlap at all. E.g., for dollars, use classes like 50-59, 60-69, and so on. Don't use 50-60, 60-70: if something is 60, it will appear in two classes.
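Steps 1 to 3 can be sketched in Python as below. This is a minimal illustration, not a prescribed method: `number_of_classes` returns the smallest k with 2^k > n (the minimum the rule allows), and the "convenient number" in Step 2 is modeled, as an assumption, by a hypothetical `step` parameter that rounds the width up to the next multiple of `step`. `class_limits` assumes whole-number data, like costs rounded to the dollar.

```python
import math

def number_of_classes(n):
    """Step 1: the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def round_up_to_step(value, step):
    """Round up to the next multiple of `step` (hypothetical stand-in for 'a convenient number')."""
    return math.ceil(value / step) * step

def class_width(low, high, k, step):
    """Step 2: i >= (H - L) / k, then rounded up to a convenient number."""
    return round_up_to_step((high - low) / k, step)

def class_limits(start, width, k):
    """Step 3: k non-overlapping classes for whole-number data, e.g. 50-59, 60-69, ..."""
    return [(start + j * width, start + (j + 1) * width - 1) for j in range(k)]

print(number_of_classes(50))  # 2**6 = 64 is the first power of 2 above 50
print(round_up_to_step(8.9, 10), round_up_to_step(94, 100))
print(class_limits(50, 10, 3))
```

Building each class's upper limit as one less than the next class's lower limit is what keeps 60 from landing in two classes.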
Rule of Thumb for Developing a Frequency Dist. (continued)
- Step 4: Tally the items into classes.
- Step 5: Count the number of items in each class.
- Now you can graphically depict the counts with a histogram.
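Steps 4 and 5 amount to "find each item's class, then count per class." A minimal sketch, assuming equal-width classes that start at `start` and values that are all at least `start`; the list of costs is hypothetical (the Hudson invoices are not reproduced here).

```python
from collections import Counter

def tally(values, start, width):
    """Steps 4 and 5: put each value in its class, then count the items per class.

    A value x falls in class j when start + j*width <= x < start + (j+1)*width,
    so every value lands in exactly one class. Assumes every x >= start.
    """
    counts = Counter((x - start) // width for x in values)
    k = max(counts) + 1          # index of the highest occupied class, plus one
    return [counts[j] for j in range(k)]  # empty classes in between count as 0

# Hypothetical whole-dollar parts costs.
costs = [52, 58, 61, 67, 69, 73, 75, 77, 84, 95]
print(tally(costs, start=50, width=10))
```

The returned list holds one count per class (50-59, 60-69, ...), which is exactly what a histogram's bar heights need.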
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a
better understanding of the cost of parts used in
the engine tune-ups performed in the shop. She
examines 50 customer invoices for tune-ups. The
costs of parts, rounded to the nearest dollar,
are listed on the next slide.
Example: Hudson Auto Repair (continued)
- Sample of Parts Cost for 50 Tune-ups
- Based on the rule of thumb, how many classes might we use? 2^k > n, where n is 50. 2^6 = 64, which is juuuust greater than 50, so k = 6.
- Based on the rule of thumb, what should the width of the classes be? i ≥ (H - L)/k = (109 - 52)/6 = 9.5. Let's round up to 10 to make it easy, and let's start the classes at 50 (just lower than the lowest observation).
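The two rule-of-thumb calculations for Hudson, written out as one worked equation using the values from the slide (n = 50, L = 52, H = 109):

```latex
\begin{aligned}
2^5 &= 32 < 50, \qquad 2^6 = 64 > 50 \;\Rightarrow\; k = 6 \\
i &\ge \frac{H - L}{k} = \frac{109 - 52}{6} = 9.5 \;\Rightarrow\; i = 10 \\
\text{classes: } & 50\text{--}59,\ 60\text{--}69,\ 70\text{--}79,\ 80\text{--}89,\ 90\text{--}99,\ 100\text{--}109
\end{aligned}
```

Six classes of width 10 starting at 50 reach 109, so they cover the highest observation.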
Tabular Summary: Frequency and Relative (or Percent) Frequency

Parts Cost ($)    Parts Frequency    Percent Frequency (%)
50-59             2                  4
60-69             13                 26
70-79             16                 32
80-89             7                  14
90-99             7                  14
100-109           5                  10
Total             50                 100

Percent frequency = (class frequency / 50) × 100; e.g., for 50-59, (2/50) × 100 = 4.
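The percent column follows mechanically from the frequency column. A small sketch using the Hudson frequencies from the tabular summary; multiplying by 100 before dividing keeps these particular results exact.

```python
# Hudson Auto frequencies, taken from the tabular summary.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]

n = sum(frequencies)  # 50 tune-ups
# Relative frequency = class frequency / n; percent frequency = relative frequency x 100.
percent = [f * 100 / n for f in frequencies]

for label, f, p in zip(classes, frequencies, percent):
    print(f"{label:>8}  {f:>3}  {p:5.1f}%")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check on the tally.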
Graphical Summary: Histogram
[Figure: histogram of Tune-up Parts Cost; horizontal axis Parts Cost ($), 50 to 120; vertical axis Frequency]
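Since the chart itself is not reproduced, a rough text-mode stand-in can be printed from the Hudson frequencies. This is only a sketch: the horizontal layout and the `#` bars (one per tune-up) are choices made here for plain text, not how the slide's chart was drawn.

```python
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]

for line in text_histogram(classes, frequencies):
    print(line)
```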
Numerical Descriptive Statistics
- The most common numerical descriptive statistic is the average (or mean).
- Hudson's average cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).
- In Excel there are several ways of obtaining the mean. Three of the most common are:
- AVERAGE()
- SUM()/n, where n in this case is 50
- Tools > Data Analysis > Descriptive Statistics > Summary Statistics
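The two formula-based routes to the mean can be mirrored in Python: sum-and-divide (like SUM()/n) and the standard library's `statistics.mean` (playing the role of AVERAGE()). The five costs below are hypothetical, since the 50 Hudson invoices are not reproduced.

```python
from statistics import mean

# Hypothetical parts costs for five tune-ups.
costs = [52, 64, 75, 88, 96]

by_formula = sum(costs) / len(costs)  # the SUM()/n approach
by_library = mean(costs)              # the AVERAGE() approach
print(by_formula, by_library)
```

Both routes give the same number; they differ only in who does the division.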
Statistical Inference
Population
- the set of all elements of interest in a
particular study
Sample
- a subset of the population
Statistical inference
- the process of using data obtained from a
sample to make estimates and test hypotheses
about the characteristics of a population
Census
- collecting data for a population
Sample survey
- collecting data for a sample
Process of Statistical Inference
1. Population consists of all tune-ups.
Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average
parts cost of $79 per tune-up.
4. The sample average is used to estimate the
population average.
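In practice the population average is unknown, which is the whole point. A simulation lets us peek: the sketch below invents a hypothetical population of 2,000 tune-up costs (uniform whole dollars from 50 to 110, purely an assumption), draws a sample of 50 as in step 2, and uses the sample average as the estimate, as in steps 3 and 4.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: parts cost for 2,000 tune-ups (unknown in real life).
population = [random.randint(50, 110) for _ in range(2000)]

# Step 2: examine a random sample of 50 tune-ups.
sample = random.sample(population, 50)

# Steps 3-4: the sample average serves as the estimate of the population average.
estimate = mean(sample)
print(f"sample average = {estimate:.2f}, population average = {mean(population):.2f}")
```

Rerunning with different seeds shows the estimate moving around the population average from sample to sample.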
Statistical Analysis Using Microsoft Excel
- Statistical analysis typically involves working with large amounts of data.
- Computer software is typically used to conduct the analysis.
- Frequently the data to be analyzed resides in a spreadsheet (or it will when you are done with it).
- Modern spreadsheet packages are capable of data management, analysis, and presentation. The Analysis ToolPak is an add-in in Excel.
- MS Excel is the most widely available spreadsheet software in business organizations.
Statistical Analysis Using Microsoft Excel (continued)
- Enter Functions and Formulas
Statistical Analysis Using Microsoft Excel (continued)
Note: Rows 10-51 are not shown.
Statistical Analysis Using Microsoft Excel (continued)
Note: Columns A-B and rows 10-51 are not shown. Neat Excel trick not taught in CISM: to view a function instead of its result, press <Ctrl> + ` (the grave accent key).
Statistical Analysis Using Microsoft Excel (continued)
Note: Columns A-B and rows 10-51 are not shown.
Pop Quiz!!!
- You were just handed an Excel spreadsheet with two years of monthly sales data from Off Campus Liquor, a local beverage distributor.
- Your manager says, "Make this data say something; our jobs are on the line." He then staggers out of the door and passes out in the parking lot.
- Although you have very little actual experience in statistics, you know a few things about the data and how it might be presented. Right?!
Questions
- Is the data cross section or time series? (Answer: time series)
- Is sales data qualitative or quantitative? (Answer: quantitative)
- How many observations are there? (Answer: 24)
Let's Look at Exercise 7 (page 31)
- The data set is pb2-07.xls.
- The BiLo store is gathering info on its customer visits during each month.
- You need to use the data to create a frequency distribution.
- Start with 0 as the lower limit of the first class and use a class interval of 3.
- Describe the distribution (see any clusters?).
- Convert the distribution to a relative frequency distribution.
- There are several ways to attack this problem; let's look at one.
Homework
- For the next class period, try 4, 6 and 8 on
pages 31-32
Other Graphical Depictions of Data
- Pie Charts: for relative frequencies and shares of the whole.
- Line Graphs: for changes over time, trends, or differences between groups.
- Bar Charts: similar to line graphs in their uses. Sometimes they make for better pair-wise comparisons.
The three commonly used graphic forms are
histograms, frequency polygons, and cumulative
frequency distributions.
A Histogram is a graph in which the class
midpoints or limits are marked on the horizontal
axis and the class frequencies on the vertical
axis. The class frequencies are represented by
the heights of the bars and the bars are drawn
adjacent to each other.
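Example: Histogram for Hours Spent Studying
[Figure: histogram of hours spent studying per week. Class widths are all the same: 7.5 up to 12.5, 12.5 up to 17.5, 17.5 up to 22.5, 22.5 up to 27.5, 27.5 up to 32.5, and 32.5 up to 37.5. The horizontal axis is labeled with the midpoints of the classes.]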
How do you read this graphic? How many people
study around 20 hours per week? How many study
less than 32.5 hours per week?
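The class widths and midpoints on the horizontal axis can be computed directly from the "lower up to upper" limits on the slide. A short sketch; note that the class whose midpoint is 20 (17.5 up to 22.5) is the bar to read for "around 20 hours."

```python
# Class limits for hours spent studying, "lower up to upper", from the slide.
classes = [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5),
           (22.5, 27.5), (27.5, 32.5), (32.5, 37.5)]

widths = [upper - lower for lower, upper in classes]
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(widths)     # every class is 5 hours wide
print(midpoints)
```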
Graphic Presentation of a Frequency Distribution
A Frequency Polygon consists of line segments
connecting the points formed by the class
midpoint and the class frequency.
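A frequency polygon is just (midpoint, frequency) points joined in order. The sketch below builds those points and segments from the Hudson classes. It assumes, as one common convention, that a class's midpoint sits halfway between its lower limit and the next class's lower limit (so 50-59 plots at 55); the hours-studying frequencies are not reproduced, so Hudson's data stands in.

```python
lower_limits = [50, 60, 70, 80, 90, 100]  # Hudson classes 50-59 ... 100-109
width = 10
frequencies = [2, 13, 16, 7, 7, 5]

# Assumption: midpoint = halfway between this lower limit and the next one.
midpoints = [lower + width / 2 for lower in lower_limits]

points = list(zip(midpoints, frequencies))
# Each line segment connects one point to the next.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(f"{start} -> {end}")
```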
Frequency Polygon for Hours Spent Studying
Both on the Same Chart
Cumulative Frequency Distribution
A Cumulative Frequency Distribution is used to
determine how many or what proportion of the data
values are below or above a certain value. You
are just adding up the frequencies as you go along.
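"Adding up as you go along" is a running total. A sketch using the Hudson frequencies (the hours-studying table is not reproduced here); `itertools.accumulate` produces the running sums, and dividing by n turns them into cumulative percentages.

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]  # Hudson classes 50-59 ... 100-109
frequencies = [2, 13, 16, 7, 7, 5]

# Running total: each entry counts everything up to and including that class.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

for upper, cum in zip(upper_limits, cumulative):
    print(f"at most ${upper}: {cum} tune-ups ({cum * 100 / n:.0f}%)")
```

The last running total always equals n, which is a handy check.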
Cumulative Frequency Table for Hours Spent
Studying
Cumulative Frequency Distribution for Hours
Studying
Line graphs are typically used to show the change
or trend in a variable over time.
Example 3 (continued)
A Bar Chart can be used to depict any of the
levels of measurement (nominal, ordinal,
interval, or ratio).
Construct a bar chart for the number of
unemployed per 100,000 population for selected
cities during 2001.
Bar Chart for the Unemployment Data
A Pie Chart is useful for displaying a relative
frequency distribution. A circle is divided
proportionally to the relative frequency and
portions of the circle are allocated for the
different groups.
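Dividing the circle "proportionally to the relative frequency" means each slice gets its percentage of 360 degrees. The running-shoe counts are not reproduced here, so this sketch uses the Hudson percent frequencies instead; the running totals mark where each slice ends, measured from a starting edge at 0 degrees.

```python
from itertools import accumulate

labels = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
percent = [4, 26, 32, 14, 14, 10]

# Each slice's angle is its share of the 360-degree circle.
angles = [p * 360 / 100 for p in percent]
# Running totals give the angle at which each slice ends.
ends = list(accumulate(angles))

for label, angle, end in zip(labels, angles, ends):
    print(f"{label:>8}: {angle:5.1f} degrees (ends at {end:5.1f})")
```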
A sample of 200 runners were asked to indicate
their favorite type of running shoe. Draw a pie
chart based on the following information.
Pie Chart for Running Shoes
More Homework
- Do problems 28 and 32 on pages 48-52 in addition
to 4, 6 and 8 on pages 31-32 assigned earlier.