Title: Unit I: Descriptive Statistics
1 Unit I: Descriptive Statistics
- Introduction
- Summarising and Describing Data
- Graphical Methods of Presentation
- Cross-Sectional and Time Series Data
- Measures of Central Tendency
- Measures of Variation
- Measures of Shape
2 Introduction
- What is Statistics
- Basic Definitions
3 What is Statistics
- Numerical facts
- A group of methods used in the collection,
analysis, presentation and interpretation of data
in order to make decisions.
4 Key Definitions
- A population is the collection of all items of
interest or under investigation
- N represents the population size
- A sample is an observed subset of the population
- n represents the sample size
- A parameter is a specific characteristic of a
population
- A statistic is a specific characteristic of a
sample
5 Population vs. Sample
[Figure: a population made up of the items a to z, and a sample of nine of those items: b, c, g, i, n, o, r, u, y]
Values calculated using population data are
called parameters
Values computed from sample data are called
statistics
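The distinction between a parameter and a statistic can be made concrete with a short sketch. The ages below are hypothetical values invented for illustration, and the sample size of 9 simply mirrors the figure; nothing here comes from real data.

```python
import random
import statistics

# Hypothetical population: ages of all N = 26 members of a small club.
population = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44, 39, 55, 33,
              28, 49, 36, 42, 30, 58, 27, 45, 34, 51, 37, 40, 32]
N = len(population)

random.seed(1)                          # fixed seed so the sketch is repeatable
sample = random.sample(population, 9)   # an observed subset of the population
n = len(sample)

population_mean = statistics.mean(population)  # a parameter (uses all N items)
sample_mean = statistics.mean(sample)          # a statistic (uses only n items)

print(f"N = {N}, population mean (parameter) = {population_mean:.2f}")
print(f"n = {n}, sample mean (statistic)     = {sample_mean:.2f}")
```

A different seed gives a different sample and so a different statistic, while the parameter stays fixed.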
6 Examples of Populations
- Names of all registered voters in Jamaica
- Incomes of all families living in Kingston
- Grade point averages of all the students in your
university
7 Types of Statistics
- Descriptive Statistics: methods used for
organising, displaying and describing data
using tables, graphs and summary measures.
- Inferential Statistics: methods that use
sample results to help make decisions or
predictions (inferences) about a population.
8 Descriptive Statistics
- Collect data
- e.g., Survey
- Present data
- e.g., Tables and graphs
- Summarize data
- e.g., Sample mean
9 Inferential Statistics
- Estimation
- e.g., Estimate the population mean weight using
the sample mean weight
- Hypothesis testing
- e.g., Test the claim that the population mean
weight is 120 pounds
Inference is the process of drawing conclusions
or making decisions about a population based on
sample results
10 Descriptive vs. Inferential Statistics
- Descriptive Statistics
- Collect
- Organize
- Summarize
- Display
- Analyze
- Inferential Statistics
- Predict and forecast values of a population
- Test hypotheses about values of a population
- Make decisions
11 Basic Definitions
- Data
- numbers or measurements that are collected
- Variables
- characteristics or attributes that enable us to
distinguish one individual from another
- they take on different values when different
individuals are observed (e.g. height)
- Element
- a single person, object or event in a data set
12 Types of Variables
- Quantitative or Numerical: variables that can be
measured numerically. Examples are number of
children, age, weight, height
- Qualitative or Categorical: variables that cannot
assume numerical values but can be classified
into categories or groups. Examples are marital
status, eye colour, opinions
13 Summarising and Describing Data
14 Summarising and Describing Data
- Describing the observed patterns in data is an
important part of statistics
- Distribution of a single variable
- Shape
- Outliers
- Centre
- Spread
15 Describing Data
16 Graphical Methods of Presenting Data
17 Graphical Methods of Presentation
- Data in raw form are usually not easy to use for
decision making
- Some type of organization is needed
- Table
- Graph
- Techniques reviewed here
- Bar charts and Pie charts
- Frequency distributions, histograms and polygons
- Cumulative distributions and ogives
18 Graphical Methods of Presentation
- Type of graphical representation depends on the
type of data to be presented
- When presenting Quantitative Data use
- histograms
- frequency polygons
- cumulative frequency polygons
- When presenting Qualitative Data use
- pie charts
- bar charts
19 Frequency Distributions
- A frequency distribution
- a table in which measurements are tallied
- then the frequency or total number of times that
each item occurs is recorded
- Usually measurements are arranged in ascending or
descending order
- A frequency distribution has 3 columns
- the data categories or classes
- the tally column
- the corresponding frequencies
- Used for both quantitative and qualitative data
20 Examples
21 Frequency Distribution (Cont'd)
- Two main types of frequency distributions
- Ungrouped data
- Grouped data
22 Guidelines for Constructing Frequency Distributions
- Class boundaries never overlap: no element can
belong to more than one class
- Even if the frequency is zero, include each and
every class
- Make all classes the same width; determine the
width of each interval by dividing the range
(largest value minus smallest value) by the
number of desired classes
- Usually at least 5 but no more than 15 groupings,
depending on the range and number of data points
- Keep the limits as simple and as convenient as
possible
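As a rough sketch of the width guideline, the snippet below divides the range by a chosen number of classes and rounds up to a whole number so that the limits stay simple. The function name `suggested_class_width`, the rounding step and the choice of 5 classes are assumptions made for illustration; the data are those of the frequency distribution example on slide 28.

```python
import math

def suggested_class_width(data, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    data_range = max(data) - min(data)
    return math.ceil(data_range / num_classes)

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Range = 58 - 12 = 46; 46 / 5 = 9.2, which rounds up to 10.
width = suggested_class_width(data, 5)
print(width)  # 10
```

Rounding up rather than to the nearest value helps ensure that the chosen number of classes covers the largest observation.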
23 Definitions
- Class Intervals/Limits
- the smallest and largest numbers which can actually
belong to each class
- each class has a lower class limit and an upper
class limit
24 Definitions
- Class Boundaries
- the numbers which separate classes
- they are equally spaced halfway between
neighbouring class limits
25 Definitions
- Class Mark
- midpoints of the classes
- aka Midpoint
- may be used in the calculation of other
statistics
- found by taking the average of the class limits
or boundaries
26 Definitions
- Class Width
- aka class size, class length
- Two ways of calculating
- Method 1: the difference between corresponding
class limits of neighbouring classes
- Method 2: the difference between the two class
boundaries of a class
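The definitions of class limits, boundaries, marks and width can be checked against each other with a small sketch. The class limits below are hypothetical (whole-number data grouped as 10-19, 20-29 and so on), and the variable names are chosen only for this illustration.

```python
# Hypothetical class limits (lower, upper) for whole-number data.
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

# Gap between one class's upper limit and the next class's lower limit.
gap = class_limits[1][0] - class_limits[0][1]

# Class boundaries sit halfway between neighbouring class limits.
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in class_limits]

# Class mark (midpoint): the average of the class limits.
marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Class width, method 1: difference between corresponding class limits.
width_from_limits = class_limits[1][0] - class_limits[0][0]
# Class width, method 2: difference between the two boundaries of a class.
width_from_boundaries = boundaries[0][1] - boundaries[0][0]

print(boundaries[0])          # (9.5, 19.5)
print(marks)                  # [14.5, 24.5, 34.5, 44.5, 54.5]
print(width_from_limits)      # 10
print(width_from_boundaries)  # 10.0
```

Both methods give the same width, and averaging the boundaries of a class gives the same class mark as averaging its limits.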
27 Relative Frequency Distribution
- This gives the frequency of each class interval
as a proportion of the total frequency
- The relative frequencies MUST sum to 1
- Sometimes expressed as a percentage
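A minimal sketch of the calculation, using hypothetical class frequencies and a helper name (`relative_frequencies`) invented for this example:

```python
def relative_frequencies(frequencies):
    """Each class frequency as a proportion of the total frequency."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

frequencies = [4, 10, 6]                 # hypothetical class frequencies
rel = relative_frequencies(frequencies)
percentages = [100 * r for r in rel]     # the same information as percentages

print([round(r, 2) for r in rel])          # [0.2, 0.5, 0.3]
print([round(p, 1) for p in percentages])  # [20.0, 50.0, 30.0]
print(round(sum(rel), 10))                 # 1.0
```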
28 Frequency Distribution Example
- Consider the following data set:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
- Group these figures into a frequency distribution
- What is the class interval (width)?
- Calculate the class boundaries
- Calculate the class midpoints
- Calculate the relative frequencies
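One possible way to work through this exercise is sketched below. The slide does not specify the classes, so the class limits 10-19, 20-29, ..., 50-59 are an assumption; a different choice of classes would give a different, equally valid table.

```python
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Assumption: class limits 10-19, ..., 50-59 (the exercise leaves this open).
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

n = len(data)
rows = []
for lower, upper in class_limits:
    frequency = sum(lower <= x <= upper for x in data)  # tally the class
    boundaries = (lower - 0.5, upper + 0.5)  # halfway to the neighbouring limits
    midpoint = (lower + upper) / 2           # class mark
    rows.append((lower, upper, boundaries, midpoint, frequency, frequency / n))

class_width = class_limits[1][0] - class_limits[0][0]
print(f"class width = {class_width}")
for lower, upper, boundaries, midpoint, frequency, rel in rows:
    print(f"{lower}-{upper}  {boundaries[0]}-{boundaries[1]}  "
          f"{midpoint}  {frequency}  {rel:.2f}")
```

With these assumed classes the frequencies come out as 3, 6, 5, 4 and 2, which sum to the 20 observations.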
29 Bar and Pie Charts
- Bar charts and Pie charts are used for
qualitative data
30 Bar Charts
- Bars (columns) are separated from each other
- Similar to a histogram (which we will soon meet)
- Height of bar shows the frequency for each
category
- Used to represent qualitative data
31 Constructing a Bar Chart
- Divide the data into groups (also called
segments, bins or classes)
- Label the vertical axis (y-axis): Frequency (the
number of counts for each bin)
- Label the horizontal axis (x-axis): the group
names of your response variables
- Determine the number of data points that are in
each bin from the frequency and construct the bar
chart
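The steps above can be sketched in a few lines. The responses are hypothetical data invented for this example, and the chart is drawn as text with the bars lying sideways (one row per category) purely to keep the sketch free of plotting libraries.

```python
from collections import Counter

# Hypothetical qualitative data: marital status of 12 respondents.
responses = ["single", "married", "married", "divorced", "single", "married",
             "widowed", "married", "single", "married", "divorced", "single"]

# Number of data points in each group.
frequencies = Counter(responses)

# One separate bar per category; the bar length stands for the frequency.
for category, freq in frequencies.most_common():
    print(f"{category:<10}{'#' * freq} {freq}")
```

In a drawn chart the categories would sit on the horizontal axis and the frequencies on the vertical axis, as the steps describe.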
32 Pie Charts
- A pie chart is a circle divided into portions,
where each portion represents the percentage of
a population or sample that belongs to one
category.
- A pie chart is mostly used to display percentages,
although it can also be used to display
frequencies or relative frequencies.
33 Constructing a Pie Chart
- Segment the range of the data into groups (also
called segments or classes)
- Determine the number of data points that are
within each group from the frequency.
- Express the number of data points in each
category as a percentage of the total.
- Now construct the pie chart; each slice of the
pie should be proportional to the percentage
of the data that lies within each category.
34 Bar Chart Example
Current Investment Portfolio
Investment Type   Amount (in thousands)   Percentage (%)
Stocks             46.5                    42.27
Bonds              32.0                    29.09
CD                 15.5                    14.09
Savings            16.0                    14.55
Total             110.0                   100.0
35 Pie Chart Example
Current Investment Portfolio
Investment Type   Amount (in thousands)   Percentage (%)
Stocks             46.5                    42.27
Bonds              32.0                    29.09
CD                 15.5                    14.09
Savings            16.0                    14.55
Total             110.0                   100.0
Pie chart slices: Stocks 42%, Bonds 29%, CD 14%, Savings 15%
(Percentages are rounded to the nearest percent)
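The percentages in the portfolio table can be reproduced from the amounts, as in the sketch below. The slice angles are an addition not shown on the slide: they assume the usual convention that each slice takes a share of the full 360 degrees proportional to its percentage.

```python
# Amounts (in thousands) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Each category as a percentage of the total.
percentages = {name: 100 * amount / total for name, amount in amounts.items()}
# Assumed convention: slice angle proportional to the share of 360 degrees.
angles = {name: 360 * amount / total for name, amount in amounts.items()}

for name in amounts:
    print(f"{name:<8}{percentages[name]:6.2f}%  {angles[name]:6.1f} degrees")
```

Rounded to two decimals the percentages match the table (42.27, 29.09, 14.09, 14.55), and rounded to whole numbers they match the pie chart labels.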
36 Histograms
- Similar to a bar chart
- Bars (columns) are joined together
- Used to present quantitative data
- This method shows
- location (measures of centre) of the data
- spread (the scale) of the data
- shape of the data
- presence of outliers
37 Histograms
- A graph of the data in a frequency distribution
is called a histogram
- The class boundaries (or class limits) are shown
on the horizontal axis
- The vertical axis is either frequency, relative
frequency, or percentage
- Bars are drawn where the base of each bar covers
the class while the height of each bar represents
the frequency (relative frequency or percentage)
- The bars are joined
38 Histogram - Example
- Construct a histogram using the frequency
distribution constructed from the example on
slide 28 above.
39 Frequency Polygons
- This is a line graph of a frequency distribution.
- It is formed by joining the midpoints of the
tops of the bars in a histogram.
- We plot the midpoint of each class against the
frequency for that class.
- The midpoint could also be plotted against the
relative frequency for each class; this would be
called a relative frequency polygon.
40 Constructing a Frequency Polygon
- Mark a dot above the midpoint of each class at a
height equal to the frequency (relative
frequency) of that class. Simply mark the
midpoint at the top of each bar in the histogram
- Imagine that two more classes exist, one before
the first class and one after the final class.
Plot the midpoints for these classes as well,
remembering that the frequencies for these two
classes are zero.
- Join the points using straight lines.
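The construction can be sketched by listing the points that would be joined. The midpoints and frequencies below are illustrative values (the ones obtained if the slide 28 data are grouped into the assumed classes 10-19, ..., 50-59); only the coordinates are computed, and the drawing itself is left to whatever plotting tool is at hand.

```python
# Illustrative class midpoints and frequencies (assumed classes 10-19, ..., 50-59).
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5]
frequencies = [3, 6, 5, 4, 2]

width = midpoints[1] - midpoints[0]  # equal class widths are assumed

# Add one imaginary class before the first and one after the last,
# each with a frequency of zero, so the polygon closes on the axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))  # join these in order with straight lines
print(points)
```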
41 Frequency Polygon - Example
- Using the histogram constructed on slide 38
above, construct a frequency polygon.
42 Cumulative Frequency Distribution
- A cumulative frequency distribution contains the
total number of observations whose values are
either less than or greater than the upper
boundary for each interval.
- A cumulative frequency distribution which tallies
the total number of observations whose values are
less than the upper boundary is known as a less
than cumulative frequency distribution.
- A cumulative frequency distribution which tallies
the total number of observations whose values are
greater than the upper boundary is known as a
more than cumulative frequency distribution.
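Both kinds of cumulative frequency can be obtained from the class frequencies with a running total, as in this sketch. The upper boundaries and frequencies are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical upper class boundaries and class frequencies.
upper_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [2, 5, 8, 4, 1]
total = sum(frequencies)

# Less than: running total of the observations below each upper boundary.
less_than = list(accumulate(frequencies))
# More than: the observations that remain above each upper boundary.
more_than = [total - c for c in less_than]

for boundary, lt, mt in zip(upper_boundaries, less_than, more_than):
    print(f"{boundary}: less than = {lt}, more than = {mt}")

print(less_than)  # [2, 7, 15, 19, 20]
print(more_than)  # [18, 13, 5, 1, 0]
```

At every boundary the two counts add up to the total number of observations.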
43 Less than Cumulative Frequency Ogive
- This is a plot of a less than cumulative
frequency distribution.
- Similar to the histogram, the less than
cumulative frequency ogive can be plotted against
either frequency, relative frequency, or
percentage
- Using Frequencies
- The first value in the distribution is ALWAYS
zero
- The last value in the distribution is ALWAYS the
total number
- Using Relative Frequencies
- The first value in the distribution is ALWAYS
zero
- The last value in the distribution is ALWAYS 1
- Using Percentages
- The first value in the distribution is ALWAYS
zero
- The last value in the distribution is ALWAYS 100
44 More than Cumulative Frequency Ogive
- This is a plot of a more than cumulative
frequency distribution.
- Similar to the histogram, the more than
cumulative frequency ogive can be plotted against
either frequency, relative frequency, or
percentage
- Using Frequencies
- The first value in the distribution is ALWAYS the
total number
- The last value in the distribution is ALWAYS zero
- Using Relative Frequencies
- The first value in the distribution is ALWAYS 1
- The last value in the distribution is ALWAYS zero
- Using Percentages
- The first value in the distribution is ALWAYS 100
- The last value in the distribution is ALWAYS zero
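The first and last values of the two ogives can be seen by listing the points to be plotted. The sketch assumes that the ogive starts at the lower boundary of the first class and then uses the upper boundary of every class; the boundaries and frequencies are illustrative values, not taken from the slides.

```python
from itertools import accumulate

# Illustrative class boundaries (first lower boundary, then each upper boundary).
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Less than ogive: starts at zero and ends at the total.
less_than = [0] + list(accumulate(frequencies))
# More than ogive: starts at the total and ends at zero.
more_than = [total - c for c in less_than]

less_than_points = list(zip(boundaries, less_than))
more_than_points = list(zip(boundaries, more_than))

# The same less than ogive on a percentage scale runs from 0 to 100.
less_than_pct = [100 * c / total for c in less_than]

print(less_than)      # [0, 3, 9, 14, 18, 20]
print(more_than)      # [20, 17, 11, 6, 2, 0]
print(less_than_pct)  # [0.0, 15.0, 45.0, 70.0, 90.0, 100.0]
```

Dividing the cumulative counts by the total instead of converting to percentages gives the relative frequency version, which runs between 0 and 1.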
45 Example - Cumulative Frequency Distribution
- Using the example on slide 28,
- Calculate less than cumulative frequencies and
construct a less than cumulative frequency ogive
- Calculate more than cumulative frequencies and
construct a more than cumulative frequency ogive