Title: Statistics 222 Review of Statistics 221
1Statistics 222 Review of Statistics 221
2Definitions
- Elements
- Entities on which data is collected
- Variables
- Items of Interest
- Qualitative vs Quantitative
3Summary - Levels of Measurement
- Nominal - categories only
- Ordinal - categories with some order
- Interval - differences but no natural starting point
- Ratio - differences and a natural starting point
4Definitions
- Cross Sectional vs Time Series
- Statistical Inference
- Population vs Sample
- Relative Frequency = Frequency of Class / N
- Class Width ≈ (Largest Data Value - Smallest Data Value) / Number of classes
- Histogram -> no gaps
- Ogive -> Cumulative Frequency
- Scatter Plot
5Best Measure of Center
6Skewness
7Sample Standard Deviation Formula
8Sample Standard Deviation (Shortcut Formula)
9Population Standard Deviation
10Variance - Notation
standard deviation squared
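The formulas on the three standard deviation slides did not survive in this transcript. As a minimal sketch, assuming the usual textbook forms (divide by n - 1 for a sample, by N for a population), they can be written as below. The function names and the data values are hypothetical.

```python
import math

def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sample_std_shortcut(values):
    """Shortcut form: sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

def population_std(values):
    """Population standard deviation: sqrt(sum((x - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [1, 3, 14]                      # hypothetical values
s = sample_std(data)                   # 7.0
s_short = sample_std_shortcut(data)    # 7.0, same result by the shortcut
variance = s ** 2                      # variance is the standard deviation squared
sigma = population_std(data)           # smaller, because it divides by N
```

Both sample formulas give the same value; the shortcut only avoids computing the mean first.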
11Standard Deviation from a Frequency Distribution
- Use the class midpoints as the x values
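A rough sketch of the frequency-distribution calculation, assuming the shortcut form with frequencies f and class midpoints x, s = sqrt((n Σ(f x²) - (Σ f x)²) / (n (n - 1))). The formula itself is not in the transcript, and the class limits and frequencies below are made up for illustration.

```python
import math

def std_from_frequency_table(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    n = sum(freq for _, _, freq in classes)
    sum_fx = 0.0
    sum_fx2 = 0.0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2      # class midpoint stands in for x
        sum_fx += freq * midpoint
        sum_fx2 += freq * midpoint ** 2
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Hypothetical frequency distribution
table = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]
s_estimate = std_from_frequency_table(table)   # about 7.38
```

The result is an approximation, since every value in a class is treated as if it sat at the midpoint.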
12Estimation of Standard Deviation: Range Rule of Thumb
For estimating a value of the standard deviation s, use s ≈ range / 4, where range = (highest value) - (lowest value)
13Definition
Empirical (68-95-99.7) Rule: For data sets having a distribution that is approximately bell shaped, the following properties apply
- About 68% of all values fall within 1 standard deviation of the mean
- About 95% of all values fall within 2 standard deviations of the mean
- About 99.7% of all values fall within 3 standard deviations of the mean
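The three percentages can be checked against an exact normal distribution. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```

The rule's 68, 95 and 99.7 are rounded versions of these values, and they apply only approximately to real bell-shaped data.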
14The Empirical Rule
FIGURE 2-13
17Definition
- z Score (or standard score): the number of standard deviations that a given value x is above or below the mean.
18Measures of Position: z score
Sample: z = (x - x̄) / s
Population: z = (x - µ) / σ
Round to 2 decimal places
19Interpreting Z Scores
Whenever a value is less than the mean, its corresponding z score is negative
Ordinary values: z score between -2 and 2
Unusual values: z score < -2 or z score > 2
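As a small illustration of the z score and the ordinary/unusual cutoffs, here is a sketch in Python. The function names and the example mean of 100 with standard deviation 15 are assumptions made for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)   # rounded to 2 decimal places

def classify(z):
    """Ordinary if -2 <= z <= 2, otherwise unusual."""
    return "ordinary" if -2 <= z <= 2 else "unusual"

# Hypothetical scores from a distribution with mean 100 and standard deviation 15
z_high = z_score(130, 100, 15)   # 2.0
z_low = z_score(64, 100, 15)     # -2.4, negative because 64 is below the mean
print(classify(z_high), classify(z_low))   # ordinary unusual
```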
20Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts
21Finding the Percentile of a Given Score
22Converting from the kth Percentile to the
Corresponding Data Value
Notation
- n = total number of values in the data set
- k = percentile being used
- L = locator that gives the position of a value
- Pk = kth percentile
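The formulas for these two slides are missing from the transcript. The sketch below assumes one common textbook convention: the percentile of a score is (number of values less than the score / n) × 100, and the locator is L = (k / 100) × n, rounded up when it is not a whole number and averaged with the next value when it is. Other percentile conventions exist and give slightly different answers, and the function names and data here are hypothetical.

```python
def percentile_of_score(values, x):
    """Percent of values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    return round(100 * below / len(values))

def value_at_percentile(values, k):
    """Pk for a whole-number percentile k with 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)   # L = whole + remainder / 100
    if remainder == 0:
        # L is a whole number: go midway between the Lth value and the next one
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: round up and take the value at that position
    return ordered[whole]

data = [2, 4, 5, 7, 8, 10, 12, 15, 18, 20]   # hypothetical, n = 10
print(percentile_of_score(data, 10))   # 50
print(value_at_percentile(data, 25))   # 5    (L = 2.5, rounded up to 3)
print(value_at_percentile(data, 50))   # 9.0  (L = 5, midway between 8 and 10)
```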
23Some Other Statistics
- Interquartile Range (or IQR) = Q3 - Q1
- 10 - 90 Percentile Range = P90 - P10
24Law of Large Numbers
- As a procedure is repeated again and again,
the relative frequency probability (from Rule 1)
of an event tends to approach the actual
probability.
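A quick simulation can illustrate the idea. This is only a sketch: it assumes a fair coin (actual probability 0.5) and uses a seeded generator so the run is repeatable. The function name is hypothetical.

```python
import random

def relative_frequency_of_heads(trials, seed=0):
    """Relative frequency of heads in `trials` simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The relative frequency tends to settle near 0.5 as the number of trials grows
for trials in (10, 1_000, 100_000):
    print(trials, relative_frequency_of_heads(trials))
```

The estimates for small runs can be far from 0.5; the law describes the long-run tendency, not any single short run.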
25Definition
26Graphs
27Requirements for Probability Distribution
Σ P(x) = 1, where x assumes all possible values
0 ≤ P(x) ≤ 1 for every individual value of x
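A minimal check of a candidate distribution might look like the sketch below. Besides the bound on each P(x), it assumes the standard requirement that the probabilities sum to 1. The function name and the example tables are hypothetical.

```python
import math

def is_probability_distribution(probs, tol=1e-9):
    """probs maps each value x to P(x)."""
    each_in_range = all(0 <= p <= 1 for p in probs.values())
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

valid = {0: 0.25, 1: 0.5, 2: 0.25}   # number of heads in two fair coin flips
invalid = {0: 0.5, 1: 0.6}           # each value is in range, but the sum is 1.1

print(is_probability_distribution(valid))     # True
print(is_probability_distribution(invalid))   # False
```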
28Definition
- Standard Normal Distribution
- a normal probability distribution that has a mean of 0 and a standard deviation of 1.
31Finding z Scores when Given Probabilities
Finding the 95th Percentile: the area to the right is 5% or 0.05, so z = 1.645 (z score will be positive)
32Finding z Scores when Given Probabilities
Finding the Bottom 2.5% and Upper 2.5% (one z score will be negative and the other positive)
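Instead of reading the table backwards, the same z scores can be found with the inverse cumulative distribution function. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; the variable names are arbitrary.

```python
from statistics import NormalDist

std_normal = NormalDist()               # mean 0, standard deviation 1

# 95th percentile: cumulative area of 0.95 to the left, 0.05 to the right
z_95 = std_normal.inv_cdf(0.95)         # about 1.645

# Bottom 2.5% and upper 2.5%: cumulative areas of 0.025 and 0.975
z_bottom = std_normal.inv_cdf(0.025)    # about -1.96
z_top = std_normal.inv_cdf(0.975)       # about +1.96

print(round(z_95, 3), round(z_bottom, 2), round(z_top, 2))
```

Note that `inv_cdf` takes the area to the left of the z score, so a right-tail area must be subtracted from 1 first.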
33Nonstandard Normal Distributions
- If µ ≠ 0 or σ ≠ 1 (or both), we convert values to standard scores using Formula 5-2; the procedures for working with all normal distributions are then the same as those for the standard normal distribution.
34Converting to Standard Normal Distribution
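The conversion formula on this slide is not in the transcript. The sketch below assumes the standard one, z = (x - µ) / σ, and then looks up the area for the resulting z score. The function name and the values µ = 100, σ = 15 are hypothetical.

```python
from statistics import NormalDist

def area_below(x, mu, sigma):
    """Area to the left of x under a normal curve with mean mu and sd sigma."""
    z = (x - mu) / sigma               # convert to a standard score
    return NormalDist().cdf(z)         # then use the standard normal distribution

# Hypothetical example: mu = 100, sigma = 15, x = 130 gives z = 2
p = area_below(130, 100, 15)
print(round(p, 4))                     # 0.9772

# Same area computed directly on the nonstandard distribution
direct = NormalDist(mu=100, sigma=15).cdf(130)
```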
35Cautions to keep in mind
- 1. Don't confuse z scores and areas. z scores are distances along the horizontal scale, but areas are regions under the normal curve. Table A-2 lists z scores in the left column and across the top row, but areas are found in the body of the table.
- 2. Choose the correct (right/left) side of the graph.
- 3. A z score must be negative whenever it is located in the left half of the normal distribution.
- 4. Areas (or probabilities) are positive or zero values, but they are never negative.
36Central Limit Theorem
Given
- 1. The random variable x has a distribution (which may or may not be normal) with mean µ and standard deviation σ.
- 2. Samples all of the same size n are randomly selected from the population of x values.
37Central Limit Theorem
Conclusions
1. The distribution of sample means x̄ will, as the sample size increases, approach a normal distribution.
2. The mean of the sample means will be the population mean µ.
3. The standard deviation of the sample means will approach σ/√n.
38Practical Rules Commonly Used
- 1. For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets better as the sample size n becomes larger.
- 2. If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just the values of n larger than 30).
39Notation
- the mean of the sample means: µx̄ = µ
- the standard deviation of the sample means: σx̄ = σ/√n (often called the standard error of the mean)
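Both relationships can be verified exactly on a tiny population. The sketch below assumes a hypothetical population of the six faces of a die and lists every possible sample of size 2 drawn with replacement.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a die
n = 2                             # sample size

mu = mean(population)             # population mean
sigma = pstdev(population)        # population standard deviation

# Every possible sample of size n (with replacement) and its sample mean
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mean_of_means = mean(sample_means)    # equals mu
sd_of_means = pstdev(sample_means)    # equals sigma / sqrt(n)

print(mean_of_means, mu)
print(sd_of_means, sigma / math.sqrt(n))
```

Enumerating all 36 samples avoids simulation noise, so the two pairs of numbers agree up to floating-point rounding.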
42Definition: Point Estimate
- A point estimate is a single value (or point)
used to approximate a population parameter.
43Definition: Confidence Interval
- A confidence interval (or interval estimate)
is a range (or an interval) of values used to
estimate the true value of a population
parameter. A confidence interval is sometimes
abbreviated as CI.
44Definition: Confidence Level
- A confidence level is the probability 1 - α (often expressed as the equivalent percentage value) that is the proportion of times that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times.
This is usually 90%, 95%, or 99% (α = 10%), (α = 5%), (α = 1%)
45The Critical Value
zα/2
46Notation for Critical Value
The critical value zα/2 is the positive z value that is at the vertical boundary separating an area of α/2 in the right tail of the standard normal distribution. (The value of -zα/2 is at the vertical boundary for the area of α/2 in the left tail.) The subscript α/2 is simply a reminder that the z score separates an area of α/2 in the right tail of the standard normal distribution.
47Finding zα/2 for 95% Degree of Confidence
α = 0.05, so α/2 = 0.025
Use Table to find a z score of 1.96
49Assumptions
1. The sample is a simple random sample.
2. The value of the population standard deviation σ is known.
3. Either or both of these conditions are satisfied: the population is normally distributed or n > 30.
50Definitions
- Estimator
- is a formula or process for using sample data to estimate a population parameter.
- Estimate
- is a specific value or range of values used to approximate a population parameter.
- Point Estimate
- is a single value (or point) used to approximate a population parameter.
51Sample Mean
- 1. For many populations, the distribution of sample means x̄ tends to be more consistent (with less variation) than the distributions of other sample statistics.
- 2. For all populations, the sample mean x̄ is an unbiased estimator of the population mean µ, meaning that the distribution of sample means tends to center about the value of the population mean µ.
52Definition
Level of Confidence
- confidence level is often expressed as probability 1 - α, where α is the complement of the confidence level. For a 0.95 (95%) confidence level, α = 0.05. For a 0.99 (99%) confidence level, α = 0.01.
53Definition
Margin of Error
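54Procedure for Constructing a Confidence Interval for µ when σ is known
- 1. Verify that the required assumptions are met.
2. Find the critical value zα/2 that corresponds to the desired degree of confidence.
3. Evaluate the margin of error E = zα/2 · σ/√n.
4. Find the values of x̄ - E and x̄ + E, which give the interval x̄ - E < µ < x̄ + E.
5. Round using the confidence interval round-off rules.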
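The procedure can be sketched in a few lines. This assumes the standard margin of error E = zα/2 · σ/√n and the interval x̄ - E < µ < x̄ + E. The function name and the example numbers are hypothetical, and the critical value comes from the standard library instead of a table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sd (sigma) is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z_crit * sigma / math.sqrt(n)         # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical example: x_bar = 50, sigma = 10, n = 100, 95% confidence
low, high = z_confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))   # 48.04 51.96
```

The assumptions from step 1 (simple random sample, σ known, normal population or n > 30) still have to be checked separately; the code does not verify them.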
55Sample Size for Estimating Mean µ
56Finding the Sample Size n when σ is unknown
1. Use the range rule of thumb to estimate the standard deviation as follows: σ ≈ range/4.
2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least 31 randomly selected sample values, calculate the sample standard deviation s and use it in place of σ.
3. Estimate the value of σ by using the results of some other study that was done earlier.
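The sample size formula itself is missing from the transcript. The sketch below assumes the usual one, n = (zα/2 · σ / E)², rounded up to the next whole number, and pairs it with the range rule of thumb from option 1. The function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sigma_from_range(highest, lowest):
    """Range rule of thumb: sigma is roughly range / 4."""
    return (highest - lowest) / 4

def sample_size_for_mean(margin_of_error, sigma, confidence=0.95):
    """Smallest n so that z_{alpha/2} * sigma / sqrt(n) <= margin_of_error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_crit * sigma / margin_of_error) ** 2)   # always round up

# Hypothetical example: values run from 20 to 60, desired margin of error E = 2
sigma_estimate = sigma_from_range(60, 20)        # 10.0
n_needed = sample_size_for_mean(2, sigma_estimate)
print(n_needed)                                  # 97
```

Rounding up matters: rounding 96.04 down to 96 would leave the margin of error slightly larger than requested.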
57σ Not Known: Assumptions
- 1) The sample is a simple random sample.
- 2) Either the sample is from a normally distributed population, or n > 30.
- Use Student t distribution
58Student t Distribution
- If the distribution of a population is essentially normal, then the distribution of
t = (x̄ - µ) / (s / √n)
- is essentially a Student t Distribution for all samples of size n, and is used to find critical values denoted by tα/2.
59Margin of Error E for Estimate of µ
- Based on an Unknown σ and a Small Simple Random Sample from a Normally Distributed Population
60Confidence Interval for the Estimate of µ Based on an Unknown σ and a Small Simple Random Sample from a Normally Distributed Population
61Procedure for Constructing a Confidence Interval for µ when σ is not known
- 1. Verify that the required assumptions are met.
2. Using n - 1 degrees of freedom, refer to Table A-3 and find the critical value tα/2 that corresponds to the desired degree of confidence.
3. Evaluate the margin of error E = tα/2 · s/√n.
4. Find the values of x̄ - E and x̄ + E, which give the interval x̄ - E < µ < x̄ + E.
5. Round the resulting confidence interval limits.
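A sketch of the same procedure in code, assuming the margin of error E = tα/2 · s/√n. Python's standard library has no Student t distribution, so the critical value is passed in after being looked up in a t table. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2} for n - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1)
    margin = t_crit * s / math.sqrt(n)     # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 3 values, so 2 degrees of freedom;
# the 95% critical value for 2 degrees of freedom is 4.303
sample = [1, 3, 14]
low, high = t_confidence_interval(sample, t_crit=4.303)
print(round(low, 2), round(high, 2))
```

With so few values the interval is very wide, which reflects how much larger t critical values are than z critical values at small n.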
62Student t Distributions for n = 3 and n = 12
Figure 6-5
63Using the Normal and t Distribution