Title: Review of Basic Statistics
1Review of Basic Statistics
2Descriptive Statistics Review
- Measures of Location
- The Mean
- The Median
- The Mode
- Measures of Dispersion
- The variance
- The standard deviation
3Mean
The mean (or average) is the basic measure of
location or central tendency of the data.
- The sample mean $\bar{x}$ is a sample statistic.
- The population mean µ is a population parameter.
4Sample Mean
$$\bar{x} = \frac{\sum x_i}{n}$$
where the numerator is the sum of the values of the n
observations, or
$$\sum x_i = x_1 + x_2 + \cdots + x_n$$
The Greek letter Σ is the summation sign.
5Example College Class Size
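We have the following sample of data for 5
college classes: 46, 54, 42, 46, 32.
We use the notation x1, x2, x3, x4, and x5 to
represent the number of students in each of the 5
classes:
x1 = 46, x2 = 54, x3 = 42, x4 = 46, x5 = 32
Thus we have
$$\bar{x} = \frac{\sum x_i}{n} = \frac{46 + 54 + 42 + 46 + 32}{5} = \frac{220}{5} = 44$$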
The average class size is 44 students
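The arithmetic for the class size example can be sketched in a few lines of Python. This is only an illustration (Python is not used in the slides), and the names `class_sizes` and `sample_mean` are made up for this sketch.

```python
# Class sizes from the college class size example
class_sizes = [46, 54, 42, 46, 32]


def sample_mean(values):
    """Return the sum of the observations divided by the number of observations n."""
    return sum(values) / len(values)


x_bar = sample_mean(class_sizes)
print(x_bar)  # 44.0
```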
6Population Mean (µ)
$$\mu = \frac{\sum x_i}{N}$$
The number of observations in the population is
denoted by the upper case N.
The sample mean $\bar{x}$ is a point estimator of the
population mean µ.
7Median
The median is the value in the middle when the
data are arranged in ascending order (from
smallest value to largest value).
- For an odd number of observations the median is
the middle value.
- For an even number of observations the median is
the average of the two middle values.
8The College Class Size example
First, arrange the data in ascending order
32 42 46 46 54
Notice that n = 5, an odd number. Thus the median
is given by the middle value.
32 42 [46] 46 54
The median class size is 46.
9Median Starting Salary For a Sample of 12
Business School Graduates
A college placement office has obtained the
following data for 12 recent graduates
Graduate   Starting Salary     Graduate   Starting Salary
1          2850                7          2890
2          2950                8          3130
3          3050                9          2940
4          2880                10         3325
5          2755                11         2920
6          2710                12         2880
10First we arrange the data in ascending order
2710 2755 2850 2880 2880 2890 2920 2940
2950 3050 3130 3325
Notice that n = 12, an even number. Thus we take
the average of the middle 2 observations.
2710 2755 2850 2880 2880 [2890 2920] 2940
2950 3050 3130 3325
The middle two values are 2890 and 2920.
Thus
$$\text{Median} = \frac{2890 + 2920}{2} = 2905$$
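The odd/even rule for the median can be written out as a short Python function. This is a minimal sketch rather than part of the original slides, and the function name `median` and the variable names are assumptions made for illustration.

```python
def median(values):
    """Return the middle value (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


class_sizes = [46, 54, 42, 46, 32]
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

print(median(class_sizes))  # 46
print(median(salaries))     # 2905.0
```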
11Mode
The mode is the value that occurs with the
greatest frequency.
Soft Drink Example
Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi Cola      13
Sprite          5
Total           50
The mode is Coke Classic. A mean or median is
meaningless for qualitative data.
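For qualitative data like the soft drink example, the mode is simply the category with the largest count. A minimal Python sketch, using the frequencies from the table and the standard `collections.Counter` class (the variable names are illustrative only):

```python
from collections import Counter

# Frequencies from the soft drink example
frequencies = Counter({
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi Cola": 13,
    "Sprite": 5,
})

# most_common(1) returns a list holding the single (category, count) pair
# with the highest count
mode, count = frequencies.most_common(1)[0]
total = sum(frequencies.values())

print(mode, count, total)  # Coke Classic 19 50
```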
12Using Excel to Compute the Mean, Median, and Mode
- Enter the data into cells A1:B13 for the starting
salary example.
- To compute the mean, activate an empty cell,
enter =AVERAGE(B2:B13) in the formula bar, and
click the green checkmark.
- To compute the median, activate an empty cell,
enter =MEDIAN(B2:B13) in the formula bar, and
click the green checkmark.
- To compute the mode, activate an empty cell,
enter =MODE(B2:B13) in the formula bar, and
click the green checkmark.
13The Starting Salary Example
Mean 2940
Median 2905
Mode 2880
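As a cross-check on the spreadsheet results, the same three measures can be computed with Python's standard `statistics` module. This is a sketch for comparison only; the slides themselves use Excel, and the variable names are assumptions.

```python
import statistics

# Starting salaries of the 12 graduates
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)  # 2880 is the only value that occurs twice

print(mean_salary, median_salary, mode_salary)
```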
14Variance
- The variance is a measure of variability that
uses all the data.
- The variance is based on the difference between
each observation (xi) and the mean ($\bar{x}$ for
the sample and µ for the population).
15 The variance is the average of the squared
differences between the observations and the mean
value
For the population:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
For the sample:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
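To make the two definitions concrete, here is a small Python sketch applied to the class size data from the earlier example. It assumes the usual textbook convention of dividing by N for a population and by n - 1 for a sample; treating the five classes as a whole population is done here purely for illustration.

```python
class_sizes = [46, 54, 42, 46, 32]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


# Deviations from the mean of 44 are 2, 10, -2, 2, -12; their squares sum to 256
print(sample_variance(class_sizes))      # 256 / 4
print(population_variance(class_sizes))  # 256 / 5
```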
16Standard Deviation
- The standard deviation of a data set is the
square root of the variance.
- The standard deviation is measured in the same
units as the data, making it easy to interpret.
17Computing a standard deviation
For the population:
$$\sigma = \sqrt{\sigma^2}$$
For the sample:
$$s = \sqrt{s^2}$$
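A short Python sketch of both standard deviations, again using the class size data as an illustration. It relies on `statistics.stdev` (sample, n - 1 divisor) and `statistics.pstdev` (population, N divisor) from the standard library; the variable names are assumptions.

```python
import math
import statistics

class_sizes = [46, 54, 42, 46, 32]

# Sample standard deviation: square root of the sample variance (256 / 4 = 64)
s = statistics.stdev(class_sizes)

# Population standard deviation: square root of the population variance (256 / 5 = 51.2)
sigma = statistics.pstdev(class_sizes)

print(s)      # about 8 students
print(sigma)  # about 7.16 students
```

Both results are in students, the same unit as the data, whereas the variances are in squared students.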
18Measures of Association Between Two Variables
- Covariance
- Correlation coefficient
19Covariance
- Covariance is a measure of linear association
between variables.
- Positive values indicate a positive correlation
between variables.
- Negative values indicate a negative correlation
between variables.
20To compute a covariance for variables x and y
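For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$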
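The covariance calculation can be sketched in Python as follows. The data set here (hours studied and quiz scores for five students) is hypothetical and was invented for this sketch; it does not come from the slides.

```python
def covariance(xs, ys, sample=True):
    """Average product of paired deviations; divide by n - 1 (sample) or N (population)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n


# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

s_xy = covariance(hours, scores)                    # products of deviations sum to 6; 6 / 4
sigma_xy = covariance(hours, scores, sample=False)  # 6 / 5

print(s_xy, sigma_xy)
```

The positive sign indicates that higher values of one variable tend to go with higher values of the other.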
21[Figure: scatter diagram of n = 299 sample points, divided into quadrants I, II, III, and IV]
22If the majority of the sample points are located
in quadrants II and IV, you have a negative
correlation between the variables, as we do in
this case.
Thus the covariance will have a negative sign.
23The (Pearson) Correlation Coefficient
A covariance will tell you if 2 variables are
positively or negatively correlated, but it will
not tell you the degree of correlation. Moreover,
the covariance is sensitive to the unit of
measurement. The correlation coefficient does not
suffer from these defects.
24The (Pearson) Correlation Coefficient
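For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
Note that
$$-1 \le r_{xy} \le 1$$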
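The following Python sketch illustrates why the correlation coefficient is preferred: changing the unit of measurement changes the covariance but not the correlation. The data (hours studied and quiz scores) are hypothetical and were invented for this illustration, as were the function names.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation (the covariance of a variable with itself is its variance)."""
    return math.sqrt(sample_covariance(values, values))


def sample_correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
minutes = [60 * h for h in hours]  # same study times, expressed in minutes

cov_hours = sample_covariance(hours, scores)
cov_minutes = sample_covariance(minutes, scores)  # 60 times larger
r_hours = sample_correlation(hours, scores)
r_minutes = sample_correlation(minutes, scores)   # unchanged by the unit

print(cov_hours, cov_minutes)
print(r_hours, r_minutes)
```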
26I have 7 hours per week for exercise
27Normal Probability Distribution
The normal distribution is by far the most
important distribution for continuous random
variables. It is widely used for making
statistical inferences in both the natural and
social sciences.
28Normal Probability Distribution
- It has been used in a wide variety of
applications
Heights of people
Scientific measurements
29Normal Probability Distribution
- It has been used in a wide variety of
applications
Test scores
Amounts of rainfall
30The Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$$
where µ is the mean, σ is the standard
deviation, π ≈ 3.14159, and e ≈ 2.71828.
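The normal density can be evaluated directly from its definition. The sketch below writes the standard textbook formula as a Python function and compares it with `statistics.NormalDist.pdf` from the standard library; the function name `normal_pdf` and the example values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Height of the standard normal curve at its mean (the highest point)
peak = normal_pdf(0, 0, 1)
print(round(peak, 4))  # 0.3989

# The hand-written formula agrees with the standard library
by_hand = normal_pdf(11, 10, 2)
by_library = NormalDist(mu=10, sigma=2).pdf(11)
print(by_hand, by_library)
```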
31Normal Probability Distribution
The distribution is symmetric, and is
bell-shaped.
32Normal Probability Distribution
The entire family of normal probability
distributions is defined by its mean µ and its
standard deviation σ.
[Figure: normal curve with the mean µ and the standard deviation σ marked along the x axis]
33Normal Probability Distribution
The highest point on the normal curve is at the
mean, which is also the median and mode.
34Normal Probability Distribution
The mean can be any numerical value: negative,
zero, or positive.
[Figure: normal curves centered at x = -10, 0, and 20]
35Normal Probability Distribution
The standard deviation determines the width of
the curve: larger values result in wider, flatter
curves.
[Figure: two normal curves, one with σ = 15 and a wider, flatter one with σ = 25]
36Normal Probability Distribution
Probabilities for the normal random variable
are given by areas under the curve. The total
area under the curve is 1 (.5 to the left of the
mean and .5 to the right).
37The Standard Normal Distribution
The standard normal distribution is a normal
distribution with the special properties that its
mean is zero and its standard deviation is one.
38Standard Normal Probability Distribution
The letter z is used to designate the standard
normal random variable.
39Cumulative Probability
The probability that z ≤ 1 is the area under the
curve to the left of 1.
[Figure: standard normal curve with the area to the left of z = 1 shaded]
40What is P(z ≤ 1)?
To find out, use the Cumulative Probabilities
Table for the Standard Normal Distribution:
z      .00      .01      .02
...
.9     .8159    .8186    .8212
1.0    .8413    .8438    .8461
1.1    .8643    .8665    .8686
1.2    .8849    .8869    .8888
...
Reading the row for 1.0 and the column for .00
gives P(z ≤ 1) = .8413.
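A table lookup of this kind can be reproduced with the cumulative distribution function of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; it is an illustration only, since the slides rely on the printed table.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0 and standard deviation 1

# Cumulative probability: area under the curve to the left of z = 1
p = standard_normal.cdf(1.0)
print(round(p, 4))  # 0.8413

# Rebuild the excerpt of the table: rows .9 to 1.2, columns .00 to .02
for row in (0.9, 1.0, 1.1, 1.2):
    cells = [round(standard_normal.cdf(row + col), 4) for col in (0.00, 0.01, 0.02)]
    print(row, cells)
```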
42Area under the curve
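- 68.27 percent of the total area under the curve
is within ±1 standard deviation of the mean.
- 95.45 percent of the area under the curve is
within ±2 standard deviations of the mean.
[Figure: standard normal curve with the areas between z = -1 and z = 1 (68.27 percent) and between z = -2 and z = 2 (95.45 percent) marked]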
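The area within k standard deviations of the mean is the cumulative probability at +k minus the cumulative probability at -k. A minimal Python sketch using `statistics.NormalDist` (the helper name `area_within` is an assumption for this illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = area_within(1)
within_two = area_within(2)

print(round(100 * within_one, 2))  # percent of area within 1 standard deviation
print(round(100 * within_two, 2))  # percent of area within 2 standard deviations
```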
43Exercise 1
- What is P(z ≤ 2.46)?
- Answer: .9931
- What is P(z ≥ 2.46)?
- Answer: 1 - .9931 = .0069
44Exercise 2
- What is P(z ≤ -1.29)?
- Answer: 1 - .9015 = .0985
- What is P(z ≥ -1.29)?
- Answer: .9015
Note that, because of the symmetry, the area to
the left of -1.29 is the same as the area to the
right of 1.29 (the two shaded areas in the figure
are equal):
$$P(z \le -1.29) = P(z \ge 1.29) = 1 - P(z \le 1.29) = 1 - .9015 = .0985$$
45Exercise 3
What is P(.00 ≤ z ≤ 1.00)?
P(.00 ≤ z ≤ 1.00) = P(z ≤ 1.00) - P(z ≤ .00) = .8413 - .5000 = .3413
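The three exercises can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table; the variable names are assumptions made for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Exercise 1: area to the left of 2.46, and its complement to the right
left_of_246 = z.cdf(2.46)
right_of_246 = 1 - left_of_246

# Exercise 2: by symmetry, the area left of -1.29 equals the area right of 1.29
left_of_minus_129 = z.cdf(-1.29)
right_of_129 = 1 - z.cdf(1.29)
right_of_minus_129 = 1 - left_of_minus_129

# Exercise 3: area between 0 and 1 is a difference of two cumulative probabilities
between_0_and_1 = z.cdf(1.00) - z.cdf(0.00)

print(round(left_of_246, 4), round(right_of_246, 4))
print(round(left_of_minus_129, 4), round(right_of_minus_129, 4))
print(round(between_0_and_1, 4))
```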