Title: Introductory Statistical Concepts
1Introductory Statistical Concepts
2Disclaimer
- I am not an expert SAS programmer.
- Nothing that I say is confirmed or denied by
Texas A&M University.
3Why Are We Here?
- Deming
- To Learn
- To Have Fun
- Question: Who was Deming?
4Poll What type of organization do you work for?
- Business
- Government
- Education
- Nonprofit
- Other
5Purpose of These Lectures
- A review of the statistical concepts used in most of the SAS Analytics Lecture Series.
- We will look at questions such as the following:
- What is the nature of statistical analyses?
- Why are population parameters so important?
- What is really being tested when you see a p-value?
- Why does regression handle missing data so well?
- What are residual analyses?
6Descriptive Statistics
7 The Population
8Learning Outcomes
- You will learn
- basic statistical concepts
- the definition of mean, median, mode and standard deviation
- the difference between populations and samples
- the difference between parameters and estimates
- about confidence intervals
- how to test a statistical hypothesis
- how to run a regression analysis
9Parameters
- Characteristics of the variable of interest
- It is how we describe the variable of interest
- Parameters are unknown
10Parameters(Characteristics)
- Central Tendency
- Mode
- Median
- Mean
- Measures of Variability
- Range
- Variance
- Standard Deviation
11Variability
12What is an Index?
How SUNNY is SUNNY? The UV Index
13Air Quality Index: What Does It Mean?
14DOW JONES INDUSTRIAL AVERAGE INDEX
What does 10,971.16 really mean? Which is
better: a DJIA of 10,000 or a DJIA of 12,000?
15Variability Index
- A Simple One
- Find the Largest Value
- Find the Smallest Value
- Let Range = R = Largest - Smallest
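As a small illustration of this simple index, the range can be computed in Python roughly as follows. The function name `value_range` and the mpg readings are made up for illustration; they are not part of the lecture data.

```python
def value_range(values):
    """Return the range R = largest value - smallest value."""
    largest = max(values)    # find the largest value
    smallest = min(values)   # find the smallest value
    return largest - smallest

# Hypothetical mpg readings, used only to show the calculation.
mpg_values = [18, 22, 25, 31]
print(value_range(mpg_values))
```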
16A More Complex Variation Index
- The Standard Deviation
- Statisticians use this index to indicate variability
- You will see it written as $\sigma$ (population) or $s$ (sample)
- Widely available from SAS, Excel, and other
statistical packages
17Details of the More Complex Index
- Example: Suppose that we observe the following three numbers: 1, 4, 7
- The mean of these numbers is
- (1 + 4 + 7)/3 = 4
- We now subtract the mean from each number, square it, and add the results:
- (1-4)(1-4) + (4-4)(4-4) + (7-4)(7-4) = 18
- Divide by n - 1 = 2 and take the square root: the Standard Deviation = sqrt(18/2) = 3
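The arithmetic on this slide can be checked with a short Python sketch. It uses the same three numbers and the n - 1 divisor; the variable names are illustrative only.

```python
import math
import statistics

numbers = [1, 4, 7]

mean = sum(numbers) / len(numbers)                          # (1 + 4 + 7) / 3
sum_of_squares = sum((x - mean) ** 2 for x in numbers)      # 9 + 0 + 9
sample_sd = math.sqrt(sum_of_squares / (len(numbers) - 1))  # divide by n - 1

print(mean, sum_of_squares, sample_sd)

# The standard library computes the same sample standard deviation.
print(statistics.stdev(numbers))
```

Statistical packages usually report this n - 1 version as the sample standard deviation.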
18What does this Mean?
- By itself, it may be confusing to some.
- When comparing populations, we can use it to say which population varies the most.
- Let us look at an applet.
19Using Graphs to Determine Variability
20Distributions
21Known Distribution
- With a known distribution, we know the following
- the shape
- the mean
- the variability (standard deviation)
- and/or some other information
22Classical Distributions-Normal
23Normal-Overlay
24Classical Distributions-Uniform
25Survey
- The following are called parameters of the population:
- mean, median, mode
- variance, standard deviation, range, inter-quartile range (IQR)
- In general, are these known or unknown?
- Known: yes (select using your seat indicator)
- Unknown: no (select using your seat indicator)
26MPG-Histogram
Compare with true values!
27Simulated Sample
- In this example, we simulated taking a sample of
size 1000 from one population of cars weighing
3000 pounds with a normal distribution with
mean = 24 and standard deviation = 1.
- You can practice this after class.
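One way to practice this outside SAS is the Python sketch below. It draws 1000 values from a normal distribution with mean 24 and standard deviation 1, then compares the sample estimates with those true values. The seed and variable names are arbitrary choices, and this is not the program used to produce the slides.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed so the run is repeatable

# Draw 1000 simulated mpg values from a normal population
# with mean 24 and standard deviation 1.
sample = [rng.gauss(24, 1) for _ in range(1000)]

sample_mean = statistics.mean(sample)   # estimate of the population mean
sample_sd = statistics.stdev(sample)    # estimate of the population sd
print(round(sample_mean, 2), round(sample_sd, 2))
```

The estimates should land close to 24 and 1, but not exactly on them. That gap is the difference between an estimate and a parameter.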
28Section 1.2
29Objectives
- Understand the relationships between
- populations and samples
- parameters and estimates.
- Look at an overview of hypothesis testing.
30Population
Mean, Variance, Median, Mode, Distribution,
Parameters
31Example
- Mpg of American-made cars that weigh between 2000
and 3500 pounds and were built in the 1970s.
- Parameters: mean, variance, and so on
- In general, we do not know the parameters.
32Purpose of Statistical Analyses
- Estimate the parameters. (Make guesses.)
- Example: What is the population mean?
- Test hypotheses about the parameters. (Ask questions.)
- Example: Is the population mean = 30 mpg?
33Role of Samples
- Taking a sample of the population enables you to
- make estimates of the population parameters
- answer the questions about the population
parameters.
34Population and Sample
Population: Mean, Variance, Median, Mode, Distribution, Parameters
Sample: Sample mean, Sample variance, S
Inference: Estimates, Tests of hypotheses
35Example cars_american
- This is a sample of American-made cars that weigh
between 2000 and 3500 pounds and that were built
in the 1970s.
- We are interested in the mpg.
- Use summary statistics to analyze the data.
36Results of Summary Statistics
37Results of Histogram
38 Results of Histogram
39Sampling Distribution Applet sampling_dist
- This demonstration illustrates how to estimate
and plot the sampling distribution of various
statistics.
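For readers without access to the applet, the idea can be sketched in Python. Take many samples from one population, compute the mean of each, and look at how those means vary. The population (normal, mean 24, standard deviation 1), the sample size, and the number of repetitions are assumptions chosen for illustration; the applet itself may work differently.

```python
import math
import random
import statistics

rng = random.Random(1)          # arbitrary seed for repeatability
POP_MEAN, POP_SD = 24, 1        # assumed population (hypothetical)
SAMPLE_SIZE = 25                # size of each sample
N_SAMPLES = 2000                # number of repeated samples

def one_sample_mean():
    """Draw one sample and return its mean."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(sample)

sample_means = [one_sample_mean() for _ in range(N_SAMPLES)]

# Spread of the sample means vs. the theoretical value sd / sqrt(n)
observed_spread = statistics.stdev(sample_means)
theoretical_spread = POP_SD / math.sqrt(SAMPLE_SIZE)
print(round(observed_spread, 3), round(theoretical_spread, 3))
```

The sample means cluster around the population mean and vary much less than individual observations do.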
40View/Application Share Demo Sampling
Distributions Applet
41http//www.ruf.rice.edu/lane/stat_sim/sampling_di
st/index.h...
42Confidence Intervals on the Population Mean
- Level of Comfort
- 50%: 21.57 to 22.21
- 95%: 20.96 to 22.82
- 99.9%: 20.30 to 23.48
What does this mean?
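The pattern in these intervals (more comfort means a wider interval) can be reproduced with a rough Python sketch. It uses a normal-approximation interval, mean ± z × standard error. Statistical packages typically use the t distribution instead, so the numbers will not match package output exactly. The data values and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z multiplier for a two-sided interval, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical mpg sample, not the lecture's data set.
mpg_sample = [21.0, 19.5, 24.1, 22.3, 20.8, 23.6, 22.0, 21.4, 25.2, 19.9]

for level in (0.50, 0.95, 0.999):
    low, high = mean_confidence_interval(mpg_sample, level)
    print(f"{level:.1%}: {low:.2f} to {high:.2f}")
```

All three intervals share the same center, the sample mean. Only the multiplier changes with the level of comfort.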
43Test That the Population Mean = 30 mpg
- Use t-test → One Sample t-test
- Requirements for running this test:
- Large n > 35
- Or leftovers are normal
- What is the p-value or sig value?
44Testing Mean = 30
45Conclusions of the Test
- Choose an alpha level, usually alpha = .05.
- If sig < alpha, then reject.
- Otherwise, fail to reject.
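The decision rule can be sketched in Python as below. The test statistic is the usual one-sample t statistic. To stay within the standard library, the p-value comes from the normal curve, which is only a large-sample approximation; a package such as SAS would use the t distribution. The simulated sample and the function name `one_sample_test` are assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def one_sample_test(sample, hypothesized_mean, alpha=0.05):
    """Test H0: population mean == hypothesized_mean (two-sided)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - hypothesized_mean) / standard_error
    # Large-sample approximation: normal curve in place of the t curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    decision = "reject" if p_value < alpha else "fail to reject"
    return t_stat, p_value, decision

# Hypothetical sample of 40 mpg values (n > 35), simulated around 22 mpg.
rng = random.Random(7)
mpg_sample = [rng.gauss(22, 3) for _ in range(40)]

t_stat, p_value, decision = one_sample_test(mpg_sample, hypothesized_mean=30)
print(round(t_stat, 2), p_value, decision)
```

Because the simulated cars average about 22 mpg, the hypothesis that the population mean is 30 mpg is rejected at alpha = .05.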
46Sig and p-values
- When you see a sig value or p-value:
- You know that some hypothesis is being tested.
- You know whether or not the hypothesis is being rejected.
- You probably do not know what the hypothesis really is.
- Ask yourself these questions:
- What are the population parameters being tested?
- How is what is being tested related to those parameters?
47Requirements for Doing This Test
- Large n → n > 35
- Or leftovers are normally distributed.
- Use Histogram to test for normality.
48Populations-Which Ones are Similar?
49Populations-Which Ones are Similar?
50Take Samples
- Use the samples to answer this question:
- Which populations are similar?
- Statistical translations
- "Which populations are similar?" is the same as asking: Are the following the same?
- distribution?
- mean?
- variance?
51Background/Requirements
- Before we jump into the analysis, we must ask the following questions:
- How many populations are there?
- How many population parameters are we interested in and what are they?
- What tests do we want to do, and what are the requirements for doing those?
- Are we using everything we know?
52Example
- Suppose that we are interested in the mpg of
American and European cars. How many populations
are there?
American Cars: Mpg (Distribution, Mean, Variance)
European Cars: Mpg (Distribution, Mean, Variance)
53Poll How many populations are there?
- One - MPG
- Two - American and European
- Depends on the sample size
54Parameters
Population 1                 Population 2
American Cars                European Cars
Variable of interest: mpg    Variable of interest: mpg
Distribution: Normal?        Distribution: Normal?
Mean                         Mean
Variance                     Variance
55Analyses
- We want to look at the distributions.
- We want to estimate the parameters.
- We want to answer these questions:
- Are the population means the same?
- Are the population variances the same?
56Example Our Data Set car_am_eu
- Suppose that we are interested in the mpg of
American and European cars.
American Cars: Mpg (Distribution, Mean, Variance) → Sample
European Cars: Mpg (Distribution, Mean, Variance) → Sample
57Results from the Sample
58Results
59Box Plots
American
European
60Histograms
American
European
61Poll Are the populations the same?
- Yes
- No
62Conclusion Based on Sample Numbers and Graphs
- Easy: based on the samples, the populations are different (no statistical jargon).
- But I must have a p-value for my boss, for my paper, and so on.
63Formal Tests
- The classical approach in determining whether two
populations are the same is to test to see
whether the two population means are equal.
- But first we check to see whether the two
population variances are equal.
64Formal Tests
- We use t-test → Two Sample.
Test 2
Test 1
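The two checks (variances first, then means) can be sketched in Python. This is a simplified stand-in for the two-sample t-test output, not a reproduction of it. It computes the ratio of the sample variances and the pooled t statistic, with a rough normal-curve p-value for the means. The data values and the function name `two_sample_summary` are hypothetical, and with samples this small a real analysis would use the F and t distributions.

```python
import math
import statistics
from statistics import NormalDist

def two_sample_summary(sample_1, sample_2):
    """Return (variance ratio, pooled t statistic, approximate p-value)."""
    n1, n2 = len(sample_1), len(sample_2)
    var1, var2 = statistics.variance(sample_1), statistics.variance(sample_2)

    # Variance check: ratio of larger to smaller sample variance (F statistic).
    f_ratio = max(var1, var2) / min(var1, var2)

    # Means check: pooled two-sample t statistic (assumes equal variances).
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error

    # Rough two-sided p-value from the normal curve (large-sample shortcut).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return f_ratio, t_stat, p_value

# Hypothetical mpg values, not the car_am_eu data.
american = [18.0, 20.5, 22.1, 19.4, 21.0, 23.3, 20.2, 19.8]
european = [26.1, 28.4, 25.0, 27.7, 29.2, 26.8, 27.5, 28.0]

f_ratio, t_stat, p_value = two_sample_summary(american, european)
print(round(f_ratio, 2), round(t_stat, 2), p_value)
```

A variance ratio near 1 supports pooling the variances. A large t statistic in absolute value (small p-value) says the population means differ.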
65Section 1.3
66Objectives
- Identify the following
- the population parameters
- the appropriate model
- number of populations sampled
- the correct hypotheses
- what should be tested for normality
- what equal variances means.
67MPG Example
Four populations of cars: Weight = 2300, Weight = 2600, Weight = 2900, Weight = 3000
Take a sample of size 1 from each population!
68Data
- We should be in deep trouble with one sample from
each population.
- We have eight unknown population parameters.
- Can you name them?
- But what do we know?
69Survey
- Name the population parameters.
70Essential Part and Leftovers
- We want to model the data as follows
- MPG = Essential Part + Leftover
- or
- MPG = Mean + Leftover
71Know or Assumptions
- First, we know that
- Second, each population mean is related to weight
by the following: Mean = a + b × weight
- The population means fall on a straight line!!
- How many unknowns are there now?
72Poll How many unknowns are there?
- 1
- 2
- 3
- 4
- 5
- n
73Graph
74Observed, Essential Part, Leftover
75The Official Regression Model
76Main Assumptions
- The means of the populations fall on a straight line.
- All of the variances are equal ($\sigma^2$).
- The errors are known to be normal with mean 0
and variance $\sigma^2$.
77Assumptions for Simple Linear Regression
Appendix A
- This demonstration illustrates the fundamental
concepts of simple linear regression.
78View/Application Share Demo Linear.doc
79How Can We Estimate the Unknown Parameters?
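- The Principle of Least Squares
- Leftover = MPG - (a + b × weight)
- Now, choose a and b so that the sum of the squared leftovers
is as small as possible.
- or
- Minimize $\sum (\text{MPG} - a - b \times \text{weight})^2$.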
80OUTPUT_0
81OUTPUT
82OUTPUT_1
83OUTPUT_2
84OUTPUT_3
85OUTPUT_4
86Missing Values
- Suppose that we want to estimate the mean mpg
when weight = 2500.
- Predicted (Estimated) Mean MPG = 44.05 - .0078 × weight
- Why does this work?
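Plugging weight = 2500 into the fitted line from the slide gives the estimate directly. The sketch below only evaluates that equation; the function name `predicted_mean_mpg` is made up for illustration.

```python
def predicted_mean_mpg(weight):
    """Fitted line from the slide: 44.05 - .0078 * weight."""
    return 44.05 - 0.0078 * weight

# Estimated mean mpg for cars weighing 2500 pounds
print(round(predicted_mean_mpg(2500), 2))  # 24.55
```

No car weighing exactly 2500 pounds needs to be in the sample. The model assumes that the population means fall on a straight line, so the line supplies the estimate for that weight.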
87Survey
- Can anyone explain why this works?
88Conclusion
- Simple linear regression is very powerful.
- But it is based on assumptions (what we know).
- We need to check assumptions (residual analyses).