Title: STT215: CHAPTER 3 PRODUCING DATA Dr. Cuixian Chen
1STT215 CHAPTER 3 PRODUCING DATA, Dr. Cuixian Chen
2UNCW 2011-2012 Enrollment Profile
- How many students enrolled at UNCW for 2011-2012?
- How many undergraduates/graduates?
- How many female/male students?
- What are the expenses for in-state/out-of-state students?
- How many UNCW faculty members have a PhD or the highest degree in their fields?
- What about freshmen's SAT/ACT scores?
- How many freshmen chose UNCW as their first choice?
3UNCW 2011-2012 Enrollment Profile
http://uncw.edu/admissions/documents/FreshmanProfile2012.pdf
4UNCW 2011-2012 Enrollment Profile
http://uncw.edu/admissions/documents/FreshmanProfile2012.pdf
5Some terminology
- Definition
- Population: the entire group of individuals or objects of interest.
- Sample: the subset of the population on which information is obtained.
- Census: when the sample is the entire population.
- Response rate: (number of responses)/(sample size)
6Example of population/sample
- To assess the opinion of students at the Ohio State University about campus safety, a reporter interviews 15 students he meets walking on the campus late at night who are willing to give their opinion.
- What is the sample here? What is the population? Why?
- All those students walking on campus late at night
- All students at this university with safety issues
- The 15 students interviewed
- All students approached by the reporter
73.1 Design of Experiments
- Experimental units (subjects, for humans): the individuals on which the experiment is done.
- Treatment (or factor): a specific experimental condition (e.g. a certain real medicine).
- Placebo: a false treatment to control for psychological effects (e.g. sugar pills).
- Types of variables
- Response variable: a variable that measures the outcome of the study.
- Explanatory variable (factor): variable(s) that explain or cause changes in the response variable.
- In a study of sickle cell anemia, 150 patients were given the drug hydroxyurea, and 150 were given a placebo (dummy pill). The researchers counted the episodes of pain in each subject. Identify:
- The subjects
- The factors / treatments (hydroxyurea and placebo)
- And the response variable
Examples: 1. Smoking and lung cancer. 2. Running on a treadmill and heart rate.
php 3.23(a), 3.27, 3.28, 3.30(a).
8Example New Drug Experiment
- A new drug is introduced. The investigator gives the drug to subjects (patients) in a treatment group, while other subjects are in a control group: they aren't treated, or are treated with the traditional method (or a placebo).
- Subjects should be assigned randomly. The experiment should be double-blind: neither the subjects nor the doctors (evaluators) should know who was in the control group.
- Question: how can you make 3.10 (P174) a double-blind experiment?
- php 3.19, 3.22 (how do you make it double-blind?)
9Observational study vs Experiment
- Observational study: the investigator observes individuals and measures variables of interest but does not attempt to influence the response.
- Example: Based on observations you make in nature, you suspect that female crickets choose their mates on the basis of their health. Observe the health of male crickets that mated.
- Experiment (study): the investigator observes how a response variable behaves when the researcher manipulates one or more factors.
- Example: Deliberately infect some males with intestinal parasites and see whether females tend to choose healthy rather than ill males.
- Php 3.121, 3.124
10Example 3.4, page 168
- Researchers conducted a child-care study that enrolled 1,364 infants in 1991. In 2003, the researchers reported that the more time children spent in child care from birth to age 4.5, the more adults tended to rate them, both at age 4.5 and at kindergarten, as less likely to get along with others, as more assertive, as disobedient, and as aggressive.
- Q1: Is it an observational study or an experiment? Why?
- Q2: Explanatory variable? Response variable?
- Q3: Does it prove that spending more time in daycare causes children to have more behavior problems? How could it be improved to be an experiment?
11Drawbacks of Observational Study (example 3.4)
- In Example 3.4, the effect of child care on behavior is confounded (mixed up) with the characteristics of families who use daycare.
- Lurking variables: variable(s) associated with the response but not of interest; their effects cannot be separated from the effect of the explanatory variable on the response.
- Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables.
- Question: find the lurking variable of EX 3.18(a), page 184.
- HWQ: find the lurking variable of EX 3.17, page 184.
12Example 3.7, page 170
- Study: Do smaller classes in elementary school really benefit students in areas such as scores on standard tests, staying in school, and going to college?
- The Tennessee STAR program: each of 6,385 students who were beginning kindergarten was assigned to one of three types of class:
- (1) regular class with one teacher
- (2) regular class with one teacher and a full-time aide
- (3) small class.
- Four years later, they returned to regular classes. The only systematic difference was the type of class. In later years, the students from small classes had higher scores on standard tests.
- Q1: What is the treatment?
- Q2: Is it an observational study or an experiment? Why?
- Q3: Explanatory variable? Response variable?
- Q4: What is the only systematic difference among the students?
- Q5: Can it prove that class size made the difference?
13The Strength of Experiments (compared with
observational studies)
- Experiments provide good evidence for causation (they are able to control lurking variables).
- Example 3.7, page 170
- Lurking variables: variable(s) associated with the response but not of interest; their effects cannot be separated from the effect of the explanatory variable on the response.
- Example 3.4, page 168
143.1 Design Of Experiments (Bias in Comparative
Experiments)
Ann Landers, summarizing responses of readers: 70% of (10,000) parents wrote in to say that having kids was not worth it; if they had to do it over again, they wouldn't.
Bias: Most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.
153.1 Design Of Experiments (Principles in
Comparative Experiments)
- 4. Plus double-blind if possible.
- Randomization is very important in experiments: it helps to ensure groups are as similar as possible.
- Q 3.17 on p184.
163.1 Design Of Experiments (How do we randomize
by Calculator)
- Draw names out of a hat, toss a fair coin (or die), use a table of random digits, or use computer software (calculator).
How to use a TI-83/84 to generate random numbers and randomly select 2 subjects out of 3:
step 1: From the main screen, press MATH and use the arrow keys to scroll to PRB.
step 2: Select 1:rand; rand will be displayed on the main screen.
step 3: Press ( 3 ) and ENTER.
step 4: The calculator will display 3 randomly generated numbers.
step 5: Order the subjects in the population, and match each subject with a number.
step 6: The two subjects associated with the 2 smallest numbers are our random choice.
Q1: How do we randomly select two names from Tom, Jerry, Micky, Minnie?
Q2: How do we randomly divide Tom, Jerry, Micky, Minnie into two groups?
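The calculator steps above can be mirrored in software. The following is a minimal Python sketch, not the course's prescribed method: it attaches one random number to each subject and keeps the subjects with the smallest numbers. The function name `pick_by_smallest_random` and the seed value are assumptions made for illustration.

```python
import random

def pick_by_smallest_random(subjects, k, rng=None):
    """Attach a random number in [0, 1) to each subject (like rand on the
    calculator) and return the k subjects with the smallest numbers."""
    rng = rng or random.Random()
    draws = [(rng.random(), s) for s in subjects]
    draws.sort()  # smallest random numbers first
    return [s for _, s in draws[:k]]

names = ["Tom", "Jerry", "Micky", "Minnie"]

# Q1: randomly select two names (the seed is arbitrary, only for repeatability).
chosen = pick_by_smallest_random(names, 2, random.Random(215))

# Q2: the two names not chosen form the second group.
others = [n for n in names if n not in chosen]

print("Group 1:", chosen)
print("Group 2:", others)
```

The same idea answers both questions: the two smallest numbers pick the sample, and everyone left over forms the other group.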
17How to use table of Random Digits (Table B)
- Steps
- Label each subject.
- Use the table to choose labels until you get the sample size you desire.
- EX 3.11, page 185: Use the table to assign a class of 40 students to two groups of the same size. Suppose we begin at line 130 of Table B.
- 69051 64817 87174 09517 84534 06489 87201 97245
EX: Begin with line 151 of Table B and assign a class of 10 students to 2 groups of the same size. Start with labels 01, 02, ..., 10.
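As a rough illustration of the table procedure, the Python sketch below reads the digits quoted above for line 130 two at a time, keeps each label between 01 and 40 the first time it appears, and skips everything else. The helper name `labels_from_digits` is hypothetical, and only the one line of digits shown on the slide is used.

```python
def labels_from_digits(digits, n_labels, n_needed, width=2):
    """Read a string of random digits in chunks of `width`; keep each label
    in 1..n_labels the first time it appears, until n_needed are chosen."""
    digits = "".join(digits.split())  # drop the spaces between 5-digit groups
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= n_labels and label not in chosen:
            chosen.append(label)
            if len(chosen) == n_needed:
                break
    return chosen

line_130 = "69051 64817 87174 09517 84534 06489 87201 97245"

# 40 students labeled 01..40; the first group needs 20 of them.
group_1_so_far = labels_from_digits(line_130, n_labels=40, n_needed=20)
print(group_1_so_far)  # [5, 16, 17, 40, 20, 19]
```

This one line yields only six usable labels, so in practice you would continue reading on the following lines of Table B until 20 labels are chosen; the remaining 20 students form the second group.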
183.1 Design Of Experiments (Outline of a randomized design)
Completely randomized experimental designs: individuals are randomly assigned to groups, then the groups are randomly assigned to treatments.
19Example 3.13, page 179
- What are the effects of repeated exposure to an advertising message (digital camera)? The answer may depend on the length of the ad and on how often it is repeated. Outline the design of this experiment with the following information.
- Subjects: 150 undergraduate students.
- Two factors: length of the commercial (30 seconds and 90 seconds: 2 levels) and repeat times (1, 3, or 5 times: 3 levels).
- Response variables: their recall of the ad, their attitude toward the camera, and their intention to purchase it. (See page 187 for the diagram.)
HWQ 3.18, 3.30(b), 3.32
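One way to picture the outline for Example 3.13 is the Python sketch below. It assumes equal group sizes (150 / 6 = 25 subjects per treatment), which the slide does not state; the function name `completely_randomized` and the seed are also illustrative choices.

```python
import itertools
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle the subjects, then cut the shuffled list into equal-sized
    groups, one group per treatment."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}

lengths = [30, 90]    # length of the commercial in seconds (2 levels)
repeats = [1, 3, 5]   # number of times the ad is shown (3 levels)

# Every combination of the two factors is one treatment: 2 x 3 = 6.
treatments = list(itertools.product(lengths, repeats))

subjects = range(1, 151)  # 150 undergraduates, labeled 1..150
groups = completely_randomized(subjects, treatments, random.Random(313))

for (length, times), members in groups.items():
    print(f"{length}-second ad shown {times} time(s): {len(members)} subjects")
```

Each group would then be measured on the three response variables (recall, attitude, intention to purchase).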
203.1 Design Of Experiments (Block designs)
In a block, or stratified, design, subjects are
divided into groups, or blocks, prior to
experiments to test hypotheses about differences
between the groups. The blocking, or
stratification, here is by gender (blocking
factor).
EX 3.19
Ex 3.17 (p182), 3.18. HWQ 3.47(a,b), 3.126.
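A block design randomizes separately inside each block. The Python sketch below assumes a small hypothetical roster of six women and six men and two hypothetical treatments, "A" and "B"; none of these names come from the textbook example.

```python
import random

def randomized_block_design(blocks, treatments, rng):
    """Within each block, shuffle the members and deal out the treatments in
    turn, so every block gets a balanced, random assignment."""
    plan = {}
    for block_name, members in blocks.items():
        order = list(members)
        rng.shuffle(order)  # randomization happens inside the block only
        for i, subject in enumerate(order):
            plan[subject] = treatments[i % len(treatments)]
    return plan

# Hypothetical subjects, blocked by gender.
blocks = {
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
    "men": ["M1", "M2", "M3", "M4", "M5", "M6"],
}
treatments = ["A", "B"]

plan = randomized_block_design(blocks, treatments, random.Random(319))
for block_name, members in blocks.items():
    print(block_name, {s: plan[s] for s in members})
```

Because the shuffle is done block by block, each treatment ends up with the same number of women and the same number of men.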
213.1 Design Of Experiments (Matched pairs designs)
Matched pairs: Choose pairs of subjects that are closely matched (e.g., same sex, height, weight, age, and race). Within each pair, randomly assign who will receive which treatment. It is also possible to use just a single person and give the two treatments to this person over time in random order. In this case, the "matched pair" is just the same person at different points in time.
HWQ 3.120
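The within-pair randomization can be sketched in Python as follows. The pairs, the treatment labels, and the function name `assign_matched_pairs` are all hypothetical; the point is only that the random choice is made separately for each pair.

```python
import random

def assign_matched_pairs(pairs, treatments=("treatment 1", "treatment 2"), rng=None):
    """For each matched pair, put the two treatments in random order and
    give the first to one member and the second to the other."""
    rng = rng or random.Random()
    plan = {}
    for first_member, second_member in pairs:
        t_first, t_second = rng.sample(treatments, 2)  # random order of the two
        plan[first_member] = t_first
        plan[second_member] = t_second
    return plan

# Hypothetical pairs of closely matched subjects.
pairs = [("Ann", "Beth"), ("Carl", "Dan"), ("Eve", "Fay")]
plan = assign_matched_pairs(pairs, rng=random.Random(3120))
for a, b in pairs:
    print(f"{a}: {plan[a]}   {b}: {plan[b]}")
```

For the single-person version, the same random ordering would decide which treatment that person receives first.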
223.2 Sampling Design (Stratified random sample)
- Simple Random Sample (SRS): every sample of size n has the same chance of being selected.
- Stratified random sample: first divide the population into groups (strata), and then take an SRS from each stratum.
233.2 Sampling Design (simple random sample)
- Simple Random Sample (SRS): every sample of size n has the same chance of being selected.
- How do we do it? Use your calculator.
- Q1: How do we select a simple random sample of two from Tom, Jerry, Micky, Minnie?
- HWQ 3.52(a,b,c), 3.54(b,c) (are they SRS?)
Example: A university has 2000 male and 500 female faculty members. This is the total population. The university wants to randomly select 50 females and 200 males for a survey, giving each faculty member a 1 in 10 chance of being chosen. Is this a simple random sample (SRS)?
No. In an SRS there could be any number of males and females in the final sample. Here, stratification prevents that.
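To see the difference in the faculty example, the Python sketch below draws a stratified sample (an SRS of 200 from the men and an SRS of 50 from the women) and, for contrast, one SRS of 250 from the whole population. The labels such as "M1" and "F1" and the seeds are made up for illustration.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take a separate simple random sample from each stratum."""
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical labels for the 2000 male and 500 female faculty members.
strata = {
    "male": [f"M{i}" for i in range(1, 2001)],
    "female": [f"F{i}" for i in range(1, 501)],
}
sizes = {"male": 200, "female": 50}

sample = stratified_sample(strata, sizes, random.Random(32))
print({name: len(members) for name, members in sample.items()})

# Contrast: an SRS of 250 from all 2500 faculty members. The number of
# women is not fixed in advance; it varies from sample to sample.
population = strata["male"] + strata["female"]
srs = random.Random(32).sample(population, 250)
females_in_srs = sum(1 for person in srs if person.startswith("F"))
print("Women in the SRS:", females_in_srs)
```

Every individual has a 1 in 10 chance under both schemes, but only the stratified design guarantees exactly 50 women and 200 men.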
243.2 Sampling Design( Voluntary Response Sampling)
- Voluntary response sampling: Individuals choose to be involved. These samples are very susceptible to bias because different people are motivated to respond or not. Often called "public opinion polls." These are not considered valid or scientific.
- Bias: The sample design systematically favors a particular outcome.
Ann Landers, summarizing responses of readers: 70% of (10,000) parents wrote in to say that having kids was not worth it; if they had to do it over again, they wouldn't.
Bias: Most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.
253.3 Towards Statistical Inference
- Use information from a sample (known information) to infer about the population (unknown).
- Statistic: information from a sample.
- Parameter: information from a population.
- Sampling variability: information from a sample will differ from one sample to the next.
26Population versus sample
- Sample: The part of the population we actually examine and for which we do have data.
- How well the sample represents the population depends on the sample design.
- A statistic is a number describing a characteristic of a sample.
- Population: The entire group of individuals in which we are interested but can't usually assess directly.
- Example: All humans, all working-age people in California, all crickets.
- A parameter is a number describing a characteristic of the population.
27Sampling variability
- Each time we take a random sample from a population, we are likely to get a different set of individuals and calculate a different statistic. This is called sampling variability.
- The good news is that, if we take lots of random samples of the same size from a given population, the variation from sample to sample (the sampling distribution) will follow a predictable pattern. All of statistical inference is based on this knowledge.
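Sampling variability is easy to see in a small simulation. The Python sketch below assumes a made-up population of 10,000 individuals in which 60% have some characteristic (so the parameter is 0.6), takes many random samples of size 100, and records the sample proportion (the statistic) each time. All numbers here are illustrative assumptions.

```python
import random
import statistics

def sample_proportions(population, n, n_samples, rng):
    """Take n_samples simple random samples of size n and return the
    proportion of 1s in each sample."""
    return [sum(rng.sample(population, n)) / n for _ in range(n_samples)]

# Hypothetical population: 1 = has the characteristic, 0 = does not.
population = [1] * 6000 + [0] * 4000
rng = random.Random(33)  # arbitrary seed, only for repeatability

props = sample_proportions(population, n=100, n_samples=1000, rng=rng)

print("Smallest sample proportion:", min(props))
print("Largest sample proportion: ", max(props))
print("Average of the proportions:", round(statistics.mean(props), 3))
```

The individual proportions differ from sample to sample, but they cluster around the population value of 0.6, which is the predictable pattern that inference relies on.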
29Bias and variability: Arrow shooting as an example
303.3 Towards Statistical Inference (cont.)
- How to decrease bias?
- Random sampling and better instruments
- How to increase precision?
- Larger sample
- Population size does not affect precision (as long as the population is much larger than the sample)! Sample size does.
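The last point can be checked with a rough simulation. The Python sketch below assumes hypothetical populations in which 60% have some characteristic and measures how much the sample proportion varies (its standard deviation) across repeated random samples. The population sizes, sample sizes, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def spread_of_sample_proportion(pop_size, n, rng, n_samples=500, p=0.6):
    """Standard deviation of the sample proportion over repeated simple
    random samples of size n from a population of size pop_size."""
    ones = int(pop_size * p)
    population = [1] * ones + [0] * (pop_size - ones)
    props = [sum(rng.sample(population, n)) / n for _ in range(n_samples)]
    return statistics.stdev(props)

rng = random.Random(330)

# Same sample size, very different population sizes.
sd_small_pop = spread_of_sample_proportion(10_000, 100, rng)
sd_big_pop = spread_of_sample_proportion(1_000_000, 100, rng)

# Same population size, much larger sample.
sd_large_n = spread_of_sample_proportion(1_000_000, 1600, rng)

print("n = 100,  population 10,000:   ", round(sd_small_pop, 4))
print("n = 100,  population 1,000,000:", round(sd_big_pop, 4))
print("n = 1600, population 1,000,000:", round(sd_large_n, 4))
```

The first two spreads should come out nearly the same even though one population is 100 times larger, while the larger sample gives a clearly smaller spread, that is, higher precision.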