Welcome to Statistics 111 - PowerPoint PPT Presentation

Slides: 30
Provided by: shanej8
Transcript and Presenter's Notes


1
Welcome to Statistics 111
  • Alex Braunstein

The goal of this course is to develop basic tools
for data analysis, probability and statistical
methods. Key topics covered in the course include
exploratory data analysis, regression,
probability, estimation, and hypothesis testing.
2
Syllabus notes: Website
  • All handouts will be available on the website
  • http://stat.wharton.upenn.edu/braunsf/stat111.html
  • Website also contains my contact information
  • Link on website for getting Wharton class account
    if you are not a Wharton student
  • Helpful if you want to use Wharton computer labs

3
Syllabus notes: Homeworks
  • Homeworks will be handed out at the beginning of
    every week
  • 5 homeworks in all
  • Homeworks will be submitted at the beginning of
    class on Mondays
  • You are encouraged to work together on homework,
    but homeworks are to be completed separately and
    handed in individually.
  • Do not copy from another person.
  • No late homeworks will be accepted!!
  • Late homeworks will get a score of zero, without
    exception
  • Your lowest homework grade is not included in
    final grade

4
Syllabus notes: Midterm Exam
  • The midterm will be held on the following date:
  • Monday, June 15th (in class)
  • No makeup midterm examination!
  • A missing midterm exam counts as a zero score
  • Consider taking this class in the fall or spring
    if you cannot attend the midterm!

5
Student Questionnaire
  • Fill out a questionnaire and hand it in before
    the break
  • I will try to incorporate some of the subjects
    that interest you into future lectures

6
Course Overview
[Diagram: four numbered course stages]
  • 1. Collecting Data
  • 2. Exploring Data
  • 3. Probability Intro.
  • 4. Inference
  • Further diagram labels: Comparing Variables,
    Relationships between Variables, Means,
    Proportions, Regression, Contingency Tables
7
Out in public: "You do statistics?!?"
  • I hated that class in college!
  • That was the most boring class ever!
  • Lame.

8
Big Picture Ideas
  • Statistics is all about uncertainty
  • Focus as much on what we don't know (or haven't
    observed) as on what we know
  • Formulating the question that we want to answer
    is often the most difficult part
  • Statistics is part mathematics, part
    roll-up-your-sleeves-and-get-thinking.

9
Science and Skepticism
  • We always need to be cautious about conclusions
    based on data
  • Possible sources of bias and confounding?
  • How might things have gone wrong?
  • A little bit of skepticism is a good thing!

10
Statistical Modeling
  • Inference using mathematical models of
    uncertainty to answer questions
  • Connect probability concepts to our data
  • Cannot make claims without using models and
    making assumptions
  • Are the assumptions reasonable?

11
After the break
  • Collecting Data: Design of Experiments
  • Sections 3.1-3.2 in Moore, McCabe and Craig
  • First couple of classes will not involve much
    math at all, but we will get into lots of data
    analysis after that!

12
Break!
  • Hand in questionnaire
  • 5 minutes

13
Outline for Second Half of Lecture
  • Introduction to Experiments
  • Sources of Bias in Experiments
  • Techniques for Avoiding Bias
  • Matching
  • Randomization
  • Block Designs
  • Blinding and Double-Blinding
  • Experiments vs. Observational Studies
  • Association vs. Causation

14
Experiments
  • Used to address a specific question
  • Often used to examine causal effects
  • E.g., medical trials, education interventions

[Diagram: Population → Experimental Units → Treatment Group (Treatment → Result) and Control Group (No Treatment → Result); points in the design are numbered 1-4]
  • Can we just look at difference in results to get
    the causal effect of the treatment?
  • Depends on whether the experiment was done well
  • Many possible sources of bias in the design of
    experiments

15
Sources of Bias
  • An experiment or study is biased if it
    systematically favors a particular outcome
  • Subjects are not representative of the population
  • Treatment and control groups are inherently
    different on some lurking or confounding variable
  • Subjects are influenced by knowing they are in
    treatment or control groups
  • Evaluator of outcomes is influenced by knowing
    which subjects are in the treatment or control
    group

[Diagram: Population → Experimental Units → Treatment Group (Treatment → Result) and Control Group (No Treatment → Result); points in the design are numbered 1-4]
16
Bias 1: Non-representative units
  • If your subjects are not representative of the
    population, you won't be able to generalize the
    results even if the experiment is well done
  • Here are two examples
  • Treatment group: High Level NICUs
  • Control group: Low Level NICUs
  • Problem: classification of NICUs differs from
    state to state, so a hospital that might qualify
    as a high level NICU in one state might not in
    another
  • Observed differences between the groups cannot
    be generalized from one state to another

17
Bias 2: Confounding/Lurking Variables
  • Treatment group and control group are different
    on some variable that also influences the outcome
  • A confounding variable means that we can't
    attribute the difference in outcomes to just the
    treatment
  • Part of the difference may be due to the
    confounding variable, not the treatment
  • Simple example: a breast cancer drug trial where
    only women receive the treatment and only men
    receive the control
  • Gender becomes a confounding variable
  • Are treatment vs. control outcomes different due
    to the treatment or to gender differences between
    groups?

18
Bias 3: Subject knows treatment assignment
  • A subject's outcome is influenced by knowing that
    he/she is in a treatment or control group
  • E.g., drug trials: patients improve just because
    they think they are receiving the drug
  • Solution: blinded experiment with placebo
  • Placebo appears to be the treatment, so subjects
    (treatment and control) don't know their true
    treatment assignment
  • Controls' outcomes may improve slightly; this is
    often called the placebo effect

19
Bias 4: Evaluator knows treatment assignment
  • Person evaluating outcome (e.g., doctor in drug
    trial) may also be influenced by knowing who
    receives treatment
  • Not a problem if outcome is something
    indisputable, such as death!
  • This is a problem for more subjective measures
    like pain reduction or results from social
    programs
  • Solution: double-blinded experiment where neither
    subjects nor evaluators know treatment
    assignments

20
Association vs Causation
  • In the presence of a confounding variable, we can
    only conclude there is an association between
    treatment and outcome, not causation

21
Examples: Reporters are stupid
  • Children who watch many hours of TV get lower
    grades in school on average than those who watch
    less TV
  • Does this mean that TV causes poor grades?
  • What are potential confounding variables?
  • People who use artificial sweeteners in place of
    sugar tend to be heavier than people who use
    sugar
  • Does this mean that sweeteners cause weight gain?
  • What is probably happening here?

22
One solution: Matching
  • Make sure that treatment and control groups are
    very similar on observed variables like race,
    gender, age, etc.
  • Block designs: divide subjects into blocks with
    similar observed variables before dividing them
    into treatment vs. control
  • Special case: Matched Pairs
  • Subjects are matched up into pairs, then one
    member of each pair gets treatment and the
    other gets control
  • Example: Dandruff experiment
  • Treatment applied to one side and control
    to the other side of the head
  • No reason to expect a difference
    between sides except for treatment

23
Another solution: Randomization
  • Problem with matching is that you cannot usually
    match on unobserved characteristics (e.g.,
    genetics)
  • E.g., cholesterol drug trial: can't match
    treatment and control groups on genetic
    predisposition for high cholesterol
  • Randomly assign subjects to treatment or control
  • Random assignment should lead to groups that are
    similar or balanced on both observed and
    unobserved confounding variables
  • Example: student questionnaire earlier in class;
    each form you filled out was randomly assigned
    either a 1 or 2
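
A minimal sketch of simple random assignment, in the spirit of the questionnaire example. The function name randomize, the 20 subject ids, the seed, and the choice of two equal-sized groups are all assumptions made for illustration; the slides do not say how the forms were actually numbered.

```python
import random

def randomize(subject_ids, seed=None):
    """Randomly split subjects into two equal-sized groups labelled 1 and 2.

    Hypothetical helper: equal group sizes are an assumption of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list gets group 1, the rest group 2.
    return {sid: (1 if i < half else 2) for i, sid in enumerate(shuffled)}

# 20 hypothetical questionnaire forms, numbered 1..20.
assignment = randomize(range(1, 21), seed=111)
group_sizes = {g: sum(1 for v in assignment.values() if v == g) for g in (1, 2)}
print(group_sizes)
```

Because the group label depends only on the shuffle, it cannot be related to any characteristic of the subject, observed or unobserved, except by chance.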

24
Randomization of In-Class Survey
  • Check to see if groups are balanced
  • There are differences, but are they
    significant?
  • Later on in the course, we will be able to answer
    questions like this
  • Of course, we can't check the balance for
    unobserved variables; we just have to trust the
    randomization process
  • This is why good science needs to be replicable

25
Even Better: Randomization and Matching
  • Randomization generally leads to treatment and
    control groups that are evenly balanced, but you
    can still get unlucky and get unbalanced groups
  • Example: randomly placing 20 people (10 males, 10
    females) into treatment and control groups.
  • How many males will end up in treatment group?
  • Ideally, we would have 5 males in treatment
    group, and 5 males in control group (balanced)
  • However, there is a chance of getting 9 males in
    treatment and 1 male in control group (unbalanced)
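
The chance of an unlucky split in the example above can be worked out directly. This sketch assumes the 20 people are divided into a treatment group of exactly 10 and a control group of 10 (the slide does not state the group sizes), in which case the number of males in treatment follows a hypergeometric distribution. The function name is hypothetical.

```python
from math import comb

def prob_males_in_treatment(k, males=10, females=10, treatment_size=10):
    """P(exactly k males land in the treatment group) under simple random
    assignment, assuming a fixed treatment group size (hypergeometric)."""
    total = males + females
    ways_with_k_males = comb(males, k) * comb(females, treatment_size - k)
    return ways_with_k_males / comb(total, treatment_size)

p_balanced = prob_males_in_treatment(5)   # 5 males in each group
p_nine = prob_males_in_treatment(9)       # 9 males in treatment, 1 in control
p_all = sum(prob_males_in_treatment(k) for k in range(11))

print(f"P(5 males in treatment) = {p_balanced:.4f}")
print(f"P(9 males in treatment) = {p_nine:.6f}")
```

Under these assumptions a perfectly balanced split happens only about a third of the time, while the 9-versus-1 split is rare (on the order of 1 in 2000) but not impossible.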

26
Even Better: Randomization and Matching
  • Randomized Blocks: randomize within blocks of
    observed variables
  • Example:
  • Divide up subjects into males and females first,
    then randomly assign treatment or control to
    subjects in each group separately
  • Guarantees that an equal number of males end up
    in treatment group and control group (same with
    females)
  • Randomized Matched Pairs: randomly decide which
    member of each pair gets treatment vs. control
  • Example:
  • For each head in dandruff experiment, randomly
    assign which side of head gets dandruff shampoo
    vs. control
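
A minimal sketch of the randomized block idea, assuming 10 males and 10 females as in the earlier example. The function randomized_blocks, the subject ids, and the "M"/"F" labels are hypothetical names chosen for illustration.

```python
import random

def randomized_blocks(subjects, block_of, seed=None):
    """Assign treatment/control at random, separately within each block.

    subjects: list of subject ids
    block_of: dict mapping subject id -> block label (e.g. "M" or "F")
    """
    rng = random.Random(seed)
    # Group subjects by block.
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of[s], []).append(s)
    # Shuffle each block and split it in half.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, s in enumerate(members):
            assignment[s] = "treatment" if i < half else "control"
    return assignment

subjects = list(range(20))
block_of = {s: ("M" if s < 10 else "F") for s in subjects}  # hypothetical data
assignment = randomized_blocks(subjects, block_of, seed=111)

males_treated = sum(
    1 for s in subjects if block_of[s] == "M" and assignment[s] == "treatment"
)
females_treated = sum(
    1 for s in subjects if block_of[s] == "F" and assignment[s] == "treatment"
)
print(males_treated, females_treated)  # 5 5
```

Whatever the seed, each block is split in half, so the 9-versus-1 imbalance on gender cannot occur; randomization still handles the unobserved variables within each block.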

27
Experiments vs. Observational Studies
  • Often, we want the causal effect of some
    treatment, but our data are from an observational
    study
  • Observational studies examine effects of some
    variable but without the advantages of a
    controlled experiment
  • No treatment is applied in observational studies
  • Example: health effects of smoking
  • Unethical to randomly impose a treatment
  • Could there be some confounding variable that
    explains health differences between smokers and
    non-smokers?
  • Very risky to make causal statements from
    observational data, since we cannot avoid bias!

28
Health Effects of Chocolate
  • Report to European Society of Sexual Medicine
  • 153 Italian women filled out sexual function
    questionnaires
  • Intriguing correlation: sexual function/desire
    significantly greater among chocolate-eaters
  • Observational study: association does not imply
    causation!
  • Confounding: average age is 35 among frequent
    chocolate-eaters, compared with 40.4 in
    non-chocolate group
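
The following simulation is a sketch of how a confounder such as age can produce an association with no causal effect behind it. All numbers are invented for illustration and are not the study's data: the age range, the probabilities of eating chocolate, and the score formula are assumptions, and chocolate is given zero effect by construction.

```python
import random
from statistics import mean

rng = random.Random(111)
rows = []
for _ in range(20000):
    age = rng.randint(20, 60)
    # Assumption: younger people are more likely to eat chocolate.
    eats_chocolate = rng.random() < (0.8 if age < 40 else 0.2)
    # Assumption: the score depends on age only; chocolate has NO effect.
    score = 100 - age + rng.gauss(0, 5)
    rows.append((age, eats_chocolate, score))

def diff_in_means(data):
    """Mean score of chocolate-eaters minus mean score of non-eaters."""
    eaters = [s for _, c, s in data if c]
    others = [s for _, c, s in data if not c]
    return mean(eaters) - mean(others)

# Naive comparison, ignoring age.
raw_diff = diff_in_means(rows)

# Compare eaters and non-eaters of the same age, then average over ages.
by_age = {}
for row in rows:
    by_age.setdefault(row[0], []).append(row)
within_age = [
    diff_in_means(group)
    for group in by_age.values()
    # Skip any age where one of the two groups happens to be empty.
    if any(c for _, c, _ in group) and not all(c for _, c, _ in group)
]
adjusted_diff = mean(within_age)

print(f"raw difference:      {raw_diff:.2f}")
print(f"same-age difference: {adjusted_diff:.2f}")
```

In this simulated setting the raw comparison shows a large gap in favor of chocolate-eaters, while the same-age comparison is close to zero, which is what "the difference is due to the confounder, not the treatment" looks like in data.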

29
Next Class - Lecture 2
  • Collecting Data
  • Surveys and Sampling
  • Graphical summaries of a single variable
  • Moore, McCabe and Craig Sections 3.3 and 1.1