Exploring Group Differences - PowerPoint PPT Presentation

Provided by: DelSi9
1
Exploring Group Differences
2
Before Break
  • 1) Descriptive Statistics
  • Measures of central tendency
  • Measures of variability
  • Z-scores
  • 2) Understanding statistical significance
  • Hypothesis testing
  • Alpha and p-values
  • 3) Testing for relationships/associations between
    variables
  • Correlation (Pearson's r)
  • Simple regression
  • Multiple regression

3
After Break
  • 1) Testing for Group differences
  • T-tests
  • ANOVA
  • 2) Understanding statistical significance
  • Effect Size
  • Power
  • 3) Nonparametric statistics and other common
    tests
  • Chi-square
  • Logistic regression

If you have a firm grip on pre-break material
the second half of the course becomes much easier
(in my opinion)
4
Quick review
  • I have a dataset that contains information on
    fitness and academic performance in middle-school
    children
  • I want to know if fitness is related to academic
    success
  • I'm going to use PACER laps to quantify fitness
    and ISAT science scores to quantify academic
    success
  • I can answer this question in various ways;
    let's start with measures of association
    (correlations)
  • What would be my null and alternative hypotheses
    using a correlation?

5
Results
  • What is the relationship between aerobic fitness
    and science ISAT results?
  • p = 0.009, what does this mean?
  • Low chance of random sampling error
  • If there were no real relationship, we would see
    a correlation this strong, or stronger, only 9
    times out of 1000 due to random sampling error
    (due to chance)
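
As a quick refresher on what the correlation itself measures, here is a minimal sketch of Pearson's r computed by hand. The lap counts and science scores are made-up numbers, NOT the course dataset, and the function name `pearson_r` is just a label for this example. The null hypothesis for a correlation is that r = 0 in the population.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: shared variation of x and y, scaled by their individual variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for five children (illustration only)
pacer_laps = [10, 20, 30, 40, 50]
science_isat = [220, 240, 250, 240, 250]

r = pearson_r(pacer_laps, science_isat)
print(f"r = {r:.2f}")
```

A positive r here would mean that children with more laps tend to have higher science scores; whether that could be due to chance is what the p-value addresses.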

6
Association
  • Association (and prediction) statistics like
    correlation and regression are useful, but can be
    limited
  • The other half of statistical testing is
    centered around determining group differences
  • For example, we could ask our fitness/academics
    question a different way and use a different set
    of statistics
  • Also useful in experiments (treatment vs
    control), comparing genders (males vs females),
    etc

7
Example
  • Imagine I use PACER laps to split kids into two
    different groups
  • High Fitness (high number of laps)
  • Low Fitness (low number of laps)
  • NOTE: I took a continuous variable and made it
    into a categorical variable (nominal/ordinal)
  • Now I can ask the question a different way
  • What are my null and alternative hypotheses?
  • Remember, I believe that fitness is related to
    academic success

8
Example cont
  • H0: There is no difference in science scores
    between the high fitness and low fitness group
  • Notice, no difference would mean fitness has no
    effect
  • HA: There is a difference in science scores
    between the high fitness and low fitness group
  • A difference would indicate that fitness has some
    effect
  • This is simple enough: we know how to calculate
    and compare means in SPSS

9
High vs Low Fitness Mean
  • Conclusion? Should I reject the null
    hypothesis?
  • Wait, could this difference be due to random
    sampling error?

10
Need for new statistical test
  • Is this difference due to random sampling error?
  • Due to the effect of random sampling, the two
    groups will NEVER have the exact same science
    scores
  • I need a way to determine if this difference is
    REAL or due to RSE
  • I need to use a statistical test that can
    determine group differences and provide me with a
    p-value

11
T-test
  • T-tests are a family of statistical tests
    designed to determine if differences exist
    between two groups (and ONLY two groups)
  • Based on t-scores (which are very similar to
    z-scores), which should tip you off that they
    are based on mean and SD
  • They test for equality of means
  • If the two group means are equal then there is
    no difference
  • 3 major types of t-tests
  • One sample t-test, independent samples t-test,
    paired-samples t-test

12
T-tests
  • One-sample t-test
  • Compares mean of a single sample to known
    population mean
  • e.g., a group of 100 people took an IQ test; are
    they different from the population average? Do
    they have above average IQ?
  • Independent samples t-test
  • Compares the mean scores of two different groups
    of subjects
  • e.g., are science scores different between high
    fitness and low fitness
  • Paired-samples t-test
  • Compares the mean scores for the same group of
    subjects on two different occasions
  • e.g., is the group different before and after a
    treatment?
  • Also called a dependent t-test or a repeated
    measures t-test
  • In all cases TWO group means are being compared

13
Independent Samples T-Test
  • Let's start here, since we need to use this test
    for our fitness/science question
  • Independent Samples T-tests
  • Used with a two-level, categorical, independent
    variable (High/Low Fitness), ONLY two groups
  • and with one continuous dependent variable
    (science ISAT scores)
  • Statistical assumptions: 1) data are normally
    distributed, 2) samples represent the population,
    3) the variances of the two groups are similar
    (homogeneity of variance, or homoscedasticity)

NOTE: Same as correlation/regression except we
no longer have to worry about a linear
relationship since one of our variables is
categorical (high/low fitness)
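
To make the idea of "testing for equality of means" concrete, here is a minimal sketch of the pooled-variance (equal variances assumed) independent samples t statistic. SPSS does this for you; the function name `independent_t` and the two small groups of scores are made up for illustration and are not the course data.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups.

    Returns (t, df, mean_difference). Assumes the two groups have
    similar variances, as the assumptions above require.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    mean_difference = mean(group_a) - mean(group_b)
    return mean_difference / standard_error, df, mean_difference

# Hypothetical science scores (illustration only)
high_fitness = [240, 235, 250, 245, 230]
low_fitness = [225, 230, 220, 235, 215]

t, df, diff = independent_t(high_fitness, low_fitness)
print(f"t({df}) = {t:.3f}, mean difference = {diff}")
```

The t value is just the mean difference divided by its standard error, which is why it behaves so much like a z-score.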
14
SPSS Data format
  • In SPSS, the science scores are my continuous,
    dependent variable
  • I created the high and low fitness groups
    based on how many PACER laps each child completed
  • When I created them, I coded high fitness as 0
    and low fitness as 1
  • You need to recognize how your data are coded for
    a t-test
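
The coding step can be sketched as follows. The 0/1 codes match the slide (high fitness = 0, low fitness = 1), but the cutoff of 30 laps and the lap counts are hypothetical; the slides do not say where the split was made.

```python
HIGH_FITNESS, LOW_FITNESS = 0, 1  # codes used in the slides

def code_fitness(pacer_laps, cutoff):
    """Turn a continuous lap count into a two-level grouping code."""
    return HIGH_FITNESS if pacer_laps >= cutoff else LOW_FITNESS

CUTOFF = 30                      # hypothetical cutoff, not from the slides
laps = [12, 45, 30, 8, 51]       # made-up lap counts
groups = [code_fitness(x, CUTOFF) for x in laps]
print(groups)  # [1, 0, 0, 1, 0]
```

Whatever codes you pick, those are the two values you will have to give SPSS when it asks which groups to compare.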

17
SPSS T-test
  • Move dependent variable to Test Variable
  • Move your independent variable to Grouping
    Variable
  • Notice, it now has 2 question marks
  • SPSS needs to know which groups to compare
  • Define Groups

18
SPSS T-test
  • Recall, high fitness was 0, low fitness was 1
  • Manually enter these values into the box
  • When done, click Continue, then OK

19
T-test results
  • The first box will contain what you've already
    seen: the means of the two groups
  • Notice, n, mean, standard deviation (ignore SE)
    for each group
  • The next box is too big for one screen, so I've
    split it into two pieces

20
SPSS results - Output
  • Recall, both groups need to have equal variance
    (homogeneity of variance, or homoscedasticity)
  • SPSS tests for this using Levene's Test
  • Null hypothesis: There is equal variance
  • This means you do NOT want a p-value < 0.05

21
SPSS results - Output
  • If this Levene's Test p-value is > 0.05
  • Equal variances exist, use the top line of the
    table
  • If this Levene's Test p-value is ≤ 0.05
  • Equal variance does not exist, use the bottom
    line
  • Becomes harder to find a statistically
    significant result
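
The decision rule above is simple enough to write down as a sketch. The function name `row_to_read` and the two example p-values are made up; the returned strings are the row labels in the SPSS independent samples output.

```python
def row_to_read(levene_p, alpha=0.05):
    """Pick which line of the SPSS t-test table to read, based on Levene's Test."""
    if levene_p > alpha:
        return "Equal variances assumed"      # top line
    return "Equal variances not assumed"      # bottom line

print(row_to_read(0.40))  # Equal variances assumed
print(row_to_read(0.03))  # Equal variances not assumed
```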

22
df: Degrees of Freedom
  • The table also shows df, or degrees of freedom
  • df is used to calculate the t-score and p-value
    for the t-test
  • df = n - 1
  • For each group you have, subtract 1 from the
    sample
  • We have two groups
  • High Fitness, n = 176, so 176 - 1 = 175
  • Low Fitness, n = 98, so 98 - 1 = 97
  • Total n = 176 + 98 = 274, we have 2 groups, so
  • Total df = 274 - 2 = 272
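
The same arithmetic as a short sketch; the function name `pooled_df` is made up for this example.

```python
def pooled_df(*group_sizes):
    """Degrees of freedom: subtract 1 from each group's sample size, then add up."""
    return sum(n - 1 for n in group_sizes)

high_fitness_n, low_fitness_n = 176, 98
print(pooled_df(high_fitness_n, low_fitness_n))  # (176 - 1) + (98 - 1) = 272
```

Written this way it is easy to see that adding subjects raises df, while every additional group costs one.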

23
Degrees of Freedom
  • df is important to understand if you are
    calculating the p-values by hand (we are NOT)
  • All you need to know now is that
  • Larger sample size = larger df
  • More groups = smaller df
  • You want large df because it reduces your chance
    of random sampling error (a large sample) and
    increases the chance you'll find statistically
    significant results
  • This becomes more important beyond t-tests, since
    we can have several groups (not just 2)

24
df in our example
  • Notice, the df in our example is 272 (274
    subjects minus our two groups, high and low
    fitness)
  • If you do not have equal variances, SPSS
    downgrades your df, making it more difficult to
    find statistically significant results
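
For the curious, the usual way df is reduced when variances are unequal is the Welch-Satterthwaite approximation, which as far as I know is what lies behind the "equal variances not assumed" line. The sketch below assumes that formula; the standard deviations are hypothetical, since the slides do not report them.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximate df for two groups with unequal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Group sizes from the example; the SDs of 10 and 20 are made up
adjusted = welch_df(10, 176, 20, 98)
print(f"Adjusted df = {adjusted:.1f} (versus 272 with equal variances)")
```

With very different spreads in the two groups the adjusted df drops well below 272, which is why a significant result becomes harder to find.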

25
Before we move on
  • Questions about equality of variance test? df?
  • Remember, we're trying to determine if the
    difference between the two groups is real or
    due to RSE
  • What we know so far
  • And, the two groups do have equal variance

26
More results
  • Here is the important stuff (remember, using top
    line)
  • Our two groups (high/low fitness) had a mean
    difference of 12.2 on the science ISAT
  • 239.1 - 226.9 = 12.2
  • This difference is statistically significant, p
    = 0.001

27
Decisions
Questions about t-test results?
  • H0: There is no difference in science scores
    between the high fitness and low fitness group
  • HA: There is a difference in science scores
    between the high fitness and low fitness group
  • Decision?
  • Results: The high fitness group scored higher
    than the low fitness group on their science ISAT
    test by 12.2 points. This difference was
    statistically significant, t(272) = 3.262, p =
    0.001.
  • Usually report the t value of the test and the
    degrees of freedom in the paper (from table)
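
As a rough check on the reported numbers, the sketch below rebuilds the mean difference and approximates the p-value from the t value. It uses a normal approximation to the t distribution, which is an assumption that is only reasonable because df is large (272); SPSS uses the exact t distribution. The function name `approx_two_sided_p` is made up.

```python
from statistics import NormalDist

def approx_two_sided_p(t_value):
    """Two-sided p-value using a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t_value)))

mean_high, mean_low = 239.1, 226.9            # group means from the slides
mean_difference = round(mean_high - mean_low, 1)

t_value, df = 3.262, 272                      # from the SPSS table
p_approx = approx_two_sided_p(t_value)

report = f"t({df}) = {t_value}, p = {p_approx:.3f}"
print(mean_difference)
print(report)
```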

28
One more thing
  • Notice in the t-test table that we also were
    provided with a 95% confidence interval
  • 95% confidence intervals are a statistic
    available from most tests, and are related to
    p-values.
  • Lower bound = 4.8, upper bound = 19.5

29
95% confidence intervals
  • Confidence intervals are similar to p-values
  • Remember, p-values indicate probability of random
    sampling error
  • We want low p-values, which indicate a low
    probability of random sampling error
  • We most often use a p-value cutoff of 0.05,
    meaning we like to be 0.95 (or 95% confident)
    that this was NOT due to random sampling error
  • Confidence intervals give you a similar type of
    information, but in a more practical sense
  • Many people prefer confidence intervals over
    p-values

30
95% confidence intervals
  • Remember, in statistics we are using samples to
    try and figure out information about the
    population
  • When we calculate a mean for a sample, we are
    really trying to understand what the REAL
    population mean is
  • But, due to random sampling error, we know that
    our sample mean will almost always be different
    from the real population mean
  • Example: mean IQ score for all 7 billion humans
    is 100
  • Sample 1 (of 100 humans) = 102.1, Sample 2 =
    105.3, Sample 3 = 98.2, etc
  • Random sampling error

31
IQ
Pretend we keep on drawing more and more samples
until we have 100 different samples and 100
different lines on this chart
If we did that, would there be a pattern to where
the lines were drawn?
Sample 1: Mean = 102.1
Sample 2: Mean = 105.3
Sample 3: Mean = 98.2
Would ALL the lines be so close to the population
mean of 100?
[Figure: distribution of IQ scores from the entire population (mean = 100, SD = 15; x-axis from 55 to 145), with lines 1, 2, and 3 marking the three sample means]
32
However, a 95% confidence interval would tell you
where 95 of the 100 lines fell
Not all samples will be close to 100, just due to
random sampling error
[Figure: 95% confidence interval marked on the distribution of IQ scores from the entire population (x-axis from 55 to 145)]
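
The thought experiment of drawing 100 samples can be simulated. This is only a sketch: it assumes IQ scores are normally distributed with mean 100 and SD 15 (as on the chart), uses 100 people per sample as in the earlier example, and fixes an arbitrary random seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean

random.seed(7)  # arbitrary seed so the sketch gives the same result each run

POP_MEAN, POP_SD = 100, 15   # population values from the chart
N_PER_SAMPLE = 100           # people in each sample
N_SAMPLES = 100              # number of "lines on the chart"

# Draw 100 samples and record each sample's mean
sample_means = [
    mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_SAMPLE)])
    for _ in range(N_SAMPLES)
]

# Range expected to hold about 95% of sample means
standard_error = POP_SD / sqrt(N_PER_SAMPLE)
low, high = POP_MEAN - 1.96 * standard_error, POP_MEAN + 1.96 * standard_error
inside = sum(low <= m <= high for m in sample_means)

print(f"{inside} of {N_SAMPLES} sample means fell between {low:.1f} and {high:.1f}")
```

Most of the lines land close to 100, but a handful fall outside the range purely because of random sampling error.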
33
But notice, the more confident we want to be,
the wider the gap gets
Could also make a 99% confidence interval if we
wanted to
Usually, people stick with a 95% confidence
interval (since we usually use a p-value cutoff
of 0.05)
[Figure: 99% confidence interval marked on the distribution of IQ scores from the entire population (x-axis from 55 to 145)]
34
In this example, a 95% confidence interval
indicates that we are 95% certain that the REAL
population mean falls between these two values
We can use a 95% confidence interval for
virtually any population parameter we want to,
such as a correlation coefficient, a regression
slope, or a mean difference between two groups
(like with our t-test)
[Figure: 95% confidence interval marked on the distribution of IQ scores from the entire population (x-axis from 55 to 145)]
35
Back to our t-test
  • Our 95% confidence interval
  • Notice it says, "Interval of the Difference"
  • We are 95% certain that the real difference is AT
    LEAST as big as 4.8 and might be up to 19.5
    points between our High and Low Fitness Groups
  • Our confidence level can never be 100%, so there
    is always a chance the real population difference
    is outside of this range (just like p can never
    be 0)
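
The interval can be roughly rebuilt from numbers already in the table. This sketch recovers the standard error from the mean difference and t value, then applies a critical t of about 1.969, which is my assumed two-sided 95% value for df = 272 and does not come from the slides. Because the inputs are rounded, the result only approximately matches the 4.8 to 19.5 that SPSS reports.

```python
mean_difference = 12.2   # from the SPSS table
t_value = 3.262          # from the SPSS table
T_CRITICAL = 1.969       # assumption: approximate 95% critical t for df = 272

# t = mean difference / standard error, so work backwards to the SE
standard_error = mean_difference / t_value
margin = T_CRITICAL * standard_error

lower = mean_difference - margin
upper = mean_difference + margin
print(f"95% CI of the difference: about {lower:.1f} to {upper:.1f}")
```

The margin on either side of the mean difference shrinks as the standard error shrinks, which is what a larger sample buys you.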

36
Confidence Intervals and p-values
  • These two values are connected because:
  • Both are related to RSE
  • Both are calculated using n (and df)
  • For the same mean difference, a low p-value (low
    chance of random sampling error) goes with a
    smaller (more narrow) confidence interval; we
    can be more confident
  • A larger p-value goes with a wider confidence
    interval; we are less confident

Questions on confidence intervals?
37
One more example t-test
  • Instead of using aerobic fitness, let's use
    flexibility
  • I split my sample into High Flexibility and Low
    Flexibility groups (based on sit and reach test)
  • Now, I'll run a t-test to see if the High
    Flexibility kids score higher on their science
    tests than the Low Flexibility kids

38
What are my hypotheses?
  • H0: There is no difference in science scores
    between the high flexibility and low flexibility
    group
  • HA: There is a difference in science scores
    between the high flexibility and low flexibility
    group

39
T-test results: Flexibility
  • We can see that the high flexibility group has a
    higher Science ISAT score by about 5, but is this
    difference statistically significant?

40
T-test results: Flexibility
  • Levene's Test p = 0.521
  • What does this mean?
  • df = 285
  • What is our sample size?

41
T-test results
  • Notice the mean difference (difference between
    High/Low groups) and the 95% confidence interval
  • I have intentionally removed the p-value for this
    t-test
  • Is there a statistically significant difference
    between the two groups? Is the p-value ≤ 0.05?

42
T-test results
  • Remember, to reject the null hypothesis we have
    to be reasonably certain that the two groups are
    different
  • If this was the case, the difference between the
    two groups could NOT be 0
  • If the mean difference is 0, the two groups are
    identical
  • When the 95% confidence interval INCLUDES 0, we
    can't be 95% certain that there is a group
    difference and therefore, p is > 0.05

43
T-test results
  • Our 95% confidence interval includes 0 (one
    number is negative and the other is positive)
  • Therefore, we can't be 95% certain the real group
    difference is NOT 0
  • p = 0.311, we can't be sure this is not due to RSE

44
95% CI and p
  • If your 95% CI includes 0, your p-value will NOT
    be less than or equal to 0.05
  • Because both statistics are evaluating the chance
    of RSE
  • If your 95% CI does not include 0 (both numbers
    are positive or both are negative), then we can
    be confident that the two groups are not the same
  • This means that p ≤ 0.05
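
The rule can be written as a one-line check. The function name `ci_excludes_zero` is made up. The first interval is the fitness result from the earlier slides; the second is a hypothetical interval that straddles 0, used only to show the other case.

```python
def ci_excludes_zero(lower, upper):
    """True when both bounds share a sign, i.e. the 95% CI does not include 0."""
    return lower > 0 or upper < 0

# Fitness example from the slides: both bounds positive
print(ci_excludes_zero(4.8, 19.5))    # True, so p is below the 0.05 cutoff

# Hypothetical interval with one negative and one positive bound
print(ci_excludes_zero(-4.7, 14.7))   # False, so p is above 0.05
```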

Questions?
45
Upcoming
  • In-class activity
  • Homework
  • Cronk: re-read 6.1, complete 6.3 (skip 6.2 for
    now)
  • Holcomb: Exercises 35, 37, 38, 39
  • More t-tests next week
  • Single sample t-test
  • Paired samples t-test (repeated measures t-test)

46
Example: In-Class, 10 minutes
  • Go to Blackboard and open the SPSS dataset
  • Fitness and Academics Reduced (week 7)
  • Run two different independent samples t-tests
  • Determine if kids who are aerobically fit (using
    PACER) score higher in reading or math than kids
    who have low fitness
  • Write down your results in this format (x2)
  • t = XX, df = XX, p = XX
  • Mean difference = XX, 95% CI (XX to XX)

47
Results of two t-tests
48
Results of two t-tests
  • Reading (equal variances NOT assumed)
  • t = 4.856, df = 411.2, p < 0.0005
  • Mean difference = 9.8, 95% CI (5.9 to 13.8)
  • Math (equal variances assumed)
  • t = 4.021, df = 837, p < 0.0005
  • Mean difference = 8.8, 95% CI (4.5 to 13.0)