Title: Research methods and statistics in Cognitive Science by Annette Hohenberger
1 Research methods and statistics in Cognitive Science by Annette Hohenberger
- Recapitulating/Introducing basics in Statistics
- Chapter 1: Everything you ever wanted to know about statistics
- Andy Field (2005). Discovering Statistics Using SPSS. Second Edition. London: Sage.
- Disclaimer: All pictures and equations are taken from chapter 1 (2005) and chapter 2 (2000).
2 Building statistical models
In architecture:
1. There is a real-world phenomenon, e.g., a physical bridge, which we want to construct.
2. We construct a model of the bridge (see 3 models in Fig. 1).
3. We compare the model(s) to the real world. How good are the models?
4. We construct the real bridge accordingly.
Field (2000)
3 In CogSci, how do we proceed?
1. We don't have access to a real-world phenomenon in its entirety, e.g., to the distribution of intelligence in humankind.
2. Instead, we collect data (administer an intelligence test) from the real world (draw a sample) on the phenomenon we are interested in, e.g., intelligence.
3. We construct a model of the data, e.g., analyze the mean IQ, SD, factor analysis, etc.
4. We draw conclusions about the real-world phenomenon: how intelligence is distributed in the population.
4 Population vs. Sample
- Sample
- Subset of the population (as in a micro-census)
- Should be representative of the population (for valid inferences)
- Fully observable
- Can be measured repeatedly
- Various samples can be drawn
- Population
- Entire set of people, animals, or objects
- Usually not observable in its entirety
- Attempted in a macro-census
5 Statistical values of a sample - The mean -
- Example: We draw a sample of 5 statistics teachers and determine how many friends they have
- (1 + 2 + 3 + 3 + 4)/5 = 2.6
- The mean number of friends is 2.6
- The mean is a statistical model of the population. It is also sometimes written as x̄
- Total error = sum of deviations
- (-1.6) + (-0.6) + (0.4) + (0.4) + (1.4) = 0
6 Mean and total error
[Figure: deviations of the observations from the mean]
7 Sum of squared errors (SS)
- Errors are squared to get rid of the direction of the error (whether + or -)
- The sum of the squared errors: SS = 5.20
- Problem: SS depends on the number of subjects
8 Variance s² = SS/(n-1)
- We divide the sum of the squared errors by n-1 = 4
- It's n-1 because of the degrees of freedom
- If you have 5 observations and the mean is fixed, you can choose 4 of them freely; the last one, however, is determined
9 Standard deviation s = square root of the variance s²
s = √1.3 = 1.14
- The standard deviation is abbreviated as s, σ (sigma), or SD
10 Same mean, different SDs
- Although the mean of two samples is the same, their SDs may not be the same
11 SE: The SD of sample means around the population mean
[Figure labels: population mean; individual sample mean]
- (the SE is a 2nd-order SD)
- The SE is the SD of the sample means around the population mean. If we draw multiple samples, the SE measures the variability between the means of these different samples
12 SE: The SD of the Mean
- If we collected many, many samples, their means would form a normal distribution (see diagram)
- The SD of this distribution is the SE
- Most samples would have values around the mean, e.g., 2.6 friends
- Few samples would have very low values, e.g., only 1 friend
- Few samples would have very high values, e.g., 5 friends
http://helios.bto.ed.ac.uk/bto/statistics/tress3.html
13 Calculating the SE
- SE = σ / √n
- The SE is the SD (σ) divided by the square root of the sample size (n)
14 Summary - Example
- Sample: 5 lecturers have 1, 2, 3, 3, and 4 friends
- Mean: x̄ = (1 + 2 + 3 + 3 + 4)/5 = 2.6
- Sum of squared errors: SS = Σ (xi - x̄)²
- = (-1.6)² + (-0.6)² + (0.4)² + (0.4)² + (1.4)²
- = 2.56 + 0.36 + 0.16 + 0.16 + 1.96 = 5.2
- Variance: s² = SS/(n-1) = 5.2/4 = 1.3
- Standard deviation: s = SD = √s² = √1.3 = 1.14
- Standard error: SE = SD/√n = 1.14/√5 = 1.14/2.24 = 0.51
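The summary calculation can be reproduced in a few lines of Python. This is a minimal sketch that is not part of the original slides; the variable names are chosen here for illustration only.

```python
import math

# Number of friends reported by the 5 lecturers in the example.
friends = [1, 2, 3, 3, 4]
n = len(friends)

mean = sum(friends) / n                        # arithmetic mean
ss = sum((x - mean) ** 2 for x in friends)     # sum of squared errors (SS)
variance = ss / (n - 1)                        # SS divided by the degrees of freedom
sd = math.sqrt(variance)                       # standard deviation
se = sd / math.sqrt(n)                         # standard error of the mean

print(f"mean={mean:.2f} SS={ss:.2f} variance={variance:.2f} SD={sd:.2f} SE={se:.2f}")
```

Each line mirrors one step of the summary: deviations from the mean are squared and summed, divided by n-1, and the square root of the result is then divided by √n.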
15 Frequency Distributions
http://www.geocities.com/ResearchTriangle/System/3737/statdist.html
16 The normal distribution / bell-shaped curve
- The 'normal distribution', made popular by Carl Friedrich Gauss, as depicted on the old 10-DM bill (DM = 'Deutsche Mark', Germany's currency before the euro)
17 The ubiquity of the normal distribution
- Many features in humans are normally distributed
18 Examples for normal distributions in genetics: body height
http://www.micro.utexas.edu/courses/levin/bio304/genetics/genetics.html
19 Examples for normal distributions in genetics: skin colour
http://www.micro.utexas.edu/courses/levin/bio304/genetics/genetics.html
20 Examples for normal distributions in genetics: intelligence
http://www.micro.utexas.edu/courses/levin/bio304/genetics/genetics.html
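21 Skewed distributions: left-skewed vs. right-skewed
[Figure: two skewed distributions; left-skewed with the mean below the mode, right-skewed with the mode below the mean]
- In skewed distributions, the mean is not a good measure of the central value because it is pulled towards the long tail. The mode or the median is better suited.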
22 Alternative average values
- Mean: the arithmetic mean
- 1 1 1 1 1 2 3 3 4 5 9: mean = 2.8
- Median: the value in the middle of an ordered series of values, e.g.
- 1 1 1 1 1 2 3 3 4 5 9: median = 2
- Mode: the most frequent value of a series, e.g.
- 1 1 1 1 1 2 3 3 4 5 9: mode = 1
- Note: The median and the mode characterize non-normal distributions better than the mean.
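As a quick check, the three averages for this series can be computed with Python's standard `statistics` module. This is a small sketch added for illustration, not part of the original slides.

```python
import statistics

values = [1, 1, 1, 1, 1, 2, 3, 3, 4, 5, 9]

mean = statistics.mean(values)      # sum of the values (31) divided by their count (11)
median = statistics.median(values)  # middle (6th) value of the 11 sorted values
mode = statistics.mode(values)      # most frequent value

print(round(mean, 1), median, mode)
```

The single large value 9 pulls the mean upwards, while the median and the mode stay with the bulk of the data.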
23 Confidence Intervals / Significance levels / Fisher's criterion
- Confidence intervals set boundaries within which the true value of the population is expected to fall.
- The sample mean is the midpoint, and there is a lower and an upper boundary (two-tailed test), or there is just a lower or an upper boundary (one-tailed test).
- We can choose the probability with which the population mean should fall within the confidence interval, usually 95% (α = 0.05, i.e., 5% error chance) or 99% (α = 0.01, i.e., 1% error chance).
- When the statistical test (e.g., t-test, F-test) yields a value that is significant at α = 0.05 or α = 0.01, we can say that our finding is significant with an error probability of only 5% or 1%.
24 Confidence Intervals, two-tailed
- Two-tailed testing
- With a chosen α of 0.05, 2.5% is cut off on each side of the distribution (2.5% + 2.5% = 5%). The true value is expected to fall within these two boundaries.
- If we have no expectation about the presumed direction of the effect, we make a two-tailed test.
[Figure: normal distribution with 2.5% cut off in each tail]
In a normal distribution, 95% of the distribution falls within the z-values from -1.96 to 1.96 (to be looked up in Appendix A1). Z-values are standardized values in a standard normal distribution with mean = 0 and SD = 1.
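Instead of looking the z-values up in a table, they can be computed. The sketch below uses `NormalDist` from Python's standard library (Python 3.8 or later); it is an illustration added here, not part of the original slides.

```python
from statistics import NormalDist

alpha = 0.05                 # chosen error probability
std_normal = NormalDist()    # standard normal distribution: mean 0, SD 1

# Two-tailed: alpha is split evenly, 2.5% in each tail.
z_upper = std_normal.inv_cdf(1 - alpha / 2)
z_lower = -z_upper

# Share of the distribution lying between the two boundaries.
coverage = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"z from {z_lower:.2f} to {z_upper:.2f} covers {coverage:.0%}")
```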
25 Confidence interval, one-tailed
- One-tailed testing
- With a chosen α of 0.05, the entire error probability is placed on one side.
- If we have a clear expectation about the presumed direction of the effect (treatment group should have smaller or bigger values than the control group), we make a one-tailed test.
[Figure: normal distribution with 5% cut off in one tail]
Q: Is it easier to get an effect statistically significant for a chosen α in a two-tailed or in a one-tailed test?
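One way to approach the question is to compare the z cut-offs for both kinds of test. The following sketch assumes a standard normal test statistic and an error probability of 0.05; it is an added illustration, not part of the original slides.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
z_one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail

print(f"two-tailed cut-off: {z_two_tailed:.3f}")
print(f"one-tailed cut-off: {z_one_tailed:.3f}")
```

The one-tailed cut-off is the smaller of the two, so an effect in the predicted direction reaches significance more easily in a one-tailed test. An effect in the opposite direction, however, cannot become significant at all.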
26 Calculating boundaries of confidence intervals
- The equation for calculating the lower and upper boundary of the 95% confidence interval for any given (normal) distribution is
- lower boundary = mean - (1.96 x SE)
- upper boundary = mean + (1.96 x SE)
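Applied to the friends example from the earlier slides (1, 2, 3, 3, and 4 friends), the two boundaries can be computed as follows. This is a minimal sketch for illustration. It uses the z-value 1.96 as on the slide, although for a sample as small as n = 5 a t-value would normally be used instead.

```python
import math

friends = [1, 2, 3, 3, 4]
n = len(friends)
mean = sum(friends) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in friends) / (n - 1))
se = sd / math.sqrt(n)

z = 1.96  # two-tailed z-value for a 95% interval
lower = mean - z * se
upper = mean + z * se

print(f"95% CI: {lower:.2f} to {upper:.2f}")
```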
27 Linear Models
- The standard model in statistics is the LINEAR MODEL.
- In a linear model, the data points are summarized by an ideal straight line (therefore 'linear'). The straight line expresses either a positive (upward line) or a negative (downward line) relation between two variables.
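A straight line can be fitted to data by least squares. The sketch below is an added illustration with invented data (the values of `x` and `y` are hypothetical and do not come from the slides); it estimates slope and intercept and reports the direction of the relation.

```python
# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates of slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

direction = "positive" if slope > 0 else "negative"
print(f"y = {intercept:.2f} + {slope:.2f} * x ({direction} relation)")
```

The data points do not lie exactly on the line; the line is the model, and the remaining deviations are its error.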
28 Linear vs. Non-linear models
29 Pros and Cons of the Linear Model
- Pro
- Easy, feasible, less powerful
- Most widespread in statistics
- Often a good approximation
- Con
- Linearity is the exception rather than the rule
- In nature, non-linearity is ubiquitous
- There are non-linear tools available!
- Good statistical practice: first plot your data and then decide on a fit, linear or non-linear
30 Is my model representative of the real world? - H1 vs H0 -
- In an experiment, we test the experimental hypothesis H1 against the null hypothesis H0. If the statistical test reaches significance (p < 0.05 or p < 0.01), we reject H0 and accept H1, i.e., we conclude that our statistical model also represents the real world.
31 What can we conclude from a significance test?
- 1. Is a significant effect automatically important or meaningful?
- --> NO, you can get any small effect significant if you take a big enough sample.
- 2. Does a non-significant result mean that H0 is right?
- --> NO, it just tells us that the effect is not big enough to be distinguishable from noise.
- --> Null-effects should not be interpreted at all.
- 3. Does a significant result mean that H0 is wrong?
- --> NO, only that the observed data would be very unlikely if H0 were true
- --> Statistics only allow us to draw probabilistic conclusions
32 Is my model representative of the real world? - variance -
- In a test statistic we confront two types of variance
- test statistic = V1 / V2
- V1 = variance explained by the model
- V2 = variance not explained by the model
- V1 is due to a genuine effect of the experimental condition (hopefully the one we have hypothesized)
- V2 is a mixture of unsystematic (error) variance and systematic variance other than V1.
- The more variance we can explain (V1) relative to V2, the better (larger) our test statistic.
33 Type 1 and Type 2 Errors
- Type 2, β-level
- We believe there is no effect while there is one
- Cohen suggests that the β-level of the Type 2 error be at most p = .2 (20%)
- Example: If we repeat our test 100 times in a population where a genuine effect exists, we would overlook this effect in about 20 of them.
- Type 1, α-level
- We believe there is a genuine effect while there isn't any
- The probability of this error is 5% if we accept the α-level of p = 0.05 (Fisher's criterion).
- Example: If we repeat our test 100 times in a population where no effect exists, about 5 tests will turn out to be significant (making us think there is an effect)
--> There is a trade-off between Type 1 and Type 2 errors: the higher the one, the lower the other.
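Both error rates can be made tangible with a small simulation. The sketch below is an added illustration under stated assumptions, none of which come from the slides: a two-tailed z-test with known SD of 1, a hypothetical sample size of 25, and a hypothetical true effect of 0.56 chosen so that the Type 2 rate comes out near Cohen's suggested .2.

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is reproducible

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
n = 25        # hypothetical sample size
runs = 2000   # number of simulated experiments

def significant(true_mean):
    """Draw one sample (SD known to be 1) and z-test its mean against 0."""
    sample = [random.gauss(true_mean, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    return abs(z) > z_crit

# H0 is true (no effect): every significant result is a Type 1 error.
type1_rate = sum(significant(0.0) for _ in range(runs)) / runs

# A genuine effect exists: every non-significant result is a Type 2 error.
type2_rate = sum(not significant(0.56) for _ in range(runs)) / runs

print(f"Type 1 rate: {type1_rate:.3f}, Type 2 rate: {type2_rate:.3f}")
```

Under these assumptions the simulated Type 1 rate should lie close to α, and the Type 2 rate close to .2. Lowering `alpha` raises `z_crit`, which makes genuine effects harder to detect and so increases the Type 2 rate.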
34 Effect size
- As a means to judge whether a given effect is indeed important, we can calculate the 'effect size' of the test. The effect size is a standardized and objective measure of the magnitude of an observed effect
- Cohen's d and Pearson's r are common measures
- Pearson's r is a correlation coefficient whose absolute value lies between 0 (no effect) and 1 (perfect effect).
- Example: r = .10 (small effect, explains 1% of the variance)
- r = .30 (medium effect, explains 9% of the variance)
- r = .50 (big effect, explains 25% of the variance)
- --> It is recommended that the effect size be reported in any publication
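The percentages of explained variance follow from squaring r. A short sketch, added for illustration (the dictionary name `benchmarks` is chosen here and is not from the slides):

```python
# Benchmarks for Pearson's r as listed on the slide.
benchmarks = {"small": 0.10, "medium": 0.30, "big": 0.50}

for label, r in benchmarks.items():
    explained = r ** 2  # proportion of variance explained
    print(f"{label}: r = {r:.2f} explains {explained:.0%} of the variance")
```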
35 Determining the effect size / Statistical power
- The ability of a test to detect an effect of a given size is called 'statistical power'.
- The probability of detecting an effect if one exists is 1-β (i.e., 1 minus the probability of overlooking the effect).
- When the recommended β-level is .2, then the statistical power of a test should be 1 - .2 = .8.
- --> The power of a statistical test should be .8, i.e., we want to have an 80% chance of detecting an effect if there is one in reality.
36 How to compute the statistical power of a test
- 1. (Calculate the power of our test --> to be done in later chapters, in concrete examples)
- 2. Estimate the sample size necessary for obtaining a desired level of power
- Number (n) of required subjects by effect size:
- n = 783 for r = .1
- n = 85 for r = .3
- n = 28 for r = .5
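Sample sizes of this kind can be approximated with Fisher's z-transformation of r. The sketch below is an added illustration, not the method used for the slide's table. It assumes a two-tailed test with α = .05 and power = .8 (the slide does not state these values), and the function name `required_n` is hypothetical. Being an approximation, it can differ from tabled values by a few subjects, especially for larger r.

```python
import math
from statistics import NormalDist

def required_n(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation r (two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off for the chosen alpha
    z_power = NormalDist().inv_cdf(power)          # z-value for the desired power
    fisher_z = math.atanh(r)                       # Fisher's z-transformation of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: n is about {required_n(r)}")
```

The smaller the effect, the more subjects are needed: the required n grows roughly with the inverse square of the effect size.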