Title: Outline
1Outline
- Test bias definitions
- The basic issue: group differences
- What causes group differences?
- Arguments that tests are not biased
- Differential item functioning analysis
- Criterion-related sources of bias
2Outline
- Other approaches to testing minority groups
- Chitling test
- BITCH test
- SOMPA
- Models of test bias
- Regression
- Constant Ratio
- Cole/Darlington
- Quota
3Test bias definition
- A test is biased if it gives a systematically
wrong result when used to predict something.
- So, an intelligence test would be biased if, for
example, it underestimated one group's
probability of success in a given endeavor.
4Test bias: the basic issue
- Various groups within society differ in their
average scores on some psychological tests.
- We don't know what causes these differences.
5What causes group differences?
- Genetics
- Socioeconomic factors
- Caste
- Culture
- Stereotype threat
6Arguments that tests are not biased
- Major tests have been subjected to impressive
scrutiny for decades
- Enormous resources are devoted to this purpose
- Criterion validity has been established very
securely for the major intelligence tests: they
do predict college and job performance
7Arguments that tests are not biased
- It is not appropriate to focus on individual
items on a test, which some critics of testing do
- Items should be drawn from a variety of domains,
not all of which will be familiar to any one person
8Arguments that tests are not biased
- Test developers evaluate tests on the basis of
overall patterns of prediction utility
- They're future-oriented, not past-oriented
- "How will you do in college or in a job?"
- Not "Have you had the opportunity to learn?"
9Arguments that tests are not biased
- Do you think of test score results as outcomes
or as information (predictors)?
- Test developers say results are the beginning,
not the end: they are information that will
guide us
- Opponents see test results as outcomes
10Arguments that tests are not biased
- Systematic studies have asked whether biased
items produce group differences on tests such as
Stanford-Binet and Wechsler tests
- These studies found no evidence that group
differences disappeared when allegedly biased
items were removed
11Arguments that tests are not biased
- Group differences are just as large on what is
considered the most culture-fair test, Raven's
Progressive Matrices, as on the WAIS
- IQ scores have the same utility for prediction
regardless of race or socio-economic status.
12Differential item functioning analysis
- In this approach to testing for bias, you first
form groups for comparison which are equated on
overall test score
- Implication: groups are equivalent in overall
ability
- Then, you look for differences between groups on
individual items
- Where a difference is found, you conclude that the
item is biased (since groups are not different on
ability)
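The procedure above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: test-takers are matched on exactly equal total scores, and an item is compared by its raw pass rate in each group. The record layout, the function names, and the data are all hypothetical, and real DIF analyses use more refined statistics than a raw difference in pass rates.

```python
from collections import defaultdict

def item_pass_rates(records, item):
    """Pass rate on one item for every (total score, group) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [number correct, number of people]
    for r in records:
        cell = (r["total"], r["group"])
        counts[cell][0] += r["items"][item]
        counts[cell][1] += 1
    return {cell: correct / n for cell, (correct, n) in counts.items()}

def dif_gaps(records, item, group_a, group_b):
    """For each total score found in both groups: pass rate of A minus pass rate of B."""
    rates = item_pass_rates(records, item)
    totals = sorted({total for (total, _group) in rates})
    gaps = {}
    for total in totals:
        if (total, group_a) in rates and (total, group_b) in rates:
            gaps[total] = rates[(total, group_a)] - rates[(total, group_b)]
    return gaps

# Made-up records: 'total' is the overall test score, 'items' marks 1 = correct.
records = [
    {"group": "A", "total": 10, "items": {"q1": 1}},
    {"group": "A", "total": 10, "items": {"q1": 1}},
    {"group": "B", "total": 10, "items": {"q1": 1}},
    {"group": "B", "total": 10, "items": {"q1": 0}},
    {"group": "A", "total": 12, "items": {"q1": 1}},
    {"group": "B", "total": 12, "items": {"q1": 1}},
]
gaps = dif_gaps(records, "q1", "A", "B")
```

A gap near zero at every total score suggests the item behaves the same way in both groups; a consistent gap in one direction would flag the item for review.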
13Differential item functioning analysis
- But removing such items does not eliminate group
differences
- E.g., people depicted in test items may typically
be White males
- But changing this has little effect (McCarty,
Noble, & Huntley, 1989)
14Criterion-related sources of bias
- We evaluate criterion validity by looking at
correlation between test scores and criterion
scores
- E.g., SAT scores vs. GPA after 4 years at
university
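As a rough illustration of the correlation involved, here is a minimal Pearson correlation in Python. The SAT and GPA figures are invented for the example and are not taken from any study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: test scores (predictor) and GPA after 4 years (criterion).
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.4, 2.9, 2.8, 3.3, 3.6]
r = pearson_r(sat, gpa)
```

The closer r is to 1, the better the test score predicts the criterion.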
15Criterion-related sources of bias
- If correlation is good, we use test scores (e.g.,
SAT) to predict criterion and make selection
decisions
- What do we do if the correlation is different for
different groups?
- This would imply that test scores mean different
things for different groups
16Criterion-related sources of bias
- In this graph, Group B performs better than Group
A but the correlation is the same for both
17Criterion-related sources of bias
- In this graph, the slopes of the lines are the
same but the intercepts are different
- Equal slopes mean equal correlations: that is,
equally good predictions
[Figure: Criterion plotted against Test score, with separate regression lines for Group B and Group A]
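The comparison of slopes and intercepts can be sketched by fitting one least-squares line per group. The data are invented so that the two groups end up with the same slope and different intercepts; the function name is illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up test scores and criterion scores for two groups.
group_a = ([40, 50, 60, 70], [2.0, 2.5, 3.0, 3.5])
group_b = ([40, 50, 60, 70], [2.5, 3.0, 3.5, 4.0])

slope_a, intercept_a = fit_line(*group_a)
slope_b, intercept_b = fit_line(*group_b)
```

Here both slopes come out at 0.05 and Group B's intercept is 0.5 higher: the same test score predicts a higher criterion score for Group B, but an extra test point is worth the same in both groups.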
18Criterion-related sources of bias
- Here, the intercepts are different and the slopes
are different, so predictions for Groups A and B
would not be equally good
- Such cases are rare
[Figure: regression lines for Group B and Group A, with points X1 and X2 marked]
19Criterion-related sources of bias
- Major tests, such as SAT and WISC-R, have equal
criterion validity for various ethnic groups
(e.g., African-American, White, Latino/Latina)
- Similar results have been found in other
multi-ethnic countries, such as Israel
20Other approaches to testing minority groups
- The Chitling Test
- The BITCH Test
- SOMPA
21The Chitling Test (Dove, 1968)
- Developed to make a point about testing for
information a group is unlikely to have acquired
- Questions require a particular form of street
smarts to answer correctly
- No validity data exist for this test
- If you want to predict college performance for
minority students, this test won't help
22The BITCH test (Williams, 1974)
- Task: define 100 words drawn from the
Afro-American Slang Dictionary and Williams'
personal experience
- African-Americans score higher than Whites
- Williams argues that this test is analogous to
the standard IQ tests, which are also
culture-bound
23The BITCH test (Williams, 1974)
- Problem: there is no reason to accept the claim
that this is an intelligence test.
- There is no validity evidence: no prediction of
any performance
- Does not test reasoning skills
- May have some value for testing familiarity with
African-American culture
24SOMPA (Mercer, 1979)
- System of Multi-cultural Pluralistic Assessment
- Based on idea that what constitutes knowledge is
socially-constructed
- Mercer also suggested that IQ tests are a tool
Whites use to keep minority groups in their
place.
25SOMPA (Mercer, 1979)
- Inspired originally in part by over-representation
of minority group children in EMR (educable mentally
retarded) classes in US schools
- Mercer: this over-representation resulted from
both
- More medical problems
- Unfamiliar cultural references on tests
26SOMPA (Mercer, 1979)
- Fundamental assumption: all cultural groups have
the same potential on average
- On this view, if one cultural group does more
poorly than another on a test, that is a fact
about the test, not the groups.
27SOMPA (Mercer, 1979)
- Combines 3 kinds of evaluation
- Medical
- Health, vision, hearing, etc.
- Social
- Entire WISC-R
- Pluralistic
- Compare WISC-R scores to those of same community
28SOMPA (Mercer, 1979)
- Estimated Learning Potentials (ELPs): WISC-R scores
adjusted for socio-economic background
- But these ELPs don't predict school performance
as well as the original WISC-R scores
- Mercer: ELPs are intended to assess who should be
in EMR classes
29SOMPA (Mercer, 1979)
- A major problem, in my view, is that we don't
know what consequences arise for children who are
removed from EMR classes on the basis of ELPs
- Is what we call these children important? It is
if the label has an effect, but data do not show
that effect
- SOMPA is used much less today than it used to be
30Models of test bias
- Regression
- Constant Ratio
- Cole/Darlington
- Quota
31Regression
- Basis: unqualified individualism
- Treat each person as an individual, not as a
member of a group
- Select people with highest scores for job or
college place
- Ignores sex, race, other group characteristics
- Leads to highest average performance on criterion
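A minimal sketch of this selection rule, assuming one common regression line with a positive slope, so that ranking by test score is the same as ranking by predicted criterion score. The applicant records and the function name are made up for the illustration.

```python
def select_top(applicants, n_places):
    """Pick the n_places highest scorers; group membership is never consulted."""
    ranked = sorted(applicants, key=lambda a: a["score"], reverse=True)
    return ranked[:n_places]

# Made-up applicants. The 'group' field is present but ignored by select_top.
applicants = [
    {"name": "P1", "group": "A", "score": 82},
    {"name": "P2", "group": "B", "score": 91},
    {"name": "P3", "group": "A", "score": 77},
    {"name": "P4", "group": "B", "score": 68},
    {"name": "P5", "group": "A", "score": 88},
]
chosen = select_top(applicants, 2)
chosen_names = [a["name"] for a in chosen]
```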
32Constant Ratio
- Basis: choose so that selection ratio for groups
= success ratio for groups
- Select the best candidates but give a boost to
minority group members' scores so that selection
probability = success probability
33Constant Ratio
- Adjust test scores for minority groups upwards by
half the mean difference between groups
- Leads to somewhat lower average performance on
criterion
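The adjustment can be illustrated with a short Python sketch. It assumes exactly two groups and takes the mean difference from the applicant pool itself; the data, field names, and function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def add_adjusted_scores(applicants, minority_group):
    """Return copies of the records with an 'adjusted' score:
    minority scores are raised by half the mean difference between groups."""
    minority = [a["score"] for a in applicants if a["group"] == minority_group]
    majority = [a["score"] for a in applicants if a["group"] != minority_group]
    boost = (mean(majority) - mean(minority)) / 2
    adjusted = []
    for a in applicants:
        bonus = boost if a["group"] == minority_group else 0
        adjusted.append({**a, "adjusted": a["score"] + bonus})
    return adjusted

# Made-up applicants: majority mean is 80, minority mean is 70, so the boost is 5.
applicants = [
    {"name": "M1", "group": "majority", "score": 90},
    {"name": "M2", "group": "majority", "score": 80},
    {"name": "M3", "group": "majority", "score": 70},
    {"name": "N1", "group": "minority", "score": 78},
    {"name": "N2", "group": "minority", "score": 62},
]
adjusted = add_adjusted_scores(applicants, "minority")
ranked = sorted(adjusted, key=lambda a: a["adjusted"], reverse=True)
top_two = [a["name"] for a in ranked[:2]]
```

With raw scores the top two would be M1 and M2; after the boost, N1 (78 + 5 = 83) moves ahead of M2 (80).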
34Cole/Darlington
- Basis: if there is special value in selecting
minority group members, then a minority score of
Y on criterion is equal to a majority score of Y +
k on criterion
- Separate regression equations used for different
groups and adjustment made
- Leads to lower average performance on criterion
35Cole/Darlington
- If a value is placed on selection of minority
group members, and the intercept is lower for that
group, then we consider minority test score X1
and majority test score X2 as equal
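One way to read this rule in code: fit a separate regression line for each group, add the bonus k to the minority prediction, and ask which majority test score yields the same predicted criterion. This is a sketch of that reading only, not a definitive statement of the Cole/Darlington procedure; the regression lines, the value of k, and the function name are all assumed for the example.

```python
def equivalent_majority_score(x_minority, minority_line, majority_line, k):
    """Majority test score X2 treated as equal to minority test score X1.

    Each line is an (intercept, slope) pair from that group's own regression.
    X2 is the score whose predicted criterion equals the minority prediction plus k.
    """
    a_min, b_min = minority_line
    a_maj, b_maj = majority_line
    target = a_min + b_min * x_minority + k
    return (target - a_maj) / b_maj

# Made-up lines: equal slopes, minority intercept lower by 0.5.
minority_line = (1.0, 0.05)
majority_line = (1.5, 0.05)

x2_with_bonus = equivalent_majority_score(50, minority_line, majority_line, k=0.5)
x2_no_bonus = equivalent_majority_score(50, minority_line, majority_line, k=0.0)
```

In this example a bonus equal to the intercept gap makes a minority score of 50 equivalent to a majority score of 50, while with k = 0 the separate regression lines alone would equate it to a majority score of 40.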
36Quota
- Basis: idea that all groups should have equal
outcomes
- Selection based on different regression equations
for each group
- Produces lower average performance on criterion
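A minimal sketch of quota selection, assuming places are split according to each group's share of the population and that candidates are then ranked within their own group. Ranking by test score within a group gives the same order as ranking by that group's own regression prediction when the slope is positive. The shares, records, and function name are invented for the illustration.

```python
def quota_select(applicants, n_places, population_share):
    """Fill each group's share of the places with that group's top scorers."""
    chosen = []
    for group, share in population_share.items():
        places = round(n_places * share)
        members = [a for a in applicants if a["group"] == group]
        members.sort(key=lambda a: a["score"], reverse=True)
        chosen.extend(members[:places])
    return chosen

# Made-up pool: group A is 75% of the population, group B is 25%.
applicants = [
    {"name": "A1", "group": "A", "score": 95},
    {"name": "A2", "group": "A", "score": 90},
    {"name": "A3", "group": "A", "score": 85},
    {"name": "A4", "group": "A", "score": 80},
    {"name": "B1", "group": "B", "score": 70},
    {"name": "B2", "group": "B", "score": 60},
]
chosen = quota_select(applicants, 4, {"A": 0.75, "B": 0.25})
chosen_names = [a["name"] for a in chosen]
```

Straight top-down selection would have taken A1 through A4; under the quota, B1 (score 70) is selected in place of A4 (score 80), which is why average performance on the criterion is expected to be lower.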
37Quota
- If 10% of population is Asian then 10% of student
body should be Asian
- Another way to look at this: if 10% of population
is Jewish then no more than 10% of professors
should be Jewish.
- This puts the quota idea in a different light.