Title: Ethical Standards for Selecting Tests to Assess Educational Abilities and Needs
1. Ethical Standards for Selecting Tests to Assess Educational Abilities and Needs
- Dr. Mary E. Stafford
- Chair, ISPA Ethics Committee
- Editor, WorldGoRound
2. Overview
- Relevant definitions
- Historic perspective on the development of standards and classification systems
- Ethical standards for selecting tests
- Cultural issues to consider
- Audience participation: issues you face related to assessment in your country
3. Definition: Test versus Assessment
- Test
  - A procedure or method to determine the presence of a phenomenon
  - A standard set of questions to assess knowledge, skills, interests, or other characteristics of an examinee
  - A set of operations designed to determine the validity of a hypothesis
- Assessment
  - An overall investigation into one's functional capacities and limitations
  - Some are brief
  - Others are comprehensive
- (VandenBos, 2006)
4. Relevant Definitions: Principles versus Standards
- Ethical principles
  - Identify virtues to which practitioners strive
  - Are desired, but not required
- Ethical standards
  - Specify behaviors that members of the professional organization are expected to follow
  - Are required to be followed
- (Koocher & Keith-Spiegel, 1998)
5. Example of Standards for Testing
- Test users should select tests that meet the intended purpose and that are appropriate for the intended test takers.
- Test users should administer and score tests correctly and fairly.
- Test users should report and interpret test results accurately and clearly.
- Test users should inform test takers about the nature of the test, test taker rights and responsibilities, the appropriate use of scores, and procedures for resolving challenges to scores.
- (Joint Committee on Testing Practices, 2004, pp. 5-11)
6. History of the Development of Classification Systems
- International List of Causes of Death, 1893
- International Classification of Diseases-6 (ICD-6), 1948
- Current systems for classification
  - International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), 1990
  - International Classification of Functioning, Disability and Health (ICF), 2001
- (World Health Organization [WHO], 2006)
7. History of the Development of Classification Systems
- Diagnostic and Statistical Manual of Mental Disorders (DSM-I), 1952
- DSM-II, 1968
- DSM-III, 1980
- DSM-III-R, 1987
- DSM-IV, 1994
- DSM-IV-TR, 2000
- (American Psychiatric Association, 2007, http://www.psych.org/research/dor/dsm/dsm_faqs/faq81301.cfm)
8. Historic Perspective on the Development of Standards for Assessment
- Hippocratic Oath, ca. 400 BCE
- Codes of ethics from caregiving professions
- Current codes used by school psychologists
  - International School Psychology Association, 1991
  - American Psychological Association, 2002
  - National Association of School Psychologists, 2000
- Numerous codes of ethics for school psychologists in various countries
9. Ethical Standards for Selecting Tests
- Standards for this presentation derive primarily from
  - NASP Code of Ethics
  - ISPA's Code of Ethics
  - American Psychological Association's Code of Ethics
  - American Educational Research Association
  - International Test Commission
11. Defining the Purpose of Testing
- Have you fully defined, in observable, measurable terms, the primary purpose or complaint that the patient has and for which you will do the assessment?
12. Evaluating Available Tests or Other Assessment Methods
- Before selecting a test or other assessment method, have you evaluated a representative sample of test questions and/or practice tests, directions, answer sheets, manuals, and score reports?
- For the tests you consider using, did their manuals adequately describe the development of the instrument and its norming and scaling processes?
13. Evaluating Available Tests or Other Assessment Methods
- Have you evaluated the test's technical qualities by reviewing information in the test manual, research articles, and test reviews?
- Specifically, does the test provide
  - Evidence of good reliability for measuring the constructs to be assessed?
  - Information about standard errors of measurement and confidence intervals?
  - Evidence of adequate validity to address the reasons for using the test, in light of the patient's demographic qualities?
  - Information about norms for the comparison group to which the patient belongs?
14. Reliability
- The trustworthiness or accuracy of a measure
- Typically estimated from the internal consistency and stability of a test's scores
  - Internal consistency refers to the degree to which all parts of a test measure the same construct
  - Stability refers to the degree to which a test measures the same quality at different times or in different situations
  - Test-retest reliability refers to the consistency of scores obtained from the same persons when tested on two or more occasions
- A test is considered reliable if its scores provide consistent information about a person
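The internal-consistency and test-retest ideas above can be sketched numerically. The Python sketch below uses invented score data and standard formulas (Cronbach's alpha for internal consistency, a Pearson correlation for test-retest); it is illustrative only and not tied to any published test.

```python
def cronbach_alpha(item_scores):
    """Internal consistency: do all parts of the test measure the same construct?
    item_scores: one list of scores per item, aligned across the same examinees."""
    k = len(item_scores)     # number of items
    n = len(item_scores[0])  # number of examinees

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


def pearson_r(first, second):
    """Test-retest reliability: correlate the same persons' scores
    from two testing occasions."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    s1 = sum((a - m1) ** 2 for a in first) ** 0.5
    s2 = sum((b - m2) ** 2 for b in second) ** 0.5
    return cov / (s1 * s2)
```

A coefficient near 1.00 from either estimate indicates scores that provide consistent information about a person.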
15. Reliability
- Reliability coefficients range from 0 to 1.00
  - Intelligence tests have reliability coefficients in the high .90s
  - Personality tests may have reliability coefficients in the high .70s or low .80s
- Reliability estimates
  - Between .70 and .79: fair (clinical decisions should be supportable by other strong evidence)
  - Between .80 and .90: good
  - Above .90: excellent
- Internal reliability estimates below .70 generally are considered too unstable to be used with confidence.
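The bands above translate directly into a small helper. The cut points come from the slide; the function name and labels are illustrative choices.

```python
def interpret_reliability(coefficient):
    """Qualitative label for a reliability coefficient,
    using the bands from the slide."""
    if coefficient > 0.90:
        return "excellent"
    if coefficient >= 0.80:
        return "good"
    if coefficient >= 0.70:
        return "fair (support clinical decisions with other strong evidence)"
    return "too unstable to use with confidence"
```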
16. Standard Error of Measurement
- Based on the test's reliability
- An estimate of the error score
- Provides a confidence interval, i.e., a number used to determine the range around an obtained score in which the true score lies
- Report scores using confidence intervals rather than the observed score
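As a worked sketch, the SEM and a confidence interval follow from the test's standard deviation and reliability. The numbers below (SD 15, reliability .96, observed score 110) are illustrative, not taken from any particular test manual.

```python
def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5


def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an obtained score in which the true score likely lies
    (z = 1.96 corresponds to a 95% interval)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin


# Example: an observed score of 110 on a scale with SD 15 and reliability .96
low, high = confidence_interval(observed=110, sd=15, reliability=0.96)
# Report the interval (roughly 104-116) rather than the bare 110.
```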
17. Interrater Reliability
- Refers to the degree to which scores obtained from ratings of the same behavior by two or more independent raters are consistent
- Used for nonstandardized measures
- Calculating interrater reliability
  - Compute the percentage of times the ratings agree by dividing the number of agreements by the total number of ratings
  - Correlate the scores from two or more ratings of the child's life-skills abilities
- Measures with higher rates of agreement or higher correlations are more reliable
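The first calculation above (percent agreement) is straightforward to sketch. The ratings below are invented categorical observations of the same behavior by two raters.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of occasions on which two independent raters agreed.
    For numeric ratings, a correlation coefficient would be used instead."""
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)


ratings_a = ["on-task", "off-task", "on-task", "on-task"]
ratings_b = ["on-task", "off-task", "off-task", "on-task"]
agreement = percent_agreement(ratings_a, ratings_b)  # 3 of 4 ratings agree
```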
18. Validity
- Refers to the extent to which test scores measure their targeted construct(s), as well as the extent to which they may be used meaningfully to guide decision making
- The process of test validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations
- Validity coefficients range from 0 to 1.00
  - The higher the coefficient, the higher the validity, and thus the greater the confidence we have in using a test's scores to make decisions
19. Validity
- Types of validity
  - Construct validity: the extent to which a test measures the theoretical construct it intends to measure
  - Face validity: the degree to which items on the test are judged to appropriately measure the targeted construct
  - Content validity: the degree to which items on a test represent the tasks, behaviors, or knowledge of the domain of interest
  - Discriminative validity: the degree to which a test is able to effectively differentiate between clinical and nonclinical samples of people who take the test
  - Criterion-related validity: the degree of relationship between a new, targeted test and an already established test that purports to measure the same construct
20. Validity
- Types of criterion-related validity
  - Predictive validity: the criterion scores are obtained at a later time
  - Concurrent validity: the tests are administered at the same time
  - Convergent validity: both tests measure the same construct
  - Divergent validity: the tests measure different psychological constructs
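A concurrent-validity check can be sketched as a correlation between a hypothetical new test and an established one administered at the same time; all scores here are invented for illustration.

```python
def pearson_r(x, y):
    """Correlation between two score lists; here it serves as a
    criterion-related validity coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


new_test = [12, 15, 11, 18, 14, 16]     # scores on the new, targeted test
established = [48, 55, 45, 62, 50, 58]  # scores on the established test
validity_coefficient = pearson_r(new_test, established)
# A coefficient near 1.00 supports using the new test's scores for decisions.
```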
21. Test Norms
- Information about the test's average and typical range of scores
- Likely to pose the greatest problem for school psychologists in countries where there are no test developers who understand the culture and language
- Consider the relevance of the norms in light of the test's use
- Look for norms that are
  - Acquired recently
  - Representative of the general population, including persons
    - From the racial/ethnic/cultural group of the child
    - From the full range of socioeconomic levels
    - With disabilities, in proportion to their representation in the population
22. Evaluating Available Tests or Other Assessment Methods
- Following your review, have you confirmed that the test procedures and materials are not potentially offensive in content or language?
23. Selecting the Best Test or Other Assessment Method
- Have you selected a test or other assessment method that
  - Addresses the needs of the assessment in light of the test's content and skills?
  - Is appropriate in light of the child's age, gender, cultural/racial/ethnic background, and developmental level?
  - Has clear, accurate, and complete psychometric information?
  - Has the potential to provide information relevant to the development or evaluation of interventions for this child?
24. Providing Accommodations for Subgroups
- If the test taker has disabilities that require special accommodations, have you selected tests for which modified forms and/or administration procedures exist or can be developed?
- If the test takers are members of diverse subgroups, have you evaluated cultural learning factors relevant to test-taking behaviors and determined, to the extent feasible, which performance differences are likely to be caused by factors related to culture rather than to the skills being assessed?
25. Evaluating Your Administration Skills for the Selected Test
- Do you have the appropriate knowledge, skills, and training to properly administer the selected assessment method?
26. Cultural Issues to Consider
- Standards for ethnically appropriate test selection
  - Select tests that are fair to all test takers
  - Eliminate language, symbols, words, phrases, and content that generally are regarded as offensive by members of racial, ethnic, gender, or other groups, except when judged necessary for adequate representation of the domain
  - Minimize the linguistic or reading demands of the test to the level necessary for valid assessment when linguistic or reading ability is not critical to the assessment
  - Describe linguistic modifications, and the rationale for them, in detail in the test manual
- Biases from our own cultural learning
- Differences in worldview
27. Cultural Issues to Consider: Summary
- Norms may not be available for the child's group
- Translators may not be trained in test administration
- Test developers may not be willing to expend money without getting a return on it
- Interpretation of test data may be difficult
- It is difficult to recognize our own biases, and we tend to assume that others view the world the same way we do
28. Audience Participation
- What problems do you face related to the assessment of children in your country?
- What steps have you taken to try to deal with the problems you face in this area?
- What solutions would you recommend be considered in solving these problems?