Title: Business Research Methods
1Business Research Methods
- Measurement and Scaling
- Noncomparative Scaling Techniques
2Noncomparative Scaling Techniques
- Respondents evaluate only one object at a time, and for this reason noncomparative scales are often referred to as monadic scales.
- Noncomparative techniques consist of continuous and itemized rating scales.
3Continuous Rating Scale
- Respondents rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other.
- The form of the continuous scale may vary considerably.
- How would you rate Sears as a department store?
- Version 1
  Probably the worst ----------I---------------------------------------- Probably the best
- Version 2
  Probably the worst ----------I---------------------------------------- Probably the best
                     0    10   20   30   40   50   60   70   80   90   100
- Version 3
                     Very bad            Neither good nor bad          Very good
  Probably the worst ----------I---------------------------------------- Probably the best
                     0    10   20   30   40   50   60   70   80   90   100
4Itemized Rating Scales
- The respondents are provided with a scale that has a number or brief description associated with each category.
- The categories are ordered in terms of scale position, and the respondents are required to select the specified category that best describes the object being rated.
- The commonly used itemized rating scales are the
- Likert,
- semantic differential, and
- Stapel scales.
5Likert Scale
- The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects.

                                             Strongly              Neither agree             Strongly
                                             disagree   Disagree   nor disagree     Agree    agree
  1. Sears sells high quality merchandise.      1          2X           3             4         5
  2. Sears has poor in-store service.           1          2X           3             4         5
  3. I like to shop at Sears.                   1          2            3X            4         5

- The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated.
- When arriving at a total score, the categories assigned to the negative statements by the respondents should be scored by reversing the scale.
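
The reverse-scoring rule for a summated Likert score can be sketched in a few lines of Python. This is only an illustration: the item keys are hypothetical names, the responses are the X marks from the Sears example, and the 1-to-5 scale is taken from that example.

```python
# Responses marked with X in the Sears example (1 = strongly disagree, 5 = strongly agree).
# The dictionary keys are hypothetical labels chosen for this sketch.
responses = {
    "sells_high_quality_merchandise": 2,
    "has_poor_in_store_service": 2,
    "like_to_shop_at_sears": 3,
}

# Negatively worded statements must be reversed before summing.
negative_items = {"has_poor_in_store_service"}


def reverse_score(response, points=5):
    """Reverse a response on a 1..points scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return points + 1 - response


def score_items(responses, negative_items, points=5):
    """Return each item's score with negative statements reversed."""
    return {
        item: reverse_score(value, points) if item in negative_items else value
        for item, value in responses.items()
    }


scored = score_items(responses, negative_items)
total = sum(scored.values())  # summated score

print(scored)
print("Summated score:", total)
```

After reversal, a higher number means a more favorable attitude on every item, so the item scores can be added or profiled on a common footing.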
6Semantic Differential Scale
- The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning.
- SEARS IS
- Powerful ---------X----- Weak
- Unreliable -----------X--- Reliable
- Modern -------------X- Old-fashioned
- The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right.
- This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels.
- Individual items on a semantic differential scale may be scored on either a -3 to 3 or a 1 to 7 scale.
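
A minimal sketch of the two scoring conventions follows. It assumes the X marks in the Sears example sit in positions 5, 6 and 7 counted from the left (read off the dashed lines, so treat the positions as an assumption), and it assumes that "Powerful", "Reliable" and "Modern" are the favorable poles. The function names are hypothetical.

```python
SCALE_POINTS = 7
MIDPOINT = 4


def orient(position, favorable_on_left):
    """Recode a left-to-right position (1..7) so that 7 is always the favorable pole."""
    return SCALE_POINTS + 1 - position if favorable_on_left else position


def to_centered(score):
    """Convert a 1..7 score to the equivalent -3..+3 score."""
    return score - MIDPOINT


# (left label, right label, marked position from the left, favorable pole on the left?)
# Positions and favorable poles are assumptions made for this illustration.
items = [
    ("Powerful", "Weak", 5, True),
    ("Unreliable", "Reliable", 6, False),
    ("Modern", "Old-fashioned", 7, True),
]

profile = {}
for left, right, position, favorable_on_left in items:
    score = orient(position, favorable_on_left)
    profile[f"{left}/{right}"] = (score, to_centered(score))

for label, (score_1_to_7, score_centered) in profile.items():
    print(label, score_1_to_7, score_centered)
```

Orienting every item the same way before scoring undoes the deliberate left/right alternation of the negative adjectives, so that item scores are comparable.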
7A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts
1) Rugged --------------------- Delicate
2) Excitable --------------------- Calm
3) Uncomfortable --------------------- Comfortable
4) Dominating --------------------- Submissive
5) Thrifty --------------------- Indulgent
6) Pleasant --------------------- Unpleasant
7) Contemporary --------------------- Obsolete
8) Organized --------------------- Unorganized
9) Rational --------------------- Emotional
10) Youthful --------------------- Mature
11) Formal --------------------- Informal
12) Orthodox --------------------- Liberal
13) Complex --------------------- Simple
14) Colorless --------------------- Colorful
15) Modest --------------------- Vain
8Stapel Scale
- The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero).
- This scale is usually presented vertically.

                  SEARS

       +5                   +5
       +4                   +4
       +3                   +3
       +2                   +2X
       +1                   +1
   HIGH QUALITY        POOR SERVICE
       -1                   -1
       -2                   -2
       -3                   -3
       -4X                  -4
       -5                   -5

- The data obtained by using a Stapel scale can be
analyzed in the
9Basic Noncomparative Scales
Scale | Basic Characteristics | Examples | Advantages | Disadvantages
Continuous Rating Scale | Place a mark on a continuous line | Reaction to TV commercials | Easy to construct | Scoring can be cumbersome unless computerized
Itemized Rating Scales:
Likert Scale | Degrees of agreement on a 1 (strongly disagree) to 5 (strongly agree) scale | Measurement of attitudes | Easy to construct, administer, and understand | More time-consuming
Semantic Differential | Seven-point scale with bipolar labels | Brand, product, and company images | Versatile | Controversy as to whether the data are interval
Stapel Scale | Unipolar ten-point scale, -5 to +5, without a neutral point (zero) | Measurement of attitudes and images | Easy to construct, administer over telephone | Confusing and difficult to apply
10Summary of Itemized Scale Decisions
- 1) Number of categories: Although there is no single, optimal number, traditional guidelines suggest that there should be between five and nine categories.
- 2) Balanced vs. unbalanced: In general, the scale should be balanced to obtain objective data.
- 3) Odd/even number of categories: If a neutral or indifferent scale response is possible from at least some of the respondents, an odd number of categories should be used.
- 4) Forced vs. non-forced: In situations where the respondents are expected to have no opinion, the accuracy of the data may be improved by a non-forced scale.
- 5) Verbal description: An argument can be made for labeling all or many scale categories. The category descriptions should be located as close to the response categories as possible.
- 6) Physical form: A number of options should be tried and the best selected.
11Balanced and Unbalanced Scales
Figure 9.1
Balanced Scale               Unbalanced Scale
Jovan Musk for Men is        Jovan Musk for Men is
Extremely good               Extremely good
Very good                    Very good
Good                         Good
Bad                          Somewhat good
Very bad                     Bad
Extremely bad                Very bad
12Rating Scale Configurations
Figure 9.2
A variety of scale configurations may be employed to measure the gentleness of Cheer detergent. Some examples include:
Cheer detergent is:
1) Very harsh --- --- --- --- --- --- --- Very gentle
2) Very harsh 1 2 3 4 5 6 7 Very gentle
3) . Very harsh
   .
   .
   . Neither harsh nor gentle
   .
   .
   . Very gentle
4) ____         ____     ____        ____             ____        ____     ____
   Very harsh   Harsh    Somewhat    Neither harsh    Somewhat    Gentle   Very gentle
                         harsh       nor gentle       gentle
5) -3           -2       -1          0                1           2        3
   Very harsh                        Neither harsh                         Very gentle
                                     nor gentle
13Some Unique Rating Scale Configurations
Figure 9.3
Thermometer Scale
Instructions: Please indicate how much you like McDonald's hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is.
Form: [thermometer marked, from top to bottom, 100 (Like very much), 75, 50, 25, 0 (Dislike very much)]
Smiling Face Scale
Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5.
Form: [five faces numbered 1 2 3 4 5]
14Thurstone Scale
- It is a two-stage procedure.
- In the first stage, the researcher selects 80 to 100 items indicating different degrees of favourable attitude toward the concept under study.
- The items are given to a group of judges, who sort them into categories from favourable to unfavourable, keeping equal intervals between categories.
- All items on which the judges reach consensus are selected and distributed uniformly on a scale of favourability.
- This scale is then administered to respondents to measure their attitude towards a particular concept.
- It is time-consuming and costly, and is rarely used in applied business research.
15Measurement Accuracy
- The true score model provides a framework for understanding the accuracy of measurement.

  $X_O = X_T + X_S + X_R$

- where
- $X_O$ = the observed score or measurement
- $X_T$ = the true score of the characteristic
- $X_S$ = systematic error
- $X_R$ = random error
16Systematic Error
- Lack of clarity of the scale, including the instructions or the items themselves.
- Mechanical factors, such as poor printing, overcrowding of items in the questionnaire, and poor design.
17Random Error
- Short-term or transient personal factors, such as health, emotions, and fatigue.
- Situational factors, such as the presence of other people, noise, and distractions.
18 Criteria for evaluating measurement
- The criteria for evaluating measurements are
- Reliability
- Validity
- Sensitivity
- Generalizability
- Relevance
19Reliability
- The degree to which measures are free from random error and therefore yield consistent results across time or situations.
- Perfect reliability requires that there is no random error: $X_R = 0$.
20Validity
- The ability of a scale to measure what was intended to be measured.
- Perfect validity requires that there is no measurement error, either systematic or random: $X_R = 0$ and $X_S = 0$.
21Relationship between validity and reliability
- If a measure is perfectly valid, it is also perfectly reliable.
- However, if a measure is perfectly reliable, it may or may not be perfectly valid.
- If a measure is unreliable, it will not be valid.
- Reliability is a necessary but not a sufficient condition for validity.
22THE GOAL OF MEASUREMENT VALIDITY and RELIABILITY
23Reliability and Validity on Target
Target A: Old Rifle. Low Reliability, Low Validity
Target B: New Rifle. High Reliability, High Validity
Target C: New Rifle, Sunglare. Reliable but Not Valid
24RELIABILITY
Of index measures
Repeatability
25 Types of Reliability
- There are two dimensions of reliability: repeatability and internal consistency.
- If the results of the research are the same even when it is conducted a second or third time, this confirms the repeatability aspect.
- Test-retest method: an approach for assessing reliability in which respondents are administered identical sets of scale items at two different times under as nearly equivalent conditions as possible.
- This measures repeatability, since the same scale or measure is administered to the same set of respondents at two separate points in time. If the measure is stable over time, it should obtain similar results (e.g., 40 satisfied with jobs both times).
- However, it is difficult to locate all respondents for the second round, their attitudes may change over time, or the first measure may sensitize the respondents.
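
One common way to quantify test-retest reliability is the correlation between the two administrations. The sketch below assumes a Pearson correlation and uses made-up scores for five respondents; neither the data nor the function name comes from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores of the same five respondents on the same scale at two times.
time_1 = [4, 2, 5, 3, 1]
time_2 = [4, 3, 5, 3, 1]

test_retest_r = pearson_r(time_1, time_2)
print(f"Test-retest correlation: {test_retest_r:.3f}")
```

A coefficient close to 1 suggests the measure is stable over time; a low value may reflect either an unreliable scale or genuine attitude change between the two rounds.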
26 Equivalent Forms Method
- An approach to assessing reliability that requires two equivalent forms of the scale to be constructed and administered to the same respondents at two different times.
- However, it is difficult, time-consuming, and expensive to construct two equivalent forms of a scale.
27 Internal Consistency
- This measure of reliability focuses on the internal consistency of the set of items forming the scale.
- It is used to assess the reliability of a summated scale, where several items are summed to form a total score. Each item measures some aspect of the construct, and the items should be consistent in what they indicate about the characteristic.
28 Split half Method
- Split-half method: a method of measuring internal consistency reliability in which the items constituting the scale are divided into two halves and the resulting scores of the two halves are correlated. High correlation indicates high consistency.
- However, results will depend on how the scale items are split.
- Coefficient alpha (Cronbach's alpha): a measure of internal consistency reliability that is the average of all possible split-half coefficients resulting from different splittings of the scale items.
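
The two internal consistency measures can be illustrated with a short Python sketch. The responses are invented, the odd/even split is just one of many possible splits, and alpha is computed with the standard variance-based formula rather than by literally averaging split-half coefficients, so the result should be read as an approximation of the definition given above.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def split_half_r(rows):
    """Correlate the summed odd-position items with the summed even-position items."""
    half_a = [sum(row[0::2]) for row in rows]
    half_b = [sum(row[1::2]) for row in rows]
    return pearson_r(half_a, half_b)


def cronbach_alpha(rows):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents (rows) answering a four-item summated scale.
rows = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]

split_half = split_half_r(rows)
alpha = cronbach_alpha(rows)
print(f"Split-half correlation: {split_half:.3f}")
print(f"Cronbach's alpha:       {alpha:.3f}")
```

Changing the slices in `split_half_r` to a different split would generally give a different coefficient, which is the dependence on the split noted above; alpha does not depend on any particular split.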
29- Some multi-item scales include several sets of items measuring different dimensions of a multidimensional construct. Since these dimensions are independent, a measure of internal consistency computed across dimensions would be inappropriate, so internal consistency reliability can be computed for each dimension.
- Store image is a multidimensional construct that includes
  - quality of goods,
  - variety of goods,
  - returns policy,
  - service,
  - price,
  - location,
  - layout,
  - billing and credit policy
30- Face: Professional agreement that logically it appears valid. (Subjective)
- Content: Depends on established theories for support. (Objective)
- Criterion: Does it fit or correlate with other similar measures/constructs? Example, body fat: caliper, water displacement, electrical impedance, BMI.
  - Concurrent: two measures, same time.
  - Predictive: two measures at different times.
- Construct: Confirmed with a network of hypotheses.
  - Convergent validity (high relationship with similar concepts) and divergent or discriminant validity (low relationship with dissimilar concepts).
31 Face Validity
- Face validity: subjective agreement among professionals that a scale logically appears to accurately measure what it is intended to measure. It is the weakest form of validity because it involves no analysis.
- Face validity is concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information the researchers are attempting to obtain? Does it seem well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support.
32Content Validity
- Content validity is based on the extent to which a measurement reflects the specific intended domain of content.
- Example: researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers only tested for multiplication and then drew conclusions from that survey, their study would not show content validity because it excludes other mathematical functions.
- To measure the adequacy of facilities in schools:
  - Attractiveness of the school name, frequency of old students' meets, and eatables in the canteen are not relevant variables.
  - Number of classrooms, number of qualified teachers, playground, and library are relevant variables.
33 Criterion related Validity
- Criterion-related validity, also referred to as instrumental validity, is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid.
- For example, imagine a hands-on driving test has been shown to be an accurate test of driving skills. A written driving test can then be validated with a criterion-related strategy: its scores are compared with the scores from the hands-on driving test.
- The new measure correlates with the criterion measure.
34Predictive Validity
- Predictive validity: a type of criterion validity whereby a new measure correlates with a criterion measure administered at a later time.
- In order for a test to be a valid screening device for some future behaviour, it must have predictive validity. The SAT is used by college screening committees as one way to predict college grades. The GMAT is used to predict success in business. Both uses depend on predictive validity.
- We determine predictive validity by computing a correlation coefficient comparing, for example, SAT scores (the new or independent measure) with college grades (the criterion or dependent measure). If they are directly related, then we can make a prediction regarding college grades based on SAT score. We can show that students who score high on the SAT tend to receive high grades in college.
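
The correlate-then-predict reasoning above can be sketched as follows. The SAT scores and grade point averages are invented for illustration, and the simple least-squares line is one assumed way of turning the relationship into a prediction; the slides do not prescribe a specific method.

```python
from math import sqrt

# Hypothetical data: SAT score at admission and college GPA measured later.
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.6, 2.9, 3.0, 3.4, 3.6]

n = len(sat)
mean_sat = sum(sat) / n
mean_gpa = sum(gpa) / n

cross = sum((s - mean_sat) * (g - mean_gpa) for s, g in zip(sat, gpa))
ss_sat = sum((s - mean_sat) ** 2 for s in sat)
ss_gpa = sum((g - mean_gpa) ** 2 for g in gpa)

# Correlation between the new measure (SAT) and the later criterion (GPA).
predictive_r = cross / sqrt(ss_sat * ss_gpa)

# Least-squares line: GPA = intercept + slope * SAT.
slope = cross / ss_sat
intercept = mean_gpa - slope * mean_sat


def predict_gpa(sat_score):
    """Predict college GPA from an SAT score using the fitted line."""
    return intercept + slope * sat_score


print(f"Predictive validity coefficient: {predictive_r:.3f}")
print(f"Predicted GPA for SAT 1250: {predict_gpa(1250):.3f}")
```

The closer the coefficient is to 1, the more confidence the fitted line deserves; with a weak correlation the same line would still produce numbers, but they would have little predictive value.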
35Concurrent Validity
- A type of criterion validity whereby a new measure correlates with a criterion measure taken at the same time.
- A new test of adult intelligence, for example, would have concurrent validity if it had a high positive correlation with the Wechsler Adult Intelligence Scale, since the Wechsler is an accepted measure of the construct we call intelligence. An obvious concern relates to the validity of the test against which you are comparing your test.
36Construct Validity
- Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.
- Construct validity can be broken down into two sub-categories: convergent validity and discriminant validity. Convergent validity is the actual general agreement among ratings, where measures should be theoretically related. Discriminant validity is the lack of a relationship among measures which theoretically should not be related.
37- To measure the tendency to stay in low-cost hotels
- Four personality variables: high level of self-confidence, low need for status, low need for distinctiveness, high level of adaptability
- Not related to: brand loyalty, high level of aggressiveness
- The scale can be said to have construct validity if it correlates highly with other measures of the tendency to stay in low-cost hotels, such as reported hotels patronised and social class (convergent)
- and has a low correlation with the unrelated constructs of brand loyalty and high level of aggressiveness (divergent)
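
A rough sketch of how the convergent and divergent checks might be run is given below. All three score lists are hypothetical, chosen only so that one correlation comes out high and the other near zero; a real study would use respondents' actual scale scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five respondents.
low_cost_hotel_scale = [1, 2, 3, 4, 5]   # the new scale being validated
reported_low_cost_stays = [1, 3, 2, 5, 4]  # a related measure
brand_loyalty = [2, 4, 3, 4, 2]          # a theoretically unrelated construct

convergent_r = pearson_r(low_cost_hotel_scale, reported_low_cost_stays)
divergent_r = pearson_r(low_cost_hotel_scale, brand_loyalty)

print(f"Convergent correlation: {convergent_r:.2f}")
print(f"Divergent correlation:  {divergent_r:.2f}")
```

Evidence for construct validity requires both patterns together: a high correlation with the related measure and a low one with the unrelated construct.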
38SENSITIVITY
- A measurement instrument's ability to accurately measure variability in stimuli or responses.
- Dichotomous categories such as "yes or no" and "agree or disagree" are not very sensitive.
- Categories such as strongly agree, mildly agree, indifferent, mildly disagree, and strongly disagree increase a scale's sensitivity.
39Generalizability
- It is the degree to which a study based on a sample applies to a universe of generalization.
- The universe of generalization includes the set of all conditions of measurement: items, interviewers, modes of data collection, etc.
- One may wish to generalize a scale developed for personal interviews to other modes of data collection, such as mail or telephone,
- or to generalize from a sample of items to the universe of items.
40Relevance
- It represents the appropriateness of using a particular scale for measuring a variable.
- Relevance = Reliability x Validity
- If either reliability or validity is low, then the scale will have little relevance.
- If a correlation coefficient is used to analyse both reliability and validity, then the scale's relevance can range from 0 to 1.
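
The product rule can be checked with a few lines of Python. The coefficient values are hypothetical, and the function assumes both inputs are expressed as coefficients between 0 and 1, as in the last bullet.

```python
def relevance(reliability, validity):
    """Relevance = reliability x validity, with both coefficients in [0, 1]."""
    if not (0 <= reliability <= 1 and 0 <= validity <= 1):
        raise ValueError("reliability and validity must lie between 0 and 1")
    return reliability * validity


# Hypothetical coefficients for three scales.
print(relevance(0.9, 0.8))  # both high
print(relevance(0.9, 0.2))  # low validity drags relevance down
print(relevance(0.2, 0.9))  # low reliability does the same
```

Because the two coefficients are multiplied, a scale cannot compensate for poor validity with high reliability, or the reverse.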