Action Research Measurement Scales and Descriptive Statistics



1
Action Research: Measurement Scales and
Descriptive Statistics
  • INFO 515
  • Glenn Booker

2
Measurement Needs
  • Need a long set of measurements for one project,
    and/or many projects to examine statistical
    trends
  • Could use measurements to test specific
    hypotheses
  • Other realistic uses of measurement are to help
    make decisions and track progress
  • Need scales to make measurements!

3
Measurement Scales
  • There are four types of measurement scales
  • Nominal
  • Ordinal
  • Interval
  • Ratio
  • Completely optional mnemonic: to remember the
    sequence, I think of NOIR, as in the
    expression "film noir" (noir is French for
    "black")

4
Nominal Scale
  • A nominal (name) scale groups or classifies
    things into categories, which
  • Must be jointly exhaustive (cover everything)
  • Must be mutually exclusive (one thing can't be
    in two categories at once)
  • Can be in any sequence (none is better or worse)
  • So a nominal variable puts things into
    buckets which have no inherent order to them

5
Nominal Scale
  • Examples include
  • Gender (though some would dispute the limitation to
    only male/female categories)
  • Dewey decimal system
  • The Library of Congress system
  • Academic majors
  • Makes of stuff (cars, computers, etc.)
  • Parts of a system

6
Ordinal Scale
  • This measurement ranks things in order
  • Sequence is important, but the intervals between
    ranks are not defined numerically
  • Rank is relative, such as greater than or less
    than
  • E.g. letter grades, urgency of problems, class
    rank, inspection ratings
  • So now the buckets we're using have some sense of
    order or direction

7
Interval Scale
  • An interval scale measures quantitative
    differences, not just relative order
  • Addition and subtraction are allowed
  • E.g. common temperature scales (°F or °C), a
    single date (Feb 15, 1999), maybe IQ scores
  • Let me know if you find any more examples
  • A zero point, if any, is arbitrary (90 °F is
    not six times hotter than 15 °F!)
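The temperature example is easy to check numerically. The following plain-Python sketch divides the two Fahrenheit readings, then divides the same readings after converting them to Kelvin, which is a ratio scale. The helper name `fahrenheit_to_kelvin` is hypothetical and exists only for this illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (an absolute, ratio scale)."""
    return (f - 32) * 5 / 9 + 273.15

hot, cold = 90.0, 15.0

# Interval scale: division yields a number, but that number has no meaning
naive_ratio = hot / cold

# Ratio scale: Kelvin has a true zero, so the quotient is meaningful
true_ratio = fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(cold)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")  # 6.00
print(f"Kelvin ratio:     {true_ratio:.2f}")   # 1.16
```

In absolute terms, 90 °F carries only about 16% more thermal energy than 15 °F, not six times as much.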

8
Ratio Scale
  • A ratio scale is an interval scale with a
    non-arbitrary zero point
  • Allows division and multiplication
  • The best type of scale to use, if possible
  • E.g. defect rates for software, test scores,
    absolute temperature (Kelvin or Rankine), the
    number or count of almost anything, size, speed,
    length, etc.

9
Summary of Scales
  • Nominal
  • Names different categories; not ordered, not
    ranked: Male, Female, Republican, Catholic, ...
  • Ordinal
  • Categories are ordered: Low, High, Sometimes,
    Never, ...
  • Interval
  • Fixed intervals, no absolute zero: IQ,
    temperature (°F or °C)
  • Ratio
  • Fixed intervals with an absolute zero point: age,
    income, years of schooling, hours/week, weight
  • Age could be measured as ratio (years), ordinal
    (young, middle, old), or nominal (baby boomer,
    Gen X)
  • The scale of measurement affects (and may determine)
    the type of statistics that you can use to analyze the data

10
Scale Hierarchy
  • Measurement scales are hierarchical: ratio
    (best) / interval / ordinal / nominal
  • Lower-level scales can always be derived from
    data which use a higher scale
  • E.g. defect rates (a ratio scale) could be
    converted to High/Medium/Low or Acceptable/Not
    Acceptable (ordinal scales)
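Here is a minimal sketch of moving down the hierarchy: ratio-scale defect rates (defects per KSLOC) are collapsed into ordinal categories. The cut-off values `low_max` and `high_min` are hypothetical and chosen only to make the example concrete.

```python
def classify_defect_rate(rate, low_max=1.0, high_min=5.0):
    """Collapse a ratio-scale defect rate into an ordinal category.
    The thresholds are hypothetical, for illustration only."""
    if rate < low_max:
        return "Low"
    if rate < high_min:
        return "Medium"
    return "High"

rates = [0.4, 2.5, 7.8, 1.0]  # hypothetical defects per KSLOC
print([classify_defect_rate(r) for r in rates])
# ['Low', 'Medium', 'High', 'Medium']
```

The conversion only works in one direction. Once the rates are binned, the original values cannot be recovered from the labels.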

11
Reexamine Central Tendencies
  • If data are nominal, only the mode is meaningful
  • If data are ordinal, both median and mode may be
    used
  • If data are ratio or interval (called "Scale" in
    SPSS), you may use the mean, median, and mode
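The rules above can be encoded in a short function built on Python's standard `statistics` module. The function name and the scale strings are hypothetical. For ordinal data the sketch calls `statistics.median_low`, which always returns one of the actual categories; the ordinary median could average two ranks and produce a value that is not a category.

```python
import statistics

def central_tendency(values, scale):
    """Return only the central-tendency measures meaningful for a scale.
    scale: 'nominal', 'ordinal', 'interval' or 'ratio' (hypothetical labels).
    Ordinal values must be coded as sortable ranks, e.g. 1 = Low, 3 = High."""
    result = {"mode": statistics.mode(values)}
    if scale == "ordinal":
        # median_low never averages two ranks, so it stays a real category
        result["median"] = statistics.median_low(values)
    elif scale in ("interval", "ratio"):
        result["median"] = statistics.median(values)
        result["mean"] = statistics.mean(values)
    return result

print(central_tendency(["Ford", "Honda", "Ford"], "nominal"))  # {'mode': 'Ford'}
print(central_tendency([1, 2, 2, 3, 3, 3], "ordinal"))         # {'mode': 3, 'median': 2}
print(central_tendency([12, 14, 14, 16], "ratio"))
```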

12
Reexamine Variables
  • Discrete variables use counting units or specific
    categories
  • Examples: makes of cars, grades, ...
  • Use nominal or ordinal scales
  • Continuous variables use integer or real measurements
  • Examples: IQ test scores, length of a table, your
    weight, etc.
  • Use ratio or interval scales

13
Refine Research Types
  • Qualitative Research tends to use Nominal and/or
    Ordinal scale variables
  • Quantitative Research tends to use Interval
    and/or Ratio scale variables

14
Frequency Distributions
  • Frequency distributions describe how many times
    each value occurs in a data set
  • They are useful for understanding the
    characteristics of a data set
  • Frequencies are the count of how many times each
    possible value appears for a variable (e.g. gender =
    male, or operating system = Windows 2000)
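Outside SPSS, Python's `collections.Counter` produces the same kind of count. The survey responses below are hypothetical.

```python
from collections import Counter

# Hypothetical responses for a nominal variable (operating system used)
os_used = ["Windows 2000", "Linux", "Windows 2000", "Mac OS", "Windows 2000", "Linux"]

freq = Counter(os_used)
# most_common() lists values from the highest count to the lowest
for value, count in freq.most_common():
    print(f"{value:<14}{count}")
```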

15
Frequency Distributions
  • They are most useful when there is a fixed and
    relatively small number of options for that
    variable
  • They're harder to use for variables which are
    numbers (either real or integer) unless there are
    only a few specific options allowed (e.g. test
    responses 1 to 5 for a multiple-choice question)

16
Generating Frequency Distributions
  • Select the command Analyze / Descriptive
    Statistics / Frequencies
  • Select one or more Variable(s)
  • Note that the Frequency (count) and Percent are
    included by default; other outputs may be
    selected under the Statistics... button
  • A bar chart can be generated as well using the
    Charts button (see another way later)

17
Sample Frequency Output
18
Analysis of Frequency Output
  • The first, unlabeled column has the values of the
    data; here, it first lists all Valid values
    (there are no invalid ones, or it would show
    those too)
  • The Frequency column is how many times that value
    appears in the data set
  • The Percent column is the percent of cases with
    that value; in the fourth row, the value 15
    appears 116 times, which is 24.5% of the 474
    total cases (116/474 × 100 = 24.5)

19
Analysis of Frequency Output
  • The Valid Percent column divides each Frequency
    by the total number of Valid cases (it equals the
    Percent column if all cases are valid)
  • The Cumulative Percent adds up the Valid Percent
    values going down the rows, so the first entry is
    the Valid Percent for the first row, the second entry
    is 11.2 + 40.1 = 51.3, the next is 51.3 + 1.3 ≈ 52.5
    (the sums use unrounded percentages, so the last
    digit can differ), and so on
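The following sketch reproduces the Frequency, Percent, Valid Percent and Cumulative Percent columns in plain Python. The data are hypothetical, and `None` marks a missing case. SPSS also prints separate rows for missing values; this sketch leaves them out. As in SPSS, the cumulative column adds up the unrounded valid percentages.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, percent, valid percent, cumulative percent).
    None marks a missing case; percent uses all cases, valid percent excludes missing."""
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        valid_pct = 100 * freq / len(valid)
        cumulative += valid_pct  # summed before any rounding
        rows.append((value, freq, 100 * freq / total, valid_pct, cumulative))
    return rows

# Hypothetical years of education; one case is missing
educ = [12, 12, 16, 8, 12, 16, None, 15]
for value, f, pct, vpct, cum in frequency_table(educ):
    print(f"{value:>3} {f:>3} {pct:6.1f} {vpct:6.1f} {cum:6.1f}")
```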

20
Generating Frequency Graphs
  • Frequency is often shown using a bar graph
  • Bar graphs help make small amounts of data more
    visible
  • To generate a frequency graph alone:
  • Click on the Charts menu and select Bar
  • Leave "Simple" selected, and leave
    "Summaries are for groups of cases" selected;
    click the Define button

21
Generating Frequency Graphs
  • Let "Bars Represent" remain "N of cases"
  • Click on the variable Educational Level (years) and
    move it into the Category Axis field
  • Click OK
  • You should get the graph on the next
    slide. Notice that the text below the X axis is
    the label for the Category Axis.

22
Sample Frequency Output
Notice that the exact same graph can be generated
from Frequencies, or just as a bar graph
23
Frequency Distributions
  • A frequency distribution is a tabulation that
    indicates the number of times a score or group of
    scores occurs
  • Bar charts are best used to graph the frequency of
    nominal and ordinal data
  • Histograms are best used to display the shape of
    interval and ratio data

24
Frequency Distribution Example
SPSS for Windows, Student Version
25
Basic Measures - Ratio
  • Used for two mutually exclusive populations (every
    case fits into one OR the other)
  • Ratio = (# of testers) / (# of developers)
  • E.g. the tester-to-developer ratio is 1:4

26
Proportions and Fractions
  • Used for multiple (> 2) populations
  • Proportion = (Number in this population)
    / (Total number in all populations)
  • The sum of all proportions equals unity (one)
  • E.g. survey results
  • Proportions are based on integer units
  • Fractions are based on real-numbered units
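Here is a small sketch of both measures. The head counts and survey counts are hypothetical. `fractions.Fraction` reduces the ratio to lowest terms, and the proportions for the survey add up to one.

```python
from fractions import Fraction

# Ratio: two mutually exclusive groups (hypothetical head counts)
testers, developers = 5, 20
ratio = Fraction(testers, developers)  # reduced to lowest terms: 1/4
print(f"Tester-to-developer ratio = {ratio.numerator}:{ratio.denominator}")  # 1:4

# Proportions: more than two groups (hypothetical survey counts)
responses = {"Agree": 45, "Neutral": 30, "Disagree": 25}
total = sum(responses.values())
proportions = {answer: n / total for answer, n in responses.items()}
print(proportions)  # {'Agree': 0.45, 'Neutral': 0.3, 'Disagree': 0.25}
print(sum(proportions.values()))  # unity, up to floating-point rounding
```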

27
Percentage
  • A proportion or fraction multiplied by 100
    becomes a percentage
  • Only report percentages when N (total population
    measured) is above 30 to 50, and always provide
    N for completeness
  • Why? Otherwise a percentage will imply more
    accuracy than the data support
  • If 2 out of 3 people like something, it's
    misleading to report that 66.667% favor it

28
Percents
  • Percent: the percentage of cases having a
    particular value
  • Raw percent: divide the frequency of the value
    by the total number of cases (including missing
    values)
  • Valid percent: calculated as above, but excluding
    missing values

29
Percent Change
  • The percent increase in a measurement is the new
    value minus the old one, divided by the old
    value (a negative result means a decrease):
    percent increase = (new - old) / old × 100
  • The percent change is the absolute value of the
    percent increase or decrease:
    percent change = |percent increase|

30
Percent Increase
  • Percent increase = (Later Value - Earlier Value) / Earlier Value × 100
  • So if a collection goes from 50,000 volumes in
    1965 to 150,000 in 1975, the percent increase
    is
  • (150,000 - 50,000) / 50,000 = 2, i.e. 200%
  • Always divide by where you started
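The two definitions map directly onto short functions. The function names are hypothetical. Reversing the volume example shows why the denominator has to be the starting value: going from 150,000 back to 50,000 is a decrease of about 66.7%, not 200%.

```python
def percent_increase(earlier, later):
    """Percent increase from earlier to later; negative means a decrease.
    Always divides by the starting (earlier) value."""
    return (later - earlier) / earlier * 100

def percent_change(earlier, later):
    """Percent change: the absolute value of the percent increase."""
    return abs(percent_increase(earlier, later))

print(percent_increase(50_000, 150_000))  # 200.0
print(percent_increase(150_000, 50_000))  # about -66.7, a decrease
```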

Carpenter and Vasu, (1978)
31
Percentiles
  • A percentile is the point in a distribution at or
    below which a given percentage of the scores fall
  • The median is the 50th percentile
  • Think of SAT scores: what percentile were
    you for verbal, math, etc.? That tells you what
    percent of people did worse than you
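The "what percent did worse than you" idea is a percentile rank. The sketch below counts only scores strictly below yours. Some definitions count ties as half a case instead, so published percentiles can differ slightly. The scores are hypothetical.

```python
import statistics

def percentile_rank(scores, x):
    """Percent of scores strictly below x (one common definition)."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical test scores
scores = [410, 480, 520, 550, 560, 600, 640, 690, 720, 780]
print(percentile_rank(scores, 640))  # 60.0
print(statistics.median(scores))     # 580.0, the 50th percentile
```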

32
Rate
  • Rate conveys the change in a measurement, such as
    over time (dx/dt): Rate = (# observed events) /
    (# of opportunities) × constant
  • Rate requires exposure to the risk being measured
  • E.g. defects per KSLOC (1000 source lines of code) =
    (# defects) / (# of SLOC) × 1000
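The rate formula in code, applied to a hypothetical project with 45 defects in 30,000 source lines of code. Setting the constant to 1000 expresses the result per thousand lines.

```python
def rate(events, opportunities, constant=1):
    """Rate = (# observed events) / (# of opportunities) * constant."""
    return events / opportunities * constant

# Hypothetical project: 45 defects found in 30,000 source lines of code
defect_rate = rate(45, 30_000, constant=1000)
print(f"{defect_rate:.1f} defects per KSLOC")  # 1.5 defects per KSLOC
```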

33
Exponential Notation
  • You might see output of the form 2.78E-12
  • The E means times ten to the power of
  • This is 2.78 × 10^-12
  • A negative exponent, e.g. -12, makes it a very
    small number
  • 10^-12 = 0.000000000001
  • 10^12 = 1,000,000,000,000
  • The leading number, here 2.78, controls whether
    it is a positive or negative number

34
Exponential Notation
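  • 5 × 10^12: a positive number much greater than 1
  • 5 × 10^-12: a positive number much less than 1
    (just above 0)
  • -5 × 10^-12: a negative number just below 0
    (its magnitude is much less than 1)
  • -5 × 10^12: a negative number of very large
    magnitude (much less than -1)
  • Reading down the list moves from large positive
    values, through 0, to large negative values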
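Python reads and writes the same E notation shown on the slide. Parsing the string and printing it in fixed-point form is a quick way to confirm what a value like 2.78E-12 means.

```python
x = float("2.78E-12")   # E notation parses directly
print(x)                # 2.78e-12
print(f"{1e-12:.12f}")  # 0.000000000001
print(f"{1e12:,.0f}")   # 1,000,000,000,000
print(f"{-5e12:.1e}")   # -5.0e+12 (the leading sign makes it negative)
```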
35
Precision
  • Keep your final output to a consistent level of
    precision (significant digits)
  • Don't report one value as 12 and another as
    11.86257523454574123
  • Pick a level of precision to match the accuracy
    of your inputs (or one digit more), and make sure
    everything is reported that way consistently
    (e.g. 12.0 and 11.9)
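Giving every value one format specifier applies the same precision throughout. This reproduces the 12.0 and 11.9 example.

```python
results = [12, 11.86257523454574123]

# One decimal place for every reported value
print([f"{value:.1f}" for value in results])  # ['12.0', '11.9']
```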

36
Data Analysis
  • Raw data is collected, such as the dates a
    particular problem was reported and closed
  • Refined data is extracted from raw data, e.g. the
    time it took a problem to be resolved
  • Derived data is produced by analyzing refined
    data, such as the average time to resolve problems
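The three stages can be traced through a small example. The problem-report dates are hypothetical.

```python
from datetime import date
from statistics import mean

# Raw data: (reported, closed) dates for hypothetical problem reports
raw = [
    (date(2024, 3, 1), date(2024, 3, 4)),
    (date(2024, 3, 2), date(2024, 3, 12)),
    (date(2024, 3, 5), date(2024, 3, 7)),
]

# Refined data: days each problem took to resolve
refined = [(closed - reported).days for reported, closed in raw]  # [3, 10, 2]

# Derived data: the average time to resolve a problem
derived = mean(refined)
print(f"Average resolution time: {derived:.1f} days")  # 5.0 days
```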

37
Descriptive Statistics
  • Descriptive statistics describe the key
    characteristics of one set of data (univariate)
  • Mean, median, mode, range (see also last week)
  • Standard deviation, variance
  • Skewness
  • Kurtosis
  • Coefficient of variation

38
Mean
  • A.k.a. Average Score
  • The mean is the arithmetic average of the scores
    in a distribution
  • Add all of the scores
  • Divide by the total number of scores
  • The mean is greatly influenced by extreme scores;
    they pull it off center

39
Mean Calculation
Holdings in 7 different libraries (X):
7400, 6500, 6200, 5900, 5100, 4300, 3800
  • Here, sum every data value: $\Sigma X = 39{,}200$
  • Mean: $\bar{X} = \Sigma X / N = 39{,}200 / 7 = 5{,}600$
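This slide's calculation in Python, using the seven holdings values exactly as listed.

```python
holdings = [7400, 6500, 6200, 5900, 5100, 4300, 3800]  # X values from the slide

total = sum(holdings)         # sum every data value
mean = total / len(holdings)  # divide by N = 7
print(total, mean)            # 39200 5600.0
```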
40
Mean with a Frequency Distribution
X (IQ)   F (Freq)   FX
140      2          280
135      1          135
132      2          264
130      1          130
128      1          128
126      1          126
125      4          500
123      1          123
120      4          480
110      3          330
101      1          101
Totals: $N = \Sigma F = 21$, $\Sigma FX = 2597$
Mean $= \Sigma FX / N = 2597 / 21 = 123.67 \approx 124$ (rounded off)
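With grouped data, each value X counts F times, so the mean is ΣFX / ΣF. The sketch below uses the (X, F) pairs from this slide.

```python
# (X, F) pairs from the IQ frequency table on this slide
table = [(140, 2), (135, 1), (132, 2), (130, 1), (128, 1), (126, 1),
         (125, 4), (123, 1), (120, 4), (110, 3), (101, 1)]

n = sum(f for _, f in table)           # N = sum of F
sum_fx = sum(x * f for x, f in table)  # sum of FX
mean = sum_fx / n
print(n, sum_fx, round(mean, 2))       # 21 2597 123.67
```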
41
Central Tendency Example
Staff salaries (N = 9):
4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000
  • Mode = 6000
  • Median = the $(N + 1)/2 = (9 + 1)/2 = 5$th value = 8000
  • Mean $= \Sigma X / N = 80{,}100 / 9 = 8{,}900$
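Python's standard `statistics` module gives the same three measures for the staff salaries.

```python
import statistics

salaries = [4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000]

print(statistics.mode(salaries))    # 6000
print(statistics.median(salaries))  # 8000, the 5th of 9 sorted values
print(statistics.mean(salaries))    # 8900
```

The mean (8900) sits above the median (8000) because the single 20000 salary pulls it up. That gap is what reporting both values is meant to reveal.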
Carpenter and Vasu, (1978)
42
Handling Extreme Values
  • In cases where you have an extreme value (high or
    low) in a distribution, it is helpful to report
    both the median and the mean
  • Reporting both values gives some indication
    (through comparison) of a skewed distribution

43
Measures of Variation
  • Measures which indicate the variation, or spread
    of scores in a distribution
  • Range (see last week)
  • Variance
  • Standard Deviation

44
Standard Deviation, Variance
  • Standard deviation is roughly the typical amount
    the data differ from the mean (average):
    $SD = \sqrt{\sum (X_i - \bar{X})^2 / (N-1)} = \sqrt{\text{Variance}}$
  • Variance is the standard deviation squared:
    $\text{Variance} = \sum (X_i - \bar{X})^2 / (N-1)$
  • Per ISO 3534-1, para 2.33 and 2.34

45
Standard Deviation
  • The standard deviation is the square root of the
    variance. It is expressed in the same units as
    the original data.
  • Since the variance is expressed in squared units,
    it doesn't make much practical sense. For
    example, what are squared books or squared
    man-hours?

46
Computing the Variance: $S^2 = \sum (X - \text{Mean})^2 / N$
  • 1. Subtract the mean from each score
  • 2. Square the result
  • 3. Sum the squares for all data points
  • 4. Divide by the N of cases
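The four steps translate line for line into Python. The function name and the eight data values are hypothetical. This version divides by N, as the formula above does.

```python
def population_variance(scores):
    """Variance computed with the four steps above (divides by N)."""
    mean = sum(scores) / len(scores)
    deviations = [x - mean for x in scores]  # 1. subtract the mean from each score
    squares = [d ** 2 for d in deviations]   # 2. square the result
    total = sum(squares)                     # 3. sum the squares
    return total / len(scores)               # 4. divide by N

print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```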

47
Divide by N or N-1???
  • You'll see different formulas for variance and
    standard deviation: some divide by N, some by
    N-1 (e.g. slides 44 and 46). Why?
  • If your data covers the entire population (you
    have all of the possible data to analyze), then
    divide by N
  • If your data covers a sample from the population,
    divide by N-1
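Python's `statistics` module has a function for each case. `pvariance` and `pstdev` divide by N and treat the data as the whole population. `variance` and `stdev` divide by N-1 and treat the data as a sample. The data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# The data are the entire population: divide by N
print(statistics.pvariance(data), statistics.pstdev(data))
# The data are a sample from a larger population: divide by N - 1
print(statistics.variance(data), statistics.stdev(data))
```

Dividing by N-1 always gives a somewhat larger result. The gap shrinks as N grows.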

48
Standard Deviation for Freq Dist.
X    F    FX    X²    FX²
17   2    34    289   578
16   4    64    256   1024
14   5    70    196   980
10   2    20    100   200
9    3    27    81    243
6    1    6     36    36
Totals: $N = \Sigma F = 17$, $\Sigma FX = 221$, $\Sigma FX^2 = 3061$
$s = \sqrt{(\Sigma FX^2 - (\Sigma FX)^2 / N) / N} = \sqrt{(3061 - 221^2 / 17) / 17} = \sqrt{(3061 - 2873) / 17} \approx 3.3$
Notice that FX² is F(X²), not (FX)².
Standard Deviation of Bookmobile Distribution
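The computational formula from this slide in Python, using the bookmobile (X, F) pairs. The comment on `sum_fx2` repeats the slide's warning: the value is squared first, then multiplied by its frequency.

```python
import math

# (X, F) pairs for the bookmobile distribution on this slide
table = [(17, 2), (16, 4), (14, 5), (10, 2), (9, 3), (6, 1)]

n = sum(f for _, f in table)                 # N = 17
sum_fx = sum(f * x for x, f in table)        # sum of FX = 221
sum_fx2 = sum(f * x ** 2 for x, f in table)  # sum of F(X^2) = 3061, not (FX)^2

s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / n)
print(round(s, 1))  # 3.3
```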
49
Std Dev Reflects Consistency
Distance from    Frequency    Frequency
Target (m)       Battery A    Battery B
 200                2            0
 150                4            1
 100                5            5
  50                7           10
   0                9           13
 -50                7           10
-100                5            5
-150                4            1
-200                2            0
Mean                0            0
Standard Dev.    102.74        65.83
Runyon and Haber (1984)
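Recomputing the two battery standard deviations from the frequencies shows that the slide's values are population figures (divided by N). Both batteries center on the target. Battery B's shots cluster more tightly, so its standard deviation is smaller. The helper name `grouped_sd` is hypothetical.

```python
import math

def grouped_sd(table):
    """Population standard deviation (divides by N) of (value, frequency) pairs."""
    n = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in table) / n)

distances = [200, 150, 100, 50, 0, -50, -100, -150, -200]
battery_a = list(zip(distances, [2, 4, 5, 7, 9, 7, 5, 4, 2]))
battery_b = list(zip(distances, [0, 1, 5, 10, 13, 10, 5, 1, 0]))

print(round(grouped_sd(battery_a), 2))  # 102.74
print(round(grouped_sd(battery_b), 2))  # 65.83
```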
50
Standard Deviation vs. Std. Error
  • To be precise, the standard error is the standard
    deviation of a statistic used to estimate a
    population parameter (per ISO 3534-1, para 2.56
    and 2.50)
  • So standard error pertains to sample data, while
    standard deviation should describe the entire
    population
  • In practice, we often (loosely) use them interchangeably

51
Skewness
  • Skewness is a measure of the asymmetry of a
    distribution.
  • The normal distribution is symmetric, and has a
    skewness value of zero.
  • A distribution with a significant positive
    skewness has a long right tail
  • Positive skewness means the mean and median are
    greater than the mode (the peak of the
    distribution)
  • Negative skewness has a long left tail.

52
Skewness
  • As a rough guide, a skewness magnitude of more
    than two (> 2 or < -2) is taken to indicate a
    significant departure from symmetry
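This sketch computes the simple moment-based skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean, each divided by N. Packages such as SPSS typically report a small-sample adjusted version of this statistic, so their values can differ somewhat, especially for small N. The data sets are hypothetical.

```python
def skewness(data):
    """Moment-based skewness g1 = m3 / m2**1.5 (moments divide by N).
    0 means symmetric; positive means a long right tail."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # 0.0, symmetric
print(skewness([1, 1, 1, 2, 10]))  # positive: long right tail
```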

From www.riskglossary.com
53
Kurtosis
  • Kurtosis is a measure of the extent to which data
    clusters around a central point
  • For a normal distribution, the value of the
    kurtosis is 3
  • The kurtosis excess (kurtosis - 3) is zero for a
    normal distribution
  • Positive kurtosis excess indicates that the data
    have longer tails than normal
  • Negative kurtosis excess indicates the data have
    shorter tails
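This sketch computes moment-based kurtosis, m4 / m2^2, which equals 3 for a normal distribution, and then subtracts 3 to get the excess. Many packages, SPSS among them, typically report a sample-adjusted excess kurtosis instead, so their numbers differ from this simple version. The five-value data set is hypothetical. It is flat, like a uniform distribution, so its excess is negative.

```python
def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2

k = kurtosis([1, 2, 3, 4, 5])
print(round(k, 2), round(k - 3, 2))  # 1.7 -1.3 (negative excess: short tails)
```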

54
Kurtosis
The curve on the right has higher kurtosis than
the curve on the left. It is more peaked at the
center, and it has fatter tails. If a
distribution's kurtosis is greater than 3, it is
said to be leptokurtic (sharp peak). If its
kurtosis is less than 3, it is said to be
platykurtic (flat peak). The two curves might have
equal standard deviations. Mesokurtic is the normal
curve, which has kurtosis 3.
From www.riskglossary.com
55
Skewness Kurtosis Example
  • From the Employee data set, use Analyze /
    Descriptive Statistics / Descriptives, select the
    salary variable
  • Under Options, select Skewness and Kurtosis
  • Skewness is 2.125, so there is significant
    positive skewness to the data
  • Kurtosis is 5.378, so the data is leptokurtic

56
Coefficient of Variation
  • The coefficient of variation (CV) is the ratio of
    the standard deviation to the mean: CV = s/m
    (per ISO 3534-1, para 2.35)
  • The smaller the CV, the more representative the
    mean is of the total distribution
  • CV lets you compare the variability of two
    different populations, even when their means differ
  • A higher CV means more variability

57
Coefficient of Variation
  • Divide the standard deviation by the mean to get
    the CV: CV = s/m
  • The smaller the decimal fraction this produces,
    the more representative is the mean for the total
    distribution
  • The larger the decimal fraction, the worse job
    the mean does of giving us a true picture of the
    distribution
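Here is a minimal sketch that compares the CVs of two data sets from earlier slides, the library holdings and the staff salaries. It uses the sample standard deviation, which is an assumption on our part; using the population version would change the values slightly. The salaries, with their one extreme value, have the larger CV, so their mean is the less representative of the two.

```python
import statistics

def coefficient_of_variation(data):
    """CV = s / m, using the sample standard deviation (divides by N - 1)."""
    return statistics.stdev(data) / statistics.mean(data)

holdings = [7400, 6500, 6200, 5900, 5100, 4300, 3800]
salaries = [4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000]

print(round(coefficient_of_variation(holdings), 3))  # 0.227
print(round(coefficient_of_variation(salaries), 3))
```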

58
Generating a Histogram
  • Frequency graphs can be generated for variables
    which have many integer or real values (e.g.
    salary), by using a histogram
  • A histogram shows how many data points fall into
    various ranges of values
  • The closest normal curve can be shown for
    comparison
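Underneath, a histogram is a count of values per fixed-width range. The sketch below bins the staff salaries from the central-tendency example into 5000-wide ranges and prints a simple text bar for each. The helper name, the bin width and the starting point are hypothetical choices, and the code assumes integer data and an integer width.

```python
from collections import Counter

def histogram_counts(data, width, start=0):
    """Count how many values fall in each bin [low, low + width).
    Assumes integer data and an integer width."""
    counts = Counter(start + (x - start) // width * width for x in data)
    lows = range(start, max(data) + 1, width)
    return [(low, low + width, counts.get(low, 0)) for low in lows]

salaries = [4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000]
for low, high, count in histogram_counts(salaries, 5000):
    print(f"{low:>6}-{high:<6} {'#' * count}")
```

Empty ranges (here 15000 to 20000) still get a row. Skipping them would make the gap before the extreme value invisible.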

59
Generating a Histogram
  • The ¾ rule is helpful for histograms
  • The tallest bar should be ¾ of the height of the
    Y axis
  • Be sure to label X and Y axes appropriately
  • Each bar shows how many data points fall
    within a range of X-axis values
  • See How to Lie with Statistics, by Darrell Huff

60
Histogram of Salary
61
Another Note on Histograms
  • SPSS will define its own bar widths for a
    histogram, e.g. how wide the range of salary
    values is for each bar
  • Later in the course, we'll look at how you can
    define your own variables to create predefined
    histogram bars

62
Pie Chart Histogram
  • A histogram can also be made in the shape of a
    pie
  • This should be limited to variables with a small
    number of possible values

63
A bad pie chart histogram
(I had to include this one just because it's
colorful)
64
This is a better example
This visually implies the percentages of data in
each value.
65
Bookmobile Data
Bookmobile examples taken from Carpenter and
Vasu (1978). Same data as used on slides 48 and 66.
66
Bookmobile Distributions
67
HISTOGRAM OF BOOKMOBILE STOPS
68
Normalizing Data
  • Some data sets are not very close to a normal
    distribution
  • Sometimes it helps to transform the independent
    variable by applying a math function to it, such
    as looking at log(x) (the logarithm of each x
    value) instead of just x

69
Normalizing Data
  • In SPSS this can be done by defining a new
    variable, such as log_x
  • Then use Transform / Compute to calculate
    log_x = LG10(x), assuming that x is the original
    variable
  • Then generate a histogram showing the normal
    curve, to see if log_x is closer to a normal
    distribution
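A quick way to see the effect of the transform outside SPSS is to compare the mean and median before and after taking log10. A mean far from the median signals skew; a mean equal to the median suggests symmetry. The data below are hypothetical and spread across several orders of magnitude. `math.log10` plays the role of SPSS's LG10.

```python
import math
import statistics

x = [10, 100, 1000, 10000, 100000]  # hypothetical, strongly right-skewed
log_x = [math.log10(v) for v in x]  # same transform as LG10(x)

# Raw data: the mean is pulled far above the median by the large values
print(statistics.mean(x), statistics.median(x))  # 22222 1000
# Transformed data: the mean and median coincide (symmetric)
print(statistics.mean(log_x), statistics.median(log_x))
```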

70
Normalizing Data
  • Who cares if we have a normal distribution?
  • Many tests in statistics can only be applied to a
    variable which has a normal distribution, so
    it's worth our while to transform the variable