Categorical Data - PowerPoint PPT Presentation

About This Presentation
Title:

Categorical Data

Provided by: richardu

Transcript and Presenter's Notes

Title: Categorical Data


1
Categorical Data
2
Overview
3
Outline
4
Population Proportion
  • Let p denote the proportion of entities in a
    population with a specified property. Tests
    concerning p will be based on a random sample of
    size n from the population. Provided n is small
    relative to the population size, the number of
    successes (those with the property of interest)
    has an approximately binomial distribution.
  • If n is large, then the estimator for p, X/n, is
    approximately normally distributed.

5
Test Procedure
Test procedure valid provided $np_0 \ge 10$ and
$n(1 - p_0) \ge 10$
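
The test statistic on this slide did not survive in the transcript. As a minimal sketch, assuming the procedure is the usual large-sample z test with $z = (\hat{p} - p_0) / \sqrt{p_0(1 - p_0)/n}$, the validity check and a two-sided test might look like this in Python. The function name and the sample numbers are hypothetical and are not taken from the slides.

```python
import math

def one_proportion_z_test(x, n, p0):
    """Large-sample z test of H0: p = p0 (two-sided). Hypothetical helper."""
    # Validity condition from the slide: n*p0 >= 10 and n*(1 - p0) >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation not justified for this n and p0")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Two-sided p-value from the standard normal:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up data: 62 successes in 100 trials, H0: p = 0.5.
z, p_value = one_proportion_z_test(62, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The helper raises an error instead of returning a result when the sample is too small for the normal approximation, which mirrors the condition stated on the slide.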
6
Binomial Experiment
  • The test presented is a basic binomial experiment
    where each trial has one of two possible outcomes.

7
Multinomial Experiment
  • Generalizing from the binomial experiment, we
    wish to think of cases in which there are k
    possible outcomes, $k > 2$.

8
Example
  • A store accepts three different types of credit
    cards. We observe the next n customers paying by
    credit card. Our null hypothesis will specify
    the expected proportion of each type credit card
    used. The test procedure must then evaluate
    whether the observed proportion is different from
    the expected values.

9
Example (cont.)
10
Example (cont.)
11
Test Procedure
  • We will look at the discrepancy between the
    observed and the expected frequencies with H0
    being rejected if the discrepancy is sufficiently
    large.
  • Clearly, we will use
    $(\text{observed} - \text{expected})^2$ as our
    measure of discrepancy.

12
Theorem
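  • Provided $np_i \ge 5$ for all $i$, the statistic
    $\chi^2 = \sum_{i=1}^{k} (n_i - np_i)^2 / (np_i)$,
    where $n_i$ is the observed count in category $i$,
    has approximately a chi-squared distribution with
    $k - 1$ df.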

13
Test Procedure
14
Example
  • Allied Health Corporation owns and operates
    hospitals in the northeast. Three properties
    were audited recently to establish compliance
    with Medicare billing regulations. A random
    sample of audited billings revealed the following
    numbers from each hospital: H1 - 485, H2 - 405,
    H3 - 310, for a total of 1200 billings. The
    auditors claim each billing had an equal
    probability of being from each hospital. What can
    you conclude about the auditors' selection at the
    .01 level?

15
Solution (1)
  • Let $p_i$ be the proportion of audited billings
    from hospital $i$ ($i = 1, 2, 3$). If an audited
    billing is equally likely to come from each
    hospital, then $p_i = 1/3$ for all $i = 1, 2, 3$, so
  • $H_0$: $p_1 = 1/3$, $p_2 = 1/3$, $p_3 = 1/3$
  • $H_a$: at least one proportion is incorrect
  • $\alpha = .01$
  • Reject $H_0$ if $\chi^2 > \chi^2_{.01, 2} = 9.21$
  • (using CHIINV(0.01,2))

16
Solution (2)
17
Solution (3)
  • $\chi^2 > \chi^2_{.01, 2}$, so we reject the null
    hypothesis; hence at least one of the proportions
    must be different from 1/3. Looking at the
    proportions, hospital 1 is 485/1200 = .404,
    hospital 2 is 405/1200 = .338, and hospital 3 is
    310/1200 = .258. This shows us hospital 1 is
    over-represented while hospital 3 is
    under-represented.
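
The arithmetic behind this conclusion can be reproduced from the counts given in the example. The following is a sketch only: the helper name is hypothetical, and the critical value is computed from the closed form for 2 degrees of freedom rather than with the spreadsheet's CHIINV.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [485, 405, 310]            # audited billings from H1, H2, H3
n = sum(observed)                     # 1200
expected = [n / 3] * 3                # H0: p1 = p2 = p3 = 1/3

chi2 = chi_square_statistic(observed, expected)

# For 2 degrees of freedom the upper-tail probability is exp(-x/2),
# so the critical value at level alpha is -2 * ln(alpha).
alpha = 0.01
critical = -2 * math.log(alpha)

print(f"chi-square = {chi2:.3f}, critical value = {critical:.2f}")
print("reject H0" if chi2 > critical else "do not reject H0")
```

With these counts the statistic is 38.375 against a critical value of about 9.21, which agrees with the CHIINV(0.01,2) lookup and the decision to reject.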

18
Example
  • MetroCodex, a supplier of accounting middleware
    for hospitals, has undertaken an employee survey
    to ascertain employee perspectives on how
    management responds to employee suggestions.

19
  • A management consultant predicts the outcomes as

20
  • The following data were obtained from the
    employee survey

21
Question?
  • Do the data conform to the consultant's
    expectations ($\alpha = .05$)?

22
Solution
  • Worksheet Metro in Workbook

23
Example
  • We have 200 software modules as part of a large
    development project. We will conduct code
    inspections on all modules. For 100 of the
    modules we will use technique A, which is
    tool-driven, while for the other 100 modules we
    will use technique B, which is manual. An
    inspection is considered successful if it detects
    85% of the known errors in the module. The
    success rate for inspections is 70%.

24
Hypothesis
  • $H_0$: the tool has no effect; the number of
    successful inspections would be the same with and
    without the use of the tool
  • $H_1$: the tool has an effect

25
Data
26
Analysis
  • $\chi^2 = 2.38$
  • $k = 2$ (two groups, tool and no tool)
  • $\nu = k - 1 = 2 - 1 = 1$ (degrees of freedom)
  • $\alpha = .05$
  • $\chi^2_{.05, 1} = 3.84$
  • Since $\chi^2 = 2.38$ is not greater than
    $\chi^2_{.05, 1} = 3.84$, we cannot reject the null
    hypothesis.
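
The table lookup can be cross-checked with a p-value. This sketch uses only the statistic reported on the slide (the underlying data table is not in the transcript) and the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal. The function name is hypothetical.

```python
import math

def chi2_upper_tail_1df(x):
    """P(X >= x) for a chi-squared variable with 1 degree of freedom."""
    # With 1 df, X is the square of a standard normal Z, so
    # P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

chi2 = 2.38      # statistic reported on the slide
alpha = 0.05

p_value = chi2_upper_tail_1df(chi2)
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

Comparing the p-value with the significance level leads to the same decision as comparing the statistic with the critical value 3.84.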

27
Problem
  • A particular real-time system is defined as a
    collection of three redundant processors. During
    startup, each of the three processors,
    independently, has a probability of failure of
    1/6. QA has conducted 1000 tests of the system
    during startup, resulting in the following:
  • Processors Failing   Frequency
    0                    600
    1                    330
    2                     60
    3                     10
  • Determine if the system behavior is consistent
    with expectations.

28
Solution
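
The slide's own solution is not part of this transcript. One way to approach the problem, offered as a sketch under stated assumptions: if the three processors fail independently with probability 1/6, the number failing in a test is Binomial(3, 1/6), and the observed frequencies can be compared with the binomial expected counts by a chi-square test. Merging the last two categories and using a .05 significance level are assumptions of this sketch, since the problem statement specifies neither.

```python
import math

n_processors = 3
p_fail = 1 / 6
observed = [600, 330, 60, 10]         # tests with 0, 1, 2, 3 processors failing
n_tests = sum(observed)               # 1000

# Binomial(3, 1/6) probabilities for 0..3 failures, scaled to expected counts.
expected = [
    n_tests * math.comb(n_processors, k)
    * p_fail**k * (1 - p_fail) ** (n_processors - k)
    for k in range(n_processors + 1)
]

# The expected count for 3 failures is about 4.6, below 5, so the last
# two categories are merged into "2 or more" (an assumption of this sketch).
observed_m = observed[:2] + [sum(observed[2:])]
expected_m = expected[:2] + [sum(expected[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_m, expected_m))
df = len(observed_m) - 1              # 2; no parameters were estimated

# For 2 df the upper-tail probability is exp(-x/2).
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Under these assumptions the statistic is about 1.86, well below the 2 df critical value of 5.99 at the .05 level, so the test results would be consistent with the expected failure behavior.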
29
Two-way Contingency Tables (1)
  • There are I populations of interest, each
    corresponding to a different row in the table,
    and each population is divided into the same J
    categories.
  • Test for Homogeneity - the proportion of
    individuals in a category is the same for each
    population, and that is true for every category.

30
Two-way Contingency Tables (2)
  • There is a single population of interest, with
    each individual in the population categorized
    with respect to two different factors.
  • Test for Independence - an individual's category
    with respect to one factor is independent of its
    category with respect to the other factor.

31
Data Representation
32
Homogeneity Test Procedure
33
Estimate
  • The homogeneity test can be applied as long as
    each estimated expected count is $\ge 5$.
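
The procedure slide itself is missing from the transcript. As a sketch, assuming the standard estimate of each expected cell count (row total times column total divided by the grand total) and $(I - 1)(J - 1)$ degrees of freedom, the computation could be written as follows. The function name and the counts are made up for illustration.

```python
def homogeneity_chi_square(table):
    """Chi-square statistic for an I x J table of counts (rows = populations)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Estimated expected count for cell (i, j): row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    if any(e < 5 for row in expected for e in row):
        raise ValueError("some estimated expected count is below 5")

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Made-up counts: 2 populations (rows) by 3 categories (columns).
table = [[20, 30, 50],
         [30, 30, 40]]
chi2, df, expected = homogeneity_chi_square(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The same computation applies to the test for independence; only the sampling scheme and the wording of the hypotheses differ.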

34
Example
  • A company packages a particular product in three
    different sizes, each size produced on a
    different production line. Most units pass
    inspection, but QA has identified five categories
    of nonconformance. A sample of nonconforming
    units is selected, resulting in the following
    data table:

35
Data
36
Solution
  • Production in workbook
  • Note: homogeneity uses higher values of $\alpha$
    for hypothesis testing.

37
Independence Test Procedure
38
Goodness-of-Fit
  • The chi-square test is used to test if a sample
    of data came from a population with a specific
    distribution.
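  • For the chi-square goodness-of-fit computation,
    the data are divided into k bins and the test
    statistic is defined as
    $\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$,
    where $O_i$ is the observed frequency for bin $i$
    and $E_i$ is the expected frequency for bin $i$.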

39
  • The chi-square test is an alternative to the
    Anderson-Darling and Kolmogorov-Smirnov
    goodness-of-fit tests. The chi-square
    goodness-of-fit test can be applied to discrete
    distributions such as the binomial and the
    Poisson. The Kolmogorov-Smirnov and
    Anderson-Darling tests are restricted to
    continuous distributions.

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm