Categorical Data - PowerPoint PPT Presentation

About This Presentation
Title:

Categorical Data

Description:

1,073 subjects of both genders were recruited for a study where the onset of ... Useful in testing for independence between ... Yates' Continuity Correction ...

Slides: 19
Provided by: Teo6

Transcript and Presenter's Notes

Title: Categorical Data


1
Categorical Data
2
Categorical Data Analysis
  • To identify any association between two
    categorical variables.

3
Chi-Square Test
  • Commonly denoted as χ²
  • Useful in testing for independence between
    categorical variables (e.g. genetic association
    between cases / controls)
  • Assumptions:
      • Sufficiently large counts in each cell of the
        cross-tabulation table.

4
Small Cell Counts
  • In general, require:
      (a) Smallest expected count is 1 or more
      (b) At least 80% of the cells have an expected
          count of 5 or more
  • Yates' continuity correction: provides a better
    approximation of the test statistic when the data
    are dichotomous (2 × 2)
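
The two rules of thumb and Yates' correction can be sketched in a few lines of standard-library Python. This is only an illustration: the function names are hypothetical, and the 2 × 2 counts are the four numbers quoted in the lung cancer case study (1301, 56, 1205, 152), with the row and column meanings assumed rather than taken from the slides.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_counts_ok(table):
    """Rule of thumb: every expected count >= 1 and at least 80% of them >= 5."""
    flat = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    return min(flat) >= 1 and share_at_least_5 >= 0.8


def yates_chi_square(table):
    """Chi-square statistic for a 2 x 2 table with Yates' continuity correction.

    Each |O - E| is reduced by 0.5 before squaring. This sketch does not
    guard against cells where |O - E| is already below 0.5.
    """
    expected = expected_counts(table)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Assumed layout: rows are the two groups, columns the two exposure levels.
table = [[1301, 56], [1205, 152]]
print(cell_counts_ok(table))              # rule-of-thumb check
print(round(yates_chi_square(table), 3))  # continuity-corrected statistic
```

With these counts the corrected statistic comes out at about 46.991, the value quoted on the closing case study slide.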

5
Goodness-of-fit Test
  • Null hypothesis of a hypothesized distribution
    for the data.
  • Expected frequencies calculated under the
    hypothesized distribution.
  • For example: the number of outbreaks of flu
    epidemics is charted over the period 1500 to
    1931, and the number of outbreaks each year is
    tabulated. The variable of interest counts the
    number of outbreaks occurring in each year of
    that 432-year period. E.g. there were 223 years
    with no flu outbreaks.

6
Goodness-of-fit Test
  • Hypotheses:
      H0: Data follow a Poisson distribution with
          mean 0.692
      H1: Data do not follow a Poisson distribution
          with mean 0.692
  • Note: the mean 0.692 is obtained from the sample
    mean.
  • Expected frequency for X = 0:
    432 × P(X = 0), where X ~ Poisson(0.692)
  • Test statistic: χ² = Σ (O − E)² / E, summed over
    the categories, with df = (6 − 1).
  • This yields a p-value of 0.99, indicating that
    the data are highly consistent with the null
    hypothesis, so there is no evidence to reject it.
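
A minimal sketch of the expected-frequency step, using only the Python standard library. The 432 years and the mean 0.692 come from the slides. The choice of six categories (0, 1, 2, 3, 4, and 5 or more outbreaks) is an assumption made to match the stated degrees of freedom, and the function names are hypothetical. The full observed table is not in the transcript, so the statistic function is defined but not applied to the flu data.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_frequencies(n_years, mean, max_count):
    """Expected number of years with 0, 1, ..., max_count - 1 outbreaks,
    plus a final pooled category for max_count or more outbreaks."""
    probs = [poisson_pmf(k, mean) for k in range(max_count)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return [n_years * p for p in probs]


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Six categories (0..4 and 5+) is an assumption, not stated in the slides.
expected = expected_frequencies(432, 0.692, max_count=5)
print(round(expected[0], 2))  # expected number of years with no outbreaks
```

Under these assumptions the expected number of outbreak-free years is about 216, against the 223 observed.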

7
Test for Independence
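  • Most common usage for Pearson's chi-square
    statistic.
  • Expected frequencies calculated by
    E = (row total × column total) / N,
    where N is the grand total
  • Degrees of freedom = (r − 1) × (c − 1), for a
    table with r rows and c columns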
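
The expected-frequency and degrees-of-freedom rules above can be sketched for a general r × c table as follows. The function name is hypothetical, and the example counts are the four numbers from the lung cancer case study with an assumed row/column layout.

```python
def independence_test(table):
    """Return (statistic, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in cell (i, j): row i total * column j total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson statistic without any continuity correction.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Assumed layout: rows are the two groups, columns the two exposure levels.
statistic, df, expected = independence_test([[1301, 56], [1205, 152]])
print(round(statistic, 3), df)
```

For these counts the uncorrected statistic is about 47.985 on 1 degree of freedom, in line with the figure given in the case study.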

8
Chi-Square Test
9
Chi-Square Test
10
Quantification of Effect
  • The χ²-test identifies whether there is a
    significant association between the two
    categorical variables.
  • But does not quantify the strength and direction
    of the association.
  • Need odds ratio to do this.
  • The odds ratio compares the odds of an outcome
    in one category with the odds in the other
  • Example: for the previous example on severe chest
    pain, the odds of severe chest pain in males are
    about 1.4 times the odds in females.
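
As a small illustration of how an odds ratio encodes strength and direction, the sketch below computes it from a 2 × 2 table. The chest pain counts are not in the transcript, so the example reuses the four counts from the lung cancer case study. The function names are hypothetical.

```python
def odds(events, non_events):
    """Odds of an outcome: count with the outcome over count without it."""
    return events / non_events


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]: (a / b) / (c / d)."""
    return odds(a, b) / odds(c, d)


# Counts taken from the lung cancer case study slide.
ratio = odds_ratio(1301, 56, 1205, 152)
print(round(ratio, 2))

# Swapping the two rows inverts the ratio, which is how direction shows up:
# values above 1 and below 1 describe the same association from either side.
print(round(odds_ratio(1205, 152, 1301, 56), 2))
```

An odds ratio of 1 means the odds are the same in both groups. The further the value is from 1, in either direction, the stronger the association.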

11
Odds Ratio
12
Exegesis on Epidemiology
  • Case-Control Study
      • Compare affected and unaffected individuals
      • Usually retrospective in nature
      • Temporal sequence cannot be established (timing
        for the onset of the disease)
      • No information on population incidence of the
        disease
  • Cohort Study
      • Usually random sampling of subjects within the
        population
      • Prospective, retrospective or both
      • Long follow-up; risk of loss to follow-up
      • Costly to conduct
      • Temporal sequence can be established
      • Provides information on population incidence of
        the disease

13
Confidence Intervals of Odds Ratio
  • Not straightforward to obtain confidence
    intervals for the odds ratio directly (due to
    complexity in obtaining the variance)
  • Straightforward to obtain the variance of the
    logarithm of the odds ratio.
  • The odds ratio is always reported together with
    the p-value (obtained from Pearson's chi-square
    test) and the corresponding confidence
    interval.
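
One common way to carry this out is to build the interval on the log scale and exponentiate the end points. The sketch below assumes the usual large-sample variance of the log odds ratio, 1/a + 1/b + 1/c + 1/d, which the slides do not spell out but which reproduces the 0.026 quoted in the case study. The function name and the normal quantile 1.96 are choices made here for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio of [[a, b], [c, d]].

    Returns (lower, upper, variance of log OR). z = 1.96 gives roughly 95%.
    """
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d  # large-sample approximation
    half_width = z * math.sqrt(var_log_or)
    return (
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
        var_log_or,
    )


# Counts taken from the lung cancer case study slide.
lower, upper, var_log_or = odds_ratio_ci(1301, 56, 1205, 152)
print(round(var_log_or, 3), round(lower, 2), round(upper, 2))
```

The interval is not symmetric around the odds ratio itself, because it is symmetric on the log scale.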

14
Case Study on Lung Cancer and Smoking
  • Odds and odds ratio:
    Odds ratio (OR) = (1301/56) / (1205/152) = 2.93
  • Pearson's chi-square = 47.985, on df = 1
    ⇒ p-value ≈ 0
  • Var[log(OR)] = 0.026
  • 95% confidence interval = (2.14, 4.02)
15
More Examples
16
χ² TEST FOR TREND
  • OR (smoker) = 1.52 (0.88, 2.63), p = 0.180
  • OR (ex-smoker) = 2.11 (1.00, 4.51), p = 0.081
  • Non-smoker is the reference category.
17
Procedure for Categorical Data Analysis
  • Summarise data using cross-tabulation tables,
    with percentages
  • Perform a chi-square test of independence to test
    for association between the two categorical
    variables
  • Quantify any significant association using odds
    ratios
  • Always report odds ratios with the corresponding
    95% confidence intervals

18
Case Study on Lung Cancer and Smoking
  • Chi-square test statistic = 46.991
  • p-value = 7.13 × 10⁻¹²
  • Odds ratio = 2.93
  • 95% CI = (2.14, 4.02)
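
For a statistic on 1 degree of freedom, the p-value can be checked with the standard library alone, because the upper tail of that chi-square distribution equals erfc(√(x/2)). The function name below is hypothetical, and the shortcut applies only when df = 1.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic on 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Statistic taken from the case study slide.
p_value = chi_square_p_value_df1(46.991)
print(f"{p_value:.2e}")
```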