Univariate and Multivariate Analysis - PowerPoint PPT Presentation

Provided by: Sai161
1
Univariate and Multivariate Analysis
Suresh Rathi, Program Consultant, The INCLEN Trust
International, New Delhi 110020
suresh_at_inclentrust.org
2
  • "We owe a lot to the Indians, who taught us how
    to count, without which no worthwhile scientific
    discovery could have been made."
    Albert Einstein

3
STATISTICS
  • Defined as the collection, compilation,
    presentation, analysis, and interpretation of data
  • When applied to the medical sciences:
    biostatistics

4
  • TYPES OF DATA

5
Data Analyses
  • Descriptive Statistics
  • Frequency Distributions and Cross-Tabulations
  • Measures of central tendency and dispersion
  • Univariate / Bivariate Analysis
  • t-tests and Analysis of Variance (ANOVA)
  • Chi-square test
  • Multivariate Analysis
  • To adjust for simultaneous effects of multiple
    factors or to control the effects of confounding
    factors on the outcome variable.

6
Descriptive analysis
  • In the first step, descriptive analysis will be
    done,
  • summarizing demographic variables by computing
    means with standard deviations for continuous
    variables and
  • percentages for categorical variables.
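The descriptive step above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper names (`summarize_continuous`, `summarize_categorical`) are invented here and do not come from the presentation.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
ages = [34, 45, 29, 52, 41, 38]                      # continuous variable
smoking = ["smoker", "non-smoker", "non-smoker",
           "smoker", "non-smoker", "non-smoker"]     # categorical variable


def summarize_continuous(values):
    """Return (mean, sample standard deviation) of a continuous variable."""
    return mean(values), stdev(values)


def summarize_categorical(values):
    """Return {category: percentage of observations} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: 100.0 * n / total for category, n in counts.items()}


age_mean, age_sd = summarize_continuous(ages)
smoking_pct = summarize_categorical(smoking)

print(f"Age: mean = {age_mean:.1f}, SD = {age_sd:.1f}")
for category, pct in smoking_pct.items():
    print(f"{category}: {pct:.1f}%")
```

`stdev` uses the n - 1 denominator, which is the usual choice when the data are a sample rather than the whole population.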

7
Univariate analysis
  • t-tests and Analysis of Variance (ANOVA)
  • Chi-square test
  • Univariate logistic regression analysis will be
    conducted by relating the outcome to each
    variable of interest, one at a time, using odds
    ratios (ORs) and their 95% confidence intervals (CIs).
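For a single two-level factor, the odds ratio and its 95% CI can be obtained directly from the 2 x 2 table, which gives the same OR as a logistic model with that one predictor. The sketch below uses the log-based (Woolf) interval, which is one common choice and an assumption here, not necessarily the method the presenter used. The counts are those of the Vitamin C example that appears later in this presentation; the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-based (Woolf) confidence interval for a 2 x 2 table.

    a = group 1 with outcome,  b = group 1 without outcome
    c = group 2 with outcome,  d = group 2 without outcome
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Vitamin C group: 76 caught cold, 24 did not; placebo group: 63 and 37.
or_, lo, hi = odds_ratio_ci(76, 24, 63, 37)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 corresponds to an association that is significant at roughly the 5% level.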

8
Epidemiology
Observation, measurement, analysis, correlation,
interpretation
  • ...the study of the distribution and determinants of
    diseases

How many? In whom? Where? When?
What? How? Why?
9
Definitions
  • HYPOTHESIS
  • A statement of belief used in the evaluation of
    population values
  • NULL HYPOTHESIS ($H_0$)
  • A claim that there is no difference between the
    population mean ($\mu$) and the hypothesized value ($\mu_0$)
  • ALTERNATE HYPOTHESIS ($H_1$)
  • A claim that disagrees with the null hypothesis
  • TEST STATISTIC
  • A statistic used to determine the relative
    position of the mean in the hypothesized
    probability distribution of sample means.

10
Definitions
  • CRITICAL REGION (REJECTION REGION)
  • Region at the far end (tail) of the distribution
  • If only one end of the distribution is involved,
    the test is referred to as a one-tailed test.
  • If both ends are involved, the test is known as
    a two-tailed test.
  • When the computed value falls in the critical
    region, we reject the null hypothesis.
  • The probability that a test statistic falls in
    the critical region when the null hypothesis is
    true is denoted by $\alpha$
  • SIGNIFICANCE LEVEL
  • Level that corresponds to the area in the critical
    region.
  • When a test statistic falls in this area, the
    result is called significant at the $\alpha$ level

11
Definitions
  • P-VALUE
  • Area in the tail(s) of a distribution beyond the
    value of the test statistic.
  • The probability that the calculated test
    statistic, or a more extreme one, occurred by
    chance alone is denoted by p
  • NON-REJECTION REGION
  • Region of the sampling distribution not included
    in $\alpha$; that is, the region located under the
    middle portion of the curve.
  • The area of the non-rejection region is denoted
    by $(1 - \alpha)$
  • TEST OF SIGNIFICANCE (Hypothesis Test)
  • Procedure used to establish the validity of a
    claim by determining whether or not the test
    statistic falls in the critical region. If it
    does, the results are referred to as Significant.
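The link between the p-value and the critical region can be illustrated with the standard normal distribution. This is a minimal sketch: the statistic `z = 2.5` is a hypothetical value, and the helper names are invented for the example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail_p(z):
    """One-tailed p-value: area to the right of z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def two_tailed_p(z):
    """Two-tailed p-value: area beyond |z| in both tails."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


z = 2.5        # hypothetical test statistic
alpha = 0.05   # significance level

p = two_tailed_p(z)
critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)   # boundary of the two-tailed critical region
in_critical_region = abs(z) > critical

print(f"p = {p:.4f}, critical value = {critical:.3f}")
print("significant" if in_critical_region else "not significant")
```

Comparing p with $\alpha$ and checking whether the statistic falls in the critical region are two views of the same decision, so they give the same answer here.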

12
PROCEDURE FOR TEST OF SIGNIFICANCE (STEPS)
  • I. State Null versus Alternate Hypothesis
  • $H_0: \mu = \mu_0$
  • $H_1: \mu \neq \mu_0$ (two-tailed)
  • or $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ (one-tailed)
  • II. Choose a significance level
  • $\alpha = \alpha_0$ ($\alpha_0$ = 0.05 or 0.01)
  • III. Compute the test statistic (Z-test, t-test)
  • $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  • $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

13
PROCEDURE FOR TEST OF SIGNIFICANCE (STEPS)
  • IV. Determine the critical Region
  • Which is the region of the Z-distribution or
    t-distribution with $\alpha/2$ in each tail.
  • V. Reject the null Hypothesis if the test
    statistic falls in the rejection Region
  • Do not reject the null Hypothesis if it falls in
    the non-rejection Region
  • VI. State appropriate conclusion

14
t-Distribution
  • Unimodal
  • Bell shaped
  • Symmetrical
  • Extends infinitely in either direction
  • Total area under the curve is equal to 1.0 (100%)
  • Areas under the curve ($\alpha$) are a function of a
    quantity called degrees of freedom (df)
  • df = n - 1
  • df measures the quantity of information
    available in one's data that can be used in
    estimating the population variance ($\sigma^2$).
  • Uses
  • When the population SD is not known
  • Sample size less than 25

15
EXAMPLE
  • A smog alert is issued when the amount of a
    particular pollutant in the air is found to be
    greater than 7 ppm. Samples collected from 16
    stations give a mean ($\bar{x}$) of 7.84 ppm with a
    standard deviation (s) of 2.01 ppm. Do
    these findings indicate that the smog alert
    criterion has been exceeded, or can the results be
    explained by chance?

16
  • 1. $H_0: \mu = 7.0$, $H_1: \mu > 7.0$
  • 2. $\alpha = 0.05$
  • 3. Test statistic
  • $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}} = \dfrac{7.84 - 7}{2.01 / \sqrt{16}} = 1.67$
  • 4. Critical region
  • Since $H_1: \mu > 7.0$ indicates a one-tailed test, we
    place all of $\alpha = 0.05$ on the positive (+ve) side.
  • From the table of the t distribution we find that,
  • df = 15
  • t = 1.753

17
  • 5. Since the calculated t = 1.67 does not fall in the
    critical region, we do not reject $H_0$.
    In other words, we conclude that the data were
    insufficient to indicate that the critical air
    pollution level of 7 ppm has been exceeded.
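The smog example can be checked numerically. The sketch below uses only the figures given on the slides (mean 7.84, s 2.01, n 16, tabulated critical value 1.753); the function name `one_sample_t` is an assumption. Evaluated at full precision, the statistic comes out at about 1.67.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic for a one-sample test: (x-bar - mu) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))


n = 16
t_stat = one_sample_t(sample_mean=7.84, hypothesized_mean=7.0, sample_sd=2.01, n=n)
df = n - 1

# One-tailed critical value for alpha = 0.05 and df = 15, taken from the
# t table as quoted on the slide (not computed here).
t_critical = 1.753

reject_h0 = t_stat > t_critical
print(f"t = {t_stat:.2f}, df = {df}, critical t = {t_critical}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```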

18
Chi-Square Test ($\chi^2$)
  • For Qualitative Data
  • Smoker or Non-Smoker
  • Normotensive or Hypertensive
  • $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
  • df = Degrees of Freedom
  • df = (c - 1)(r - 1) = (Columns - 1)(Rows - 1)

19
For Example
  • In a study we find that 76 of 100 children
    treated with Vitamin C and 63 of 100 children in
    the placebo group caught a cold. Does the rate of
    developing a cold differ between the two groups?

20
  • 1. $H_0$: The two groups are homogeneous in their
    cold-developing pattern.
  • $H_1$: The two groups are not homogeneous in their
    cold-developing pattern.
  • 2. $\alpha = 0.05$
  • 3. Critical region
  • $\chi^2$ with df = (c - 1)(r - 1) = (2 - 1)(2 - 1) = 1

21
  • 4. Test statistic
  • $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
  • Expected value $= \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

22
     O       E      O-E    (O-E)^2   (O-E)^2/E
    ---------------------------------------------
    76     69.5     6.5     42.25      0.608
    63     69.5    -6.5     42.25      0.608
    24     30.5    -6.5     42.25      1.385
    37     30.5     6.5     42.25      1.385
    ---------------------------------------------
                            Sum        3.986
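The calculation can be reproduced from the observed counts alone, with expected values derived from the row and column totals. In the sketch below the function name `chi_square` is an assumption, and the critical value 3.841 (df = 1, $\alpha$ = 0.05) is taken from a standard chi-square table rather than from the slides.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value = row total x column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


observed = [
    [76, 24],  # Vitamin C group: caught cold, did not catch cold
    [63, 37],  # placebo group:   caught cold, did not catch cold
]

chi2, df = chi_square(observed)
critical_value = 3.841  # standard table value for df = 1, alpha = 0.05

print(f"chi-square = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 > critical_value else "Do not reject H0")
```

Because the statistic is larger than the tabulated critical value, the hypothesis that the two groups are homogeneous would be rejected at the 5% level under these assumptions.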

23
Multivariate analysis
  • Multiple models
  • Linear regression
  • Logistic regression
  • Cox model
  • Poisson regression
  • Loglinear model
  • Discriminant analysis
  • Choice of the tool according to the objectives,
    the study, and the variables

24
Multiple Regression
25
Multiple Regression
Regression Analysis is the
estimation of the linear relationship between a
dependent variable and one or more independent
variables or covariates.
26
Multiple Regression
  • Linear
  • Logistic
  • Independent variables
  • Dependent variable

27
Simple linear regression
  • Relation between 2 continuous variables (SBP and
    age)
  • Regression coefficient $b_1$
  • Measures association between y and x
  • Amount by which y changes on average when x
    changes by one unit
  • Least squares method

[Figure: plot of y against x with the fitted line; the slope is marked]
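The least squares estimates for a simple regression line $y = b_0 + b_1 x$ can be computed directly from the sums of squares. The sketch below uses hypothetical age and SBP values made up for the example; `least_squares` is an invented helper name.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: average change in y per unit change in x
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data, invented for illustration only.
age = [30, 40, 50, 60, 70]         # years
sbp = [118, 124, 131, 139, 143]    # mmHg

b0, b1 = least_squares(age, sbp)
print(f"SBP = {b0:.1f} + {b1:.2f} * age")
```

With these made-up values the slope says that SBP rises by 0.65 mmHg on average for each additional year of age.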
28
Multiple linear regression
  • Relation between a continuous variable and a set
    of continuous variables $x_i$
  • Partial regression coefficients $b_i$
  • Amount by which y changes on average when $x_i$
    changes by one unit and all the other $x_i$
    remain constant
  • Measures association between $x_i$ and y adjusted
    for all other $x_i$
  • Example
  • SBP versus age, weight, height, etc
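Partial regression coefficients can be obtained by solving the normal equations. The sketch below does this in plain Python with Gaussian elimination; it is one possible approach, not the presenter's method. The data are hypothetical and were constructed without noise from SBP = 60 + 0.5 x age + 0.4 x weight, so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_regression(predictors, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data: (age in years, weight in kg) and SBP in mmHg.
predictors = [(30, 60), (40, 72), (50, 65), (60, 80), (70, 70), (45, 90)]
sbp = [99.0, 108.8, 111.0, 122.0, 123.0, 118.5]

b0, b_age, b_weight = multiple_regression(predictors, sbp)
print(f"SBP = {b0:.1f} + {b_age:.2f} * age + {b_weight:.2f} * weight")
```

Here `b_age` is the average change in SBP per year of age with weight held constant, which is the "adjusted" interpretation described above.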

29
Multiple linear regression
  • Equivalent terms (y on the left, x on the right)
  • Predicted variable   |  Predictor variables
  • Response variable    |  Explanatory variables
  • Outcome variable     |  Covariables
  • Dependent variable   |  Independent variables

30
  • Multiple Logistic Regression

31
Multivariate analysis
  • Before conducting multivariate analysis,
    association among independent variables will be
    checked by the chi-square test. All the variables
    meeting the selection criteria will be entered
    one by one, starting with the most significant
    factor from the univariate analysis.
  • Selection of the final model will be based on
  • Parsimony (good sense),
  • Biological interpretability, and
  • Statistical significance.
  • The adjusted odds ratios (ORs) and their 95%
    confidence intervals (CIs) will be computed using
    the parameter estimates of the final model. The
    dependent variable will be dichotomous.
  • P-values will be noted to assess the model fit.
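In a logistic model, an adjusted OR is the exponential of the fitted coefficient, and a Wald 95% CI is the exponential of the coefficient plus or minus 1.96 standard errors. The sketch below assumes the coefficient and its standard error have already been estimated by some fitting routine; the values 0.7 and 0.25 are hypothetical, as is the function name.

```python
import math


def adjusted_or_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient to an OR with a Wald CI."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical parameter estimate and standard error from a final model.
beta_smoking = 0.7
se_smoking = 0.25

adj_or, ci_low, ci_high = adjusted_or_with_ci(beta_smoking, se_smoking)
print(f"Adjusted OR = {adj_or:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```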

32
THANKS