Univariate and Multivariate Analysis

Slides: 33

Provided by: Sai161

1
Univariate and Multivariate Analysis
Suresh Rathi, Program Consultant
The INCLEN Trust International, New Delhi 110020
suresh_at_inclentrust.org
2

We owe a lot to the Indians, who taught us how
to count, without which no worthwhile scientific
discovery could have been made
Albert Einstein

3
STATISTICS

Defined as the
Collection
Compilation
Presentation
Analysis
Interpretation
OF DATA
When applied to the medical sciences:
Biostatistics

TYPES OF DATA

5
Data Analyses

Descriptive Statistics
Frequency Distributions and Cross-Tabulations
Measures of central tendency and dispersion
Univariate / Bivariate Analysis
t-tests and Analysis of Variance (ANOVA)
Chi-square test
Multivariate Analysis
To adjust for simultaneous effects of multiple
factors or to control the effects of confounding
factors on the outcome variable.

6
Descriptive analysis

In the first step, descriptive analysis will be
done, summarizing demographic variables by computing
means with standard deviations for continuous
variables and
percentages for categorical variables.
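
As a rough illustration of this descriptive step, the sketch below computes a mean with its standard deviation for a continuous variable and percentages for a categorical one. The data and the names `ages` and `smoking_status` are hypothetical and not taken from the presentation.

```python
import statistics
from collections import Counter

# Hypothetical demographic data (not from the presentation).
ages = [34, 45, 29, 52, 41, 38, 47, 33]
smoking_status = ["smoker", "non-smoker", "non-smoker", "smoker",
                  "non-smoker", "non-smoker", "smoker", "non-smoker"]

# Continuous variable: mean and sample standard deviation (n - 1 denominator).
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)

# Categorical variable: percentage of subjects in each category.
counts = Counter(smoking_status)
percentages = {level: 100 * n / len(smoking_status)
               for level, n in counts.items()}

print(f"Age: mean = {mean_age:.1f}, SD = {sd_age:.1f}")
for level, pct in percentages.items():
    print(f"{level}: {pct:.1f}%")
```

The sample standard deviation is used here on the assumption that the subjects are a sample from a larger population.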

7
Univariate analysis

t-tests and Analysis of Variance (ANOVA)
Chi-square test
Univariate logistic regression analysis will be
conducted for each variable of interest, one at a
time against the outcome, using odds ratios (ORs)
and their 95% confidence intervals (CIs).
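
For a single binary factor, the odds ratio and its 95% confidence interval can be obtained directly from a 2x2 table. The sketch below uses the log-based (Woolf) interval, which is one common choice and an assumption here, since the slides do not say which method is intended. The counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval.

    a = exposed with outcome,    b = exposed without outcome
    c = unexposed with outcome,  d = unexposed without outcome
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: hypertension among 100 smokers and 100 non-smokers.
or_est, lower, upper = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {or_est:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a result that is significant at roughly the 5% level.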

8
Epidemiology
Observation, measurement, analysis, correlation,
interpretation

...the study of the Distribution and Determinants of
diseases

How many? In whom? Where?, When?
What, How, Why
9
Definitions

HYPOTHESIS
A statement of belief used in the evaluation of
population values
NULL HYPOTHESIS (Ho)
A claim that there is no difference between the
population mean ($\mu$) and the hypothesized value ($\mu_0$)
ALTERNATE HYPOTHESIS (H1)
A claim that disagrees with the Null Hypothesis
TEST STATISTIC
A statistic used to determine the relative
position of the mean in the hypothesized
probability distribution of sample means.

10
Definitions

CRITICAL REGION (REJECTION REGION)
Region on the far end of the distribution
If only one end of the distribution is involved,
the test is referred to as a one-tailed test.
If both ends are involved, the test is known as
a two-tailed test.
When the computed value falls in the critical
region, we reject the null Hypothesis.
The probability that a test statistic falls in
the critical region is denoted by $\alpha$
SIGNIFICANCE LEVEL
Level that corresponds to the area in critical
region.
When a test statistic falls in this area, the
result is called significant at the $\alpha$ level

11
Definitions

P-VALUE
Area in the tail(s) of a distribution beyond the
value of the test statistic.
The probability of obtaining the calculated test
statistic, or a more extreme one, by chance alone
when the null hypothesis is true; denoted by p
NON-REJECTION REGION
Region of the sampling distribution not included
in $\alpha$; it is located under the middle portion
of the curve.
Non-Rejection Region is denoted by ($1 - \alpha$)
TEST OF SIGNIFICANCE (Hypothesis Test)
Procedure used to establish the validity of a
claim by determining whether or not the test
statistic falls in the critical region. If it
does, the results are referred to as Significant.

12
PROCEDURE FOR TEST OF SIGNIFICANCE (STEPS)

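I. State Null versus Alternate Hypothesis
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$ (two-tailed)
$H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$ (one-tailed)
II. Choose a significance level
$\alpha = \alpha_0$ ($\alpha_0 = 0.05$ or $0.01$)
III. Compute the test statistic (Z-test, t-test)
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$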

13
PROCEDURE FOR TEST OF SIGNIFICANCE (STEPS)

IV. Determine the critical Region
Which is the region of the Z-distribution or
t-distribution with $\alpha/2$ in each tail (for a
two-tailed test).
V. Reject the null Hypothesis if the test
statistic falls in the rejection Region
Do not reject the null Hypothesis if it falls in
the non-rejection Region
VI. State appropriate conclusion
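
The six steps can be sketched for a two-tailed Z-test as follows. This is a minimal illustration that assumes the population standard deviation is known. The function name `one_sample_z_test` and the blood pressure figures are hypothetical, and the critical value comes from Python's standard `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z-test; sigma is the known population SD."""
    # Step III: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step IV: critical value leaving alpha/2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Two-tailed p-value: area beyond |z| in both tails.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step V: reject H0 if the statistic falls in the critical region.
    reject = abs(z) > z_crit
    return z, z_crit, p_value, reject

# Hypothetical data: mean SBP of 124 in 64 subjects, H0: mu = 120, sigma = 16.
z, z_crit, p_value, reject = one_sample_z_test(124, 120, 16, 64)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level lead to the same decision.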

14
t- Distribution

Unimodal
Bell Shaped
Symmetrical
Extends infinitely in either direction
Total area under the curve is equal to 1.0 (100%)
Areas under the curve ($\alpha$) are a function of a
quantity called degrees of freedom (df)
df = n - 1
df measures the quantity of information
available in one's data that can be used in
estimating the population variance ($\sigma^2$).
Uses
When population SD is not known
Sample size less than 25

15
EXAMPLE

A smog alert is issued when the amount of a
particular pollutant in the air is found to be
greater than 7 ppm. Samples collected from 16
stations gave a mean ($\bar{x}$) of 7.84 ppm with a
standard deviation (s) of 2.01. Do
these findings indicate that the smog alert
criterion has been exceeded, or can the results be
explained by chance?

1. $H_0: \mu = 7.0$, $H_1: \mu > 7.0$
2. $\alpha = 0.05$
3. Test statistic
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{7.84 - 7}{2.01 / \sqrt{16}} = 1.67$$
4. Critical region
Since $H_1: \mu > 7.0$ indicates a one-tailed test, we
place all of $\alpha = 0.05$ on the positive side.
From the table of the t distribution we find that, for
df = 15,
t = 1.753

5. Since the calculated t = 1.67 does not fall in the
critical region, we do not reject $H_0$.
That is, we conclude that the data were
insufficient to indicate that the critical air
pollution level of 7 ppm has been exceeded.
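
The arithmetic of this example can be checked with a few lines of Python. The sketch takes the critical value 1.753 from the t table as quoted in the example rather than computing it. Carried to more decimals, the statistic comes out at about 1.67, which is still below the critical value.

```python
import math

# Figures from the smog example.
n = 16          # number of stations
x_bar = 7.84    # sample mean (ppm)
s = 2.01        # sample standard deviation
mu0 = 7.0       # hypothesized mean under H0

df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# One-tailed critical value for df = 15, alpha = 0.05, as quoted from the t table.
t_crit = 1.753

reject = t_stat > t_crit
print(f"df = {df}, t = {t_stat:.2f}, critical t = {t_crit}")
print("Reject H0" if reject else "Do not reject H0")
```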

18
Chi-Square Test ($\chi^2$)

For Qualitative Data
Smoker or Non-Smoker
Normotensive or Hypertensive
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
df (degrees of freedom)
$= (c - 1)(r - 1)$ = (columns - 1)(rows - 1)

19
For Example

In a study we find that 76 out of 100 children
treated with Vit C and 63 of 100 in the placebo group
caught cold. Does the development of colds differ
between the two groups?

1. Ho: The two groups are homogeneous in their
cold developing pattern.
H1: The two groups are not homogeneous in their
cold developing pattern.
2. $\alpha = 0.05$
3. Critical region
$\chi^2$ with df $= (c - 1)(r - 1) = (2 - 1)(2 - 1) = 1$

4. Test statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\text{Expected value} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

O     E      O-E    (O-E)^2   (O-E)^2/E
-----------------------------------------
76    69.5   +6.5   42.25     0.608
63    69.5   -6.5   42.25     0.608
24    30.5   -6.5   42.25     1.385
37    30.5   +6.5   42.25     1.385
-----------------------------------------
                    $\chi^2$ = 3.986
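
The calculation above can be reproduced as a short sketch. The transcript does not show the critical value or the conclusion, so the tabulated value 3.841 (df = 1, significance level 0.05) is supplied here as an assumption from standard chi-square tables. No continuity correction is applied, to match the hand calculation.

```python
import math

# Observed counts from the example: [caught cold, did not catch cold].
observed = [[76, 24],   # Vit C group
            [63, 37]]   # placebo group

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected value = row total * column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: critical value for df = 1, alpha = 0.05, from standard tables.
chi2_crit = 3.841

# For df = 1 only, the upper-tail probability has this closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))

reject = chi2 > chi2_crit
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

Under these assumptions the statistic slightly exceeds the critical value, so the hypothesis of homogeneity would be rejected at the 0.05 level, though only narrowly.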

23
Multivariate analysis

Multiple models
Linear regression
Logistic regression
Cox model
Poisson regression
Loglinear model
Discriminant analysis
Choice of the tool according to the objectives,
the study, and the variables

24
Multiple Regression
25
Multiple Regression
Regression Analysis is the
estimation of the linear relationship between a
dependent variable and one or more independent
variables or covariates.
26
Multiple Regression

Linear
Logistic
Independent variables
Dependent variable

27
Simple linear regression

Relation between 2 continuous variables (SBP and
age)
Regression coefficient b1
Measures association between y and x
Amount by which y changes on average when x
changes by one unit
Least squares method

[Figure: regression line of y against x, with the slope marked]
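
The least squares estimates of the slope and intercept can be sketched as below. The ages and SBP readings are hypothetical, and `least_squares_line` is an illustrative helper name, not something from the presentation.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope: average change in y per unit change in x
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [30, 40, 50, 60, 70]
sbp = [118, 124, 131, 135, 142]

b0, b1 = least_squares_line(age, sbp)
print(f"SBP = {b0:.1f} + {b1:.2f} * age")
```

With these made-up numbers, SBP rises on average by about 0.59 mmHg for each additional year of age.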
28
Multiple linear regression

Relation between a continuous variable and a set
of i continuous variables
Partial regression coefficients bi
Amount by which y changes on average when $x_i$
changes by one unit and all the other x variables
remain constant
Measures association between $x_i$ and y adjusted
for all other x variables
Example
SBP versus age, weight, height, etc
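
A minimal sketch of how partial regression coefficients can be estimated is given below, solving the normal equations with plain Python. In practice a statistical package would normally be used. The data are hypothetical and were constructed so that SBP = 90 + 0.5 * age + 0.2 * weight exactly, which makes the fitted coefficients easy to check. `fit_multiple_regression` is an illustrative name.

```python
def fit_multiple_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values; an intercept column is added.
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical subjects: (age in years, weight in kg) and SBP in mmHg.
predictors = [(30, 60), (40, 80), (50, 70), (60, 90), (70, 65), (45, 75)]
sbp = [117.0, 126.0, 129.0, 138.0, 138.0, 127.5]

b0, b_age, b_weight = fit_multiple_regression(predictors, sbp)
print(f"SBP = {b0:.1f} + {b_age:.2f} * age + {b_weight:.2f} * weight")
```

Here `b_age` is the average change in SBP per year of age with weight held constant, which is the adjusted interpretation described above.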

29
Multiple linear regression

Predicted variable      Predictor variables
Response variable       Explanatory variables
Outcome variable        Covariables
Dependent variable      Independent variables

Multiple Logistic Regression

31
Multivariate analysis

Before conducting multivariate analysis,
association among independent variables will be
checked by chi-square test. All the variables
meeting the selection criteria will be entered
one by one, starting with the highly significant
factor from the univariate analysis.
Selection of the final model will be based on
Parsimony (good sense),
Biological interpretability and
Statistical significance.
The adjusted odds ratios (ORs) and their 95%
confidence intervals (CIs) will be computed using
the estimates of parameters of the final model. The
dependent variable will be dichotomous.
P-values will be noted to assess the model fit.
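
Once a logistic model has been fitted, each adjusted odds ratio is the exponential of the corresponding coefficient, and its 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below assumes that the coefficients and standard errors have already been estimated by a statistical package. The variable names and values in `final_model` are hypothetical.

```python
import math

def adjusted_or(coef, se, z=1.96):
    """Odds ratio and approximate 95% CI from a logistic regression
    coefficient (log-odds scale) and its standard error."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))

# Hypothetical final-model estimates: name -> (coefficient, standard error).
final_model = {
    "smoker": (0.85, 0.30),
    "age_per_10_years": (0.40, 0.12),
}

results = {name: adjusted_or(coef, se)
           for name, (coef, se) in final_model.items()}

for name, (or_est, lower, upper) in results.items():
    print(f"{name}: adjusted OR = {or_est:.2f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Each odds ratio is adjusted for the other variables in the model because the coefficients are estimated jointly.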

32
THANKS
