Title: Fundamentals of Research Project Planning: Hypotheses, Questions, Objectives, and Indicators.
1Introduction to Biostatistics
Dr. M. H. Rahbar, Professor of Biostatistics, Department of Epidemiology
Director, Data Coordinating Center
College of Human Medicine, Michigan State University
2What does STATISTICS mean?
- The word "statistics" has several meanings
- It is frequently used to refer to recorded data
- Statistics also denotes characteristics calculated for a set of data, for example, the sample mean
- Statistics also refers to statistical methodology: the techniques and procedures dealing with the design of experiments and with the collection, organization, and analysis of the information contained in a data set to make inferences about the population parameters
3What do statisticians do?
- To guide the design of an experiment or survey prior to data collection
- To analyze data using proper statistical procedures and techniques
- To present and interpret results to researchers and other decision makers, including government and industry
4WHY STUDY STATISTICS?
- Knowledge of statistics is essential for people going into research, management, or graduate study
- A basic understanding of statistics is useful for conducting investigations and presenting results effectively
- An understanding of statistics can help anyone discriminate between fact and fancy in daily life
- A course in statistics should help one know when, and for what, a statistician should be consulted
5Definition of Population and Sample
A population is a set of measurements of interest to the researcher.
Examples:
1. Income of households living in Karachi
2. The number of children in families living in Pakistan
3. The health status of adults in a community
A subset of the population is called a sample. A sample is usually selected such that it is representative of the population.
6Descriptive and Inferential Statistics
1. Descriptive statistics deal with the enumeration, organization, and graphical representation of data.
2. Inferential statistics are concerned with reaching conclusions from incomplete information, that is, generalizing from the specific sample.
An example of inferential statistics is using available information about the health status of people in a sample to draw inferences about the underlying population from which the sample is selected.
7INFERENTIAL STATISTICS
- The objective of inferential statistics is to make inferences about the population parameters based on the information contained in the sample.
- Estimation (e.g., estimating the prevalence of hypertension among adults living in Karachi)
- Hypothesis testing (e.g., testing the effectiveness of a new drug for reducing cholesterol levels)
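As an illustration of estimation, the sketch below draws a simple random sample from a simulated population and uses the sample proportion to estimate the prevalence. The population size, the 25% true prevalence, the sample size, and the function name `estimate_prevalence` are all assumptions made for the example, not figures from the lecture.

```python
import random

def estimate_prevalence(sample):
    """Return the sample proportion of positives (1 = has condition, 0 = does not)."""
    return sum(sample) / len(sample)

# Hypothetical population of 10,000 adults with a true prevalence of 25% (assumed).
rng = random.Random(42)
population = [1] * 2500 + [0] * 7500
rng.shuffle(population)

# Draw a simple random sample of 500 adults and estimate the population parameter.
sample = rng.sample(population, 500)
p_hat = estimate_prevalence(sample)
print(f"Estimated prevalence: {p_hat:.3f} (true value in this simulation: 0.250)")
```

In a real study the true prevalence is unknown; the sample proportion is the estimate, and it varies from sample to sample.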
8Sources of Data
- Data may come from different sources
- Surveillance systems (e.g., NIH)
- Planned surveys (Government, Universities, NGOs)
- Experiments (Pharmaceutical Companies)
- Health Organizations (Administrative Data sets)
- Private sector (Banks, Companies, etc)
- Government (All government agencies)
- Here we will focus on surveys and experiments
- What is the difference between a survey and an experiment?
9Difference between Surveys and Experiments
Survey data represent observations of events or phenomena over which few, if any, controls are imposed (e.g., assessing the association between different lifestyles and heart disease).
In an experiment, we design a research plan purposely to impose controls over the amount of exposure (treatment), such as exposure to a drug (e.g., clinical trials).
10Sampling Methods
- Random Sampling (Simple)
- Systematic Sampling
- Stratified Sampling
- Cluster Sampling
- Convenience Sampling
- More complex sampling
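The first four methods in this list can be sketched in a few lines of code. The sampling frame of 100 household IDs, the urban/rural strata, and all function names below are hypothetical and chosen only for illustration.

```python
import random

rng = random.Random(0)
# Hypothetical sampling frame: 100 household IDs, each tagged with an assumed stratum.
frame = [{"id": i, "area": "urban" if i < 60 else "rural"} for i in range(100)]

def simple_random_sample(units, n):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n):
    """Pick a random start, then take every k-th unit, where k = len(units) // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def stratified_sample(units, key, fraction):
    """Draw a simple random sample of the same fraction within each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def cluster_sample(units, cluster_size, n_clusters):
    """Split units into consecutive clusters, then keep every unit in randomly chosen clusters."""
    clusters = [units[i:i + cluster_size] for i in range(0, len(units), cluster_size)]
    picked = rng.sample(clusters, n_clusters)
    return [u for c in picked for u in c]

srs = simple_random_sample(frame, 10)
sys_s = systematic_sample(frame, 10)
strat = stratified_sample(frame, "area", 0.1)
clus = cluster_sample(frame, 10, 2)
print(len(srs), len(sys_s), len(strat), len(clus))
```

Convenience sampling is omitted because it has no random mechanism to simulate: the units are simply those that are easiest to reach.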
11Some Epidemiologic Studies
Retrospective Studies
Retrospective studies gather past data from selected cases and controls to determine differences, if any, in the exposure to a suspected factor. They are commonly referred to as case-control studies.
Prospective Studies
Prospective studies are usually cohort studies in which one enrolls a group of healthy people and follows them over a certain period to determine the frequency with which a disease develops.
12Qualitative and Quantitative Variables
Examples of qualitative variables are occupation, sex, and marital status.
Variables that yield observations that can be measured are considered to be quantitative variables. Examples of quantitative variables are weight, height, and age.
Quantitative variables can further be classified as discrete or continuous.
13VARIABLES TYPES
- Categorical variables (e.g., sex, marital status, income category)
- Continuous variables (e.g., age, income, weight, height, time to achieve an outcome)
- Discrete variables (e.g., number of children in a family)
- Binary or dichotomous variables (e.g., response to any Yes or No type of question)
14VARIABLES SCALE
- SCALE OF VARIABLE
- Nominal Scale
- Ordinal Scale
- Interval Scale
- Ratio Scale
15Scale of Data
1. Nominal: These data do not represent an amount or quantity (e.g., marital status, sex).
2. Ordinal: These data represent an ordered series of relationships (e.g., level of education).
3. Interval: These data are measured on an interval scale having equal units but an arbitrary zero point (e.g., temperature in Fahrenheit).
4. Ratio: These data are measured on a scale with a true zero, such as weight, so one value can be compared meaningfully with another (say, 100 kg is twice 50 kg).
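The practical difference between the last two scales can be checked numerically: a ratio of two weights is the same in any unit, while a ratio of two temperatures changes with the unit because the zero point is arbitrary. The sketch below uses the standard Fahrenheit-to-Celsius formula and an approximate kilogram-to-pound factor; the function names are chosen only for this example.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

# Ratio scale: weight has a true zero, so "twice as heavy" holds in any unit.
ratio_kg = 100 / 50
ratio_lb = kg_to_lb(100) / kg_to_lb(50)

# Interval scale: the zero point is arbitrary, so the ratio depends on the unit.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

# Differences on an interval scale stay meaningful: equal gaps in F are equal gaps in C.
gap_c_low = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
gap_c_high = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(90)

print(ratio_kg, round(ratio_lb, 6))   # both ratios equal 2
print(ratio_f, round(ratio_c, 2))     # 2.0 in Fahrenheit but 3.78 in Celsius
```

So 100 °F is not "twice as hot" as 50 °F in any meaningful sense, whereas 100 kg really is twice 50 kg.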
16VARIABLES IN THE PROTOCOL
- TYPES OF VARIABLE
- independent
- dependent
- intermediate
- confounding
17Independent Variable
- The characteristic being observed and/or measured that is hypothesized to influence an event or outcome (dependent variable).
- NOTE
- The independent variable is not influenced by the event or outcome, but may cause it or contribute to its variation.
18Dependent Variable
- A variable whose value is dependent on the effect of other variables (i.e., independent variables) in the relationship being studied. Synonyms: outcome or response variable.
- NOTE
- An event or outcome whose variation we seek to explain or account for by the influence of independent variables.
19Intermediate Variable
- A variable that occurs in a causal pathway from an independent to a dependent variable. Synonyms: intervening, mediating.
- NOTES
- It produces variation in the dependent variable, and is caused to vary by the independent variable.
- Such a variable is associated with both the dependent and independent variables.
20Confounding Variable
- A factor (that is itself a determinant of the outcome) that distorts the apparent effect of a study variable on the outcome.
- NOTE
- Such a factor may be unequally distributed among the exposed and the unexposed, and thereby influence the apparent magnitude and even the direction of the effect.
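A small numerical sketch can make this distortion concrete. All counts below are hypothetical (they are not from the lecture), with age group playing the role of the confounder: age raises the risk of disease and is unequally distributed between the exposed and the unexposed. The risk ratio (risk in the exposed divided by risk in the unexposed) is used here as one common way to express the apparent effect.

```python
# Hypothetical counts (not from the lecture): (cases, total) by exposure within each age group.
data = {
    "young": {"exposed": (10, 100), "unexposed": (40, 400)},
    "old":   {"exposed": (120, 400), "unexposed": (30, 100)},
}

def risk(cases, total):
    return cases / total

def risk_ratio(exposed, unexposed):
    return risk(*exposed) / risk(*unexposed)

# Within each age group the exposed and the unexposed have the same risk.
stratum_rr = {age: risk_ratio(g["exposed"], g["unexposed"]) for age, g in data.items()}

# Ignoring age (the crude analysis) pools the groups and makes exposure look harmful,
# because the exposed are mostly old and older people have a higher risk.
def pooled(group):
    cases = sum(data[age][group][0] for age in data)
    total = sum(data[age][group][1] for age in data)
    return cases, total

crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
print(stratum_rr)          # risk ratio of 1.0 in both age groups
print(round(crude_rr, 2))  # 1.86
```

The exposure has no effect within either age group, yet the crude comparison suggests an 86% higher risk, purely because of the unequal age distribution.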
21Organizing Data
- Frequency Table
- Frequency Histogram
- Relative Frequency Histogram
- Frequency polygon
- Relative Frequency polygon
- Bar chart
- Pie chart
- stem-and-leaf display
- Box Plot
22Frequency Table
Suppose we are interested in studying the number of children in the families living in a community. The following data have been collected based on a random sample of n = 30 families from the community:
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6
Organize these data in a frequency table.
23X = No. of Children    Count (Freq.)    Relative Freq.
0                        2                2/30 = 0.067
1                        3                3/30 = 0.100
2                        5                5/30 = 0.167
3                        5                5/30 = 0.167
4                        6                6/30 = 0.200
5                        4                4/30 = 0.133
6                        2                2/30 = 0.067
7                        2                2/30 = 0.067
8                        1                1/30 = 0.033
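The same table can be produced programmatically. The following is a minimal sketch using Python's standard `collections.Counter` on the 30 observations listed above; the variable names are chosen only for this example.

```python
from collections import Counter

children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
            3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

counts = Counter(children)   # value -> number of families with that many children
n = len(children)

print(f"{'X':>2} {'Freq':>5} {'Rel. freq':>10}")
for x in sorted(counts):
    # Relative frequency is the count divided by the sample size.
    print(f"{x:>2} {counts[x]:>5} {counts[x] / n:>10.3f}")
```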
25Frequency Table
Now suppose we need to construct a similar frequency table for the age of patients with heart-related problems in a clinic. The following data have been collected based on a random sample of n = 30 patients who went to the emergency room of the clinic for heart-related problems. The measurements are:
42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.
26Age Groups    Frequency    Relative Frequency
32-36.99        2            2/30 = 0.067
37-41.99        3            3/30 = 0.100
42-46.99        3            3/30 = 0.100
47-51.99        3            3/30 = 0.100
52-56.99        8            8/30 = 0.267
57-61.99        3            3/30 = 0.100
62-66.99        4            4/30 = 0.133
67-72           4            4/30 = 0.133
Total           n = 30       1.00
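For grouped data, each observation is first assigned to its class interval and then counted. The sketch below tallies the 30 ages listed on the previous slide directly into the same age groups (width 5, starting at 32, with 72 included in the last group); the helper name `class_index` is an assumption made for this example.

```python
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# Class lower limits of width 5 starting at 32; the last class is closed at 72.
lower_limits = list(range(32, 72, 5))   # 32, 37, ..., 67

def class_index(age):
    """Index of the class containing age; 72 is folded into the last class (67-72)."""
    return min((age - 32) // 5, len(lower_limits) - 1)

freq = [0] * len(lower_limits)
for a in ages:
    freq[class_index(a)] += 1

n = len(ages)
for lo, f in zip(lower_limits, freq):
    hi = "72" if lo == 67 else f"{lo + 4.99:.2f}"
    print(f"{lo}-{hi:<6} {f:>3} {f / n:.3f}")
```

Tallying by code is a useful check on a hand-built table, since grouped counts are easy to miscount.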
27Measures of Central Tendency
Where is the heart of the distribution?
1. Mean
2. Median
3. Mode
28Sample Mean
The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations. For a sample of five household incomes, 6,000, 10,000, 10,000, 14,000, 50,000, the sample mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{6000 + 10000 + 10000 + 14000 + 50000}{5} = 18000$$
29Sample Median
In a list ranked from the smallest measurement to the largest, the median is the middle value. In our example of five household incomes, first we rank the measurements:
6,000, 10,000, 10,000, 14,000, 50,000
The sample median is 10,000.
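The three measures of central tendency can be computed for the five household incomes with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen only for this example.

```python
import statistics

incomes = [6000, 10000, 10000, 14000, 50000]

mean = statistics.mean(incomes)      # sum of observations / number of observations
median = statistics.median(incomes)  # middle value of the ranked list
mode = statistics.mode(incomes)      # most frequent value

print(mean, median, mode)
```

The mean (18,000) is larger than four of the five incomes because the single value of 50,000 pulls it upward, while the median and the mode (both 10,000) are unaffected by that extreme value.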
30Measures of Dispersion or Variability
- Range
- Variance
- Standard deviation
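31Formula for Sample Variance and Standard Deviation
Sample variance: $$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard deviation: $$S = \sqrt{S^2}$$
where $\bar{x}$ is the sample mean and $n$ is the number of observations.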
32Calculation of Variance and Standard deviation
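A minimal sketch of such a calculation follows. It assumes the five household incomes from the sample mean example as the data, and the usual definition of the sample variance with n - 1 in the denominator; both choices are assumptions made for illustration.

```python
import math

incomes = [6000, 10000, 10000, 14000, 50000]
n = len(incomes)
mean = sum(incomes) / n

# Range: largest minus smallest observation.
data_range = max(incomes) - min(incomes)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in incomes]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance, in the original units.
std_dev = math.sqrt(variance)

print(data_range, variance, round(std_dev, 2))
```

For these data the range is 44,000, the sample variance is 328,000,000, and the standard deviation is about 18,110.77.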
33Empirical Rule
- For a Normal distribution, approximately:
- a) 68% of the measurements fall within one standard deviation of the mean
- b) 95% of the measurements fall within two standard deviations of the mean
- c) 99.7% of the measurements fall within three standard deviations of the mean
34Suppose the reaction time of a particular drug
has a Normal distribution with a mean of 10
minutes and a standard deviation of 2 minutes
- Approximately,
- a) 68% of the subjects taking the drug will have a reaction time between 8 and 12 minutes
- b) 95% of the subjects taking the drug will have a reaction time between 6 and 14 minutes
- c) 99.7% of the subjects taking the drug will have a reaction time between 4 and 16 minutes
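These intervals and percentages can be checked against the Normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean of 10 minutes and standard deviation of 2 minutes from the example; the variable names are chosen only for this illustration.

```python
from statistics import NormalDist

reaction_time = NormalDist(mu=10, sigma=2)  # minutes, as in the example

for k in (1, 2, 3):
    low = reaction_time.mean - k * reaction_time.stdev
    high = reaction_time.mean + k * reaction_time.stdev
    # Probability that a Normal observation falls within k standard deviations of the mean.
    prob = reaction_time.cdf(high) - reaction_time.cdf(low)
    print(f"within {k} SD: {low:.0f} to {high:.0f} minutes, probability {prob:.4f}")
```

The computed probabilities are roughly 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.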