Title: Populations and Sampling
1. Lecture 2
- Populations and Sampling
- Types of variables and scales of measurement
2. Populations and Sampling
a. Reasons for using samples
- There are many good reasons for studying a sample instead of an entire population:
  - Samples can be studied more quickly than populations. Speed can be important if a physician needs to determine something quickly, such as a vaccine or treatment for a new disease.
  - A study of a sample is less expensive than a study of an entire population because a smaller number of items or subjects is examined. This consideration is especially important in the design of large studies that require a long follow-up.
  - A study of the entire population is impossible in most situations.
  - Sample results are often more accurate than results based on a population, because more time and resources can be spent on measuring each subject carefully.
  - If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics.
3. Types of Sampling Methods
Samples
- Probability samples: simple random, systematic, stratified, cluster
- Non-probability samples: consecutive, judgmental, convenience
4. Sampling Methods: Non-probability samples
- Depend on expert opinion.
- Probabilities of selection are not considered.
- Advantages include convenience, speed, and lower cost.
- Disadvantages:
  - lack of accuracy,
  - lack of generalizability of results.
5. Sampling Methods: Non-probability samples (cont.)
- Consecutive sampling
  - It involves taking every patient who meets the selection criteria over a specified time interval or number of patients.
  - It is the best of the nonprobability techniques and one that is very often practical.
- Judgmental sampling
  - It involves hand-picking from the accessible population those individuals judged most appropriate for the study.
- Convenience sampling
  - It is the process of taking those members of the accessible population who are easily available.
  - It is widely used in clinical research because of its obvious advantages in cost and logistics.
6. Probability Samples
Subjects of the sample are chosen by random selection, based on known probabilities. Every element in the population of interest has a known, nonzero probability of being chosen for the sample. In simple random sampling, this probability is the same for every element in the population.
Types of probability samples: simple random, systematic, stratified, cluster.
7. Advantages of Probability sampling methods
- The population of interest is clear (because it must be identified before sampling from it).
- Possible sources of bias, such as self-selection and interviewer selection effects, are removed.
- The general size of the sampling error can be estimated.
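As a rough illustration of how the size of the sampling error can be estimated, the sketch below computes the estimated standard error of a sample mean, s / sqrt(n). It assumes a simple random sample and ignores any finite-population correction; the hemoglobin values and the helper name `standard_error` are hypothetical and not part of the lecture.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Hypothetical hemoglobin values (g/dL) from a simple random sample of 8 patients.
hemoglobin = [13.1, 14.2, 12.8, 15.0, 13.6, 14.4, 12.9, 13.8]

mean = statistics.mean(hemoglobin)
se = standard_error(hemoglobin)
print(f"mean = {mean:.2f} g/dL, estimated standard error = {se:.2f} g/dL")
```

A larger sample gives a smaller standard error, which is why the error can be controlled at the design stage by choosing n.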
8. Simple Random Sampling
- Every individual or item from the target population has an equal chance of being selected.
- One may use a table of random numbers or a computer program to obtain the sample.
9. How to select a simple random sample
- Define the population
- Determine the desired sample size
- List all members of the population or the potential subjects
- For example:
  - 4th grade boys who have demonstrated problem behaviors
  - Let's select 10
10. Potential Subject Pool
11. So our selected subjects are numbers 10, 22, 24, 15, 6, 1, 25, 11, 13, 16.
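The selection step can be sketched with a computer program instead of a random number table. The example below assumes a pool of 25 numbered subjects; the pool size is not stated on the slide and is only a guess based on the largest selected number. The function name `simple_random_sample` is hypothetical, and a different seed will give a different sample from the one listed above.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member has an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Assumed pool: potential subjects numbered 1 to 25.
pool = list(range(1, 26))

selected = simple_random_sample(pool, 10, seed=2)
print("Selected subject numbers:", selected)
```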
12. Systematic Sampling
- Decide on the sample size n.
- Divide the population of N individuals into n groups of k individuals, where k = N/n.
- Randomly select one individual from the 1st group.
- Select every k-th individual thereafter.
- Example: N = 64, n = 8, k = 64/8 = 8; the first group is individuals 1 to 8.
13. Systematic Sampling (cont.)
- Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
- Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
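The systematic procedure can be sketched in a few lines, using the slide's values N = 64 and n = 8. This is a minimal sketch that assumes N is an exact multiple of n; the function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start in the first group, then every k-th member."""
    N = len(population)
    k = N // n                    # sampling interval; assumes N is a multiple of n
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position within the first group of k
    return [population[start + i * k] for i in range(n)]


# Individuals numbered 1 to 64, as in the N = 64, n = 8, k = 8 example.
population = list(range(1, 65))

sample = systematic_sample(population, 8, seed=1)
print("Systematic sample:", sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval k.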
14. Stratified Random Sampling
- The population is first divided into groups of elements called strata.
- Each element in the population belongs to one and only one stratum.
- Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
- A simple random sample is taken from each stratum.
- Formulas are available for combining the stratum sample results into one population parameter estimate.
15. Stratified Random Sampling (cont.)
- Advantage: If strata are homogeneous, this method is as precise as simple random sampling but with a smaller total sample size.
- Example: The basis for forming the strata might be sex, occupation, location, age, industry type, etc.
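A minimal sketch of stratified random sampling follows, together with one common way of combining the stratum results: the weighted mean, where each stratum mean is weighted by its share of the population (N_h / N). The strata, their values, and the function names are hypothetical illustrations, not data from the lecture.

```python
import random
import statistics


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}


def stratified_mean(strata, samples):
    """Combine stratum sample means, weighting each by N_h / N."""
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[name]) / N * statistics.mean(samples[name])
               for name in strata)


# Hypothetical population: systolic blood pressure values in two age strata.
strata = {
    "under_40": [110 + i % 10 for i in range(60)],
    "40_and_over": [130 + i % 15 for i in range(40)],
}

# Proportional allocation: 10% of each stratum.
samples = stratified_sample(strata, {"under_40": 6, "40_and_over": 4}, seed=0)
print("Estimated population mean:", round(stratified_mean(strata, samples), 1))
```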
16. Cluster Sampling
- The population is first divided into separate groups of elements called clusters.
- Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).
- A simple random sample of the clusters is then taken.
- All elements within each sampled (chosen) cluster form the sample.
17. Cluster Sampling (cont.)
- Advantage: The close proximity of elements can be cost effective (i.e., many sample observations can be obtained in a short time).
- Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
- Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
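Cluster sampling differs from stratified sampling in that whole groups are sampled rather than individuals within every group. The sketch below illustrates area sampling with hypothetical city blocks and households; all names and sizes are assumptions made for the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose clusters, then keep every element of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample


# Hypothetical area sampling frame: 10 city blocks with 5 households each.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 11)
}

chosen, sample = cluster_sample(clusters, 3, seed=0)
print("Chosen blocks:", chosen)
print("Number of households in the sample:", len(sample))
```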
18. Random . . .
- Random Selection vs. Random Assignment
  - Random Selection: every member of the population has an equal chance of being selected for the sample.
  - Random Assignment: every member of the sample (however chosen) has an equal chance of being placed in the experimental group or the control group.
  - Random assignment allows for individual differences among test participants to be averaged out.
19. Subject Selection (Random Selection)
Choosing which potential subjects will actually participate in the study
20. Subject Assignment (Random Assignment)
Deciding which group or condition (Group A or Group B) each subject will be part of
21. Population: 200 8th Graders
- 40 High IQ students
  - Random selection: 30 students
  - Random assignment: 15 students to Group A, 15 students to Group B
- 120 Avg. IQ students
  - Random selection: 30 students
  - Random assignment: 15 students to Group A, 15 students to Group B
- 40 Low IQ students
  - Random selection: 30 students
  - Random assignment: 15 students to Group A, 15 students to Group B
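The two steps in this example, random selection within each IQ stratum followed by random assignment to Group A or Group B, can be sketched as follows. The student identifiers and the function name `select_and_assign` are hypothetical; only the counts (40, 120, 40, then 30 selected and 15 per group) come from the slide.

```python
import random


def select_and_assign(strata, n_select, seed=None):
    """Randomly select n_select students per stratum, then split them into A and B."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for name, students in strata.items():
        selected = rng.sample(students, n_select)  # step 1: random selection
        rng.shuffle(selected)                      # step 2: random assignment
        half = n_select // 2
        groups["A"].extend(selected[:half])
        groups["B"].extend(selected[half:])
    return groups


# 200 8th graders: 40 high, 120 average, and 40 low IQ students (hypothetical IDs).
strata = {
    "high": [f"high_{i}" for i in range(40)],
    "avg": [f"avg_{i}" for i in range(120)],
    "low": [f"low_{i}" for i in range(40)],
}

groups = select_and_assign(strata, 30, seed=0)
print({name: len(members) for name, members in groups.items()})
```

Each group ends up with 15 students from every stratum, so the two groups are balanced on IQ level by design.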
22. Randomization (Random assignment to two treatments)
- Randomization
  - tends to produce study groups comparable with respect to known and unknown risk factors,
  - removes investigator bias in the allocation of participants,
  - and guarantees that statistical tests will have valid significance levels.
- It is the trialist's most powerful weapon against bias.
23. Randomization (cont.)
- Simple randomization: toss a coin
  - AAABBAAAAABABABBAAAABAA
- Random permuted blocks (block randomization)
  - AABB-ABBA-BBAA-BAAB-ABAB-AABB-
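Simple randomization can leave the two treatments unbalanced, which is what permuted blocks are meant to prevent. As a small check, the sketch below counts the letters in the coin-toss sequence shown above and then simulates a new sequence of the same length. The function name `simple_randomization` is hypothetical.

```python
import random
from collections import Counter


def simple_randomization(n_participants, seed=None):
    """Assign each participant to A or B by an independent fair coin toss."""
    rng = random.Random(seed)
    return "".join(rng.choice("AB") for _ in range(n_participants))


# The coin-toss sequence from the slide: 23 participants.
slide_sequence = "AAABBAAAAABABABBAAAABAA"
slide_counts = Counter(slide_sequence)
print("Slide sequence:", dict(slide_counts))  # 16 A versus 7 B

# A fresh simulated sequence of the same length; its balance varies by seed.
simulated = simple_randomization(len(slide_sequence), seed=3)
print("Simulated:", simulated, dict(Counter(simulated)))
```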
24. Block Randomization
- Each block contains all conditions of the experiment in a randomized order.
  - Block 1: E, C, C, E
  - Block 2: C, E, C, E
  - Block 3: E, E, C, C
- Control Group (C): N = 6
- Experimental Group (E): N = 6
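The block layout above can be generated programmatically. This is a minimal sketch assuming blocks of size 4 with two experimental (E) and two control (C) slots each, matching the three blocks on the slide; the function name and its parameters are hypothetical.

```python
import random


def block_randomization(n_blocks, conditions=("E", "C"), per_condition=2, seed=None):
    """Build blocks that each contain every condition equally often, in random order."""
    rng = random.Random(seed)
    allocation = []
    for _ in range(n_blocks):
        block = list(conditions) * per_condition  # e.g. E, C, E, C
        rng.shuffle(block)                        # randomize the order within the block
        allocation.append(block)
    return allocation


blocks = block_randomization(3, seed=0)
for number, block in enumerate(blocks, start=1):
    print(f"Block {number}: {', '.join(block)}")

flat = [condition for block in blocks for condition in block]
print("Experimental N =", flat.count("E"), "| Control N =", flat.count("C"))
```

Because every block is balanced, the group sizes are equal after each completed block, here 6 and 6 after three blocks.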
25. Several ways to classify the variables
- They may be defined as
  - quantitative variables
  - qualitative (categorical) variables
26. Quantitative variables
- Measured in the usual sense, e.g.,
  - heights of adult males,
  - weights,
  - age of patients seen in a clinic.
- Measurements made on quantitative variables convey information regarding amount.
27. Quantitative variables are either
- Discrete
  - only take values from some discrete set of possible values (whole numbers)
  - e.g., number of patients admitted to the hospital
- Continuous
  - take values from a continuous range of possible values, although the recorded measurements are rounded
  - weight,
  - height,
  - hemoglobin levels, etc.
28. Qualitative (categorical) variables
- Some characteristics are not capable of being measured in the sense that height, weight, and age are measured.
- These characteristics are categorized only:
  - an ill person is given a medical diagnosis (hepatitis, cancer, etc.)
  - a person is designated as belonging to an ethnic group (black, white, Hispanic, etc.)
29. Scales of measurement
- Another way to classify variables is by the set of rules used to assign numbers to the objects or events.
- These rules are the scales of measurement.
- They are commonly broken down into four types:
  - Nominal
  - Ordinal
  - Interval (numerical)
  - Ratio (numerical)
30. Nominal scale
- Simplest level of measurement
- Data values fit into categories.
- No ordering:
  - it makes no sense to state that M > F
- Arbitrary labels:
  - m/f, 0/1, etc.
- Many classifications in medical research are evaluated on a nominal scale:
  - Outcomes of a medical treatment occurring or not occurring
  - Types of surgical procedures
  - Presence of possible risk or exposure factors.
31. Nominal scale (cont.)
- Dichotomous variables
  - take on only one of two values
  - presence of pain (yes/no),
  - sex (male/female)
- Data that can take on more than two values, such as anemia, may be classified as
  - microcytic anemia, including iron deficiency
  - macrocytic or megaloblastic anemia, including vitamin B12 deficiency
  - normocytic anemia, often associated with chronic disease.
- A study examining the prognosis for patients with lung cancer might sort the type of cancer into several categories, such as
  - small cell,
  - large cell,
  - squamous cell.
32. Nominal scale (cont.)
- The easiest way to determine whether observations are measured on a nominal scale is to ask whether the observations are classified or placed into categories.
- Data evaluated on a nominal scale are also called qualitative observations, because the values fit into categories.
- Nominal or qualitative data are generally described in terms of percentages or proportions.
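Describing nominal data as proportions amounts to counting how often each category occurs and dividing by the total. The sketch below uses the anemia categories from the previous slide with hypothetical counts; the function name `proportions` is also an assumption for the example.

```python
from collections import Counter


def proportions(values):
    """Return the proportion of observations that fall into each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical diagnoses for 10 patients with anemia.
anemia_types = ["microcytic"] * 5 + ["macrocytic"] * 2 + ["normocytic"] * 3

for category, share in proportions(anemia_types).items():
    print(f"{category}: {share:.0%}")
```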
33. Ordinal scale
- There is an inherent order among the categories.
- Tumors, for example, are staged according to their degree of development.
- The international classification for staging of carcinoma of the cervix is an ordinal scale from 0 to IV:
  - Stage 0: Carcinoma in situ (localized)
  - Stage I: Cancer is confined to the cervix
  - Stage II: Cancer extends to the upper third of the vagina, or the tissue around the uterus, but not the pelvic wall
  - Stage III: The lower third of the vagina and/or the pelvic sidewall and possibly the kidneys are diseased
  - Stage IV: Cancer has spread beyond the reproductive tract involving the bladder or rectum, and has invaded distant organs (most often the lungs or liver), the bones, or other systems in the body
34. Ordinal scale (cont.)
- Stage IV is worse than stage 0 with respect to prognosis.
  - This is an inherent order.
- An important characteristic of ordinal scales is that although order exists among categories, the difference between two adjacent categories is not necessarily the same throughout the scale.
- To illustrate, consider Apgar scores, which describe the condition of newborn infants on a scale of 0 to 10,
  - lower scores indicating depression of cardiorespiratory and neurologic functioning,
  - higher scores indicating good cardiorespiratory and neurologic functioning.
  - The difference between a score of 8 and a score of 10 is probably not of the same magnitude as the difference between a score of 0 and a score of 2.
- As with nominal scales, percentages and proportions are often used with ordinal scales.
35. Numerical scale
- Observations for which the differences between numbers have meaning are measured on a numerical scale.
- They are also called quantitative observations, because they measure the quantity of something.
- There are two types of numerical scales:
  - A continuous scale has values on a continuum
    - age
  - A discrete scale has values equal to integers
    - number of fractures, number of admissions
36. Numerical scale (cont.)
- If only a certain level of precision is required, continuous data may be reported to the closest integer. The important point, however, is that more precise measurement is possible, at least theoretically.
- For example, the age of a group of patients can be any value between zero and the age of the oldest patient, i.e., age can be specified as precisely as necessary.
  - In studies of adults, age to the nearest year will generally suffice.
  - For younger children, age to the nearest month is better.
  - In infants, age to the nearest hour or even minute may be appropriate, depending on the purpose of the study.
- Other examples of continuous data include height, weight, length of time of survival, range of joint motion, and many laboratory values, such as serum glucose, sodium, potassium, or uric acid.
37. Numerical scale (cont.)
- Characteristics measured on a numerical scale are
  - displayed in tables and graphs
  - summarized as means and standard deviations
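As a short illustration of summarizing a numerical variable, the sketch below computes the mean and the sample standard deviation of a set of ages. The ages and the function name `summarize` are hypothetical.

```python
import statistics


def summarize(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical ages (in years) of 8 adult patients, recorded to the nearest year.
ages = [34, 45, 29, 52, 41, 38, 47, 36]

mean_age, sd_age = summarize(ages)
print(f"Age: mean = {mean_age:.1f} years, SD = {sd_age:.1f} years")
```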