Title: 1: Measurement and Sampling
1 Measurement and Sampling
- What is biostatistics?
- What is measurement?
- How do we sample populations?
2 HS 167 Logistics
- Syllabus materials (text, lab workbook, calculator)
- Calendar and assignments are on www.sjsu.edu/biostat (click HS167; become familiar with the Web site)
- Exam 1 10/9, Exam 2 11/13, Final Thur 12/13 245
- Lab 0 and Lab 1 (Tu and We labs may have additional time to complete Lab 1)
- Text (reading): pp. 1-10, 15-19 (note vocabulary on p. 11)
- Exercises 1.1-1.6, 1.8, 1.9, 2.1-2.3, 2.11-2.13 due at beginning of next lecture
- Yahoo group: send email to hs167-F07-subscribe_at_yahoogroups.com
- Academic integrity (do your own work)
  - Odd-numbered exercises and lab work: OK to get help from friends
  - Even-numbered exercises and exams: do NOT get help from friends
- How to get a good grade
  - Attend all classes and labs (attendance required)
  - Stay on task
  - Read text (listed to Nancy)
  - Do Lab HWs diligently
  - Do not cut corners
3 Biostatistics
- is not merely a compilation of computational techniques
- is a way of learning from data
- is concerned with all elements of study design and analysis (not just computations)
- requires more judgment than math (pay attention to vocabulary)
- is statistics applied to biological and health problems
4 Biostatistics involves
- A data detective element
  - Uncovering patterns and clues
  - This is a combination of exploratory data analysis (EDA) and descriptive statistics
- A data judge element
  - Confirmation of clues
  - This often requires inferential methods
5 Measurement
- Measurement: the assigning of numbers and codes according to prior-set rules
- Three types of statistical measurements
  - Categorical: classify observations into named (nominal) categories
    - e.g., HIV classified as positive or negative
  - Ordinal: ranked categories
    - e.g., OPINION ranked 5 = strongly agree, 4 = agree, 3 = neutral, and so on
  - Quantitative: numbers with equal spacing
    - e.g., AGE in years
    - e.g., BLOOD_PRESSURE in mm Hg
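As a rough illustration (not part of the course materials), the three measurement types can be sketched in Python. All observations below are invented, and the labels for codes 2 and 1 on the OPINION scale are assumptions, since the slide only says "and so on". The sketch shows one summary that is reasonable for each type.

```python
from statistics import mean, median, mode

# Hypothetical observations, invented for illustration only.
hiv = ["negative", "positive", "negative", "negative"]   # categorical
opinion = [5, 4, 4, 3, 5]                                # ordinal codes
age = [27, 34, 41, 58]                                   # quantitative (years)

# Codes 5, 4, 3 follow the slide; the labels for 2 and 1 are assumed.
OPINION_LABELS = {5: "strongly agree", 4: "agree", 3: "neutral",
                  2: "disagree", 1: "strongly disagree"}

# Categorical: categories have names only, so we count them.
most_common_hiv = mode(hiv)
# Ordinal: the order is meaningful, so a median is reasonable.
median_opinion = OPINION_LABELS[median(opinion)]
# Quantitative: equal spacing makes arithmetic such as a mean meaningful.
mean_age = mean(age)

print(most_common_hiv, median_opinion, mean_age)
```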
6 Illustrative Example: Weight Change and Heart Disease
Source: Willett et al., 1995
- Goal: to determine the effect of weight change on coronary heart disease (CHD) risk
- 115,818 women 30 to 55 years of age, free of CHD
- Body mass index (BMI, kg/m²) determined at entry to study
- Body weight determined as of age 18
- Subjects followed for 14 years
- Number of CHD onsets (fatal and nonfatal) counted (1292 cases)
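The BMI units (kg/m²) follow from its definition: weight in kilograms divided by the square of height in meters. A minimal sketch, with a hypothetical function name and a made-up subject who is not from the study:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical subject, not taken from the study.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1))
```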
7 Illustrative Example (cont.)
Variables
- Categorical
  - Smoker or nonsmoker
  - Family history of heart disease (yes or no)
- Ordinal
  - Non-smoker, light smoker, moderate smoker, heavy smoker
- Quantitative
  - BMI (kg/m²)
  - Age (years)
  - Weight presently
  - Weight at age 18
8 Variable, Value, Observation
- Observation: the unit upon which measurements are made
  - Can be an individual (e.g., a person)
  - Can be an aggregate of individuals (e.g., a region)
- Variable: the generic thing we measure
  - e.g., AGE of a person
  - e.g., HIV status of a person
- Value: a realized measurement
  - e.g., 27
  - e.g., positive
9 Data Structure (Forms)
Observation 1
Data Collection Form
Var1 (ID): 1
Var2 (AGE): 27
Var3 (SEX): F
Var4 (HIV): Y
Var5 (KAPOSISARC): Y
Var6 (REPORTDATE): 4/25/89
Var7 (OPPORTUNIS): N
Observation 2
Observation 3
Observation 4
10 U.S. Census Form
11 Data Structure (Table)
Observations are rows; variables are columns; values are cells.
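One way to picture the table layout is as a list of records in Python. This is only a sketch: the first row copies the values from the data collection form for Observation 1, and the second row is invented to show how a further observation would be stored.

```python
# One row per observation; keys are variable names, values are cells.
# Row 1 copies the form for Observation 1; row 2 is hypothetical.
table = [
    {"ID": 1, "AGE": 27, "SEX": "F", "HIV": "Y",
     "KAPOSISARC": "Y", "REPORTDATE": "4/25/89", "OPPORTUNIS": "N"},
    {"ID": 2, "AGE": 35, "SEX": "M", "HIV": "N",
     "KAPOSISARC": "N", "REPORTDATE": "5/02/89", "OPPORTUNIS": "N"},
]

variables = list(table[0].keys())      # the columns
ages = [row["AGE"] for row in table]   # one column, read across all rows
value = table[0]["AGE"]                # one cell: AGE for observation 1

print(variables)
print(ages, value)
```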
12 Illustrative Example: Cigarette Consumption and Lung Cancer
Variables
- country: name of country/region
- cig1930: per capita cigarette consumption, 1930
- mortalit: lung cancer deaths per 100,000 in 1950
Note: The unit of observation in this data set is the region (not the person)
13 Data Quality
- An analysis is only as good as its data
- GIGO: garbage in, garbage out
- Does a variable measure what it purports to?
  - Validity: freedom from systematic error
  - Objectivity: seeing things as they are without making them conform to your worldview
- Discussion on avoiding bias when questioning, e.g., consider the word jam
14 Ethos: Which do you choose?
- Blackburn, S. (2005). Truth. Oxford University Press
- Frankfurt, H. G. (2005). On Bullshit. Princeton University Press
The difference is intention and method: BS has a predetermined outcome. Truth is earnest in its intent and does not bend the facts to a predetermined outcome.
15 Truth Versus Perception
"I cannot give any scientist of any age any better advice than this: The intensity of the conviction that a hypothesis is true has no bearing on whether it is true or not." Peter Medawar (1915-1987)
Plato's Allegory of the Cave: We observe shadows on the wall. The truth lies outside.
16 Two Types of Statistical Studies
- Surveys: quantify population characteristics
  - e.g., percentage of population that is overweight
  - e.g., expected life span
- Comparative studies: determine relationships between variables
  - e.g., relationship between weight gain and heart disease risk
  - e.g., relationship between alcohol consumption and esophageal cancer risk
- We start by considering survey sampling
17 Sampling for a Survey
- We seldom (if ever) study an entire population
- Take a subset (sample) of the population
- Use characteristics of the sample to infer population characteristics
- Select a probability sample
  - chance determines which individuals are selected
- Avoid non-probability samples
  - Discuss volunteer bias as an example
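Volunteer bias can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population is synthetic, 30% are given some trait, and people with the trait are assumed to be five times as likely to volunteer. The aim is only to contrast a probability sample with a self-selected one.

```python
import random

rng = random.Random(167)  # fixed seed so the sketch is repeatable

# Synthetic population of 10,000: exactly 30% have the trait (True).
population = [True] * 3000 + [False] * 7000
true_proportion = sum(population) / len(population)

# Probability sample: chance alone decides who is selected.
srs = rng.sample(population, 500)
srs_estimate = sum(srs) / len(srs)

# Non-probability sample: people volunteer, and (by assumption) those
# with the trait come forward with probability 0.5 versus 0.1.
volunteers = [has_trait for has_trait in population
              if rng.random() < (0.5 if has_trait else 0.1)]
volunteer_estimate = sum(volunteers) / len(volunteers)

print(true_proportion, round(srs_estimate, 3), round(volunteer_estimate, 3))
```

Under these assumptions the random-sample estimate should land near 0.30, while the volunteer estimate should sit well above it, even though the volunteer group is larger.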
18 Simple Random Sample (SRS)
- SRS (definition): every possible sample of size n from the population has the same probability of being selected
  - this is the most basic type of probability sample
- SRSs have sampling independence
  - selection of one individual does not influence selection of any other
- SRSs can be done with replacement or without replacement (both methods are usually valid)
- Sampling fraction = n / N = probability of selection, where
  - n = sample size
  - N = population size
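The sampling fraction and the two sampling modes can be sketched with Python's standard `random` module. The values of N and n are hypothetical, chosen only for illustration.

```python
import random

rng = random.Random(2007)      # fixed seed for repeatability
N = 600                        # hypothetical population size
n = 10                         # hypothetical sample size
frame = list(range(1, N + 1))  # individuals numbered 1..N

# Sampling fraction n / N: each individual's chance of selection
# (exact when sampling without replacement).
sampling_fraction = n / N

without_replacement = rng.sample(frame, n)   # no individual appears twice
with_replacement = rng.choices(frame, k=n)   # repeats are possible

print(round(sampling_fraction, 4))
print(sorted(without_replacement))
print(sorted(with_replacement))
```

When n is small relative to N, as here, repeats under sampling with replacement are rare, which is one reason both methods are usually acceptable.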
19 SRS Method
- Compile census listing (sampling frame)
  - individuals numbered 1, 2, . . ., N
- Generate n random numbers between 1 and N
  - Can be done with random number generator (lab) or with table of random digits
- Select individuals based on random number list
You will take an SRS in lab this week
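The three steps of the SRS method can be sketched as a short Python function. This is an illustration under stated assumptions, not the procedure used in lab: the function name, the frame of 50 placeholder names, and the seed are all hypothetical, and the draw is made without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n (without replacement) from a sampling frame."""
    N = len(frame)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)
    # Step 2: generate n distinct random numbers between 1 and N.
    numbers = rng.sample(range(1, N + 1), n)
    # Step 3: select the individuals holding those numbers.
    return [frame[i - 1] for i in sorted(numbers)]

# Step 1: a hypothetical sampling frame; position k holds individual k.
frame = [f"person_{k:03d}" for k in range(1, 51)]   # N = 50
sample = simple_random_sample(frame, n=5, seed=1)
print(sample)
```

Passing a seed makes the draw repeatable, which is handy for checking work; leaving it as `None` gives a fresh sample each run.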