Title: Stat 501 Spring 2004
1. Stat 501 Spring 2004
- Go through intro doc
- Homework 1
  - Send me an email with
    - Name (full and what I should call you)
    - Major / year
    - At least 1 thing you want to learn in this course
    - Any stats background?
    - Anything else you want to tell me
2. Data Example
- A small Gestational Age / Birthweight dataset
  - 24 babies
  - 12 boys and 12 girls
  - Assume this is a representative sample for the population of interest
- Data
  - Gestational age (weeks)
  - Birthweight (grams)
  - Gender (1 = male, 2 = female)
4. Two types of data
- Qualitative
  - qualities / not able to be ordered, e.g. gender
- Quantitative (numbers)
  - Discrete
    - weeks of gestational age
    - possible values correspond to integers (or a subset of the integers)
  - Continuous
    - birth weight
    - possible values correspond to real numbers (between any 2 numbers, a third is possible)
5. Histograms: A summary of the distribution of quantitative data
[Figure: histogram of birth weight. x-axis: birth weight (g), ticks from 2500 to 3500 in steps of 200; y-axis: probability, ticks from 0 to 8/24 in steps of 2/24.]
6. Histograms: A summary of the distribution of quantitative data
- Divide the range of the data into bins of equal width.
- Each bin gets a bar with a height proportional to the number of data points in the bin.
- Example: the height of the bar above the number 2900 is 0.333 = 33.3% = 8/24
  - 8 = number of babies with weight between 2800 g and 3000 g
  - 24 = total number of observations (n)
- Note that the number of bins is subjective. See page 26 in the book.
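The binning steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `relative_frequency_histogram` is invented here, and the ten weights are made-up values, not the 24-baby dataset from the slides. The bins are 200 g wide so that, as in the example, the bar above 2900 covers 2800 g to 3000 g.

```python
def relative_frequency_histogram(values, start, width, n_bins):
    """Count values in equal-width bins and return bar heights as fractions of n.

    Bin i covers [start + i*width, start + (i+1)*width).
    Values outside all bins are not counted in any bar.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the bin containing v
        if 0 <= i < n_bins:
            counts[i] += 1
    n = len(values)  # total number of observations
    return [c / n for c in counts]


# Hypothetical birth weights in grams (illustrative only, NOT the course data).
weights = [2450, 2620, 2710, 2850, 2900, 2950, 3050, 3120, 3300, 3480]

heights = relative_frequency_histogram(weights, start=2400, width=200, n_bins=6)
for i, h in enumerate(heights):
    center = 2400 + 200 * i + 100  # bin midpoint, e.g. 2900 for [2800, 3000)
    print(f"bar above {center} g: height {h:.2f}")
```

Because each height is a count divided by n, the heights add up to 1 whenever every observation lands in some bin.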
7. More about histograms
- Histograms show the shape or distribution of quantitative data
- Skewed to the left: long left tail
  - Gestational age at birth for all babies (some are premature, but almost none are more than 42 weeks)
- Skewed to the right: long right tail
- Symmetric
- Unimodal: one peak; bimodal: two peaks
8. Histograms also have a probability interpretation
- Choose one point from the dataset.
- The probability that it falls in any particular bin is proportional to the corresponding bar's height.
- Note that probabilities are in the interval from 0 to 1.
9. Histograms also have a probability interpretation
- Important concept
  - Histograms are based on samples from a true population.
  - They estimate the probabilities described above.
  - As the sample size (n) increases, the estimates are better guesses of the true population behavior.
- Histograms are estimates of a function
  - Input: bin location; output: probability
  - We call this function the distribution
- What's an estimate of the probability that a new baby weighs 3 kg or less?
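One natural way to approach a question like this is to use the fraction of the sample at or below the threshold, which matches adding up the histogram bars for bins up to 3000 g. The sketch below assumes that approach. The function name `estimate_probability_at_most` is invented for illustration, and the weights are made-up values, not the course dataset, so the number it prints is not the answer for the 24 babies.

```python
def estimate_probability_at_most(sample, threshold):
    """Estimate P(X <= threshold) by the fraction of sample values at or below it."""
    return sum(1 for x in sample if x <= threshold) / len(sample)


# Hypothetical birth weights in grams (illustrative only, NOT the course data).
sample = [2450, 2620, 2710, 2850, 2900, 2950, 3050, 3120, 3300, 3480]

p_hat = estimate_probability_at_most(sample, 3000)
print(f"Estimated P(weight <= 3000 g) = {p_hat:.2f}")
```

As with the histogram itself, this is an estimate from a sample, and it should get closer to the true population probability as n grows.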
10. Numerical Summaries for Quantitative Data
- Let $x_1, \dots, x_n$ be the dataset
- Measures of the center of histograms
  - Sample mean
    - $\bar{x}$ ("x bar") $= (x_1 + \dots + x_n)/n$
    - $\mu$ (mu) = true mean of the full population. This is unknown.
    - $\bar{x}$ estimates $\mu$
  - Median
    - Value where 50% of the data are smaller and 50% are larger.
    - The median is also an estimate of an unknown true quantity.
    - (PIR example)
11. Median versus mean
- They tend to be similar if the data are fairly symmetric
- The median is less sensitive to extreme and anomalous observations (outliers) than the mean.
- Example: 400 graduates
  - 399 of them make $40,000 a year
  - 1 is a starting pitcher and makes $10 million
  - Mean = $64,900
  - Median = $40,000
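The graduate salary example can be checked with a few lines of Python using the standard `statistics` module. The list below is built directly from the numbers on the slide; the variable names are just for illustration.

```python
from statistics import mean, median

# 399 graduates at 40,000 a year plus one starting pitcher at 10 million.
salaries = [40_000] * 399 + [10_000_000]

mean_salary = mean(salaries)      # pulled upward by the single outlier
median_salary = median(salaries)  # unaffected by how large the outlier is

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

Changing the pitcher's salary to any other large value moves the mean but leaves the median where it is, which is the sense in which the median is less sensitive to outliers.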
12. Numerical Summaries: Measure of spread of histogram
- Measure 1: Range = largest x - smallest x
- Measure 2: Sample variance
  - $s^2 = \left[(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right] / (n-1)$
  - average squared variation around the mean
  - Sample standard deviation: $s = \sqrt{s^2}$
  - $s^2$ estimates a true variance $\sigma^2$
  - $s$ estimates a true standard deviation $\sigma$
- What does standard deviation mean?
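A small worked computation may help make the spread measures concrete. The sketch below computes the range, the sample variance with the n-1 divisor, and the sample standard deviation by hand, then compares against Python's `statistics.variance`, which also divides by n-1. The function name `sample_variance` and the eight data values are made up for illustration.

```python
import math
from statistics import variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Made-up data for illustration; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # Measure 1: largest x minus smallest x
s2 = sample_variance(data)          # Measure 2: 32 / 7
s = math.sqrt(s2)                   # sample standard deviation

print(f"range = {data_range}, s^2 = {s2:.3f}, s = {s:.3f}")
print(f"statistics.variance agrees: {math.isclose(s2, variance(data))}")
```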
13. Meaning of standard deviation
- When the distribution roughly has a bell curve shape, then
  - about 68% of the data are within +/- 1 standard deviation of the mean
  - about 95% of the data are within +/- 2 standard deviations of the mean
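The 68% and 95% figures can be checked by simulation. The sketch below draws bell-shaped (normal) data and counts how much of it falls within 1 and 2 sample standard deviations of the sample mean. The center of 3000 and spread of 300 are arbitrary choices for illustration, not estimates from the course data, and `fraction_within` is a helper name invented here.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated bell-shaped data; 3000 and 300 are arbitrary illustrative values.
draws = [random.gauss(3000, 300) for _ in range(20_000)]
xbar = mean(draws)
s = stdev(draws)  # sample standard deviation (n - 1 divisor)


def fraction_within(xs, center, half_width):
    """Fraction of values no farther than half_width from center."""
    return sum(1 for x in xs if abs(x - center) <= half_width) / len(xs)


within_1 = fraction_within(draws, xbar, s)
within_2 = fraction_within(draws, xbar, 2 * s)

print(f"within 1 sd: {within_1:.3f}")
print(f"within 2 sd: {within_2:.3f}")
```

For strongly skewed or bimodal data the same counts can come out quite differently, which is why the rule is stated only for roughly bell-shaped distributions.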
14. Why we'll care
[Figure: birthweight (g) for female babies and male babies, with the values 3024 and 2911 marked.]
Example of the kind of question we'll want to answer: Is the true mean birth weight for male and female babies different? (The answer depends on the variability of birthweight.)