Stat 501 Spring 2004 - PowerPoint PPT Presentation

About This Presentation
Title:

Stat 501 Spring 2004

Slides: 15
Provided by: johnstau

Transcript and Presenter's Notes

Title: Stat 501 Spring 2004


1
Stat 501 Spring 2004
  • Go through intro doc
  • Homework 1
  • Send me an email with
  • Name (full and what I should call you)
  • Major / year
  • At least 1 thing you want to learn in this course
  • Any stats background?
  • Anything else you want to tell me

2
Data Example
  • A small Gestational Age / Birthweight dataset.
  • 24 Babies
  • 12 boys and 12 girls
  • Assume this is a representative sample for the
    population of interest
  • Data
  • Gestational Age (weeks)
  • Birthweight (grams)
  • Gender (1 = male, 2 = female)

3
(No Transcript)
4
Two types of data
  • Qualitative
  • qualities; not able to be ordered (e.g., gender)
  • Quantitative (numbers)
  • Discrete
  • weeks of gestational age
  • possible values correspond to integers (or a
    subset of the integers)
  • Continuous
  • Birth weight
  • possible values correspond to real numbers
    (between any 2 numbers, a third is possible)

5
Histograms: A summary of the distribution of
quantitative data
[Figure: Histogram of birth weight. x-axis: birth weight (g), ticks from 2500 to 3500 in steps of 200; y-axis: probability, ticks 0, 2/24, 4/24, 6/24, 8/24.]
6
Histograms: A summary of the distribution of
quantitative data
  • Divide range of data into bins of equal width.
  • Each bin gets a bar with a height proportional to
    the number of data points in the bin.
  • Example: the height of the bar above the number
    2900 is 0.333 = 33.3% = 8/24
  • 8 = number of babies with weight between 2800 g
    and 3000 g
  • 24 = total number of observations (n)
  • Note that number of bins is subjective. See page
    26 in the book.
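
The binning steps above can be sketched in a few lines of Python. This is only an illustration: the weights below are made-up values, not the course dataset, and the function name and the bin start of 2400 g are assumptions chosen so that one bin covers 2800 g to 3000 g, as in the example.

```python
# Hypothetical birth weights in grams (NOT the course dataset).
weights = [2520, 2650, 2710, 2790, 2840, 2880,
           2900, 2950, 2990, 3050, 3120, 3310]


def relative_frequencies(data, start, width, n_bins):
    """Return the fraction of data points falling in each equal-width bin.

    Bin i covers the interval [start + i*width, start + (i+1)*width).
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)  # index of the bin containing x
        if 0 <= i < n_bins:
            counts[i] += 1
    n = len(data)
    return [c / n for c in counts]  # bar height = count / n


heights = relative_frequencies(weights, start=2400, width=200, n_bins=6)
for i, h in enumerate(heights):
    low = 2400 + i * 200
    print(f"{low}-{low + 200} g: {h:.3f}")
```

Changing `width` or `n_bins` changes the picture, which is the sense in which the number of bins is subjective.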

7
More about histograms
  • Histograms show the shape or distribution of
    quantitative data
  • Skewed to the left: long left tail
  • Gestational age at birth for all babies (some are
    premature, but almost none are more than 42
    weeks)
  • Skewed to the right: long right tail
  • Symmetric
  • Unimodal: one peak; bimodal: two peaks

8
Histograms also have a probability interpretation
  • Choose one point from the dataset.
  • The probability that it falls in any particular
    bin is proportional to the corresponding bar's
    height.
  • Note that probabilities are in the interval from
    0 to 1.

9
Histograms also have a probability interpretation
  • Important Concept
  • Histograms are based on samples from a true
    population.
  • They estimate the probabilities described above.
  • As the sample size (n) increases, the estimates
    are better guesses of the true population
    behavior.
  • Histograms are estimates of a function
  • Input: bin location; Output: probability
  • We call this function the distribution
  • What's an estimate of the probability that a new
    baby weighs 3 kg or less?
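
One way to answer a question like the last one is to use the fraction of sampled babies at or below the threshold as the estimate. The sketch below assumes a made-up sample (not the course dataset) and a hypothetical helper name.

```python
# Hypothetical sample of birth weights in grams (NOT the course dataset).
sample = [2520, 2650, 2710, 2790, 2840, 2880,
          2900, 2950, 2990, 3050, 3120, 3310]


def proportion_at_most(data, threshold):
    """Fraction of observations less than or equal to threshold."""
    return sum(1 for x in data if x <= threshold) / len(data)


# Estimated probability that a new baby weighs 3 kg (3000 g) or less.
estimate = proportion_at_most(sample, 3000)
print(f"Estimated P(weight <= 3000 g) = {estimate:.3f}")
```

With a larger sample the same calculation would be expected to give a better guess of the true population probability.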

10
Numerical Summaries for Quantitative Data
  • Let $x_1, \ldots, x_n$ be the dataset
  • Measures of the center of histograms.
  • Sample mean
  • $\bar{x}$ = x bar $= (x_1 + \cdots + x_n)/n$
  • $\mu$ = true mean (mu) of the full population.
    This is unknown.
  • $\bar{x}$ estimates $\mu$
  • Median
  • Value where 50% are smaller and 50% are larger.
  • Median is also an estimate of an unknown true
    quantity.
  • (PIR example)
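
As a small illustration of the two measures of center, the sketch below computes the sample mean directly from its definition and the median with Python's standard `statistics` module. The six data values are hypothetical.

```python
import statistics

# Hypothetical data values (NOT the course dataset).
data = [2520, 2650, 2710, 2790, 2840, 2880]

# Sample mean: add up the observations and divide by n.
n = len(data)
x_bar = sum(data) / n

# Median: middle value of the sorted data; with an even n,
# statistics.median averages the two middle values.
median = statistics.median(data)

print(f"n = {n}, mean = {x_bar:.1f}, median = {median}")
```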

11
Median versus mean
  • They tend to be similar if the data are fairly
    symmetric
  • Median is less sensitive to extreme and
    anomalous observations (outliers) than the
    mean.
  • Example: 400 graduates
  • 399 of them make 40,000 a year
  • 1 is a starting pitcher and makes 10 million
  • Mean = 64,900
  • Median = 40,000
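
The graduate example can be checked with a few lines of Python using the standard `statistics` module. The variable names are only illustrative.

```python
import statistics

# 399 graduates earning 40,000 a year and one earning 10 million.
salaries = [40_000] * 399 + [10_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean   = {mean_salary:,.0f}")
print(f"median = {median_salary:,.0f}")
```

The single outlier moves the mean far above what almost everyone earns, while the median stays at the typical salary.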

12
Numerical Summaries: Measures of spread of
histogram
  • Measure 1: Range = largest x - smallest x
  • Measure 2: Sample Variance
  • $s^2 = [(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2] / (n-1)$
  • average squared variation around the mean
  • Sample standard deviation: $s = \sqrt{s^2}$
  • $s^2$ estimates a true variance $\sigma^2$
  • $s$ estimates a true standard deviation $\sigma$
  • What does standard deviation mean?
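
The spread measures can be computed step by step as in the following sketch. The data values are hypothetical, and the comparison with `statistics.variance` is included only as a sanity check that the hand computation uses the same n-1 divisor.

```python
import math
import statistics

# Hypothetical data values (NOT the course dataset).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n

# Measure 1: range = largest x - smallest x.
data_range = max(data) - min(data)

# Measure 2: sample variance, the sum of squared deviations
# from the mean divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation is the square root of the variance.
s = math.sqrt(s2)

print(f"range = {data_range}, s^2 = {s2:.4f}, s = {s:.4f}")
print(f"statistics.variance agrees: {math.isclose(s2, statistics.variance(data))}")
```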

13
Meaning of standard deviation
  • When the distribution has roughly a bell-curve
    shape, then
  • about 68% of the data are within +/- 1 standard
    deviation of the mean
  • about 95% of the data are within +/- 2 standard
    deviations of the mean
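
These two percentages can be checked against an exact normal distribution, used here as an assumed stand-in for "bell curve shape". The sketch uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Probability of landing within 1 and within 2 standard deviations of the mean.
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)

print(f"within 1 sd: {within_1:.4f}")
print(f"within 2 sd: {within_2:.4f}")
```

Real data that are only roughly bell shaped will match these figures only approximately.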

14
Why we'll care
[Figure: Birthweight (g) for female babies and male babies, with the values 3024 and 2911 marked.]
Example of the kind of question we'll want to
answer: Is the true mean birth weight for male
and female babies different? (Answer depends on
the variability of birthweight.)