Describing Data A Single Variable The essence of descriptive statistics - PowerPoint PPT Presentation

Slides: 26
Provided by: Medical7

Transcript and Presenter's Notes

1
Describing Data: A Single Variable
The essence of descriptive statistics

2
Exposed versus Control
3
Exposed Subjects
4
Comparison of Two Systems
5
Shapes of distributions
  • Number of peaks: unimodal (most common), bimodal,
    etc.
  • Unimodal distributions can be
  • Symmetrical
  • Skewed
  • Positively skewed: long tail at the upper end
  • Negatively skewed: long tail at the lower end
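
The direction of skew can also be checked numerically. The sketch below is not part of the original slides: it assumes the moment coefficient of skewness (one of several possible measures) and uses made-up data. A positive result suggests a long tail at the upper end, a negative result a long tail at the lower end, and a value near zero a roughly symmetrical distribution.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by SD cubed."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments, both dividing by n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / math.sqrt(m2) ** 3

# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 4, 8, 32]          # long tail at the upper end
left_tailed = [-x for x in right_tailed]    # mirror image: long lower tail

print(skewness(symmetric))
print(skewness(right_tailed))
print(skewness(left_tailed))
```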

6
Skewed Distributions
7
Measures of dispersion
  • Range: the difference between the highest and
    lowest values. Easily understood, but ignores
    all the values in between
  • Interquartile range
  • The variance (S²)
  • Based on how far each individual observation is
    from the arithmetic mean
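
As a rough illustration of these three measures, the sketch below computes them with Python's standard library on made-up data. The quartile convention (`method="inclusive"`) is an assumption; other conventions give slightly different interquartile ranges. The variance uses the divisor n - 1.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
values = [4, 7, 8, 10, 12, 13, 16]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Interquartile range: 75th percentile minus 25th percentile.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Variance: sum of squared deviations from the mean, divided by n - 1.
mean = statistics.mean(values)
variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)

print(value_range, iqr, variance)
```

The hand-computed variance agrees with `statistics.variance(values)`, which uses the same n - 1 divisor.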

8
Degrees of freedom
  • The divisor n - 1 in S²
  • The number of independent quantities making up
    the statistic that are free to vary
  • Here: the n original observations minus one for
    their mean, which, of course, is not independent
    of them
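
One way to see why only n - 1 quantities are free to vary: the deviations from the mean always sum to zero, so the last deviation is fixed once the others are known. The sketch below checks this on made-up data and compares the n - 1 divisor with the n divisor.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
values = [4, 7, 8, 10, 12, 13, 16]
n = len(values)
mean = sum(values) / n

# Deviations of each observation from the mean.
deviations = [x - mean for x in values]

# The deviations sum to zero, so the last one can be recovered
# from the first n - 1.
last_from_others = -sum(deviations[:-1])

# Divisor n - 1 (sample variance) versus divisor n (population variance).
sample_variance = statistics.variance(values)
population_variance = statistics.pvariance(values)

print(sum(deviations), last_from_others, deviations[-1])
print(sample_variance, population_variance)
```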

9
Measures of dispersion
  • The standard deviation
  • The square root of variance (SD or S), also
    called the root mean square deviation
  • If one distribution is more spread out than
    another, then it has a larger standard deviation
  • Commonly used to compare the variability of some
    measurement in two groups
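
A minimal sketch of such a comparison, using two made-up groups that share the same mean but differ in spread:

```python
import statistics

# Two hypothetical groups with the same mean (10) but different spread.
group_a = [9, 10, 11]
group_b = [5, 10, 15]

# Standard deviation: square root of the variance (divisor n - 1).
sd_a = statistics.stdev(group_a)
sd_b = statistics.stdev(group_b)

print(sd_a, sd_b)  # the more spread-out group has the larger SD
```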

10
Normal Distribution
  • Also called Gaussian
  • It is not common or typical, but approximate
    normality of some distribution can be a great
    help
  • Unimodal, symmetrical and bell-shaped
    (not all such distributions are normal, though)
  • Mean, median and mode are equal in value

11
Normal Distribution
  • All its properties are characterized completely
    by its mean and standard deviation
  • i.e. two normal distributions with the same means
    and SDs are identical

12
  • Mean m
  • SD s
  • 50% below, 50% above the mean
  • 68.27% between m - s and m + s
  • 95% within m ± 1.96s
  • 2.5% in each of the two tails
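
These areas can be reproduced with `statistics.NormalDist` from Python's standard library. The values of m and s below are hypothetical; the proportions are the same for any normal distribution.

```python
from statistics import NormalDist

# Hypothetical mean and SD; any values give the same proportions.
m, s = 120.0, 10.0
dist = NormalDist(mu=m, sigma=s)

# Proportion of the distribution below the mean.
below_mean = dist.cdf(m)

# Proportion between m - s and m + s.
within_1s = dist.cdf(m + s) - dist.cdf(m - s)

# Proportion within m ± 1.96s, and the proportion in the upper tail.
within_196s = dist.cdf(m + 1.96 * s) - dist.cdf(m - 1.96 * s)
upper_tail = 1 - dist.cdf(m + 1.96 * s)

print(f"{below_mean:.4f} {within_1s:.4f} {within_196s:.4f} {upper_tail:.4f}")
```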

13
Three different distributions
  • 1. The distribution of the variables in the
    sample
  • e.g. 100 patients observed, mean
  • 2. Underlying distribution in the population of
    all such patients
  • we don't know what this distribution looks
    like, but assume that the sample was
    representative, to estimate the population mean m
  • 3. The sampling distribution of the mean
  • theoretical distribution

14
Standard Error
  • Examine the distribution of all the possible
    means from samples of a given size n
  • All possible sample means have a normal
    distribution, and their mean is equal to the
    unknown population mean m
  • The standard deviation of this distribution,
    called the standard error of the mean, is equal
    to s/√n
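
In practice the standard error is estimated from a single sample. The sketch below assumes the usual formula, SE = SD / √n, with the sample SD standing in for the unknown population SD; the data and the helper name `standard_error` are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical measurements, invented purely for illustration.
sample = [4, 7, 8, 10, 12, 13, 16]

se = standard_error(sample)
print(f"mean = {statistics.mean(sample)}, SE = {se:.4f}")
```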

15
  • Normal distributions are rare in medical
    applications
  • Non-normality is not a major problem
  • As long as the distribution of the original
    variable is not very skewed, the sampling
    distribution of its mean is approximately normal
  • This approximation improves as the sample size
    increases
  • We can still use normal distribution for sampling
    distribution of the mean
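
A small simulation can make this concrete. The sketch below is an illustration under stated assumptions rather than a proof: it draws repeated samples from an exponential population (positively skewed, with mean 1 and SD 1) and counts how often the sample mean falls within 1.96 standard errors of the population mean. If the normal approximation holds, the fraction should be close to 0.95. The sample size and number of repetitions are arbitrary choices.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 50            # size of each sample (arbitrary choice)
n_samples = 4000  # number of repeated samples (arbitrary choice)

# Exponential population with rate 1: mean 1, SD 1, positively skewed.
pop_mean, pop_sd = 1.0, 1.0
se = pop_sd / math.sqrt(n)

# Mean of each repeated sample: an empirical sampling distribution.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(n_samples)
]

# Fraction of sample means within 1.96 SE of the population mean.
inside = sum(abs(mean - pop_mean) <= 1.96 * se for mean in sample_means)
fraction_inside = inside / n_samples

print(f"{fraction_inside:.3f}")
```

Reducing `n` to a handful of observations makes the approximation visibly worse for a population this skewed.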

16
  • As n increases, SE decreases
  • SE increases with increasing variability in the
    population
  • If the distribution is normal
  • Mean ± SD contains about 68% of
    observations
  • Mean ± 1.96 SD contains about 95% of
    observations
  • In general
  • Mean ± SE: 68% confidence interval for the
    mean
  • Mean ± 1.96 SE: 95% confidence interval for the mean
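
A minimal sketch of the 95% confidence interval for the mean, using the 1.96 multiplier from the slide and made-up data. It assumes SE = SD / √n; for a sample this small a t-based multiplier would normally be preferred, so treat the numbers as illustrative only. The helper name `confidence_interval` is hypothetical.

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Return (lower, upper) limits of mean ± z * SE, with SE = SD / sqrt(n)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se

# Hypothetical measurements, invented purely for illustration.
sample = [4, 7, 8, 10, 12, 13, 16]

lower, upper = confidence_interval(sample)           # 95% interval
lower_68, upper_68 = confidence_interval(sample, 1)  # about 68% interval

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(f"68% CI: {lower_68:.2f} to {upper_68:.2f}")
```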

17
Coefficient of Variation
  • Used to compare variability of some measurement
    in two groups when the two groups have very
    different means
  • The variation relative to the mean
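
A short sketch of the idea, assuming the usual definition (SD divided by the mean) and two made-up groups with very different means. The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """SD relative to the mean (multiply by 100 for a percentage)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical body weights in kg, invented for illustration.
adults = [60, 70, 80]   # mean 70, SD 10
infants = [3, 4, 5]     # mean 4, SD 1

cv_adults = coefficient_of_variation(adults)
cv_infants = coefficient_of_variation(infants)

# The adults have the larger SD, but the infants vary more
# relative to their mean.
print(f"{cv_adults:.3f} {cv_infants:.3f}")
```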

18
Important concepts
19
  • Variables and data come in two basic forms
  • Qualitative: also known as categorical or nominal
  • Quantitative: also known as metric or numerical
  • A qualitative variable is called
  • Ordinal if there is an order in the categories
  • Binary, dichotomous or attribute if it has two
    categories

20
  • Presenting qualitative data
  • Count the number of persons in each category
  • Use a table, bar chart or pie chart
  • A numerical distribution can be summarized by
    giving descriptions or measures of
  • Its shape
  • Where its centre is
  • How spread out it is
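
Counting the number of persons in each category is the step that feeds a table or chart. A minimal sketch with `collections.Counter`, using made-up blood groups as the qualitative variable:

```python
from collections import Counter

# Hypothetical blood groups of eight persons, invented for illustration.
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

# Number of persons in each category.
counts = Counter(blood_groups)

# Percentage of persons in each category.
total = len(blood_groups)
percentages = {group: 100 * count / total for group, count in counts.items()}

# Simple frequency table, most common category first.
for group, count in counts.most_common():
    print(f"{group:<3} {count:>2} {percentages[group]:5.1f}%")
```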

21
  • Shapes of quantitative distributions
  • Symmetrical or skewed
  • Unimodal or bimodal (if symmetrical)
  • Measures of the centre of a quantitative
    distribution
  • Arithmetic mean (grouped and weighted mean)
  • Median (used for skewed data)
  • Mode
  • Geometric mean (used for positively skewed data)
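
The measures of the centre listed above can be computed with Python's `statistics` module. The data are made up and deliberately positively skewed, so the arithmetic mean is pulled toward the long upper tail while the median and geometric mean are not. The weighted mean is shown for two hypothetical group means weighted by group size.

```python
import statistics

# Hypothetical positively skewed data, invented for illustration.
values = [1, 2, 2, 4, 8, 32]

arithmetic_mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
geometric_mean = statistics.geometric_mean(values)

# Weighted mean: each group mean weighted by its group size.
group_means = [10.0, 20.0]
group_sizes = [30, 10]
weighted_mean = (
    sum(m * w for m, w in zip(group_means, group_sizes)) / sum(group_sizes)
)

print(arithmetic_mean, median, mode, geometric_mean, weighted_mean)
```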

22
  • Measures of the spread or variability of a
    distribution
  • Range
  • Interquartile range
  • Variance
  • Standard deviation
  • Coefficient of variation
  • The normal distribution
  • Is a bell-shaped distribution
  • Has certain area properties
  • In particular, 95% of observations in a normal
    distribution lie within mean ± 1.96 standard
    deviations

23
  • The sampling distribution of a statistic
  • Is the distribution of potential values of that
    statistic that could be obtained in repeated
    samples of the same size from a population
  • The standard error of a statistic
  • Is the standard deviation of the sampling
    distribution of that statistic
  • Measures the precision of the sample value of
    that statistic as an estimator of the true
    population value
  • The smaller the standard error is, the more
    precise the estimate
  • Decreases with increasing sample size

24
  • A confidence interval (CI)
  • Gives the precision of a sample estimate
  • Puts a ± error on a sample statistic
  • (but sometimes the CI cannot be expressed simply
    as statistic ± error, as with a log-normal
    distribution)
  • Is a range of values surrounding a sample
    statistic within which, at a given level of
    confidence, the true value of the population
    parameter is likely to be found

25
  • Statistic: Formula
  • Arithmetic mean
  • Weighted mean
  • Median: middle number
  • Mode: most frequent number
  • Geometric mean
  • Range: maximum - minimum
  • Interquartile range: IQR = 75th - 25th percentile
  • Variance
  • Standard deviation
  • Coefficient of variation