General Concepts and Descriptive Statistics I - PowerPoint PPT Presentation

Slides: 44
Provided by: Spencer
1
General Concepts and Descriptive Statistics I
  • Trey Spencer
  • Division of Biometry

2
Reference
http://bmj.com/collections/statsbk
3
Definitions
What is Statistics?
  • Statistics is a branch of applied mathematics
    that helps us to make intelligent judgements and
    informed decisions in the presence of uncertainty
    and variation.
  • Useful in the planning of experiments and studies
    that will result in meaningful data.
  • Provides a set of tools to extract and understand
    information resulting from experiments.

4
Definitions
Variable: a characteristic that changes over time and/or for different individuals under consideration.
Experimental Unit: the individual or object on which a variable is measured.
Example: Systolic blood pressure (SBP) is measured on a group of 100 patients. Here, SBP is the variable, and a patient is the experimental unit.
5
Definitions
Population: the set of all measurements of interest.
Sample: a subset of measurements of interest to the investigator.
[Figure: Population and Sample]
6
Definitions
Parameter: a numerical characteristic of a population. In general, the value of a population parameter is not known exactly.
Statistic: a numerical characteristic of a sample. It serves as an estimate of the corresponding population parameter.
7
Definitions
Descriptive Statistics: basic tools for summarizing and presenting numerical data.
Inferential Statistics: an objective means of drawing conclusions from data, and about the issues under research.
8
Definitions
Univariate Data: a single variable is measured for each experimental unit (EU).
Bivariate Data: two variables are measured for each EU.
Multivariate Data: more than two variables are measured for each EU.
9
Types of Variables
Qualitative Variables: variables that are intrinsically non-numerical. Often called categorical variables.
Quantitative Variables: variables that are intrinsically numeric in nature.
10
Qualitative Variables
Nominal Variables: variables that have no natural ordering. For example, race, gender, marital status, and toxicity listings are nominal variables.
Ordinal Variables: variables in which there is a distinct ordering of values. Pain scales and response to treatment are examples of ordinal variables.
11
Quantitative Variables
Discrete Variables: variables that have numeric values that are integers (countable). Examples include number of children, number of previous BMTs, and age (in years).
Continuous Variables: variables that have values associated with real numbers. Examples: temperature, systolic/diastolic BP.
12
Low Birth Weight Data
Variable                                                                                   Abbreviation
Identification Code                                                                        ID
Low Birth Weight (0 = Birth Weight ≥ 2500g, 1 = Birth Weight < 2500g)                      LOW
Age of the Mother in Years                                                                 AGE
Weight in Pounds at the Last Menstrual Period                                              LWT
Race (1 = White, 2 = Black, 3 = Other)                                                     RACE
Smoking Status During Pregnancy (1 = Yes, 0 = No)                                          SMOKE
History of Premature Labor (0 = None, 1 = One, etc.)                                       PTL
History of Hypertension (1 = Yes, 0 = No)                                                  HT
Presence of Uterine Irritability (1 = Yes, 0 = No)                                         UI
Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)   FTV
Birth Weight in Grams                                                                      BWT
13
Low Birth Weight Data
ID  LOW  AGE  LWT  RACE  SMOKE  PTL  HT  UI  FTV   BWT
85    0   19  182     2      0    0   0   1    0  2523
86    0   33  155     3      0    0   0   0    3  2551
87    0   20  105     1      1    0   0   0    1  2557
88    0   21  108     1      1    0   0   1    2  2594
89    0   18  107     1      1    0   0   1    0  2600
91    0   21  124     3      0    0   0   0    0  2622
92    0   22  118     1      0    0   0   0    1  2637
76    1   20  105     3      0    0   0   0    3  2450
77    1   26  190     1      1    0   0   0    0  2466
78    1   14  101     3      1    1   0   0    0  2466
79    1   28   95     1      1    0   0   0    2  2466
81    1   14  100     3      0    0   0   0    2  2495
82    1   23   94     3      1    0   0   0    0  2495
83    1   17  142     2      0    0   1   0    0  2495
84    1   21  130     1      1    0   1   0    3  2495
Hosmer and Lemeshow (2000), Applied Logistic Regression, 2nd Edition, John Wiley & Sons.
N = 189
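As a quick sanity check on the coding of LOW, the sketch below stores the 15 displayed records and tests whether LOW = 1 coincides with a birth weight below 2500 g. That cutoff rule is an assumption read off the displayed records, and the tuple layout (ID, LOW, BWT) is chosen only for this illustration.

```python
# (ID, LOW, BWT) for the 15 records displayed on the slide.
# The full data set has N = 189; only this excerpt is available here.
records = [
    (85, 0, 2523), (86, 0, 2551), (87, 0, 2557), (88, 0, 2594),
    (89, 0, 2600), (91, 0, 2622), (92, 0, 2637), (76, 1, 2450),
    (77, 1, 2466), (78, 1, 2466), (79, 1, 2466), (81, 1, 2495),
    (82, 1, 2495), (83, 1, 2495), (84, 1, 2495),
]

# Assumed rule: LOW is 1 when the birth weight is below 2500 g, else 0.
consistent = all(low == int(bwt < 2500) for _, low, bwt in records)
print("LOW agrees with BWT < 2500 g for every displayed record:", consistent)
```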
14
Data Presentation
Three ways to summarize, or describe, data:
1. Tabulations and Frequency Distributions
2. Graphics
3. Basic Summary Statistics
15
Tabulations
Tables are used to describe qualitative data.
The tables simply present the counts, or
frequencies, observed in each category of a
variable of interest.
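A minimal sketch of such a tabulation, using only the FTV values of the 15 records displayed on the Low Birth Weight Data slide (not the full N = 189 data set, so the counts are illustrative rather than the ones on the original slide):

```python
from collections import Counter

# FTV (physician visits in the first trimester) for the 15 displayed records,
# in the order the records appear on the slide.
ftv = [0, 3, 1, 2, 0, 0, 1, 3, 0, 0, 2, 2, 0, 0, 3]

counts = Counter(ftv)   # category -> frequency
n = len(ftv)

print("Visits  Frequency  Relative frequency")
for visits in sorted(counts):
    print(f"{visits:>6}  {counts[visits]:>9}  {counts[visits] / n:>18.3f}")
```

The same counts are what a bar chart or pie chart of this variable would display.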
16
Tabulations
Physician Visits During the 1st Trimester
17
Bar Chart: Physician Visits During First Trimester
18
Pie Chart: Physician Visits During First Trimester
19
Frequency Distributions
20
Frequency Distributions
Example: Mother's Weight at Last Menstrual Period
21
Histograms
22
Histograms
Steps in constructing a histogram:
1. Choose the number of classes.
2. Calculate the approximate class width by dividing the difference between the largest and smallest values by the number of classes.
3. Construct a statistical table containing classes, their frequencies, and their relative frequencies.
4. Construct the histogram, plotting class intervals on the x-axis and frequencies on the y-axis.
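The four steps can be sketched in plain Python. The data are the LWT values of the 15 records displayed earlier, and the choice of 5 classes is an arbitrary assumption made for this illustration; the original slides may have used different classes.

```python
# LWT (mother's weight in pounds) for the 15 displayed records.
lwt = [182, 155, 105, 108, 107, 124, 118, 105, 190, 101, 95, 100, 94, 142, 130]

# Step 1: choose the number of classes (assumed value for this sketch).
num_classes = 5

# Step 2: approximate class width = (largest - smallest) / number of classes.
low, high = min(lwt), max(lwt)
width = (high - low) / num_classes

# Step 3: count how many values fall in each class.
freq = [0] * num_classes
for x in lwt:
    k = min(int((x - low) / width), num_classes - 1)  # largest value joins the last class
    freq[k] += 1

# Step 4: print the table with a text histogram (one '*' per observation).
for k, f in enumerate(freq):
    start = low + k * width
    print(f"{start:6.1f} to {start + width:6.1f}  {f:2d}  {f / len(lwt):.3f}  {'*' * f}")
```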
23
Histograms
24
Cumulative Distribution
25
Stem-and-Leaf Plots
A quick and elegant way of describing relatively small data sets is by the use of stem-and-leaf diagrams. Consider the following set of data:
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
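One way to build the diagram for this data set is sketched below, taking the tens digit as the stem and the units digit as the leaf. That split is a natural choice for two-digit values but is an assumption here, since the slide's own diagram is not part of the transcript.

```python
from collections import defaultdict

data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

# Group the sorted values by stem (tens digit); leaves are the units digits.
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

# Print every stem from the smallest to the largest, even if it has no leaves.
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```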
26
Interpreting Graphs
  • Location of Center
  • Scale
  • Shape
  • Outliers

27
Interpreting Graphs: Shape
28
Interpreting Graphs: Shape
29
Interpreting Graphs
30
Summary Statistics
Measures of Center (Central Tendency)
  • Mean
  • Median
  • Mode
Measures of Spread (Variability)
  • Range
  • Variance
  • Standard Deviation
31
Measures of Center
Mean (average): the sum of the sampled values divided by the number of samples taken.
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
where n = sample size, $X_i$ = sampled value, and $\sum$ = symbol for summation.
32
Measures of Center
Example:
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Mean = 487/15 ≈ 32.47
How do extreme values affect the mean?
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37, 113
Mean = 600/16 = 37.5
Note: The mean is sensitive to extreme values.
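The effect of the extreme value can be checked directly. This is a small sketch; the helper name `mean` is chosen for illustration.

```python
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def mean(values):
    """Sum of the sampled values divided by the number of samples."""
    return sum(values) / len(values)

mean_original = mean(data)              # 487 / 15
mean_with_outlier = mean(data + [113])  # 600 / 16

print(round(mean_original, 2))   # 32.47
print(mean_with_outlier)         # 37.5
```

A single added observation moves the mean by about 5 units.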
33
Measures of Center
Median: the value in a set of measurements that falls in the middle position when the data are ordered from smallest to largest.
34
Measures of Center
N = 15 is odd, so the 8th value is the median.
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
The 8th value is 30.
Why 8? (15 + 1)/2 = 8
How do extreme values affect the median?
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52, 113
Now N = 16, so the average of the 8th and 9th values is the median, which is (30 + 31)/2 = 30.5 ... not much different from the original median of 30!
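The odd and even cases can be sketched as a short function. The name `median` is illustrative; Python's standard `statistics.median` gives the same results.

```python
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at rank (n + 1) / 2, which is zero-based index n // 2.
        return ordered[mid]
    # Even n: average of the two middle values (ranks n/2 and n/2 + 1).
    return (ordered[mid - 1] + ordered[mid]) / 2

median_original = median(data)              # 8th of 15 ordered values
median_with_outlier = median(data + [113])  # average of 8th and 9th of 16

print(median_original)       # 30
print(median_with_outlier)   # 30.5
```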
35
Measures of Center
Mode: the value in a set of measurements that occurs most frequently. In our example data, the mode is 26.
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
26 is the mode (it occurs twice; every other value occurs once).
Fact: For data that are symmetric and unimodal, the mean, median and mode are similar.
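A short sketch of finding the mode by counting occurrences. Returning a list of all most-frequent values is a design choice made here to handle ties; the slide itself only deals with a single mode.

```python
from collections import Counter

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

example_modes = modes(data)
print(example_modes)   # [26]
```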
36
Measures of Spread
Range: the difference between the largest and smallest sample measurements. In our example, the range is 36.
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
R = 52 - 16 = 36
Note: Two data sets may have the same range, but very different shape and variability.
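To illustrate the note, the sketch below compares two small data sets with the same range of 36. Both sets are hypothetical, made up for this illustration; they are not taken from the slides.

```python
import statistics

# Hypothetical data sets: same smallest (16) and largest (52) values.
clustered = [16, 34, 34, 34, 52]   # most values sit at the center
spread_out = [16, 16, 34, 52, 52]  # most values sit at the extremes

def value_range(values):
    return max(values) - min(values)

range_a = value_range(clustered)
range_b = value_range(spread_out)
sd_a = statistics.stdev(clustered)    # sample standard deviation
sd_b = statistics.stdev(spread_out)

print(range_a, range_b)                # 36 36
print(round(sd_a, 2), round(sd_b, 2))  # 12.73 18.0
```

The ranges match, but the standard deviations differ, which is why the range alone is a limited measure of spread.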
37
Measures of Spread
Variance ($s^2$): the sum of the squares of the deviations from the mean $\bar{X}$, divided by the sample size minus one.
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation ($s$): the square root of the variance.
38
Measures of Spread
A computationally more convenient formula to calculate the variance:
$s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}{n-1}$
39
Measures of Spread
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37,
48, 50, 52
The variance and standard deviation for our example are $s^2 \approx 112.98$ and $s \approx 10.63$.
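The calculation for the example data can be sketched two ways: from the definition (squared deviations from the mean) and from the usual computational shortcut based on the sum of squares. The variable names are illustrative, and the shortcut shown is the standard textbook form, assumed to match the formula on the slide.

```python
import math

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
n = len(data)
xbar = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Shortcut: (sum of squares - (sum)^2 / n) / (n - 1).
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(var_definition)

print(round(var_definition, 2), round(var_shortcut, 2))  # 112.98 112.98
print(round(sd, 2))                                      # 10.63
```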
40
Percentiles of a Sample
The Pth percentile of a sample of n observations is the value of the variable with rank (P/100)(n + 1). If the rank is not an integer, it is rounded to the nearest half rank.
Percentile   Name
25th         Lower Quartile
50th         Median
75th         Upper Quartile
41
Percentiles of a Sample
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37,
48, 50, 52
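25th percentile: rank (25/100)(16) = 4, the 4th value of ranked data = 26
40th percentile: rank (40/100)(16) = 6.4, rounded to 6.5, the average of the 6th and 7th values of ranked data = 27.5
75th percentile: rank (75/100)(16) = 12, the 12th value of ranked data = 37
Interquartile Range (IQR): the difference between the 75th and 25th percentiles of a sample.
IQR = 37 - 26 = 11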
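The rank rule above can be sketched as a small function. Two details are assumptions not stated on the slides: ranks exactly halfway between two half ranks are rounded upward, and ranks falling outside 1..n are clamped to the smallest or largest observation.

```python
import math

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

def percentile(values, p):
    ordered = sorted(values)
    n = len(ordered)
    rank = p / 100 * (n + 1)
    # Round to the nearest half rank (ties rounded upward: an assumption).
    half_rank = math.floor(rank * 2 + 0.5) / 2
    # Clamp to the available ranks 1..n (an assumption for extreme percentiles).
    lower = max(1, min(n, math.floor(half_rank)))
    upper = max(1, min(n, math.ceil(half_rank)))
    # An integer rank picks one value; a half rank averages its two neighbors.
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

q1 = percentile(data, 25)    # rank 4   -> 26.0
p40 = percentile(data, 40)   # rank 6.5 -> 27.5
q3 = percentile(data, 75)    # rank 12  -> 37.0
iqr = q3 - q1

print(q1, p40, q3, iqr)
```

Statistical packages use several different percentile conventions, so library functions may return slightly different values from this rule.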
42
Boxplot
43
Weight at Last Menstrual Period