Chapter 2 Characterizing Your Data Set - PowerPoint PPT Presentation


Provided by: Rober338
Learn more at: https://www.d.umn.edu

Transcript and Presenter's Notes


1
Chapter 2: Characterizing Your Data Set
Allan Edwards: Before you analyze your data, graph your data
2
Francis Galton, Father of Intelligence Testing: Whenever you can, count!
3
Frequency Table: Variable is Continuous
4
Grouped Frequency Table Distribution
Continuous variable; data from the same 100 subjects
Constant interval
Class interval
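A grouped frequency table like the one on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up (they are not the 100 subjects from the slide), and the function name `grouped_frequency_table`, the starting value of 50, and the interval width of 10 are hypothetical choices.

```python
from collections import Counter

def grouped_frequency_table(scores, low, width):
    """Count scores per class interval [start, start + width).

    Assumes every score is >= low and that the interval width is constant.
    """
    counts = Counter((s - low) // width for s in scores)
    table = []
    for k in range(max(counts) + 1):
        start = low + k * width
        table.append((start, start + width, counts.get(k, 0)))
    return table

# Hypothetical scores, chosen only for illustration.
scores = [52, 55, 61, 63, 64, 67, 70, 71, 71, 78, 84, 89]

for start, end, n in grouped_frequency_table(scores, low=50, width=10):
    midpoint = (start + end) / 2  # the value a histogram bar would be labeled with
    print(f"{start}-{end}  midpoint {midpoint}  frequency {n}")
```

Because each interval ends where the next begins, a histogram drawn from this table would have touching bars.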
5
Grouped Frequency Histogram for Continuous Variable
Bars touch: the end of one interval is the beginning of the next
Value is the middle value of the interval
Spatz says the bars don't touch. Whaaaaaa?????
6
Bar Chart for Categorical Variable
Bars are separated: a lot of Biology is not almost English
7
Standard Normal Distribution
The more extreme your score, the more unusual and improbable you are
Remember this relationship -- it's the basis of 90% of statistics
Typical of many characteristics, e.g., height, intelligence, speed
8
Rectangular Distribution: Never Seen One
Extreme scores are NOT less usual/frequent/probable
9
Non-Normal Distribution
Example: Income -- Where is the mean?
How would you characterize these data?
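To see why the slide asks where the mean is, here is a small sketch with made-up incomes (in thousands of dollars; these values are an assumption, not data from the presentation). One extreme income pulls the mean well above the median.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars (not from the slide).
incomes = [20, 25, 30, 35, 40, 45, 400]

mean_income = statistics.mean(incomes)      # pulled toward the one extreme score
median_income = statistics.median(incomes)  # middle value, unaffected by the extreme

print(mean_income, median_income)  # 85 35
```

In a positively skewed distribution like this one, the median is usually the more representative value.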
10
Negative Skew
11
Bimodal Distribution
Is the mean appropriate/representative?
E.g., mean age of onset for anorexia is 17 yrs
One peak is at 14 yrs -- onset of puberty
One peak is at 18 yrs -- going away to college
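A minimal sketch of the same idea, using invented ages constructed so that the peaks fall at 14 and 18 and the mean is 17 (the individual values are assumptions for illustration, not clinical data):

```python
import statistics

# Hypothetical ages of onset, built to have two peaks (14 and 18).
ages = [13, 14, 14, 14, 15, 17, 18, 18, 18, 19, 21, 23]

mean_age = statistics.mean(ages)    # 17
modes = statistics.multimode(ages)  # every most-frequent value: [14, 18]
count_at_mean = ages.count(mean_age)

print(mean_age, modes, count_at_mean)  # 17 [14, 18] 1
```

The mean lands on an age that only one of these made-up cases actually has, which is why reporting both modes describes a bimodal distribution better.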
12
Bimodal Distribution, cont.
13
Characterizing Your Data: Measures of Central Tendency
  • Characterizing your data
    • Shorthand notation for all of your values
  • Central Tendency
    • A representative value
    • Where your scores tend to hang out
    • Where you go to find your data
  • Mean -- What is the definition? Why do you use it?
  • Median -- Middle value
    • What if you have an even # of values?
  • Mode -- Most frequent value
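
The three measures above can be computed with Python's standard `statistics` module. The scores below are hypothetical and deliberately contain an even number of values, so the median is the average of the two middle values.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data, even number of values

mean = statistics.mean(scores)      # sum / count = 30 / 6
median = statistics.median(scores)  # even count: (3 + 5) / 2
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)  # 5 4.0 3
```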

14
Which Central Tendency is Best?
  • Mean
    • Ratio data (people allow interval data)
    • Symmetrical distributions
  • Median
    • Skewed distributions
    • Ordinal (ranked) data -- a mean cannot be computed
  • Mode
    • Nominal (qualitative) data
    • Bimodal data

15
If You Had to Guess the Value of Each (Quantitative) Data Point
  • Mode: Highest # of correct guesses
  • Median: Errors would be symmetrical
    • Overestimations would balance out underestimations (as many guesses too high as too low)
  • Mean: Errors of estimation will be smallest overall (in the squared sense)
  • Two unique properties of the mean
    • Squared deviations are smaller from the mean than from any other value
    • Deviation scores sum to zero
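
These guessing claims can be checked on a small example. The sketch below uses hypothetical scores and helper names (`exact_hits`, `sum_squared_errors`) that are not from the slides; it reads "smallest errors" as smallest sum of squared errors, which is an interpretation of the slide rather than something it states.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data
mean = statistics.mean(scores)      # 5
median = statistics.median(scores)  # 4.0
mode = statistics.mode(scores)      # 3

def exact_hits(guess):
    """Number of data points the guess gets exactly right."""
    return sum(1 for x in scores if x == guess)

def sum_squared_errors(guess):
    """Total squared error if the same guess is used for every data point."""
    return sum((x - guess) ** 2 for x in scores)

# Guessing the median: as many overestimates as underestimates.
overestimates = sum(1 for x in scores if median > x)
underestimates = sum(1 for x in scores if median < x)

# Deviations from the mean cancel out.
deviation_sum = sum(x - mean for x in scores)

print(exact_hits(mode), exact_hits(mean), exact_hits(median))  # 2 1 0
print(overestimates, underestimates)                           # 3 3
print(sum_squared_errors(mean), sum_squared_errors(median))    # 46 52.0
print(deviation_sum)                                           # 0
```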

16
How Strong Is Your Tendency? Measures of Heterogeneity (Chapter 3)
  • Two data sets with nearly identical
    • Ns
    • Means
    • Medians
    • Modes
  • Are these two data sets similar?

17
Are They The Same?
18
Some Data Sets are More Heterogeneous
Jockeys: Very low average height, very homogeneous
Presbyterians: Medium average height, very heterogeneous
NBA players: Very high average height, very homogeneous
How do you characterize a data set's heterogeneity?
The greater the heterogeneity, the weaker the central tendency
19
Quantifying Heterogeneity
Range: Highest score minus lowest score
Very sensitive to a single extreme score
Interquartile Range: 75th percentile minus 25th percentile
Captures 50% of the scores
How wide do you have to go to capture 50% of values?
The wider you have to go, the more heterogeneity
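A short sketch of both measures, assuming hypothetical scores with one extreme value. The helper name `spread` is made up for this example, and `statistics.quantiles` uses one of several common quartile conventions, so other tools may report slightly different quartiles.

```python
import statistics

def spread(scores):
    """Return (range, interquartile range) for a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
    return max(scores) - min(scores), q3 - q1

# Hypothetical scores with one extreme value (25).
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]

full_range, full_iqr = spread(scores)
trimmed_range, trimmed_iqr = spread(scores[:-1])  # same scores without the extreme one

print(full_range, full_iqr)
print(trimmed_range, trimmed_iqr)
```

Dropping the single extreme score changes the range far more than it changes the interquartile range, which is the sensitivity the slide warns about.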
20
Heterogeneity, cont.
The more heterogeneity, the more the scores will deviate from the mean
21
Heterogeneity, cont.
  • Two unique properties of the mean
    • All deviation scores sum to zero
    • Raw scores deviate less (in squared terms) from the mean than from any other value
  • This makes the mean the best representative of the data set
    • If the distribution is symmetrical

22
Heterogeneity, cont.
  • Problem
    • All deviation scores sum to zero no matter how heterogeneous the raw scores
    • You cannot average deviation scores to quantify heterogeneity
  • Solution
    • Make all deviation scores positive

23
Heterogeneity, cont.
  • Two ways to make all deviation scores positive
  • Take the absolute value of the deviation scores
    • Average of absolute values = Average Deviation (AD)
    • Mean +/- AD captures roughly 50% of raw scores (closer to 58% if the scores are normally distributed)
  • Take the square of the deviation scores
    • Average of squared deviation scores = Variance
    • σ² for Population
    • S² for Sample
    • Ŝ² (S²-hat) for estimating Population from Sample
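
Both routes can be worked through by hand in Python. The scores are hypothetical, and the N - 1 divisor for the population estimate is the standard convention, assumed here because the slide's formula did not survive in the transcript.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]  # these sum to zero

# Route 1: absolute values -> Average Deviation
average_deviation = sum(abs(d) for d in deviations) / n

# Route 2: squares -> Variance
sum_of_squares = sum(d ** 2 for d in deviations)
variance_n = sum_of_squares / n                   # population / sample description
variance_estimate = sum_of_squares / (n - 1)      # estimating population from sample

print(sum(deviations), average_deviation, variance_n)  # 0.0 1.5 4.0
print(variance_estimate)
```

The hand-computed values match `statistics.pvariance(scores)` (divide by N) and `statistics.variance(scores)` (divide by N - 1).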

24
Variance
Estimate of Population from Sample
Population
To describe a sample, use N: S² = Sample Variance
Problem: The magnitude of the variance is large relative to individual deviation scores -- it quantifies heterogeneity but is not very descriptive
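The formulas on this slide did not survive in the transcript. The standard definitions that match the three labels are given below as an assumption, writing $X_i$ for each score, $\mu$ for the population mean, $\bar{X}$ for the sample mean, and $N$ for the number of scores.

```latex
% Population variance
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}

% Sample variance (describing the sample itself, divide by N)
S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}

% Estimate of the population variance from a sample (divide by N - 1)
\hat{S}^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}
```

Because the deviations are squared, the variance is in squared units, which is why it looks large next to the individual deviation scores.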
25
Standard Deviation
Population
Sample
Population Estimate
Mean +/- SD captures 68% of data points
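The standard deviation is the square root of the variance, which puts it back in the units of the raw scores. A minimal sketch with hypothetical scores, assuming the usual N divisor for the population and sample description and N - 1 for the population estimate:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

sd_n = statistics.pstdev(scores)        # divide by N: population, or describing the sample
sd_estimate = statistics.stdev(scores)  # divide by N - 1: estimating the population

print(sd_n)         # 2.0
print(sd_estimate)  # a little larger than 2
```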
26
Standard Deviation, cont.
27
The Concept
Standard Deviation = Standard deviation from the mean
= Average deviation from the mean
= Expected deviation from the mean
Expect 68% of your data to be within 1 SD of the mean
Expect 95% of your data to be within 2 SDs of the mean
If your score is beyond 2 SDs of the mean:
You are very infrequent
You are very unusual
You are very improbable
Associate infrequent with improbable
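The 68% and 95% figures hold for a normal distribution. They can be checked with `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within 1 and 2 SDs of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_1_sd, 4))  # 0.6827
print(round(within_2_sd, 4))  # 0.9545
```

A score beyond 2 SDs therefore falls in the outer 5% or so of a normal distribution, which is the sense in which it is infrequent and improbable.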
28
Interpreting a Value
  • Transforming a score to make it more interpretable
  • Comparing two scores
    • Two tests of equal difficulty but of different length
    • Pretend both tests were 100 items long
    • How many would you have gotten right?
    • Percent Correct is a transformed score
  • Comparing one score to everybody else
    • Pretend there were 100 people; where would you rank?
    • Percentile is a transformed score
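
Both transformations are simple rescalings to a base of 100. In this sketch the function names and all scores are hypothetical, and percentile rank is taken as the percentage of scores strictly below yours, which is one of several conventions in use.

```python
def percent_correct(n_correct, n_items):
    """Rescale a raw score as if the test were 100 items long."""
    return 100 * n_correct / n_items

def percentile_rank(score, all_scores):
    """Percentage of scores strictly below the given score (one common convention)."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Two tests of different length: 18 of 20 versus 40 of 50.
print(percent_correct(18, 20))  # 90.0
print(percent_correct(40, 50))  # 80.0

# One score compared to everybody else (hypothetical class of 10).
class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(85, class_scores))  # 60.0
```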

29
Z-scores: Z-transformations
Take each score (Xi) and convert it to Zi
Mean of z-scores = 0
Standard deviation of z-scores = 1
Units of z-scores are in standard deviations
A z-score compares your deviation (numerator) to the standard deviation, i.e., the typical deviation (denominator)
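The transformation can be sketched as follows, assuming hypothetical scores and the N-divisor standard deviation. Each score's deviation from the mean is divided by the standard deviation.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mean = statistics.mean(scores)     # 5
sd = statistics.pstdev(scores)     # 2.0

# Zi = (Xi - mean) / SD
z_scores = [(x - mean) / sd for x in scores]

print(z_scores)                     # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z_scores))    # 0.0
print(statistics.pstdev(z_scores))  # 1.0
```

After the transformation the mean is 0 and the standard deviation is 1, whatever the original units were.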
30
Where you are relative to Population
Think Percentile
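If the population is assumed to be normally distributed (an assumption, since the slide's figure is not in the transcript), a z-score converts directly to a percentile. The function name `percentile_from_z` is made up for this sketch.

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percentage of a normal population scoring below the given z-score."""
    return 100 * NormalDist().cdf(z)

for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, round(percentile_from_z(z), 1))
# -1.0 15.9
# 0.0 50.0
# 1.0 84.1
# 2.0 97.7
```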
31
Interpreting Your Z-Score
32
Interpreting Your Z-Score, cont.
33
Interpreting Your Z-Score, cont.
34
Interpreting Your Z-Score, cont.