Quantitative Methods in Policy Analysis

About This Presentation

Title:

Quantitative Methods in Policy Analysis

Description:

not so many that there are only a few observations per category. ... Less intuitive and more difficult to interpret, because it is measured in ... – PowerPoint PPT presentation

Number of Views:113

Avg rating:3.0/5.0

Slides: 54

Provided by: timgu3

Category:

more less

Transcript and Presenter's Notes

Title: Quantitative Methods in Policy Analysis

1
Quantitative Methods in Policy Analysis

Session 2 Summary Statistics
Steve Fetter
Tim Gulden

2
Bad Graphs Revisited
3
Statistics

Collecting, analyzing, and presenting numerical
facts or data
Descriptive statistics organize, summarize, and
present data
Inferential statistics draw general conclusions
about a population based on a small sample drawn
from it

4
Descriptive Statistics

Accurately and honestly report, describe,
present, and summarize data, in numbers, tables,
and graphs
What is the central value?
How widely are values spread from the center?
Is this spreading symmetrical? Are there distinct
groups?
How common is a certain value?
Are there data that are very atypical?

5
Frequency Tables

A frequency table indicates how many observations
fall in various categories
For continuous data, we must first choose
appropriate categories
enough categories so that we can see a meaningful
distribution, but
not so many that there are only a few
observations per category.
Intervals should be multiples of 1, 2, 2.5, or 5
Rule of thumb divide the range of values into 8
to 15 equal bins more data ? more bins

6
Histograms

A histogram is the graphical analog of a
frequency table
We can create frequency tables and histograms in
Excel manually, with the data analysis toolpack,
with pivot tables, or with add-ins
For categorical variables, use a bar or pie chart
to display frequencies

7
ACTOR.XLS

Data for 66 actors and 4 variables
Gender
DomesticGross average domestic gross of last
few movies (in millions)
ForeignGross average foreign gross of last few
movies (in millions)
Salary current amount the actor asks for a movie
(in millions)

8
Creating Histograms in Excel

Create column containing upper limits for each
category (e.g., 2, 4, 620) bins
Manually
FREQUENCY(salary,bins)
Use chart wizard to create histogram
With Data Analysis Toolpack
Tools/Data Analysis/Histogram
Enter bin and data ranges

9
Try it ACTORS.XLS
10
Histogram Shape

symmetric
skewed to the right (positively skewed)
skewed to the left (negatively skewed)
bimodal (or tri-modal, etc.)

11
Symmetric
12
Skewed to Right (Positive Skew)
13
Skewed to Left (Negative Skew)
14
Bimodal
15
Scatterplots

A useful way to picture the relationship between
two variable is to plot a point for each
observation a scatterplot
We can usually see whether there is any
relationship between the two variables, and, if
so, what type it is
Example relationship between actors salary and
movie gross

16
Scatterplots with Excel

Highlight column with Y-variable data
Select Insert/Chart (or click Chart icon)
Select type XY (Scatter) and click next
Select series tab, enter range for X-variable
data
Enter axis labels, delete gridlines and legend
Select location for graph
Edit plot background, axis range, font sizes,
etc.
Save as custom type or copy and change data range
If X-Y in adjacent columns, highlight both in
step 1

17
The resulting scatterplot
Try it ACTORS.XLS
18
Presidents approval rating v. number of House
seats lost by presidents party in mid-term
elections, 1946-98
19
Pivot Tables

Pivot tables are one of Excels most powerful
tools. They allow us to slice and dice the
data, breaking it into various subpopulations
Statisticians often refer to these as contingency
tables or crosstabs
Example actresses claim they are being underpaid
relative to male actors. A pivot table is one way
to investigate if the data support this claim

20
Creating a Pivot Table

Position the cursor in the data set
Select Data/PivotTable
Select Excel List and Pivot Table
Verify that the range of the data set is correct
Click Layout to specify variables
to put a variable in any of the four areas (page,
row, column or data), click and drag it to area
drag salary to the row area, gender to the column
area, and gender to the data area

21
Creating a Pivot Table (cont.)

The table shows too much detail. To group the
salaries in wider bins
right-click on the salary column.
select Group and Outline/Group
in the dialogue box, enter 2 for starting at,
20 for ending at and 2 for by
The resulting table should look as follows after
expressing the counts as percentages (right-click
on count of gender, select field settings and
show data as of column) and rounding off to
two significant digits

22
Modified Pivot Table
23
Bad graphical presentations
24
A good graphical presentation
25
Analyzing the Table

Over half the women are in the lowest salary
category, whereas only 19 of the men are in this
category.
No women are in the highest two salary
categories, whereas 23 of the men are in these
categories
Clearly, women are paid less. Is this evidence of
discrimination? What analysis would you do to
shed light on this question?

26
More Pivot Table Analysis

Perhaps women are paid less because movies with
female leads gross less
To analyze this, examine the average salary of
men and women for each domestic gross level
drag DomesticGross to the row area
drag Gender to the column area
drag Salary to data area (summarize by average)
group DomesticGross in increments of 20
round off!

27
Average Female and Male Salaries by Domestic Gross
28
Salary discrimination against actresses?

Men average more than women in the two lowest
domestic gross categories, but not the third.
Beyond the third category, it is hard to tell
because no females were leads.
Thus, there appears to be discrimination against
women in low-grossing movies. To reach a more
definitive conclusion, we need to use more
sophisticated methods (e.g., chi-square or
regression analysis).

29
Break10 minutes
30
Summary Statistics

Summary statistics describe data sets or
relationships between variables with one or a few
numbers
What is the central value?
How widely are values spread from the center?
Are there data that are very atypical?
Are variables related?

31
Measures of Central Location Mean

The mean is the arithmetic average

population mean (?) v. sample mean (x-bar)
mean is representative or typical value if
the distribution is roughly symmetric
Excel command AVERAGE(data)

32
Measures of Central Location Median

The median is the middle observation, when data
are ordered from smallest to largest
If there is an odd number of observations, the
median is the middle observation
If n is even, the median is the average of the
two middle observations
Excel command MEDIAN(data)

33
Mean or Median?

median is representative, resistant, robust
statistic, even if distribution is skewed
Outliers affect mean more than the median
1, 3, 5, 7, 9 median 5, mean 5
1, 3, 5, 7, 90 median 5, mean 21
Use the median when you want the typical value
(e.g., family income of a city)
Use mean when extreme values are more important
(e.g., deaths from reactor accidents)

34
Measures of Variability

The most common measure is the standard
deviation the square root of the average of the
squared deviations from the mean

Use ? for populations, s for samples. Why (n 1)
for samples? Because computing the sample mean
uses up a bit information, leaving only (n
1) independent bits to compute the standard
deviation.
35
More on Standard Deviation

We square the individual variations so that they
dont cancel out when added together
We take the square root to express the result in
the same units as the original data
Large deviations from the mean contribute more
heavily because they are squared
Standard deviation is sensitive to outliers it
is not a resistant statistic

36
Variance

The variance is the square of the standard
deviation
Less intuitive and more difficult to interpret,
because it is measured in squared units (e.g.,
squared dollars) rather than original units
We wont use variance much, but you should know
what it is.

37
Rules of Thumb

Most data sets obey these rules of thumb
? 2/3 of the observations are within one
standard deviation of the mean (? ? ?)
? 95 are within 2? of the mean (? ? 2?)
?100 are within 3? of the mean (? ? 3?)
Observations more than 3? from the mean are
generally called outliers

38
Measures of Variability IQR

The interquartile range (IQR) is a more resistant
measure than standard deviation
Use IQR with median std dev with mean
Sort observations from smallest to largest,
divide data into four equal-sized sets at first
quartile (Q1), median, and third quartile (Q3)
IQR Q3 Q1
QUARTILE(data,3) QUARTILE(data,1)

39
Other Measures of Variability

The range is the difference between the largest
and smallest values in the data set
A very crude measure of variability, but useful
for designing tables and graphs
MAX(data) MIN(data)
Percentiles, or values below which a certain
percentage of the data lie) can also be used
PERCENTILE(data,0.95)

40
Summary Statistics with Data Analysis

Select
Tools/Data Analysis/Descriptive Statistics
Select data
include labels and check Labels in First Row
dont include non-numeric data
Check Summary Statistics
Select output option
Reformat output

41
Other Measures

Kurtosis and skewness indicate the relative
peakedness and skewness of the distribution
Note if you use Data Analysis, summary
statistics wont change automatically if the data
change better to use formulae in this case

42
Determining Linear Relationships

Scatterplots provide insight into relationship
between variables, but making scatterplots for
each pair of a large number of variables is
tedious
To get a quick indication of possible linear
relationships we can use the correlation
statistic r for a large number of variables, a
correlation matrix

43
Covariance and Correlation

The relationship between two variables is
summarized by covariance and correlation
Each measures the strength (and direction) of a
linear relationship between two variables
Relationship is strong if data cluster tightly
around a straight line
If the values of one variable generally increase
with increasing values of the other variable, the
relationship is positive otherwise it is
negative

44
Scatterplot Positive Relationship
45
Scatterplot Negative Relationship
Correlation -0.413
46
Correlation

Excel command CORREL(xdata,ydata)
More than two variables (correlation matrix)
Select Tools/Data Analysis/Correlations
Select data (with labels), output location

47
Correlation Statistic

correlation ranges between -1 (perfect negative
relationship) to 1 (perfect positive) 0 no
linear relationship at all
correlation between a variable and itself is 1
correlation between X and Y is the same as the
correlation between Y and X thus, it is
sufficient to list the correlations below (or
above) the diagonal in the table.

48
Boxplots

A boxplot is a useful graphical method for
summarizing, comparing data
Analyse-it makes nice boxplots
A sheet is added with the boxplot and the
statistics used to form it

49
Boxplot Chart
50
Understanding Boxplots

The right and left of the box are Q1 and Q3 the
length of the box is the IQR the box contains
the middle 50 of the observations
The vertical line inside the box is the median
The point inside the box is the mean
The horizontal lines extend to most extreme
observations within 1.5 IQR from the box they
indicate variability and skewness
Observations more than 1.5 IQR from the box are
shown as points. If they are 1.5 to 3 IRQ from
the box, they are mild outliers otherwise, they
are extreme outliers.

51
Students Heights by Gender
52
Boxplots in Analyse-It
mean
Q3
median
Q3
53
Manipulating Data in Excel

Importing data fixed/delimited fields in text
files web queries
Cleaning data missing data, duplicate records,
invalid entries, Y2K errors, formatting errors,
coding
Selecting data sort, autofilter, advanced filter

Write a Comment

User Comments (0)

About PowerShow.com

Quantitative Methods in Policy Analysis - PowerPoint PPT Presentation

Quantitative Methods in Policy Analysis

not so many that there are only a few observations per category. ... Less intuitive and more difficult to interpret, because it is measured in ... – PowerPoint PPT presentation