Quantitative Methods in Policy Analysis - PowerPoint PPT Presentation

1 / 53
About This Presentation
Title:

Quantitative Methods in Policy Analysis

Description:

not so many that there are only a few observations per category. ... Less intuitive and more difficult to interpret, because it is measured in ... – PowerPoint PPT presentation

Number of Views:113
Avg rating:3.0/5.0
Slides: 54
Provided by: timgu3
Category:

less

Transcript and Presenter's Notes

Title: Quantitative Methods in Policy Analysis


1
Quantitative Methods in Policy Analysis
  • Session 2 Summary Statistics
  • Steve Fetter
  • Tim Gulden

2
Bad Graphs Revisited
3
Statistics
  • Collecting, analyzing, and presenting numerical
    facts or data
  • Descriptive statistics organize, summarize, and
    present data
  • Inferential statistics draw general conclusions
    about a population based on a small sample drawn
    from it

4
Descriptive Statistics
  • Accurately and honestly report, describe,
    present, and summarize data, in numbers, tables,
    and graphs
  • What is the central value?
  • How widely are values spread from the center?
  • Is this spreading symmetrical? Are there distinct
    groups?
  • How common is a certain value?
  • Are there data that are very atypical?

5
Frequency Tables
  • A frequency table indicates how many observations
    fall in various categories
  • For continuous data, we must first choose
    appropriate categories
  • enough categories so that we can see a meaningful
    distribution, but
  • not so many that there are only a few
    observations per category.
  • Intervals should be multiples of 1, 2, 2.5, or 5
  • Rule of thumb divide the range of values into 8
    to 15 equal bins more data ? more bins

6
Histograms
  • A histogram is the graphical analog of a
    frequency table
  • We can create frequency tables and histograms in
    Excel manually, with the data analysis toolpack,
    with pivot tables, or with add-ins
  • For categorical variables, use a bar or pie chart
    to display frequencies

7
ACTOR.XLS
  • Data for 66 actors and 4 variables
  • Gender
  • DomesticGross average domestic gross of last
    few movies (in millions)
  • ForeignGross average foreign gross of last few
    movies (in millions)
  • Salary current amount the actor asks for a movie
    (in millions)

8
Creating Histograms in Excel
  • Create column containing upper limits for each
    category (e.g., 2, 4, 620) bins
  • Manually
  • FREQUENCY(salary,bins)
  • Use chart wizard to create histogram
  • With Data Analysis Toolpack
  • Tools/Data Analysis/Histogram
  • Enter bin and data ranges

9
Try it ACTORS.XLS
10
Histogram Shape
  • symmetric
  • skewed to the right (positively skewed)
  • skewed to the left (negatively skewed)
  • bimodal (or tri-modal, etc.)

11
Symmetric
12
Skewed to Right (Positive Skew)
13
Skewed to Left (Negative Skew)
14
Bimodal
15
Scatterplots
  • A useful way to picture the relationship between
    two variable is to plot a point for each
    observation a scatterplot
  • We can usually see whether there is any
    relationship between the two variables, and, if
    so, what type it is
  • Example relationship between actors salary and
    movie gross

16
Scatterplots with Excel
  • Highlight column with Y-variable data
  • Select Insert/Chart (or click Chart icon)
  • Select type XY (Scatter) and click next
  • Select series tab, enter range for X-variable
    data
  • Enter axis labels, delete gridlines and legend
  • Select location for graph
  • Edit plot background, axis range, font sizes,
    etc.
  • Save as custom type or copy and change data range
  • If X-Y in adjacent columns, highlight both in
    step 1

17
The resulting scatterplot
Try it ACTORS.XLS
18
Presidents approval rating v. number of House
seats lost by presidents party in mid-term
elections, 1946-98
19
Pivot Tables
  • Pivot tables are one of Excels most powerful
    tools. They allow us to slice and dice the
    data, breaking it into various subpopulations
  • Statisticians often refer to these as contingency
    tables or crosstabs
  • Example actresses claim they are being underpaid
    relative to male actors. A pivot table is one way
    to investigate if the data support this claim

20
Creating a Pivot Table
  • Position the cursor in the data set
  • Select Data/PivotTable
  • Select Excel List and Pivot Table
  • Verify that the range of the data set is correct
  • Click Layout to specify variables
  • to put a variable in any of the four areas (page,
    row, column or data), click and drag it to area
  • drag salary to the row area, gender to the column
    area, and gender to the data area

21
Creating a Pivot Table (cont.)
  • The table shows too much detail. To group the
    salaries in wider bins
  • right-click on the salary column.
  • select Group and Outline/Group
  • in the dialogue box, enter 2 for starting at,
    20 for ending at and 2 for by
  • The resulting table should look as follows after
    expressing the counts as percentages (right-click
    on count of gender, select field settings and
    show data as of column) and rounding off to
    two significant digits

22
Modified Pivot Table
23
Bad graphical presentations
24
A good graphical presentation
25
Analyzing the Table
  • Over half the women are in the lowest salary
    category, whereas only 19 of the men are in this
    category.
  • No women are in the highest two salary
    categories, whereas 23 of the men are in these
    categories
  • Clearly, women are paid less. Is this evidence of
    discrimination? What analysis would you do to
    shed light on this question?

26
More Pivot Table Analysis
  • Perhaps women are paid less because movies with
    female leads gross less
  • To analyze this, examine the average salary of
    men and women for each domestic gross level
  • drag DomesticGross to the row area
  • drag Gender to the column area
  • drag Salary to data area (summarize by average)
  • group DomesticGross in increments of 20
  • round off!

27
Average Female and Male Salaries by Domestic Gross
28
Salary discrimination against actresses?
  • Men average more than women in the two lowest
    domestic gross categories, but not the third.
    Beyond the third category, it is hard to tell
    because no females were leads.
  • Thus, there appears to be discrimination against
    women in low-grossing movies. To reach a more
    definitive conclusion, we need to use more
    sophisticated methods (e.g., chi-square or
    regression analysis).

29
Break10 minutes
30
Summary Statistics
  • Summary statistics describe data sets or
    relationships between variables with one or a few
    numbers
  • What is the central value?
  • How widely are values spread from the center?
  • Are there data that are very atypical?
  • Are variables related?

31
Measures of Central Location Mean
  • The mean is the arithmetic average
  • population mean (?) v. sample mean (x-bar)
  • mean is representative or typical value if
    the distribution is roughly symmetric
  • Excel command AVERAGE(data)

32
Measures of Central Location Median
  • The median is the middle observation, when data
    are ordered from smallest to largest
  • If there is an odd number of observations, the
    median is the middle observation
  • If n is even, the median is the average of the
    two middle observations
  • Excel command MEDIAN(data)

33
Mean or Median?
  • median is representative, resistant, robust
    statistic, even if distribution is skewed
  • Outliers affect mean more than the median
  • 1, 3, 5, 7, 9 median 5, mean 5 
  • 1, 3, 5, 7, 90 median 5, mean 21
  • Use the median when you want the typical value
    (e.g., family income of a city)
  • Use mean when extreme values are more important
    (e.g., deaths from reactor accidents)

34
Measures of Variability
  • The most common measure is the standard
    deviation the square root of the average of the
    squared deviations from the mean

Use ? for populations, s for samples. Why (n 1)
for samples? Because computing the sample mean
uses up a bit information, leaving only (n
1) independent bits to compute the standard
deviation.
35
More on Standard Deviation
  • We square the individual variations so that they
    dont cancel out when added together
  • We take the square root to express the result in
    the same units as the original data
  • Large deviations from the mean contribute more
    heavily because they are squared
  • Standard deviation is sensitive to outliers it
    is not a resistant statistic

36
Variance
  • The variance is the square of the standard
    deviation
  • Less intuitive and more difficult to interpret,
    because it is measured in squared units (e.g.,
    squared dollars) rather than original units
  • We wont use variance much, but you should know
    what it is.

37
Rules of Thumb
  • Most data sets obey these rules of thumb
  • ? 2/3 of the observations are within one
  • standard deviation of the mean (? ? ?)
  • ? 95 are within 2? of the mean (? ? 2?)
  • ?100 are within 3? of the mean (? ? 3?)
  • Observations more than 3? from the mean are
    generally called outliers

38
Measures of Variability IQR
  • The interquartile range (IQR) is a more resistant
    measure than standard deviation
  • Use IQR with median std dev with mean
  • Sort observations from smallest to largest,
    divide data into four equal-sized sets at first
    quartile (Q1), median, and third quartile (Q3)
  • IQR Q3 Q1
  • QUARTILE(data,3) QUARTILE(data,1)

39
Other Measures of Variability
  • The range is the difference between the largest
    and smallest values in the data set
  • A very crude measure of variability, but useful
    for designing tables and graphs
  • MAX(data) MIN(data)
  • Percentiles, or values below which a certain
    percentage of the data lie) can also be used
  • PERCENTILE(data,0.95)

40
Summary Statistics with Data Analysis
  • Select
  • Tools/Data Analysis/Descriptive Statistics
  • Select data
  • include labels and check Labels in First Row
  • dont include non-numeric data
  • Check Summary Statistics
  • Select output option
  • Reformat output

41
Other Measures
  • Kurtosis and skewness indicate the relative
    peakedness and skewness of the distribution
  • Note if you use Data Analysis, summary
    statistics wont change automatically if the data
    change better to use formulae in this case

42
Determining Linear Relationships
  • Scatterplots provide insight into relationship
    between variables, but making scatterplots for
    each pair of a large number of variables is
    tedious
  • To get a quick indication of possible linear
    relationships we can use the correlation
    statistic r for a large number of variables, a
    correlation matrix

43
Covariance and Correlation
  • The relationship between two variables is
    summarized by covariance and correlation
  • Each measures the strength (and direction) of a
    linear relationship between two variables
  • Relationship is strong if data cluster tightly
    around a straight line
  • If the values of one variable generally increase
    with increasing values of the other variable, the
    relationship is positive otherwise it is
    negative

44
Scatterplot Positive Relationship
45
Scatterplot Negative Relationship
Correlation -0.413
46
Correlation
  • Excel command CORREL(xdata,ydata)
  • More than two variables (correlation matrix)
  • Select Tools/Data Analysis/Correlations
  • Select data (with labels), output location

47
Correlation Statistic
  • correlation ranges between -1 (perfect negative
    relationship) to 1 (perfect positive) 0 no
    linear relationship at all
  • correlation between a variable and itself is 1
  • correlation between X and Y is the same as the
    correlation between Y and X thus, it is
    sufficient to list the correlations below (or
    above) the diagonal in the table.

48
Boxplots
  • A boxplot is a useful graphical method for
    summarizing, comparing data
  • Analyse-it makes nice boxplots
  • A sheet is added with the boxplot and the
    statistics used to form it

49
Boxplot Chart
50
Understanding Boxplots
  • The right and left of the box are Q1 and Q3 the
    length of the box is the IQR the box contains
    the middle 50 of the observations
  • The vertical line inside the box is the median
    The point inside the box is the mean
  • The horizontal lines extend to most extreme
    observations within 1.5 IQR from the box they
    indicate variability and skewness
  • Observations more than 1.5 IQR from the box are
    shown as points. If they are 1.5 to 3 IRQ from
    the box, they are mild outliers otherwise, they
    are extreme outliers.

51
Students Heights by Gender
52
Boxplots in Analyse-It
mean
Q3
median
Q3
53
Manipulating Data in Excel
  • Importing data fixed/delimited fields in text
    files web queries
  • Cleaning data missing data, duplicate records,
    invalid entries, Y2K errors, formatting errors,
    coding
  • Selecting data sort, autofilter, advanced filter
Write a Comment
User Comments (0)
About PowerShow.com