Quantitative Methods For Social Sciences presentation

Title: Quantitative Methods For Social Sciences

1
Quantitative Methods For Social Sciences
SKEMA Ph.D. programme 2010-2011

Lionel Nesta
Observatoire Français des Conjonctures Economiques
Lionel.nesta_at_ofce.sciences-po.fr

2
Objective of The Course

The objective of the class is to provide students
with a set of techniques to analyze quantitative
data. It concerns the application of quantitative
and statistical approaches as developed in the
social sciences, for future decision makers,
policy makers, stakeholders, managers, etc.
All courses are computer-based classes using the
STATA statistical package. The objective is to
reach a level of competence that gives the
students the skills both to read and understand
the work of others and to carry out their own
research.

3
Examples

Rise in biotechnology
Should the EU fund fundamental research in
biotechnology?
Has biotechnology increased the productivity of
firm-level R&D?
Did it increase the speed of discovery in
pharmaceutical R&D?
Increasing university-industry collaborations
Does it facilitate innovation by firms?
Does it increase the production of new knowledge
by academics?
Does it modify the fundamental/applied nature of
research?

4
Examples

Economic (productivity) Growth
Does it come mainly from new firms or from
improvements in existing firms?
Is market selection operating correctly?
Why do good firms exit the market?
How does the organisation of knowledge affect
performance?
How do the knowledge stock and specialisation
affect productivity?
How do firms enter into new technological fields?
Do firms diversify into new technologies/businesses
purposively?

5
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables

6
Structure of the Class

Part 1 Descriptive Statistics
Mean, variance, standard deviation
Data management
Part 2 Statistical Inference
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables

7
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Distributions
Comparison of means
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables

8
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Part 3 Relationship Between Variables
ANOVA, Chi-Square
Correlation
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables

9
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Correlation coefficient, simple regression
Multiple regression
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables

10
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Regression diagnostics
Qualitative explanatory variables
Part 6 Qualitative Dependent variables

11
Structure of the Class

Part 1 Descriptive Statistics
Part 2 Statistical Inference
Part 3 Relationship Between Variables
Part 4 Ordinary Least Squares (OLS)
Part 5 Extension to OLS
Part 6 Qualitative Dependent variables
Linear probability model
Maximum likelihood (logit, probit)

12
Part 1 Descriptive Statistics
13
Types of Data

Descriptive statistics is the branch of
statistics which gathers all techniques used to
describe and summarize quantitative and
qualitative data.
Quantitative data
Continuous
Measured on a scale (any value within its range)
The size of the number reflects the amount of the
variable
Age, wage, sales, height, weight, GDP
Qualitative data
Discrete, categorical
The number reflects the category of the variable
Type of work, gender, nationality

14
Descriptive Statistics

Any means of summarizing data synthetically is
useful: graphs, charts, tables.
Quantitative data
Graphs: scatter plots, line plots, histograms
Central tendency
Dispersion
Qualitative data
Graphs: pie graphs, histograms
Tables: frequencies, percentages, cumulative
percentages
Cross tables
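For qualitative data, the frequency table (frequencies, percentages, cumulative percentages) can be sketched in a few lines of Python. This only illustrates the computation and is not STATA syntax; the function name and the firm-type values are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent, cumulative percent) rows,
    sorted by category, as in a one-way frequency table."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        freq = counts[category]
        percent = 100.0 * freq / total
        cumulative += percent
        rows.append((category, freq, percent, cumulative))
    return rows

# Hypothetical firm types, for illustration only
firm_types = ["biotech", "pharma", "biotech", "biotech", "pharma"]
for category, freq, pct, cum in frequency_table(firm_types):
    print(f"{category:<8} {freq:>3} {pct:6.1f} {cum:6.1f}")
```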

15
Central Tendency and Dispersion

A distribution is an ordered set of numbers
showing how many times each value occurred, from
the lowest to the highest number or the reverse.
Central tendency measures the degree to which
scores are clustered around the mean of a
distribution.
Dispersion measures the fluctuations around the
central tendency.
In other words, the measures of central tendency
produce stylized facts, whereas the measures of
dispersion assess how representative a given
stylized fact is.

16
Central Tendency

The mode
The most frequent score in a distribution is called
the mode.
The median
The middle value of all observed values: 50% of
observed values are higher and 50% of observed
values are lower than the median.
The mean
The sum of all of the values divided by the
number of values.

The mode, the mean and the median are equal when
the distribution is symmetrical and unimodal.
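The three measures can be compared on small samples. The Python sketch below uses the standard statistics module on hypothetical data: on a right-skewed sample the mean is pulled above the median and the mode, while on a symmetric unimodal sample all three coincide.

```python
import statistics

def central_tendency(values):
    """Return (mode, median, mean) of a list of numbers."""
    return (
        statistics.mode(values),    # most frequent score
        statistics.median(values),  # middle value once sorted
        statistics.mean(values),    # sum of values / number of values
    )

# Hypothetical data, for illustration only
skewed = [2, 3, 3, 3, 4, 5, 9]            # long right tail
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # symmetric and unimodal

print(central_tendency(skewed))     # mode and median 3, mean about 4.14
print(central_tendency(symmetric))  # all three equal 3
```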
17
Dispersion

The range
Difference between the maximum and minimum values
The variance
Average of the squared differences between the data
points and the mean (mean quadratic deviation); the
sample variance divides the sum by n - 1 instead of n
The standard deviation
Square root of the variance; it measures the
spread of the data about the mean, in the
same units as the data
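Here is a minimal Python sketch of the dispersion measures on hypothetical data. It computes the variance both ways, dividing by n (population) and by n - 1 (sample). STATA's summarize reports the sample standard deviation.

```python
import math
import statistics

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)          # maximum - minimum
mean = statistics.mean(data)                 # 40 / 8 = 5
pop_variance = statistics.pvariance(data)    # sum of squared deviations / n
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
pop_sd = math.sqrt(pop_variance)             # same units as the data
sample_sd = statistics.stdev(data)           # square root of the sample variance

print(value_range, mean, pop_variance, round(sample_variance, 3))
print(pop_sd, round(sample_sd, 3))
```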

18
Research Productivity in the Bio-pharmaceutical
Industry

EU Framework Programme 7

19
Stylised Facts about Modern Biotech

Innovations emerge from uncertain, complex
processes involving knowledge and markets: role
of networks.
Economic value is created in many ways, globally
and in geographical agglomerations.
Various linkages exist among diverse actors
(LDFs, DBFs, universities, venture capital) in
innovation processes, but the firm plays a
particularly important role.
Regulations, social structures and institutions
affect ongoing innovation processes as well as
their impacts on society: importance of IPR.

20
STATA software: a statistical package for the
social sciences
21
The Stata software

StataCorp, College Station, Texas; Stata was
first released in 1985
Among the most widely used programs for
statistical analysis in the social sciences
Probably the most widely used econometric software
among economists
Data management (case selection, file reshaping,
creating derived data)
Features of Stata are accessible via pull-down
menus
The pull-down menu interface generates command
syntax.

22
The Stata software

STATA is statistical software in constant
evolution
Updates are regularly posted on the web and are
available to all Stata users (command: update all)
Most user-written modules are available through
the Boston College server (SSC archive)
ssc install module_name, all
Hundreds of others can be reached as follows
net search key_words
net install module_name, all

23
The Stata software
Pull-down menus
Review window
Results window
Variables window
Command window
24
The Stata software

How to use STATA?
Using pull-down menus
Typing STATA instructions in the Command window
Grouping a series of STATA instructions in .do
files
Programming new commands (.ado files)
Programming new functions with a powerful matrix
language (MATA) similar to C (STATA version 9.0
onwards)

25
The Stata software

All STATA commands used from the menus or the
Command window are automatically stored in the
Review window.
At the end of a session, the Review window can
be saved by right-clicking on it:
Save all: saves the commands as a .do file
Send to Do-file Editor: a new window opens up.
A do-file is a text file containing a list of
STATA commands that STATA executes one after
the other.
It is recommended to explore results and methods
with the Command window. Once the methods are
settled, save the series of commands as a do-file.

26
The Stata software

All STATA results are displayed in the Results
window.
This window is a buffer: once results scroll out
of it, they are deleted. That is why you may
want to record results.
log using log_name.txt (beginning of a session)
log close (end of a session)
It is recommended to save results in a log file.
Moreover, if you work with a do-file, you can
always recover old results by re-running it.

27
The Stata software

Memory settings
By default, 10 megabytes are available for
loading a database. If a database is larger than
10 MB, STATA does not load it. There
are also other limits (matrix size, number of
variables) which can be managed using the
commands below.
Useful commands
describe using database_name.dta
query memory
clear
set memory 500m, permanently
set maxvar n, permanently
set matsize n, permanently
set virtual on, permanently

28
Data Handling (1) Database creation

1st step Creating a database
Typing data in the database through the Data Editor
(edit)
Importing data
insheet using myfile.txt, options
options: tab comma delimiter("char")
clear names
Importing data from a .txt file
- Without fixed format (without dictionary)
infile var1 var2 var3 using myfile.txt,
options
- With a fixed format (with dictionary)
infile using mydict.dct, using(myfile.txt)
options

29
DH(2) Database Exploration

2nd step Exploring the Data
To obtain a description of the database
describe varlist, options
inspect varlist
codebook varlist, options
nmissing varlist, options
npresent varlist, options
To display the values of variables
list varlist [if] [in], options
Example: list var1 if var2 > var3 in 1/100

30
DH(3) Database Organisation

3rd step Organisation of the database
Sorting observations
sort varlist
gsort -varlist
Sorting variables
order varlist
aorder varlist (if no varlist is specified, _all
is assumed)
Merging several databases (adding variables)
merge varlist using base1.dta base2.dta,
options
Merging several databases (adding observations)
append using base1.dta base2.dta, options
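The difference between merging (adding variables) and appending (adding observations) can be sketched in Python with lists of records. This is a simplified one-to-one match on a key, assumed for illustration. It does not reproduce STATA's merge options or its _merge variable, and the variable names and values are hypothetical.

```python
def merge_on_key(master, using, key):
    """Add the variables of `using` to `master`, matching observations on `key`
    (a simplified one-to-one match merge; unmatched observations are kept)."""
    by_key = {row[key]: dict(row) for row in master}
    for row in using:
        by_key.setdefault(row[key], {}).update(row)
    return [by_key[k] for k in sorted(by_key)]

def append_rows(master, using):
    """Add the observations of `using` below those of `master`."""
    return [dict(row) for row in master] + [dict(row) for row in using]

# Hypothetical data, for illustration only
sales = [{"id": 1, "sales": 100}, {"id": 2, "sales": 250}]
rd = [{"id": 1, "rdi": 12}, {"id": 2, "rdi": 40}]
new_firms = [{"id": 3, "sales": 80}]

print(merge_on_key(sales, rd, "id"))   # 2 observations, 3 variables each
print(append_rows(sales, new_firms))   # 3 observations, 2 variables each
```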

31
DH(3) Database Organisation

3rd step Organisation of the database
Modifying the shape of the database
reshape long stubnames, i(varlist) j(varlist)
reshape wide stubnames, i(varlist) j(varlist)

Example (i = id, j = year, stub = inc)

Long form
id  year  sex  inc
 1   80    0   5000
 1   81    0   5500
 1   82    0   6000
 2   80    1   2000
 2   81    1   2200
 2   82    1   3300
 3   80    0   3000
 3   81    0   2000
 3   82    0   1000

Wide form
id  sex  inc80  inc81  inc82
 1   0   5000   5500   6000
 2   1   2000   2200   3300
 3   0   3000   2000   1000
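The reshape example above can be reproduced in plain Python as a sketch. Grouping the long-form rows by i (id) and spreading inc across one column per j (year) gives the wide form, and the reverse loop restores the long form. The helper names are hypothetical, and sex is assumed constant within id, as in the example.

```python
# The long-form example from the slide: (id, year, sex, inc)
long_rows = [
    (1, 80, 0, 5000), (1, 81, 0, 5500), (1, 82, 0, 6000),
    (2, 80, 1, 2000), (2, 81, 1, 2200), (2, 82, 1, 3300),
    (3, 80, 0, 3000), (3, 81, 0, 2000), (3, 82, 0, 1000),
]

def reshape_wide(rows):
    """Long -> wide: one line per id, one inc<year> column per year (j).
    sex is constant within id, so it is kept as a single column."""
    wide = {}
    for id_, year, sex, inc in rows:
        record = wide.setdefault(id_, {"id": id_, "sex": sex})
        record[f"inc{year}"] = inc
    return [wide[i] for i in sorted(wide)]

def reshape_long(records, years):
    """Wide -> long: one line per (id, year)."""
    return [
        (r["id"], year, r["sex"], r[f"inc{year}"])
        for r in records
        for year in years
    ]

wide_rows = reshape_wide(long_rows)
print(wide_rows[1])  # {'id': 2, 'sex': 1, 'inc80': 2000, 'inc81': 2200, 'inc82': 3300}
```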

32
DH(4) Saving, Opening, Exporting

4th step Save and re-use STATA database files
(.dta files)
Changes the working directory to the specified
drive and directory
cd "C:\STATA SKEMA"
Saves the database as a STATA file (.dta)
save myfile.dta, replace
Opens a STATA format database (.dta)
use myfile.dta, clear
Exports a database as a .txt file
outsheet varlist using myfile.txt, options
options: comma nonames replace

33
Handling Variables

Create a new variable
By assigning a value to it
generate var1 = expression [if] [in]
Using a predefined function: extensions to
generate
egen var1 = fcn(arguments) [if] [in], options
by(varlist)
fcn: min, max, mode, mean, median, sd, total,
pctile, group, count, etc.
Examples: egen newvar1 = mean(salaire), by(age)
egen newvar2 = group(nom)
egen newvar3 = count(id), by(sector)
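Compared with generate, egen ..., by() adds a statistic computed within groups and copied back to every observation. Below is a minimal Python sketch of the mean case. It assumes there are no missing values: STATA's egen ignores missing values, and this sketch does not handle them. The data and the function name are hypothetical.

```python
from collections import defaultdict

def group_mean(values, groups):
    """Return, for every observation, the mean of `values` within its group,
    i.e. a new variable aligned with the original observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]

# Hypothetical wages (salaire) and ages, for illustration only
salaire = [1500, 1700, 2500, 2300, 2400]
age = [25, 25, 40, 40, 40]
print(group_mean(salaire, age))  # [1600.0, 1600.0, 2400.0, 2400.0, 2400.0]
```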

34
Handling Variables

Variable modification and removal
Modifying a variable which has already been
created
replace var1 = expression [if] [in]
Erasing variables
drop varlist
keep varlist
Erasing observations
drop [if] [in]
keep [if] [in]
Examples: drop if revenu < 100
keep if age > 18

35
Handling Variables

Time series and panel data utilities
Declaring data as time series or panel data
tsset panelvar timevar, options
options: daily weekly monthly quarterly
yearly
Example: tsset id annee, yearly
Using time series operators
Lagged values L.: L.X = $X_{t-1}$
L2. or LL.: L2.X = $X_{t-2}$
Forward values F.: F.X = $X_{t+1}$
F2. or FF.: F2.X = $X_{t+2}$
Differenced values D.: D.X = $X_t - X_{t-1}$
D2. or DD.: D2.X = $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2})$
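The lag, lead and difference operators can be sketched in Python for a single panel unit, assuming consecutive periods with no gaps. tsset uses the time variable to handle gaps and keeps panels separate, which this sketch does not do. None plays the role of STATA's missing value, and the function names are hypothetical.

```python
def lag(series, k=1):
    """Lk.X: the value k periods earlier (None when unavailable)."""
    return [series[t - k] if t - k >= 0 else None for t in range(len(series))]

def lead(series, k=1):
    """Fk.X: the value k periods later (None when unavailable)."""
    n = len(series)
    return [series[t + k] if t + k < n else None for t in range(n)]

def diff(series):
    """D.X: X_t - X_{t-1}; applying it twice gives D2.X."""
    previous = lag(series)
    return [
        None if x is None or p is None else x - p
        for x, p in zip(series, previous)
    ]

# Hypothetical yearly series for one firm (consecutive years, no gaps)
x = [10, 12, 15, 19]
print(lag(x))         # [None, 10, 12, 15]
print(lead(x, 2))     # [15, 19, None, None]
print(diff(x))        # [None, 2, 3, 4]
print(diff(diff(x)))  # [None, None, 1, 1]
```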

36
Descriptive Statistics with STATA
Using log files: log using xxx, replace / log close
Defining and using labels: label variable, label define, label values
Descriptive statistics: summarize, table, table ..., contents(), tabulate
Manipulating .dta files and exporting: collapse, save, outsheet using...
37
Log files

Log files save the contents of the Results window.
They are useful when producing descriptive
statistics on the .dta files and on the variables.
log using log_file_name, replace
STATA instructions
log close
Advantage: very useful to find old results again
(replication and refutation)
Drawback: tedious to manipulate

38
Labelling variables

Labelling is too often neglected.
No influence on the results
Large influence on the correct interpretation of
variables and results
label variable: describes a variable
label variable asset "real capital"
label define: defines a label
label define firm_type 1 "biotech" 0 "Pharma"
label values: applies the label
label values type firm_type

39
Descriptive statistics summarize

summarize var1 var2 ... varN
Produces the number of observations, means,
standard deviations, min and max
We can add a condition using if
summarize var1 var2 ... varN if condition
We can produce descriptive statistics by subsets
of the database using bysort
bysort varcat: summarize var1 var2 ... varN
Beware: the if condition comes before the comma,
and options come after it (command varlist if
condition, options). If you get an error message,
it very likely comes from a misplaced comma.

40
Descriptive statistics table
The table command applies to categorical
variables (string or numeric categorical).
table varcat1
Provides the number of observations by category of varcat1
table varcat1 varcat2
Provides a cross table between varcat1 and varcat2
table varcat, contents(count var1 mean var1 sd var1 ...)
Provides descriptive statistics on var1 by category of varcat
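The statistics produced by table varcat, contents(count var1 mean var1 sd var1) can be sketched in Python as a group-by computation. This illustration uses hypothetical data. The standard deviation is the sample one (divisor n - 1), and a category with a single observation gets None.

```python
import statistics
from collections import defaultdict

def table_by(varcat, var1):
    """Count, mean and sample standard deviation of var1 for each
    category of varcat, with categories in sorted order."""
    groups = defaultdict(list)
    for cat, value in zip(varcat, var1):
        groups[cat].append(value)
    return {
        cat: (
            len(vals),
            statistics.mean(vals),
            statistics.stdev(vals) if len(vals) > 1 else None,
        )
        for cat, vals in sorted(groups.items())
    }

# Hypothetical data, for illustration only
firm_type = ["biotech", "pharma", "biotech", "pharma", "pharma"]
rdi = [30.0, 10.0, 50.0, 12.0, 14.0]
for cat, (n, mean, sd) in table_by(firm_type, rdi).items():
    print(cat, n, mean, sd)
```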
41
Descriptive statistics tabulate
The tabulate command is similar to table, but its
options are different.
tabulate varcat, gen(varcat_)
Generates dummy variables for each category of varcat
tabulate varcat1 varcat2, options
Generates measures of association between two categorical variables
tabulate varcat1, summarize(var2)
Provides descriptive statistics on var2 by category of varcat1
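The dummy variables created by tabulate varcat, gen(varcat_) can be sketched in Python as one 0/1 indicator per category, numbered in sorted category order. The helper name and data are hypothetical, and the naming only mimics the stub1, stub2, ... convention.

```python
def make_dummies(varcat, prefix):
    """One 0/1 indicator per category, named prefix1, prefix2, ...
    in sorted category order."""
    categories = sorted(set(varcat))
    return {
        f"{prefix}{i}": [1 if value == cat else 0 for value in varcat]
        for i, cat in enumerate(categories, start=1)
    }

# Hypothetical data, for illustration only
firm_type = ["pharma", "biotech", "pharma"]
print(make_dummies(firm_type, "type_"))
# {'type_1': [0, 1, 0], 'type_2': [1, 0, 1]}
```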
42
Stacking observations collapse

The collapse command produces a new database
which is an aggregation of the old database.
collapse aggregates lines (observations) by the
categories of a categorical variable of your choice
collapse (mean) var1 var2 (sum) var3, by(varcat)
Will generate a new database with as many lines
as there are categories of varcat, with 3
variables (means of var1 var2, sum of var3)
collapse (mean) var1 var2 (sd) sdvar1=var1
sdvar2=var2, by(varcat1 varcat2)
Will generate a new database with as many lines
as there are combinations of categories of varcat1
and varcat2, with 4 variables (means of var1 var2,
standard deviations of var1 var2)
Note: collapse is useful for exporting tables
of results to Excel.
Note: please save the old and new databases
under different names!
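Here is a Python sketch of the first collapse example, assuming observations are stored as dictionaries. Observations are grouped by varcat, and each group becomes one line holding the means of var1 and var2 and the sum of var3. The result is a new list and the original data stay untouched, which mirrors the advice to keep the old and new databases separate. The names and values are hypothetical.

```python
from collections import defaultdict

def collapse_mean_sum(rows, mean_vars, sum_vars, by):
    """Aggregate observations into one line per category of `by`,
    holding means of mean_vars and sums of sum_vars."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    result = []
    for cat in sorted(groups):
        members = groups[cat]
        new_row = {by: cat}
        for v in mean_vars:
            new_row[v] = sum(r[v] for r in members) / len(members)
        for v in sum_vars:
            new_row[v] = sum(r[v] for r in members)
        result.append(new_row)
    return result

# Hypothetical data, for illustration only
rows = [
    {"varcat": "A", "var1": 1.0, "var2": 4.0, "var3": 10},
    {"varcat": "A", "var1": 3.0, "var2": 6.0, "var3": 20},
    {"varcat": "B", "var1": 5.0, "var2": 8.0, "var3": 30},
]
print(collapse_mean_sum(rows, ["var1", "var2"], ["var3"], "varcat"))  # one line per category
```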

43
Keywords for table and collapse
mean      means (default)
sd        standard deviations
sum       sums
rawsum    sums, ignoring optionally specified weight
count     number of nonmissing observations
max       maximums
min       minimums
iqr       interquartile range
median    medians
p1        1st percentile
p2        2nd percentile
...       3rd-49th percentiles
p50       50th percentile (same as median)
...       51st-97th percentiles
p98       98th percentile
p99       99th percentile
44
Graphs

Graphic representations are a very effective
means of synthesis.
Pie graphs, which display proportions of a
population or a sample
Two-way graphs, linking any two quantitative
dimensions
Distribution graphs (histograms), which plot the
central tendency and dispersion of a variable

45
Pie Graphs: graph pie, over(varcat)
46
Two-way Graphs
Two-way graphs link two continuous variables, var1
and var2. There are several types of two-way graphs:
- Line graphs: twoway line var1 var2
- Classical scatterplots: twoway scatter var1 var2
- Connected graphs: twoway connected var1 var2
47
Line graphs: twoway line var1 var2
twoway line rdi year if name == "Abbott"
48
Line graphs: twoway line var1 var2
twoway (line rdi year if name == "Amgen", sort)
(line rdi year if name == "Abbott", sort),
legend(on order(1 "Amgen" 2 "Abbott"))
49
Connected graphs: twoway connected var1 var2
twoway (connected rdi year if name == "Abbott")
50
Scatterplots: twoway scatter var1 var2
twoway scatter lrdi lassets
51
Distribution graphs

Distribution graphs plot the distribution of one
quantitative variable var1 at a time by means of
a histogram
On the horizontal axis, classes of var1 are
displayed.
On the vertical axis, the density of each class
is displayed.
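The computation behind a histogram's classes and densities can be sketched in Python. The range is cut into classes of equal width, and the density of a class is its share of observations divided by the class width, so the bar areas sum to 1. The number of classes is a parameter here (STATA picks a default itself), and the sketch assumes the values are not all identical.

```python
def histogram_density(values, n_bins):
    """Split the range of `values` into n_bins equal-width classes and return
    (left edge, right edge, density) per class, where
    density = share of observations in the class / class width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # the maximum value falls exactly on the upper edge: put it in the last class
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    n = len(values)
    return [
        (lo + i * width, lo + (i + 1) * width, c / (n * width))
        for i, c in enumerate(counts)
    ]

# Hypothetical data, for illustration only
values = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
for left, right, density in histogram_density(values, 4):
    print(f"{left:.1f} to {right:.1f}: density {density:.3f}")
```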

52
Distribution histograms: hist var1
hist lassets
53
Kernel distributions
Using a kernel density estimator, one can estimate
the probability density function of var1. The
probability density function is useful for visually
checking the normality of the distribution. Normal
distributions are also called Gaussian
distributions. They are very frequently used in
the sciences to model random processes. They
rest on the law of large numbers and the
central limit theorem.
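A kernel density estimate places a small bump (the kernel) on each observation and averages them. The Python sketch below uses a Gaussian kernel and a fixed bandwidth of 0.5, both assumptions for illustration; STATA's kdensity uses the Epanechnikov kernel by default and chooses its own bandwidth. The last lines check numerically that the estimate integrates to about 1, as a density should.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return the estimate f(x) = (1 / (n h)) * sum_i K((x - x_i) / h),
    with the standard Gaussian kernel K and bandwidth h."""
    n = len(data)

    def density(x):
        total = 0.0
        for xi in data:
            u = (x - xi) / bandwidth
            total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        return total / (n * bandwidth)

    return density

data = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]  # hypothetical sample
f = gaussian_kde(data, bandwidth=0.5)

# Approximate the integral of the estimate on a wide grid from -6 to 6
step = 0.01
area = sum(f(-6 + k * step) * step for k in range(1201))
print(round(area, 4))
```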
54
Kernel density: kdensity var1
kdensity lassets
55
Exporting Graphs
One can simply copy and paste a graph into any
Microsoft Office software. One can also use .do
files and write graph export graph_name,
as(extension) options. Example: graph export
SKEMA_rdi.wmf, as(wmf) replace. Possible
extensions: PostScript (ps), Encapsulated
PostScript (eps), Windows Metafile (wmf), Windows
Enhanced Metafile (emf), Macintosh PICT format
(pict), Acrobat Reader (pdf)
56
SPSS software: Statistical Package for the Social
Sciences
57
SPSS Opening SPSS
58
SPSS Importing data
59
SPSS Importing data
60
SPSS Importing data

Settings in the text import dialogue box
No predefined format (1)
Delimited (2)
First line contains the variable names (2)
One observation per line; all observations (3)
Tab delimited only (4)
Finish (6)

61
SPSS windows

SPSS automatically opens two windows
The data sheet window
Observe, manage, modify and create data
The results window
Everything you do will be stored there
A syntax window can also be opened

62
SPSS Data sheet (1)
63
SPSS Data sheet (2)
64
SPSS Result / Journal
65
SPSS Saving data
66
SPSS working, at last!
67
Recoding Variables

Changing existing values to new values
(values "biotechnologie" (biotechnology) → DBF, "pharmaceutique" (pharmaceutical) → LDF)

68
Computing New Variables

Taking the logarithm (normalization of continuous
variables)

69
Creating Dummy Variables

70
Computation of Descriptive Statistics
71
Descriptive Statistics
72
Splitting Database
73
Descriptive Statistics (by type)
74
Logarithm

Normalization
Taking the logarithm is a transformation which
often brings a skewed distribution closer to a
normal distribution.
Elasticities http://en.wikipedia.org/wiki/Elasticity_(economics)
A change in the log of x approximates the relative
change of x itself.
Cobb-Douglas production function
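The link between logarithms, elasticities and the Cobb-Douglas function can be written out as follows. This assumes the usual two-input form with output Y, capital K, labour L, technology A and parameters α and β; the slide itself does not give this notation. Taking logs makes the function linear in its parameters, and each exponent is the elasticity of output with respect to that input.

```latex
\begin{aligned}
Y &= A\,K^{\alpha}L^{\beta} \\
\ln Y &= \ln A + \alpha \ln K + \beta \ln L \\
\frac{\partial \ln Y}{\partial \ln K} &= \alpha
  \;\approx\; \frac{\Delta Y / Y}{\Delta K / K}
  \qquad \text{since } \Delta \ln x \approx \frac{\Delta x}{x} \text{ for small changes}
\end{aligned}
```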
