Quantitative Methods For Social Sciences - PowerPoint PPT Presentation

1
Quantitative Methods for Social Sciences
SKEMA Ph.D. programme 2010-2011
  • Lionel Nesta
  • Observatoire Français des Conjonctures
    Economiques
  • Lionel.nesta_at_ofce.sciences-po.fr

2
Objective of The Course
  • The objective of the class is to provide students
    with a set of techniques to analyze quantitative
    data. It concerns the application of quantitative
    and statistical approaches as developed in the
    social sciences, for future decision makers,
    policy makers, stakeholders, managers, etc.
  • All courses are computer-based classes using the
    STATA statistical package. The objective is to
    reach levels of competence which provide the
    students with skills to both read and understand
    the work of others and to carry out one's own
    research.

3
Examples
  • Rise in biotechnology
  • Should the EU fund fundamental research in
    biotechnology?
  • Has biotechnology increased the productivity of
    firm-level R&D?
  • Did it increase the speed of discovery in
    pharmaceutical R&D?
  • Increasing university-industry collaborations
  • Does it facilitate innovation by firms?
  • Does it increase the production of new knowledge
    by academics?
  • Does it modify the fundamental/applied nature of
    research?

4
Examples
  • Economic (productivity) Growth
  • Does it come mainly from new firms or improving
    existing firms?
  • Is market selection operating correctly?
  • Why do good firms exit the market?
  • How does the organisation of knowledge impact on
    performance?
  • How do knowledge stock and specialisation impact
    on productivity?
  • How do firms enter into new technological fields?
  • Do firms diversify in new technologies/businesses
    purposively?

5
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables

6
Structure of the Class
  • Part 1 Descriptive Statistics
  • Mean, variance, standard deviation
  • Data management
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables

7
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Distributions
  • Comparison of means
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables

8
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • ANOVA, Chi-Square
  • Correlation
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables

9
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Correlation coefficient, simple regression
  • Multiple regression
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables

10
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Regression diagnostics
  • Qualitative explanatory variables
  • Part 6 Qualitative Dependent variables

11
Structure of the Class
  • Part 1 Descriptive Statistics
  • Part 2 Statistical Inference
  • Part 3 Relationship Between Variables
  • Part 4 Ordinary Least Squares (OLS)
  • Part 5 Extension to OLS
  • Part 6 Qualitative Dependent variables
  • Linear probability model
  • Maximum likelihood (logit, probit)

12
Part 1 Descriptive Statistics
13
Types of Data
  • Descriptive statistics is the branch of
    statistics which gathers all techniques used to
    describe and summarize quantitative and
    qualitative data.
  • Quantitative data
  • Continuous
  • Measured on a scale (values lie within a range)
  • The size of the number reflects the amount of the
    variable
  • Age, wage, sales, height, weight, GDP
  • Qualitative data
  • Discrete, categorical
  • The number reflects the category of the variable
  • Type of work, gender, nationality

14
Descriptive Statistics
  • Any means of summarizing data concisely is
    useful: graphs, charts, tables.
  • Quantitative data
  • Graphs: scatter plots, line plots, histograms
  • Central tendency
  • Dispersion
  • Qualitative data
  • Graphs: pie graphs, histograms
  • Tables: frequency, percentage, cumulative
    percentage
  • Cross tables

15
Central Tendency and Dispersion
  • A distribution is an ordered set of numbers
    showing how many times each occurred, from the
    lowest to the highest number or the reverse
  • Central tendency measures the degree to which
    scores are clustered around the mean of a
    distribution
  • Dispersion measures the fluctuations around the
    characteristics of central tendency
  • In other words, the characteristics of central
    tendency produce stylized facts, whereas the
    characteristics of dispersion indicate how
    representative a given stylized fact is.

16
Central Tendency
  • The mode
  • The most frequent score in distribution is called
    the mode.
  • The median
  • The middle value of all observed values: 50% of
    observed values are higher and 50% are lower
    than the median
  • The mean
  • The sum of all of the values divided by the
    number of values

The mode, the mean and the median are equal when
the distribution is symmetrical and unimodal.
17
Dispersion
  • The range
  • Difference between the maximum and minimum values
  • The variance
  • Average of the squared differences between data
    points and the mean (average) quadratic deviation
  • The standard deviation
  • Square root of variance, therefore measures the
    spread of data about the mean, measured in the
    same units as the data
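The measures of central tendency and dispersion defined above can be computed in Stata along the following lines. This is a minimal sketch on five made-up scores, meant to be run as a do-file; the variable and scalar names are illustrative only. Note that Stata's summarize reports the sample variance, which divides by n - 1 rather than by n.

```stata
* Five hypothetical scores, typed in directly
clear
input score
2
3
3
5
7
end

* Mean, median (p50), range, variance and standard deviation
summarize score, detail
scalar s_mean   = r(mean)
scalar s_median = r(p50)
scalar s_range  = r(max) - r(min)
scalar s_var    = r(Var)      // sample variance (divides by n - 1)
scalar s_sd     = r(sd)

* Mode, stored in a new variable
egen score_mode = mode(score)

display "mean = " s_mean ", median = " s_median ", mode = " score_mode[1]
display "range = " s_range ", variance = " s_var ", sd = " s_sd
```

With these five values the mean is 4, the median and the mode are both 3, the range is 5, the sample variance is 4 and the standard deviation is 2.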

18
Research Productivity in the Bio-pharmaceutical
Industry
  • EU Framework Programme 7

19
Stylised Facts about Modern Biotech
  1. Innovations emerge from uncertain, complex
    processes involving knowledge and markets: role
    of networks.
  2. Economic value is created in many ways, globally
    and in geographical agglomerations.
  3. Various linkages exist among diverse actors
    (LDFs, DBFs, Univ, Venture Capital) in innovation
    processes, but the firm plays a particularly
    important role.
  4. Regulations, social structures and institutions
    affect on-going innovation processes as well as
    their impacts on society: importance of IPR.

20
The Stata software
21
The Stata software
  • StataCorp, College Station, Texas (Stata first
    released in 1985)
  • Among the most widely used programs for
    statistical analysis in social sciences.
  • Probably the most widely used econometric software
    among economists
  • Data management (case selection, file reshaping,
    creating derived data)
  • Features of Stata are accessible via pull-down
    menus
  • The pull-down menu interface generates command
    syntax.

22
The Stata software
  • STATA is a statistical software in constant
    evolution
  • Updates are regularly posted online and made
    available to all Stata users (command update
    all)
  • Most user-written modules are available through
    the Boston College server
  • ssc install module_name, all
  • Hundreds of others can be found as follows
  • net search key_words
  • net install module_name, all

23
The Stata software
Pull down menus
Review window
Results window
Variable window
Command window
24
The Stata software
  • How to use STATA ?
  • Using pull-down menus
  • Typing STATA instructions in the Command window
  • Grouping a series of STATA instructions in a .do
    file
  • Programming new functions (.ado files)
  • Programming new functions with a powerful matrix
    language (MATA) similar to C (Version 9.0 of
    STATA onwards)

25
The Stata software
  • All STATA commands used from the menu or the
    command window are automatically stored in the
    Review window
  • At the end of a session, the review window can
    then be saved by right-clicking on it
  • Save all as a do-file
  • Send to do-file editor: a new window opens up.
  • A Do-file is a text file containing a list of
    STATA commands which will be executed step by
    step by STATA.
  • It is recommended to explore results and methods
    with the command window. Once the methods are
    settled, save the series of commands as a do-file.

26
The Stata software
  • All STATA results are displayed in the Result
    window
  • This window is a buffer. Once it disappears from
    the screen, it is deleted. That is why you may
    want to record results.
  • log using log_name.txt (beginning of a session)
  • log close (end of a session)
  • It is recommended to save results in a log file.
    Moreover, if you work with a do-file, you can
    always reproduce old results by rerunning it.
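As a minimal sketch of the recommendation above, a do-file can open a log, run its commands, and close the log again. The log file name is hypothetical, and the auto dataset is the example file shipped with Stata.

```stata
* Close any log left open by a previous run (no error if none is open)
capture log close

* Record everything shown in the Results window in a plain-text log
log using session_log.txt, text replace

sysuse auto, clear       // example dataset shipped with Stata
summarize price mpg

log close
```

Rerunning the do-file overwrites session_log.txt thanks to the replace option, so the log always matches the current version of the commands.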

27
The Stata software
  • Memory settings
  • By default, 10 megabytes are available for
    loading a database. If a database is larger than
    10 MB, STATA does not load it. There
    are also other limits (matrix size, number of
    variables) which can be managed using the
    commands below.
  • Useful commands
  • describe using database_name.dta
  • query memory
  • clear
  • set memory 500m, permanently
  • set maxvar n, permanently
  • set matsize n, permanently
  • set virtual on, permanently

28
Data Handling (1) Database creation
  • 1st step Creating a database
  • Typing data in the database through Data Editor
    (edit)
  • Importing data
  • insheet using myfile.txt, options
  • options: tab, comma, delimiter("char"),
    clear, names
  • Importing data from a .txt file
  • - Without fixed format (without dictionary)
  • infile var1 var2 var3 using myfile.txt,
    options
  • - With a fixed format (with dictionary)
  • infile using mydict.dct, using(myfile.txt)
    options
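The import step can be tried without any external file by first writing a small comma-separated file and then reading it back. This is only a sketch: the file name and its three rows are made up for illustration. insheet still runs in recent versions of Stata, although import delimited has superseded it since Stata 13.

```stata
* Write a small comma-separated file (hypothetical name and data)
tempname fh
file open `fh' using firms_demo.csv, write text replace
file write `fh' "id,name,sales" _n
file write `fh' "1,Abbott,120" _n
file write `fh' "2,Amgen,95" _n
file close `fh'

* Import it: first line holds the variable names, fields are comma-separated
insheet using firms_demo.csv, comma names clear

describe
list
```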

29
DH(2) Database Exploration
  • 2nd step Exploring the Data
  • To obtain a description of the database
  • describe varlist, options
  • inspect varlist
  • codebook varlist, options
  • nmissing varlist, options
  • npresent varlist, options
  • To display all possible values of a variable
  • list varlist [if] [in], options
  • Example: list var1 if var2 > var3 in 1/100
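A short exploration session might look as follows. The sketch uses the auto dataset shipped with Stata rather than the course data, and sticks to built-in commands (nmissing and npresent are user-written and must be installed separately).

```stata
sysuse auto, clear

* Structure of the database: variable names, types and labels
describe price mpg foreign

* Quick summary of the values of a numeric variable
inspect mpg

* Detailed description, including value labels and missing values
codebook foreign

* Display selected variables for a subset of observations
list make price mpg if price > 10000 in 1/74
```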

30
DH(3) Database Organisation
  • 3rd step Organisation of the database
  • Sorting observations
  • sort varlist
  • gsort -varlist
  • Ordering variables
  • order varlist
  • aorder varlist (If no varlist is
    specified, _all is assumed.)
  • Merging several databases (adding
    variables)
  • merge varlist using base1.dta base2.dta,
    options
  • Appending several databases (adding
    observations)
  • append using base1.dta base2.dta, options
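The difference between merging (adding variables) and appending (adding observations) can be illustrated on tiny made-up datasets. The sketch below assumes Stata 11 or later, where merge takes an explicit match type (merge 1:1) instead of the older merge varlist using form shown on the slide; all variable names and values are hypothetical.

```stata
* First database: sales by firm
clear
input id sales
1 120
2 95
end
tempfile sales
save `sales'

* Second database: employees for the same firms
clear
input id employees
1 30
2 12
end

* Merge adds the variable sales to the matching observations
merge 1:1 id using `sales'
assert _merge == 3          // every firm was found in both files
drop _merge
tempfile firms12
save `firms12'

* Append adds the two merged firms as new observations below firm 3
clear
input id sales employees
3 40 5
end
append using `firms12'
sort id
list
```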

31
DH(3) Database Organisation
  • 3rd step Organisation of the database
  • Modifying the shape of the database
  • reshape long stubnames, i(varlist) j(varname)
  • reshape wide stubnames, i(varlist) j(varname)
  • Wide form (i = id, one inc column per year)

    id  sex  inc80  inc81  inc82
    1   0    5000   5500   6000
    2   1    2000   2200   3300
    3   0    3000   2000   1000

  • Long form (i = id, j = year)

    id  year  sex  inc
    1   80    0    5000
    1   81    0    5500
    1   82    0    6000
    2   80    1    2000
    2   81    1    2200
    2   82    1    3300
    3   80    0    3000
    3   81    0    2000
    3   82    0    1000
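The wide and long forms shown on this slide can be reproduced directly. The sketch below types in the wide-form data from the slide, converts it to long form, and converts it back.

```stata
* Wide form: one line per id, one income variable per year
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

* Wide to long: inc80, inc81, inc82 become inc, indexed by year
reshape long inc, i(id) j(year)
list, sepby(id)

* Long back to wide
reshape wide inc, i(id) j(year)
list
```

Here inc is the stub name, id identifies the unit (i) and year is the new index variable (j) created from the suffixes 80, 81 and 82.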



32
DH(4) Saving, Opening, Exporting
  • 4th step Save and re-use STATA database files
    (.dta files)
  • Changes the working directory to the specified
    drive and directory
  • cd "C:\STATA SKEMA"
  • Saves the database as a STATA file (.dta)
  • save myfile.dta, replace
  • Opens a STATA format database (.dta)
  • use myfile.dta, clear
  • Exports a database as a .txt file
  • outsheet varlist using myfile.txt, options
  • options: comma, nonames, replace

33
Handling Variables
  • Create a new variable
  • By assigning a value to it
  • generate var1 = expression [if] [in]
  • Using a predefined function: Extensions to
    generate
  • egen var1 = fcn(arguments) [if] [in], options
    by(varlist)
  • fcn: min, max, mode, mean, median, sd,
    total,
  • pctile, group, count, etc.
  • Examples: egen newvar = mean(salaire), by(age)
  • egen newvar = group(nom)
  • egen newvar = count(id), by(sector)
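As an illustration of generate and egen, the following sketch uses the auto dataset shipped with Stata; the new variable names are made up for the example.

```stata
sysuse auto, clear

* generate: assign the value of an expression to a new variable
generate lprice = ln(price)

* egen with by(): group-level statistics repeated on every line of the group
egen mean_price = mean(price), by(foreign)
egen n_cars     = count(price), by(foreign)

* egen group(): numeric identifier for each category
egen origin_id = group(foreign)

list make price lprice mean_price n_cars origin_id in 1/5
```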

34
Handling Variables
  • Variables modifications and removal
  • Modifying a variable which has already been
    created
  • replace var1 = expression [if] [in]
  • Erasing variables
  • drop varlist
  • keep varlist
  • Erasing observations
  • drop [if] [in]
  • keep [if] [in]
  • Examples: drop if revenu < 100
  • keep if age > 18
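A brief sketch of modifying and removing variables and observations, again on the auto dataset shipped with Stata. The variable expensive and the 10000 threshold are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Create a variable, then modify it for a subset of observations
generate expensive = 0
replace expensive = 1 if price > 10000

* Erase observations: drop those with a missing repair record
drop if missing(rep78)

* Erase variables: keep only the ones listed
keep make price mpg expensive

describe
```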

35
Handling Variables
  • Time series and panel data utilities
  • Declaring data as time series or panel data
  • tsset panelvar timevar, options
  • options: daily, weekly, monthly, quarterly,
    yearly
  • Example: tsset id annee, yearly
  • Using time series operators
  • Lagged values: L.X = $X_{t-1}$;
    L2.X or LL.X = $X_{t-2}$
  • Lead values: F.X = $X_{t+1}$;
    F2.X or FF.X = $X_{t+2}$
  • Differenced values: D.X = $X_t - X_{t-1}$;
    D2.X or DD.X = $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2})$
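The time series operators can be checked on a tiny made-up panel of two units observed over three years. All names and values below are hypothetical.

```stata
clear
input id year x
1 2000 10
1 2001 12
1 2002 15
2 2000 5
2 2001 4
2 2002 8
end

* Declare the panel structure: id is the unit, year is the time variable
tsset id year, yearly

generate lag_x  = L.x      // value of x one year earlier
generate lead_x = F.x      // value of x one year later
generate d_x    = D.x      // first difference
generate d2_x   = D2.x     // difference of the first differences

list, sepby(id)
```

The operators respect the panel structure: lags and differences are missing in the first year of each id rather than being taken from the previous unit.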

36
Descriptive Statistics with STATA
  • Using log files
  • log using xxx, replace / log close
  • Defining and using labels
  • label variable, label define, label values
  • Descriptive statistics
  • summarize, table, table, content(), tabulate
  • Manipulating .dta files and exporting
  • collapse, save as, outsheet using...
37
Log files
  • Log files save the Results window. They are useful
    when producing descriptive statistics on the
    .dta files and on the variables.
  • log using nom_fichier_log, replace
  • STATA instructions
  • log close
  • Advantage: very useful for retrieving old results
    (replication and refutation)
  • Drawback: tedious to manipulate

38
Labelling variables
  • Labelling is too often neglected.
  • No influence on the results
  • Large influence on correct interpretation of
    variables and results
  • label variable. Describe a variable
  • label variable asset "real capital"
  • label define. Define a label
  • label define firm_type 1 "biotech" 0 "Pharma"
  • label values Applies the label
  • label values type firm_type
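The three labelling commands can be tried on a two-line dataset. The data values are made up; the labels are those used on the slide.

```stata
clear
input type asset
1 250
0 900
end

* Describe a variable
label variable asset "real capital"

* Define a set of value labels, then attach it to a variable
label define firm_type 1 "biotech" 0 "Pharma"
label values type firm_type

* The labels now appear in place of the codes 0 and 1
tabulate type
describe
```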

39
Descriptive statistics summarize
  • summarize var1 var2 ... varN
  • Produces the number of observations, mean, standard
    deviation, min and max
  • We can add a condition using if
  • summarize var1 var2 ... varN if condition
  • We can produce descriptive statistics by subsets
    of the database using bysort
  • bysort varcat: summarize var1 var2 ... varN
  • Beware: the if condition is never preceded by a
    comma; the comma only introduces options. If you
    get an error message, chances are high that it
    comes from a misplaced or missing comma.
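The variants of summarize described above look as follows on the auto dataset shipped with Stata (a sketch; the course data are not used here).

```stata
sysuse auto, clear

* Whole sample
summarize price mpg

* With a condition: no comma before if
summarize price mpg if foreign == 1

* By subsets of the database
bysort foreign: summarize price mpg

* The comma introduces options, here detail (percentiles, skewness, kurtosis)
summarize price, detail
```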

40
Descriptive statistics table
The table command applies to categorical
variables (string or numeric).
  • table varcat1
  • Provides the number of observations by
    categories of varcat1
  • table varcat1 varcat2
  • Provides a cross table between varcat1
    and varcat2
  • table varcat, content(count var1 mean var1 sd var1...)
  • Provides descriptive statistics on var1 by
    categories of varcat
41
Descriptive statistics tabulate
The tabulate command is similar to table, but
the options are different.
  • tabulate varcat, gen(varcat_)
  • Generates dummy variables for each
    category of varcat
  • tabulate varcat1 varcat2, options
  • Generates measures of association
    between two categorical variables
  • tabulate varcat1, summarize(var2)
  • Provides descriptive statistics on var2 by
    categories of varcat1
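The three uses of tabulate can be sketched on the auto dataset shipped with Stata. The chi2 option is one example of the association measures available; the prefix rep_ is an arbitrary name.

```stata
sysuse auto, clear

* One-way frequency table
tabulate foreign

* Dummy variables rep_1 ... rep_5, one per category of rep78
tabulate rep78, generate(rep_)

* Cross table with a chi-square test of association
tabulate foreign rep78, chi2

* Mean, standard deviation and frequency of price by category
tabulate foreign, summarize(price)
```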
42
Stacking observations collapse
  • The collapse command produces a new database
    which is an aggregation of the old database.
  • collapse aggregates lines (observations) by
    categories of a categorical variable of your
    choice
  • collapse (mean) var1 var2 (sum) var3, by(varcat)
  • Will generate a new database with as many lines
    as there are categories of varcat, with 3
    variables (means of var1 var2, sum of var3)
  • collapse (mean) var1 var2 (sd) sdvar1=var1
    sdvar2=var2, by(varcat1 varcat2)
  • Will generate a new database with as many lines
    as there are combinations of varcat1 and varcat2,
    with 4 variables (means of var1 var2, standard
    deviations of var1 var2)
  • Note: collapse is useful for exporting tables
    of results to Excel.
  • Note: please save the old and new databases
    under different names!
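A minimal sketch of collapse on the auto dataset shipped with Stata. The new variable names and the output file name are hypothetical; saving under a new name follows the note above.

```stata
sysuse auto, clear

* One line per category of foreign: means, standard deviations and a sum
collapse (mean) price mpg (sd) sd_price=price sd_mpg=mpg ///
         (sum) total_weight=weight, by(foreign)

list

* Save the aggregated database under its own name
save auto_by_origin.dta, replace
```

The name=variable form is needed whenever two statistics are computed on the same variable, since each resulting variable must have a distinct name.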

43
Keywords for table and collapse
  mean      means (default)
  sd        standard deviations
  sum       sums
  rawsum    sums, ignoring optionally specified weight
  count     number of nonmissing observations
  max       maximums
  min       minimums
  iqr       interquartile range
  median    medians
  p1        1st percentile
  p2        2nd percentile
  ...       3rd-49th percentiles
  p50       50th percentile (same as median)
  ...       51st-97th percentiles
  p98       98th percentile
  p99       99th percentile
44
Graphs
  • Graphic representations are a very effective
    means of synthesis.
  • Pie graphs, which display proportions of a
    population or a sample
  • Two-way graphs, linking any two quantitative
    dimensions
  • Distribution graphs (histograms), which plot
    central tendency characteristics and dispersion
    of a variable

45
Pie Graphs
graph pie, over(varcat)
46
Two-way Graphs
Two-way graphs link two continuous variables, var1 and
var2. There are several types of two-way graphs:
  • Line graphs: twoway line var1 var2
  • Classical scatterplots: twoway scatter var1 var2
  • Connected graphs: twoway connected var1 var2
47
Line graphs
twoway line var1 var2
twoway line rdi year if name == "Abbott"
48
Line graphs
twoway line var1 var2
twoway (line rdi year if name == "Amgen", sort)
(line rdi year if name == "Abbott", sort),
legend(on order(1 "Amgen" 2 "Abbott"))
49
Connected graphs
twoway connected var1 var2
twoway (connected rdi year if name == "Abbott")
50
Scatterplots
twoway scatter var1 var2
twoway scatter lrdi lassets
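Since the course data (rdi, year, name, lassets) are not reproduced in this transcript, the same kinds of two-way graphs can be sketched with datasets shipped with Stata: uslifeexp for the time series and auto for the scatterplot. The choice of variables is purely illustrative.

```stata
* Two overlaid line graphs with a custom legend
sysuse uslifeexp, clear
twoway (line le_male year, sort) (line le_female year, sort), ///
       legend(on order(1 "Male" 2 "Female"))

* Connected graph on a subset of years
twoway connected le year if year >= 1990

* Scatterplot of two continuous variables
sysuse auto, clear
twoway scatter mpg weight
```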
51
Distribution graphs
  • Distribution graphs plot the distribution of one
    quantitative variable var1 at a time by means of
    a histogram
  • On the horizontal axis, classes of var1 are
    displayed.
  • On the vertical axis, the density of each class
    is displayed.

52
Distribution histograms
hist var1
hist lassets
53
Kernel distributions
Using kernel density estimation, one can estimate
the probability density function of var1. The
probability density function is useful for visually
checking the normality of the distribution. Normal
distributions are also called Gaussian
distributions. They are very frequently used in
the sciences to account for random processes. They
are based on the law of large numbers and the
central limit theorem.
54
Kernel distribution
kdensity var1
kdensity lassets
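To check normality visually, both the histogram and the kernel density estimate can be overlaid with a normal density. The sketch below uses the log of price from the auto dataset shipped with Stata as a stand-in for lassets.

```stata
sysuse auto, clear
generate lprice = ln(price)

* Histogram with a normal density overlaid
histogram lprice, normal

* Kernel density estimate with a normal density overlaid
kdensity lprice, normal
```

The closer the estimated density follows the overlaid normal curve, the more plausible the normality assumption is for that variable.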
55
Exporting Graphs
One can simply copy and paste a graph into any
Microsoft Office software. One can also use .do files
and write:
  • graph export graph_name, as(extension) options
  • Example: graph export SKEMA_rdi.wmf, as(wmf) replace
  • Possible extensions: PostScript (ps), Encapsulated
    PostScript (eps), Windows Metafile (wmf), Windows
    Enhanced Metafile (emf), Macintosh PICT format
    (pict), Portable Document Format (pdf)
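In a do-file, the export step simply follows the graph command. The sketch below uses the auto dataset shipped with Stata and a hypothetical file name; the eps format is chosen here because it is available on every platform, whereas wmf and emf are Windows-only.

```stata
sysuse auto, clear

* Draw the graph, then export the graph currently in memory
twoway scatter mpg weight
graph export auto_scatter.eps, as(eps) replace
```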
56
SPSS software: Statistical Package for the Social
Sciences
57
SPSS Opening SPSS
58
SPSS Importing data
59
SPSS Importing data
60
SPSS Importing data
  • Settings in the import text dialogue box
  • No predefined format (1)
  • Delimited (2)
  • First line contains the variable names (2)
  • One observation per line // all observations (3)
  • Tab delimited only (4)
  • Finish (6)

61
SPSS windows
  • SPSS automatically opens two windows
  • The datasheet window
  • Observe, manage, modify, create, data
  • The results window
  • Everything you do will be stored there
  • The syntax window can be opened

62
SPSS Data sheet (1)
63
SPSS Data sheet (2)
64
SPSS Result / Journal
65
SPSS Saving data
66
SPSS working, at last!
67
Recoding Variables
  • Changing existing values to new values
    (biotechnologie → DBF, pharmaceutique → LDF)

68
Computing New Variables
  • Taking logarithm (normalization of continuous
    variables)

69
Creating Dummy Variables
  • Taking logarithm (normalization of continuous
    variables)

70
Computation of Descriptive Statistics
71
Descriptive Statistics
72
Splitting Database
73
Descriptive Statistics (by type)
74
Logarithm
  • Normalization
  • Taking the logarithm is a transformation which
    usually brings a skewed distribution closer to normal.
  • Elasticities: http://en.wikipedia.org/wiki/Elasticity_(economics)
  • A change in the log of x approximates a relative
    change in x itself.
  • Cobb-Douglas production function
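The last two points can be made explicit. For a small change in x, the change in its logarithm approximates the relative change in x, and taking logs of a Cobb-Douglas production function makes it linear, so that its exponents can be read as elasticities. The symbols Y, A, K, L, α and β below are the usual textbook notation and are an assumption here, since the slide does not define them.

```latex
\Delta \ln x = \ln(x + \Delta x) - \ln x \approx \frac{\Delta x}{x}

Y = A K^{\alpha} L^{\beta}
\quad\Longrightarrow\quad
\ln Y = \ln A + \alpha \ln K + \beta \ln L

\alpha = \frac{\partial \ln Y}{\partial \ln K},
\qquad
\beta = \frac{\partial \ln Y}{\partial \ln L}
```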