EC5200 Research Methods Lecture 4 Introduction to Stata

1
EC5200 Research Methods
Lecture 4: Introduction to Stata

Prof Peter Dolton
Room H309
Office hours Wed 1200-1300, Thurs 1500-1600
Email: peter.dolton_at_rhul.ac.uk  Tel: 01784 443378

Slides and other exercises and handouts available
at http://personal.rhul.ac.uk/UQTE/004/EC5200
2
Econometrics Software

You can use any software that does what you need;
we don't care and we don't get commission.
See Timberlake for details of what does what well
PC Give is hard to beat for time series analysis
Microfit, EViews are good alternatives
STATA does (just about) everything.
STATA (and everything else) is available as a
delivered application on the network.

3
Buying software

ITS gives a good deal on SPSS and PC Give.
Stata and EViews are available from the
distributor Timberlake.
Three varieties of STATA
Little: Small STATA, £36 for a one-year license
Medium: Intercooled STATA, £95 perpetual
license
BIG: STATA SE, £180 perpetual license
Documentation: very big, £115
EViews
Student edition (10,000 datapoints limit): £35

4
STATA

Use STATA
for large survey datasets (and especially merging
them),
for complex nonlinear models (e.g. LDVs)
But see also LimDep
for nonparametric and evaluation methods
if you want to continue studying economics,
if you want to be a professional economist,
if you want to learn something new,
if you hate PC Give.

5
Some useful websites

My notes on http://personal.rhul.ac.uk/UQTE/004/EC5200
RAE website for links to ESDS's Stata for LFS
and Arnaud Chevalier's tutorials (both v7)
Stata's own resources for learning STATA
Stata website, Stata journal, Stata library,
Statalist archive
http://www.stata.com/links/resources1.html
Michigan's web-based guide to STATA (for SA)
UCLA resources to help you learn and use STATA
http://www.ats.ucla.edu/stat/stata/
including movies and web-books

6
Some useful books

A Handbook of Statistical Analyses using Stata
(3rd Ed), S. Rabe-Hesketh and B. Everitt, Chapman
and Hall.
Regression Models for Categorical Dependent
Variables Using Stata (2nd Ed), J. Scott Long and
J. Freese, Stata Press
Longitudinal and Panel Data, E. Frees, Cambridge
University Press
An Introduction to Survival Analysis Using Stata,
M. Cleves, W. Gould and R. Gutierrez, Stata Press
Maximum Likelihood Estimation with Stata, W.
Gould, J. Pitblado and W. Sribney, Stata Press

7
Getting started in STATA 9

Start STATA
Simply click on icon
Stata should open and look (a bit) like this
Buttons/menu
Review window
Results window
Command entry window
Variables window
To exit type
. exit, clear

8
Getting help

There is extensive on-line help
Click on help on menu (or type help in command
line)
Type help xxx for help on the xxx command

9
Click and point in v9

Use the menu bar to click and point to most
commands
Then fill in the boxes in the resulting dialog
box
Click on tabs for further options

10
Important features

NOTE
Always use lowercase in STATA (it is case-sensitive)
Otherwise you can get very confused
More
When you see --more-- in your output window
there is more output to come. Press the spacebar
and the next page appears.
Put the command . set more off in your .do file
to turn this off
Break
When STATA scrolls output and you want to stop
it, hit the break (menu button with red cross,
or hit Ctrl and C simultaneously)
Not enough memory
. set mem XXXm (resizes STATA to allow XXX mb of
data)
. set matsize XXX (sets the max size of a matrix
to XXX square)

11
Using data on disk

You will usually want to open some dataset
Stata expects datasets to be rectangular with
columns being variables and rows being
observations
Stata datasets have a .dta extension
There are several ways of getting data into
STATA
. use myfile (or click on file and then open on
the menu bar)
(opens a stata format file called myfile.dta)
. use var1 var2 var3 in 1/1000 if var4==1 using myfile
(opens myfile but loads only the variables called
var1, var2 and var3 for the first 1000
observations but only if var4==1)
. insheet using myfile.csv (or .txt)
imports a csv file which Excel can read (or
imports a text file)
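
A minimal sketch of the selective form of use, assuming the auto example dataset that ships with Stata (loaded with sysuse). The file name myauto is hypothetical and chosen only for illustration.

```stata
* Load the shipped example data and save a copy to disk
sysuse auto, clear
save myauto, replace
* Reload only three variables, for domestic cars among the first 20 rows
use make price mpg if foreign==0 in 1/20 using myauto, clear
list in 1/5
```

Note that the variable list and the if/in qualifiers come before using, and the file name comes last.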

12
Basic data reporting

.describe (or press F3 key)
Lists the variable names and labels
.describe using myfile
Lists the variable names etc WITHOUT loading the
data into memory (useful if the data is too big
to fit)
.codebook
Tells you about the means, labels, missing values
etc
sort and count
.sort personid
sorts data by personid
.count if personid!=personid[_n-1]
counts how many unique separate personids
[_n-1] refers to the previous observation
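
The sort-then-count idea can be tried on the shipped auto dataset. This is only a sketch: rep78 stands in for personid, and since rep78 has some missing values, the missing code is counted as one more distinct value.

```stata
sysuse auto, clear
* Sort so that equal values of rep78 sit next to each other
sort rep78
* Count the rows whose value differs from the row above
count if rep78 != rep78[_n-1]
```

The first row always counts, because the "observation before the first" is treated as missing.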

13
First look at the data

.list var1 var2 var3 in 1/10 if var4>0
Lists var1 to var3 for those of the first 10 rows
in which var4>0
.tab x1 x2 (or tabulate)
gives a crosstab of x1 vs x2
use only if x1 and x2 are integers
.summ x1 x2 (or summarize or sum)
Gives you the means, std devs etc for x1 and x2
.corr x1 x2 in 1/100 if x4<0 (,cov)
correlation coeffs (or covariances) for selected
data
.pwcorr x1 x2 x3
does all pairwise corr coeffs

14
Tabulating

tab x1 x2 if x4==0, sum(x3)
gives the means of x3 for each cell of the x1 vs
x2 crosstabulation for observations where x4==0
(note the double ==)
tab x1 x2, missing
Includes the missing values
tab x1 x2, nolabel
Uses numeric codes instead of labels
E.g. 1 instead of NorthWest etc
tab x1 x2, col
Gives % of column as well as the count (add nofreq
to drop the counts)
table educ ethnic, c(mean wage) row col
Customises the table so it includes the mean (or
median or max or count or sd ...) of wage by cells
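
A short sketch of the tab variants, assuming the shipped auto dataset; rep78, foreign, price and mpg are that dataset's variables, not the ones on the slide. The table command is left out here because its syntax differs across Stata versions.

```stata
sysuse auto, clear
* Mean of mpg in each cell of the crosstab, for cars priced above 5000
tab rep78 foreign if price>5000, sum(mpg)
* Same crosstab, with missing values of rep78 shown as a row
tab rep78 foreign, missing
* Numeric codes instead of value labels, plus column percentages
tab rep78 foreign, nolabel col
```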

15
Labelling

Always a good idea to have your data
comprehensively labelled
.label data "This is pooled GHS 90-99"
.label variable region "region"
.lab define reglab 0 "North" 1 "South" 2
"Middle"
.lab values region reglab
Tedious to do for lots of variables
but then your output will be intelligibly
labelled
other people will be able to understand it in
future
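
A self-contained sketch of the labelling steps, using a three-row toy dataset typed in with input (meant to be run from a .do file). The label texts are illustrative assumptions.

```stata
clear
input region
0
1
2
end
* Label the dataset, the variable, and the values
label data "Toy region data"
label variable region "Region of residence"
label define reglab 0 "North" 1 "South" 2 "Middle"
label values region reglab
* The crosstab now shows North/South/Middle instead of 0/1/2
tab region
```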

16
Data manipulation

Data can be renamed, recoded, and transformed
. gen logrw=log((earn/hours)/rpi)
. gen agesq=age^2 (^ raises to the
power)
. gen region1=(region==1) (returns 1 if true,
0 if not)
. gen ylagged=y[_n-1] (_n is the obs
number in STATA)
. recode x1 (.=0) (1/5=1) (. is missing
value (mv))
. replace rate=rate/100
. replace age=25 if age==250
. egen meaninc=mean(income), by(region)
(see help egen for details)
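
The same transformations can be tried on the shipped auto dataset. This is a sketch only: the new variable names (logprice, mpgsq, highmpg, pricelag, pricek, meanmpg) are made up for illustration.

```stata
sysuse auto, clear
gen logprice = log(price)
gen mpgsq = mpg^2
* Logical expression: 1 if true, 0 if not
gen highmpg = (mpg > 25)
* Value of price in the previous observation (missing in row 1)
gen pricelag = price[_n-1]
* Missing becomes 0, codes 1 and 2 become 1, other codes are unchanged
recode rep78 (. = 0) (1/2 = 1)
gen double pricek = price
replace pricek = pricek/1000
* Group mean of mpg within each value of foreign
bysort foreign: egen meanmpg = mean(mpg)
```

One caution on the logical expression: Stata treats a missing value as larger than any number, so a test like x > 25 is true when x is missing.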

17
Data selection

You can also organise your data set with various
commands
. keep if _n<=1000 ( _n is the observation
number)
. drop region
. drop if ethnic!=1
keeps only the first 1000 observations, drops
region, and drops all the observations where the
variable ethnic is not 1 (!= is not equal to)
Then save the smaller file for subsequent
analysis
. save newfile
. save, replace (take care it overwrites
existing file)
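
A sketch of the keep/drop/save sequence on the shipped auto dataset; smallauto is a hypothetical output file name.

```stata
sysuse auto, clear
* Keep the first 50 observations only
keep if _n <= 50
* Drop a variable, then drop all observations that are not domestic cars
drop trunk
drop if foreign != 0
* Save under a new name so the original data are untouched
save smallauto, replace
```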

18
Functions

Lots of functions are possible.
See . help functions
Obvious ones like
log(), abs(), int(), round(), sqrt(), min(),
max(), sum()
And many very specialised ones.
Statistical functions
distributions
String functions
Converting strings to numbers and vice versa
Date functions
Converting dates to numbers and vice versa
And lots more

19
Running linear regressions

Simple regressions are easy
. reg logw educ age agesq region1 region2 region3
sex
. reg logw educ age agesq regi* if sex==1
* after regi includes all vars beginning with
regi
Make predictions after a regression
. predict yhat or . predict e, resid
(predict has lots of options that differ across
models)
Test restrictions after a regression
. testparm x1 x2, equal or . test x3=5
or . testnl (_b[var1]*_b[var2]=_b[var3])
(_b[_cons]=0)
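
A sketch of regress, predict and the test commands on the shipped auto dataset. The particular hypotheses tested are arbitrary and only illustrate the syntax.

```stata
sysuse auto, clear
gen mpgsq = mpg^2
reg price mpg mpgsq weight if foreign==0
* Fitted values and residuals
predict yhat
predict e, resid
* Joint test that both mpg coefficients are zero
test mpg mpgsq
* Linear restriction on a single coefficient
test weight = 5
* Nonlinear restriction written in terms of _b[]
testnl _b[mpg]*_b[weight] = _b[mpgsq]
```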

20
More on regression

You can expand discrete variables into a set of
dummy variables with the xi: prefix before the
reg command
. xi: reg logw educ age agesq sex i.region
You can repeat commands for subsets of the data
according to the value of some discrete variable
using the by var: prefix (sort by var first). E.g.
. by region: reg logw educ age agesq if sex==1
You can do IV
. ivreg logw (educ=myiv) age agesq if sex==1, first
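
A sketch of the three prefixes and commands on the shipped auto dataset. Using displacement as an instrument for mpg is purely illustrative, not a sensible identification strategy. In Stata 10 and later the documented IV command is ivregress, although ivreg still runs.

```stata
sysuse auto, clear
* Expand rep78 into a set of dummies
xi: reg price mpg weight i.rep78
* Separate regressions for domestic and foreign cars (bysort sorts first)
bysort foreign: reg price mpg weight
* IV: mpg is instrumented by displacement; first shows the first stage
ivreg price (mpg = displacement) weight, first
```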

21
Graphing your data

The graph command is very complex
see . help graph for more examples.
But the new menu system is a powerful way of
generating complex graphs.
Good idea to save the resulting syntax in your
.do file
You can save graphs
Click on Save, graph and choose a filetype and
name (.wmf is a good one for importing into Word
documents)
You can choose your own default format for graphs
Click on Graphics and then Graph preferences

22
Example graph commands

Here are some simple examples
. histogram x1, discrete
draws a histogram of x1 which is a discrete
variable
. scatter x1 x2
draws a scatterplot of x1 against x2 with dots
for observations
. graph pie if degree>0, over(degree) plabel(_all
percent) sort
draws a pie chart of type of degree for those
with degrees and labels it with percentages in
each type
. twoway (lfitci logw edage) (scatter logw
edage) if year==99
draws a scatter plot of logw vs edage for
observations in 1999 and superimposes a least
squares line
You can choose a graph format eg The
Economist
. graph display, scheme(economist)
or point and click to Graphics then Change
Scheme/Size
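
The graph commands can be tried on the shipped auto dataset. This is a sketch with that dataset's variables standing in for the ones on the slide.

```stata
sysuse auto, clear
* Histogram of a discrete variable
histogram rep78, discrete
* Scatterplot of price against mpg
scatter price mpg
* Fitted line with confidence band, overlaid with the points, domestic cars only
twoway (lfitci price mpg) (scatter price mpg) if foreign==0
* Pie chart of repair record, with percentage labels
graph pie, over(rep78) plabel(_all percent) sort
* Redraw the current graph in the Economist scheme
graph display, scheme(economist)
```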

23
Graph terms
(figure labelling the parts of a graph, e.g. ticks; image not reproduced)
24
Command files

More complicated ideas can be implemented as a
sequence of commands. For example
. regress y x1 x2 x3 x4 x5
. predict yhat
. predict r, resid
Stata command files have a .do extension
Often you will want to develop ideas and it will
be handy to collect commands in an editor and
save as a .do file.
Then type . do mycommands.do, nostop
(echoes to screen, and keeps going after error
encountered)
Or . run mycommands.do (executes silently)
It is ALWAYS good practice to use a .do file
So you know exactly what you have done.
It makes it easy to develop ideas.
And correct mistakes.

25
Keeping track of output

STATA allows you to scroll back your screen
But better to open a log file at the beginning of
your session, and close it at the end.
Click on file, log, begin . Or type
. log using myoutput
. Commands
. log close
log command allows the replace and append
options.
Default is a .smcl file extension (that STATA can
read)
You might prefer to give your own, say, .log
extension in which case you get an ASCII file
that anything can edit
Logging your output is a good way of developing a
.do file since it saves the commands as well as
the output
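
A sketch of what a small .do file with logging might look like. The names mycommands.do and myoutput.log follow the slides, and the regression on the shipped auto dataset is only a placeholder.

```stata
* mycommands.do
* Close any log left open by an earlier run, without stopping on error
capture log close
set more off
* The .log extension gives a plain text log; replace overwrites an old one
log using myoutput.log, replace
sysuse auto, clear
regress price mpg weight
predict yhat
predict r, resid
log close
```

Run it with do mycommands.do, and the commands and their output end up together in myoutput.log.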

26
STATA so far (don't type the .s)

We have covered the basics of
Help (.help xxxx)
Data input (.use myoldfile, clear )
and output (.save mynewfile)
Data selection (.keep if .drop if)
Data inspection (.list .tab .table .sum
.corr)
Labelling (.label)
Data management (.sort xxx .by xxx)
Transformations (.gen .recode .replace)
Regression (.reg .predict .test)
Note you can abbreviate commands (except the
dangerous ones)

27
Example

Now practice these commands using the auto.dta
file
Copy the data (dta) file and the command (do)
file auto.do to your PC
Copy one line at a time from the do file, into
STATA's command box
Try to understand the output as you go
Try some variations on these commands, and try
some of your own commands
Explore Stata's menus using this dataset

28
Merging data - 1

One file contains id x1 x2 x3 while another
contains id x4 x5 x6.
You can merge using the key variable that is in
BOTH files (id)
But you need to sort both files first so they are
in the same order.
. use file1
. sort id (sorts file1 according to the value of
the id variable)
. save, replace
. use file2
. sort id (sorts file2 according to the value of
the id variable)
. merge id using file1
. drop if _merge!=3
. save file3

29
Merging data - 2

For each row (id) all the vars in file1 are added
to the corresponding row of file2 (if there is
one).
.merge creates a new variable, _merge
which has the value 1 for those obs with data
only in the file in memory (file2), 2 for those only
in the using file (file1), and 3 for those in both.
So the syntax above drops those observations that
don't have data in both files
and then saves the result containing x1-x6 in
file3
Use .joinby to merge and then drop in one step.
Use .append to add more obs on the same vars.
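
A self-contained sketch of a keyed merge on two toy files typed in with input (run from a .do file). The file names follow the slides and the data values are made up. It uses the older merge syntax that matches Stata 9. From Stata 11 the documented form is merge 1:1 id using file1, which does not need the data sorted first.

```stata
* First toy file: ids 1 to 3 with x1
clear
input id x1
1 10
2 20
3 30
end
sort id
save file1, replace

* Second toy file: ids 2 to 4 with x4 (this one stays in memory)
clear
input id x4
2 200
3 300
4 400
end
sort id
* Match on id; the file in memory is the master, file1 is the using file
merge id using file1
* id 4 is master only (_merge==1), id 1 is using only (_merge==2)
tab _merge
drop if _merge != 3
drop _merge
save file3, replace
```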

30
Collapsing data (use with care)

Collapse converts the data in memory into a
dataset of means (or sums, medians, etc.)
This is useful when you want to provide summary
information at a higher level of aggregation
For example, suppose a dataset contains data on
individuals, say their region and whether they
are unemployed
To find the average unemployment rates across
regions simply type
. collapse unemp, by(region)
which leaves one observation for each region
and one variable the mean unemployment rate.
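
A sketch of collapse on the shipped auto dataset, with foreign standing in for region and mpg for unemp. Because collapse replaces the data in memory, save first if you need the original again.

```stata
sysuse auto, clear
* One row per value of foreign, holding the group mean of mpg
collapse (mean) mpg, by(foreign)
list
```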

31
Reshaping files