EC5200 Research Methods Lecture 4 Introduction to Stata

1
EC5200 Research Methods, Lecture 4: Introduction to
Stata
  • Prof Peter Dolton
  • Room H309
  • Office hours: Wed 12:00-13:00, Thurs 15:00-16:00
  • Email: peter.dolton_at_rhul.ac.uk, Tel: 01784 443378

Slides and other exercises and handouts available
at http://personal.rhul.ac.uk/UQTE/004/EC5200
2
Econometrics Software
  • You can use any software that does what you need
  • we don't care and we don't get commission.
  • See Timberlake for details of what does what well
  • PC Give is hard to beat for time series analysis
  • Microfit, EViews are good alternatives
  • STATA does (just about) everything.
  • STATA (and everything else) is available as a
    delivered application on the network.

3
Buying software
  • ITS gives a good deal on SPSS and PC Give.
  • Stata and EViews are available from the
    distributor Timberlake.
  • Three varieties of STATA
  • Little STATA: 36 for a one-year license
  • Medium (Intercooled STATA): 95, perpetual
    license
  • Big (STATA SE): 180, perpetual license
  • Documentation: very big, 115
  • EViews
  • Student edition (10,000 datapoints limit): 35

4
STATA
  • Use STATA
  • for large survey datasets (and especially merging
    them),
  • for complex nonlinear models (e.g. LDVs)
  • But see also LimDep
  • for nonparametric and evaluation methods
  • if you want to continue studying economics,
  • if you want to be a professional economist,
  • if you want to learn something new,
  • if you hate PC Give.

5
Some useful websites
  • My notes on http://personal.rhul.ac.uk/UQTE/004/EC5200
  • RAE website for links to ESDS's Stata for LFS
    and Arnaud Chevalier's tutorials (both v7)
  • Stata's own resources for learning STATA
  • Stata website, Stata journal, Stata library,
    Statalist archive
  • http://www.stata.com/links/resources1.html
  • Michigan's web-based guide to STATA (for SA)
  • UCLA resources to help you learn and use STATA
  • http://www.ats.ucla.edu/stat/stata/
  • including movies and web-books

6
Some useful books
  • A Handbook of Statistical Analyses using Stata
    (3rd Ed), S. Rabe-Hesketh and B. Everitt, Chapman
    and Hall.
  • Regression Models for Categorical Dependent
    Variables Using Stata (2nd Ed), J. Scott Long and
    J. Freese, Stata Press.
  • Longitudinal and Panel Data, E. Frees, Cambridge
    University Press.
  • An Introduction to Survival Analysis Using Stata,
    M. Cleves, W. Gould and R. Gutierrez, Stata Press.
  • Maximum Likelihood Estimation with Stata, W.
    Gould, J. Pitblado and W. Sribney, Stata Press.

7
Getting started in STATA 9
  • Start STATA
  • Simply click on icon
  • Stata should open and look (a bit) like this
  • Buttons/menu
  • Review window
  • Results window
  • Command entry window
  • Variables window
  • To exit type
  • . exit, clear

8
Getting help
  • There is extensive on-line help
  • Click on help on menu (or type help in command
    line)
  • Type help xxx for help on the xxx command

9
Click and point in v9
  • Use the menu bar to click and point to most
    commands
  • Then fill in the boxes in the resulting dialog
    box
  • Click on tabs for further options

10
Important features
  • NOTE
  • Always use lowercase in STATA
  • Otherwise you can get very confused (STATA is
    case-sensitive)
  • More
  • When you see --more-- in your output window
    there is more output to come. Press the spacebar
    and the next page appears.
  • Put the command . set more off in your .do file
    to turn this off
  • Break
  • When STATA scrolls output and you want to stop
    it, hit the break (menu button with red cross,
    or hit Ctrl and C simultaneously)
  • Not enough memory
  • . set mem XXXm (resizes STATA to allow XXX mb of
    data)
  • . set matsize XXX (sets the max size of a matrix
    to XXX square)

11
Using data on disk
  • You will usually want to open some dataset
  • Stata expects datasets to be rectangular with
    columns being variables and rows being
    observations
  • Stata datasets have a .dta extension
  • There are several ways of getting data into
    STATA
  • . use myfile (or click on file and then open on
    the menu bar)
  • (opens a stata format file called myfile.dta)
  • . use var1 var2 var3 in 1/1000 if var4==1 using
    myfile
  • (opens myfile but loads only the variables called
    var1, var2 and var3, and only those of the first
    1000 observations for which var4==1)
  • . insheet using myfile.csv (or .txt)
  • imports a csv file which Excel can read (or
    imports a text file)

12
Basic data reporting
  • .describe (or press F3 key)
  • Lists the variable names and labels
  • .describe using myfile
  • Lists the variable names etc WITHOUT loading the
    data into memory (useful if the data is too big
    to fit)
  • .codebook
  • Tells you about the means, labels, missing values
    etc
  • sort and count
  • .sort personid
  • sorts data by personid
  • .count if personid!=personid[_n-1]
  • counts how many unique separate personids
  • personid[_n-1] is the value of personid in the
    previous observation
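
The reporting commands on this slide can be tried out with a minimal sketch like the one below. It assumes the auto dataset that ships with Stata (loaded with sysuse, which is not on the slide) as a stand-in for your own file, with rep78 playing the role of personid.

```stata
* Load the example dataset that ships with Stata (assumption: stands in for myfile)
sysuse auto, clear

* Variable names, storage types and labels
describe

* Detailed summary of two variables, including missing values
codebook price rep78

* Count distinct values: sort, then count rows that differ from the previous row
sort rep78
count if rep78 != rep78[_n-1]
```

Note that missing values sort last and are counted as one further distinct value by this approach.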

13
First look at the data
  • .list var1 var2 var3 in 1/10 if var4>0
  • Lists var1 to var3 for those of the first 10 rows
    for which var4>0
  • .tab x1 x2 (or tabulate)
  • gives a crosstab of x1 vs x2
  • use only if x1 and x2 are integers
  • .summ x1 x2 (or summarize or sum)
  • Gives you the means, std devs etc for x1 and x2
  • .corr x1 x2 in 1/100 if x4<0 (,cov)
  • correlation coeffs (or covariances) for selected
    data
  • .pwcorr x1 x2 x3 does all pairwise corr
    coeffs

14
Tabulating
  • tab x1 x2 if x4==0, sum(x3)
  • gives the means of x3 for each cell of the x1 vs
    x2 crosstabulation for observations where x4==0
    (note ==)
  • tab x1 x2, missing
  • Includes the missing values
  • tab x1 x2, nolabel
  • Uses numeric codes instead of labels
  • Eg 1 instead of NorthWest etc
  • tab x1 x2, col
  • Gives % of column as well as the count (add
    nofreq to suppress the counts)
  • table educ ethnic, c(mean wage) row col
  • Customises the table so it includes the mean (or
    median or max or count or sd ...) of wage by cells
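
A short sketch of the tabulate options above, assuming the shipped auto dataset in place of x1 to x4 (rep78, foreign and price are variables in that dataset, not in the lecture data):

```stata
sysuse auto, clear

* Crosstab of repair record by origin, including missing values
tabulate rep78 foreign, missing

* Column percentages shown alongside the counts
tabulate rep78 foreign, col

* Mean of price in each cell, for a subset selected with == (not =)
tabulate rep78 foreign if foreign == 0, summarize(price)

* Numeric codes instead of value labels
tabulate foreign, nolabel
```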

15
Labelling
  • Always a good idea to have your data
    comprehensively labelled
  • .label data "This is pooled GHS 90-99"
  • .label variable reg "region"
  • .lab define reglab 0 "North" 1 "South" 2
    "Middle"
  • .lab values region reglab
  • Tedious to do for lots of variables
  • but then your output will be intelligibly
    labelled
  • other people will be able to understand it in
    future
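
As an illustration of the three kinds of label, here is a minimal sketch using the shipped auto dataset. The variable heavy and the label name heavylab are hypothetical, introduced only for this example.

```stata
sysuse auto, clear

* Label the dataset and one variable
label data "1978 automobile data (example)"
label variable mpg "Miles per gallon"

* Create a 0/1 variable, then define value labels and attach them
generate byte heavy = (weight > 3000)
label define heavylab 0 "Light" 1 "Heavy"
label values heavy heavylab

* The output now shows Light/Heavy instead of 0/1
tabulate heavy
```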

16
Data manipulation
  • Data can be renamed, recoded, and transformed
  • . gen logrw=log((earn/hours)/rpi)
  • . gen agesq=age^2 (^ raises to the
    power)
  • . gen region1=(region==1) (returns 1 if true,
    0 if not)
  • . gen ylagged=y[_n-1] (_n is the obs number
    in STATA)
  • . recode x1 (.=0) (1/5=1) (. is missing
    value (mv))
  • . replace rate=rate/100
  • . replace age=25 if age==250
  • . egen meaninc=mean(income), by(region)
  • (see help egen for details)
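
The transformations above might look as follows on the shipped auto dataset. This is a sketch under the assumption that price, weight, foreign and rep78 stand in for the lecture's variables; all new variable names are hypothetical.

```stata
sysuse auto, clear

* Logs, powers and a dummy built from a logical expression
generate logprice = log(price)
generate weightsq = weight^2
generate byte domestic = (foreign == 0)

* Lag: the value of price in the previous observation (missing in row 1)
generate pricelag = price[_n-1]

* Recode missing to 0 and values 1 to 5 to 1, storing the result in a new variable
recode rep78 (. = 0) (1/5 = 1), generate(hasrep)

* Rescale in place, then attach the group mean to every observation
replace weight = weight/1000
egen meanprice = mean(price), by(foreign)
```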

17
Data selection
  • You can also organise your data set with various
    commands
  • . keep if _n<=1000 ( _n is the observation
    number)
  • . drop region
  • . drop if ethnic!=1
  • keeps only the first 1000 observations, drops
    region, and drops all the observations where the
    variable ethnic!=1 (!= is not equal to)
  • Then save the smaller file for subsequent
    analysis
  • . save newfile
  • . save, replace (take care: it overwrites the
    existing file)
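
A minimal sketch of the same selection steps on the shipped auto dataset; the file name smallauto is hypothetical and the file is written to the current working directory.

```stata
sysuse auto, clear

* Keep the first 50 observations, drop one variable, drop non-foreign cars
keep if _n <= 50
drop rep78
drop if foreign != 1

* Save the smaller file under a new name (replace overwrites an existing copy)
save smallauto, replace
```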

18
Functions
  • Lots of functions are possible.
  • See . help functions
  • Obvious ones like
  • log(), abs(), int(), round(), sqrt(), min(),
    max(), sum()
  • And many very specialised ones.
  • Statistical functions
  • distributions
  • String functions
  • Converting strings to numbers and vice versa
  • Date functions
  • Converting dates to numbers and vice versa
  • And lots more

19
Running linear regressions
  • Simple regressions are easy
  • . reg logw educ age agesq region1 region2 region3
    sex
  • . reg logw educ age agesq regi* if sex==1
  • * after regi includes all vars beginning with
    regi
  • Make predictions after a regression
  • . predict yhat or . predict e, resid
  • (predict has lots of options that differ across
    models)
  • Test restrictions after a regression
  • . testparm x1 x2, equal or . test x3=5
  • or . testnl (_b[var1]=_b[var2]=_b[var3])
    (_b[_cons]=0)
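
The regress, predict and test sequence can be sketched as below, assuming the shipped auto dataset; the model and the restriction values are illustrative only and are not taken from the lecture.

```stata
sysuse auto, clear

* Fit a simple linear regression
regress price mpg weight foreign

* Fitted values and residuals from the most recent regression
predict yhat
predict e, residuals

* Joint test that two coefficients are zero
test mpg weight

* Test a single linear restriction on one coefficient
test foreign = 3000

* Test that the listed coefficients are equal to each other
testparm mpg weight, equal
```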

20
More on regression
  • You can expand discrete variables into a set of
    dummy variables with the xi: prefix before the
    reg command
  • . xi: reg logw educ age agesq sex i.regi
  • You can repeat commands for subsets of the data
    according to the value of some discrete variable
    using the by var: prefix (the data must be sorted
    by that variable first). E.g.
  • . by region: reg logw educ age agesq if sex==1
  • You can do IV
  • . ivreg logw (educ=myiv) age agesq if sex==1, first
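
A small sketch of the two prefixes, assuming the shipped auto dataset. bysort is used here so that the sort and the by: step happen in one command; recent versions of Stata also accept i.rep78 without the xi: prefix.

```stata
sysuse auto, clear

* Expand rep78 into a set of dummy variables on the fly
xi: regress price mpg weight i.rep78

* Separate regressions for domestic and foreign cars (sorts by foreign first)
bysort foreign: regress price mpg weight
```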

21
Graphing your data
  • The graph command is very complex
  • see . help graph for more examples.
  • But the new menu system is a powerful way of
    generating complex graphs.
  • Good idea to save the resulting syntax in your
    .do file
  • You can save graphs
  • Click on Save, graph and choose a filetype and
    name (.wmf is a good one for importing into Word
    documents)
  • You can choose your own default format for graphs
  • Click on Graphics and then Graph preferences

22
Example graph commands
  • Here are some simple examples
  • . histogram x1, discrete
  • draws a histogram of x1 which is a discrete
    variable
  • . scatter x1 x2
  • draws a scatterplot of x1 against x2 with dots
    for observations
  • . graph pie if degree>0, over(degree) plabel(_all
    percent) sort
  • draws a pie chart of type of degree for those
    with degrees and labels it with percentages in
    each type
  • . twoway (lfitci logw edage) (scatter logw
    edage) if year==99
  • draws a scatter plot of logw vs edage for
    observations in 1999 and superimposes a least
    squares line
  • You can choose a graph format, eg The
    Economist
  • . graph display, scheme(economist)
  • or point and click to Graphics then Change
    Scheme/Size
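
The graph commands above can be tried on the shipped auto dataset with a sketch like this one; the variables are stand-ins for x1, x2, degree, logw and edage.

```stata
sysuse auto, clear

* Histogram of a discrete variable
histogram rep78, discrete

* Scatterplot of price against mpg
scatter price mpg

* Pie chart of repair record, labelled with percentages
graph pie, over(rep78) plabel(_all percent) sort

* Fitted line with confidence band plus the scatter, domestic cars only
twoway (lfitci price mpg) (scatter price mpg) if foreign == 0

* Redisplay the current graph in The Economist scheme
graph display, scheme(economist)
```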

23
Graph terms
  • [Figure: annotated graph illustrating terms such as ticks]
24
Command files
  • More complicated ideas can be implemented as a
    sequence of commands. For example
  • . regress y x1 x2 x3 x4 x5
  • . predict yhat
  • . predict r, resid
  • Stata command files have a .do extension
  • Often you will want to develop ideas and it will
    be handy to collect commands in an editor and
    save as a .do file.
  • Then type . do mycommands.do, nostop
  • (echoes to screen, and keeps going after error
    encountered)
  • Or . run mycommands.do (executes silently)
  • It is ALWAYS good practice to use a .do file
  • So you know exactly what you have done.
  • It makes it easy to develop ideas.
  • And correct mistakes.

25
Keeping track of output
  • STATA allows you to scroll back your screen
  • But better to open a log file at the beginning of
    your session, and close it at the end.
  • Click on file, log, begin. Or type
  • . log using myoutput
  • . (your commands)
  • . log close
  • log command allows the replace and append
    options.
  • Default is a .smcl file extension (that STATA can
    read)
  • You might prefer to give your own, say, .log
    extension in which case you get an ASCII file
    that anything can edit
  • Logging your output is a good way of developing a
    .do file since it saves the commands as well as
    the output
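
Putting command files and logging together, a .do file might be laid out as in the sketch below. The file names mycommands.do and myoutput.log follow the slides; the dataset and model are assumptions for illustration.

```stata
* mycommands.do: a minimal command file that logs its own output

* Close any log left open by an earlier run, then start a plain-text log
capture log close
log using myoutput.log, replace

* Do not pause at --more--
set more off

sysuse auto, clear
regress price mpg weight
predict r, residuals
summarize r

log close
```

Run it with do mycommands.do. Giving the log a .log extension produces a plain-text file rather than the default .smcl format.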

26
STATA so far (don't type the leading dots)
  • We have covered the basics of
  • Help (.help xxxx)
  • Data input (.use myoldfile, clear )
  • and output (.save mynewfile)
  • Data selection (.keep if .drop if)
  • Data inspection (.list .tab .table .sum
    .corr)
  • Labelling (.label)
  • Data management (.sort xxx .by xxx)
  • Transformations (.gen .recode .replace)
  • Regression (.reg .predict .test)
  • Note you can abbreviate commands (except the
    dangerous ones)

27
Example
  • Now practice these commands using the auto.dta
    file
  • Copy the data (dta) file and the command (do)
    file auto.do to your PC
  • Copy one line at a time from the do file, into
    STATA's command box
  • Try to understand the output as you go
  • Try some variations on these commands, and try
    some of your own commands
  • Explore Stata's menus using this dataset

28
Merging data - 1
  • One file contains id x1 x2 x3 while another
    contains id x4 x5 x6.
  • You can merge using the key variable that is in
    BOTH files (id)
  • But you need to sort both files first so they are
    in the same order.
  • . use file1
  • . sort id (sorts file1 according to the value of
    the id variable)
  • . save, replace
  • . use file2
  • . sort id (sorts file2 according to the value of
    the id variable)
  • . merge id using file1
  • . drop if _merge!=3
  • . save file3

29
Merging data - 2
  • For each row (id) all the vars in file1 are added
    to the corresponding row of file2 (if there is
    one).
  • .merge creates a new variable, _merge
  • which has the value 1 for those obs with data
    only in the file in memory (here file2), 2 for
    those only in the using file (here file1), and 3
    for those in both.
  • So the syntax above drops those observations that
    don't have data in both files
  • and then saves the result containing x1-x6 in
    file3
  • Use .joinby to merge and then drop in one step.
  • Use .append to add more obs on the same vars.
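
The merge workflow can be sketched end to end as follows. This assumes the shipped auto dataset split into two pieces; the id variable is created for the example and file1 is written to the current working directory. It uses the old-style merge syntax of the Stata version covered in the lecture; Stata 11 and later prefer merge 1:1 id using file1 and no longer need the sort.

```stata
sysuse auto, clear
generate id = _n

* Build file1: id plus two variables, sorted by the key
preserve
keep id make price
sort id
save file1, replace
restore

* The data in memory play the role of file2: id plus two other variables,
* with the last few ids removed so that some rows will not match
keep id mpg weight
drop if id > 70
sort id

* Old-style match merge on the key variable
merge id using file1

* 1 = only in memory, 2 = only in file1, 3 = in both
tabulate _merge
drop if _merge != 3
```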

30
Collapsing data (use with care)
  • Collapse converts the data in memory into a
    dataset of means (or sums, medians, etc.)
  • This is useful when you want to provide summary
    information at a higher level of aggregation
  • For example, suppose a dataset contains data on
    individuals, say their region and whether they
    are unemployed
  • To find the average unemployment rates across
    regions simply type
  • . collapse unemp, by(region)
  • which leaves one observation for each region
    and, besides region, one variable: the mean
    unemployment rate.
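
A minimal sketch of collapse, assuming the shipped auto dataset with foreign in place of region:

```stata
sysuse auto, clear

* Replace the data in memory with one row per group; (mean) is the default statistic
collapse (mean) price mpg, by(foreign)

* Two observations remain, one for each value of foreign
list
```

Because collapse discards the individual-level data in memory, save the original file first or wrap the step in preserve and restore.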

31
Reshaping files
  • Data may be long but thin
  • for example each record is a household member
  • But there are few variables - say wage and hours
  • Data may be wide but short
  • each record is a household and there are lots of
    variables
  • (eg wage1 wage2 wage3 hours1 hours2 hours3)
  • . reshape long inc ue, i(id) j(year) converts
    from wide to long
  • . reshape wide inc ue, i(id) j(year) converts
    back to wide
  • . reshape long inc, i(id) j(year 80-82 85)
    specifying j() values
  • Handy for merging data together and for panel
    data
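
The round trip between wide and long can be seen on a tiny made-up dataset. The values below are invented purely for illustration.

```stata
clear

* Wide format: one row per id, one income column per year
input id inc80 inc81 inc82
1 5000 5500 6000
2 2000 2200 3300
3 3000 2000 1000
end

* Wide to long: one row per id-year, with variables id, year and inc
reshape long inc, i(id) j(year)
list

* Long back to wide
reshape wide inc, i(id) j(year)
list
```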