Title: EC5200 Research Methods Lecture 2: Data for Research
EC5200 Research Methods
Lecture 2: Data for Research
- Prof Peter Dolton
- Room H309
- Office hours Wed 1200-1300, Thurs 1500-1600
- Email: peter.dolton_at_rhul.ac.uk; Tel: 01784 443378
Slides and other exercises and handouts are available at http://personal.rhul.ac.uk/UQTE/004/EC5200
Data Types
- Cross section: different units (households, countries, states etc.) at the same point in time.
- Time series: the same unit over time (years, quarters, months etc.).
- Longitudinal/panel: the same observations at different points in time (at least 2).
- May involve merging with administrative data.
- The main feature of (prospective) longitudinal data is that it allows us to track change and development.
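As a minimal sketch of the three definitions above, assuming each observation is recorded as a (unit, period) pair, the data type can be read off from how units and periods combine. The function name is hypothetical, and the extra "repeated cross sections" label (different units in each period, as with annual surveys) is added only so that every combination gets a name.

```python
from collections import defaultdict


def classify_structure(records):
    """Label a list of (unit, period) pairs as a data type.

    Hypothetical helper: the labels follow the slide's definitions.
    """
    if not records:
        raise ValueError("no records supplied")
    periods_by_unit = defaultdict(set)
    for unit, period in records:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    if len(all_periods) == 1:
        return "cross section"            # different units, same point in time
    if len(periods_by_unit) == 1:
        return "time series"              # same unit over time
    if any(len(p) >= 2 for p in periods_by_unit.values()):
        return "panel"                    # some unit seen at 2+ points in time
    return "repeated cross sections"      # new units in each period


if __name__ == "__main__":
    print(classify_structure([("UK", 2000), ("FR", 2000)]))
    print(classify_structure([("UK", 2000), ("UK", 2001),
                              ("FR", 2000), ("FR", 2001)]))
```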
What is a Good Data Set?
- Depends on:
- The question you want to answer.
- The type of data: cross section, time series, panel data.
- Availability of sources.
- Econometric considerations: sample size (a minimum of about 40 observations for time series to test for stationarity; a minimum of 200 or so for cross sections).
- Availability of controls/regressors.
How do I get Data?
- Either:
- Electronically from source.
- Input it from available published data books.
- Courtesy of your advisor?! LUCKY YOU.
- Own survey: be very careful with this!
I will focus on electronic sources in this lecture.
Useful general sources of data
- See the RM slides and notes to start you off.
- The WWW is the best place to look.
- Best one-stop sites for links:
- Economics Learning and Teaching Support Network (LTSN)
- Am Stat Assoc Data-Links
- Bill Goffe's Resources for Economists on the Internet: provides good links to US data and many other things besides.
- Use Google.
- Replication data from the Journal of Applied Econometrics.
- Other link websites: JAE
Sources of data: MetaLib
- Library resources
- The MetaLib website gives good advice and guidance on accessing electronic databases. To use it you will need your library barcode number and your library PIN. Through this service you can get access to LOADS of data and downloads for free, e.g. via ATHENS.
- MetaLib demo on finding resources; also drop-in sessions and an information sheet.
Understanding data
- You will be expected to show an understanding of the limitations and weaknesses of the data.
- An excellent book on survey data use is by A.S. Deaton, with a development focus (especially the World Bank's own LSMS data).
- Always find out how your data has been constructed.
- See ONS for guides, themes, and the Virtual Bookshelf.
- Check the questionnaire and the question routing.
- READ THE DOCUMENTATION.
- But the best way to understand your data is to:
- Graph it: .histogram x
- Tab it: .tab x
- Crosstab it: .tab x y
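The `tab x` and `tab x y` commands above are Stata. For anyone exploring data outside Stata, a rough standard-library Python equivalent might look like this. The function names are hypothetical, and the results are plain dictionaries rather than Stata's formatted tables.

```python
from collections import Counter


def tab(values):
    """One-way table {value: (count, percent)}, in the spirit of `tab x`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: (n, 100.0 * n / total) for v, n in sorted(counts.items())}


def crosstab(xs, ys):
    """Two-way table of counts {x: {y: count}}, in the spirit of `tab x y`."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    counts = Counter(zip(xs, ys))
    cols = sorted(set(ys))
    # Counter returns 0 for combinations that never occur
    return {x: {y: counts[(x, y)] for y in cols} for x in sorted(set(xs))}


if __name__ == "__main__":
    sex = ["m", "f", "f", "m", "f"]
    employed = [1, 1, 0, 1, 1]
    for value, (n, pct) in tab(sex).items():
        print(value, n, f"{pct:.1f}%")
    print(crosstab(sex, employed))
```

Empty cells in the cross-tabulation are shown as zero, which makes odd combinations (often a sign of routing errors) easy to spot.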
UK data: look here first
- ONS
- Try their NAVIData facility for simple time series (see the guide), or visit the ESDS (Data Archive) for a bespoke service.
- Neighbourhood Statistics and lots more.
- Question Bank
- Surrey's search engine for survey datasets.
- Data Archive
- Has a great engine for searching its extensive archives.
- DA and MIMAS have now merged into ESDS.
- NESSTAR service
- Some datasets can be accessed directly online.
- Allows you to browse through the contents of the main surveys and download your own customised datasets in your chosen format.
- CASWEB service
- Census Area Statistics on the Web.
Registering on UKDA/ESDS
- You don't have to log in or register to browse.
- Athens usually works at ESDS: use the link that appears in the purple box on the right of every ESDS page.
- If yours doesn't work, see the Athens help on the ESDS website.
Using UKDA/ESDS
- You can find datasets using:
- Search catalogue
- Major studies
- Geographic focus
- Use Download/order to add the datasets to your shopping basket.
- At this point you HAVE to log in to Athens via MetaLib.
- Register a new use of data.
- Give a brief description of how you intend to use the data.
- Choose: download, request for download, or request on CD.
UK government sources
- Government departments; here are just a few:
- DWP (ASD)
- Lots of admin statistics, e.g. caseloads of JSA.
- Tax credits (and child benefit) have transferred to HMRC.
- FRS: Family Resources Survey
- FaCS: Family and Children Survey
- Dept for Education and Skills (Statistics)
- League tables, absences, free school meals, HEFCE, etc.
- Home Office (RDS)
- Motoring offences, firearm use, immigration, British Crime Survey.
- Digest 4
- One hundred years of crime.
US data: look here first
- Freedom of Information Act
- "Public use" datasets are available without restriction: just click.
- But make sure it's what you want, that you have somewhere to store it, and that you're not paying for the connection before you click!
- The US data archive is ICPSR.
- DAS: their equivalent to NESSTAR.
- Does data analysis (including quite sophisticated stuff) online.
- Fed Stats: gateway to US government statistics.
- US government agencies
- Usually have a www.xxxx.gov address where xxxx is something obvious, like FBI.
- RFE
US Census Bureau
- The Census Bureau is the best place to find links to US statistics.
- Survey datasets can be downloaded.
- Go for public use files, since these are freely available without registration procedures.
- FERRET service
- Click and 400 MB of survey data can be yours in 1 minute: choose the DOS/Windows compression option.
- It takes a great deal longer if you access it over a modem.
- Also allows you to construct a subset of a large dataset and download just the desired data.
EU data: look here first
- A clickable map of data archives around the EU and elsewhere can be found at the Data Archive.
- A number of EU-wide (and beyond) datasets can be found there too.
- ECB: their Monthly Bulletin data can be downloaded in old-fashioned .csv format that Excel and some statistics packages can read.
- EUROSTAT: the official EU statistics agency.
- Much improved.
- Check out the EU DGs for specific topics.
- Start with Europa.
Anywhere else: look here first
- A worldwide guide to data archives can be found in Norway.
- CESSDA (the Council of European Social Science Data Archives).
- Most foreign archive sites have (sometimes limited) English language versions.
- ESDS is on the case.
- Besides UK time series, it offers OECD Main Economic Indicators, Eurobarometer, European Social Survey, ISSP, and many, many more.
International agencies
- International agencies often have data in their own specialised areas.
- World Bank: LSMS surveys of poverty and more.
- IMF hosts the World Economic Outlook databases and much, much more.
- OECD has lots of cross-country data on education, health and several other subjects.
- UNESCO has a huge site.
- ILO specialises in labour issues.
- IZA: labour market data.
Warning
- Almost all data is available electronically.
- But putting together your own data from the original sample surveys can be very hard work.
- Selected STATA-format datasets for illustrative purposes can be accessed via your N drive.
- These are highly selected datasets that have only a subset of the variables and cases available, so they may not be directly suitable for your problem.
- Moreover, even if the data is nicely rectangular it will still need lots of understanding.
- Even simple time series data will require some understanding: rebasing, definition changes, etc.
- Work methodically:
- Keep log and command files.
- Keep backups.
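Rebasing is a common first step with index series. Here is a minimal sketch, assuming the series is held as a mapping from period to index level (a hypothetical layout). It only rescales to a new base period; definition changes still have to be handled using the documentation.

```python
def rebase(series, base_period):
    """Rescale an index so that its value in base_period equals 100.

    `series` is assumed to map period -> index level.
    """
    if base_period not in series:
        raise KeyError(f"base period {base_period!r} not in series")
    base = series[base_period]
    if base == 0:
        raise ValueError("cannot rebase on a zero value")
    return {period: 100.0 * value / base for period, value in series.items()}


if __name__ == "__main__":
    # Made-up index with 2001 = 100, rebased so that 2000 = 100
    old = {2000: 80.0, 2001: 100.0, 2002: 120.0}
    print(rebase(old, 2000))
```

Rebasing leaves period-on-period growth rates unchanged, so comparing growth rates before and after is a quick check that it has been done correctly. Keeping steps like this in a command file, rather than editing numbers by hand, is what makes them reproducible.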
Many time series available
- General UK series from ESDS and ONS
- Get Search NS information first.
- ONS
- See Contents of Time Series Data.
- Download NAVIDATA.
- Or download text files.
- Attend the hands-on training session.
- Monetary data from the Bank of England
- Online Statistical Interactive Database and links.
- OECD data from ESDS
- Get OECD MEI information first, then access it via ESDS's Beyond 20/20 interface.
- Attend the hands-on training session.
Many survey datasets
- We next examine some example data on themes: general, labour market, finance, travel, health, crime.
- Much of this data is freely available at the Data Archive and elsewhere; see QuestionBank at Surrey.
- WARNING: most survey datasets are large.
- Make sure you have space to store it; see me if you need large temporary disk space.
- Make sure you have the memory to load it into RAM for analysis.
- Many campus PCs have CD writers.
- Only XP PCs have USB support.
- Most are 256 MB or higher.
- It MAY be better to download a subset of the data.
- Think VERY CAREFULLY about what data you need before you act.
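A back-of-envelope estimate helps decide whether a dataset will fit in memory or whether a subset is needed. This minimal sketch assumes every value is stored as an 8-byte double. Real packages add overhead and may store some variables more compactly, so treat the result as a rough guide. The survey dimensions in the example are made up.

```python
def estimated_size_mb(n_rows, n_vars, bytes_per_value=8):
    """Rough in-memory size of a rectangular dataset in megabytes (1024**2 bytes).

    Assumes every value takes `bytes_per_value` bytes (8 = a double).
    """
    return n_rows * n_vars * bytes_per_value / 1024 ** 2


if __name__ == "__main__":
    # Hypothetical survey: 100,000 individuals with 500 variables
    print(f"all variables: {estimated_size_mb(100_000, 500):.0f} MB")
    # Keeping only the 50 variables actually needed
    print(f"50 variables:  {estimated_size_mb(100_000, 50):.0f} MB")
```

In this made-up case the full file needs roughly 380 MB, more than a 256 MB machine has. Keeping a tenth of the variables cuts that to under 40 MB.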
GHS: General Household Survey
- Annual (since 1971, except 96 and 99).
- Details of content at ONS and QB at Surrey.
- Many official uses.
- Around 10k households a year.
- Revamped in 2000.
- Income, labour market status, education and much more every year.
- Some questions rotate in alternate years: smoking, health, etc.
- Each year also features a specific topic; topics such as caring for the elderly, contraceptive practice, leisure activities and social capital feature only now and again.
- Annual report (Living in Britain) downloadable from ONS.
- See Appendix G to the Annual Reports (Library or online) for the variables list.
- The NI version is called CHS (it includes religion as well as the usual GHS questions).
FES: Family Expenditure Surveys
- Annual (since the early 60s) but most easily obtainable (and mostly comparable) since 1978.
- Details at ONS; the Data Archive keeps the data.
- Around 7k households a year.
- Data is UK. Covers incomes, labour market status, and extremely detailed expenditure patterns.
- Internet shopping, use of credit cards.
- Very stable over time (but take care with the definitions of a few variables like marital status).
- Reinvented as the Expenditure and Food Survey in 2001.
- FES annual reports (the most recent have been called Family Spending in Britain).
- There is a Northern Ireland FES (called NIFES) from which ONS draws a small subset to add to the GB FES to make it UK-wide.
FRS: Family Resources Survey
- Annual since 1993.
- Managed by DWP. Details at QB at Surrey.
- Around 25k households a year.
- Data is UK since 2002/03, GB before that.
- Covers very detailed incomes and labour market status.
- Wealth and savings information OK.
- Also covers childcare.
- Used by DWP/HMT for tax/welfare modelling.
- See also IFS for the history of the tax/welfare system.
- Annual reports and HBAI at the DWP website.
FACS: Families and Children Survey
- Maintained by DWP.
- Originally the WFTC evaluation dataset: before-and-after labour supply etc.
- All lone parents plus poor couples.
- Continued since WFTC's demise; now includes non-poor couples too.
- Panel of about 8000 families; now 5 waves available.
- Lots of work, wage and family background info, plus deprivation, childcare, attitudes and awareness.
LFS: Labour Force Surveys
- Managed by ONS. See QB.
- Every 2 years from 1984, annual since 1991, and a rotating 5-wave quarterly panel design from 1993.
- Earnings collected in wave 1 from 1993.
- Earnings collected in waves 1 and 5 from 1997.
- Data is GB. Also contains education (even including type of degree) and employment information.
- Enormous size, so good for looking at minority groups like economics graduates or black women, compared to other groups like sociology students and white women.
- See ESDS for more details.
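In a rotating 5-wave quarterly panel, each household is interviewed in five successive quarters, so its wave 1 and wave 5 interviews are exactly one year apart. Here is a minimal sketch of the wave arithmetic. It assumes quarters are numbered consecutively; the `quarter_index` encoding and both function names are hypothetical.

```python
def quarter_index(year, quarter):
    """Hypothetical encoding: number quarters consecutively across years."""
    if not 1 <= quarter <= 4:
        raise ValueError("quarter must be between 1 and 4")
    return year * 4 + (quarter - 1)


def lfs_wave(entry, current):
    """Wave (1-5) of a household first interviewed in quarter `entry`.

    Returns None if the household is not in the sample in quarter `current`.
    """
    wave = current - entry + 1
    return wave if 1 <= wave <= 5 else None


if __name__ == "__main__":
    entry = quarter_index(1997, 2)
    for year, q in [(1997, 2), (1997, 4), (1998, 2), (1998, 3)]:
        print(year, q, lfs_wave(entry, quarter_index(year, q)))
```

Because waves 1 and 5 fall in the same calendar quarter a year apart, earnings recorded in both waves give a year-on-year change for the same person.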
BSA: British Social Attitudes
- Small but long-running annual survey of people's values.
- Brief details at ONS and QB.
- Core set of questions on income (in bands), education etc.
- Vast variety of attitudinal questions that differ across years, e.g. attitudes to public good provision.
- BSA is the UK contribution to the ISSP (a subset is on your N drive); similar data is available for many other countries.
- Possible to order all the data for many years from ESDS.
BHPS: British Household Panel Survey
- Panel data on 5000 households followed over time (13 years).
- Allows control for endogeneity arising from unobservable fixed effects.
- Extensive information about economic variables.
- Lots of social background information.
- Recent booster samples for Scotland and Wales.
- Details at QB.
- Maintained by ISER at Essex.
- Extensive details of work already carried out using the data.
- 70 MB of docs!
- Data available at the Data Archive.
- Also bundled by Cornell into the CNEF (Cross-National Equivalent File) with GSOEP and the US PSID.
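Panel data allow control for unobservable fixed effects via the within transformation: demean each variable within each individual, then run OLS on the demeaned data. Here is a minimal single-regressor sketch using the standard library only. In Stata the usual route is `xtreg y x, fe`. The function name and the data below are made up for illustration.

```python
from collections import defaultdict


def within_slope(ids, x, y):
    """Fixed-effects (within) slope of y on a single regressor x.

    Demeaning x and y within each individual removes any time-invariant
    individual effect; OLS on the demeaned data then gives the slope.
    """
    if not (len(ids) == len(x) == len(y)):
        raise ValueError("ids, x and y must have the same length")
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum of x, sum of y, count
    for i, xi, yi in zip(ids, x, y):
        t = totals[i]
        t[0] += xi
        t[1] += yi
        t[2] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        sx, sy, n = totals[i]
        dx, dy = xi - sx / n, yi - sy / n
        num += dx * dy
        den += dx * dx
    if den == 0:
        raise ValueError("x does not vary within individuals")
    return num / den


if __name__ == "__main__":
    # Made-up data: y = 2*x + individual effect, with the effect correlated with x
    ids = [1, 1, 1, 2, 2, 2]
    x = [1, 2, 3, 10, 11, 12]
    y = [7, 9, 11, -20, -18, -16]
    print(within_slope(ids, x, y))
```

On these made-up numbers the within estimate recovers the slope of 2. A pooled regression that ignores the individual effects gives a negative slope (about -2.84), which is the endogeneity problem the panel helps to solve.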
NCDS: National Child Development Study; BCS70: British Cohort Study 1970
- Longitudinal studies that follow people over their lives.
- 17k children born in a particular week in 1958 (in 1970 for the BCS cohort).
- Interviewed at 7, 11, 16, 23, 33 (29 for BCS) and 42 (not yet for BCS).
- Parents and teachers interviewed.
- Followed up NCDS children's own children.
- Enormous detail on family background, education (including special test scores), health and the labour market. See QB-ncds and QB-bcs.
- CLS at the Institute of Education have docs and refs, and an interactive manual or PDFs you can download.
- Combined NCDS and BCS data is available at ESDS.
NCDS Follow-ups
BCS70 Follow-ups
US CPS: Current Population Survey
- An LFS lookalike: employment, unemployment, earnings, hours of work, age, sex, race, marital status, and educational attainment.
- But monthly, and goes back to the 1950s.
- Much bigger: around 50k households per month.
- Has a range of extra variables added: child support, household income, previous work experience, health, employee benefits, and work schedules.
- Available from BLS.
- Construct your own queries.
- Get the lot by FTP or use Ferret for online selection.
- Use LABSTAT online for time series formed from the CPS.
- CPS also available from NBER, with docs and programmes for merging etc.
- And for easy-peasy CPS: go for the MORGs (Merged Outgoing Rotation Groups) on CD for 115 from NBER: 25k households per month for 25 years in STATA format.
LSMS: Living Standards Measurement Study
- World Bank poverty data.
- Many countries, e.g. Côte d'Ivoire, Ghana, Ecuador, Peru.
- Many years.
- Look very closely at the documentation.
- Fill in the online Data Agreement Form.
- Wait for an email with links and a password.
- Click on the links to download the data.
- Unzip.
- Analyse.
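The unzip step can be scripted so that it is recorded alongside your other command files. Here is a minimal sketch using Python's standard `zipfile` module; the function name and any file or folder names are hypothetical. `extractall` trusts the paths stored inside the archive, so use it only on archives from a trusted source such as the download link you were sent.

```python
import zipfile
from pathlib import Path


def unzip_download(zip_path, dest_dir):
    """Extract a downloaded archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
        zf.extractall(dest)
    return names


# Example call (hypothetical names; use whatever the download produced):
#     unzip_download("lsms_download.zip", "lsms_data")
```

Returning the member names gives a quick inventory of the files to check against the documentation before analysis starts.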
Data from Books: Examples
- Anthony King, British Political Opinion 1938-2000.
- Layard, Nickell and Jackman, Unemployment, has an appendix of all their data.
- Reitlinger, The Economics of Taste: art prices from 1750.
- OECD, Education at a Glance.
- World Bank, World Development Report.
- UNDP, Human Development Report.
Data format
- Many programmes can import non-native formats: xls, csv, ASCII (text in free format), SAS, SPSS.
- STAT-TRANSFER
- Converts (nearly) anything to anything else, including PC Give and Excel, but not EViews!
- Delivered application on the university network.
- Free demo version available (drops 1 in every 16 obs).
- Student price: 49 for a download.
- Programmes usually expect datasets to be rectangular.
- May have to use merge facilities for complex data, e.g. in STATA.
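A typical merge is many-to-one: household variables are copied onto every person in that household, so that the result is one rectangular person-level file. Here is a minimal standard-library sketch that also reads .csv text. It assumes both files share a household identifier, here called `hhid`, which is a hypothetical name. Stata's own merge command does the same job on real files.

```python
import csv
import io


def merge_on_key(people, households, key="hhid"):
    """Many-to-one merge: copy household variables onto each person record.

    Persons with no matching household get None for the household
    variables; household variables overwrite same-named person variables.
    """
    by_id = {}
    for h in households:
        if h[key] in by_id:
            raise ValueError(f"duplicate {key} {h[key]!r} in household file")
        by_id[h[key]] = h
    hh_vars = [v for v in (households[0] if households else {}) if v != key]
    merged = []
    for p in people:
        h = by_id.get(p[key], {})
        row = dict(p)
        row.update({v: h.get(v) for v in hh_vars})
        merged.append(row)
    return merged


if __name__ == "__main__":
    # Made-up person and household files; household C is missing on purpose
    people_csv = "pid,hhid,age\n1,A,34\n2,A,5\n3,C,61\n"
    households_csv = "hhid,region\nA,London\nB,Wales\n"
    people = list(csv.DictReader(io.StringIO(people_csv)))
    households = list(csv.DictReader(io.StringIO(households_csv)))
    for row in merge_on_key(people, households):
        print(row)
```

Checking for duplicate household identifiers and for persons with no matching household is the point of the exercise: both usually signal a misunderstanding of how the files were constructed.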