Title: Gereltuya Altankhuyag, Lecturer/Statistician, UNSIAP
1 STATA: Third group training course in application
of information and communication technology to
production and dissemination of official
statistics, 10 May to 11 July 2007
Gereltuya Altankhuyag, Lecturer/Statistician,
UNSIAP gereltuya_at_unsiap.or.jp
2 Getting Started
- There are three ways of executing commands:
- Using the menu bar
- Using a dialog box (db)
- Using syntax (typed commands)
- It is preferable to use syntax
3 Getting Started: dialog box
- db is the command-line way to launch the
dialog box for a Stata command.
- Syntax
- db commandname
- For instance: db summarize
4 Getting Started: dialog box
5 Basic commands to inspect datasets
- The following commands are used to inspect
datasets:
- codebook
- count
- describe
- list
- summarize
- table
- tabstat
6 Basic commands to inspect datasets
- codebook
- It examines
- the variable names,
- labels, and
- data to produce a codebook describing the dataset
- It distinguishes and reports the standard missing
values
- Syntax
- codebook [varlist] [if] [in] [, options]
- Examples: codebook
- codebook region
7 Basic commands to inspect datasets
- options
- all - provides a complete report, excluding only mv
(equivalent to specifying header and notes)
- header - adds a header to the top of the output
(dataset name, date, etc.)
- notes - lists any notes attached to the variables
- mv - determines the pattern of missing values
- Examples: codebook region hhlandd famsize, all
- codebook region hhlandd famsize, header
- codebook region hhlandd famsize, notes
- codebook region hhlandd famsize, mv
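The codebook options above can be tried on any dataset. A minimal sketch, assuming Stata's bundled auto dataset as a stand-in for the course data (the variables mpg and rep78 belong to auto, not to the course dataset):

```stata
* Stand-in data: the auto dataset shipped with Stata (an assumption;
* the course dataset with region, hhlandd and famsize is not used here).
sysuse auto, clear

* Default report for two variables
codebook mpg rep78

* Add the dataset header and any notes attached to the variables
codebook mpg rep78, header notes

* Search the data for the pattern of missing values
* (rep78 contains some missing values)
codebook rep78, mv
```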
8 Basic commands to inspect datasets
- count
- It counts the number of observations that satisfy
the specified conditions. If no conditions are
specified, count displays the number of
observations in the data.
- Syntax
- count [if] [in]
- For instance: count
- count if famsize > 5
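A short sketch of count with and without conditions, assuming the bundled auto dataset as a stand-in for the course data (mpg is a variable of auto):

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Number of observations in the data
count

* Observations that satisfy a condition
count if mpg > 25

* Same condition, restricted to the first 20 observations
count if mpg > 25 in 1/20

* count leaves its result in r(N), which later commands can use
display "matching observations: " r(N)
```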
9 Basic commands to inspect datasets
- describe
- It produces a summary of the dataset
- in memory, or
- of the data stored in a Stata-format dataset
- Syntax
- Data in memory
- describe [varlist] [, memory_options]
- Data in file
- describe [varlist] using filename [, file_options]
- Examples: des
- des region famsize toilet
10 Basic commands to inspect datasets
- options
- simple - display only variable names
- short - display only general information
- detail - display additional details
- fullnames - do not abbreviate variable names
- numbers - display variable number along with name
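Both forms of describe (data in memory and data in a file) can be sketched as follows. The auto dataset and the temporary file are assumptions made for illustration. Run the lines together as a do-file so that the temporary file name stays defined.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Data in memory: all variables, then selected variables with options
describe
describe make price mpg, fullnames numbers

* General information only, and variable names only
describe, short
describe, simple

* Data in a file: save a temporary copy, then describe it from disk
tempfile copy
save "`copy'"
describe using "`copy'"
```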
11 Basic commands to inspect datasets
- list
- It displays the values of variables
- Syntax
- list
- list [varlist] [if] [in] [, options]
- Examples: list
- list region famsize toilet
- list region famsize toilet in 1/15
- list region if famsize > 5 in 1/15
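The same patterns of list (varlist, in range, if condition) can be sketched on the bundled auto dataset, used here as an assumed stand-in for the course data:

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Selected variables for the first 10 observations
list make price mpg in 1/10

* Combine a condition with an observation range
list make mpg if mpg > 25 in 1/30

* Compact output without observation numbers
list make price in 1/5, clean noobs
```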
12 Basic commands to inspect datasets
- summarize
- It calculates and displays a variety of summary
statistics. If no varlist is specified, summary
statistics are calculated for all the variables
in the dataset.
- Syntax
- summarize
- summarize [varlist] [if] [in] [weight] [, options]
- Examples: sum
- sum in 1/15
- sum region famsize toilet
- sum region famsize toilet [aw=weight]
13 Basic commands to inspect datasets
- options
- detail - produces additional statistics
including skewness, kurtosis, the four smallest
and four largest values, and various percentiles.
- meanonly - which is allowed only when detail is
not specified, suppresses the display of results
and calculation of the variance.
- format - requests that the summary statistics
be displayed using the display formats associated
with the variables.
- separator() - specifies how often to insert
separation lines into the output. The default is
separator(5), meaning that a line is drawn after
every 5 variables. separator(10) would draw a
line after every 10 variables. separator(0)
suppresses the separation line.
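A minimal sketch of the summarize options described above, assuming the bundled auto dataset as stand-in data. The analytic weight in the last command is purely illustrative.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Basic summary statistics for selected variables
summarize price mpg weight

* Additional statistics: skewness, kurtosis, percentiles, extremes
summarize price, detail

* Use the variables' display formats; separation line every 2 variables
summarize price mpg weight, format separator(2)

* meanonly displays nothing but leaves the mean in r(mean)
summarize mpg, meanonly
display "mean mpg = " r(mean)

* Analytic weights (illustrative choice of weighting variable)
summarize mpg [aweight=weight]
```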
14 Basic commands to inspect datasets
- NOTE
- Commands and output are shown in the Results window.
- When the --more-- message is shown,
press GO to continue the display
or the X button to stop the display
15 Basic commands to inspect datasets
- NOTE
- We may specify a range of variables in a variable
list with a hyphen (first-last)
- des region-toilet
- sum region-hhlandd
- list thana-famsize
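A range written as first-last covers every variable from first to last in dataset order. A small sketch, assuming the bundled auto dataset, whose variables start make, price, mpg, rep78, headroom, trunk, weight, length:

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* price-rep78 expands to price mpg rep78
describe price-rep78

* headroom-length expands to headroom trunk weight length
summarize headroom-length

* make-mpg expands to make price mpg
list make-mpg in 1/5
```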
16 Basic commands to inspect datasets
- NOTE
- We may use the menus
- for DESCRIBE
- Data > Describe Data > Describe Variables in Memory
- for SUMMARIZE
- Statistics > Summaries, Tables & Tests > Summary Statistics > Summary Statistics
- Data > Describe Data > Summary Statistics
17 Basic commands to inspect datasets
- There are 5 types of table command:
- table
- tabstat
- tabulate one-way
- tabulate two-way
- tabulate summarize
18 Basic commands to inspect datasets
- table
- It calculates and displays tables of statistics.
- Syntax
- table rowvar [colvar [supercolvar]] [if] [in]
[weight] [, options]
- Main options
- contents(clist) - specifies the contents of the table's
cells; up to 5 statistics may be selected
- by(superrowvarlist) - superrow variables, up to
4 variables.
19 Basic commands to inspect datasets
- Examples
- table region, c(mean famsize median hhlandd)
- table region, by(sexhead) c(mean famsize median
hhlandd)
20 Basic commands to inspect datasets
- tabstat
- It displays a table of summary statistics
- Syntax
- tabstat varlist [if] [in] [weight] [, options]
- Main options
- by(varname) - group statistics by variable
- statistics(statname ...) - report specified
statistics
21 Basic commands to inspect datasets
- Examples
- tabstat region, stats(mean range)
- tabstat region, by(sexhead) stat(min mean max)
col(stat)
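The two main tabstat options can be sketched as follows, assuming the bundled auto dataset as stand-in data (foreign plays the role of the grouping variable):

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Chosen statistics for two variables
tabstat price mpg, statistics(mean range)

* Statistics by group, with the statistics laid out as columns
tabstat mpg, by(foreign) statistics(min mean max) columns(statistics)
```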
22 Basic commands to inspect datasets
- tabulate one-way (tab1)
- Syntax
- tabulate varname [if] [in] [weight] [, options]
- It produces one-way tables of frequency counts.
- tab1 varlist [if] [in] [weight] [, tab1_options]
- It produces a one-way tabulation for each
variable specified in varlist.
23 Basic commands to inspect datasets
- Examples
- tabulate toilet
- tabulate region
- tabulate hhelec
- tabulate sexhead
- tab1 region toilet hhelec sexhead
- Note: please see the differences!
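A brief sketch contrasting tabulate (one variable per command) with tab1 (one table per listed variable), assuming the bundled auto dataset as stand-in data:

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* One-way frequency table for a single variable
tabulate foreign

* Include missing values as a category of their own
tabulate rep78, missing

* tab1 produces a separate one-way table for each variable listed
tab1 foreign rep78
```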
24 Basic commands to inspect datasets
- tabulate two-way (tab2)
- It produces two-way tables of frequencies
- Syntax
- tabulate varname1 varname2 [if] [in] [weight]
[, options]
- It produces two-way tables of frequency counts,
along with various measures of association,
including the common Pearson's chi-squared, the
likelihood-ratio chi-squared, Cramér's V,
Fisher's exact test, Goodman and Kruskal's gamma, etc.
25 Basic commands to inspect datasets
- tab2 varlist [if] [in] [weight] [, options]
- It produces all possible two-way tabulations of
the variables specified in varlist.
- Examples
- tabulate region toilet, row
- tabulate region sexhead, row col chi2
- tabulate region toilet, all exact
- tab2 region sexhead toilet
- tab2 region sexhead toilet, all exact
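Two-way tabulations with percentages and association measures can be sketched in the same way. The auto dataset and its variables rep78 and foreign are assumed stand-ins for the course data.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Two-way table with row percentages
tabulate rep78 foreign, row

* Row and column percentages plus Pearson's chi-squared
tabulate rep78 foreign, row column chi2

* All measures of association, plus Fisher's exact test
tabulate rep78 foreign, all exact

* tab2: every pairwise two-way table among the listed variables
tab2 foreign rep78, chi2
```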
26 Basic commands to inspect datasets
- tabulate summarize
- It produces one- and two-way tables (breakdowns)
of means and standard deviations.
- Syntax
- tabulate varname1 [varname2] [if] [in] [weight]
, summarize(varname3)
27 Basic commands to inspect datasets
- Examples
- One-way tables
- tabulate region, summarize(hhlandd)
- tabulate region [aweight=weight], summarize(toilet)
- Two-way tables
- tabulate region sexhead, summarize(hhlandd)
- tabulate region sexhead [aweight=weight],
summarize(hhlandd)
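A minimal sketch of one- and two-way breakdowns with summarize(), assuming the bundled auto dataset as stand-in data. The weighting variable is an illustrative choice only.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* One-way breakdown: mean and standard deviation of mpg by foreign
tabulate foreign, summarize(mpg)

* One-way breakdown with analytic weights
tabulate foreign [aweight=weight], summarize(mpg)

* Two-way breakdown
tabulate foreign rep78, summarize(mpg)

* Two-way breakdown showing means only
tabulate foreign rep78, summarize(mpg) means
```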
28 Basic commands to create and change variables,
labels etc.
- generate
- It creates a new variable. The values of the
variable are specified by = exp.
- Syntax
- generate [type] newvar[:lblname] = exp [if] [in]
- Examples
- gen agehead2 = agehead*agehead
- gen agehead3 = agehead*agehead if sexhead == 1
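The generate patterns can be sketched on stand-in data. The auto dataset and the new variable names (mpg2, mpg2_foreign, highmpg) are assumptions made for illustration.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* New variable computed from an expression
generate mpg2 = mpg*mpg

* With an if condition, observations that fail it are set to missing
generate mpg2_foreign = mpg*mpg if foreign == 1
count if missing(mpg2_foreign)

* An explicit storage type: byte indicator, 1 when mpg exceeds 25
generate byte highmpg = mpg > 25
```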
29 Basic commands to create and change variables,
labels etc.
- replace
- It changes the contents of an existing variable.
Because replace alters data, the command cannot
be abbreviated.
- Syntax
- replace oldvar = exp [if] [in] [, nopromote]
- Examples
- replace agehead3 = 0 if region == 2
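A short sketch of replace acting on a variable created first with generate. The auto dataset and the variable name mpg2 are assumptions made for illustration.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Create a variable, then change part of its contents
generate mpg2 = mpg*mpg
replace mpg2 = 0 if foreign == 1

* replace also accepts an observation range; here set to missing
replace mpg2 = . in 1/5

* nopromote keeps the current storage type of the variable
replace mpg2 = 1 if foreign == 0, nopromote
```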
30 Basic commands to create and change variables,
labels etc.
- egen
- It creates newvar of the optionally specified
storage type equal to fcn(arguments). Here fcn()
is a function specifically written for egen.
- Syntax
- egen [type] newvar = fcn(arguments) [if] [in]
[, options]
31 Basic commands to create and change variables,
labels etc.
- Examples
- egen age4 = mean(agehead)
- egen test = median(weight-d_bank)
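A minimal sketch of a few egen functions, assuming the bundled auto dataset as stand-in data. The new variable names are hypothetical.

```stata
* Stand-in data (an assumption): Stata's bundled auto dataset
sysuse auto, clear

* Overall mean and median, repeated on every observation
egen mean_mpg = mean(mpg)
egen med_price = median(price)

* Group means: the mean of mpg within each category of foreign
bysort foreign: egen mean_mpg_grp = mean(mpg)

* Row-wise function: mean across variables within each observation
egen row_avg = rowmean(headroom trunk)

list make mpg mean_mpg mean_mpg_grp in 1/5
```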
32 Please perform EXERCISE 2