Title: Estimating
1Estimating multivariate skewed distributions
via Maximum Likelihood in R
2Contents
- Two issues to be presented
- Maximum Likelihood Estimation in R (vs. Stata)
- Skewed (multivariate) Normal and t distributions
3R
First, R
4R
R is a system for statistical computation and
graphics. It consists of a language plus a
run-time environment with graphics, a debugger,
access to certain system functions, and the
ability to run programs stored in script files.
- Based on S and SCHEME
- Fairly easily ported from C, C++, FORTRAN
- Allows branching, looping, modular programming
- Base gives functionality for linear and
generalized linear models, nonlinear regression
models, time series analysis, classical
parametric and nonparametric tests, clustering
and smoothing
- Add-ons do just about everything else
5R
- Distributed as a GNU project (open source)
- Made up of base and contributed packages
- Easily downloaded, installed and updated
- Go to http://cran.r-project.org/ or http://www.r-project.org/
6Statistical Problem
Thesis flashback!
7Statistical Problem
- Consider utility $U = U(U_M, U_H, U_{l1}, \ldots, U_{l6})$
- M, H and l1,..,l6 are as before
- Utility U(.) is measured by the inverse of the
probability of being in each state
- Then we have a system of 8 linear equations with
continuous dependent variables in each
Consider as explanatory variables age, gender and
social class.
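As a rough illustration of such a system in R, the sketch below fits two of the eight equations at once with base R's multivariate `lm`. The data are simulated and purely hypothetical; only the variable names (`invdeath`, `invsah`, `age`, `male`, `sc3`, `sc45`) are borrowed from the slides, and `sc3`/`sc45` are assumed to be social-class dummies. Because every equation has the same regressors, the GLS SUR point estimates coincide with equation-by-equation OLS, which is what `lm` with a matrix response returns.

```r
# Hypothetical simulated data; variable names mirror the slides, values are made up.
set.seed(1)
n <- 500
d <- data.frame(
  age  = round(runif(n, 20, 80)),
  male = rbinom(n, 1, 0.5),
  sc   = sample(c("sc12", "sc3", "sc45"), n, replace = TRUE)
)
d$sc3  <- as.numeric(d$sc == "sc3")   # assumed social-class dummy
d$sc45 <- as.numeric(d$sc == "sc45")  # assumed social-class dummy

# Two correlated error terms (correlation 0.5) standing in for the eight in the talk.
e <- matrix(rnorm(2 * n), n, 2) %*% chol(matrix(c(1, 0.5, 0.5, 1), 2, 2))
d$invdeath <- 1 + 0.02 * d$age - 0.3 * d$male + 0.2 * d$sc3 + 0.4 * d$sc45 + e[, 1]
d$invsah   <- 2 + 0.01 * d$age + 0.1 * d$male - 0.1 * d$sc3 - 0.3 * d$sc45 + e[, 2]

# Multivariate linear model: one column of coefficients per equation.
fit <- lm(cbind(invdeath, invsah) ~ age + male + sc3 + sc45, data = d)
print(coef(fit))

# Cross-equation residual correlation, the quantity a SUR estimator exploits.
print(cor(residuals(fit)))
```

Efficiency gains from SUR over separate regressions only appear when the regressors differ across equations or cross-equation restrictions are imposed.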
8Stata
Were we to estimate this in Stata:
sureg (invdeath age male sc3 sc45) ///
      (invsah   age male sc3 sc45) ///
      (invbreak age male sc3 sc45) ///
      (invnobe  age male sc3 sc45) ///
      (invnsmok age male sc3 sc45) ///
      (invsleep age male sc3 sc45) ///
      (invalq   age male sc3 sc45) ///
      (invexce  age male sc3 sc45)
9Stata
Stata's sureg command fits a seemingly unrelated
regression model by GLS. Using the option
isure we can tell Stata to iterate beyond the
two-step estimates until convergence, which gives
maximum likelihood estimates:
sureg (invdeath age male sc3 sc45) ///
      (invsah   age male sc3 sc45) ///
      (invbreak age male sc3 sc45) ///
      (invnobe  age male sc3 sc45) ///
      (invnsmok age male sc3 sc45) ///
      (invsleep age male sc3 sc45) ///
      (invalq   age male sc3 sc45) ///
      (invexce  age male sc3 sc45), isure
10Stata → R
Why would we want to switch to R, then?
- Because R contains several maximum likelihood
packages of interest
  - Multivariate normal
  - Multivariate t
  - Multivariate skewed normal and t
11Multivariate skewed distributions
12Multivariate skewed distributions
From http://tango.stat.unipd.it/SN/
The skewed normal then is
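For reference, the standard skew-normal densities in Azzalini's parameterisation are sketched below, on the assumption that this is the form the slide displayed. Here $\phi$ and $\Phi$ are the standard normal density and distribution function, $\phi_d(\cdot\,;\Omega)$ is the $d$-variate normal density with covariance matrix $\Omega$, $\xi$ is the location vector, and $\alpha$ is the slant (skewness) parameter; the symbols are the conventional ones, not taken from the slides.

```latex
% Univariate skew-normal, standardised form
f(z;\alpha) = 2\,\phi(z)\,\Phi(\alpha z), \qquad z \in \mathbb{R}

% Multivariate skew-normal (Azzalini & Dalla Valle 1996; Azzalini & Capitanio 1999)
f(y) = 2\,\phi_d(y-\xi;\,\Omega)\,\Phi\!\left(\alpha^{\top}\omega^{-1}(y-\xi)\right),
\qquad \omega = \operatorname{diag}(\Omega_{11},\ldots,\Omega_{dd})^{1/2}
```

Setting $\alpha = 0$ recovers the ordinary (multivariate) normal density.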
13Multivariate skewed distributions
The skewed t is such that
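A sketch of the multivariate skew-t density in the form given by Azzalini & Capitanio (2003) follows, again on the assumption that this matches the slide. $t_d$ denotes the $d$-variate Student t density with location $\xi$, scale matrix $\Omega$ and $\nu$ degrees of freedom, and $T_1(\cdot\,;\nu+d)$ is the univariate t distribution function with $\nu+d$ degrees of freedom; the notation is the conventional one rather than the slides' own.

```latex
f(y) = 2\,t_d(y;\,\xi,\Omega,\nu)\;
       T_1\!\left(\alpha^{\top}z\,\sqrt{\frac{\nu+d}{\nu+Q_y}}\,;\;\nu+d\right),
\qquad
z = \omega^{-1}(y-\xi),\quad
Q_y = (y-\xi)^{\top}\Omega^{-1}(y-\xi),\quad
\omega = \operatorname{diag}(\Omega_{11},\ldots,\Omega_{dd})^{1/2}
```

With $\alpha = 0$ this is the symmetric multivariate t, and as $\nu \to \infty$ it tends to the multivariate skew-normal.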
14R
And R can do this?
AZZALINI, A. (1985). A class of distributions which includes the normal ones. Scand. J. Statist. 12, 171-178.
AZZALINI, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica XLVI, 199-208.
AZZALINI, A. (2001). A note on regions of given probability of the skew-normal distribution. Metron LIX, 27-34.
AZZALINI, A. (2005). The skew-normal distribution and related multivariate families (with discussion). Scand. J. Statist. 32, 159-188 (C/R 189-200).
AZZALINI, A. & CAPITANIO, A. (1999). Statistical applications of the multivariate skew normal distributions. J. R. Stat. Soc., ser. B 61, 579-602.
AZZALINI, A. & CAPITANIO, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. R. Stat. Soc., ser. B 65, 367-389.
AZZALINI, A. & CHIOGNA, M. (2004). Some results on the stress-strength model for skew-normal variates. Metron LXII, 315-326.
AZZALINI, A., DAL CAPPELLO, T. & KOTZ, S. (2003). Log-skew-normal and log-skew-t distributions as model for family income data. Journal of Income Distribution 11, 12-20.
AZZALINI, A. & DALLA VALLE, A. (1996). The multivariate skew-normal distribution. Biometrika 83, 715-726.
Yes, it can! Thanks to a package contributed by
(none other than) Azzalini.
Azzalini's contributed package sn can be
installed from the R-project page (CRAN), loaded
and run.
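A minimal sketch of installing, loading and running sn is shown below. It assumes a recent version of the package, in which `selm()` is the model-fitting function (the interface available at the time of the talk may have differed), and it uses simulated, hypothetical data rather than the data behind the slides. The block is guarded so that it does nothing if sn is not installed.

```r
# install.packages("sn")   # one-off install from CRAN

if (requireNamespace("sn", quietly = TRUE)) {
  library(sn)
  set.seed(2)
  n <- 400
  # Hypothetical data: one regressor, skew-normal errors with slant alpha = 4.
  d <- data.frame(age = runif(n, 20, 80))
  d$y <- 1 + 0.02 * d$age + rsn(n, xi = 0, omega = 1, alpha = 4)

  # Skew-normal (SN) and skew-t (ST) regressions fitted by maximum likelihood.
  fit_sn <- selm(y ~ age, family = "SN", data = d)
  fit_st <- selm(y ~ age, family = "ST", data = d)

  # "Direct parameters": regression coefficients, scale, slant,
  # plus the degrees of freedom for the ST family.
  print(coef(fit_sn, param.type = "DP"))
  print(coef(fit_st, param.type = "DP"))
}
```

For the multivariate families, `selm()` is documented as accepting a matrix response, for example `cbind(y1, y2) ~ age`, where `y1` and `y2` are placeholder names.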
15Comparing skew/non-skew
Information criteria, for example, could be used
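As an illustration of the idea, the base-R sketch below fits a normal and a skew-normal model to the same sample by direct maximum likelihood with `optim` and compares AIC and BIC. The sample is simulated and hypothetical, and the hand-written likelihoods are a stand-in for whatever the fitted models in the talk actually were.

```r
set.seed(3)
n <- 500
# Hypothetical skewed sample: delta * |z0| + sqrt(1 - delta^2) * z1 is skew-normal(alpha).
alpha <- 3
delta <- alpha / sqrt(1 + alpha^2)
y <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)

# Negative log-likelihoods; the scale is optimised on the log scale to keep it positive.
nll_norm <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
nll_sn <- function(p) {
  z <- (y - p[1]) / exp(p[2])
  -sum(log(2) - p[2] + dnorm(z, log = TRUE) + pnorm(p[3] * z, log.p = TRUE))
}

fit_norm <- optim(c(mean(y), log(sd(y))), nll_norm, method = "BFGS")
# Start the slant away from 0: alpha = 0 is a stationary point of the likelihood.
fit_sn <- optim(c(mean(y), log(sd(y)), 1), nll_sn, method = "BFGS")

# AIC = 2k + 2 * nll, BIC = k * log(n) + 2 * nll; smaller is better.
ic <- function(fit, k) c(AIC = 2 * k + 2 * fit$value, BIC = k * log(n) + 2 * fit$value)
print(rbind(normal = ic(fit_norm, 2), skew_normal = ic(fit_sn, 3)))
```

The extra slant parameter is penalised by both criteria, so the skewed model is preferred only when it improves the log-likelihood enough to pay for it.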
16Comparing skew/non-skew
Or the estimates of the skewness parameter (alpha) and the degrees of freedom (df)
17Comparing skew/non-skew
Replication is a better method, though
18Comparing skew/non-skew
These seem rather close
19Comparing skew/non-skew
Differencing correlation matrices
20Comparing skew/non-skew
Comparing distributions (mortality)
21Parametric results
What does it all mean?
Coefficients (breakfast)
t-statistics
t-stats are the same for SUREG
22Post-estimation
R is essentially a dialect of S. Like Gauss or Ox,
it isn't as user-friendly as Stata.
Also, Stata is more centrally written, and a lot
of its options are the same for most functions.
Result: Stata has a lot more post-estimation
options for the researcher than R does.
23Porting
We basically exist in a Stata environment,
recognising its relative disadvantages.
However, we do use other programmes when required.
We should consider porting those to Stata, as
much/often as we are able.