Title: A Report on the NSF ICORS 2002 Computational Workshop
- Arnold Stromberg
- Department of Statistics
- With support from the National Science Foundation
ICORS 2003, Antwerp, Belgium, July 18, 2003
Background
- The NSF-sponsored International Workshop on
Computational Methods for Robust Statistics was
held immediately following ICORS 2002 in
Vancouver, BC, Canada, from May 18 to May 20, 2002.
Invited Presentations
- Doug Martin of Insightful, Inc., the makers of
Splus, discussed the need for emphasis on
computation of robust statistical techniques.
- Colin Chen of SAS, Inc. gave a demonstration of
Proc Robustreg, which he wrote to do robust
regression. It will be available with SAS,
Version 9 in Fall 2003.
Comments: Software Availability
- SAS users, primarily applied statisticians, can
now do robust regression, thus applying what they
learned in graduate school.
- Nonstatisticians, who will never use SAS, are
starting to use R, and by extension, Splus.
- We must write code in R. Good methods will be
incorporated by Splus, and eventually SAS.
Research Directions
- Literally hundreds of statistical procedures are
in need of robustification. By providing
computational tools, especially in R,
statisticians can compare classical and robust
methods.
- The key is to make R code user friendly so
nonstatisticians can use it.
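As an illustration of the kind of classical-versus-robust comparison meant here, the following is a minimal sketch, not code from the workshop. It is written in Python rather than R only to keep the example free of package dependencies. It fits a line by ordinary least squares and by a Huber M-estimate computed with iteratively reweighted least squares. The helper names (`wls_line`, `huber_line`), the tuning constant 1.345, the MAD scale, and the data are all assumptions chosen for the example.

```python
from statistics import median


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_line(x, y, c=1.345, iters=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = wls_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(iters):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = median(r)
        # MAD scale, rescaled to be consistent at the normal distribution
        s = median([abs(ri - med) for ri in r]) / 0.6745
        if s == 0:
            break
        # Huber weights: 1 inside [-c*s, c*s], shrinking as c*s/|r| outside
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a, b = wls_line(x, y, w)
    return a, b


# Illustrative (made-up) data: true line y = 2 + 0.5*x, small noise,
# and one gross outlier.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.0, 0.1]
y = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]
y[8] = 20.0  # contaminated observation

ols_a, ols_b = wls_line(x, y, [1.0] * len(x))
hub_a, hub_b = huber_line(x, y)
print(f"OLS slope:   {ols_b:.3f}")
print(f"Huber slope: {hub_b:.3f}")
```

On this data the single outlier pulls the least-squares slope well away from 0.5, while the Huber fit stays much closer. A Huber M-estimate is not a high breakdown method, so the sketch only illustrates the side-by-side comparison, not protection against leverage points.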
Why make your code user friendly?
- Disadvantages
- It takes extra time.
- It doesn't help my career.
- It isn't fundable.
- Writing code is no fun.
- I can't publish it.
- No one will use it.
Why make your code user friendly?
- Advantages
- It takes time, but is publishable in Journal of
Statistical Software.
- www.jstatsoft.org
- Abstracts published in Journal of Computational
and Graphical Statistics (JCGS)
Journal of Statistical Software
- Types of Papers JSS will publish
- Manuals, user's guides, and other forms of
description of statistical software.
- The code for new statistical software.
- Data sets that are of use to statisticians.
- Reviews and comparisons of statistical software.
Publishing in JSS
- The typical JSS paper will have a section
explaining the statistical technique, a section
explaining the code, a section with the actual
code, and a section with examples. All sections
will be made browsable as well as downloadable.
The papers and code should be accessible to a
broad community of practitioners, teachers, and
researchers in the field of statistics.
Why make your code user friendly?
- More advantages
- It does help your career because
- It increases publications.
- It is fundable! NSF wants innovative and useful
projects. Useful means others can and do use
it. That means user friendly code. NSF rarely
funds straight theory anymore.
- Nonstatisticians who are evaluating you appreciate
it.
- Many statisticians appreciate it.
Why make your code user friendly?
- More Advantages
- Writing code may not be fun, but seeing
researchers use your methods is lots of fun!
- If your method is useful, researchers will use it
if they know about it and have the tools.
Nonstatisticians don't read statistics journals,
so we must publish in their journals!
Collaborators in other fields
- Why you need them
- They have real problems.
- They at least double your funding options.
- They at least double your publication options.
- They will support you and your department.
The Case of Robust Regression
- Everyone agrees it's useful.
- Nearly 40 years after M-estimates, they are in
SAS!
- What happened to the 10-year rule?
Why Did It Take 40 Years?
- Only theory mattered.
- Computationally difficult at first.
- No two statisticians could agree on the best
robust method.
- Computationally harder with high breakdown.
- SAS mentality.
Funding Statistical Research
- No one funds straight theory.
- NSF wants useful and innovative.
- Useful Collaborators
- NIH wants medical applications.
- Not the next robust regression estimator,
although JASA might accept it.
- Everyone funds conferences and workshops!
COMMENTS
Collaborations resulting from the workshop
- Ma, Y., and Genton, M. G. (2002), "A semiparametric
class of generalized skew-elliptical
distributions," Institute of Statistics Mimeo
Series 2541, under review.
- Nora Muler is working together with Victor Yohai
on robust estimators for GARCH models. I proposed
at ICORS 2002 a robust estimator for the general
GARCH(p,q) model that has p+q+1 parameters to
estimate. The algorithm in our paper is
implemented only for the GARCH(1,1) case. I was
discussing how to generalize the algorithm to the
general GARCH(p,q) case.
Collaborations resulting from the workshop
- "Robust Methods for Microarray Data Analysis" by
Hanga Galfalvy, Steven Grambow, Johanna Hardin, and
Arnold Stromberg was started, and extensive
progress has since been made.
- Rocke, D.M., and D.L. Woodruff, "Multivariate
Outlier Detection and Cluster Identification",
Working Paper
Collaborations resulting from the workshop
- Chen et al. extensively discussed smoothing
algorithms for quantile regression, which will
become part of SAS soon.
- Discussions led to Salibian-Barrera's
"Estimating the p-values of robust tests for the
linear model," now under revision for JSPI.
Workshop Benefits
- Salibian-Barrera (2003). Estimating the p-values
of robust tests for the linear model. Now under
revision for JSPI.
- Attending the workshop assisted Matias
Salibian-Barrera and two colleagues' application
for a large grant for a computer lab here (over
980K)... The agencies were CFI
(www.innovation.ca) and OIT (www.oit.on.ca). They
were awarded the grant in October 2002.
Computational Issues
- Discussed possibilities for new algorithmic
strategies for efficient computation of robust
estimators such as LTS.
- Discussed current research on theoretical
properties of algorithmic estimators... in
particular, the recent work of Hawkins and Olive.
- Discussed the impact of new robust procedures in
SAS and Splus on data analysis as well as the
impact on the ability to develop and run
simulations for research in robust methods.
- Discussed potential applications of traditional
robust estimators to the emerging field of
microarray/gene expression analysis.
More Computational Issues
- Many robust techniques are only computable for
small data sets, but the larger the data set, the
more likely robust techniques should be used.
- Success stories
- Fast LTS, Fast MCD.
- Others?
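To make the Fast LTS idea concrete, here is a minimal sketch of its core ingredient, the concentration step (C-step): fit on a subset, keep the h observations with the smallest squared residuals, refit, and repeat from many random starts. This is a simplified illustration for a single predictor, not the published FAST-LTS algorithm with its nested extensions for large data sets. The function names, the number of starts, the seed, and the data are assumptions made for the example.

```python
import random


def ols_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:  # degenerate subset: fall back to a horizontal line
        return ybar, 0.0
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b


def c_steps(x, y, subset, h, max_iter=100):
    """Iterate concentration steps from a starting subset until it stabilizes."""
    for _ in range(max_iter):
        a, b = ols_line([x[i] for i in subset], [y[i] for i in subset])
        # Keep the h observations with the smallest squared residuals.
        order = sorted(range(len(x)), key=lambda i: (y[i] - a - b * x[i]) ** 2)
        new_subset = sorted(order[:h])
        if new_subset == subset:
            break
        subset = new_subset
    objective = sum((y[i] - a - b * x[i]) ** 2 for i in subset)
    return a, b, objective


def lts_line(x, y, h=None, n_starts=50, seed=0):
    """Approximate LTS fit: best result of C-steps over random 2-point starts."""
    n = len(x)
    if h is None:
        h = (n + 3) // 2  # floor((n + p + 1) / 2) with p = 2 coefficients
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = sorted(rng.sample(range(n), 2))
        a, b, obj = c_steps(x, y, start, h)
        if best is None or obj < best[2]:
            best = (a, b, obj)
    return best[0], best[1]


# Illustrative (made-up) data: true line y = 1 + 2*x with small noise;
# the last 6 of 20 observations (30%) are replaced by outliers.
x = [float(i) for i in range(20)]
noise = [0.05 * ((7 * i) % 5 - 2) for i in range(20)]  # values in [-0.1, 0.1]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
for i in range(14, 20):
    y[i] = 5.0

ols_a, ols_b = ols_line(x, y)
lts_a, lts_b = lts_line(x, y)
print(f"OLS slope: {ols_b:.3f}")
print(f"LTS slope: {lts_b:.3f}")
```

Each C-step cannot increase the sum of the h smallest squared residuals, which is why only a modest number of random starts is needed in practice. The result is still an approximation to the exact LTS estimator.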
More Computational Issues
- The need for computation of robust singular value
decomposition for large matrices (with
application to microarray data).
- The need for computation of robust variogram
estimator in spatial statistics.
- The need for investigation of similarities
between support vector machines and robust
regression.
- The need to detect outliers in asymmetric
distributions.
More Computational Issues
- Differences between SAS's and Splus's
implementations of robust regression.
- The appropriateness of using the robust Wald or
robust F tests when using ANOVA to compare two
nested robust regression models.
More Computational Issues
- How to deal with categorical variables for high
breakdown methods.
- How to handle multiple root problems for
re-descending M-estimators.
- Issues in robust SVD. One proposal is to
use the L1 norm instead of the usual L2 norm; the
problem is that the orthogonality property is lost.
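The multiple-root problem for re-descending M-estimators can be seen in a small location example. The sketch below, offered as an illustration under stated assumptions rather than a recommended procedure, solves the Tukey biweight estimating equation by iterative reweighting from two different starting values. The function name, the fixed scale of 1.0 (in practice the scale would be estimated, for example by the MAD), the tuning constant 4.685, and the data are all assumptions made for the example.

```python
from statistics import median


def biweight_location(data, start, scale=1.0, c=4.685, iters=100, tol=1e-10):
    """Solve the Tukey biweight location equation by iterative reweighting."""
    mu = start
    for _ in range(iters):
        # Biweight weights: (1 - u^2)^2 for |u| < 1 and 0 otherwise,
        # where u = (x - mu) / (c * scale).
        w = []
        for xi in data:
            u = (xi - mu) / (c * scale)
            w.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        sw = sum(w)
        if sw == 0:  # every point is rejected; no update possible
            break
        new_mu = sum(wi * xi for wi, xi in zip(w, data)) / sw
        if abs(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu


# Illustrative (made-up) data: a main cluster near 0 and a smaller one near 10.
data = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 9.8, 10.0, 10.2]

root_from_median = biweight_location(data, start=median(data))
root_from_max = biweight_location(data, start=max(data))
print(f"start at median -> {root_from_median:.3f}")
print(f"start at max    -> {root_from_max:.3f}")
```

Because the biweight gives zero weight to points far from the current estimate, each start converges to the cluster it began in, so the estimating equation has more than one solution. This is why such estimators are usually started from a robust initial value.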
More Computational Issues
- In skew-symmetric type distributions, the
coefficients in the skewing function are very
sensitive to even a small number of outliers. In
fact, a small number of outliers will demand
increasing the order of the polynomial in the
skewing function, yet the extra coefficients are
very hard to estimate.
More Computational Issues
- Subsampling strategies
- Empirical analysis of estimator performance
- Robust metrics
- Compute high breakdown value estimates with both
continuous and categorical variables
- Compute multivariate robust estimates
- Smoothing algorithm for regression quantiles
More Computational Issues
- Fast and robust bootstrap methods for robust
regression estimates
- Fast and robust estimates for p-values for robust
regression
- Fast computation of MM-regression estimates for
high-dimensional data... Maybe related to
Hoaglin-Mosteller-Tukey's sweeping method?
Would you do it again?
- I would definitely be interested in
participating in another workshop. Attendance at
the 2002 workshop proved to be invaluable. I
learned about new research going on in the robust
statistics field, initiated several new research
collaborations with other conference attendees,
and had the opportunity to meet several key
researchers in the robust field. All in all, it
was an excellent experience.
Would you do it again?
- Yes, especially after SAS and Splus release their
robust procedures formally.
Workshop Participants
- Chen, Colin (Lin), SAS Institute, Inc.
- Galfalvy, Hanga C., New York State Psychiatric
Institute
- Garcia Ben, Marta, Universidad de Buenos Aires
- Genton, Marc, North Carolina State University
- Grambow, Steve, Duke University Medical Centre
Workshop Participants
- Hardin, Johanna, PostDoc, Fred Hutchinson Cancer
Research Center, Seattle, Washington
- He, Xuming, Professor, Department of Statistics,
University of Illinois
- Kafadar, Karen, Professor, Department of
Mathematics, University of Colorado at Denver
- Lin, Nan
- Ma, Yuanyuan, North Carolina State University
Workshop Participants
- Muler, Nora, Junior Faculty, Universidad Torcuato
di Tella, Argentina
- Stromberg, Arnold, Professor, Department of
Statistics, University of Kentucky
- Tyler, David, Professor, Department of
Statistics, Rutgers University
- Vanden Branden, Karlien, Graduate Student,
Katholieke Universiteit Leuven, Belgium
- Werner, Mark, Graduate Student, University of
Colorado at Denver
Workshop Participants
- Woodruff, David, Professor, Department of
Mathematics, University of California at Davis
- Zamar, Ruben, Professor, Department of
Statistics, University of British Columbia
- Ekblom, Hakan, Professor, Lulea University of
Technology, Sweden
- Sinha, Sanjoy, Assistant Professor, University of
Winnipeg, Canada