Introduction to ObjectOriented Programming for Statistical Applications - PowerPoint PPT Presentation

1 / 53
About This Presentation
Title:

Introduction to ObjectOriented Programming for Statistical Applications

Description:

'The object-oriented approach relies on. ... Cabin. Button. Light. Why OOP? Stability of models with respect to real-world entities ... – PowerPoint PPT presentation

Number of Views:16
Avg rating:3.0/5.0
Slides: 54
Provided by: andresh
Category:

less

Transcript and Presenter's Notes

Title: Introduction to ObjectOriented Programming for Statistical Applications


1
Introduction to Object-Oriented Programming for
Statistical Applications
  • E. Andres Houseman
  • April 19, 2004

2
Typical Statistical Development
Software Development/ Data Analysis
Testing and New Development
Theoretical Investigation
Pilot/Proof-of-Concept
Theoretical Enhancements
Total Effort 70
3
Ideal Statistical Development
Software Development/ Data Analysis
Testing and New Development
Theoretical Investigation
Pilot/Proof-of-Concept
Theoretical Enhancements
Total Effort 58
4
Goals for Today
  • Understand the motivation for using OOP in a
    statistical context
  • Basic understanding of OOP concepts
  • Understanding of the architecture of complex R
    functions such as glm() and lme()

5
Object-Oriented Programming
The object-oriented approach relies on .. a
systems-oriented approach that treats a system as
an organized entity made of components that can
be defined with respect to one another. It
proposes a method of decomposition that is based
not only on what the system does, but more
importantly, on the integration of what the
system is and does. -- Muller (1997), Instant
UML, Wrox Press Ltd.
6
Objects
7
Object-Oriented Programming (2)

Door
Open
Elevator Cabin
Light
Go to first floor
Button
Illuminate
8
Why OOP?
  • Stability of models with respect to real-world
    entities
  • Iterative construction, which is made easier by
    the weak coupling between components
  • The ability to reuse elements across development
    projects

-- Muller (1997)
9
Why OOP for Statisticians?
  • Code reusability
  • Enforced organization of development process
  • Architecture of software more closely resembles
    the statistical relationships
  • Protection against errors

10
Why OOP for Statisticians?
Data
Model
Fit
Estimate
Assess
Inference
Goodnessof Fit
Predict
Predictions
11
Simple Example to Fix Concepts
gt Sample data for toy examples gt gt
binomial.data lt- rbinom(1000, size1, p.5) gt gt
Stats lt- function(x) x should be
binomial return( list(p mean(x), n
length(x)) ) gt gt Stats( binomial.data
) p 1 0.494 n 1 1000
12
Problem
gt What is wrong with this? gt gt normal.data lt-
rnorm(1000, mean5, sd2) gt gt Stats( normal.data
) p 1 4.998116 n 1 1000
13
Simple Solution
gt Stats.modBinomial lt- function(x) x
should be binomial return( list(p
mean(x), n length(x)) ) gt gt
Stats.modNormal lt- function(x) x should
be normal return( list(mu mean(x),
sigmasqrt(var(x)), n length(x))
) gt
14
Simple Solution (2)
gt statBinomial lt- Stats.modBinomial (
binomial.data ) gt statNormal lt- Stats.modNormal (
normal.data ) gt
gt statNormal mu 1 4.998116 sigma 1
2.013784 n 1 1000
gt statBinomial p 1 0.494 n 1 1000
15
Mistakes Happen
gt A careless person could do this gt
Stats.modNormal ( binomial.data ) mu 1
0.494 sigma 1 0.5002142 n 1 1000
16
Additional Functionality
gt Expectation lt- function(stat)
if(!is.null(statp)) statp binomial
else if(!is.null(statmu)) statmu normal
else stop("Unknown model type.") something
else gt gt Variance lt- function(stat)
if(!is.null(statp)) statp (1 - statp)
binomial else if(!is.null(statmu))
statsigma2 normal else stop("Unknown
model type.") something else gt gt
ConfidenceInterval lt- function(stat)
Wald-type confidence interval est lt-
Expectation(stat) se lt- sqrt(Variance(stat)
/ statn) return (c(est - 1.96se, est
1.96se))
17
Additional Functionality (2)
gt Expectation(statBinomial) 1 0.494 gt
Expectation(statNormal) 1 4.998116 gt gt
Variance(statBinomial) 1 0.249964 gt
Variance(statNormal) 1 4.055326 gt gt
ConfidenceInterval(statBinomial) 1 0.4630119
0.5249881 gt ConfidenceInterval(statNormal) 1
4.873300 5.122931
18
Additional Problems
gt What's wrong with this? gt gt stats lt-
Stats.modBinomial ( normal.data ) gt
ConfidenceInterval( stats ) 1 NaN NaN Warning
message NaNs produced in sqrt(Variance(stat)/st
atn)
19
Solution Objects
gt as.modBinomial lt- function(x)Class
Constructor class(x) lt- "modBinomial"
return( x ) gt gt as.modNormal lt-
function(x)Class Constructor class(x) lt-
"modNormal" return( x ) gt gt
binomial.data lt- as.modBinomial( binomial.data
) gt normal.data lt- as.modNormal( normal.data
) gt gt binomial.data 1 0 1 1 1 0 0 1 ...
... 1000 0 attr(,"class") 1 "modBinomial"
20
Objects and Methods
gt Stats lt- function(x) Method
if(class(x)"modBinomial") binomial
object lt- list(p mean(x), n
length(x)) class(object) lt-
"statBinomial" return(object)
else if(class(x)"modNormal") normal
object lt- list(mu mean(x),
sigmasqrt(var(x)), n
length(x)) class(object) lt-
"statNormal" return(object)
else stop("Unknown model type.") something
else gt
21
Objects and Methods (2)
gt statBinomial lt- Stats ( binomial.data ) gt
statNormal lt- Stats ( normal.data ) gt gt
statBinomial p 1 0.494 n 1
1000 attr(,"class") 1 "statBinomial"
gt statNormal mu 1 4.998116 sigma 1
2.013784 n 1 1000 attr(,"class") 1
"statNormal"
22
Classes and Methods in R
gt Stats lt- function(x) Method
if(class(x)"modBinomial") binomial
... else if(class(x)"modNormal")
normal ... else if(class(x
)"modPoisson") poisson ... ...
many many many more classes else
stop("Unknown model type.") something else
gt
23
Classes and Methods in R (2)
gt Stats lt- function(x) UseMethod("Stats") gt gt
Stats.modBinomial lt- function(x) object lt-
list(p mean(x), n length(x))
class(object) lt- "statBinomial"
return(object) gt gt Stats.modNormal lt-
function(x) object lt- list(mu mean(x),
sigmasqrt(var(x)), n
length(x)) class(object) lt- "statNormal"
return(object) gt gt Stats.default lt-
function(x) stop("Unknown model type.")
24
Classes and Objects in R (2)
gt Stats ( binomial.data ) p 1 0.494 n 1
1000 attr(,"class") 1 "statBinomial"
gt Stats ( normal.data ) mu 1
4.998116 sigma 1 2.013784 n 1
1000 attr(,"class") 1 "statNormal"
25
Statistical Objects
Estimate
26
Inheritance
gt as.modUnitNormal lt- function(x) x lt-
as.modNormal(x) class(x) lt-
c("modUnitNormal", class(x)) return( x )
gt gt Stats.modUnitNormal lt- function(x)
object lt- Stats.modNormal(x) objectsigma lt-
1 class(object) lt- c("statUnitNormal",
class(object)) return(object) gt
27
Inheritance (2)
gt unitNormal.data lt- as.modStdNormal (
rnorm(1000, mean5, sd1) ) gt Stats(
unitNormal.data ) mu 1 5.0274 sigma 1
1 n 1 1000 attr(,"class") 1
"statUnitNormal" "statNormal" gt gt
ConfidenceInterval( Stats( unitNormal.data )
) 1 4.965420 5.089381
28
Inheritance (3)
statistical model
normal
binomial
unit-variance normal
29
Example lm() and glm()
gt ex.ols lt- lm(normal.data 1) gt ex.glm lt-
glm(binomial.data 1, familybinomial()) gt gt
class(ex.ols) 1 "lm" gt class(ex.glm) 1 "glm"
"lm" gt print function (x, ...)
UseMethod("print") gt summary function (object,
...) UseMethod("summary") gt predict function
(object, ...) UseMethod("predict")
30
Methods for glm() and lm()
gt summary(ex.glm) Call glm(formula
binomial.data 1, family binomial()) Devianc
e Residuals Min 1Q Median 3Q
Max -1.167 -1.167 -1.167 1.188 1.188
Coefficients Estimate Std. Error
Pr(gtz) (Intercept) -0.02400 0.06325
0.704 (Dispersion parameter for binomial family
taken to be 1) Null deviance 1386.2 on
999 degrees of freedom Residual deviance
1386.2 on 999 degrees of freedom AIC 1388.2
gt summary.lm(ex.glm) Call glm(formula
binomial.data 1, family binomial()) Residua
ls Min 1Q Median 3Q Max
-0.9881 -0.9881 -0.9881 1.0121 1.0121
Coefficients Estimate Std. Error
Pr(gtt) (Intercept) -0.02400 0.06329
0.705 Residual standard error 1.001 on 999
degrees of freedom
31
Using the Default Print Method
gt print.default(ex.ols) coefficients (Intercept)
4.998116 residuals 1
2 ... 0.617956019 1.297557292 ...
... 999 5.88151405 1000 7.81920766 attr(,"clas
s") 1 "lm"
gt print(ex.ols) Call lm(formula normal.data
1) Coefficients (Intercept) 4.998 gt
32
Objects and Encapsulation
An object is an atomic entity, formed from the
union of state and behavior. It provides an
encapsulation relationship that ensures a strong
internal cohesion, and a weak coupling with the
outside.
33
Objects and Encapsulation
Q
Land
Take-off
Control Tower
34
Encapsulation in Programming Environments
  • Java and C enforce encapsulation by default
  • R has to be tricked into encapsulation
  • Use the environment to encapsulate in R

35
R Environments
  • An environment is a name-space whose access is
    tightly controlled
  • Functions that live within a specific environment
    can only see data within that environment (or a
    parent environment)
  • Use get and assign for communication with
    other environments

36
R Environments (2)
assign
37
Objects and Encapsulation
statistical model
Sufficient statistics Order statistics
normal
binomial
unit-variance normal
38
Encapsulation Example
Default modelled data class as.model lt-
function( data )Constructor object lt-
new.env() assign("data", data, envobject)
assign("n", length(data), envobject)
class(object) lt- "model" object N lt-
function(x) UseMethod("Size") Stats lt-
function(x) UseMethod("Stats") Mean lt-
function(x) UseMethod("Mean") Variance lt-
function(x) UseMethod("Variance")
39
Encapsulation Example (2)
print.model lt- function(object) cat("Model
for data with", N(object),"observations.\n") Si
ze.model lt- function(object) get("n",
envobject) Stats.model lt-
function(object) sort( get("data",
envobject) ) Mean.model lt- function(object)
mean(get("data", envobject))
Variance.model lt- function(object)
var(get("data", envobject))
40
Encapsulation Example (3)
gt myData lt- as.model( rnorm(100, 10, 2) ) gt
myData Model for data with 100 observations. gt gt
print.default(myData) ltenvironment
017DA720gt attr(,"class") 1 "model" gt gt Mean(
myData ) 1 9.970916 gt Variance( myData ) 1
4.203072 gt gt Stats( myData ) 1 4.463604
5.246475 5.276268 5.929042 6.031831 ... 8
7.032259 7.175986 7.224143 7.269212 7.828484
... 99 13.969482 14.986617
41
Encapsulation Example (4)
as.modelNormal lt- function( data )Constructor
object lt- new.env() if( inherits(data,"model")
) "data" is a model object data lt-
get("data", data) COPY data over Note
this is wrong object lt- data This would
mean that "data" and this object
reference the same environment!
assign("data", data, envobject) class(object)
lt- c("modelNormal", "model") assign("mu",
mean(data), envobject) assign("sigma",
var(data).5, envobject) assign("n",
length(data), envobject) object
42
Encapsulation Example (5)
gt myNormalData lt- as.modelNormal ( myData ) gt
myNormalData Model for data with 100
observations. gt Stats.modelNormal lt-
function(object) stats lt- c(Mean(object),
sqrt( Variance(object) )) names( stats) lt-
c("mu", "sigma") stats gt Stats(
myNormalData ) mu sigma 9.970916
2.050140 gt
43
Encapsulation Example (6)
more efficient Mean.modelNormal lt-
function(object) get("mu", envobject)
Variance.modelNormal lt- function(object)
get("sigma", envobject)2 Stats.modelNormal
lt- function(object) get("mu", envobject)
get("sigma", envobject) stats lt-
c(mu, sigma) names( stats) lt- c("mu",
"sigma") stats
44
Encapsulation Example (7)
updating objects update.model lt-
function(object, newdata) data lt-
c(get("data", envobject), newdata)
assign("data", data, envobject) assign("n",
length(data), envobject) Does the update
method need to be modified for modelNormal?
45
Encapsulation Example (8)
gt myData Model for data with 100 observations. gt
gt someNewData lt- rnorm(50, 10, 2) gt update(
myData, someNewData ) gt gt myData Model for data
with 150 observations. gt myNormalData Model for
data with 100 observations. gt Stats(myNormalData)
mu sigma 9.970916 2.050140 gt update(
myNormalData, someNewData ) gt Stats( myNormalData
) mu sigma 9.970916 2.050140
46
Encapsulation Example (9)
  • update.modelNormal lt- function(object, newdata)
  • update.model( object, newdata )
  • data lt- get("data", envobject)
  • assign("mu", mean(data), envobject)
  • assign("sigma", var(data).5, envobject)
  • gt Reset the data for this example
  • gt myNormalData lt- as.modelNormal ( get("data",
    myNormalData)1100 )
  • gt update( myNormalData, someNewData )
  • gt Stats(myNormalData)
  • mu sigma
  • 9.851004 2.014527

47
Why Use Environments?
Function
Input Data
Output Data
Environment Data
48
Example Architecture of lme()
  • The mixed model function lme() returns objects of
    class lme
  • Correlation models are passed into lme() as
    objects belonging to the class corStruct
  • Random effects structures are stored efficiently
    as objects belonging to the class pdMat

49
pdMat
gt myPDMat lt- pdMat( matrix(c(2,1,1,2), 2, 2) ,
pdClass "pdCompSymm") gt myPDMat Positive
definite matrix structure of class pdCompSymm
representing ,1 ,2 1, 2 1 2,
1 2 gt print.default(myPDMat) 1 0.3465736
1.0986123 attr(,"Dimnames") attr(,"Dimnames")1
NULL attr(,"Dimnames")2 NULL attr(,"ncol") 1
2 attr(,"class") 1 "pdCompSymm" "pdMat"
50
pdMat (2)
gt myPDMat lt- pdMat( matrix(c(2,1,1,3), 2, 2) ,
pdClass "pdNatural") gt myPDMat Positive
definite matrix structure of class pdNatural
representing ,1 ,2 1, 2 1 2,
1 3 gt print.default(myPDMat) 1 0.3465736
0.5493061 0.8670147 attr(,"Dimnames") attr(,"Dimna
mes")1 NULL attr(,"Dimnames")2 NULL attr(,
"class") 1 "pdNatural" "pdMat"
51
pdMat (3)
gt myPDMat lt- pdMat( matrix(c(2,1,1,3), 2, 2) ,
pdClass "pdCompSymm") gt gt myPDMat Positive
definite matrix structure of class pdCompSymm
representing ,1 ,2 1, 2.500000
1.020621 2, 1.020621 2.500000
52
corStruct
gt myCorr lt- corAR1(.5) gt myCorr Correlation
structure of class corAR1 representing Phi 0.5
gt print.default(myCorr) 1 1.098612 attr(,"formu
la") 1 attr(,"formula")attr(,"class") 1
"formula" attr(,"formula")attr(,".Environment") lte
nvironment 04332600gt attr(,"fixed") 1
FALSE attr(,"class") 1 "corAR1" "corStruct"
53
Summary
  • Why OOP?
  • Code reusability and faster development
  • Enforce intelligent design considerations
  • Protection against errors
  • OOP Basic Concepts
  • Classes and Inheritance
  • Objects and Encapsulation
Write a Comment
User Comments (0)
About PowerShow.com