Title: R Project
1R Project
- A programming environment for Data Analysis and
Graphics
2The R Software
- R the tool
- R statistical tools
- R graphical tools
- Examples
- Conclusions
3The R Software
- R
- What do you need?
- The R-project
- Why R? What is R?
- What R does and does not do
- Packages
- Getting help
- R statistical tools
- R graphical tools
- Examples
- Conclusions
4What do you need?
- Performance
- Functionality
- Extensibility
- Simplicity
- Compatibility
- Graphical Interface
- Low-cost
5The R-project
- Authors
  - Ross Ihaka and Robert Gentleman
  - Statistics Department of the University of Auckland, New Zealand
- Licence
  - R is available as Free Software
  - Free Software Foundation's GNU General Public Licence, in source code form
- Platform
  - UNIX (FreeBSD, Linux), Windows, MacOS
- Contributions
  - Product of international collaboration
  - top computational statisticians
  - computer language designers
- Web sites
  - http://www.r-project.org
  - http://cran.r-project.org
- PACKAGES
6Why R? What is R?
- All source code is published, so its correctness can be checked by expert statisticians
- Comprehensive technical documentation and user-contributed tutorials
- It is fully programmable, with its own sophisticated computer language
  - Easy to write your own functions
  - Easy to write whole packages
- Exchange data in MS-Excel, text, and fixed and delimited formats
  - Easy importing and exporting of datasets
- Integrated suite of software facilities for data manipulation, calculation and graphical display
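The data exchange point above can be sketched as a short round trip through delimited text files. This is only an illustration: the data frame `measurements` and its column names are made up, and the files are written to temporary locations. Fixed-width text can be read with `read.fwf()`; Excel files generally need an add-on package, which is not shown here.

```r
# Hypothetical example data; the column names are illustrative only.
measurements <- data.frame(id = c("a", "b", "c"),
                           value = c(1.5, 2.0, 3.25),
                           stringsAsFactors = FALSE)

# Export to a comma-delimited text file in a temporary location.
csv_file <- tempfile(fileext = ".csv")
write.csv(measurements, csv_file, row.names = FALSE)

# Import it again; read.csv() returns a data frame.
imported <- read.csv(csv_file, stringsAsFactors = FALSE)
print(imported)

# Tab-delimited export and import use write.table() / read.table().
tab_file <- tempfile(fileext = ".txt")
write.table(measurements, tab_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
imported_tab <- read.table(tab_file, header = TRUE, sep = "\t",
                           stringsAsFactors = FALSE)
print(imported_tab)
```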
7What R does and does not do
- is not a database, but connects to DBMSs (database management systems)
- has no graphical user interface, but connects to Java, Tcl/Tk
- language interpreter can be very slow, but allows calling your own C/C++ code
- no spreadsheet view of data, but connects to Excel/MS Office
- no professional / commercial support
- data handling and storage: numeric, textual
- operators for calculations on arrays and matrices
- tools for data analysis
- high-level data analytic and statistical functions
- graphics
- programming language: loops, branching, subroutines
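A minimal sketch of the language features listed above (subroutines, branching, loops, and operators on matrices). The function name `classify` and the sample values are hypothetical and chosen only for illustration.

```r
# A user-defined function (subroutine) with branching.
classify <- function(value) {
  if (value < 0) {
    "negative"
  } else if (value == 0) {
    "zero"
  } else {
    "positive"
  }
}

# A loop that applies the function to each element of a numeric vector.
values <- c(-2, 0, 3.5)
labels <- character(length(values))
for (i in seq_along(values)) {
  labels[i] <- classify(values[i])
}
print(labels)

# Operators work on whole arrays and matrices at once.
m <- matrix(1:4, nrow = 2)
print(m * 2)      # element-wise multiplication
print(m %*% m)    # matrix product
```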
8Packages
- base - The R Base Package
- class - Functions for classification
- cluster - Functions for clustering
- mclust - model-based cluster analysis
- graphics - The R Graphics Package
- mle - Maximum likelihood estimation
- nnet - Feed-forward neural networks and multinomial log-linear models
- spatial - functions for kriging and point pattern analysis
9Packages (2)
- ctest - classical statistical tests
- mva - multivariate analysis
- gstat - multivariable geostatistical modelling, prediction and simulation
- geoR - functions for geostatistical analysis
- fdim - functions for calculating fractal dimensions
- fields - tools for spatial data
- ncdf - UCAR netCDF format reading
- wavethresh - wavelet statistics and transforms
- directly downloadable from the internet
10Getting help
Details about a specific command whose name you
know (input arguments, options, algorithm,
results):
> ? t.test
or
> help(t.test)
11Getting help
- HTML search engine
- Search for topics with regular expressions: help.search
12The R Software
- R
- R statistical tools
- Math
- Stats
- R graphical tools
- Examples
- Conclusions
13Variables
> a <- 49                            # numeric
> sqrt(a)
[1] 7
> a <- "The dog ate my homework"     # character string
> sub("dog","cat",a)
[1] "The cat ate my homework"
> a <- (1+1==3)                      # logical
> a
[1] FALSE
14Lists
- vector: an ordered collection of data of the same type.
> a <- c(7,5,1)
> a[2]
[1] 5
- list: an ordered collection of data of arbitrary types.
> doe <- list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
15Data frames
A data frame represents the typical data table that
researchers come up with, like a spreadsheet. It is a
rectangular table with rows and columns; data within
each column has the same type (e.g. number, text,
logical), but different columns may have different
types. Example:
> d.f
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
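One way such a table could be built and queried is sketched below. The values are the ones shown on the slide; the construction with `data.frame()` is an assumption, since the slide only shows the printed result.

```r
# Build the example table; each argument becomes one column.
d.f <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)
print(d.f)

str(d.f)                                 # each column keeps its own type
print(d.f$tumorsize)                     # one column as a numeric vector
print(d.f["XX234", ])                    # select a row by its name
large <- d.f[d.f$tumorsize > 7, ]        # select rows by a logical condition
print(large)
print(mean(d.f$tumorsize))               # column-wise summary
```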
16R as a simple calculator
> x <- c(1,3,2,10,5); y <- 1:5     # creation of 2 vectors
> x
[1]  1  3  2 10  5
> x+y
[1]  2  5  5 14 10
> x*y
[1]  1  6  6 40 25
> x/y
[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000
> x^y
[1]     1     9     8 10000  3125
> sum(x)       # sum of elements in x
[1] 21
> cumsum(x)    # cumulative sum vector
[1]  1  4  6 16 21
17Basic math/stat tools
> # 20 numbers between 0 and 20,
> # rounded to one decimal place
> x <- round(runif(20,0,20), digits=1)
> x
 [1] 10.0  1.6  2.5 15.2  3.1 12.6 19.4  6.1
 [9]  9.2 10.9  9.5 14.1 14.3 14.3 12.8
[16] 15.9  0.1 13.1  8.5  8.7
> min(x)
[1] 0.1
> max(x)
[1] 19.4
> median(x)    # median
[1] 10.45
> mean(x)      # mean
[1] 10.095
> var(x)       # variance
[1] 27.43734
> sd(x)        # standard deviation
[1] 5.238067
> sqrt(var(x))
[1] 5.238067
> length(x)
[1] 20
> round(x)
 [1] 10  2  2 15  3 13 19  6  9 11 10 14 14 14 13 16  0 13  8  9
> cor(x,sin(x/20))    # correlation
[1] 0.997286
> quantile(x)         # quantiles
   0%   25%   50%   75%  100%
 0.10  7.90 10.45 14.15 19.40
18Basic Mathematical tools
> x <- y <- seq(-4*pi, 4*pi, len=27)
> r <- sqrt(outer(x^2, y^2, "+"))
> z <- cos(r^2)*exp(-r/6)
19Statistical tools
- Samples tests
- Checking normality
- Kolmogorov-Smirnov test
> # generate 500 observations from uniform (0,1) distribution
> F500 <- runif(500); a <- c(mean(F500), sd(F500))
> qqnorm(F500)    # normal probability plot
> qqline(F500)    # ideal sample will fall near the straight line
> ks.test(F500, "pnorm", mean=a[1], sd=a[2])

        One-sample Kolmogorov-Smirnov test

data:  F500
D = 0.0655, p-value = 0.02742
alternative hypothesis: two.sided
20The R Software
- R
- R statistical tools
- R graphical tools
- Graphical options
- Examples
- Examples
- Conclusions
21Graphical options
- Multiple plots in a single graphic window
  - par(mfrow=c(1,2)) allows you to have two plots side by side
  - par(mfrow=c(2,3)) allows 6 plots to appear on a page (2 rows of 3 plots each)
- Adjusting graphical parameters
  - Labels and title, axis limits
  - Types for plots and lines
  - Colors and characters
  - Controlling axis line
  - Controlling tick marks
  - Legend
  - Putting text to the plot, controlling the text size
  - Adding symbols to plots
  - Adding arrow and line segment
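The options listed above can be combined as in the following sketch. The plotted curves, colors, and coordinates are arbitrary choices for illustration; only standard functions of the graphics package are used.

```r
op <- par(mfrow = c(1, 2))          # two plots side by side

x <- seq(0, 2 * pi, length.out = 50)

# Left panel: labels, title, axis limits, line types and colors, legend.
plot(x, sin(x), type = "l", col = "blue", lwd = 2,
     xlab = "x", ylab = "f(x)", main = "Lines",
     xlim = c(0, 2 * pi), ylim = c(-1.5, 1.5))
lines(x, cos(x), lty = 2, col = "red")
legend("topright", legend = c("sin", "cos"),
       col = c("blue", "red"), lty = c(1, 2))

# Right panel: plotting characters, custom tick marks, text, symbol,
# arrow and line segment.
plot(x, sin(x), type = "p", pch = 16, col = "darkgreen",
     main = "Points", axes = FALSE)
axis(1, at = c(0, pi, 2 * pi), labels = c("0", "pi", "2 pi"))
axis(2)
box()
text(pi, 0.5, "some text", cex = 1.5)      # cex controls the text size
points(pi / 2, 1, pch = 8, col = "red")    # add a symbol
arrows(4, 0.8, pi / 2 + 0.2, 1)            # add an arrow
segments(0, 0, 2 * pi, 0, lty = 3)         # add a line segment

par(op)                             # restore the previous settings
```

Saving the result of `par()` in `op` and restoring it afterwards keeps the layout change from leaking into later plots.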
22Graphics (1)
> data(volcano)
> par(mfrow = c(2, 2))
> image(volcano, main = "heat.colors")
> image(volcano, main = "rainbow", col = rainbow(15))
> image(volcano, main = "topo", col = topo.colors(15))
> image(volcano, main = "terrain.colors", col = terrain.colors(15))
23Graphics (2)
> data(volcano)
> n.r <- nrow(volcano); n.c <- ncol(volcano)
> x <- 10*(1:n.r); y <- 10*(1:n.c)    # y must span the columns of volcano
> image(x, y, volcano, col = terrain.colors(100), axes = F)
> contour(x, y, volcano, levels = seq(90, 200, by = 5),    # step value assumed
+         add = T, col = "brown")
> axis(1, at = seq(100, 800, by = 100))
> axis(2, at = seq(100, 600, by = 100))
> box()
> title("Maunga Whau Volcano")
24Graphics (3)
> z <- 2 * volcano
> par(bg = "slategray")
> persp(x, y, z, theta = 135, phi = 30, col = "green3",
+       scale = FALSE, ltheta = -120, shade = 0.75,
+       border = NA, box = FALSE)
25Graphics (4)
> z0 <- min(z) - 20    # base level; not defined on the slide, value assumed
> z <- rbind(z0, cbind(z0, z, z0), z0)
> x <- c(min(x) - 1e-10, x, max(x) + 1e-10)
> y <- c(min(y) - 1e-10, y, max(y) + 1e-10)
> fill <- matrix("green3", nrow = nrow(z) - 1, ncol = ncol(z) - 1)
> fill[, i2 <- c(1, ncol(fill))] <- "gray"
> fill[i1 <- c(1, nrow(fill)), ] <- "gray"
> fcol <- fill
> zi <- volcano[-1, -1] + volcano[-1, -61] +
+       volcano[-87, -1] + volcano[-87, -61]
> fcol[-i1, -i2] <- terrain.colors(20)[cut(zi,
+       quantile(zi, seq(0, 1, len = 21)),
+       include.lowest = TRUE)]
> persp(x, y, 2 * z, theta = 110, phi = 40, col = fcol,
+       scale = FALSE, ltheta = -120, shade = 0.4,
+       border = NA, box = FALSE)
26The R Software
- R
- R statistical tools
- R graphical tools
- Examples
- AR
- Conclusions
27AR
- X(n+1) = a X(n) + noise: AR sequence
n <- 200
x <- rep(0,n)
for (i in 4:n) {
  x[i] <- 0.3*x[i-1] - 0.7*x[i-2] + 0.5*x[i-3] + rnorm(1)
}
op <- par(mfrow=c(3,1))
plot(ts(x), main="AR(3)")
acf(x)     # autocorrelation function
pacf(x)    # partial autocorrelation function (lag-k value = last coefficient of a fitted AR(k) model)
par(op)
28AR (1)
29AR (2)
- Same example, but with arima.sim function
n <- 200
x <- arima.sim(list(ar=c(.3,-.7,.5)), n)
op <- par(mfrow=c(3,1))
plot(ts(x), main="AR(3)")
acf(x)
pacf(x)
par(op)
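As a possible follow-up to the simulation above, the coefficients can be estimated back from the simulated series. This sketch is not part of the original slides: it uses `ar()` from the stats package with its default Yule-Walker method, and the seed is an arbitrary choice made only so that the run is repeatable.

```r
set.seed(1)                        # assumed seed, for repeatability only
n <- 200
x <- arima.sim(list(ar = c(0.3, -0.7, 0.5)), n)

# Fit an AR model of fixed order 3 (no automatic order selection).
fit <- ar(x, order.max = 3, aic = FALSE)
print(fit$ar)                      # estimates, to compare with 0.3, -0.7, 0.5

# The lag-3 partial autocorrelation should be close to the third
# estimated coefficient.
p <- pacf(x, plot = FALSE)
print(p$acf[3])
```

With only 200 observations the estimates will differ somewhat from the true values; a longer series should bring them closer.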
30Advantages/Disadvantages
- Advantages
  - Interfaces with C and FORTRAN
  - Graphical flexibility
  - Large library of statistical packages and functions
  - On-line help
  - Easy programming
- Disadvantages
  - Memory problems with huge databases
  - Sometimes lack of proper package description
  - No professional technical support
31Conclusions
- It's free!
- Try it and join the R community
32References
- Documentation
  - An Introduction to R (R-intro). This document is based on the Notes on S-Plus by Bill Venables and David Smith.
  - Writing R Extensions (R-exts).
  - R Data Import/Export (R-data), a guide to importing and exporting data to and from R.
  - The R Language Definition (R-lang), a first version of the "Kernighan & Ritchie of R".
  - R Installation and Administration (R-admin).
- Books on R include
  - P. Dalgaard (2002), Introductory Statistics with R, Springer New York, ISBN 0-387-95475-9.
  - J. Fox (2002), An R and S-Plus Companion to Applied Regression, Sage Publications, ISBN 0-761-92280-6 (softcover) or 0-761-92279-2.
  - J. Maindonald and J. Braun (2003), Data Analysis and Graphics Using R: An Example-Based Approach, Cambridge University Press, ISBN 0-521-81336-0.
  - S. M. Iacus and G. Masarotto (2002), Laboratorio di statistica con R, McGraw-Hill, ISBN 88-386-6084-0 (in Italian).
33References (2)
- Books
  - W. N. Venables and B. D. Ripley (2002), Modern Applied Statistics with S, Fourth Edition, Springer, ISBN 0-387-95457-0.
  - W. N. Venables and B. D. Ripley (2000), S Programming, Springer, ISBN 0-387-98966-8.
  - P. Spector (1994), An Introduction to S and S-Plus, Duxbury Press.
  - A. Krause and M. Olsen (2002), The Basics of S-Plus (Third Edition), Springer, ISBN 0-387-95456-2.
  - J. C. Pinheiro and D. M. Bates (2000), Mixed-Effects Models in S and S-Plus, Springer, ISBN 0-387-98957-0.
  - D. Nolan and T. Speed (2000), Stat Labs: Mathematical Statistics Through Applications, Springer Texts in Statistics, ISBN 0-387-98974-9.
  - R. Ihaka and R. Gentleman (1996), R: A Language for Data Analysis and Graphics, Journal of Computational and Graphical Statistics, 5, 299-314.
- An annotated bibliography (BibTeX format) of R-related publications which includes most of the above references can be found at
  - http://www.R-project.org/doc/bib/R.bib
34Thank you
- Any questions?