R Project - PowerPoint PPT Presentation

Provided by: nog9


1
R Project
  • A programming environment for Data Analysis and
    Graphics

2
The R Software
  • R the tool
  • R statistical tools
  • R graphical tools
  • Examples
  • Conclusions

3
The R Software
  • R
  • What do you need?
  • The R-project
  • Why R? What is R?
  • What R does and does not do
  • Packages
  • Getting help
  • R statistical tools
  • R graphical tools
  • Examples
  • Conclusions

4
What do you need?
  • Performance
  • Functionality
  • Extensibility
  • Simplicity
  • Compatibility
  • Graphical Interface
  • Low-cost

5
The R-project
  • Authors
  • Ross Ihaka and Robert Gentleman
  • Statistics Department of the University of
    Auckland, New Zealand
  • Licence
  • R is available as Free Software
  • Free Software Foundation's GNU General Public
    License, in source code form
  • Platform
  • UNIX (FreeBSD, Linux), Windows, Mac OS
  • Contributions
  • Product of international collaboration
  • top computational statisticians
  • computer language designers
  • Web sites
  • http://www.r-project.org
  • http://cran.r-project.org
  • PACKAGES

6
Why R? What is R?
  • All source code is published, so its correctness
    can be checked by expert statisticians
  • Comprehensive technical documentation and
    user-contributed tutorials
  • It is fully programmable, with its own
    sophisticated computer language
  • Easy to write your own functions
  • Easy to write whole packages
  • Exchange data in MS-Excel, text, and fixed-width
    and delimited formats
  • Easy importing and exporting of datasets
  • Integrated suite of software facilities for data
    manipulation, calculation and graphical display
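As a minimal sketch of the "easy to write your own functions" and "easy importing and exporting" points, the snippet below defines a small helper and round-trips a data frame through a CSV file. The function name `coef_var` is hypothetical, chosen only for illustration; `write.csv()`, `read.csv()` and the built-in `iris` dataset are standard R.

```r
# Hypothetical helper (not part of base R): coefficient of variation,
# i.e. the standard deviation divided by the mean.
coef_var <- function(x, na.rm = FALSE) {
  # sd() and mean() both accept na.rm to drop missing values first
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))
coef_var(c(1, 3, NA), na.rm = TRUE)

# Export the first rows of the built-in iris data to a temporary CSV file
# and read them back in.
tmp <- tempfile(fileext = ".csv")
write.csv(head(iris), tmp, row.names = FALSE)
back <- read.csv(tmp)
str(back)
```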

7
What R does and does not do
  • is not a database, but connects to DBMSs
    (database management systems)
  • has no graphical user interface, but connects
    to Java, Tcl/Tk
  • its language interpreter can be very slow, but
    it allows you to call your own C/C++ code
  • no spreadsheet view of data, but connects to
    Excel/MS Office
  • no professional / commercial support
  • data handling and storage: numeric, textual
  • operators for calculations on arrays and matrices
  • tools for data analysis
  • high-level data analytic and statistical
    functions
  • graphics
  • programming language: loops, branching,
    subroutines
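The last bullet (loops, branching, subroutines) and the remark about the interpreter's speed can be illustrated together. This is a sketch built around a made-up helper, `classify`: the explicit loop works, but the vectorised `ifelse()` version is the more idiomatic R style and is typically faster on long vectors.

```r
# Hypothetical subroutine: label one number by its sign using if/else branching
classify <- function(v) {
  if (v < 0) "negative" else if (v == 0) "zero" else "positive"
}

values <- c(-2, 0, 3)

# Explicit loop: pre-allocate the result, then fill it element by element
labels <- character(length(values))
for (i in seq_along(values)) {
  labels[i] <- classify(values[i])
}
labels

# Vectorised alternative: ifelse() works on the whole vector at once
labels_vec <- ifelse(values < 0, "negative",
                     ifelse(values == 0, "zero", "positive"))
identical(labels, labels_vec)
```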

8
Packages
  • base - The R Base Package
  • class - Functions for classification
  • cluster - Functions for clustering
  • mclust - model-based cluster analysis
  • graphics - The R Graphics Package
  • mle - Maximum likelihood estimation
  • nnet - Feed-forward neural networks and
    multinomial log-linear models
  • spatial - functions for kriging and point pattern
    analysis

9
Packages (2)
  • ctest - classical statistical tests
  • mva - multivariate analysis
  • gstat - multivariable geostatistical modelling,
    prediction and simulation
  • geoR - functions for geostatistical analysis
  • fdim - functions for calculating fractal
    dimensions
  • fields - tools for spatial data
  • ncdf - reading files in the UCAR netCDF format
  • wavethresh - wavelet statistics and transforms
  • Directly downloadable from the internet
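A minimal sketch of installing and loading a contributed package. It assumes that the recommended package `cluster` is installed, which holds for most R distributions but is not guaranteed, so the code checks first with `requireNamespace()`. Note also that in current R releases several packages from these slides no longer exist separately: `ctest` and `mva` were merged into `stats`, and `mle` lives on in `stats4`.

```r
# Installing from CRAN needs internet access, so it is commented out here;
# "geoR" is one of the packages named on the slide.
# install.packages("geoR")

# Load a package only if it is available on this machine
if (requireNamespace("cluster", quietly = TRUE)) {
  library(cluster)
  # pam(): partitioning around medoids, on the standardised iris measurements
  fit <- pam(scale(iris[, 1:4]), k = 3)
  print(table(fit$clustering))
} else {
  message("Package 'cluster' is not installed")
}

# Packages currently attached to the search path
search()
```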

10
Getting help
Details about a specific command whose name you
know (input arguments, options, algorithm,
results): > ?t.test or > help(t.test)

11
Getting help
  • HTML search engine
  • Search for topics with regular expressions:
    help.search

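The help commands above can be combined in a script as sketched below. `help()`, `help.search()`, `apropos()` and `help.start()` are standard functions of R's `utils` package; the search term "kolmogorov" is just an example. Opening help pages or the HTML browser only makes sense in an interactive session, hence the `interactive()` guard.

```r
if (interactive()) {
  help("t.test")   # same as ?t.test
  help.start()     # HTML help, with its search engine, in a web browser
}

# Search the installed help pages (the pattern is a regular expression)
hits <- help.search("kolmogorov")
nrow(hits$matches)   # number of matching help topics

# Objects on the search path whose names match a regular expression
apropos("^t\\.test")
```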
12
The R Software
  • R
  • R statistical tools
  • Math
  • Stats
  • R graphical tools
  • Examples
  • Conclusions

13
Variables
> a <- 49                          # numeric
> sqrt(a)
[1] 7
> a <- "The dog ate my homework"   # character string
> sub("dog", "cat", a)
[1] "The cat ate my homework"
> a <- (1 + 1 == 3)                # logical
> a
[1] FALSE

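To check which of the three basic types a variable holds, `class()` and `typeof()` can be queried directly. A short sketch reusing the slide's values:

```r
a <- 49
class(a)    # "numeric"
typeof(a)   # "double": numbers are stored in double precision by default
a <- "The dog ate my homework"
class(a)    # "character"
nchar(a)    # 23 characters
a <- (1 + 1 == 3)
class(a)    # "logical"
```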
14
Lists
  • vector: an ordered collection of data of the same
    type.
    > a <- c(7, 5, 1)
    > a[2]
    [1] 5
  • list: an ordered collection of data of arbitrary
    types.
    > doe <- list(name = "john", age = 28, married = FALSE)
    > doe$name
    [1] "john"
    > doe$age
    [1] 28
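The difference between vectors and lists shows up mainly in indexing. A small sketch with the slide's objects: single brackets `[` keep the container type, while `[[` (or `$`) extracts a single element.

```r
a <- c(7, 5, 1)
a[2]            # 5: the second element
a[-1]           # 5 1: negative indices drop elements
a[a > 2]        # 7 5: logical indexing

doe <- list(name = "john", age = 28, married = FALSE)
doe[["age"]]    # 28, same as doe$age
doe["age"]      # a list of length 1 that contains age
str(doe)        # one line per component, showing its type
```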

15
Data frames
A data frame is meant to represent the typical data
table that researchers come up with, like a
spreadsheet. It is a rectangular table with rows
and columns; data within each column has the same
type (e.g. number, text, logical), but different
columns may have different types. Example:
> d.f
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE

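The example table could be built with `data.frame()`, as in the sketch below; the column names and row names are taken from the slide, and `row.names` turns the patient codes into row labels. Indexing then works by column (`$`) or by logical row selection.

```r
d.f <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)
d.f
sapply(d.f, class)        # each column keeps its own type
mean(d.f$tumorsize)       # one column extracted as a numeric vector
d.f[d.f$progress, ]       # rows where progress is TRUE
```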
16
R as a simple calculator
> x <- c(1, 3, 2, 10, 5); y <- 1:5   # creation of 2 vectors
> x
[1]  1  3  2 10  5
> x + y
[1]  2  5  5 14 10
> x * y
[1]  1  6  6 40 25
> x / y
[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000
> x ^ y
[1]     1     9     8 10000  3125
> sum(x)      # sum of elements in x
[1] 21
> cumsum(x)   # cumulative sum vector
[1]  1  4  6 16 21
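Arithmetic on vectors of different lengths follows R's recycling rule: the shorter vector is repeated to match the longer one. A short sketch; the second line is deliberately mismatched to show that R still computes a result but emits a warning.

```r
x <- c(1, 3, 2, 10, 5)
x * 2          # the scalar is recycled: 2 6 4 20 10
x + c(0, 100)  # 1 103 2 110 5, with a warning: 5 is not a multiple of 2
prod(x)        # product of the elements: 300
```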

17
Basic math/stat tools
> # 20 numbers between 0 and 20,
> # rounded to one decimal place
> x <- round(runif(20, 0, 20), digits = 1)
> x
 [1] 10.0  1.6  2.5 15.2  3.1 12.6 19.4  6.1
 [9]  9.2 10.9  9.5 14.1 14.3 14.3 12.8
[16] 15.9  0.1 13.1  8.5  8.7
> min(x)
[1] 0.1
> max(x)
[1] 19.4
> median(x)
[1] 10.45
> mean(x)
[1] 10.095
> var(x)               # variance
[1] 27.43734
> sd(x)                # standard deviation
[1] 5.238067
> sqrt(var(x))
[1] 5.238067
> length(x)
[1] 20
> round(x)
 [1] 10  2  2 15  3 13 19  6  9 11 10 14 14 14 13 16  0 13  8  9
> cor(x, sin(x/20))    # correlation
[1] 0.997286
> quantile(x)          # quantiles
   0%   25%   50%   75%  100%
 0.10  7.90 10.45 14.15 19.40

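Because `runif()` draws new numbers each time, the sketch below types in the sample printed above so that the statistics can be reproduced; `set.seed()` is the usual way to make fresh random draws repeatable. The quartiles follow R's default rule (`type = 7`): the 25% quantile sits at position 1 + 0.25 × (20 - 1) = 5.75 of the sorted data, so it equals 6.1 + 0.75 × (8.5 - 6.1) = 7.90.

```r
# The sample printed on the slide
x <- c(10.0, 1.6, 2.5, 15.2, 3.1, 12.6, 19.4, 6.1, 9.2, 10.9,
       9.5, 14.1, 14.3, 14.3, 12.8, 15.9, 0.1, 13.1, 8.5, 8.7)
summary(x)          # min, quartiles, median, mean and max in one call
quantile(x)         # 0.10 7.90 10.45 14.15 19.40
sort(x)[5:6]        # 6.1 8.5: the two values the 25% quantile lies between

# For new random data, fix the seed first (42 is an arbitrary choice)
set.seed(42)
y <- round(runif(20, 0, 20), digits = 1)
all.equal(sd(y), sqrt(var(y)))   # TRUE: sd is the square root of var
```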
18
Basic Mathematical tools
> x <- y <- seq(-4*pi, 4*pi, length.out = 27)
> r <- sqrt(outer(x^2, y^2, "+"))
> z <- cos(r^2)*exp(-r/6)

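The grid `z` defined above can be displayed as a grey-scale image with contour lines on top. A minimal sketch using base graphics (`image()`, `contour()` and `gray()` are standard; the 33-level grey palette is an arbitrary choice):

```r
x <- y <- seq(-4 * pi, 4 * pi, length.out = 27)
r <- sqrt(outer(x^2, y^2, "+"))   # distance of each grid point from the origin
z <- cos(r^2) * exp(-r / 6)       # 27 x 27 matrix of heights

image(z, col = gray((0:32) / 32), axes = FALSE,
      main = expression(cos(r^2) * e^{-r/6}))
contour(z, add = TRUE, drawlabels = FALSE)   # overlay unlabelled contour lines
```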
19
Statistical tools
  • Sample tests
  • Checking normality
  • Kolmogorov-Smirnov test

> # generate 500 observations from the uniform (0,1) distribution
> F500 <- runif(500); a <- c(mean(F500), sd(F500))
> qqnorm(F500)   # normal probability plot
> qqline(F500)   # an ideal sample will fall near the straight line
> ks.test(F500, "pnorm", mean = a[1], sd = a[2])

        One-sample Kolmogorov-Smirnov test

data:  F500
D = 0.0655, p-value = 0.02742
alternative hypothesis: two.sided

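Two caveats apply to the test above. First, when the mean and standard deviation are estimated from the same sample, the standard Kolmogorov-Smirnov p-value is no longer exact (it tends to be conservative); the Lilliefors variant addresses this and is available, for example, as `lillie.test()` in the contributed nortest package. Second, the data really come from a uniform distribution, so testing against `"punif"` is the natural check, while `shapiro.test()` is a common alternative test of normality. A sketch with an arbitrary seed:

```r
set.seed(1)                      # arbitrary seed for reproducibility
F500 <- runif(500)

# KS test against the distribution the data were actually drawn from
ks_unif <- ks.test(F500, "punif", min = 0, max = 1)
ks_unif$p.value

# Shapiro-Wilk test of normality (valid for 3 to 5000 observations)
sw <- shapiro.test(F500)
sw$p.value   # typically very small for 500 uniform draws
```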
20
The R Software
  • R
  • R statistical tools
  • R graphical tools
  • Graphical options
  • Examples
  • Examples
  • Conclusions

21
Graphical options
  • Multiple plots in a single graphic window
  • par(mfrow = c(1, 2)) allows you to have two plots
    side by side
  • par(mfrow = c(2, 3)) allows 6 plots to appear on a
    page (2 rows of 3 plots each)
  • Adjusting graphical parameters
  • Labels, title and axis limits
  • Types for plots and lines
  • Colors and characters
  • Controlling the axis line
  • Controlling tick marks
  • Legend
  • Putting text on the plot, controlling the text
    size
  • Adding symbols to plots
  • Adding arrows and line segments
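Several of the options listed above can be combined in one short script. This sketch uses only base graphics functions (`par`, `plot`, `lines`, `legend`, `axis`, `text`, `arrows`, `segments`); the data, colours and labels are arbitrary choices for illustration.

```r
x <- seq(0, 2 * pi, length.out = 50)
op <- par(mfrow = c(1, 2))                     # two plots side by side

# Left: line plot with labels, title, axis limits and a legend
plot(x, sin(x), type = "l", col = "blue", lwd = 2,
     xlab = "x (radians)", ylab = "value", main = "Lines",
     xlim = c(0, 2 * pi), ylim = c(-1.2, 1.2))
lines(x, cos(x), lty = 2, col = "red")         # add a second, dashed line
legend("bottomleft", legend = c("sin", "cos"),
       col = c("blue", "red"), lty = c(1, 2), cex = 0.8)

# Right: points with custom axes, text, an arrow and a line segment
plot(x, sin(x), type = "p", pch = 19, col = "darkgreen", axes = FALSE,
     xlab = "x", ylab = "sin(x)", main = "Points, text, arrows")
axis(1, at = c(0, pi, 2 * pi), labels = c("0", "pi", "2pi"))  # custom ticks
axis(2, las = 1)                               # horizontal tick labels
box()
text(pi, 0.5, "crossing at pi", cex = 0.8)     # text size set via cex
arrows(pi, 0.4, pi, 0.05, length = 0.1)        # arrow pointing at (pi, 0)
segments(0, 0, 2 * pi, 0, lty = 3)             # dotted horizontal segment

par(op)                                        # restore previous settings
```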

22
Graphics (1)
> data(volcano)
> par(mfrow = c(2, 2))
> image(volcano, main = "heat.colors", col = heat.colors(15))
> image(volcano, main = "rainbow", col = rainbow(15))
> image(volcano, main = "topo", col = topo.colors(15))
> image(volcano, main = "terrain.colors", col = terrain.colors(15))

23
Graphics (2)
> data(volcano)
> n.r <- nrow(volcano); n.c <- ncol(volcano)
> x <- 10 * (1:n.r); y <- 10 * (1:n.c)   # 10 m grid spacing
> image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
> contour(x, y, volcano, levels = seq(90, 200, by = 5),
+         add = TRUE, col = "brown")
> axis(1, at = seq(100, 800, by = 100))
> axis(2, at = seq(100, 600, by = 100))
> box()
> title("Maunga Whau Volcano")
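To keep a figure like this one rather than only display it on screen, the same calls can be sent to a file device. A sketch assuming PDF output; the file name is a hypothetical choice placed in R's temporary directory, and the contour step of 5 is likewise an assumption.

```r
out <- file.path(tempdir(), "maunga-whau.pdf")   # hypothetical output path
pdf(out, width = 7, height = 6)      # open a PDF file as the graphics device

x <- 10 * (1:nrow(volcano))          # 10 m grid spacing along the rows
y <- 10 * (1:ncol(volcano))          # 10 m grid spacing along the columns
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "brown")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title("Maunga Whau Volcano")

invisible(dev.off())                 # closing the device writes the file
file.exists(out)
```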

24
Graphics (3)
> z <- 2 * volcano   # exaggerate the relief
> par(bg = "slategray")
> persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
+       ltheta = -120, shade = 0.75, border = NA, box = FALSE)

25
Graphics (4)
> z0 <- min(z) - 20   # base level for the sides of the block
> z <- rbind(z0, cbind(z0, z, z0), z0)
> x <- c(min(x) - 1e-10, x, max(x) + 1e-10)
> y <- c(min(y) - 1e-10, y, max(y) + 1e-10)
> fill <- matrix("green3", nrow = nrow(z) - 1, ncol = ncol(z) - 1)
> fill[, i2 <- c(1, ncol(fill))] <- "gray"   # side facets in gray
> fill[i1 <- c(1, nrow(fill)), ] <- "gray"
> fcol <- fill
> # sum of the four corner heights of each facet on the top surface
> zi <- volcano[-1, -1] + volcano[-1, -61] +
+       volcano[-87, -1] + volcano[-87, -61]
> fcol[-i1, -i2] <- terrain.colors(20)[cut(zi,
+     quantile(zi, seq(0, 1, length.out = 21)),
+     include.lowest = TRUE)]
> persp(x, y, 2 * z, theta = 110, phi = 40, col = fcol, scale = FALSE,
+       ltheta = -120, shade = 0.4, border = NA, box = FALSE)

26
The R Software
  • R
  • R statistical tools
  • R graphical tools
  • Examples
  • AR
  • Conclusions

27
AR
  • X(n+1) = a X(n) + noise: an AR sequence
> n <- 200
> x <- rep(0, n)
> for (i in 4:n)
+   x[i] <- 0.3*x[i-1] - 0.7*x[i-2] + 0.5*x[i-3] + rnorm(1)
> op <- par(mfrow = c(3, 1))
> plot(ts(x), main = "AR(3)")
> acf(x)    # autocorrelation function
> pacf(x)   # partial autocorrelation: lag k = last coefficient of a fitted AR(k)
> par(op)

28
AR (1)
29
AR (2)
  • Same example, but with the arima.sim function
> n <- 200
> x <- arima.sim(list(ar = c(.3, -.7, .5)), n)
> op <- par(mfrow = c(3, 1))
> plot(ts(x), main = "AR(3)")
> acf(x)
> pacf(x)
> par(op)
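Two checks round off the AR example. `arima.sim()` only accepts stationary AR coefficients, which can be confirmed by verifying that every root of 1 - 0.3z + 0.7z² - 0.5z³ lies outside the unit circle; and the coefficients can be estimated back from a simulated series with `arima()`. A sketch with an arbitrary seed (without it, the estimates vary from run to run):

```r
# Stationarity check: every root of the AR polynomial must have modulus > 1
all(Mod(polyroot(c(1, -0.3, 0.7, -0.5))) > 1)   # TRUE

set.seed(123)                                   # arbitrary seed
x <- arima.sim(list(ar = c(0.3, -0.7, 0.5)), n = 200)

# Fit an AR(3) by maximum likelihood; the series has mean zero by construction
fit <- arima(x, order = c(3, 0, 0), include.mean = FALSE)
coef(fit)            # estimates of the three AR coefficients

# Alternatively, let ar() choose the order by AIC (Yule-Walker by default)
sel <- ar(x, order.max = 10)
sel$order
```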

30
Advantages/Disadvantages
  • Advantages
  • Interfaces with C and FORTRAN
  • Graphical flexibility
  • Large library of statistical packages and
    functions
  • On-line help
  • Easy programming
  • Disadvantages
  • Memory problems with huge databases
  • Some packages lack a proper description
  • Limited technical support

31
Conclusions
  • It's free!
  • Try it and join the R community

32
References
  • Documentation
  • An Introduction to R (R-intro). This document
    is based on the Notes on S-Plus by Bill
    Venables and David Smith.
  • Writing R Extensions (R-exts).
  • R Data Import/Export (R-data), a guide to importing
    and exporting data to and from R.
  • The R Language Definition (R-lang), a first
    version of the "Kernighan & Ritchie" of R.
  • R Installation and Administration (R-admin).
  • Books on R include:
  • P. Dalgaard (2002), Introductory Statistics with
    R, Springer, New York, ISBN 0-387-95475-9.
  • J. Fox (2002), An R and S-Plus Companion to
    Applied Regression, Sage Publications, ISBN
    0-761-92280-6 (softcover) or 0-761-92279-2.
  • J. Maindonald and J. Braun (2003), Data Analysis
    and Graphics Using R: An Example-Based Approach,
    Cambridge University Press, ISBN 0-521-81336-0.
  • S. M. Iacus and G. Masarotto (2002), Laboratorio
    di statistica con R, McGraw-Hill, ISBN
    88-386-6084-0 (in Italian).

33
References (2)
  • Books
  • W. N. Venables and B. D. Ripley (2002), Modern
    Applied Statistics with S, Fourth Edition,
    Springer, ISBN 0-387-95457-0.
  • W. N. Venables and B. D. Ripley (2000), S
    Programming, Springer, ISBN 0-387-98966-8.
  • P. Spector (1994), An Introduction to S and
    S-Plus, Duxbury Press.
  • A. Krause and M. Olson (2002), The Basics of
    S-Plus (Third Edition), Springer, ISBN
    0-387-95456-2.
  • J. C. Pinheiro and D. M. Bates (2000),
    Mixed-Effects Models in S and S-Plus, Springer,
    ISBN 0-387-98957-0.
  • D. Nolan and T. Speed (2000), Stat Labs:
    Mathematical Statistics Through Applications,
    Springer Texts in Statistics, ISBN 0-387-98974-9.
  • R. Ihaka and R. Gentleman (1996), R: A Language
    for Data Analysis and Graphics, Journal of
    Computational and Graphical Statistics, 5, 299-314.
  • An annotated bibliography (BibTeX format) of
    R-related publications, which includes most of the
    above references, can be found at
    http://www.R-project.org/doc/bib/R.bib

34
Thank you. Any questions?