1
R Code Optimization
  • Matthew T. Pratola
  • Dept. of Statistics and Actuarial Science
  • Simon Fraser University
  • 10.19.2004

2
Outline
  • A quick bit about R
  • Why is it slow?
  • How can we speed it up?
  • Conclusions

3
R Code Optimization
  • R is an interpreted language
  • Easy to author code to reflect an idea due to
    the friendliness of the language
  • Worry about efficiency later

4
Why might it be slow?
  • Computers operate on a few basic datatypes:
    integers, floats and chars, to name the main types
  • R doesn't operate directly on these underlying
    datatypes; instead it works on abstract
    representations of these (and more complex)
    types.

5
Why might it be slow?
  • Why? So that strange things like NAs, data frames,
    etc., can be easily created, accessed and
    manipulated by the end user

6
Why might it be slow?
  • Read: abstract representations => think: more
    memory used to represent a given unit of data
  • This means that for every operation you do on
    your data, your computer must spend extra time
    moving around all this memory, much of which
    doesn't directly have anything to do with the
    value of interest
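
A rough way to see this overhead is to compare the size of an R object with the size of the raw value it holds. The sketch below uses `utils::object.size`; the variable names are illustrative, and the exact byte counts depend on the R version and platform, so none are asserted here.

```r
# A single integer needs 4 bytes of payload, but the R object holding it
# also carries a header (type, length, attribute information, ...).
one_int <- 1L
size_one <- as.numeric(utils::object.size(one_int))  # bytes for the whole object

# A long vector spreads that header over many elements.
many_ints <- integer(1e6)
bytes_per_element <- as.numeric(utils::object.size(many_ints)) / length(many_ints)

cat("one integer object:", size_one, "bytes\n")
cat("per element in a long vector:", bytes_per_element, "bytes\n")
```

The per-element cost of the long vector should come out close to the raw 4 bytes, while the single value costs considerably more.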

7
How to make it faster?
  • Standard approaches
  • Investigate algorithmic design of your program
  • Optimize memory access patterns
  • Others

8
How?
  • Avoid using data frames within your code
  • Simple example:
  • m <- matrix(nrow=2000, ncol=1)
  • for(i in (1:2000)) m[i,1] <- 1
  • vs.
  • m <- as.data.frame(m)
  • for(i in (1:2000)) m[i,1] <- 1

9
How?
  • Difference for the above was 0.05s (matrix) vs.
    5.47s (data frame)
  • Fix: only convert your data to a data frame at
    the end of the program, i.e. for the user
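
The comparison can be reproduced with a sketch along these lines. The names `d`, `result`, `t_matrix` and `t_frame` are illustrative and not from the slides, and the timings will differ from the 2004 figures on current hardware and R versions.

```r
n <- 2000

# Element-wise assignment into a plain matrix.
m <- matrix(0, nrow = n, ncol = 1)
t_matrix <- system.time(for (i in 1:n) m[i, 1] <- 1)

# The same loop run against a data frame.
d <- as.data.frame(matrix(0, nrow = n, ncol = 1))
t_frame <- system.time(for (i in 1:n) d[i, 1] <- 1)

# Do the work on the matrix and convert once at the end, for the user.
result <- as.data.frame(m)

print(t_matrix["elapsed"])
print(t_frame["elapsed"])
```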

10
How?
  • Another good trick: avoid the use of rbind/cbind
    if possible during long (intensive) loops
  • Instead, pre-allocate the entire (or, if not
    possible, a very large) array initialized to
    NAs, then directly assign the values using a row
    (or col) counter

11
How?
  • E.g.
  • x <- c(1,1,1,1)
  • for(i in 1:5000) x <- rbind(x, c(i,i,i,i))
  • vs.
  • x <- matrix(nrow=5000, ncol=4)
  • for(i in 1:5000) x[i,] <- c(i,i,i,i)
  • 0.15s (pre-allocated) vs. 12.45s (rbind)
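
A self-contained version of this comparison might look like the following. The function names `grow` and `prealloc` are illustrative, and the growing version starts from an empty 0-row matrix rather than `c(1,1,1,1)` so that both results have the same shape; treat it as a sketch rather than the author's exact benchmark.

```r
n <- 5000

# Growing: every rbind() allocates a new, larger matrix and copies the old rows.
grow <- function(n) {
  x <- matrix(nrow = 0, ncol = 4)
  for (i in 1:n) x <- rbind(x, c(i, i, i, i))
  x
}

# Pre-allocating: create an NA-filled matrix once, then fill it in place
# using the loop index as the row counter.
prealloc <- function(n) {
  x <- matrix(nrow = n, ncol = 4)
  for (i in 1:n) x[i, ] <- c(i, i, i, i)
  x
}

t_grow <- system.time(a <- grow(n))
t_pre  <- system.time(b <- prealloc(n))

print(t_grow["elapsed"])
print(t_pre["elapsed"])
```

If the final number of rows is not known in advance, the same pattern applies with a generously sized matrix that is trimmed to the rows actually used once the loop finishes.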

12
How?
  • Another trick: often you have a main loop in
    your code, perhaps updating many matrices
  • But it could be that there is no interdependence
    amongst the matrices you are updating

13
How?
  • Consider
  • for() {
  •   blah
  •   blah2
  • }
  • vs.
  • for() { blah }
  • for() { blah2 }

14
How?
  • This second approach can often yield a reasonable
    gain in a very long, intensive loop
  • Optimizing compilers (C, Fortran, etc.) can do
    this kind of loop splitting automatically, but
    R does not
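
As a concrete instance of the pattern, the sketch below updates two independent matrices first in one shared loop and then in two separate loops. The matrices, the `steps` vector and the update rules are made up for illustration; whether the split version is actually faster on a given R version is something to measure rather than assume.

```r
n <- 1000
steps <- c(1, 2, 3)

# One loop updating two matrices that do not depend on each other.
a1 <- matrix(0, nrow = n, ncol = 3)
b1 <- matrix(0, nrow = n, ncol = 3)
t_fused <- system.time(
  for (i in 1:n) {
    a1[i, ] <- i * steps
    b1[i, ] <- i + steps
  }
)

# The same work split into one loop per matrix.
a2 <- matrix(0, nrow = n, ncol = 3)
b2 <- matrix(0, nrow = n, ncol = 3)
t_split <- system.time({
  for (i in 1:n) a2[i, ] <- i * steps
  for (i in 1:n) b2[i, ] <- i + steps
})

print(t_fused["elapsed"])
print(t_split["elapsed"])
```

The split is only valid because neither update reads the other matrix; both versions must produce identical results.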

15
How?
  • A final idea: avoid excessive use of string
    manipulation functions, such as
  • as.numeric(unlist(strsplit(myvar,split)))
  • Working around excessive use of this type of code
    can provide some improvement
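
One possible workaround, sketched under the assumption that the strings are known before the main loop starts, is to parse them once into a numeric matrix and reuse the numbers. The `records` data and the loop of 100 passes are hypothetical.

```r
records <- c("1,2,3", "4,5,6", "7,8,9")  # hypothetical comma-separated input

# Pattern to avoid: re-parse the strings every time the values are needed.
total_slow <- 0
for (pass in 1:100) {
  for (r in records) {
    total_slow <- total_slow + sum(as.numeric(unlist(strsplit(r, ","))))
  }
}

# Workaround: parse once, up front, into a numeric matrix (one row per record).
parsed <- matrix(as.numeric(unlist(strsplit(records, ","))),
                 nrow = length(records), byrow = TRUE)
total_fast <- 0
for (pass in 1:100) total_fast <- total_fast + sum(parsed)
```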

16
How?
  • Other more detailed optimizations can yield some
    improvement, although past these simple
    approaches, the percentage improvement will
    quickly fall to single digit levels
  • Recognize that many basic R operations (vector
    arithmetic, sum, etc.) are implemented in
    compiled C code, so if you write your code to
    fit well with these operations, you will
    retain execution speed
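
Writing code that "fits well" with the built-in operations usually means operating on whole vectors instead of looping over elements. A minimal sketch, with an arbitrary sum-of-squares computation chosen for illustration:

```r
x <- as.numeric(1:100000)

# Explicit loop: every iteration is evaluated by the R interpreter.
sq_loop <- numeric(length(x))
t_loop <- system.time(for (i in seq_along(x)) sq_loop[i] <- x[i]^2)
total_loop <- sum(sq_loop)

# Vectorized: ^ and sum() each run once over the whole vector in compiled code.
t_vec <- system.time(total_vec <- sum(x^2))

print(t_loop["elapsed"])
print(t_vec["elapsed"])
```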

17
How? - system.time
  • Quick and easy way to test, say, one particular
    function
  • gc() forces a garbage collection event; run it
    first so a collection does not skew the timing
  • system.time(myfunction())
  • gives you output like
  • [1] 0.64 0.00 0.66 NA NA
  • Format: (user) (system) (elapsed) (NA) (NA); the
    last two are child-process times
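
Put together, a timing run could look like this sketch. `myfunction` is a hypothetical stand-in for the code under test, and recent R versions print the result with named fields rather than the bare vector shown on the slide.

```r
# Hypothetical function to time; substitute your own.
myfunction <- function(n = 200000) sum(sqrt(seq_len(n)))

invisible(gc())  # collect garbage first so it is less likely to land in the timing

# Note the call myfunction(): passing the bare name would time nothing useful.
timing <- system.time(result <- myfunction())

print(timing)                            # user, system and elapsed times
elapsed_seconds <- timing[["elapsed"]]   # wall-clock seconds as a plain number
```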

18
How? - Profiling
  • Use Rprof to gain insight into where your code is
    slow
  • Rprof("Rprof.dat")
  • <code>
  • Rprof(NULL)
  • Get profile output by running
  • R CMD Rprof Rprof.dat > profile.txt

19
How? - Profiling
  • You should always use a high-level function
    when you are profiling, or else Rprof may not
    report the time you expect it will report
  • Not sure if this is available on Windows
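
A profiling session wrapped in a high-level function might be sketched as follows. `slow_work` and `run_all` are hypothetical, and the sketch reads the profile back with `summaryRprof()` from the utils package as an in-session alternative to the `R CMD Rprof` command line. It assumes R was built with profiling support and that the workload runs long enough to be sampled.

```r
# Hypothetical workload: repeatedly grows a matrix with rbind().
slow_work <- function(n = 200) {
  x <- matrix(nrow = 0, ncol = 4)
  for (i in 1:n) x <- rbind(x, c(i, i, i, i))
  x
}

# High-level entry point, so the profile is attributed to one top-level call.
run_all <- function(reps = 2000) {
  for (k in seq_len(reps)) slow_work()
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start sampling the call stack
run_all()
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.self))  # functions ranked by time spent in their own code
```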

20
Conclusion
  • Speedup is possible in R using a few tricks
  • Sometimes these are easy to implement, sometimes
    they are not
  • Try to achieve a balance between code that conveys
    an idea and code that runs quickly
  • Try simple approaches that are within your
    comfort zone

21
Questions?