Introduction to R - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to R


Slides: 46
Provided by: bhe5

Transcript and Presenter's Notes

Title: Introduction to R


1
Introduction to R
  • Summer session Lecture 3
  • Brian Healy

2
Outline
  • Discussion of R
  • Importing and changing data
  • Creating your own data
  • Summary statistics / graphs
  • Tests for normality

3
What is R?
  • Statistical computer language similar to S-plus
  • Has many built-in statistical functions
  • Easy to build your own functions (similar to SAS
    macros)
  • Good graphic displays
  • Extensive help files

4
Strengths
  • Many built-in functions
  • Can get other functions from the internet by
    downloading libraries
  • Relatively easy data manipulations

Weaknesses
  • Not as commonly used by non-statisticians
  • Many datasets already in SAS form

5
Starting R
  • Windows / HSPH computers
  • Open using the start menu or g drive
  • Unix / Telnet in HSPH from home
  • Open using R1.9 command
  • Unix specific commands will be discussed at the
    end of the session

6
Writing R code
  • Can input lines one at a time into R
  • Can write many lines of code in a text editor and
    run all at once
  • Using Windows version, simply paste the commands
    into R
  • Using Unix version, save the commands and run in
    batch mode

7
Types of commands
  • Defining variables
  • Inputting data
  • Using built-in functions
  • Using the help menu and notation
  • ?functionname, help.search("functionname")
  • Writing your own functions

8
Language layout
  • Three types of statement
  • expression: it is evaluated, printed, and the
    value is lost (3+5)
  • assignment: passes the value to a variable but
    the result is not printed automatically
    (out <- 3+5)
  • comment: # This is a comment

9
Naming conventions
  • Any roman letters, digits, and . (non-initial
    position)
  • Avoid using system names c, q, s, t, C, D, F, I,
    T, diff, mean, pi, range, rank, tree, var
  • These rules hold for names of variables, data and functions

10
Arithmetic operations and functions
  • Most operations in R are similar to Excel and
    calculators
  • Basic: + (add), - (subtract), * (multiply),
    / (divide)
  • Exponentiation: ^
  • Remainder or modulo operator: %%
  • Matrix multiplication: %*%
  • sin(x), cos(x), cosh(x), tan(x), tanh(x),
    acos(x), acosh(x), asin(x), asinh(x), atan(x),
    atan2(y, x), atanh(x)
  • abs(x), ceiling(x), floor(x)
  • exp(x), log(x, base=exp(1)), log10(x), sqrt(x),
    trunc(x) (drops the fractional part, rounding toward zero)
  • max(), min()
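The operators above can be tried directly at the prompt. The following is a small sketch with arbitrary numbers. It shows operator precedence, ^ and %%, and the difference between * (element by element) and %*% (matrix multiplication). Note that R spells the two-argument arctangent atan2(y, x).

```r
r1 <- 3 + 4 * 2      # 11: * is done before +
r2 <- 2^3            # 8: exponentiation
r3 <- 7 %% 3         # 1: remainder after dividing 7 by 3
A  <- matrix(1:4, 2, 2)
I2 <- diag(2)        # 2 x 2 identity matrix
AI <- A %*% I2       # matrix product: equals A
AA <- A * A          # element-by-element product, NOT matrix multiplication
t1 <- trunc(-2.7)    # -2: trunc rounds toward zero
f1 <- floor(-2.7)    # -3: floor rounds down
ang <- atan2(1, 1)   # pi/4: two-argument arctangent, atan2(y, x)
l10 <- log(100, base = 10)   # 2
```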

11
Defining new variables
  • Assignment symbol: use <- (= also works; the old S-PLUS
    underscore _ is no longer accepted for assignment in R)
  • Scalars
  • scal <- 6
  • value <- 7
  • Vectors
  • vec <- c(0,1,2)
  • vec2 <- c(1:10)
  • vec3 <- c(8,6,4,2,10,12,14)
  • famnames <- c("Kate", "Andrew", "Brian")
  • Variable names are case sensitive

12
Examples
  • Try the following
  • 342
  • 5scal
  • (valuescal)2
  • 3vec
  • sqrt(vec2)
  • trunc(log(vec3))
  • What happened in the last three cases?
  • How can we assign the result of these to a new
    variable?
  • What is the minimum of vec3?

13
Indexing vectors
  • To choose individual observations from a vector,
    use square brackets, e.g. vec[1]
  • Try the following
  • Find the 3rd value from vec
  • List the 4th, 5th, and 6th values from vec2
  • Make a new variable that is only the first 2
    values from vec2
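Here is a minimal sketch of the indexing exercises. It assumes vec2 holds the integers 1 to 10 (vec2 <- 1:10).

```r
vec  <- c(0, 1, 2)
vec2 <- 1:10                 # assumed contents of vec2
third  <- vec[3]             # 2: the 3rd value of vec
middle <- vec2[4:6]          # 4 5 6: the 4th, 5th and 6th values
same   <- vec2[c(4, 5, 6)]   # the same values, using an explicit index vector
first2 <- vec2[1:2]          # new variable holding the first 2 values
```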

14
Matrix
  • There are several ways to make a matrix
  • To make a 2x3 (2 rows, 3 columns) matrix of 0s
  • mat <- matrix(0,2,3)
  • To make a 4x2 matrix with rows (71,172), (73,169),
    (69,160) and (65,130), bind the rows or bind the columns
  • mat2 <- rbind(c(71,172),c(73,169),c(69,160),c(65,130))
  • mat3 <- cbind(c(71,73,69,65),c(172,169,160,130))
  • To fill a 2x5 matrix row by row from vec2
  • mat4 <- matrix(vec2,2,5, byrow=T)
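The following is a short check of the matrix constructions on this slide. Here rbind() and cbind() build the same 4x2 matrix, and byrow changes the fill order. vec2 is assumed to be 1:10.

```r
mat  <- matrix(0, 2, 3)                     # 2 x 3 matrix of zeros
mat2 <- rbind(c(71, 172), c(73, 169), c(69, 160), c(65, 130))  # stack rows
mat3 <- cbind(c(71, 73, 69, 65), c(172, 169, 160, 130))        # join columns
same <- identical(mat2, mat3)               # TRUE: the same 4 x 2 matrix
vec2 <- 1:10                                # assumed contents of vec2
mat4 <- matrix(vec2, 2, 5, byrow = TRUE)    # row by row: first row is 1 2 3 4 5
mat5 <- matrix(vec2, 2, 5)                  # default, column by column: first row is 1 3 5 7 9
```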

15
Matrix Examples
  • Create the following matrices
  • ex1
  • ex2 ex3

16
Rows and columns of matrices
  • You can pick an individual data point, row or
    column from a matrix
  • mat[1,2] will give the observation in the first
    row, second column
  • What happens when you type
  • mat2[,2]
  • mat2[2,]
  • mat413,1

17
Lists
  • Another form of data is the list
  • Type the following code
  • ourlist <- list(v1=vec, v2=vec2, fam=famnames)
  • Now, let's look at ourlist
  • If there are no names, we can give the elements
    of our list names using
  • names(ourlist) <- c("v1", "v2", "fam")
  • To get individual parts of a list we can use the
    $ sign
  • What happens when you type ourlist$v1?
  • What happens when you type ourlist$v1[1]?
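This sketch runs the list commands on the vectors defined earlier. It also shows [[ ]] as a second way of extracting one element. Naming an unnamed list afterwards gives the same result as naming the elements when the list is built.

```r
vec      <- c(0, 1, 2)
vec2     <- 1:10
famnames <- c("Kate", "Andrew", "Brian")
ourlist  <- list(v1 = vec, v2 = vec2, fam = famnames)
ourlist$v1            # the whole element named v1
ourlist$v1[1]         # its first value
ourlist[["fam"]]      # [[ ]] also extracts a single element by name
unnamed <- list(vec, vec2, famnames)
names(unnamed) <- c("v1", "v2", "fam")   # names are given as quoted strings
```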

18
Data frames
  • Very similar to matrices in form
  • Different columns can be of different types,
    unlike matrices
  • famages <- c(24, 29, 27)
  • famheight <- c(64, 73, 71)
  • We can put my family names, ages and heights into
    a data frame
  • fam <- data.frame(famnames, famages, famheight)

19
Converting data
  • To change a matrix to a data frame
  • as.data.frame()
  • To change a data frame to a matrix use
  • fammat <- as.matrix(fam)
  • Try this on your own
  • How does this change?
  • What happens when we type fam,23 and
    fammat,23
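This sketch shows why the conversion changes things. A data frame keeps each column's own type, but a matrix holds a single type. So once the names column is included, as.matrix() turns the whole family table into character strings, and arithmetic on its columns no longer works.

```r
famnames  <- c("Kate", "Andrew", "Brian")
famages   <- c(24, 29, 27)
famheight <- c(64, 73, 71)
fam <- data.frame(famnames, famages, famheight)
str(fam)                   # each column keeps its own type
fammat <- as.matrix(fam)   # a matrix holds one type only
is.character(fammat)       # TRUE: the numbers became text
older <- fam[, 2] + 1      # arithmetic works on the data frame column
# fammat[, 2] + 1          # would fail: non-numeric argument
```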

20
Opening data
  • You must always know the directory where your
    files have been stored
  • In R for Windows, use \\ in file paths
  • c:\\splus\\free.dat
  • In Unix, splus/free.dat
  • For now we will show how to do this in Windows

21
Reading in data
  • Change the directory to g:\shared\bio271summer
    (e.g. setwd("g:/shared/bio271summer"))
  • Let's look at the class data set in the notepad
  • To read in data, use this command
    class <- read.table("class.dat", header=T)
  • This command assumes that the data is space or
    tab delimited
  • The data are read in as a data frame

22
Working with the data
  • Type class to look at the data
  • How could we find people's heights in cm?
  • Use cmheight <- class[,3]*2.54
  • This command completes the operation on the
    entire column as we discussed before
  • We can add this new variable to the old
    dataset using newclass <- cbind(class,cmheight)
  • rbind is used to combine extra rows
  • In this example, that would be more students
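class.dat is not reproduced here, so this sketch uses a small hypothetical data frame with the layout the slide assumes (name, age, height in inches). The names and numbers are made up.

```r
# Hypothetical stand-in for class.dat: name, age, height (inches)
class <- data.frame(name   = c("Ann", "Bob", "Cy"),
                    age    = c(23, 31, 27),
                    height = c(65, 70, 68))
cmheight <- class[, 3] * 2.54          # whole column converted at once
newclass <- cbind(class, cmheight)     # cmheight becomes a 4th column
extra  <- data.frame(name = "Dee", age = 25, height = 62)
bigger <- rbind(class, extra)          # rbind adds rows; column names must match
```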

23
Reading in data cont.
  • Look at the data in auto.dat in the notepad
  • Note that data is comma delimited
  • Try the read.table method
  • What is wrong? Look at ?read.table
  • auto <- read.table("auto2.dat", sep=",", header=T)
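This is a self-contained sketch of the comma-delimited read. Hypothetical lines stand in for auto2.dat, and the columns and values are made up. read.table(text = ...) reads from a string instead of a file. read.csv() is the built-in shortcut for comma-separated files with a header row.

```r
# Hypothetical comma-delimited data standing in for auto2.dat
txt <- "make,price,mpg
AMC,4099,22
Buick,4816,
Honda,5149,28"
auto  <- read.table(text = txt, sep = ",", header = TRUE)
auto                        # the empty mpg field is read as NA
auto2 <- read.csv(text = txt)   # same result: sep = "," and header = TRUE are defaults
```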

24
Practice
  • Find the minimum height in the class
  • Add two family members to your class data set
  • Make a new variable mpg from the third column of
    auto

25
Outputting data
  • To write a vector to a file, use write()
  • write(x, file="outdata")
  • To write matrices or data frames, define
  • write.matrix <- function(x, file, sep) {
        x <- as.matrix(x)
        p <- ncol(x)
        cat(dimnames(x)[[2]], format(t(x)), file=file,
            sep=c(rep(sep, p-1), "\n"))
    }
  • Try to write your family dataset to your P:
    drive
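Base R's write.table() also writes a data frame directly, without a hand-written function. In this sketch, temporary files stand in for files on the P: drive. The family values are the ones from the data frame slide.

```r
fam <- data.frame(famnames  = c("Kate", "Andrew", "Brian"),
                  famages   = c(24, 29, 27),
                  famheight = c(64, 73, 71))
vecfile <- tempfile(fileext = ".txt")   # stand-in for a file on the P: drive
tabfile <- tempfile(fileext = ".txt")
write(fam$famages, file = vecfile)      # a vector: up to 5 values per line by default
write.table(fam, file = tabfile, sep = "\t",
            quote = FALSE, row.names = FALSE)   # header row, then one line per person
readLines(tabfile)
```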

26
Input/Output
  • execute commands from a file
  • source("command.s")
  • use source("command.s", echo=TRUE) to have the commands echoed
  • divert output to a file
  • sink("record.lis"); sink() sends output back to the console
  • write objects to an external file
  • dump(c("a","x","ink"), file="outdat")
  • general print
  • cat(format(iris3[,1,1]), fill=60)
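The following sketch runs the input/output commands from this slide. Temporary files stand in for command.s, record.lis and outdat, and the file contents are hypothetical.

```r
cmds <- tempfile(fileext = ".R")        # stand-in for command.s
writeLines(c("a <- 1:3", "x <- sum(a)", "print(x)"), cmds)
source(cmds, echo = TRUE)               # run the file, echoing each command

rec <- tempfile(fileext = ".lis")       # stand-in for record.lis
sink(rec)                               # console output now goes to the file
print(summary(a))
sink()                                  # output returns to the console

dumpfile <- tempfile()                  # stand-in for outdat
dump(c("a", "x"), file = dumpfile)      # object names are given as strings
rm(a, x)
source(dumpfile)                        # re-creates a and x from the dump
```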

27
Sorting data
  • There are several ways to sort data in R
  • To sort a vector, use sort()
  • To sort the ages in the class, sort(class[,2])
  • To sort an entire matrix or data frame by one column, use order()
  • To order the class by age, class[order(class[,2]),]
  • To get the same result in two steps
  • o <- order(class[,2])
  • class[o,]
  • Try to sort auto by foreign
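This sketch compares sort() and order() on a small hypothetical class data frame with made-up names, ages and heights. sort() returns only the sorted values. order() returns row positions, which reorder whole rows.

```r
# Hypothetical stand-in for the class data: name, age, height
class <- data.frame(name   = c("Ann", "Bob", "Cy"),
                    age    = c(31, 23, 27),
                    height = c(65, 70, 68))
sort(class[, 2])                  # only the ages, sorted: 23 27 31
o <- order(class[, 2])            # row positions in age order: 2 3 1
byage <- class[o, ]               # whole rows reordered by age
byage_desc <- class[order(class[, 2], decreasing = TRUE), ]   # oldest first
```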

28
Missing values
  • What happens if there are missing values as in
    the auto data sets?
  • R codes these as NA
  • How can we change the NAs to 0s?
  • is.na(x) is a logical function that returns TRUE
    for every value that is NA and FALSE otherwise
  • Answer: data[is.na(data)] <- 0
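Here is a minimal sketch of the replacement, first on a vector and then on a small hypothetical data frame. Keep in mind that this treats a missing value as a real 0, which changes summaries such as the mean.

```r
x <- c(22, NA, 28, NA, 17)
is.na(x)                   # FALSE TRUE FALSE TRUE FALSE
x[is.na(x)] <- 0           # replace only the missing entries

# The same idea on a hypothetical data frame
auto <- data.frame(price = c(4099, NA), mpg = c(NA, 28))
auto[is.na(auto)] <- 0
```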

29
Practice
  • In slide 11, several functions were mentioned
    including sum and min
  • Try these functions on mpg
  • What has happened?
  • How can we find the sum of mpg not including the
    missing values?
  • sum(mpg[!is.na(mpg)])
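This sketch uses a hypothetical mpg vector with one missing value. The argument na.rm = TRUE does the same filtering inside sum() and mean(). sqrt(), in contrast, works element by element, so an NA stays NA without affecting the other values.

```r
mpg <- c(22, NA, 28, 17)      # hypothetical values, one missing
sum(mpg)                      # NA: one missing value makes the total missing
s1 <- sum(mpg[!is.na(mpg)])   # 67: sum of the observed values only
s2 <- sum(mpg, na.rm = TRUE)  # 67: the built-in shortcut
mean(mpg, na.rm = TRUE)
r  <- sqrt(mpg)               # element by element: only the NA stays NA
```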

30
Practice
  • What is the difference when you use the function
    sqrt on the same data?
  • Why is there a difference in the effect of the
    missing data?

31
Loops and conditionals
  • Conditional
  • if (expr) expr
  • if (expr) expr else expr
  • Iteration
  • repeat expr
  • while (expr) expr
  • for (name in expr1) expr
  • For comparisons use
  • == for equal
  • != for not equal
  • > for greater than
  • & for and
  • | for or

32
Examples
  • What happens with the following code?
  • if (value==1) check <- 1 else check <- 0
  • counter <- 0
  • for (i in 1:10)
  • if (auto[i,3] < 10) counter <- counter+1
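This sketch runs the if/else and the counting loop, with a simulated vector standing in for auto[,3] (random numbers, not the auto data). The vectorized sum(x < 10) gives the same count without a loop. A while loop is included for comparison.

```r
set.seed(1)
mpg <- round(runif(20, min = 5, max = 30))   # simulated stand-in for auto[,3]

value <- 1
if (value == 1) check <- 1 else check <- 0   # check is 1 here

counter <- 0
for (i in 1:10) {
  if (mpg[i] < 10) counter <- counter + 1    # count first 10 values below 10
}
loopfree <- sum(mpg[1:10] < 10)              # same count: each TRUE counts as 1

n <- 0
while (2^n < 100) n <- n + 1                 # stops at the first n with 2^n >= 100
```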

33
Basic R functions
  • Let's use the class data set
  • Find the summary statistics of the data
  • summary(class)
  • Notice how names are handled compared to ages and
    heights
  • What happens when we type range(class)
  • How can we find the range of age and height using
    this command?

34
Functions
  • You can define a function to complete any
    operation
  • out <- function(var) {definition}
  • Let's look at this function
  • filter <- function(x) {
        if (is.na(sum(x))) fil <- F
        else fil <- T
        fil
    }
  • filt <- apply(auto, 1, filter)
  • newauto <- auto[filt,]
  • What is this function doing?
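This sketch runs the filter function on a small, all-numeric hypothetical version of auto. It keeps the rows with no missing values, which is also what base R's complete.cases() reports. One caveat: apply() first converts a data frame to a matrix. If auto has a text column, every value becomes character and sum() fails.

```r
# Hypothetical numeric stand-in for the auto data
auto <- data.frame(price = c(4099, 4749, NA, 5149),
                   mpg   = c(22, NA, 28, 17))
filter <- function(x) {
  if (is.na(sum(x))) fil <- FALSE else fil <- TRUE   # any NA makes the sum NA
  fil
}
filt <- apply(auto, 1, filter)   # one TRUE/FALSE per row
newauto <- auto[filt, ]          # keep only the complete rows
complete.cases(auto)             # built-in version of the same test
```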

35
Practice
  • Write a function to calculate the sum of the
    numbers in a vector greater than ten and the sum of
    the numbers less than or equal to ten
  • Output the answer in a list using list(ans1,ans2)
    at the end of your function
  • Try your function on the mpg data from the auto
    data set
  • Look at the output when you apply your function

36
Generating data from distributions
  • Many applications require generating data from
    specific distributions
  • r<distname>(n, <parameters>)
  • Possible distributions: beta, cauchy, chisq, f,
    gamma, norm, t, unif
  • You can find other characteristics of distributions
    as well
  • d<dist>(x, <parameters>): density at x
  • p<dist>(x, <parameters>): cumulative distribution
    function at x
  • q<dist>(p, <parameters>): inverse cdf
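This sketch shows the d/p/q/r naming pattern with the normal and uniform distributions. set.seed() is added so the random draws are reproducible.

```r
dens <- dnorm(0)              # standard normal density at 0, about 0.399
cdf  <- pnorm(1.96)           # P(Z <= 1.96), about 0.975
q    <- qnorm(0.975)          # inverse cdf, about 1.96
back <- pnorm(qnorm(0.3))     # p and q undo each other: 0.3
set.seed(42)                  # reproducible random draws
sample1 <- runif(n = 10, min = 0, max = 1)
sample2 <- rnorm(5, mean = 100, sd = 15)
pu <- punif(0.25, min = 0, max = 1)   # 0.25
```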

37
Using the distribution functions
  • Often when we use simulations we need to use the
    r functions
  • Try sample <- runif(n=10, min=0, max=1)
  • What happens here?

38
Plots
  • One of the biggest advantages of R is the quality
    of the plots
  • Let's plot the ages of the class
  • To make plots in R, use the following commands
    for the appropriate plots
  • age <- class[,2]
  • histogram- hist(age)
  • box plot- boxplot(age)

39
Plot Command
  • plot() is the basic command-line command for producing a
    scatter plot or line graph. Useful arguments:
  • col sets colors,
  • lty sets line types,
  • lwd sets line widths,
  • pch sets the plotting character,
  • type picks points (type="p") or lines ("l"),
  • cex sets the "character expansion",
  • xlab and ylab set the axis labels,
  • xlim and ylim set the limits of the axes,
  • main puts a title on the plot,
  • mtext() adds text in a margin, e.g. a sub-title,
  • see help(par) for details
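This sketch combines several of these arguments on hypothetical age and height data. It draws into a temporary PDF file with pdf(), so it runs without a screen. In an interactive session you can drop the pdf() and dev.off() lines.

```r
set.seed(1)
age    <- round(runif(15, min = 20, max = 40))   # hypothetical ages
height <- round(rnorm(15, mean = 67, sd = 3))    # hypothetical heights (inches)
f <- tempfile(fileext = ".pdf")
pdf(f)                                            # plot to a file, not the screen
plot(age, height,
     type = "p", pch = 17, col = "blue", cex = 1.5,   # enlarged blue triangles
     xlab = "Age (years)", ylab = "Height (inches)",
     xlim = c(15, 45), main = "Height vs. age")
mtext("hypothetical data", side = 3, line = 0.3)  # text in the top margin
abline(h = mean(height), lty = 2, lwd = 2)        # dashed, thicker reference line
invisible(dev.off())                              # close the file
```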

40
One-Dimensional Plots
  • barplot(height) simple form
  • barplot(height, width, names, space=.2,
    inside=TRUE, beside=FALSE, horiz=FALSE, legend,
    angle, density, col, blocks=TRUE)
  • boxplot(..., range, width, varwidth=FALSE,
    notch=FALSE, names, plot=TRUE)
  • hist(x, nclass, breaks, plot=TRUE, angle,
    density, col, inside)

41
Two-Dimensional Plots
  • lines(x, y, type="l")
  • points(x, y, type="p")
  • matplot(x, y, type="p", lty=1:5, pch, col=1:4)
  • matpoints(x, y, type="p", lty=1:5, pch, col=1:4)
  • matlines(x, y, type="l", lty=1:5, pch, col=1:4)
  • plot(x, y, type="p", log="")
  • abline(coef), abline(a, b), abline(reg),
    abline(h), abline(v)
  • qqplot(x, y, plot.it=TRUE)
  • qqnorm(x, datax=FALSE, plot.it=TRUE)

42
Three-Dimensional Plots
  • contour(x, y, z, v, nint=5, add=FALSE, labex)
  • interp(x, y, z, xo, yo, ncp=0, extrap=FALSE)
  • persp(z, eye=c(-6,-8,5), ar=1)

43
Multiple Plots Per Page
  • par(mfrow=c(nrow, ncol), oma=c(0, 0, 4, 0))
  • mfrow=c(m,n): subsequent figures will be drawn
    row-by-row in an m by n matrix on the page.
  • oma=c(xbot,xlef,xtop,xrig): outer margin lines of
    text.
  • mtext(side=3, line=0, cex=2, outer=T, "This is an
    Overall Title For the Page")
  • Try this code on your own
  • par(mfrow=c(2,1))
  • hist(age)
  • plot(class[,2], class[,3])

44
Output to a postscript file
  • Often we want to output an R graph to a
    postscript file to place it into a LaTeX file or
    other document
  • To do this, we use the following code
  • postscript("graph1.ps") This opens a postscript
    file in the current working directory
  • plot(regr) This plots a graph into the file
  • dev.off() This closes the postscript file
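This sketch puts the last two slides together: two plots on one page with an overall title, written to a PostScript file. The data are hypothetical, and a temporary file stands in for graph1.ps.

```r
set.seed(2)
age    <- round(runif(20, min = 20, max = 40))   # hypothetical data
height <- round(rnorm(20, mean = 67, sd = 3))
psfile <- tempfile(fileext = ".ps")              # stand-in for "graph1.ps"
postscript(psfile)                               # open the PostScript device
par(mfrow = c(2, 1), oma = c(0, 0, 4, 0))        # 2 x 1 layout, outer margin on top
hist(age, main = "Age")
plot(age, height, main = "Height vs. age")
mtext("Overall title for the page", side = 3, line = 0, cex = 2, outer = TRUE)
invisible(dev.off())                             # close the file so it is complete
```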

45
Making plots of your own
  • Make the following plots
  • Histogram of height in the class with the
    appropriate labels
  • Scatterplot of height and age in the class using
    a different plotting character (pch)
  • Make a postscript file with four plots of your
    choice from the baby dataset on one graph
  • Write a function to make a histogram and boxplot
    on one graph and use it on any of your data