Title: Introduction to Exploratory Descriptive Data Analysis in S-Plus
- Jagdish S. Gangolly
- State University of New York at Albany
Simple Structures I: Arithmetic Operators
- Arithmetic operators
- +, *, /, ^, and -
- Avoid ambiguity by using parentheses, e.g., (7+2)*3, since 7+2*3 is 13 and not 27.
- Multiplication and division are evaluated before addition and subtraction. Raising to a power (^ or **) takes precedence over all other arithmetic operators.
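A short sketch of these precedence rules, written in S syntax. It assumes S-Plus follows the same precedence as R, whose documented rules these values match; the variable names a, b, d and e are illustrative only.

```r
# Without parentheses, * is evaluated before +
a <- 7 + 2 * 3      # 13
# Parentheses force the addition to happen first
b <- (7 + 2) * 3    # 27
# ^ is evaluated before *
d <- 2 * 3^2        # 18
# ^ even binds tighter than the unary minus, so this is -(2^2)
e <- -2^2           # -4
print(c(a, b, d, e))
```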
Simple Structures I: Assignments
- Assignments
- x <- 3 or 3 -> x or x_3 or x = 3
- Using the underscore or the equals sign for assignment is not a good idea.
- To see the value of a variable x
- x or print(x)
- To remove a variable x
- rm(x)
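A minimal session illustrating the assignment, display and removal commands. It sticks to <- and -> because the underscore form is not accepted by R; that S-Plus behaves identically for the remaining lines is an assumption. The names x and y are illustrative.

```r
x <- 3          # left-pointing assignment (preferred)
4 -> y          # right-pointing assignment also works
x               # typing a name at the prompt prints its value
print(y)        # print() does the same and also works inside functions
rm(x)           # remove x from the workspace
exists("x")     # FALSE: x no longer exists
```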
Simple Structures II: Concatenation
- Concatenation
- Used to create vectors of any length
- > x <- c(1.5, 2, 2.5)
- > x
- [1] 1.5 2.0 2.5
- > x^2
- [1] 2.25 4.00 6.25
- c() can be used with any type of data
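A small sketch of c() with different kinds of data, assuming S-Plus coerces mixed values the way R does (everything becomes character when a string is present). The vector names are hypothetical.

```r
x <- c(1.5, 2, 2.5)
sq <- x^2                              # arithmetic is applied element by element
x <- c(x, 3)                           # c() also appends to an existing vector
countries <- c("austria", "france")    # character data
flags <- c(TRUE, FALSE, TRUE)          # logical data
mixed <- c(1, "ab")                    # mixed types are coerced to one type (character here)
mode(mixed)                            # "character"
```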
Simple Structures III: Sequence
- Sequence command
- seq(lower, upper, increment)
- Some examples
- seq(1, 35, 5) gives 1 6 11 16 21 26 31
- seq(5, 15, 1.5) gives 5.0 6.5 8.0 9.5 11.0 12.5 14.0
- seq(50, 25, -5) gives 50 45 40 35 30 25
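The same three calls as a runnable sketch, plus the colon operator, which the rep() examples on the next slide rely on. The names s1 to s4 are illustrative; identical behavior in S-Plus is assumed.

```r
s1 <- seq(1, 35, 5)     # stops before passing the upper limit: last value is 31
s2 <- seq(5, 15, 1.5)   # the increment need not be a whole number
s3 <- seq(50, 25, -5)   # a negative increment counts down
s4 <- 1:5               # the colon operator is shorthand for seq(1, 5, 1)
```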
Simple Structures IV: Replicate
- Replicate command to generate data that follow a regular pattern
- Some examples
- rep(8, 5) gives 8 8 8 8 8
- rep(c(0, "ab"), 2) gives "0" "ab" "0" "ab" (the 0 is converted to a character string)
- rep(1:4, 1:4) gives 1 2 2 3 3 3 4 4 4 4
- rep(1:3, rep(2, 3)) gives 1 1 2 2 3 3
- rep(c(1, 8, 7), length = 5) gives 1 8 7 1 8
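These rep() calls collected into one script so the patterns can be checked directly. In R the length argument is formally named length.out and length = 5 works through partial argument matching; that S-Plus accepts length is taken from the slide. The names r1 to r5 are illustrative.

```r
r1 <- rep(8, 5)                    # repeat a single value 5 times
r2 <- rep(c(0, "ab"), 2)           # repeat the whole vector twice; 0 becomes "0"
r3 <- rep(1:4, 1:4)                # element i is repeated i times
r4 <- rep(1:3, rep(2, 3))          # each element repeated twice
r5 <- rep(c(1, 8, 7), length = 5)  # recycle the pattern until 5 values are produced
```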
Simple Structures V: Expressions
- > x <- seq(2, 10, 2)
- > y <- 1:5
- > z <- ((3*x^2 + 2*y)/((x + y)*(x - y)))^(0.5)
- > x
- [1] 2 4 6 8 10
- > y
- [1] 1 2 3 4 5
- > z
- [1] 2.160247 2.081666 2.054805 2.041241 2.033060
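A quick check of the expression above. It shows that the whole formula is evaluated element by element and that raising to the power 0.5 is the same as calling sqrt(); the name z.sqrt is illustrative.

```r
x <- seq(2, 10, 2)
y <- 1:5
# Every operation is applied element by element to the two vectors
z <- ((3 * x^2 + 2 * y) / ((x + y) * (x - y)))^0.5
# Raising to the power 0.5 is equivalent to taking the square root
z.sqrt <- sqrt((3 * x^2 + 2 * y) / ((x + y) * (x - y)))
all.equal(z, z.sqrt)
# First element by hand: x = 2, y = 1 gives sqrt(14/3)
all.equal(z[1], sqrt(14 / 3))
```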
Simple Structures VI: Logical Operators
- < Less than
- > Greater than
- <= Less than or equal to
- >= Greater than or equal to
- == Equal to
- != Not equal to
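A sketch of these operators applied to a vector. Each comparison returns one logical value per element (R prints these as TRUE/FALSE; S-Plus may abbreviate them to T/F). The vector and result names are hypothetical.

```r
x <- c(2, 4, 6, 8, 10)
less  <- x < 6     # TRUE TRUE FALSE FALSE FALSE
le    <- x <= 6    # TRUE TRUE TRUE FALSE FALSE
equal <- x == 6    # note the double equals sign; a single = is not a comparison
ne    <- x != 6    # TRUE TRUE FALSE TRUE TRUE
sum(x > 5)         # TRUE counts as 1 and FALSE as 0, so this counts matches: 3
```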
Simple Structures VII: Index Brackets
- Square brackets are used to index vectors and matrices.
- > x <- seq(0, 20, 10)
- > x[2]
- [1] 10
- > x[5]
- [1] NA
- > x[c(1, 3)]
- [1] 0 20
- > x[-1]
- [1] 10 20
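The indexing examples above with comments on what each rule does. The last line, an addition not on the slide, combines brackets with the logical operators from the previous slide; assuming S-Plus matches R here, a logical vector inside the brackets keeps only the TRUE positions.

```r
x <- seq(0, 20, 10)   # 0 10 20
x[2]                  # 10: indices start at 1, not 0
x[5]                  # NA: reading past the end gives a missing value, not an error
x[c(1, 3)]            # 0 20: a vector of indices selects several elements
x[-1]                 # 10 20: a negative index drops that element
x[x > 5]              # 10 20: logical indexing keeps elements where the test is TRUE
```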
Data Manipulation I: Frames & Matrices I
- Matrices: two-dimensional vectors (they have row and column indices)
- Arrays: the general data structure in S-Plus
- Zero-dimensional: scalar
- One-dimensional: vector
- Two-dimensional: matrix
- Three- to eight-dimensional: arrays
- All the data in a matrix must be of the same data type (usually numeric).
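A sketch of matrices and higher-dimensional arrays, and of the one-type rule. It assumes array() and dim() behave in S-Plus as they do in R; m, a and m2 are illustrative names.

```r
m <- matrix(1:6, nrow = 2)           # 2 x 3 matrix, filled column by column
a <- array(1:24, dim = c(2, 3, 4))   # three-dimensional array
dim(m)                               # 2 3
dim(a)                               # 2 3 4
# A matrix holds a single data type: one string turns every element into a string
m2 <- matrix(c(1, 2, "three", 4), nrow = 2)
mode(m2)                             # "character"
```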
Data Manipulation I: Frames & Matrices II
- Data frames: the columns can be of different data types
- Lists: the most general data type in S-Plus
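A minimal sketch contrasting a data frame with a list, reusing the country figures from the matrix example later in these notes. The names df and info are hypothetical. Whether the country column becomes a factor or stays character depends on the S-Plus or R version, so the checks avoid relying on it.

```r
# Data frame: each column may have its own type
df <- data.frame(country = c("austria", "france", "germany"),
                 gdp = c(227, 1534, 2365),
                 pop = c(8, 58, 82))
sapply(df, is.numeric)   # FALSE for country, TRUE for gdp and pop
# List: components may be of any type and length, even a whole data frame
info <- list(variable = "gdp",
             values = c(227, 1534, 2365),
             frame = df)
info$values[2]           # 1534
```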
Data Manipulation I: Matrices I
- Reading data
- S-Plus is very finicky about the format of input data
- To read a table
- read.table(filename)
- The first column must contain the row names
- The first row must contain the column names
- The top-left cell must be empty
- Spaces and tabs are the default column delimiters
- See the example fasb103.txt in the directory /db4/teach/acc522/ on the machine cayley.ba.albany.edu and play around with it.
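A self-contained sketch of the layout read.table() expects. Because fasb103.txt is not reproduced here, it writes a small hypothetical file to a temporary location using the country figures from these notes. The header row has one fewer field than the data rows (the empty top-left cell), which R documents as the signal to treat the first row as column names and the first column as row names; S-Plus is assumed to behave the same.

```r
tmp <- tempfile()
# Header row: no entry for the row-name column, so the top-left cell is empty
cat("gdp pop inflation\n",
    "austria 227 8 1.3\n",
    "france 1534 58 1.2\n",
    "germany 2365 82 1.8\n",
    file = tmp, sep = "")
x <- read.table(tmp)   # space-delimited; header and row names detected from the layout
x
x["france", "pop"]     # 58
```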
Data Manipulation I: Matrices II
- read.table() and as.matrix()
- x <- read.table(filename)
- x <- as.matrix(x) (converts the data frame into a matrix)
- Enter data directly
- matrix(data, nrow, ncol, byrow = F)
- Example
- x <- matrix(1:6, nrow = 2, byrow = T)
- dim(x) gives 2 3 (2 rows, 3 columns)
- dimnames(x) gives NULL
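A sketch showing what the byrow argument changes. ncol can be omitted because it is worked out from the length of the data. The names by.col and by.row are illustrative.

```r
by.col <- matrix(1:6, nrow = 2)                 # default byrow = FALSE: fill column by column
by.row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row
by.col             # first row is 1 3 5
by.row             # first row is 1 2 3
dim(by.row)        # 2 3
dimnames(by.row)   # NULL until row and column names are assigned
```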
Data Manipulation I: Matrices III
- Elements of matrices are accessed by specifying the row and column indices.
- Example
- data <- c(227, 8, 1.3, 1534, 58, 1.2, 2365, 82, 1.8)
- countries <- c("austria", "france", "germany")
- variables <- c("gdp", "pop", "inflation")
- country.data <- matrix(data, nrow = 3, byrow = T)
- dimnames(country.data) <- list(countries, variables)
- country.data[1:2, 2:3] gives the pop and inflation of austria and france
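The example above as one runnable script. The last lines, not on the slide, show that once dimnames are set the same block can be selected by name instead of by position; sub1 and sub2 are illustrative names.

```r
data <- c(227, 8, 1.3, 1534, 58, 1.2, 2365, 82, 1.8)
countries <- c("austria", "france", "germany")
variables <- c("gdp", "pop", "inflation")
country.data <- matrix(data, nrow = 3, byrow = TRUE)
dimnames(country.data) <- list(countries, variables)
# Rows 1-2 (austria, france) and columns 2-3 (pop, inflation)
sub1 <- country.data[1:2, 2:3]
# The same block selected by row and column names
sub2 <- country.data[c("austria", "france"), c("pop", "inflation")]
country.data["germany", "gdp"]   # a single element: 2365
```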
S-Plus Graphics I
- To open a graphics window
- motif()
- You can adjust the color scheme and print options through the drop-down menu on the motif window.
- To plot two variables x and y
- plot(x, y)
- Example (sine curve)
- plot(1:100, sin(1:100/10))
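motif() needs an X Window display. As a sketch for working without one, the sine-curve example can be sent to a PostScript file instead. postscript() and dev.off() exist in R; that S-Plus accepts the same file argument is an assumption, and the temporary file name is hypothetical.

```r
f <- tempfile()
postscript(file = f)       # open a file-based device instead of a window
# 1:100/10 means (1:100)/10, because : is evaluated before /
plot(1:100, sin(1:100/10)) # drawn as points; add type = "l" for a connected curve
invisible(dev.off())       # close the device so the file is written
```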