Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things. Drowning in Data yet Starving for Knowledge [Naisbitt -Rogers]

1
Spatial Statistics and Spatial Knowledge Discovery
First law of geography [Tobler]: Everything is
related to everything, but nearby things are more
related than distant things. Drowning in Data
yet Starving for Knowledge [Naisbitt-Rogers]
  • Lecture 1: Introduction to R
  • Pat Browne

2
Introduction to programming in R
  • R is a computer language and environment that
    allows users to program algorithms and use
    pre-written packages. R is a free software
    environment for statistical computing and
    graphics (including mapping).
  • There are special R packages for handling and
    analyzing spatial data. For example, the sp
    package provides classes and methods for points,
    lines, polygons, and grids.
  • R can extract spatial data from PostgreSQL. Also,
    R can be combined with SQL using PL/R.

3
Installing R
  • R for Windows can be downloaded from
    http://ftp.heanet.ie/mirrors/cran.r-project.org/bin/windows/base/R-2.14.1-win.exe
  • See Lab1.doc for installation details.

4
Starting R
  • We will look at the main features of R; see
    Lab1.doc for more details. This lecture also
    presents an introduction to programming.
  • The basic components of current languages are:
  • Data types, e.g. Integer, String, Polygon.
  • Variables that refer to data, e.g. a <- 2
  • Operations on those data types, e.g. area(polygon)
  • Control structures, e.g. sequence, iteration, and
    conditions.
  • Logic is an important part of programming, but it
    is often implicit and external to the language.
    Some languages, such as SQL, are quite close to logic.

5
Starting R Programs consist of Data,
Operations, etc.
  • The basic components of current languages are:
  • Concrete data types, e.g. Integer, String,
    Polygon.
  • Variables that refer to data, e.g. a <- 2
  • Operations on those data types, e.g. area(polygon)
  • Control structures, e.g. sequence, iteration, and
    conditions.
  • Logic is an important part of programming, but it
    is often implicit and external to the language.
    Some languages, such as SQL, are quite close to logic.
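Here is a minimal R sketch of the four components listed above: data types, variables, an operation, and the three control structures. square_area() is a hypothetical helper that stands in for an operation such as area(polygon). It is not a built-in R function.

```r
# Data types: an integer, a character string and a logical value
n  <- 42L          # the L suffix makes an integer
s  <- "Polygon"    # character string
ok <- TRUE         # logical

# Variables refer to data; operations act on that data
side <- 3
square_area <- function(len) len * len   # hypothetical operation, like area(polygon)
area <- square_area(side)

# Control structures: sequence, iteration and condition
total <- 0
for (i in 1:4) {          # iteration
  if (i %% 2 == 0) {      # condition: keep even numbers only
    total <- total + i    # statements run in sequence
  }
}
print(area)    # 9
print(total)   # 6, i.e. 2 + 4
```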

6
Starting R Variables
  • Variables provide a means of accessing the data
    stored in computer memory. R provides a number of
    specialized data structures or objects (also
    called data types). These objects are referenced
    in your programs using variables.
  • Store: a <- 2      Access: a
  • Store: b <- "Pat"  Access: b
  • These assign the number 2 to the variable a and
    the string "Pat" to the variable b.

7
Starting R Data types
  • A data type represents a constraint placed upon
    the interpretation of data in a type system,
    describing representation, interpretation, legal
    operations and structure of values.
  • Data types are a way to limit the kind of data
    that can be used by a particular program or
    stored in a database table. Types restrict the
    data to a certain set of values (e.g. 1, 2, 3, ...
    for integers).
  • Data types also restrict the operations that can
    be applied to the data (e.g. addition for
    integers). R comes with a range of standard data
    types that can be used to represent strings,
    integers, real numbers, and dates, but R also has
    types that are especially suited to statistics,
    such as vectors and tables.

8
Starting R Data types
The c() function combines its arguments into a
vector. In R the term mode is used to describe
data types. There are four basic types or modes:
numeric, character, complex, and logical. These
can be combined to form collections, which are
called objects in R.
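The sketch below shows that mode() reports the four basic modes for one value of each kind. It also shows what happens when c() mixes modes: every element is coerced to the most general mode present, which here is character.

```r
# One value of each of the four basic modes
values <- list(3.14, "abc", 1 + 2i, TRUE)
sapply(values, mode)
# [1] "numeric"   "character" "complex"   "logical"

# c() combines its arguments into one vector; mixed modes are
# coerced to the most general one (here every element becomes a string)
v <- c(1, "a", TRUE)
mode(v)
# [1] "character"
```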
9
Starting R Data types (Objects)
10
Starting R Data types (Objects)
11
Starting R Data types (Objects)
12
Starting R Finding data types
13
Starting R Data types
  • Numbers: 1, 1.4
  • Strings: "ABC" or "abc"
  • Vectors
  • Arrays: vectors plus a dimension vector (dim)
  • Factors: for nominal or ordered categorical data
  • Data frames: matrix-like, for data of different
    types
  • Tables
  • One-way tables
  • Two-way tables

14
Starting R Data types- Numbers
  • > a <- 3
  • > b <- sqrt(a*a+3)
  • List the defined variables/objects:
  • > ls()
  • We can add 1 to every element of a vector:
  • > a <- c(1,2,3,4,5)
  • > a+1
  • We can get the mean, variance, and standard
    deviation of a vector of numbers:
  • > mean(a)
  • > var(a)
  • > sd(a)
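To make the last three commands less of a black box, the sketch recomputes the sample variance by hand. var() and sd() divide by n - 1, not by n.

```r
a <- c(1, 2, 3, 4, 5)
a + 1                  # arithmetic is applied element by element
# [1] 2 3 4 5 6

# Sample variance: sum of squared deviations divided by (n - 1)
n <- length(a)
manual_var <- sum((a - mean(a))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)
c(mean(a), var(a), manual_var)
# [1] 3.0 2.5 2.5
all.equal(sd(a), manual_sd)
# [1] TRUE
```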

15
Starting R Data types- Strings
  • > a <- "hello"
  • > a
    [1] "hello"
  • > b <- c("hello","there")
  • > b[1]
  • > b[2]

16
Starting R Data types-Vector
  • R operates on named data structures. The simplest
    such structure is the numeric vector, which is a
    single entity consisting of an ordered collection
    of numbers. To set up a vector named x use the R
    command
  • > x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
  • gt x2
  • Variable assignment is written with <- in R.
    The above assignment uses the function c(), which
    can take an arbitrary number of vector arguments
    and whose value is a vector obtained by
    concatenating its arguments end to end.
  • A number occurring by itself in an expression is
    taken as a vector of length one.
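Both points above can be checked directly. Concatenating x, a single 0 and x again gives 5 + 1 + 5 elements, and a lone number is already a vector of length one.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

# c() joins its arguments end to end: 5 + 1 + 5 = 11 elements
y <- c(x, 0, x)
length(y)
# [1] 11

# A single number is a vector of length one
length(7)
# [1] 1
is.vector(7)
# [1] TRUE
```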

17
Starting R Data types-Arrays
  • Arrays are vectors plus the dim attribute
    (dimension vector); matrices are arrays with a
    dim attribute of length 2. Arrays are stored in
    column-major order.
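A short sketch of the idea above: setting the dim attribute on a plain vector turns it into a matrix. Column-major order means the first column is filled before the second.

```r
z <- 1:6
dim(z) <- c(2, 3)      # the dim attribute turns the vector into a 2 x 3 matrix
is.matrix(z)
# [1] TRUE

# Column-major order: column 1 is filled first, then column 2, ...
z
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
z[1, 2]
# [1] 3

# A three-dimensional array built the same way
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
# [1] 2 3 4
```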

18
Starting R Data types-Matrices
  • Arrays are vectors plus the dim attribute
    (dimension vector); matrices are arrays with a
    dim attribute of length 2. Arrays are stored in
    column-major order.

19
Starting R Data types-Tables
  • > x <- c("Yes","No","No","Yes","Yes")
  • > table(x)
    x
     No Yes
      2   3

20
Types of Categorical data
  • Nominal: mutually exclusive categories, e.g.
    male/female, dead/alive, smoker/non-smoker,
    bus/car/train. Nominal data tend to be unordered,
    with no logical hierarchy.
  • Ordinal: can be ranked in a meaningful order.
    The distance between values is not relevant, as
    there is no distance information, e.g. race
    positions (1st, 2nd, 3rd) or grouped amounts
    (1-5, 6-10, 11-15 per day). Unlike nominal data,
    ordinal values can be compared against each other.

21
Starting R Data types- Factor
  • When looking at the impact of carbon dioxide
    (CO2) on the growth rate of a tree you might try
    to observe how different trees grow when exposed
    to different preset concentrations of CO2. The
    different levels are often called categories or
    factors. CO2 is measured in parts per million by
    volume (ppmv). Levels could be L1 = 0-3, L2 = 3-6,
    L3 = 6-9, L4 = 9-12 ppmv (ignoring double
    inclusion of boundaries).
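One way to turn measured concentrations into the four levels above is R's cut() function. The CO2 readings below are hypothetical values chosen only for illustration. cut() uses right-closed intervals by default, such as (3, 6], so a boundary value goes to the lower level. This is one concrete way of resolving the double inclusion of boundaries.

```r
# Hypothetical CO2 readings (ppmv) for six trees, for illustration only
co2 <- c(0.5, 4.2, 7.9, 11.0, 2.5, 6.0)

# Bin into L1 = 0-3, L2 = 3-6, L3 = 6-9, L4 = 9-12;
# include.lowest = TRUE keeps a reading of exactly 0 in L1
co2_level <- cut(co2, breaks = c(0, 3, 6, 9, 12),
                 labels = c("L1", "L2", "L3", "L4"),
                 include.lowest = TRUE)
co2_level
# [1] L1 L2 L3 L4 L1 L2
# Levels: L1 L2 L3 L4
table(co2_level)
```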

22
Starting R Data types- Factor
  • Categorical data is often used to classify data
    into various levels or factors. For example,
    smoking data could be a factor in a broader
    survey on health issues. R has a special class
    for working with factors, and R adapts itself
    when it knows it has a factor.
  • > x <- c("Yes","No","No","Yes","Yes")
  • > factor(x)
    [1] Yes No  No  Yes Yes
    Levels: No Yes

23
Starting R Data types- Factor
  • We will assume that your data files are stored in
    C:\My-R-Dir\
  • Load in the file trees91.csv.
  • > tree <- read.csv(file="C:\\My-R-Dir\\trees91.csv",
    header=TRUE, sep=",")
  • The summary operation prints out the possible
    values and the frequency with which they occur.
    Find the summary of the chamber identification
    label (CHBR):
  • > summary(tree$CHBR)
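The slide assumes that read.csv() turns the CHBR text column into a factor. That was the default in the R version used here (2.14.1). Since R 4.0.0, read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given. The sketch below uses a tiny made-up stand-in for trees91.csv (not the real data), read from a string through the text argument, so it runs without the file.

```r
# Hypothetical miniature stand-in for trees91.csv (NOT the real data)
csv_text <- "CHBR,value
A1,0.43
A4,0.41
B2,0.52
A1,0.47
B2,0.39"

# stringsAsFactors = TRUE restores the pre-4.0 behaviour the slide relies on
tree <- read.csv(text = csv_text, header = TRUE, sep = ",",
                 stringsAsFactors = TRUE)
is.factor(tree$CHBR)
# [1] TRUE
summary(tree$CHBR)    # count of each chamber label
# A1 A4 B2
#  2  1  2
```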

24
Starting R Data types- Factor
  • > summary(tree$CHBR)
  • Note that, for numeric data, the output of the
    summary operation includes quartiles. A quartile
    is one of three points (including the median)
    that divide a data set into four equal groups,
    each representing a fourth of the sampled
    population.
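For a numeric vector, the three quartiles can be read from summary() or requested directly with quantile(). The sketch uses 1:8 and R's default quantile method (type 7). Other quantile definitions can give slightly different quartiles for the same data.

```r
v <- 1:8
summary(v)      # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# The three quartiles on their own (default type = 7):
# 2.75, 4.50 (the median) and 6.25
q <- quantile(v, probs = c(0.25, 0.5, 0.75))
q
```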

25
Starting R Data types- Factor
  • A nominal variable is represented as a factor in
    R. The factor stores the nominal values as a
    vector of integers in the range 1 ... k,
  • where k is the number of unique values in the
    nominal variable (e.g. female = 1, male = 2),
  • and an internal vector of character strings (the
    original values) mapped to these integers.

26
Starting R Data types- Factor
  • Consider a variable gender with 20 male entries
    and 30 female entries:
  • gender <- c(rep("male",20), rep("female", 30))
  • gender <- factor(gender)
  • This stores gender as 20 2s followed by 30 1s,
    where internally 1 = female and 2 = male (levels
    are sorted alphabetically).
  • R now treats gender as a nominal variable.
  • summary(gender)
  • What does rep() do? How would you find out?
  • Type ?rep into R and see.
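The internal integer codes can be inspected with as.integer(). Because the levels are sorted alphabetically, the 20 "male" entries become 2s and the 30 "female" entries become 1s.

```r
gender <- factor(c(rep("male", 20), rep("female", 30)))
levels(gender)                 # sorted alphabetically
# [1] "female" "male"

codes <- as.integer(gender)    # the internal integer codes
head(codes)                    # the first entries are "male", so code 2
# [1] 2 2 2 2 2 2
summary(gender)
# female   male
#     30     20
```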

27
Starting R Data types- Factor
  • An ordered factor is used to represent an ordinal
    variable. Consider a variable rating coded as
    large, medium, or small:
  • rating <- c(rep("large",10), rep("medium",
    10), rep("small", 10))
  • rating <- ordered(rating)
  • R codes rating as 1, 2, 3 and associates 1 = large,
    2 = medium, 3 = small internally (again
    alphabetically, so large < medium < small).
  • R uses factor for nominal variables and ordered
    for ordinal variables in statistical procedures
    and graphical analyses.
  • Try the command plot(rating).
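The alphabetical default gives the ordering large < medium < small, which is probably not what the data mean. Passing the levels explicitly to ordered() fixes the order. After that, comparisons and max() follow the intended ranking.

```r
sizes <- c(rep("large", 10), rep("medium", 10), rep("small", 10))

# Default: alphabetical levels, so large < medium < small
r1 <- ordered(sizes)
levels(r1)
# [1] "large"  "medium" "small"

# Explicit levels: small < medium < large
r2 <- ordered(sizes, levels = c("small", "medium", "large"))
r2[1] > r2[11]      # is "large" greater than "medium"?
# [1] TRUE
max(r2)             # the highest-ranked level present: large
```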

28
Starting R Data types- Factor
  • A factor is a vector object used to specify a
    discrete classification (grouping) of the
    components of other vectors of the same length. R
    provides both ordered and unordered factors. The
    main application of factors is with model
    formulae. Suppose we have a sample of 30 tax
    accountants from all the states and territories
    of Australia, and their state of origin is given
    by a character vector of state mnemonics:
  • state <- c("tas", "sa", "qld", "nsw", "nsw",
    "nt", "wa", "wa", "qld", "vic", "nsw", "vic",
    "qld", "qld", "sa", "tas", "sa", "nt", "wa",
    "vic", "qld", "nsw", "nsw", "wa", "sa", "act",
    "nsw", "vic", "vic", "act")
  • A factor is created using the factor() function:
  • statef <- factor(state)
  • summary(statef)
  • To find out the levels of a factor, the function
    levels() can be used.
  • > levels(statef)
    [1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"
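The sketch below counts accountants per state with table(). It then illustrates the "grouping of other vectors of the same length" idea with tapply(). The grp and val vectors in the second part are hypothetical and unrelated to the accountant data.

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
           "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
           "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
statef <- factor(state)
nlevels(statef)
# [1] 8
table(statef)
# statef
# act nsw  nt qld  sa tas vic  wa
#   2   6   2   5   4   2   5   4

# Hypothetical example: a factor groups the elements of another vector
grp <- factor(c("a", "b", "a", "b", "a", "c"))
val <- c(10, 20, 30, 40, 50, 60)
tapply(val, grp, mean)    # mean of val within each level of grp
#  a  b  c
# 30 30 60
```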

29
Starting R Data types- Matrix
  • A matrix is a collection of data elements
    arranged in a two-dimensional rectangular layout.
    The following is an example of a matrix with 2
    rows and 3 columns.

30
Starting R Data types- Matrix
  • > A <- matrix(
        c(2, 4, 3, 1, 5, 7),  # the data elements
        nrow = 2,             # number of rows
        ncol = 3,             # number of columns
        byrow = TRUE)         # fill matrix by rows
  • > A                       # print the matrix
         [,1] [,2] [,3]
    [1,]    2    4    3
    [2,]    1    5    7
  • An element at the mth row, nth column of A can be
    accessed by the expression A[m, n].
  • > A[2, 3]      # element at 2nd row, 3rd column
    [1] 7
  • The entire mth row of A can be extracted as A[m, ].
  • > A[2, ]       # the 2nd row
    [1] 1 5 7
  • Similarly, the entire nth column of A can be
    extracted as A[, n].
  • > A[, 3]       # the 3rd column
    [1] 3 7

31
Starting R Data types- Dataframe
  • A dataframe is more general than a matrix, in
    that different columns can have different modes
    (numeric, character, factor, etc.). It is a bit
    like an SQL table.
  • d <- c(1,2,3,4)
    e <- c("red", "white", "red", NA)
    f <- c(TRUE,TRUE,TRUE,FALSE)
    mydata <- data.frame(d,e,f)
    names(mydata) <- c("ID","Color","Passed")
  • There are a variety of ways to identify the
    elements of a dataframe:
  • mydata[2:3]              # columns 2, 3 of the dataframe
  • mydata[c("ID","Color")]  # columns ID, Color
  • mydata$ID                # variable ID in the dataframe
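The sketch runs the example above end to end. It adds one more selection, by a logical condition on rows, to support the comparison with an SQL table. That row filter is an extra illustration, not part of the original slide.

```r
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")

str(mydata)                  # one line per column, each with its own type
mydata[2:3]                  # columns 2 and 3, still a dataframe
mydata[c("ID", "Color")]     # columns selected by name
mydata$ID                    # a single column as a plain vector
mydata[mydata$Passed, ]      # rows where Passed is TRUE, like SQL's WHERE
```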

32
Starting R Data types- data.frame
  • Here we create a data.frame called d.
  • L3 <- LETTERS[1:3]
  • (d <- data.frame(cbind(x=1, y=1:10),
    fac=sample(L3, 10, replace=TRUE)))
  • To view the first four rows: d[1:4, ]
  • To view a column: d$y, d$fac
  • An alternative way to view a column: d[, 3]

33
Starting R Data types- Table
  • One-way tables are created with the table
    command. Its argument is a vector of factors, and
    it calculates the frequency with which each
    factor level occurs.

34
Starting R Data types- one-way Table
  • > a <- factor(c("A","A","B","A","B","B","C","A","C"))
  • > results <- table(a)
  • > results
    a
    A B C
    4 3 2
  • > attributes(results)
  • > attributes(results)$dimnames$a
  • > attributes(results)$dim
  • > attributes(results)$class
  • > summary(results)

35
Starting R Data types- two-way Table
  • Say we want to put the results of two questions
    into a table.
  • First question responses are Never, Sometimes,
    and Always.
  • Second question responses are Yes, No, and
    Maybe. Two vectors a and b contain the responses
    for each person.
  • In the vectors, responses are represented by
    position. The third item in a is how the third
    person responded to the first question, and the
    third item in b is how the third person responded
    to the second question.
  • In the following we can see that two people who
    said "Sometimes" to the first question also said
    "Maybe" to the second question.

36
Starting R Data types- two-way Table
Rows come from a, columns from b.
  • a <-
    c("Sometimes","Sometimes","Never","Always","Always",
    "Sometimes","Sometimes","Never")
  • b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No",
    "Yes","No")
  • results <- table(a,b)
  • > results
               b
    a           Maybe No Yes
      Always        2  0   0
      Never         0  1   1
      Sometimes     2  1   1
  • The table shows that two people who said
    Sometimes to the first question also said Maybe
    to the second question.
  • The elements are accessed like a matrix
    (e.g. results[, 1]).
  • How many people responded?
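The sketch shows matrix-style access to the two-way table. Each respondent falls in exactly one cell, so summing all cells gives the number of people who responded. margin.table() gives the row totals, which are the counts of answers to the first question.

```r
a <- c("Sometimes", "Sometimes", "Never", "Always", "Always",
       "Sometimes", "Sometimes", "Never")
b <- c("Maybe", "Maybe", "Yes", "Maybe", "Maybe", "No", "Yes", "No")
results <- table(a, b)

results[, 1]                  # the "Maybe" column for every row
#    Always     Never Sometimes
#         2         0         2
results["Sometimes", "Maybe"]
# [1] 2
sum(results)                  # total number of respondents
# [1] 8
margin.table(results, 1)      # row totals: answers to the first question
```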

37
Useful functions
  • length(object)              number of elements or components
  • str(object)                 structure of an object
  • class(object)               class or type of an object
  • names(object)               names
  • c(object, object, ...)      combine objects into a vector
  • cbind(object, object, ...)  combine objects as columns
  • rbind(object, object, ...)  combine objects as rows

38
Useful functions
  • object                      prints the object
  • ls(), objects()             list current objects
  • rm(object)                  delete an object
  • newobject <- edit(object)   edit, copy, save
  • fix(object)                 edit in place
  • data.entry(object)          GUI edit in place
  • mode(object)                type of the object
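The non-interactive functions from the last two slides can be tried together, as in this sketch. The editor functions edit(), fix() and data.entry() open windows, so they are left out here. tmp_obj is just a throwaway name used to demonstrate rm().

```r
x <- c(4, 8, 15)
y <- c(16, 23, 42)

length(x)             # 3
class(x)              # "numeric"
mode(x)               # "numeric"
cb <- cbind(x, y)     # x and y as columns: a 3 x 2 matrix
rb <- rbind(x, y)     # x and y as rows: a 2 x 3 matrix
dim(cb)               # 3 2
dim(rb)               # 2 3
colnames(cb)          # "x" "y"
str(cb)               # compact description of the structure

tmp_obj <- 1          # throwaway object
ls()                  # tmp_obj appears among the current objects
rm(tmp_obj)           # delete it again
exists("tmp_obj")     # FALSE
```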

39
Starting R Input-Output IO
  • There are many ways to get data into R. We focus
    on just three:
  • Assignment
  • Reading a CSV file (writing later)
  • Loading data from PostgreSQL (later)

40
Starting R IO-Assignment
  • Assignment (LHS <- RHS) allows an expression on
    the RHS to be stored in a named object on the
    LHS. In R:
  • > a <- c(3,5,7,9)
  • The above assignment uses the combine command
    (c means combine). This makes a vector called a.
    No output is produced yet. Now we can retrieve
    the contents of a just by typing its name.
  • > a
  • > a[3]
  • The first command gives all of a; the second
    command gives the third element of a. 3 is called
    the index. Indexing starts at 1: the zero index,
    a[0], returns an empty vector of the same type as
    a (here numeric(0)). Try
  • b <- c("one","two","three")
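The indexing rules above, tried on both vectors: a positive index selects elements, 0 gives an empty vector of the same type, and a negative index (an extra case, not on the slide) drops elements.

```r
a <- c(3, 5, 7, 9)
a[3]       # third element
# [1] 7
a[0]       # zero index: an empty vector of the same type
# numeric(0)
a[-1]      # negative index: everything except the first element
# [1] 5 7 9

b <- c("one", "two", "three")
b[0]
# character(0)
b[2]
# [1] "two"
```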

41
Starting R IO-Assignment
  • cells <- c(1,26,24,68)
  • rnames <- c("R1", "R2")
  • cnames <- c("C1", "C2")
  • mymatrix <- matrix(cells, nrow=2, ncol=2,
    byrow=TRUE,
    dimnames=list(rnames, cnames))

Type > attributes(mymatrix). Type > help(array) to
find more details on arrays.
42
Starting R Input File
  • Place the file simple.csv in a directory
    (folder).
  • Load the file into R using
  • h <- read.csv(file="C:\\My-R-Dir\\simple.csv",
    header=TRUE, sep=",")
  • View the contents of h.
  • Now the contents of the file are stored in R as
    the object named h.
  • Type > names(h)

43
Starting R Data types-Matrices
  • All columns in a matrix must have the same data
    type (numeric, character, etc.) and the same
    length. The general format is
  • mymatrix <- matrix(vector, nrow=r, ncol=c,
    byrow=FALSE,
    dimnames=list(char_vector_rownames,
    char_vector_colnames))
  • byrow=TRUE indicates that the matrix should be
    filled by rows. byrow=FALSE indicates that the
    matrix should be filled by columns (the default).
    dimnames provides optional labels for the rows
    and columns.
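The effect of byrow is easiest to see by filling the same vector both ways. The second half reuses the 2 x 2 cells example from the earlier slide and shows that dimnames lets elements be selected by label.

```r
v <- 1:6
by_col <- matrix(v, nrow = 2, ncol = 3)                # byrow = FALSE (default)
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)
by_col[1, ]
# [1] 1 3 5
by_row[1, ]
# [1] 1 2 3

# The 2 x 2 example with row and column labels
cells <- c(1, 26, 24, 68)
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(c("R1", "R2"), c("C1", "C2")))
mymatrix["R1", "C2"]
# [1] 26
```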

44
Review - vectors, lists, matrices, data frames
  • To make vectors x, y, year, names:
  • x <- c(2,3,7,9)
  • y <- c(9,7,3,2)
  • year <- 1990:1993
  • names <- c("payal", "shraddha", "kritika",
    "itida")
  • Accessing the last element:
  • y[length(y)]
  • To make a list person:
  • person <- list(name="payal", x=2, y=9, year=1990)
  • Accessing: person$name, person$x, person[[1]]
  • names(person)

45
Review - vectors, lists, matrices, data frames
  • To make a matrix, paste together the columns
    year, x, y using column bind:
  • m <- cbind(year, x, y)
  • To make a data frame, which is a list of vectors
    of the same length:
  • D <- data.frame(names, year, x, y)
  • nrow(D)
  • Accessing one of these vectors:
  • D$names
  • Accessing the last element of this vector:
    D$names[nrow(D)]
  • D$names[length(D$names)]

46
Finding the type and class
  • > g <- c(1,3,2)
  • > class(g)
  • [1] "numeric"
  • > typeof(g)
  • [1] "double"
  • > is(g)
  • [1] "numeric" "vector"

47
Sorting
  • If the variable i is a vector of integers, then
    the data frame D[i, ] picks up rows from D based
    on the values found in i. The order() function
    makes an integer vector which is a correct
    ordering for the purpose of sorting.
  • D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
  • # Sort on x
  • indexes <- order(D$x)
  • D[indexes, ]
  • # Print out the dataset sorted in reverse by y
  • D[rev(order(D$y)), ]
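The sketch prints the index vectors that order() produces for this example. Ties are worth watching. order() keeps tied rows in their original order. rev(order(...)) therefore reverses the tied rows as well, while sorting on the negated numeric column keeps them in their original order.

```r
D <- data.frame(x = c(1, 2, 3, 1), y = c(7, 19, 2, 2))

indexes <- order(D$x)
indexes                 # rows 1 and 4 tie on x = 1 and keep their order
# [1] 1 4 2 3
D[indexes, ]

rev(order(D$y))         # descending y; the tied rows 3 and 4 come out as 4, 3
# [1] 2 1 4 3
order(-D$y)             # descending y; the tied rows stay as 3, 4
# [1] 2 1 3 4
D[order(-D$y), ]
```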

48
Logical constants variables
  • TRUE and FALSE are logical constants.
  • T and F are logical variables.
  • T and F are not quite synonyms for TRUE and FALSE,
    but variables that have the expected values by
    default.
  • TRUE TRUE
  • T T
  • These normally give the expected result.
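This sketch shows why "normally" matters. T is an ordinary variable, so a script can overwrite it by accident, whereas TRUE is a reserved word and cannot be reassigned. Removing the user's copy with rm() makes the default T visible again.

```r
T == TRUE      # TRUE by default: T is a variable that holds TRUE

T <- FALSE     # legal: T is just a variable
T == TRUE      # now FALSE, a classic source of bugs
rm(T)          # remove our copy; the default T is visible again
T == TRUE      # TRUE

# TRUE <- FALSE   # would be an error: TRUE is a reserved word
```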

49
Missing Values NA
  • Not Available, or missing, values are represented
    as NA, which is a logical constant of length 1
    that contains a missing value indicator.
  • Examples:
  • is.na(c(1, NA))          # FALSE TRUE
  • is.na(c(NA, NA))         # TRUE TRUE
  • is.na(paste(c(1, NA)))   # FALSE FALSE (paste turns NA into the string "NA")
  • xx <- c(0:4)
  • is.na(xx) <- c(2, 4)     # set elements 2 and 4 to NA
  • xx                       # 0 NA 2 NA 4

50
Writing your own functions.
  • R comes with a built-in median function.
  • Usage: median(x, na.rm = FALSE)
  • where x is an object for which a method has been
    defined, or a numeric vector containing the
    values whose median is to be computed, and
  • na.rm is a logical value indicating whether NA
    values should be removed before the computation
    proceeds.
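Here is a quick check of what na.rm changes. With the default na.rm = FALSE, a single NA makes the median unknown. With na.rm = TRUE, the NA is dropped first.

```r
x <- c(4, 1, NA, 3)
median(x)                  # a missing value makes the result unknown
# [1] NA
median(x, na.rm = TRUE)    # NA removed first: median of 1, 3, 4
# [1] 3
```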

51
Control - If
  • > if (T) print("Hello") else print("Good Bye")
  • [1] "Hello"
  • > if (F) print("Hello") else print("Good Bye")
  • [1] "Good Bye"
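The sketch extends the if example to a chain of conditions with else if. It also shows the contrast with the vectorised ifelse(): if () tests a single value, while ifelse() applies the test element by element. sign_label() is a hypothetical helper written for this illustration.

```r
sign_label <- function(n) {    # hypothetical helper
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}
sign_label(-2)
# [1] "negative"

# ifelse() applies the test to every element of a vector
ifelse(c(-1, 0, 3) > 0, "pos", "non-pos")
# [1] "non-pos" "non-pos" "pos"
```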

52
Control - Sequence
  • a <- c(1,2,3,4,5)
  • b <- c(2,3,4,5)
  • odd.even <- length(a) %% 2
  • if (odd.even == 0)
    (sort(a)[length(a)/2] +
    sort(a)[1 + length(a)/2])/2 else
    sort(a)[ceiling(length(a)/2)]
  • If we want to find the median of b we have to
    type the whole thing again, including the
    odd/even test:
  • > odd.even <- length(b) %% 2
  • > if (odd.even == 0) (sort(b)[length(b)/2] +
    sort(b)[1 + length(b)/2])/2 else
    sort(b)[ceiling(length(b)/2)]
  • It would be better to write a function.

53
User Written - Functions
  • a <- c(1,2,3,4,5)
  • b <- c(2,3,4,5)
  • mymedian <- function(x) {
      odd.even <- length(x) %% 2
      if (odd.even == 0)
        (sort(x)[length(x)/2] + sort(x)[1 + length(x)/2])/2
      else
        sort(x)[ceiling(length(x)/2)]
    }
  • Now we can call (run, execute, or invoke) mymedian
    on any vector.
  • > mymedian(a)
  • > mymedian(b)
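Below is a lightly refactored variant of mymedian, a sketch rather than the lecture's own version. It sorts once, stores the length, and is checked against the built-in median(). Like the original, it does not handle NA values: sort() silently drops them.

```r
mymedian <- function(x) {
  x <- sort(x)                       # sort once instead of three times
  n <- length(x)
  if (n %% 2 == 0) {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even length: mean of the two middle values
  } else {
    x[ceiling(n / 2)]                # odd length: the middle value
  }
}

a <- c(1, 2, 3, 4, 5)
b <- c(2, 3, 4, 5)
mymedian(a)          # 3
mymedian(b)          # 3.5
mymedian(c(9, 1, 7)) # 7, unsorted input is fine
```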
