inst/materials/lectures/day1.md

.small-code pre code { font-size: 1em; }

Geospatial Data Analysis in R

author: Timothy H. Keitt date: May 20, 2018 width: 1440 height: 900

Instructor

type: section

Tim Keitt \<tkeitt@utexas.edu>

Tim's research

I study the spatial and temporal organization of populations, communities and ecosystems.

More at http://www.keittlab.org/

Tim's research

Jaguar landscape

A little history

A little history

A little history

A little history

New developments

Why R?

type: section

Why R?

Why R?

CRAN package growth

Why R?

reproducible research book

Caveat emptor!

Why R for GIS?

type: section

Why R for GIS?

Why R for GIS?

mesh grid

A quick example

data(meuse) # load sample data from sp pacakge
meuse <- st_as_sf(meuse, coords = c("x", "y")) # convert to sf format

A quick example

class: small-code

ggplot(meuse) +
  geom_sf(aes(color = rank(zinc)), size = 2, show.legend = FALSE) +
  scale_color_distiller(palette = "Spectral") +
  xlab("Easting") + ylab("Northing") +
  coord_sf() + theme_bw()

plot of chunk unnamed-chunk-2

Another example

class: small-code

Adapted from Create Maps With R Geospatial Classes and Graphics Tools. (Note that their example files have spatial reference systems that need correcting. Does not influence plotting.)

nceas_dat <- "materials/lectures/example-data/NCEAS sample"
states <- read_sf(system.file(nceas_dat, "western-states.shp", package = "keitt.ssi.2019"))
reservoirs <- read_sf(system.file(nceas_dat, "western-reservoirs.shp", package = "keitt.ssi.2019"))
rivers <- read_sf(system.file(nceas_dat, "western-rivers.shp", package = "keitt.ssi.2019"))
dams <- read_sf(system.file(nceas_dat, "western-dams.shp", package = "keitt.ssi.2019"))
st_crs(reservoirs) <- st_crs(states) # quick fix for missing CRS

Another example

class: small-code

Adapted from Create Maps With R Geospatial Classes and Graphics Tools.

ggplot() +
  geom_sf(data = states, color = "wheat3", fill = "wheat1") +
  geom_sf(data = rivers, color = "dodgerblue3") +
  geom_sf(data = reservoirs, color = "darkgreen", fill = "darkgreen") +
  geom_sf(data = dams, color = "darkred") +
  geom_label_repel(aes(LONGITUDE, LATITUDE, label = DAM_NAME), data = dams, size = 2, alpha = 0.5) +
  coord_sf() + theme_bw()

plot of chunk unnamed-chunk-4

Another example

plot of chunk unnamed-chunk-5

Geospatial data concepts

type: section

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Topologies - Networks - Accuracy and precision - Map projections - Spatial indices

Types of geospatial data

Vector data - Points - Lines - Polygons

Types of geospatial data

Raster data - Spatial grids - Lookup tables

raster

Types of geospatial data

Topology - Areal data - Vertices - Faces

topology

Types of geospatial data

Network - Relational - Vertices - Edges - Planar network - Topology is a special case

network

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

OGC Simple Feature Hierarchy

Geospatial data concepts

OGC Simple Feature Hierarchy

Geospatial data concepts

OGC Simple Feature Predicates

Geospatial data concepts

OGC Simple Feature Set Operations

Handled by GEOS library; bindings in rgeos package

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Raster data

Location is implicit using row and column offsets

Geospatial data concepts

Raster data

Geospatial data concepts

Raster data

For geospatially registered data, two coordinate systems - Row and column offsets - Geospatial coordinates

For many spatial reference systems, conversion from map coordinates to raster offsets can be achieved via an affine transform.

$$ \begin{bmatrix} x \ y \end{bmatrix} = \begin{bmatrix} x_0 \ y_0 \end{bmatrix} + \begin{bmatrix} a_{11} & a_{12} \ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} c \ r \end{bmatrix} $$

$$ \begin{bmatrix} a_{11} & a_{12} \ a_{21} & a_{22} \end{bmatrix}^{-1} \left( \begin{bmatrix} x \ y \end{bmatrix} - \begin{bmatrix} x_0 \ y_0 \end{bmatrix} \right) = \begin{bmatrix} c \ r \end{bmatrix} $$

If $a_{11}$ or $a_{22}$ are negative, then axis is "flipped".

Geospatial data concepts

Affine transforms

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Accuracy and precision

Geospatial data concepts

Accuracy and precision

Geospatial data concepts

Resampling

Geospatial data concepts

Resampling methods

Geospatial data concepts

Accuracy and precision

Geospatial data concepts

Accuracy and precision

Geospatial data concepts

Complex versus simple features

Sources of imprecision - Measurement, recording and transcription errors - Have to clean the data - "Snapping" to reduced precision coordinates sometimes works - Numerical imprecision - Floating point round-off and other effects - Snapping may help - Infinite precision arithmetic possible

Geospatial data concepts

Complex versus simple features

Most spatial operators in R require simple features

Geospatial data concepts

Complex versus simple features

Most spatial operators in R require simple features

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Map projections

Geospatial data concepts

Map projections

Latitude and longitude are natural coordinates

Geospatial data concepts

Map projections

Nonetheless they are still relative to a model of earth

Geospatial data concepts

Map projections

Geospatial data concepts

Map projections

Geospatial data concepts

Map projections

Geospatial data concepts

Map projections

Geospatial data concepts

Example systems

Geospatial data concepts

Example coordinate systems

Geospatial data concepts

Example coordinate systems

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Map queries

Geospatial data concepts

Map queries

Geospatial data concepts

Map queries

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

RefresheR

type: section

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R interprets expressions

x <- 3
print(x)
[1] 3
y <- 2 * x + 4
print(y)
[1] 10

RefresheR

Getting help

help(ls)
?ls
??predict
help(package = stats)

Try googling "\<topic> in R"

RefresheR

Session environment

library(nlme) # attach package
getwd() # where am I?
setwd("my.dir") # go there
ls() # list R objects
dir() # list files
q() # all done

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Basic IO

x <- readr::read_csv("the-data.csv") # 90% of what I do
x <- readr::read_delim("other-data.asc", header = TRUE, as.is = TRUE)
readr::write_csv(object, file = "output.csv")

These return data frames. More on that in a bit.

RefresheR

Database access

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "testing")
res <- dbSendQuery(con, "select length from coastlines")
dframe <- fetch(res)

Enables powerful SQL queries to RDMS. See also the sqldf package.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Assignment

a = 1
b <- 2
3 -> c
print(a); print(b); print(c)
[1] 1
[1] 2
[1] 3

RefresheR

Control flow (compact)

if (TRUE) print("yes") else print("no")
[1] "yes"
z <- rep(1, 10)
print(z)
 [1] 1 1 1 1 1 1 1 1 1 1
for (i in 3:10) z[i] <- z[i - 1] + z[i - 2]
print(z)
 [1]  1  1  2  3  5  8 13 21 34 55

if and for are sufficient for the vast majority of programs

RefresheR

Control flow (better layout)

i <- 7
while (i) {
  z[i] <- z[i] / 2
  i <- i - 1
  if (i < 3) break
}
print(z) # divide elements 4:7 by 2
 [1]  1.0  1.0  1.0  1.5  2.5  4.0  6.5 21.0 34.0 55.0

while is less common but useful in cases with an indeterminate number of loops

RefresheR

Control flow (better layout)

i <- 7
while (i) {
  z[i] <- z[i] / 2
  i <- i - 1
  if (i < 3) break
}

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R expressions are vectorized

x <- 1:5
print(x)
[1] 1 2 3 4 5
y <- 2 * x
print(y)
[1]  2  4  6  8 10

RefresheR

The rule is that the expression is evaluated for each element

z <- 1:5
print(2 * z)
[1]  2  4  6  8 10
for (i in 1:5)
{
  z[i] <- 2 * z[i]
}
print(z)
[1]  2  4  6  8 10

RefresheR

Functions may or may not return a value for each element of an input vector

print(sqrt(1:5)) # a 'map'
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
print(sum(1:5)) # a 'reduce'
[1] 15
print(summary(1:5)) # a more complex 'reduce'
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 

Can be tricky in complex code

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

2-D indices are nested

a <- matrix(1:9, 3)
print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
print(a[1:2, 3:2])
     [,1] [,2]
[1,]    7    4
[2,]    8    5

RefresheR

2-D indices are nested

b <- matrix(NA, 2, 2)
ri <- 1:2
ci <- 3:2
for (i in seq(along = ri)) # for each index in ri
  for (j in seq(along = ci)) # loop over indices in ci
    b[i, j] <- a[ri[i], ci[j]] # ith ri and jth ci
print(b)
     [,1] [,2]
[1,]    7    4
[2,]    8    5
print(a[ri, ci]) # equivalent expression
     [,1] [,2]
[1,]    7    4
[2,]    8    5

You can omit braces for a single loop expression (not preferred)

RefresheR

Rule is if no comma, then use each element of index

print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
print(a[1:5]) # 1-D index gives 1-D result
[1] 1 2 3 4 5

Matrices are stored column-wise

RefresheR

Index can be higher dimensional

i <- which(diag(3) == 1, arr.ind = TRUE)
print(i)
     row col
[1,]   1   1
[2,]   2   2
[3,]   3   3
print(diag(a[i])) # a[i[1,]], a[i[2,]]...
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    5    0
[3,]    0    0    9
print(a[i[, 1], i[, 2]]) # a[i[1, 1], i[1, 2]]...
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

RefresheR

Note the difference

print(a[i])
[1] 1 5 9
print(a[i[, 1], i[, 2]])
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Multiple indices separated by commas are nested

RefresheR

Using indices creatively is often the fastest way to extract and rearrange data in R

i <- rep(1:3, each = 2)
print(i)
[1] 1 1 2 2 3 3
print(a[i, i])
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    4    4    7    7
[2,]    1    1    4    4    7    7
[3,]    2    2    5    5    8    8
[4,]    2    2    5    5    8    8
[5,]    3    3    6    6    9    9
[6,]    3    3    6    6    9    9

RefresheR

Using indices creatively is often the fastest way to extract and rearrange data in R

i <- 1:3
print(sample(i))
[1] 3 2 1
print(a[sample(i), sample(i)]) # row-column permutation
     [,1] [,2] [,3]
[1,]    6    9    3
[2,]    5    8    2
[3,]    4    7    1

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Functions

f <- function(a, b = 2, c = NULL) {
  res <- a * b
  if (!is.null(c)) res <- res * c
  return(res)
}
print(f(1, 2, 3))
[1] 6
print(f(4))
[1] 8

Using NULL as a flag facilitates reuse

RefresheR

Functions are objects

print(class(f))
[1] "function"
print(formals(f))
$a


$b
[1] 2

$c
NULL

RefresheR

Functions are objects

print(body(f))
{
    res <- a * b
    if (!is.null(c)) 
        res <- res * c
    return(res)
}
body(f) = "gotcha"
print(f(6))
[1] "gotcha"

RefresheR

Scope

c <- "mice" # global
f <- function(a = "three") # function formals
{
  b <- "blind" # function body
  return(paste(a, b, c)) # three scopes
}
print(f())
[1] "three blind mice"

Object not found in current scope initiates search upward into enclosing scopes

RefresheR

Scope

x <- 2 # global scope
f <- function() x <- 2 * x # 2 different x's here
print(x)
[1] 2
print(f())
[1] 4
print(x)
[1] 2

The assignment in the function body creates a variable x whose scope is the function body

RefresheR

Useful for closures

f <- function() {
  x <- sum(rnorm(100)) # 1-time stuff here
  function(y) x * y # return a function
}
g <- f() # closure factory
print(c(g(2), f()(2))) # uses x created by f()
[1]  6.5240334 -0.4093613

The factory function is just a convenient way to bind an anonymous environment to the returned closure. I use this all the time to speed up calculations.

RefresheR

Function objects and closures are the key to functional programming

Not functional

x <- matrix(rnorm(25), 5)
row.sums <- rep(NA, 5)
for (i in 1:5) row.sums[i] <- sum(x[i, ])
print(row.sums)
[1] -0.3583968 -1.4738219  0.6191057 -2.1549036 -0.6935966

RefresheR

Function objects and closures are the key to functional programming

Functional

f <- function(i) sum(x[i, ]) # x global scope
row.sums <- sapply(1:5, f) # f(1), f(2)...
print(row.sums)
[1] -0.3583968 -1.4738219  0.6191057 -2.1549036 -0.6935966

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Arrays, including vectors and matrices, hold one type; lists hold different types

# coerced to character
print(c(1, "a", TRUE))
[1] "1"    "a"    "TRUE"
# retain individual types
print(list(1, "a", TRUE))
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

RefresheR

Single v. double braces

x <- as.list(1:3)
print(x[2])
[[1]]
[1] 2
print(class(x[2])) # returns list
[1] "list"
print(x[[2]])
[1] 2
print(class(x[[2]])) # returns list element
[1] "integer"

RefresheR

Data frames are lists of vectors

a <- 1:3
b <- factor(1:3)
c <- letters[1:3]
x <- data.frame(a = a, b = b, c = c)
print(x)
  a b c
1 1 1 a
2 2 2 b
3 3 3 c
print(names(x))
[1] "a" "b" "c"

RefresheR

Use list operators to extract columns

print(x$a)
[1] 1 2 3
print(x[[2]]) # a vector of factors
[1] 1 2 3
Levels: 1 2 3
print(x[["c"]]) # different factors
[1] a b c
Levels: a b c

RefresheR

Or matrix or vector indexing

print(x[1:2, 2:3])
  b c
1 1 a
2 2 b
print(x[2])
  b
1 1
2 2
3 3

RefresheR

Subsetting

y <- subset(x, b %in% 2:3, select = c(b, c))
print(y)
  b c
2 2 b
3 3 c

Lots of new fancy packages for manipulating data frames. See reshape, plyr, dplyr, sqldf and others.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Raison d'être of R is modeling

x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
plot(y ~ x)

plot of chunk unnamed-chunk-39

RefresheR

Raison d'être of R is modeling

mod1 <- lm(y ~ x) # y = b0 + b1 * x
summary(mod1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0720 -0.7088  0.0185  0.6910  2.4782 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.03133    0.09603   10.74   <2e-16 ***
x            2.18854    0.08683   25.20   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9556 on 98 degrees of freedom
Multiple R-squared:  0.8664,    Adjusted R-squared:  0.865 
F-statistic: 635.3 on 1 and 98 DF,  p-value: < 2.2e-16

RefresheR

S3 classes and methods

print(class(mod1)) # S3 class
[1] "lm"
methods(class = class(mod1)) # S3 methods
 [1] add1           alias          anova          case.names    
 [5] coerce         confint        cooks.distance deviance      
 [9] dfbeta         dfbetas        drop1          dummy.coef    
[13] effects        extractAIC     family         formula       
[17] fortify        hatvalues      influence      initialize    
[21] kappa          labels         logLik         model.frame   
[25] model.matrix   nobs           plot           predict       
[29] print          proj           qqnorm         qr            
[33] residuals      rstandard      rstudent       show          
[37] simulate       slotsFromS3    summary        variable.names
[41] vcov          
see '?methods' for accessing help and source code

RefresheR

Calling a generic method

plot(mod1)

plot of chunk unnamed-chunk-42plot of chunk unnamed-chunk-42plot of chunk unnamed-chunk-42plot of chunk unnamed-chunk-42

RefresheR

Raison d'être of R is modeling

anova(mod1)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)    
x          1 580.18  580.18  635.31 < 2.2e-16 ***
Residuals 98  89.50    0.91                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model with x is much better

RefresheR

Model syntax

Y ~ X          # Y = B0 + B1 * X
Y ~ 0 + X      # Y = B1 * X
Y ~ X1 + X2    # Y = B0 + B1 * X1 + B2 * X2
Y ~ X1 * X2    # Y = B0 + B1 * X1 + B2 * X2 + B3 * X1 * X2
Y ~ I(X / 2)   # Y = B0 + B1 * (X / 2)

RefresheR

Peaking under the hood

str(mod1[1:6])
List of 6
 $ coefficients : Named num [1:2] 1.03 2.19
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
 $ residuals    : Named num [1:100] -1.1958 0.0563 -1.4464 -0.7541 0.6201 ...
  ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
 $ effects      : Named num [1:100] -12.687 24.087 -1.347 -0.637 0.748 ...
  ..- attr(*, "names")= chr [1:100] "(Intercept)" "x" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:100] -1.107 0.654 -0.35 2.345 3.837 ...
  ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1

- S3 objects are usually lists with a class attribute - str can be helpful with "what the heck is that?"

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Mixed effects with S4 classes $$Y = X \beta + Zu + \epsilon$$

x <- rnorm(100)
z <- rbinom(100, 1, 0.5)
y <- 1 + 2 * x - 3 * z + rnorm(100)

RefresheR

Mixed effects with S4 classes $$Y = X \beta + Zu + \epsilon$$

xyz <- data_frame(x = x, y = y, z = z)
ggplot(xyz, aes(x, y)) + geom_point(color= "steelblue") + facet_wrap(~ z)

plot of chunk unnamed-chunk-47

RefresheR

Mixed effects with S4 classes $$Y = X \beta + Zu + \epsilon$$

mod2 <- lme4::lmer(y ~ x + (1 | z)) # z is random intercept
print(mod2)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | z)
REML criterion at convergence: 284.3722
Random effects:
 Groups   Name        Std.Dev.
 z        (Intercept) 1.9888  
 Residual             0.9578  
Number of obs: 100, groups:  z, 2
Fixed Effects:
(Intercept)            x  
    -0.4823       1.8987  

RefresheR

What is mod2?

class(mod2)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
isS4(mod2)
[1] TRUE

RefresheR

A generic method applied to S4 object

ranef(mod2)
$z
  (Intercept)
0    1.402999
1   -1.402999

with conditional variances for "z" 

RefresheR

S4 object have slots

slotNames(mod2)
 [1] "resp"    "Gp"      "call"    "frame"   "flist"   "cnms"    "lower"  
 [8] "theta"   "beta"    "u"       "devcomp" "pp"      "optinfo"
show(mod2@beta)  # show is S4 print
[1] -0.4822955  1.8987147

Fixed effects stored in beta

RefresheR

Access S4 slots with @ or slot method

head(mod2@frame, n = 3)  # the data
            y         x z
1  1.99277124 0.4223786 0
2 -0.32581618 1.2849734 1
3  0.07546225 0.9662490 1
head(slot(mod2, 'frame'), n = 3)
            y         x z
1  1.99277124 0.4223786 0
2 -0.32581618 1.2849734 1
3  0.07546225 0.9662490 1

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Iterators increment or decrement on each call rather than existing as a vector of values.

library(iterators)
i <- iter(1:3)
c(nextElem(i), nextElem(i), nextElem(i))
[1] 1 2 3

Iterators can be convenient but really shine when they allow a computation to proceed incrementally without storing the entire iterator sequence in memory.

RefresheR

foreach evaluates an expression for each value of an iterator sequence

library(foreach)
foreach(i = 1:3) %do% rnorm(1)
[[1]]
[1] -0.3790396

[[2]]
[1] -0.9846234

[[3]]
[1] -0.7216653

The real action is in the %do% infix operator.

RefresheR

The .combine argument gives more control

library(foreach)
foreach(i = 1:3, .combine = c) %do% rnorm(1)
[1] -0.82188854  0.09837345 -0.66390420

This is a form of map (the expression) and reduce (the .combine function)

RefresheR

Something a bit more challenging

f <- function(n = 1e7) prod(rnorm(n))
system.time(f())
   user  system elapsed 
  0.828   0.038   0.900 

About 1.5 seconds on my laptop

RefresheR

Something a bit more challenging

system.time(
  foreach(i = 1:5, .combine = prod) %do% f()
)
   user  system elapsed 
  3.327   0.054   3.476 

About 6.5 seconds on my laptop

RefresheR

Now the really cool part

library(doMC) # multicore extensions
registerDoMC() # create parallel engine
system.time(
  foreach(i = 1:5, .combine = prod) %dopar% f()
)
   user  system elapsed 
  1.461   0.107   2.292 

About 2.5 seconds on my laptop. Here %dopar% automatically splits the computation across all of the available cores.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R does linear algebra

x <- 1:3 # [1, 2, 3]
x %*% x # inner product
     [,1]
[1,]   14
x %*% t(x) # outer product
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9

RefresheR

R does linear algebra

a <- matrix(1:9, 3)
print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
a %*% x # matrix - vector product
     [,1]
[1,]   30
[2,]   36
[3,]   42

RefresheR

R does linear algebra

a %*% a # matrix - matrix product
     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150
eigen(a) # eigenvalue decomposition
eigen() decomposition
$values
[1]  1.611684e+01 -1.116844e+00 -5.700691e-16

$vectors
           [,1]       [,2]       [,3]
[1,] -0.4645473 -0.8829060  0.4082483
[2,] -0.5707955 -0.2395204 -0.8164966
[3,] -0.6770438  0.4038651  0.4082483

RefresheR

R does linear algebra

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops



thk686/keitt.ssi.2019 documentation built on June 28, 2019, 1:28 p.m.