knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

Prerequisites {-}

The solutions assume the following packages are loaded (other packages will be loaded when needed):

library(sf)
library(tidyverse)
library(spData)

Chapter 1

  1. Think about the terms 'GIS', 'GDS' and 'Geocomputation' described above. Which is your favorite, and why?

  2. Provide 3 reasons for using a scriptable language such as R for geocomputation instead of established GIS programs such as QGIS.

    • Reproducibility: a sequence of operations is much easier to save and share when using a scripting language.
    • Scalability: scripting languages make it easy to break up the work and parallelise it, making analyses more scalable.
    • Flexibility: the building blocks of the language allow it to interface efficiently with external software, enabling R to perform a wide range of tasks (e.g. animations, online interactive maps).
    • Efficiency: the GUI of GIS programs can be slow to use and install.

  3. Name two advantages and two disadvantages of using the older sp package compared with the new sf package.

    + Number of packages supported.
    + Stability.
    - Slow performance.
    - Non-standard data format.
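A quick way to see the difference between the two ecosystems is to convert between their classes (a minimal sketch, assuming the sp package is installed):

library(sp)
world_sp = as(world, "Spatial")  # sf -> sp's Spatial* classes
world_sf = st_as_sf(world_sp)    # and back again
class(world_sp)
class(world_sf)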

Chapter 2

  1. What does the summary of the geometry column tell us about the world dataset, in terms of:

    • The geometry type?
    • How many countries there are?
    • The coordinate reference system (CRS)?
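One way to inspect these properties (a minimal sketch; in spData's world the geometry column is named geom):

summary(world$geom)
# the summary reports the geometry type (MULTIPOLYGON), the number of
# features (one per country) and the coordinate reference system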
  2. Using sf's plot() command, create a map of Nigeria in context, building on the code that creates and plots Asia above (see Figure \@ref(fig:asia) for an example of what this could look like).

    • Hint: this used the lwd, main and col arguments of plot().
    • Bonus: make the country boundaries a dotted grey line.
    • Hint: border is an additional argument of plot() for sf objects.
nigeria = world[world$name_long == "Nigeria", ]
africa = world[world$continent == "Africa", ]
bb_africa = st_bbox(africa)
# plot Africa as the backdrop, with a thick outline
plot(africa[2], col = "white", lwd = 3, main = "Nigeria in context", border = "lightgrey")
# bonus: dotted grey country boundaries
# plot(world, lty = 3, add = TRUE, border = "grey")
plot(world, add = TRUE, border = "grey")
# highlight Nigeria
plot(nigeria, col = "yellow", add = TRUE, border = "darkgrey")
# label the country at its centroid
ncentre = st_centroid(nigeria)
ncentre_num = st_coordinates(ncentre)
text(x = ncentre_num[1], y = ncentre_num[2], labels = "Nigeria")
  3. What does the cex argument do in the plot() function that generates Figure \@ref(fig:contpop)?

    • Why was cex passed sqrt(world$pop) / 10000 instead of just the population directly?
    • Bonus: what equivalent arguments to cex exist in the dedicated visualisation package tmap?
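A sketch of the idea (adapted from the book's base plotting code): cex scales the symbol size, and taking the square root makes the circle area, rather than its radius, proportional to population.

plot(world["continent"], reset = FALSE)
cex = sqrt(world$pop) / 10000
world_cents = st_centroid(world, of_largest_polygon = TRUE)
plot(st_geometry(world_cents), add = TRUE, cex = cex)
# in tmap, symbol size is controlled by the size argument of
# tm_symbols()/tm_bubbles() rather than by cex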
  4. Re-run the code that 'generated' Figure \@ref(fig:contpop) at the end of Section \@ref(base-args) and find 3 similarities and 3 differences between the plot produced on your computer and that in the book.

    • What is similar?
      • The map orientation (north is up), colors of the continents and sizes of the circles are the same, among other things.
    • What has changed?
      • The shape of the countries (an equal area projection has been used), the color of the points has changed (to red) and they are now filled in (using the pch argument to change the point symbol), the plot has graticules and (most subtly) the points are now in the centroid of the largest polygon of each country, rather than in the area-weighted centroid across all polygons per country.
    • Bonus: play around with and research base plotting arguments to make your version of Figure \@ref(fig:contpop) more attractive. Which arguments were most useful?
      • This is a subjective question: have fun!
    • Advanced: try to reproduce the map presented in Section \@ref(base-args). Copy-and-pasting is prohibited!
      • You can look at the code that generated the plot here: https://github.com/Robinlovelace/geocompr/blob/master/02-spatial-data.Rmd . If you memorise its important parts and type them into your own script to reproduce the plot, that does not count as cheating and will be a valuable learning experience.
  5. Read the raster/nlcd2011.tif file from the spDataLarge package. What kind of information can you get about the properties of this file?
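A minimal sketch, assuming the spDataLarge and raster packages are installed:

library(raster)
nlcd = raster(system.file("raster/nlcd2011.tif", package = "spDataLarge"))
nlcd  # printing shows the dimensions, resolution, extent, CRS and value range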
  6. Create an empty RasterLayer object called my_raster with 10 columns and 10 rows, and resolution of 10 units. Assign random values between 0 and 10 to the new raster and plot it.
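A sketch of one possible solution (the extent is an assumption: with an origin of 0 and a resolution of 10 units, 10 rows and columns give an extent of 0 to 100):

my_raster = raster(nrows = 10, ncols = 10, res = 10,
                   xmn = 0, xmx = 100, ymn = 0, ymx = 100,
                   vals = runif(100, min = 0, max = 10))
plot(my_raster)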

  7. Exercise 3

  8. What does the lwd argument do in the plot() code that generates Figure \@ref(fig:africa)?
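In short, lwd controls the width of plotted lines, here the polygon borders. A one-line sketch reusing the africa object created above:

plot(st_geometry(africa), lwd = 3)  # thicker outlines than the default lwd = 1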

  9. Perform the same operations and map making for another continent of your choice.
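A sketch for South America, mirroring the Africa code above:

south_america = world[world$continent == "South America", ]
plot(st_geometry(south_america), col = "white", lwd = 3, main = "South America")
plot(st_geometry(world), add = TRUE, border = "grey")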
  10. Bonus: Download some global geographic data and add attribute variables, assigning them to the continents of the world.

Chapter 3

  1. Select only the NAME column in us_states and create a new object called us_states_name. What is the class of the new object?
us_states_name = us_states %>% dplyr::select(NAME)
class(us_states_name)
  2. Select columns which contain information about the total population. Think about as many ways as possible to do it. Hint: try to use helper functions, such as contains or starts_with.
us_states %>% select(total_pop_10, total_pop_15)
us_states %>% select(starts_with("total_pop"))
us_states %>% select(contains("total_pop"))
  3. Find all states that:

    • belong to the Midwest region,
    • belong to the West region, have an area below 250,000 km^2 and a 2015 population above 5,000,000,
    • belong to the South region, have an area larger than 150,000 km^2 and a 2015 population above 7,000,000.
us_states %>% filter(REGION == "Midwest")
us_states %>% filter(REGION == "West", AREA < units::set_units(250000, km^2),total_pop_15 > 5000000)
# or
us_states %>% filter(REGION == "West", as.numeric(AREA) < 250000,total_pop_15 > 5000000)
us_states %>% filter(REGION == "South", AREA > units::set_units(150000, km^2), total_pop_15 > 7000000)
# or
us_states %>% filter(REGION == "South", as.numeric(AREA) > 150000, total_pop_15 > 7000000)
  4. What was the total population in 2015 in the us_states dataset? What were the minimum and maximum total populations in 2015?
us_states %>% summarize(total_pop = sum(total_pop_15),
                        min_pop = min(total_pop_15),
                        max_pop = max(total_pop_15))
  5. How many states are in each region?
us_states %>%
  group_by(REGION) %>%
  summarize(nr_of_states = n())
  6. What were the minimum and maximum total populations in 2015 in each region? What was the total population in 2015 in each region?
us_states %>%
  group_by(REGION) %>%
  summarize(min_pop = min(total_pop_15),
            max_pop = max(total_pop_15),
            tot_pop = sum(total_pop_15))
  7. Add variables from us_states_df to us_states and create a new object called us_states_stats. What is the best function to do it? Which variable is the key in both datasets? What is the class of the new object?
us_states_stats = us_states %>%
  left_join(us_states_df, by = c("NAME" = "state"))
class(us_states_stats)
  8. us_states_df has two more variables than us_states. How can you find them?
us_states_df %>%
  anti_join(us_states, by = c("state" = "NAME"))
  9. What was the population density in 2015 in each state? What was the population density in 2010 in each state?
us_states2 = us_states %>%
  mutate(pop_dens_15 = total_pop_15/AREA,
         pop_dens_10 = total_pop_10/AREA)
  10. How much did the population density change between 2010 and 2015 in each state? Calculate the change in percentages.
us_states2 %>%
  mutate(pop_dens_diff_10_15 = pop_dens_15 - pop_dens_10,
         # percentage change relative to the 2010 baseline
         pop_dens_diff_10_15p = (pop_dens_diff_10_15 / pop_dens_10) * 100)
  11. Change the column names in us_states to lowercase. Try to use two helper functions - tolower() and colnames().
us_states %>%
  set_names(tolower(colnames(.)))
  12. Using us_states and us_states_df create a new object called us_states_sel. The new object should have only two variables - median_income_15 and geometry. Change the name of the median_income_15 column to Income.
us_states_sel = us_states %>%
  left_join(us_states_df, by = c("NAME" = "state")) %>%
  select(Income = median_income_15)  # the geometry column is 'sticky' and kept automatically
  13. Calculate the change in median income between 2010 and 2015 for each state. What was the minimum, average and maximum median income in 2015 for each region? What is the region with the largest increase in median income?
us_states %>%
  left_join(us_states_df, by = c("NAME" = "state")) %>%
  mutate(income_change = median_income_15 - median_income_10) %>%
  group_by(REGION) %>%
  summarize(min_income_change = min(income_change),
            mean_income_change = mean(income_change),
            max_income_change = max(income_change)) %>%
  filter(mean_income_change == max(mean_income_change)) %>%
  pull(REGION) %>%
  as.character()
  14. Create a raster from scratch with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84). Fill it with random numbers. Extract the values of the four corner cells.
library(raster)
r = raster(nrow = 9, ncol = 9, res = 0.5, xmn = 0, xmx = 4.5,
           ymn = 0, ymx = 4.5, vals = rnorm(81))
# using cell IDs (cells are numbered row-wise, so the corners of a
# 9 x 9 raster are 1, 9, 73 and 81)
r[c(1, 9, 73, 81)]
# using indexing
r[c(1, nrow(r)), c(1, ncol(r))]
# corresponds to [1, 1], [1, 9], [9, 1], [9, 9]
  15. What is the most common class of our example raster grain (hint: modal())?
grain_size = c("clay", "silt", "sand")
grain = raster(nrow = 6, ncol = 6, res = 0.5, 
               xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
               vals = factor(sample(grain_size, 36, replace = TRUE), 
                             levels = grain_size))
# modal() returns the most common (integer) value;
# factorValues() maps it back to its factor label
cellStats(grain, modal) %>%
  factorValues(grain, .)
# or, equivalently, in one step
factorValues(grain, modal(values(grain)))
  16. Plot the histogram and the boxplot of the data(dem, package = "RQGIS") raster.
data(dem, package = "RQGIS")
par(mfrow = c(1, 2))
hist(dem)
boxplot(dem)
  17. Now also attach data(ndvi, package = "RQGIS"). Create a raster stack using dem and ndvi, and make a pairs() plot.
data(ndvi, package = "RQGIS")
s = stack(dem, ndvi)
pairs(s)

Chapter 4

  1. Use data(dem, package = "RQGIS"), and reclassify the elevation in three classes: low, middle and high. Secondly, compute the NDVI (data(ndvi, package = "RQGIS")) and the mean elevation for each altitudinal class.
library(classInt)
data(dem, package = "RQGIS")
summary(dem)
# find quantile breaks (classIntervals stores them in the brks element)
brk = classIntervals(values(dem), n = 3)$brks
# also try
# brk = classIntervals(values(dem), n = 3, style = "fisher")$brks
# construct the reclassification matrix (low = 1, middle = 2, high = 3)
rcl = matrix(c(brk[1] - 1, brk[2], 1, brk[2], brk[3], 2, brk[3], brk[4], 3), 
             ncol = 3, byrow = TRUE)
# reclassify altitudinal raster
recl = reclassify(dem, rcl = rcl)
# compute the mean dem and ndvi values for each altitudinal class
data(ndvi, package = "RQGIS")
zonal(stack(dem, ndvi), recl, fun = "mean")
  2. Apply a line detection filter to data(dem, package = "RQGIS").
# from the focal help page:
# Laplacian filter: filter=matrix(c(0,1,0,1,-4,1,0,1,0), nrow=3)
# Sobel filter: filter=matrix(c(1,2,1,0,0,0,-1,-2,-1) / 4, nrow=3)
# check for NAs by summing the logical raster (a total of 0 means no NAs)
cellStats(is.na(dem), sum)
# compute the (horizontal) Sobel filter
sobel = focal(dem, w = matrix(c(1, 2, 1, 0, 0, 0, -1, -2, -1) / 4, nrow = 3))
plot(sobel)
  3. Calculate the NDVI of a Landsat image. Use the Landsat image provided by the spDataLarge package (system.file("raster/landsat.tif", package = "spDataLarge")).
file = system.file("raster/landsat.tif", package = "spDataLarge")
r = stack(file)
# compute NDVI manually
ndvi = (r[["landsat.4"]] - r[["landsat.3"]]) / (r[["landsat.4"]] + r[["landsat.3"]])
# compute NDVI with the help of RStoolbox
library(RStoolbox)
ndvi_rstoolbox = spectralIndices(r, red = 3, nir = 4, indices = "NDVI")
all.equal(ndvi, ndvi_rstoolbox)
  4. This post shows how to use raster::distance(). Extract Spain, calculate a distance raster and weight it with elevation. Finally, compute the difference between the raster using the Euclidean distance and the raster weighted by elevation. (Hint: have a look at getData() to retrieve a digital elevation model for Spain.)
library(raster)
# find out the ISO_3 code of Spain
dplyr::filter(ccodes(), NAME %in% "Spain")
# retrieve a dem of Spain
dem = getData("alt", country = "ESP", mask = FALSE)
# change the resolution to decrease computing time
agg = aggregate(dem, fact = 5)
poly = getData("GADM", country = "ESP", level = 1)
plot(dem)
plot(poly, add = TRUE)
# visualize NAs
plot(is.na(agg))
# construct a distance input raster
# we have to set the land cells to NA and the sea cells to an arbitrary value since 
# raster::distance computes the distance to the nearest non-NA cell
dist = is.na(agg)
cellStats(dist, summary)
# convert land cells into NAs and sea cells into 1s
dist[dist == FALSE] = NA
dist[dist == TRUE] = 1
plot(dist)
# compute distance to nearest non-NA cell
dist = raster::distance(dist)
# just keep Spain
dist = mask(dist, poly)
# convert distance into km
dist = dist / 1000
# now let's weight each 100 altitudinal meters by an additional distance of 10 km
agg = mask(agg, poly)
agg[agg < 0] = 0
weight = dist + agg / 100 * 10
plot(weight - dist)

Chapter 5

Chapter 8 Location analysis

  1. Download the csv file containing inhabitant information for a 100 m cell resolution (https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip?__blob=publicationFile&v=3). Please note that the unzipped file has a size of 1.23 GB. To read it into R you can use readr::read_csv, which takes about 30 seconds on my machine (16 GB RAM); data.table::fread() might be even faster, and returns an object of class data.table. Use as.tibble() to convert it into a tibble. Build an inhabitant raster, aggregate it to a cell resolution of 1 km, and compare the difference with the inhabitant raster (inh) we have created using class mean values.
library(tidyverse)
library(raster)
library(sp)

build_census_raster = function(url) {
  download.file(url = url, destfile = file.path(tempdir(), "census.zip"),
                method = "auto", mode = "wb")
  # list the file names
  nms = unzip(file.path(tempdir(), "census.zip"), list = TRUE)
  # unzip only the csv file
  base_name = grep(".csv$", nms$Name, value = TRUE)
  unzip(file.path(tempdir(), "census.zip"), files = base_name, exdir = tempdir())
  # read in the csv file
  input = data.table::fread(file.path(tempdir(), base_name)) %>%
    as.tibble
  input = dplyr::select(input, x = starts_with("x_mp_1"), y = starts_with("y_mp_1"),
                        inh = Einwohner)
  # set -1 and -9 to NA
  input = mutate_all(input, funs(ifelse(. %in% c(-1, -9), NA, .)))
  # convert the table into a SpatialPointsDataFrame (x and y are cell midpoints)
  coordinates(input) = ~ x + y
  # use the correct projection
  proj4string(input) = CRS("+init=epsg:3035")
  gridded(input) = TRUE
  # convert into a raster
  raster(input)
}

# download 1km resolution
url = paste0("https://www.zensus2011.de/SharedDocs/Downloads/DE/", 
             "Pressemitteilung/DemografischeGrunddaten/csv_Zensusatlas_", 
             "klassierte_Werte_1km_Gitter.zip?__blob=publicationFile&v=8")
inp_coarse = build_census_raster(url)
# reclassify
rcl = matrix(c(1, 1, 125, 2, 2, 375, 3, 3, 1250, 4, 4, 3000, 5, 5, 6000,
               6, 6, 8000), ncol = 3, byrow = TRUE)
inh_coarse = reclassify(inp_coarse, rcl = rcl, right = NA)

# download and build the 100 m inhabitant raster
url = paste0("https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/",
             "DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip", 
             "?__blob=publicationFile&v=3")
inh_fine = build_census_raster(url)
inh_fine = aggregate(inh_fine, fact = 1000 / res(inh_fine)[1], fun = sum)
# the origins have to match before the rasters can be compared
origin(inh_fine) = origin(inh_coarse)
summary(inh_fine - inh_coarse)
plot(inh_fine - inh_coarse)
plot(abs(inh_fine - inh_coarse) > 1000)
cellStats((abs(inh_fine - inh_coarse) > 1000), stat = "sum")
cellStats((abs(inh_fine - inh_coarse) > 5000), stat = "sum")

Chapter R-GIS bridges

  1. Create two overlapping polygons (poly_1 and poly_2) with the help of the sf package (see Chapter \@ref(spatial-class)). Calculate the intersection using:

    • RQGIS, RSAGA and rgrass7
    • sf
library(sf)
library(RQGIS)
# create two polygons
coords_1 =  
  matrix(data = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 0),
         ncol = 2, byrow = TRUE)
coords_2 =
  matrix(data = c(-0.5, -0.5, 0.5, -0.5, 0.5, 0.5, 
                  -0.5, 0.5, -0.5, -0.5),
         ncol = 2, byrow = TRUE)

# create the first polygon
poly_1 = st_polygon(list(coords_1))
# convert it into a simple feature collection and assign a CRS
poly_1 = st_sfc(poly_1, crs = 4326)
# finally, convert it into an sf object
poly_1 = st_sf(geometry = poly_1, data = data.frame(id = 1))

# create a second polygon
poly_2 = st_polygon(list(coords_2)) %>%
  st_sfc(., crs = 4326) %>%
  st_sf(geometry = ., data = data.frame(id = 1))
# visualize it
plot(st_geometry(poly_1), xlim = c(-1, 1), ylim = c(-1, 1))
plot(st_geometry(poly_2), add = TRUE)

# INTERSECTION USING RQGIS
#***************************
# first of all, we need to find out which function might do this for us
find_algorithms("intersec")
open_help("qgis:intersection")
get_usage("qgis:intersection")
# using R named arguments
int_qgis = run_qgis("qgis:intersection", INPUT = poly_1, INPUT2 = poly_2,
                    OUTPUT = "int_qgis.shp", load_output = TRUE)
# visualize it
plot(st_geometry(poly_1), xlim = c(-1, 1), ylim = c(-1, 1))
plot(st_geometry(poly_2), add = TRUE)
plot(int_qgis, col = "lightblue", add = TRUE)


# INTERSECTION USING RSAGA
#***************************
# note: the RSAGA examples only work with SAGA < 2.3; we have informed the
# package maintainer and asked for support of more recent SAGA versions
library(RSAGA)
library(link2GI)
linkSAGA()
rsaga.env()
# save shapefile layers
write_sf(poly_1, file.path(tempdir(), "poly_1.shp"))
write_sf(poly_2, file.path(tempdir(), "poly_2.shp"))
# find out how to union shapefiles with the help of SAGA
rsaga.get.modules(lib = "shapes_polygons")
rsaga.get.usage(lib = "shapes_polygons", module = "Intersect")
# create parameter-argument list for RSAGA
params = list(A = file.path(tempdir(), "poly_1.shp"),
              B = file.path(tempdir(), "poly_2.shp"),
              RESULT = file.path(tempdir(), "int_saga.shp"))
rsaga.geoprocessor(lib = "shapes_polygons", module = "Intersect", 
                   param = params)
int_saga = st_read(file.path(tempdir(), "int_saga.shp"))
# visualize it
plot(st_geometry(poly_1), xlim = c(-1, 1), ylim = c(-1, 1))
plot(st_geometry(poly_2), add = TRUE)
plot(st_geometry(int_saga), col = "lightblue", add = TRUE)

# INTERSECTION USING rgrass7
#***************************
library(link2GI)
library(rgrass7)
link2GI::linkGRASS7(rbind(poly_1, poly_2), ver_select = TRUE)
# let's have a look at the help of v.overlay via rgrass7
execGRASS("g.manual", entry = "v.overlay")
# RQGIS::open_help("grass7:v.overlay")
writeVECT(as(poly_1, "Spatial"), vname = "poly_1")
writeVECT(as(poly_2, "Spatial"), vname = "poly_2")
execGRASS("v.overlay", ainput = "poly_1", binput = "poly_2",
          output = "int_grass", operator = "and", flag = "overwrite")
int_grass = readVECT("int_grass")
plot(st_geometry(poly_1), xlim = c(-1, 1), ylim = c(-1, 1))
plot(st_geometry(poly_2), add = TRUE)
plot(int_grass, add = TRUE, col = "lightblue")

# INTERSECTION USING sf
#***************************

int_sf = st_intersection(poly_1, poly_2)
plot(st_geometry(poly_1), xlim = c(-1, 1), ylim = c(-1, 1))
plot(st_geometry(poly_2), add = TRUE)
plot(int_sf, add = TRUE, col = "lightblue")
  2. Run data(dem, package = "RQGIS") and data(random_points, package = "RQGIS"). Randomly select a point from random_points and find all dem pixels that can be seen from this point (hint: viewshed). Visualise your result. For example, plot a hillshade, and on top of it the digital elevation model, your viewshed output and the point. Additionally, give mapview a try.
library(RQGIS)
library(raster)
data(dem, package = "RQGIS")
data(random_points, package = "RQGIS")

find_algorithms("viewshed")
alg = "grass7:r.viewshed"
get_usage(alg)
open_help(alg)
# let's find out about the default values
get_args_man(alg)
point = random_points[sample(1:nrow(random_points), 1), ]
coord = paste(sf::st_coordinates(point), collapse = ",")
out = run_qgis(alg, input = dem, coordinates = coord,
               output = "out.tif", load_output = TRUE)

# under Linux this might not work, in this case use rgrass7 directly
library(rgrass7)
link2GI::linkGRASS7(dem, ver_select = TRUE)
writeRAST(as(dem, "SpatialGridDataFrame"), "dem")
writeVECT(as(random_points, "Spatial"), vname = "points")
execGRASS("r.viewshed", input = "dem", coordinates = sf::st_coordinates(point),
          output = "view")
out = raster(readRAST("view"))

# compute the hillshade from slope and aspect (terrain() returns the slope by default)
hs = hillShade(terrain(dem), terrain(dem, "aspect"), 40, 270)
plot(hs, col = gray(0:100 / 100), legend = FALSE)
plot(dem, add = TRUE, alpha = 0.5, legend = FALSE)
plot(point, add = TRUE, col = "red", pch = 16)
plot(out, add = TRUE, col = "lightgray", legend = FALSE)
plot(point, add = TRUE, col = "red", pch = 16)

# or using mapview
library(mapview)
mapview(out, col = "white", map.type = "Esri.WorldImagery") +
  mapview(point)
