knitr::opts_chunk$set(cache = FALSE, echo = TRUE, eval = FALSE)
Caching is the ability to save some sort of output from an operation, and then retrieve these outputs when the operation is repeated in the same way - meaning the inputs of this operation and the actual tasks it performs are unchanged.
Caching becomes fundamental when we can expect to re-run operations several times, particularly if they take a while to compute each time. Some examples of these operations are:
- downloading data
- (spatial) data processing/munging
- fitting statistical models to large datasets, or that are complex in nature
- running simulations with no stochasticity
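The idea can be sketched in a few lines of base R. This is only a toy illustration, not how reproducible::Cache is implemented: toyCache, toyCacheStore and slowSquare are hypothetical names, and results are kept in memory rather than on disk.

```r
## A toy cache: results are stored in an environment, keyed by the function
## definition and its arguments (hypothetical helper, base R only).
toyCacheStore <- new.env()

toyCache <- function(fun, ...) {
  ## the key changes whenever the function or its inputs change
  key <- paste(deparse(list(fun, ...)), collapse = "")
  if (exists(key, envir = toyCacheStore, inherits = FALSE)) {
    message("loading cached result")
    return(get(key, envir = toyCacheStore))
  }
  message("computing result")
  result <- fun(...)
  assign(key, result, envir = toyCacheStore)
  result
}

slowSquare <- function(x) {
  Sys.sleep(0.5)  # stand-in for a slow download or model fit
  x^2
}

toyCache(slowSquare, 4)  # first call: computed
toyCache(slowSquare, 4)  # same function and inputs: retrieved from the store
toyCache(slowSquare, 5)  # different input: computed again
```

Keying on both the function and its arguments is what makes "repeated in the same way" checkable: change either one and the stored result is no longer used.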
SpaDES (via the reproducible package) offers a number of functions that make caching these operations a lot easier for non-programmers. Two fundamental ones are Cache and prepInputs.
library(reproducible)
library(sp)
library(raster)

## from ?prepInputs
dPath <- file.path("modules/Biomass_borealDataPrep/data")
url <- file.path("ftp://ftp.ccrs.nrcan.gc.ca/ad/NLCCLandCover",
                 "LandcoverCanada2005_250m/LandCoverOfCanada2005_V1_4.zip")
landCover <- prepInputs(url = url, destinationPath = asPath(dPath))
Now do it again. Notice any difference?
landCover <- prepInputs(url = url, destinationPath = asPath(dPath))
Now try wrapping the previous operation in a Cache call, and run it twice. Notice the difference in speed.
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath))
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath))
The previous code is great, but we don't have as much control as we'd like over where Cache stores cached objects. To get that control, we can explicitly provide a cache folder and add tags to the object, so that we can find it more easily if we ever need to "clean it".
cPath <- file.path(tempdir(), "cache")

## run this twice
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   cacheRepo = cPath, userTags = "landCover")

showCache(x = cPath, userTags = "landCover")
reproducible::clearCache(x = cPath, userTags = "landCover")

## notice how Cache needs to re-do things
## (same cache folder as above, so the cleared entry is what is being looked up)
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   cacheRepo = cPath, userTags = "landCover")
We can also force Cache to redo operations and re-cache, or simply to ignore caching altogether. See more options for Cache(useCache) in ?Cache.
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   cacheRepo = cPath, userTags = "landCover",
                   useCache = "overwrite")

options("reproducible.useCache" = FALSE)
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   cacheRepo = cPath, userTags = "landCover",
                   useCache = TRUE)
Try to provide a study area now. Hints:
- see ?reproducible::prepInputs and ?reproducible::postProcess
- see ?SpaDES.tools::randomStudyArea
- what about Cache(prepInputs(...)) with the new study area(s)?

library(SpaDES.tools)
StudyArea <- randomStudyArea(size = 100000^2)

## cheating to visualise beforehand
## reproject the study area only if its CRS differs from the raster's
if (!identical(crs(StudyArea), crs(landCover)))
  StudyArea <- spTransform(StudyArea, crs(landCover))

plot(landCover)
plot(StudyArea, add = TRUE, col = "red")

options(reproducible.useCache = TRUE)
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   useSAcrs = TRUE, overwrite = TRUE,
                   studyArea = StudyArea,
                   cacheRepo = cPath, userTags = "landCover")
plot(landCover)
What if my study area is a raster?
Assuming you don't have a raster at hand, try using a raster from the SpaDESInAction example ("inputs/rasterToMatch.rds").
## assuming you're in the SpaDESInAction2 project folder
templateRaster <- readRDS(file.path(getwd(), "inputs/rasterToMatch.rds"))

landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   # useSAcrs = TRUE,  ## we don't need this anymore
                   rasterToMatch = templateRaster,  ## a reproducible::postProcess argument
                   maskWithRTM = TRUE,  ## a reproducible::maskInputs argument
                   overwrite = TRUE,
                   cacheRepo = cPath, userTags = "landCover",
                   useCache = TRUE)
plot(landCover)
What if I have both?
In some cases your study area may be defined by a polygon, but you may also have a raster that dictates, e.g., the projection and resolution of the output (remember, polygons have no resolution).
landCover <- Cache(prepInputs, url = url, destinationPath = asPath(dPath),
                   studyArea = StudyArea,
                   useSAcrs = FALSE,  ## use the template raster projection
                   rasterToMatch = templateRaster,
                   maskWithRTM = FALSE,  ## mask using the study area
                   overwrite = TRUE,
                   cacheRepo = cPath, userTags = "landCover",
                   useCache = TRUE)
Now imagine someone told you there is a more up-to-date land cover map for Canada, and where to look to get the .zip file: the Canadian Gov. open data portal.
url <- "http://ftp.maps.canada.ca/pub/nrcan_rncan/Land-cover_Couverture-du-sol/canada-landcover_canada-couverture-du-sol/CanadaLandcover2010.zip"

landCover <- Cache(prepInputs,
                   targetFile = "CAN_LC_2010_CAL.tif",
                   url = url, destinationPath = asPath(dPath),
                   studyArea = StudyArea,
                   overwrite = TRUE,
                   cacheRepo = cPath, userTags = "landCover",
                   useCache = TRUE)
plot(landCover)
Learn more about caching
Now that you have a flavour of caching, we're going to explore debugging a bit and put our new "caching skills" into practice in a SpaDES modelling context.
We're going to run the caribouRSF module by itself.
## Restart your R session if not running from a "clean" environment
library("reproducible")
library(SpaDES)
library(LandR)
library(raster)
library(data.table)

options(
  "spades.recoveryMode" = 2,
  "spades.lowMemory" = TRUE,
  "LandR.assertions" = FALSE,
  "LandR.verbose" = 1,
  "reproducible.useMemoise" = TRUE, # Brings cached stuff to memory during the second run
  "reproducible.useNewDigestAlgorithm" = TRUE, # use the new less strict hashing algo
  "reproducible.useCache" = TRUE,
  "pemisc.useParallel" = FALSE
)

## assuming you're in the SpaDESInAction2 project folder
inputDirectory <- checkPath(file.path(getwd(), "inputs"), create = TRUE)
outputDirectory <- checkPath(file.path(getwd(), "outputs"), create = TRUE)
modulesDirectory <- checkPath(file.path(getwd(), "modules"), create = TRUE)
cacheDirectory <- checkPath(file.path(getwd(), "cache"), create = TRUE)

setPaths(cachePath = cacheDirectory,
         modulePath = c(modulesDirectory,
                        file.path(modulesDirectory, "scfm/modules")),
         inputPath = inputDirectory,
         outputPath = outputDirectory)

times <- list(start = 0, end = 10)
successionTimestep <- 1L

parameters <- list(
  caribouRSF = list(
    "decidousSp" = c("Betu_Pap", "Popu_Tre", "Popu_Bal"),
    "predictionInterval" = 20
  )
)

# load studyArea
studyArea <- readRDS(file.path(getPaths()$inputPath, "studyArea.rds"))

objects <- list(
  "studyArea" = studyArea
)

caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = 1)
Oops, something doesn't seem to be right! We start by looking carefully at the printed output, then we use traceback to help us locate the problem.
In this case, it seems to be a particular line of caribouRSF.R
traceback()
# 11: stop("This module does not work without data. Please provide the necessary layers") at caribouRSF.R#154
# 10: get(moduleCall, envir = fnEnv)(sim, cur[["eventTime"]], cur[["eventType"]])
# 9: eval(fnCallAsExpr)
# 8: eval(fnCallAsExpr)
# (...)

file.edit("modules/caribouRSF/caribouRSF.R") ## go to caribouRSF.R#154
- Add browser() before the line with the stop(). Save and re-run.
OR
- Use the debug option in simInitAndSpades or spades to go into "browser mode".
- Check ?browser, while you're at it ;)
What is P(sim)$.useDummyData? Where does its value come from?
# Browse[1]> P(sim)$.useDummyData
# [1] TRUE
# Browse[1]> mod$pixelGroupMap
# NULL
# Browse[1]> mod$cohortData
# NULL
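If you want to practise the browser() approach outside of SpaDES first, here is a minimal base R sketch. The function checkLayers and its useBrowser argument are hypothetical stand-ins for a module function that stops when data are missing; they are not part of caribouRSF.

```r
## Hypothetical stand-in for a module function that fails without data.
checkLayers <- function(layers, useBrowser = FALSE) {
  if (is.null(layers)) {
    ## Pause just before the error so the function's environment can be
    ## inspected; only meaningful in an interactive session.
    if (useBrowser && interactive()) browser()
    stop("This function does not work without data.")
  }
  length(layers)
}

## Non-interactive check: capture the error message instead of stopping.
msg <- tryCatch(checkLayers(NULL), error = function(e) conditionMessage(e))
print(msg)

## In an interactive session, try: checkLayers(NULL, useBrowser = TRUE)
## then type `layers` at the Browse[1]> prompt, and `Q` to quit.
```

Placing browser() immediately before the stop() means the objects that triggered the error are still available for inspection.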
You can also try to enter the "browser mode" in specific events or all events of a module via the simInitAndSpades or spades functions.
## browse at the init event(s) - if you had more than one module it would stop at each
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = "init")

## browse at the event that triggered the error
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = "lookingForCaribou")

## "browse" at each event of the caribouRSF module
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = "caribouRSF")
We are going to supply these objects - note that the dynamic part will not be simulated.
Check the .inputObjects function and the metadata for inputs.
Can you see a pattern in how prepInputs gets data from online sources? Try to do the same for pixelGroupMap and cohortData.
How are sources for objects given? Try adding the following sources:
- for pixelGroupMap: "https://drive.google.com/open?id=1IUEuH55su8X7JCWt8LXy_hTAQz0cfCmU"
- for cohortData: "https://drive.google.com/open?id=1R_wGGvzUI0gGZ5NOs2KmT2KrmXaTm4NS"

## add sourceURL to pixelGroupMap and cohortData
expectsInput(objectName = "pixelGroupMap", objectClass = "RasterLayer",
             desc = paste0("Map of groups of pixels that share the same info from cohortData (sp, age, biomass, etc).",
                           "Here is mainly used to determine old and recent burns based on tree age,",
                           " and if deciduous by species"),
             sourceURL = "https://drive.google.com/open?id=1IUEuH55su8X7JCWt8LXy_hTAQz0cfCmU")
expectsInput(objectName = "cohortData", objectClass = "data.table",
             desc = paste0("data.table with information by pixel group of sp, age, biomass, etc"),
             sourceURL = "https://drive.google.com/open?id=1R_wGGvzUI0gGZ5NOs2KmT2KrmXaTm4NS")

## add defaults for these objects in .inputObjects, so that the module can get them if they are not supplied
if (!suppliedElsewhere("pixelGroupMap", sim = sim, where = "sim")) {
  sim$pixelGroupMap <- Cache(prepInputs,
                             targetFile = "pixelGroupMapCaribouEg.rds",
                             fun = "readRDS",
                             url = extractURL("pixelGroupMap"),
                             studyArea = sim$studyArea,
                             destinationPath = dataPath(sim),
                             filename2 = NULL,
                             rasterToMatch = sim$rasterToMatch)
}

if (!suppliedElsewhere("cohortData", sim = sim, where = "sim")) {
  sim$cohortData <- Cache(prepInputs,
                          targetFile = "cohortDataCaribouEg.rds",
                          fun = "readRDS",
                          url = extractURL("cohortData"),
                          destinationPath = dataPath(sim))
}
You can now use restartSpades or simply re-run simInitAndSpades.

## option 1: restart the interrupted simulation
caribou <- restartSpades()

## option 2: re-run from scratch
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = 1)
There seem to be no more issues. Maybe we don't need to print all those debug messages anymore, so the next run can be less verbose.
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = FALSE)
Or maybe we don't want to see them printed, but want to keep a log of all messages to check later. See ?spades.
caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = list(console = list(level = 40),
                                         file = list(append = FALSE,
                                                     file = "logCaribou.txt",
                                                     level = 0),
                                         debug = TRUE))

## notice how different the file is from what was printed on the console.
file.edit("logCaribou.txt")
What about all that purple text? That's the module code checking. It's helpful, but not always accurate: it is meant to be informative rather than enforced.
options("spades.moduleCodeChecks" = FALSE)

caribou <- simInitAndSpades(times = times,
                            objects = objects,
                            params = parameters,
                            modules = as.list("caribouRSF"),
                            paths = getPaths(),
                            debug = list(console = list(level = 40),
                                         file = list(append = FALSE,
                                                     file = "logCaribou.txt",
                                                     level = 0),
                                         debug = TRUE))

## notice how different the file is from what was printed on the console.
file.edit("logCaribou.txt")
Learn about debugging in SpaDES and with RStudio