prepInputs | R Documentation |
prepInputs(
  targetFile = NULL,
  url = NULL,
  archive = NULL,
  alsoExtract = NULL,
  destinationPath = getOption("reproducible.destinationPath", "."),
  fun = NULL,
  quick = getOption("reproducible.quick"),
  overwrite = getOption("reproducible.overwrite", FALSE),
  purge = FALSE,
  useCache = getOption("reproducible.useCache", 2),
  .tempPath,
  verbose = getOption("reproducible.verbose", 1),
  ...
)
targetFile |
Character string giving the filename (without relative or
absolute path) to the eventual file
(raster, shapefile, csv, etc.) after downloading and extracting from a zip
or tar archive. This is the file before it is passed to
|
url |
Optional character string indicating the URL to download from.
If not specified, then no download will be attempted. If no entry
exists in the |
archive |
Optional character string giving the path of an archive
containing |
alsoExtract |
Optional character string naming files other than
|
destinationPath |
Character string of a directory in which to download
and save the file that comes from |
fun |
Optional. If specified, this will attempt to load whatever
file was downloaded during |
quick |
Logical. This is passed internally to |
overwrite |
Logical. Should downloading and all the other actions occur even if the files pass the checksums or are all already there? |
purge |
Logical or Integer. |
useCache |
Passed to |
.tempPath |
Optional temporary path for intermediate files used internally. It will be cleared on exit from this function. |
verbose |
Numeric, -1 silent (where possible), 0 being very quiet,
1 showing more messaging, 2 being more messaging, etc.
Default is 1. Above 3 will output much more information about the internals of
Caching, which may help diagnose Caching challenges. Can set globally with an
option, e.g., |
... |
Additional arguments passed to
|
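Several argument defaults in the usage block are read from global options. The sketch below shows how those defaults could be set once per session. The option names are taken from the `getOption()` calls in the usage block; the directory name "inputs" is a hypothetical placeholder.

```r
# Set session-wide defaults that prepInputs() reads via getOption().
# "inputs" is a hypothetical directory name used only for illustration.
old <- options(
  reproducible.destinationPath = file.path(tempdir(), "inputs"),
  reproducible.overwrite = FALSE,
  reproducible.verbose = 0
)

# These are the values the argument defaults would now resolve to
dest <- getOption("reproducible.destinationPath", ".")
verb <- getOption("reproducible.verbose", 1)

# Restore the previous settings when done
options(old)
```

Any argument passed explicitly in a call takes precedence over these option-based defaults, as with any R default argument.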
This function can be used to prepare R objects from remote or local data sources.
The object of this function is to provide a reproducible version of
a series of commonly used steps for getting, loading, and processing data.
This function has two stages: getting data (download, extracting from archives, loading into R) and post-processing (for Spatial* and Raster* objects, this is crop, reproject, mask/intersect).
To trigger the first stage, provide url or archive.
To trigger the second stage, provide studyArea or rasterToMatch.
See examples.
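A minimal sketch of the first stage with a purely local file follows. It assumes, as in the package's own local-file example, that a targetFile already present in destinationPath is checksummed and loaded without any download. The directory name, file name, and data are hypothetical.

```r
# A minimal sketch; the directory and file names below are hypothetical.
if (requireNamespace("reproducible", quietly = TRUE)) {
  library(reproducible)

  dPath <- file.path(tempdir(), "prepInputsSketch")
  dir.create(dPath, showWarnings = FALSE)
  write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
    file.path(dPath, "points.csv"),
    row.names = FALSE
  )

  # Stage 1 only: the file is already local, so nothing is downloaded;
  # it is loaded with the function named in `fun`.
  pts <- prepInputs(
    targetFile = "points.csv",
    destinationPath = dPath,
    fun = "utils::read.csv"
  )

  # Stage 2 would additionally run for a spatial object if, e.g.,
  # `studyArea` or `rasterToMatch` were supplied.
}
```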
This is an omnibus function that returns an R object resulting from running preProcess() and postProcess() or postProcessTo(). Thus, if it is a GIS object, it may have been cropped, reprojected, "fixed", masked, and written to disk.
See preProcess() for combinations of arguments.
The first stage (getting data) consists of:
Download from the web via either googledrive::drive_download() or utils::download.file();
Extract from archive using unzip() or untar();
Load into R using terra::rast, sf::st_read, or any other function passed in with fun;
Checksumming of all files during this process. This is put into a ‘CHECKSUMS.txt’ file in the destinationPath, appending if it is already there, overwriting the entries for the same files if entries already exist.
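The idea behind the checksumming step can be illustrated in base R. This is a conceptual sketch only: `tools::md5sum()` stands in for whatever hashing prepInputs uses internally, and the record layout is hypothetical rather than the actual CHECKSUMS.txt format.

```r
# Conceptual sketch: md5 is an assumption, not necessarily the package's algorithm.
dPath <- file.path(tempdir(), "checksumSketch")
dir.create(dPath, showWarnings = FALSE)
f <- file.path(dPath, "data.txt")
writeLines("original contents", f)

# Record a checksum for the file, keyed by file name (hypothetical layout)
record <- data.frame(
  file = basename(f),
  checksum = unname(tools::md5sum(f)),
  stringsAsFactors = FALSE
)

# Later: recompute and compare to decide whether the file can be reused
unchanged <- identical(unname(tools::md5sum(f)), record$checksum)

# If the file is altered, the comparison fails, signalling that the
# file no longer matches what was recorded
writeLines("modified contents", f)
changedDetected <- !identical(unname(tools::md5sum(f)), record$checksum)
```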
The second stage (post-processing) will be triggered if either rasterToMatch or studyArea is supplied. It consists of:
Fix errors. Currently, the only errors fixed are for SpatialPolygons, using buffer(..., width = 0);
Crop using cropTo();
Project using projectTo();
Mask using maskTo();
Determine the file name with determineFilename() via filename2;
Optionally, write the object to disk under that file name via writeTo().
NOTE: checksumming does not occur during the post-processing stage, as there are no file downloads. To achieve fast results, wrap prepInputs with Cache.
NOTE: sf objects are still very experimental.
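Wrapping prepInputs with Cache, as the first note suggests, might look like the sketch below. It uses the `Cache(prepInputs, ...)` call form that appears in the examples on this page. The directory and file names are hypothetical, and the default cache location is assumed to be acceptable.

```r
# A minimal sketch; "cacheSketch" and "small.rds" are hypothetical names.
if (requireNamespace("reproducible", quietly = TRUE)) {
  library(reproducible)

  dPath <- file.path(tempdir(), "cacheSketch")
  dir.create(dPath, showWarnings = FALSE)
  saveRDS(data.frame(a = 1:3), file.path(dPath, "small.rds"))

  # First call runs prepInputs and stores the result in the cache
  obj1 <- Cache(prepInputs,
    targetFile = "small.rds",
    destinationPath = dPath,
    fun = "base::readRDS"
  )

  # An identical second call is expected to be recovered from the cache
  # rather than recomputed
  obj2 <- Cache(prepInputs,
    targetFile = "small.rds",
    destinationPath = dPath,
    fun = "base::readRDS"
  )
}
```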
Spat*, sf, Raster* and Spatial* objects:
The following has been DEPRECATED because it had enough ambiguities that it was changed in favour of from and the *to family.
See postProcessTo().
DEPRECATED: If rasterToMatch or studyArea are used, then this will trigger several subsequent functions, specifically the sequence crop, reproject, mask, which appears to be a common sequence while preparing spatial data from diverse sources.
See the postProcess() documentation section on "Backwards compatibility with rasterToMatch and/or studyArea arguments" to understand various combinations of rasterToMatch and/or studyArea.
fun
fun offers the ability to pass any custom function with which to load the file obtained by preProcess into the session. Two cases are dealt with: when preProcess downloads a file (including via dlFun), fun must deal with a file; and when preProcess creates an R object (e.g., raster::getData returns an object), fun must deal with an object.
fun can be supplied in three ways: a function, a character string (i.e., a function name as a string), or an expression.
If a character string or function, it should include the package name, e.g., "terra::rast", or be an actual function, e.g., base::readRDS.
In these cases, it will evaluate this function call while passing targetFile as the first argument. These will only work in the simplest of cases.
When more precision is required, the full call can be written, in which the filename can be referred to as targetFile if the function is loading a file. If preProcess returns an object, fun should be set to fun = NA.
If the custom function is not in a package, prepInputs may not find it. In such cases, simply pass the function as a named argument (with the same name as the function) to prepInputs.
See examples.
NOTE: passing fun = NA will skip loading the object into R. This essentially replicates the functionality of calling preProcess directly.
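The three ways of supplying fun described above can be sketched with a small local file. The directory and file names are hypothetical, and the third form follows the braced-expression template shown in the examples on this page. Treat this as an illustration under those assumptions rather than a definitive reference.

```r
# A minimal sketch; "funSketch" and "small.rds" are hypothetical names.
if (requireNamespace("reproducible", quietly = TRUE)) {
  library(reproducible)

  dPath <- file.path(tempdir(), "funSketch")
  dir.create(dPath, showWarnings = FALSE)
  saveRDS(data.frame(a = 1:3), file.path(dPath, "small.rds"))

  # 1. As an actual function; targetFile is passed as the first argument
  viaFunction <- prepInputs(
    targetFile = "small.rds", destinationPath = dPath,
    fun = base::readRDS
  )

  # 2. As a character string that includes the package name
  viaString <- prepInputs(
    targetFile = "small.rds", destinationPath = dPath,
    fun = "base::readRDS"
  )

  # 3. As a full expression that refers to the file as `targetFile`
  viaCall <- prepInputs(
    targetFile = "small.rds", destinationPath = dPath,
    fun = {
      out <- readRDS(targetFile)
      out[out$a > 1, , drop = FALSE]
    }
  )
}
```

The first two forms suit simple loaders; the expression form is the one to reach for when the loaded object needs further handling before it is returned.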
purge
The options for controlling purging of the CHECKSUMS.txt file are:
0  keep file
1  delete file in destinationPath; all records of downloads need to be rebuilt
2  delete entry with same targetFile
3  delete entry with same archive
4  delete entry with same alsoExtract
5  delete entry with same targetFile & alsoExtract
6  delete entry with same targetFile, alsoExtract & archive
7  delete entry with same targetFile, alsoExtract & archive & url
Options that delete entries will only remove entries in the CHECKSUMS.txt that are associated with targetFile, alsoExtract or archive.
When prepInputs is called, it will write a CHECKSUMS.txt file, or append to it if one already exists.
If the CHECKSUMS.txt is not correct, use this argument to remove it.
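A sketch of using purge to discard a CHECKSUMS.txt file and rebuild it follows. It assumes that, after the file is deleted with purge = 1, the same call goes on to load the local file as usual. The directory and file names are hypothetical.

```r
# A minimal sketch; "purgeSketch" and "small.rds" are hypothetical names.
if (requireNamespace("reproducible", quietly = TRUE)) {
  library(reproducible)

  dPath <- file.path(tempdir(), "purgeSketch")
  dir.create(dPath, showWarnings = FALSE)
  saveRDS(data.frame(a = 1:3), file.path(dPath, "small.rds"))

  # First call: loads the file and records it in CHECKSUMS.txt
  first <- prepInputs(
    targetFile = "small.rds", destinationPath = dPath,
    fun = "base::readRDS"
  )

  # Second call: purge = 1 deletes CHECKSUMS.txt in destinationPath, so the
  # records are rebuilt from scratch (assumed behaviour, per the table above)
  second <- prepInputs(
    targetFile = "small.rds", destinationPath = dPath,
    fun = "base::readRDS",
    purge = 1
  )
}
```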
This function is still experimental: use with caution.
Eliot McIntire, Jean Marchal, and Tati Micheletti
postProcessTo(), downloadFile(), extractFromArchive(), postProcess().
if (requireNamespace("terra", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE)) {
  library(reproducible)
  # Make a dummy study area map -- user would supply this normally
  coords <- structure(c(-122.9, -116.1, -99.2, -106, -122.9, 59.9, 65.7, 63.6, 54.8, 59.9),
    .Dim = c(5L, 2L)
  )
  studyArea <- terra::vect(coords, "polygons")
  terra::crs(studyArea) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  # Make dummy "large" map that must be cropped to the study area
  outerSA <- terra::buffer(studyArea, 50000)
  terra::crs(outerSA) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  tf <- normPath(file.path(tempdir2("prepInputsEx"), "prepInputs2.shp"))
  terra::writeVector(outerSA, tf, overwrite = TRUE)
  # run prepInputs -- load file, postProcess it to the studyArea
  studyArea2 <- prepInputs(
    targetFile = tf, to = studyArea,
    fun = "terra::vect",
    destinationPath = tempdir2()
  ) |>
    suppressWarnings() # not relevant warning here
  # clean up
  unlink("CHECKSUMS.txt")
  ##########################################
  # Remote file using `url`
  ##########################################
  if (internetExists()) {
    data.table::setDTthreads(2)
    origDir <- getwd()
    # download a zip file from internet, unzip all files, load as shapefile, Cache the call
    # First time: don't know all files - prepInputs will guess, if download file is an archive,
    # then extract all files, then if there is a .shp, it will load with sf::st_read
    dPath <- file.path(tempdir(), "ecozones")
    shpUrl <- "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip"
    # Wrapped in a try because this particular url can be flaky
    shpEcozone <- try(prepInputs(
      destinationPath = dPath,
      url = shpUrl
    ))
    if (!is(shpEcozone, "try-error")) {
      # Robust to partial file deletions:
      unlink(dir(dPath, full.names = TRUE)[1:3])
      shpEcozone <- prepInputs(
        destinationPath = dPath,
        url = shpUrl
      )
      unlink(dPath, recursive = TRUE)
      # Once this is done, can be more precise in operational code:
      # specify targetFile, alsoExtract, and fun, wrap with Cache
      ecozoneFilename <- file.path(dPath, "ecozones.shp")
      ecozoneFiles <- c(
        "ecozones.dbf", "ecozones.prj",
        "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx"
      )
      shpEcozone <- prepInputs(
        targetFile = ecozoneFilename,
        url = shpUrl,
        fun = "terra::vect",
        alsoExtract = ecozoneFiles,
        destinationPath = dPath
      )
      unlink(dPath, recursive = TRUE)
      # Add a study area to Crop and Mask to
      # Create a "study area"
      coords <- structure(c(-122.98, -116.1, -99.2, -106, -122.98, 59.9, 65.73, 63.58, 54.79, 59.9),
        .Dim = c(5L, 2L)
      )
      studyArea <- terra::vect(coords, "polygons")
      terra::crs(studyArea) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
      # specify targetFile, alsoExtract, and fun, wrap with Cache
      ecozoneFilename <- file.path(dPath, "ecozones.shp")
      # Note, you don't need to "alsoExtract" the archive... if the archive is not there, but the
      # targetFile is there, it will not redownload the archive.
      ecozoneFiles <- c(
        "ecozones.dbf", "ecozones.prj",
        "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx"
      )
      shpEcozoneSm <- Cache(prepInputs,
        url = shpUrl,
        targetFile = reproducible::asPath(ecozoneFilename),
        alsoExtract = reproducible::asPath(ecozoneFiles),
        studyArea = studyArea,
        fun = "terra::vect",
        destinationPath = dPath,
        filename2 = "EcozoneFile.shp"
      ) # passed to determineFilename
      terra::plot(shpEcozone[, 1])
      terra::plot(shpEcozoneSm[, 1], add = TRUE, col = "red")
      # dPath is a directory, so recursive = TRUE is needed to remove it
      unlink(dPath, recursive = TRUE)
    }
  }
}
## Using quoted dlFun and fun -- this is not intended to be run but used as a template
## prepInputs(..., fun = customFun(x = targetFile), customFun = customFun)
## # or more complex
## test5 <- prepInputs(
## targetFile = targetFileLuxRDS,
## dlFun =
## getDataFn(name = "GADM", country = "LUX", level = 0) # preProcess keeps file from this!
## ,
## fun = {
## out <- readRDS(targetFile)
## sf::st_as_sf(out)}
## )