```{r setup, include = FALSE}
library(knitr)
knit_engines$set(asis = function(options) {
  if (options$echo && options$eval) knit_child(text = options$code)
})
knitr::opts_chunk$set(message = FALSE, warning = FALSE, eval = FALSE)
```
```{r, echo = FALSE}
# comp1
occs.db <- rvs$comp1 == 'db'
occs.csv <- rvs$comp1 == 'csv'
occs.any <- occs.db | occs.csv
# comp2
poccs.rem <- 'rem' %in% rvs$comp2
poccs.sel <- 'sel' %in% rvs$comp2
poccs.thin <- 'thin' %in% rvs$comp2
poccs.any <- poccs.rem | poccs.sel | poccs.thin
# comp3
env.bc <- rvs$comp3 == 'bc'
env.user <- rvs$comp3 == 'user'
env.any <- env.bc | env.user
# comp4
bg.bb <- rvs$comp4.shp == 'bb'
bg.mcp <- rvs$comp4.shp == 'mcp'
bg.ptbuf <- rvs$comp4.shp == 'ptbuf'
bg.userCSV <- rvs$comp4.shp == 'csv'
bg.userShp <- rvs$comp4.shp == 'shp'
bg.user <- bg.userShp | bg.userCSV
bg.any <- rvs$comp4.shp != ''
bg.bufWid <- rvs$comp4.buf > 0 & bg.ptbuf == FALSE
# comp5
part.block <- rvs$comp5 == 'block'
part.cb1 <- rvs$comp5 == 'cb1'
part.cb2 <- rvs$comp5 == 'cb2'
part.jack <- rvs$comp5 == 'jack'
part.rand <- rvs$comp5 == 'rand'
part.any <- rvs$comp5 != ''
# comp6
mod.bioclim <- rvs$comp6 == 'bioclim'
mod.maxent <- rvs$comp6 == 'maxent'
mod.any <- mod.bioclim | mod.maxent
alg.maxent <- if (mod.maxent) {rvs$algMaxent == 'maxent.jar'} else {FALSE}
alg.maxnet <- if (mod.maxent) {rvs$algMaxent == 'maxnet'} else {FALSE}
# comp7
viz.pred.thr <- if (!is.null(rvs$comp7.thr)) {rvs$comp7.thr != 'noThresh'} else {FALSE}
viz.pred.p10 <- rvs$comp7.thr == 'p10'
viz.pred.raw <- rvs$comp7.type == 'raw'
viz.pred.log <- rvs$comp7.type == 'logistic'
viz.pred.cll <- rvs$comp7.type == 'cloglog'
viz.pred <- 'map' %in% rvs$comp7
viz.mxEval <- 'mxEval' %in% rvs$comp7
viz.bcPlot <- 'bcPlot' %in% rvs$comp7
viz.resp <- 'resp' %in% rvs$comp7
viz.any <- rvs$comp7 != ''
# comp8
pj.area <- rvs$comp8.pj == 'area'
pj.time <- rvs$comp8.pj == 'time'
pj.mess <- rvs$comp8.esim == 'mess'
pj.thr <- if (!is.null(rvs$comp8.thr)) {rvs$comp8.thr != 'noThresh'} else {FALSE}
pj.any <- pj.area | pj.time | pj.mess
```
Please find below the R code history from your Wallace v1.0.6 session.
You can reproduce your session results by running this R Markdown file in RStudio.
Each code block is called a "chunk", and you can run them either one-by-one or all at once by choosing an option in the "Run" menu at the top-right corner of the "Source" pane in RStudio.
For more detailed information, see http://rmarkdown.rstudio.com.
Wallace uses the following R packages that must be installed and loaded before starting.
```r
library(spocc)
library(spThin)
library(dismo)
library(rgeos)
library(ENMeval)
library(dplyr)
```
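If any of these packages are missing, they need to be installed first. A minimal one-time setup along these lines should work (this step is an addition for convenience, not part of the recorded session history):

```r
# install any of the required packages that are not yet present
pkgs <- c("spocc", "spThin", "dismo", "rgeos", "ENMeval", "dplyr")
install.packages(pkgs[!pkgs %in% installed.packages()[, "Package"]])
```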
Wallace also includes several functions developed to help integrate different packages and some additional functionality. For this reason, it is necessary to load the file `functions.R`. The function `system.file()` finds this script, and `source()` loads it.
```r
source(system.file('shiny/funcs', 'functions.R', package = 'wallace'))
```
Record of analysis for *`r "{{spName}}"`*.
```{asis, echo = occs.any, eval = occs.any, include = occs.any}
### Obtain Occurrence Data
```
```{asis, echo = occs.db, eval = occs.db, include = occs.db}
Obtain occurrence records of the selected species from the `r "{{dbName}}"` database. The search for occurrences was limited to `r {{occNum}}` records.
```
```r
# query selected database for occurrence records
results <- spocc::occ(query = "{{spName}}", from = "{{dbName}}",
                      limit = {{occNum}}, has_coords = TRUE)
# retrieve data table from spocc object
results.data <- results[["{{dbName}}"]]$data[[formatSpName("{{spName}}")]]
# remove rows with duplicate coordinates
occs.dups <- duplicated(results.data[c('longitude', 'latitude')])
occs <- results.data[!occs.dups,]
# make sure latitude and longitude are numeric (sometimes they are characters)
occs$latitude <- as.numeric(occs$latitude)
occs$longitude <- as.numeric(occs$longitude)
# give all records a unique ID
occs$occID <- row.names(occs)
```
```{asis, echo = occs.csv, eval = occs.csv, include = occs.csv}
User CSV path with occurrence data. If the CSV file is not in the current workspace, change to the correct file path (e.g. "/Users/darwin/Documents/occs.csv").
```
```r
# NOTE: provide the path to the folder that contains the CSV file
d.occs <- ''
# create path to user occurrences csv file
userOccs.path <- file.path(d.occs, "{{occsCSV}}")
# read in csv
userOccs.csv <- read.csv(userOccs.path, header = TRUE)
# remove rows with duplicate coordinates
occs.dups <- duplicated(userOccs.csv[c('longitude', 'latitude')])
occs <- userOccs.csv[!occs.dups,]
# remove NAs
occs <- occs[complete.cases(occs$longitude, occs$latitude), ]
# give all records a unique ID
occs$occID <- row.names(occs)
```
```{asis, echo = poccs.any, eval = poccs.any, include = poccs.any}
### Process Occurrence Data
```
```{asis, echo = poccs.rem, eval = poccs.rem, include = poccs.rem}
Remove the occurrence localities with the following IDs: `r {{occsRemoved}}`.
```
```r
# remove the rows that match the occIDs selected
occs <- occs %>% filter(!(occID %in% {{occsRemoved}}))
```
```{asis, echo = poccs.sel, eval = poccs.sel, include = poccs.sel}
The following code recreates the polygon used to select occurrences to keep in the analysis.
```
```r
selCoords <- data.frame(x = {{occsSelX}}, y = {{occsSelY}})
selPoly <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(selCoords)), ID = 1)))
occs.xy <- occs[c('longitude', 'latitude')]
sp::coordinates(occs.xy) <- ~ longitude + latitude
intersect <- sp::over(occs.xy, selPoly)
intersect.rowNums <- as.numeric(which(!(is.na(intersect))))
occs <- occs[intersect.rowNums, ]
```
```{asis, echo = poccs.thin, eval = poccs.thin, include = poccs.thin}
Spatial thinning selected. Thin distance selected is `r {{thinDist}}` km.
```
```r
output <- spThin::thin(occs, 'latitude', 'longitude', 'name', thin.par = {{thinDist}},
                       reps = 100, locs.thinned.list.return = TRUE,
                       write.files = FALSE, verbose = FALSE)
```
```{asis, echo = poccs.thin, eval = poccs.thin, include = poccs.thin}
Since spThin ran 100 iterations, there are 100 different variations of how it thinned your occurrence localities. As there is a stochastic element in the algorithm, some iterations may include more localities than others, and we need to make sure we maximize the number of localities we proceed with.
```
```r
# find the iteration that returns the max number of occurrences
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))
# if there's more than one max, pick the first one
maxThin <- output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]
# subset occs to match only thinned occs
occs <- occs[as.numeric(rownames(maxThin)),]
```
```{asis, echo = env.any, eval = env.any, include = env.any}
### Obtain Environmental Data
```
```{asis, echo = env.bc, eval = env.bc, include = env.bc}
Using the WorldClim (http://www.worldclim.org/) bioclimatic dataset at a resolution of `r {{bcRes}}` arcmin.
```
```r
# get WorldClim bioclimatic variable rasters
envs <- raster::getData(name = "worldclim", var = "bio", res = {{bcRes}},
                        lat = {{bcLat}}, lon = {{bcLon}})
# change names of the raster variables
envRes <- {{bcRes}}
if (envRes == 0.5) {
  i <- grep('_', names(envs))
  editNames <- sapply(strsplit(names(envs)[i], '_'), function(x) x[1])
  names(envs)[i] <- editNames
}
i <- grep('bio[0-9]$', names(envs))
editNames <- paste('bio', sapply(strsplit(names(envs)[i], 'bio'), function(x) x[2]), sep = '0')
names(envs)[i] <- editNames
# subset by those variables selected
envs <- envs[[{{bcSels}}]]
# extract environmental values at occ grid cells
locs.vals <- raster::extract(envs[[1]], occs[, c('longitude', 'latitude')])
# remove occs without environmental values
occs <- occs[!is.na(locs.vals), ]
```
```r
# NOTE: provide the path to the folder that contains the rasters
d.envs <- ''
# create paths to the raster files
userRas.paths <- file.path(d.envs, {{userEnvs}})
# make a RasterStack out of the raster files
envs <- raster::stack(userRas.paths)
```
```{r, echo = FALSE}
bgType <- switch("{{bgSel}}",
                 'bb' = 'Bounding Box',
                 'mcp' = 'Minimum Convex Polygon',
                 'ptsbuf' = 'Buffered Points',
                 'user' = 'User-defined')
```
```{asis, echo = bg.any, eval = bg.any, include = bg.any}
### Process Environmental Data
Background selection technique chosen as `r bgType`.
```
```r
xmin <- min(occs$longitude)
xmax <- max(occs$longitude)
ymin <- min(occs$latitude)
ymax <- max(occs$latitude)
bb <- matrix(c(xmin, xmin, xmax, xmax, xmin, ymin, ymax, ymax, ymin, ymin), ncol = 2)
bgExt <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(bb)), 1)))
```
```r
occs.xy <- occs[c('longitude', 'latitude')]
sp::coordinates(occs.xy) <- ~ longitude + latitude
bgExt <- mcp(occs.xy)
```
```r
# make SpatialPoints object for buffering
occs.xy <- occs[c('longitude', 'latitude')]
sp::coordinates(occs.xy) <- ~ longitude + latitude
bgExt <- rgeos::gBuffer(occs.xy, width = {{bgBuf}})
```
```{asis, echo = bg.userCSV, eval = bg.userCSV, include = bg.userCSV}
Read a .csv file and generate a Spatial Polygon object.
```
```r
# NOTE: provide the full path to the CSV file
csvPath <- ''
# read csv with coordinates for polygon
shp <- read.csv(csvPath, header = TRUE)
bgExt <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(shp)), 1)))
```
```{asis, echo = bg.userShp, eval = bg.userShp, include = bg.userShp}
User study extent name is `r "{{bgUserShpName}}"`. User study extent path is `r "{{bgUserShpPath}}"`. Read in the .shp file and generate a Spatial Polygon object.
```
```r
# NOTE: provide the path to the folder that contains the shapefile
d.bg <- ''
# read in the shapefile for the study extent polygon
bgExt <- rgdal::readOGR(d.bg, "{{bgUserShpName}}")
```
```{asis, echo = bg.bufWid, eval = bg.bufWid, include = bg.bufWid}
Buffer size of the study extent polygon defined as `r {{bgBuf}}` degrees.
```
```r
bgExt <- rgeos::gBuffer(bgExt, width = {{bgBuf}})
```
```{asis, echo = bg.any, eval = bg.any, include = bg.any}
Mask environmental variables by `r bgType`, and take a random sample of background values from the study extent. As the sample is random, your results may differ from those in the session. If there seems to be too much variability in these background samples, try increasing the sample size from 10,000 to something higher (e.g. 50,000 or 100,000). The better your background sample, the less variability you'll have between runs.
```
```r
# crop the environmental rasters by the background extent shape
envsBgCrop <- raster::crop(envs, bgExt)
# mask the background extent shape from the cropped raster
envsBgMsk <- raster::mask(envsBgCrop, bgExt)
# sample random background points
bg.xy <- dismo::randomPoints(envsBgMsk, {{bgPtsNum}})
# convert matrix output to data frame
bg.xy <- as.data.frame(bg.xy)
```
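Because `dismo::randomPoints()` draws a random sample, each run of the chunk above produces a different background set. If you want the sample to be repeatable across runs, one option (not part of the original session code; the seed value here is arbitrary) is to fix the random seed before sampling:

```r
# fix the RNG state so the background sample is identical on every run
set.seed(48)
bg.xy <- as.data.frame(dismo::randomPoints(envsBgMsk, {{bgPtsNum}}))
```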
```{r, echo = FALSE}
partSwitch <- switch("{{partSel}}",
                     'block' = 'Block',
                     'cb1' = 'Checkerboard 1',
                     'cb2' = 'Checkerboard 2',
                     'jack' = 'Jackknife',
                     'random' = 'Random')
```
```{asis, echo = part.any, eval = part.any, include = part.any}
### Partition Occurrence Data
Occurrence data is now partitioned for cross-validation, a method that iteratively builds a model on all but one group and evaluates that model on the left-out group.

For example, if the data is partitioned into 3 groups A, B, and C, a model is first built with groups A and B and is evaluated on C. This is repeated by building a model with B and C and evaluating on A, and so on until all combinations are done.

Cross-validation operates under the assumption that the groups are independent of each other, which may or may not be a safe assumption for your dataset. Spatial partitioning is one way to ensure more independence between groups.

You selected to partition your occurrence data by the `r partSwitch` method.
```
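The partitioning functions below only assign group numbers; the build-and-evaluate loop itself is run internally by `ENMeval`. For intuition, the iteration described above looks schematically like this (an illustrative sketch using the `occs.grp` vector produced below, not code from the session):

```r
# schematic cross-validation loop: train on all groups but one, test on the hold-out
for (g in unique(occs.grp)) {
  train.occs <- occs.xy[occs.grp != g, ]  # occurrences used to build the model
  test.occs  <- occs.xy[occs.grp == g, ]  # left-out occurrences used to evaluate it
  # ... fit a model with train.occs, then compute evaluation statistics on test.occs ...
}
```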
```r occs.xy <- occs[c('longitude', 'latitude')] group.data <- ENMeval::get.block(occ=occs.xy, bg.coords=bg.xy)
```r
occs.xy <- occs[c('longitude', 'latitude')]
group.data <- ENMeval::get.checkerboard1(occ = occs.xy, env = envsBgMsk,
                                         bg.coords = bg.xy,
                                         aggregation.factor = {{aggFact}})
```
```r
occs.xy <- occs[c('longitude', 'latitude')]
group.data <- ENMeval::get.checkerboard2(occ = occs.xy, env = envsBgMsk,
                                         bg.coords = bg.xy,
                                         aggregation.factor = {{aggFact}})
```
```r
occs.xy <- occs[c('longitude', 'latitude')]
group.data <- ENMeval::get.jackknife(occ = occs.xy, bg.coords = bg.xy)
```
```r
occs.xy <- occs[c('longitude', 'latitude')]
group.data <- ENMeval::get.randomkfold(occ = occs.xy, bg.coords = bg.xy, kfolds = {{kfolds}})
```
```r
# pull out the occurrence and background partition group numbers from the list
occs.grp <- group.data[[1]]
bg.grp <- group.data[[2]]
```
```{asis, echo = mod.any, eval = mod.any, include = mod.any}
### Build and Evaluate Niche Model
You selected the `r "{{enmSel}}"` model.
```
```r
e <- BioClim_eval(as.data.frame(occs.xy), bg.xy, occs.grp, bg.grp, envsBgMsk)
# unpack the results data frame and list of models
evalTbl <- e$results
evalMods <- e$models
names(e$predictions) <- "BIOCLIM"
evalPreds <- e$predictions
```
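`BioClim_eval()` is one of the helper functions loaded from Wallace's `functions.R`. Its exact internals are not shown here, but the core fit-and-predict steps it wraps can be sketched with `dismo` directly (an illustrative approximation, not the helper's actual code):

```r
# fit a BIOCLIM envelope to the occurrence points and predict over the study extent
bc <- dismo::bioclim(envsBgMsk, as.data.frame(occs.xy))
bc.pred <- dismo::predict(bc, envsBgMsk)
```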
```r
# define the vector of regularization multipliers to test
rms <- seq({{rms1}}, {{rms2}}, {{rmsStep}})
# iterate model building over all chosen parameter settings
e <- ENMeval::ENMevaluate(occs.xy, envsBgMsk, bg.coords = bg.xy, RMvalues = rms,
                          fc = {{fcs}}, method = 'user', occ.grp = occs.grp,
                          bg.grp = bg.grp, clamp = {{clamp}}, algorithm = "{{algMaxent}}")
# unpack the results data frame, the list of models, and the RasterStack of raw predictions
evalTbl <- e@results
evalMods <- e@models
names(evalMods) <- e@results$settings
evalPreds <- e@predictions
```
```{asis, echo = viz.any, eval = viz.any, include = viz.any}
### Visualize Model Results
Below are some visualizations of your modeling analysis results.
```
```{r, eval = '{{algMaxent}}' == "maxent.jar", include = viz.resp}
# view response curves for environmental variables with non-zero coefficients
dismo::response(evalMods[["{{modSel}}"]], var = {{mxNonZeroCoefs}})
```
```{r, eval = '{{algMaxent}}' == "maxnet", include = viz.resp, warning = FALSE, message = FALSE}
# view response curves for environmental variables with non-zero coefficients
plot(evalMods[["{{modSel}}"]], vars = {{mxNonZeroCoefs}}, type = "cloglog")
```
```r
# view BIOCLIM envelope plot
plot(evalMods[['BIOCLIM']], a = {{bcPlot1}}, b = {{bcPlot2}}, p = {{bcPlotP}})
```
```r
# view ENMeval results
ENMeval::eval.plot(evalTbl, value = "{{mxEvalSel}}")
```
```r
# select your model from the models list
mod <- evalMods[["{{modSel}}"]]
```
```r
# generate BIOCLIM prediction
pred <- evalPreds[["{{modSel}}"]]
```
```r
# generate raw prediction
pred <- evalPreds[["{{modSel}}"]]
```
```r
# generate logistic prediction
pred <- ENMeval::maxnet.predictRaster(mod, envsBgMsk, type = 'logistic', clamp = {{clamp}})
```
```r
# generate cloglog prediction
pred <- ENMeval::maxnet.predictRaster(mod, envsBgMsk, type = 'cloglog', clamp = {{clamp}})
```
```r
# generate logistic prediction
pred <- dismo::predict(mod, envsBgMsk, args = c("outputformat=logistic"))
```
```r
# generate cloglog prediction
pred <- dismo::predict(mod, envsBgMsk, args = c("outputformat=cloglog"))
```
```r
# get predicted values for occurrence grid cells
occPredVals <- raster::extract(pred, occs.xy)
# define minimum training presence threshold
thr <- thresh(occPredVals, "{{comp7.thresh}}")
# threshold model prediction
pred <- pred > thr
```
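`thresh()` is another helper from `functions.R`. Conceptually, the common thresholding rules reduce to simple summaries of the predicted values at the occurrence points; a minimal sketch of that logic (the helper's exact implementation and option names may differ) is:

```r
# hypothetical sketch: minimum training presence ('mtp') uses the lowest occurrence
# prediction; 'p10' drops the lowest 10% of occurrence predictions
threshSketch <- function(vals, type) {
  if (type == "mtp") {
    min(vals, na.rm = TRUE)
  } else if (type == "p10") {
    quantile(vals, probs = 0.1, na.rm = TRUE, names = FALSE)
  }
}
```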
```r
# plot the model prediction
plot(pred)
```
```{asis, echo = pj.any, eval = pj.any, include = pj.any}
### Project Niche Model
You selected to project your model. First define a polygon with the coordinates you chose, then crop and mask your predictor rasters. Finally, predict suitability values for these new raster cells based on the model you selected.
```
```r
projCoords <- data.frame(x = {{occsPjX}}, y = {{occsPjY}})
projPoly <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(projCoords)), ID = 1)))
```
```{asis, echo = pj.area & alg.maxnet, eval = pj.area, include = pj.area}
Now crop and mask the predictor variables by projPoly, and use the maxnet.predictRaster() function to predict the values for the new extent based on the model selected.
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'exponential', clamp = {{clamp}})
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'logistic', clamp = {{clamp}})
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'cloglog', clamp = {{clamp}})
```
```{asis, echo = pj.area & alg.maxent, eval = pj.area, include = pj.area}
Now crop and mask the predictor variables by projPoly, and use the dismo::predict() function to predict the values for the new extent based on the model selected.
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- dismo::predict(mod, predsProj, args = c("outputformat=raw"))
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- dismo::predict(mod, predsProj, args = c("outputformat=logistic"))
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- dismo::predict(mod, predsProj, args = c("outputformat=cloglog"))
```
```{asis, echo = pj.area & mod.bioclim, eval = pj.area, include = pj.area}
Now crop and mask the predictor variables by projPoly, and use the dismo::predict() function to predict the values for the new extent based on the model selected.
```
```r
predsProj <- raster::crop(envs, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
proj <- dismo::predict(mod, predsProj)
```
```{asis, echo = pj.time & alg.maxnet, eval = pj.time, include = pj.time}
### Project Niche Model to New Time
Now download the future climate variables chosen with *Wallace*, crop and mask them by projPoly, and use the maxnet.predictRaster() function to predict the values for the new time based on the model selected.
```
```{asis, echo = pj.time & alg.maxent, eval = pj.time, include = pj.time}
### Project Niche Model to New Time
Now download the future climate variables chosen with *Wallace*, crop and mask them by projPoly, and use the dismo::predict() function to predict the values for the new time based on the model selected.
```
```{asis, echo = pj.time & mod.bioclim, eval = pj.time, include = pj.time}
### Project Niche Model to New Time
Now download the future climate variables chosen with *Wallace*, crop and mask them by projPoly, and use the dismo::predict() function to predict the values for the new time based on the model selected.
```
```r
envsFuture <- raster::getData("CMIP5", var = "bio", res = {{bcRes}},
                              rcp = {{pjRCP}}, model = "{{pjGCM}}", year = {{pjYear}})
predsProj <- raster::crop(envsFuture, projPoly)
predsProj <- raster::mask(predsProj, projPoly)
# rename future climate variable names
names(predsProj) <- paste0('bio', sprintf("%02d", 1:19))
# select climate variables
predsProj <- raster::subset(predsProj, names(envs))
```
```r
# predict model
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'exponential', clamp = {{clamp}})
```
```r
# predict model
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'logistic', clamp = {{clamp}})
```
```r
# predict model
proj <- ENMeval::maxnet.predictRaster(mod, predsProj, type = 'cloglog', clamp = {{clamp}})
```
```r
# predict model
proj <- dismo::predict(mod, predsProj, args = c('outputformat=raw'))
```
```r
# predict model
proj <- dismo::predict(mod, predsProj, args = c('outputformat=logistic'))
```
```r
# predict model
proj <- dismo::predict(mod, predsProj, args = c('outputformat=cloglog'))
```
```r
# predict model
proj <- dismo::predict(mod, predsProj)
```
```r
# get predicted values for occurrence grid cells
occPredVals <- raster::extract(pred, occs.xy)
# define minimum training presence threshold
thr <- thresh(occPredVals, "{{comp8.thresh}}")
# threshold model prediction
proj <- proj > thr
```
```r
# plot the model prediction
plot(proj)
```
```{asis, echo = pj.mess, eval = pj.mess, include = pj.mess}
To visualize the environmental difference between the occurrence localities and your selected projection extent, calculate a multidimensional environmental similarity surface (MESS). High negative values mean great difference, whereas high positive values mean great similarity. Interpreting the projected suitability for areas with high negative values should be done with extreme caution, as they are outside the environmental range of the occurrence localities.
```
```r
# extract environmental values from occurrence localities and background -- these were the values that went into the model
names(bg.xy) <- names(occs.xy)
all.xy <- rbind(occs.xy, bg.xy)
occEnvVals <- raster::extract(envs, all.xy)
# compare these values with the projection extent (predsProj)
proj.mess <- dismo::mess(predsProj, occEnvVals)
plot(proj.mess)
```
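A common follow-up to a MESS map (not part of the original session code) is to restrict the projection to analog environments, i.e. to blank out cells with negative MESS values before interpreting suitability there:

```r
# keep projected suitability only where MESS is non-negative (analog conditions)
proj.analog <- raster::mask(proj, proj.mess >= 0, maskvalue = 0)
plot(proj.analog)
```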