knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.align = "center")
options(knitr.table.format = "html")
In this article you will learn how to prepare the data to train models using SDMtune. We will use the virtualSp dataset included in the package and environmental predictors from the WorldClim dataset.
Load the required packages for the analysis:
library(ggplot2)    # To plot locations
library(maps)       # To access useful maps
library(rasterVis)  # To plot raster objects
For the analysis we use the climate data of WorldClim version 1.4 [@Hijmans2005] and the terrestrial ecoregions from WWF [@Olson2001] included in the dismo package:
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"), pattern = "grd", full.names = TRUE)
We convert the files into a raster object (a SpatRaster from the terra package) that will be used later in the analysis:
predictors <- terra::rast(files)
There are nine environmental variables, eight continuous and one categorical:
names(predictors)
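A minimal sketch to check the statement above, assuming the terra package is installed alongside dismo. It counts the layers and the distinct non-NA values of each layer: a categorical variable such as biome takes only a handful of values, while the continuous bioclimatic variables take many. The helper variable n_unique is introduced here only for illustration.

```r
library(terra)

# Same predictor files used in this article (shipped with the dismo package)
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- rast(files)

# One layer per environmental variable
nlyr(predictors)

# Number of distinct non-NA values in each layer
n_unique <- sapply(names(predictors), function(layer) {
  v <- as.vector(values(predictors[[layer]]))
  length(unique(v[!is.na(v)]))
})
n_unique
```

Few distinct values are only a hint. The decision that biome is categorical comes from what the variable represents (ecoregion classes), not from the count alone.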
We can plot bio1 using the gplot function from the rasterVis package:
gplot(predictors$bio1) +
  geom_tile(mapping = aes(fill = value)) +
  coord_equal() +
  scale_fill_gradientn(colours = c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c"),
                       na.value = "transparent",
                       name = "°C x 10") +
  labs(title = "Annual Mean Temperature", x = "longitude", y = "latitude") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank())
Let's load the SDMtune package:
library(SDMtune)
To demonstrate how to use SDMtune we use virtualSp, a randomly generated virtual species dataset included in the package. The dataset contains the coordinates of the presence locations and of the background locations; their numbers are given by nrow(virtualSp$presence) and nrow(virtualSp$background).
help(virtualSp)
p_coords <- virtualSp$presence
bg_coords <- virtualSp$background
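A short sketch to inspect the coordinates before plotting them. It assumes, as the plotting code below does, that longitude is stored in column x and latitude in column y. It prints the first presence locations and the number of presence and background locations.

```r
library(SDMtune)

p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Longitude in column "x", latitude in column "y"
head(p_coords)

# Number of presence and background locations
c(presence = nrow(p_coords), background = nrow(bg_coords))
```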
Plot the study area together with the presence locations:
ggplot(data = map_data("world"), mapping = aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "grey95", color = "gray40", linewidth = 0.2) +
  geom_jitter(data = p_coords, aes(x = x, y = y), color = "red", alpha = 0.4, size = 1) +
  labs(x = "longitude", y = "latitude") +
  theme_minimal() +
  theme(legend.position = "none") +
  # Zoom with the coordinate system: scale limits would drop polygon vertices
  coord_fixed(xlim = c(-125, -32), ylim = c(-56, 40))
To plot the background locations run the following code:
ggplot(data = map_data("world"), mapping = aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "grey95", color = "gray40", linewidth = 0.2) +
  geom_jitter(data = as.data.frame(bg_coords), aes(x = x, y = y), color = "blue", alpha = 0.4, size = 0.5) +
  labs(x = "longitude", y = "latitude") +
  theme_minimal() +
  theme(legend.position = "none") +
  # Zoom with the coordinate system: scale limits would drop polygon vertices
  coord_fixed(xlim = c(-125, -32), ylim = c(-56, 40))
Before training a model we have to prepare the data in the correct format. The prepareSWD() function creates an SWD() object that stores the species name, the coordinates of the presence and absence/background locations, and the values of the environmental variables at those locations. The argument categorical indicates which environmental variables are categorical. In our example biome is categorical (we can pass a vector if we have more than one categorical environmental variable). The function extracts the values of the environmental variables at each location and excludes the locations that have an NA value for at least one environmental variable.
data <- prepareSWD(species = "Virtual species", p = p_coords, a = bg_coords,
                   env = predictors, categorical = "biome")
# Add verbose = FALSE to suppress the progress messages
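To make the extraction step described above more concrete, here is a sketch that reproduces it by hand for the presence locations with terra. It mirrors the documented behavior and is not SDMtune's internal implementation. It assumes the coordinates share the coordinate reference system of the predictors. The names pts, env_values and keep are introduced only for illustration.

```r
library(terra)
library(SDMtune)

files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- rast(files)
p_coords <- virtualSp$presence

# Turn the coordinates into points with the same CRS as the predictors
pts <- vect(as.matrix(p_coords[, c("x", "y")]), crs = crs(predictors))

# Value of every layer at each location (ID = FALSE drops the point index column)
env_values <- extract(predictors, pts, ID = FALSE)

# Locations with NA for at least one variable would be excluded
keep <- complete.cases(env_values)
sum(!keep)  # number of presence locations that would be dropped
```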
Let's have a look at the created SWD() object:
data
When we print an SWD() object we get a summary of its content.

The object contains four slots: @species, @coords, @data and @pa. @pa contains a vector with 1 for presence and 0 for absence/background locations. To visualize the data we run:
head(data@data)
kableExtra::kable(head(data@data)) |> kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), position = "center", full_width = FALSE)
We can visualize the coordinates with:
head(data@coords)
kableExtra::kable(head(data@coords)) |> kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), position = "center", full_width = FALSE)
or the name of the species with:
data@species
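Because the rows of @data and @coords are aligned with @pa, @pa can be used to count locations or to split the other slots by type. A minimal sketch, assuming the same inputs as above. The variable names presence_env and background_coords are illustrative only.

```r
library(terra)
library(SDMtune)

files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- rast(files)
data <- prepareSWD(species = "Virtual species",
                   p = virtualSp$presence, a = virtualSp$background,
                   env = predictors, categorical = "biome", verbose = FALSE)

# 1 = presence, 0 = absence/background
table(data@pa)

# Subset the other slots row by row using @pa
presence_env <- data@data[data@pa == 1, ]
background_coords <- data@coords[data@pa == 0, ]
summary(presence_env$bio1)
```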
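We can save the SWD() object in a .csv file using the function swd2csv() (the function saves the file in the working directory). There are two possibilities: save presence and absence/background locations together in a single file, or save them in two separate files:

# One file with all the locations
swd2csv(data, file_name = "data.csv")
# Presence locations in the first file, absence/background locations in the second
swd2csv(data, file_name = c("presence.csv", "background.csv"))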
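A sketch to check the two-file output by reading it back with read.csv(). It assumes that swd2csv() accepts a full path in file_name, so the files go to a temporary folder instead of the working directory. The file and variable names are illustrative.

```r
library(terra)
library(SDMtune)

files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- rast(files)
data <- prepareSWD(species = "Virtual species",
                   p = virtualSp$presence, a = virtualSp$background,
                   env = predictors, categorical = "biome", verbose = FALSE)

# Write to a temporary folder (assumes full paths are accepted)
presence_file <- file.path(tempdir(), "presence.csv")
background_file <- file.path(tempdir(), "background.csv")
swd2csv(data, file_name = c(presence_file, background_file))

# Read the files back and compare the row counts with the SWD object
presence <- read.csv(presence_file)
background <- read.csv(background_file)
c(presence = nrow(presence), background = nrow(background))
```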
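In this article you have learned:

- how to plot raster objects using the gplot function included in the rasterVis package;
- how to plot locations using the ggplot2 and the maps packages;
- how to prepare the data for the analysis by creating SWD() objects;
- how to explore an SWD() object;
- how to save an SWD() object in a .csv file.

Move on to the second article and learn how to train models using SDMtune.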