# Introduction to occupancy In hypervolume: High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

knitr::opts_chunk$set(echo = TRUE)  The occupancy routine provides a framework to join and to calculate summary statistics for multiple hypervolumes simultaneously by leveraging on the concept of occupancy, a function of the number of hypervolumes including a point in the trait space. At first, a set of random points covering all the hypervolumes is selected through one of the two methods available. The subsampling method joins the random points of the input hypervolumes and then selects a uniformly distributed subset of them. The method box creates a bounding box around the union of the q hypervolumes that is then filled with random points drawn from a uniform distribution at a specified density. Secondly, a function (e.g. mean, sum) is applied to each random point. Occupancy can be calculated for groups of observations such that emerging from repeated measures over space, time or treatments and is intended to reflect the heterogeneity of trait space utilization by multiple hypervolumes, that could be in turn associated with relevant ecological processes. The occupancy routine comes with a permutation test to detect between-group differences in occupancy values. For each pairwise group comparison, the original hypervolumes are randomly assigned to one of the two groups under comparison. The test itself is performed by counting the number of times the observed differences are smaller or greater than those expected by chance, or by combining them for obtaining a two-tailed test. ## SIMULATED EXAMPLE We generated hypervolumes by picking 100 points randomly within a circle of radius 1. The first group (a) is composed of one hypervolume centered at coordinates x = -1 and y = 1 and 9 hypervolumes at coordinates x = 1 and y = 1. The second group (b) is built using the same strategy but by inverting x and y the coordinates. We then estimated occupancy as the average number of hypervolumes including a given random point. We tested each scenario (number of hypervolumes) with an increasing number of permutations (9, 19, 99, 199, 999) for 19 times in order to evaluate the consistency of the results. Moreover, we performed a two-tailed test with a probability threshold of 0.05 and reported the volume resulting from the permutation test was used as the target metric. Given the simulation setting, we expect a volume of the significant fraction to be close to the true value of 2$\pi$=6.28, corresponding to both hypervolumes together. The uncertainty about the volume is due to the uncertainty during hypervolume construction. library(tidyverse) library(ggpubr) library(hypervolume) # load the example dataset data(circles) # set seed for reproducibility set.seed(42) # create gaussian hypervolumes from points using lapply # and transform the resulting list in a HypervolumeList hyper_list <- lapply(circles, function(x) hypervolume_gaussian(x)) hyper_list <- hypervolume_join(hyper_list) # create labels to divide the hypervolumes in the resulting HypervolumeList # into two groups group_labels <- rep(letters[1:2], each = 10) # create an occupancy object from the hyper_list # we will use the box method, and the summary statistics of each random point # calculated as the mean value hyper_occupancy <- hypervolume_n_occupancy(hyper_list, method = "box", classification = group_labels, box_density = 5000, FUN = mean) # transform the object hyper_occupancy to a data.frame for plotting plot_hyper_occupancy <- hypervolume_to_data_frame(hyper_occupancy) # plot the hyper_occupancy, by removing ValueAtRandomPoints equal to 0 # and by taking a subsample for shortening the time needed to generate the plot plot_hyper_occupancy %>% select(X.1, X.2, Name, ValueAtRandomPoints) %>% filter(ValueAtRandomPoints != 0) %>% group_by(Name) %>% sample_n(10000) %>% ungroup() %>% ggplot(aes(X.1, X.2, col = ValueAtRandomPoints), alpha = 0.7) + geom_point(size = 1, alpha = 0.5) + theme_bw() + facet_wrap(~ Name, ncol = 1) + labs(x = "Axis 1", y = "Axis 2", color = "") + scale_colour_continuous(high = "#132B43", low = "#add8e6") + coord_fixed()  url <- "g1.png" knitr::include_graphics(url)  The permutation test can be performed using the functions hypervolume_n_occupancy_permute() and hypervolume_n_occupancy_test(). # set the folder to store the permutate hypervolumes folder_name <- paste("circles_99", "20", sep = "_") # permute the hypervolumes 99 times hyper_op <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 99, cores = 4) # perform a two-tailed permutation test hyper_op_test<- hypervolume_n_occupancy_test(hyper_occupancy, hyper_op, alternative = "two_sided") # transform the results of the permutation test into a data.frame for plotting # we will take a subset of the random points to reduce plotting times plot_circles_test <- hypervolume_to_data_frame(hyper_occupancy_permute_test_ts) %>% rename(occupancy = ValueAtRandomPoints) %>% filter(occupancy != 0) %>% sample_n(5000) # plot the results with ggplot2 ggplot(plot_circles_test) + geom_point(aes(x = X.1, y = X.2, col = occupancy)) + theme_bw() + scale_color_gradient(low = "orange", high = "blue", limits = c(-1, 1)) + labs(x = "Axis 1", y = "Axis 2", color = "")  We can assess how many permutations we need to have stable results. We will explore different number of permutation multiple times (19). Results show that we need at least 19 permutations to detect a significant differences at 0.05 significance level. url <- "g2.png" knitr::include_graphics(url)  # number of iterations N <- 19 ### 9 permutations # initialize an empty vector to store the volume of the significant # fraction evaluated with the permutation test res_9 <- c() for(i in 1:N){ # name of the folder to store the permutations # 9 = number of permutation # i = number of iteration # 20 = number of hypervolumes in hyper_list folder_name <- paste("vol_same_coord_9", i, "20", sep = "_") # perform 9 hypervolume permutation hyper_op <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 9, cores = 4) # perform a two-tailed test hyper_op_test <- hypervolume_n_occupancy_test(hyper_occupancy, hyper_op, alternative = "two_sided") # store the volume of the signifiant fraction res_9 <- c(res_9, hyper_op_test@Volume) } ### 19 permutations res_19 <- c() for(i in 1:N){ folder_name <- paste("vol_same_coord_19", i, "16", sep = "_") hyper_occupancy_permute <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 19, cores = 4) hyper_occupancy_permute_test_ts <- hypervolume_n_occupancy_test(hyper_occupancy, hyper_occupancy_permute, alternative = "two_sided") res_19 <- c(res_19, hyper_occupancy_permute_test_ts@Volume) } ### 99 permutations res_99 <- c() for(i in 1:N){ folder_name <- paste("vol_same_coord_99", i, "16", sep = "_") hyper_occupancy_permute <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 99, cores = 4) hyper_occupancy_permute_test_ts <- hypervolume_n_occupancy_test(hyper_occupancy, hyper_occupancy_permute, alternative = "two_sided") res_99 <- c(res_99, hyper_occupancy_permute_test_ts@Volume) } ### 199 permutations res_199 <- c() for(i in 1:N){ folder_name <- paste("vol_same_coord_199", i, "16", sep = "_") hyper_occupancy_permute <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 199, cores = 4) hyper_occupancy_permute_test_ts <- hypervolume_n_occupancy_test(hyper_occupancy, hyper_occupancy_permute, alternative = "two_sided") res_199 <- c(res_199, hyper_occupancy_permute_test_ts@Volume) } res_199 <- na.omit(res_199) range(res_199) mean(res_199) median(res_199) quantile(res_199, c(0.025, 0.975)) ### 999 permutations res_999 <- c() for(i in 1:N){ folder_name <- paste("vol_same_coord_999", i, "16", sep = "_") hyper_occupancy_permute <- hypervolume_n_occupancy_permute(folder_name, hyper_occupancy, hyper_list, n = 999, cores = 4) hyper_occupancy_permute_test_ts <- hypervolume_n_occupancy_test(hyper_occupancy, hyper_occupancy_permute, alternative = "two_sided") res_999 <- c(res_999, hyper_occupancy_permute_test_ts@Volume) } res_999 <- na.omit(res_999)  url <- "simulation_results.png" knitr::include_graphics(url)  ## REAL CASE EXAMPLE We assessed the climatic niche occupancy of two highly speciose plant genera, Acacia and Pinus, to evaluate patterns in climatic preferences within each genus. Here, occupancy represent the mean number of species that occupy a given point in the multidimensional space. For each species within these genera (60 and 63 for Acacia and Pinus, respectively) we downloaded occurrence data from the BIEN dataset (Maitner, 2022). Climatic data for each occurrence was obtained from worldclim (Fick and Hijmans, 2017) using the raster package (Hijmans et al. 2022). As in Blonder et al. (2018), we selected three climate variables: mean annual temperature, mean annual precipitation, and precipitation in warmest quarter/(precipitation in warmest quarter + precipitation in coldest quarter). For each species, niches were built using the function hypervolume_gaussian with standard settings. Climate layers were z- transformed (centered relative to mean and scaled relative to standard deviation) prior to hypervolume construction. Occupancy at genus level was obtained with the function hypervolume_n_occupancy using the box method and the mean as the summary statistics. # set.seed set.seed(42) # get climatic variables using the raster package climate <- raster::getData('worldclim', var='bio', res=10) # Create a new climate variable bio20<-climate[[18]]/(climate[[18]]+climate[[19]]) slot(bio20@data, "names")<-"bio20" # Restrict our stack to 3 variables with low correlation climate<-raster::stack(climate[[c(1,12)]],bio20) rm(bio20) # Do a z-transform on our data climate <- raster::scale(climate) # load data reporting occurrences of Pinus and Acacia species downloaded from BIEN data("acacia_pinus") # get unique species species_unique <- unique(acacia_pinus$species)

# create hypervolumes for each species with a loop
# at first, create a list to store the hypervolumes
# and also a vector to store the genus information
hyper_list <- list()
genus_unique <- c()

for(i in 1:length(species_unique)){
# get climatic data of the ith species
data_i <- acacia_pinus %>%
filter(species == species_unique[i]) %>%
select(longitude, latitude) %>%
raster::extract(x = climate,
y = .)

# remove NA data
data_i <- na.omit(data_i)

# calculate the hypervolume assigning the species name
hyper_list[[i]] <- hypervolume(data_i, name = species_unique[i], verbose = FALSE)
# store the genus level information
genus_unique <- c(genus_unique, unlist(strsplit(species_unique[i], " "))[1])
print(paste(species_unique[i], ": done", sep = ""))
}

# transform hyper_list in an HypervolumeList
hyper_list <- hypervolume_join(hyper_list)

# increase box_density if you want more accurate results
# with mean as summary statistics
hyper_occupancy <- hypervolume_n_occupancy(hyper_list,
classification = genus_unique,
method = "box",
FUN = "mean")

# transform hyper_occupancy into a data.frame for plotting
climatic_occupancy <- hypervolume_to_data_frame(hyper_occupancy) %>%
filter(ValueAtRandomPoints != 0) %>%
group_by(Name) %>%
sample_n(2000) %>%
ungroup() %>%
sample_n(nrow(.)) %>%
rename(genus = Name, occupancy = ValueAtRandomPoints)

# plot bio1 vs bio12
g1_12_points <- ggplot(climatic_occupancy) +
geom_point(aes(bio1, bio12, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#FFC20A", "#0C7BDC")) +
labs(title = "bio1 - bio12")

# plot bio1 vs bio20
g1_20_points <- ggplot(climatic_occupancy) +
geom_point(aes(bio1, bio20, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#FFC20A", "#0C7BDC")) +
labs(title = "bio1 - bio20")

# plot bio12 vs bio20
g12_20_points <- ggplot(climatic_occupancy) +
geom_point(aes(bio12, bio20, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#FFC20A", "#0C7BDC")) +
labs(title = "bio12 - bio20")

# compose the plot
ggarrange(g1_12_points, g1_20_points, g12_20_points,
common.legend = TRUE,
legend = "right",
ncol = 3,
nrow = 1)

url <- "g3.png"
knitr::include_graphics(url)


We then performed the permutation test to evaluate with region of the multidimensional space is occupied more frequently by one genus compared to the other.

# permute hypervolumes 999 times, can take long
hyper_occupancy,
hyper_list,
n = 999,
cores = 4,
verbose = FALSE)

# perform a two-tailed test
pa_occupancy_test <- hypervolume_n_occupancy_test(hyper_occupancy,
alternative = "two_sided",

# transform hyper_occupancy into a data.frame for plotting
climatic_test <- hypervolume_to_data_frame(pa_occupancy_test) %>%
filter(ValueAtRandomPoints != 0) %>%
sample_n(2000) %>%
mutate(genus = ifelse(ValueAtRandomPoints>0, "Acacia", "Pinus")) %>%
mutate(occupancy = abs(ValueAtRandomPoints))

# plot bio1 vs bio12
g1_12_test <- ggplot(climatic_test) +
geom_point(aes(bio1, bio12, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#E66100", "#5D3A9B")) +
labs(title = "bio1 - bio12")

# plot bio1 vs bio20
g1_20_test <- ggplot(climatic_test) +
geom_point(aes(bio1, bio20, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#E66100", "#5D3A9B")) +
labs(title = "bio1 - bio20")

# plot bio12 vs bio20
g12_20_test <- ggplot(climatic_test) +
geom_point(aes(bio12, bio20, col = genus, size = occupancy), alpha = 0.1) +
theme_bw() +
scale_size_continuous(limits = c(0, 1)) +
scale_colour_manual(values = c("#E66100", "#5D3A9B")) +
labs(title = "bio12 - bio20")

# compose the plot
ggarrange(g1_12_test, g1_20_test, g12_20_test,
common.legend = TRUE,
legend = "right",
ncol = 3,
nrow = 1)

url <- "g4.png"
knitr::include_graphics(url)


The two genera significantly occupy different parts of the climatic niche space, as highlighted by the permutation test performed on occupancy values. This suggests that (unsurprisingly in this demo case) that species in the Acacia genus, native to tropical and subtropical regions, preferentially occupies drier and hotter climates than species in the Pinus genus.

## Try the hypervolume package in your browser

Any scripts or data that you put into this service are public.

hypervolume documentation built on May 29, 2024, 8:19 a.m.