hypervolume_n_occupancy (R Documentation)
Computes the occupancy of hyperspace by one or more groups of hypervolumes.
hypervolume_n_occupancy(hv_list,
                        classification = NULL,
                        method = "subsample",
                        FUN = mean,
                        num.points.max = NULL,
                        verbose = TRUE,
                        distance.factor = 1,
                        check.hyperplane = FALSE,
                        box_density = 5000,
                        thin = FALSE,
                        quant.thin = 0.5,
                        seed = NULL,
                        print_log = FALSE)
hypervolume_n_occupancy_bootstrap(path,
                                  name = NULL,
                                  classification = NULL,
                                  method = "subsample",
                                  FUN = mean,
                                  num.points.max = NULL,
                                  verbose = TRUE,
                                  distance.factor = 1,
                                  check.hyperplane = FALSE,
                                  box_density = 5000,
                                  thin = FALSE,
                                  quant.thin = 0.5,
                                  seed = NULL)
hv_list: A HypervolumeList object.
classification: A vector assigning each hypervolume in the HypervolumeList to a group.
method: Either "subsample" (the default) or "box"; the two methods are described below.
FUN: A function to aggregate points within each group. Defaults to mean.
num.points.max: Maximum number of random points to use for set operations. If …
verbose: Logical value; print diagnostic output if TRUE.
distance.factor: Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Changing this parameter is not recommended.
check.hyperplane: Check whether the data are hyperplanar.
box_density: Density of random points used to fill the hyperbox when method = "box".
thin: Take a subsample of random points to get a more uniform distribution of random points. Intended to be used with …
quant.thin: Quantile used to set the threshold distance when thin = TRUE (see below).
seed: Seed for random number generation. Useful for reproducible results and with the use of …
print_log: Save a log file with the volume of each input hypervolume, the recomputed volume and the ratio between the original and recomputed hypervolumes. It works for …
path: A path to a directory of bootstrapped hypervolumes obtained with hypervolume_n_resample().
name: File name; the function writes hypervolumes to file in "./Objects/<name>".
Uses the inclusion test approach to count how many hypervolumes include each random point. Counts range from 0 (no hypervolume contains a given random point) to the number of hypervolumes in a group (all the hypervolumes contain a given random point). A function FUN, usually mean or sum, is then applied within each group. A hypervolume is then returned for each group, with the occupancy stored in ValueAtRandomPoints. IMPORTANT: random points with ValueAtRandomPoints equal to 0 are not removed, to ease downstream calculation.
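The counting and aggregation steps can be illustrated with a small base-R sketch. It is not the package implementation. It assumes each hypervolume is represented by a matrix of its random points and that a query point is "included" when it lies within the critical distance of at least one of those points. The helpers included_in() and occupancy_sketch() are hypothetical names.

```r
# Illustrative sketch only, not the hypervolume package implementation.
# Each hypervolume is a matrix of random points (rows = points).

# TRUE for each query point lying within crit_dist of at least one
# random point of the hypervolume (the inclusion test).
included_in <- function(query, hv_points, crit_dist) {
  apply(query, 1, function(p) {
    d <- sqrt(colSums((t(hv_points) - p)^2))
    any(d <= crit_dist)
  })
}

# Apply FUN, point by point, across the hypervolumes of each group.
occupancy_sketch <- function(query, hv_points_list, classification,
                             crit_dist, FUN = mean) {
  # Inclusion matrix: rows = query points, columns = hypervolumes (0/1)
  inc <- sapply(hv_points_list, included_in, query = query,
                crit_dist = crit_dist)
  inc <- matrix(as.numeric(inc), nrow = nrow(query))
  groups <- unique(classification)
  res <- sapply(groups, function(g) {
    apply(inc[, classification == g, drop = FALSE], 1, FUN)
  })
  # Points with zero occupancy are kept, not dropped
  matrix(res, nrow = nrow(query), dimnames = list(NULL, groups))
}

# Toy example: three hypervolumes in two dimensions, two groups
hv_a <- matrix(c(0, 0,
                 1, 0), ncol = 2, byrow = TRUE)
hv_b <- matrix(c(0, 0), ncol = 2)
hv_c <- matrix(c(5, 5), ncol = 2)
query <- matrix(c(0, 0,
                  1, 0,
                  5, 5), ncol = 2, byrow = TRUE)
occ <- occupancy_sketch(query, list(hv_a, hv_b, hv_c),
                        classification = c("g1", "g1", "g2"),
                        crit_dist = 0.5)
occ[, "g1"]  # 1.0 0.5 0.0
```

With FUN = sum the same call gives counts (2, 1, 0 for group g1) instead of proportions, and the third query point keeps an occupancy of 0 in g1 rather than being removed.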
When method = "subsample", the computation is performed on a random sample from the input hypervolumes, constraining each to have the same point density. This density is the minimum of the point density of each input hypervolume and the point density obtained by dividing num.points.max by the volume of each input hypervolume.
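The density-matching rule can be written out as a short calculation. This sketch reflects one reading of the rule above, taking point density as the number of random points divided by the volume. The function names are hypothetical and the inputs are made-up numbers.

```r
# Sketch of the common point density for method = "subsample".
# Assumption: point density = number of random points / volume.
common_point_density <- function(n_points, volumes, num.points.max) {
  own_density <- n_points / volumes           # density of each input hypervolume
  capped_density <- num.points.max / volumes  # density implied by num.points.max
  min(c(own_density, capped_density))
}

# Number of random points each hypervolume keeps at the common density
points_to_keep <- function(n_points, volumes, num.points.max) {
  floor(common_point_density(n_points, volumes, num.points.max) * volumes)
}

n_points <- c(5000, 8000)
volumes <- c(10, 40)
common_point_density(n_points, volumes, num.points.max = 10000)  # 200
points_to_keep(n_points, volumes, num.points.max = 10000)        # 2000 8000
```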
Because this algorithm is based on distances calculated between the distributions of random points, the critical distance (point density^(-1/n), where n is the number of dimensions) can be scaled by a user-specified factor to provide more or less liberal estimates (distance.factor greater than or less than 1, respectively).
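As a worked example of the critical distance, take a point density of 100 random points per unit volume in two dimensions: 100^(-1/2) = 0.1. The hypothetical helper below only encodes the formula and the distance.factor scaling.

```r
# Critical distance = distance.factor * point_density^(-1/n)
critical_distance <- function(point_density, n_dims, distance.factor = 1) {
  distance.factor * point_density^(-1 / n_dims)
}

critical_distance(100, n_dims = 2)                       # 0.1
critical_distance(100, n_dims = 2, distance.factor = 2)  # 0.2, more liberal
critical_distance(1000, n_dims = 3)                      # about 0.1
```

A larger critical distance makes more points pass the inclusion test, which is why distance.factor > 1 gives more liberal occupancy estimates.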
Two methods can be used for calculating the occupancy. The method subsample is based on a random sample of points from the input hypervolumes. Each point is selected with a probability equal to the inverse of the number of its neighbouring points, i.e. the points within the critical distance. This method performs accurately when input hypervolumes have a low degree of overlap. The method box creates a bounding box around the union of the input hypervolumes. The bounding box is filled with uniformly distributed points at a density set by the argument box_density. A greater density usually provides more accurate results. The method box performs better than the method subsample in low dimensions, while in higher dimensions it becomes computationally inefficient: nearly all of the hyperbox sampling space ends up empty and most of the points are rejected.
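Both sampling strategies can be sketched in base R. The hypothetical helper subsample_weights() gives each pooled random point a selection weight equal to the inverse of its neighbour count within the critical distance. It counts the point itself so the weight is always defined, which is an assumption about how neighbours are counted. ball_fraction_of_box() and box_accept_rate() illustrate why the box method degrades with dimension. For a unit ball, the share of its bounding box [-1, 1]^n that lies inside the ball is the expected acceptance rate of uniformly sampled box points, and it collapses as n grows.

```r
# Subsample idea: weight = 1 / (number of points within crit_dist),
# counting the point itself, so crowded overlap regions are down-weighted.
subsample_weights <- function(points, crit_dist) {
  d <- as.matrix(dist(points))
  1 / unname(rowSums(d <= crit_dist))
}

pts <- matrix(c(0, 0,
                0.1, 0,
                5, 5), ncol = 2, byrow = TRUE)
w <- subsample_weights(pts, crit_dist = 0.5)
w  # 0.5 0.5 1.0
set.seed(1)
chosen <- pts[runif(nrow(pts)) < w, , drop = FALSE]  # one random subsample

# Box idea: exact share of the bounding box [-1, 1]^n covered by the unit ball
ball_fraction_of_box <- function(n) {
  (pi^(n / 2) / gamma(n / 2 + 1)) / 2^n
}

# Monte Carlo version: fill the box uniformly, keep points inside the ball
box_accept_rate <- function(n, n_samples = 1e4) {
  x <- matrix(runif(n * n_samples, min = -1, max = 1), ncol = n)
  mean(rowSums(x^2) <= 1)
}

round(sapply(c(2, 5, 10), ball_fraction_of_box), 4)  # 0.7854 0.1645 0.0025
box_accept_rate(10)                                  # close to 0.0025
```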
When verbose = TRUE, the volume of each input hypervolume is printed to screen together with the recomputed volume and the ratio between the original and recomputed hypervolumes. Mean absolute error (MAE) and root mean square error (RMSE) are also provided as overall measures of goodness of fit. A log file is saved in the working directory with the volume of each input hypervolume, the recomputed volume and the ratio between the original and recomputed hypervolumes.
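The goodness-of-fit summaries can be illustrated with made-up volumes. This sketch assumes the errors are the differences between recomputed and original volumes and that the ratio is original / recomputed. The package may define them differently, and gof_sketch() is a hypothetical name.

```r
# Hypothetical goodness-of-fit summary; assumes error = recomputed - original
gof_sketch <- function(original, recomputed) {
  err <- recomputed - original
  list(ratio = original / recomputed,
       mae = mean(abs(err)),      # mean absolute error
       rmse = sqrt(mean(err^2)))  # root mean square error
}

fit <- gof_sketch(original = c(10, 20), recomputed = c(11, 18))
fit$mae   # 1.5
fit$rmse  # sqrt(2.5), about 1.581
```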
When thin = TRUE, an algorithm is applied to try to make the distribution of random points more uniform. Moderate departures from a uniform distribution can result from applying hypervolume_n_occupancy() to hypervolumes with a high degree of overlap. First, for each random point, the algorithm calculates the minimum distance to the neighbouring points within the critical distance. A quantile of these distances (set with quant.thin) is taken as the threshold distance. Random points are then subset so that the distance between any two retained points is greater than the threshold distance.
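The thinning steps can be sketched as follows. The first two steps follow the description above. The final subsetting is implemented here as a greedy pass in input order, which is an assumption, and thin_sketch() is a hypothetical name.

```r
# Hypothetical sketch of thinning; not the package implementation.
thin_sketch <- function(points, crit_dist, quant.thin = 0.5) {
  d <- as.matrix(dist(points))
  diag(d) <- Inf  # ignore each point's distance to itself
  # Step 1: minimum distance to a neighbour within the critical distance
  d_within <- d
  d_within[d_within > crit_dist] <- Inf
  min_d <- apply(d_within, 1, min)
  min_d <- min_d[is.finite(min_d)]  # points with no neighbour are skipped
  if (length(min_d) == 0) return(points)  # nothing to thin
  # Step 2: threshold distance = chosen quantile of those distances
  threshold <- quantile(min_d, quant.thin, names = FALSE)
  # Step 3 (assumed greedy): keep a point only if it is farther than the
  # threshold from every point already kept
  keep <- logical(nrow(points))
  for (i in seq_len(nrow(points))) {
    kept <- which(keep)
    if (length(kept) == 0 || all(d[i, kept] > threshold)) keep[i] <- TRUE
  }
  points[keep, , drop = FALSE]
}

pts <- matrix(c(0, 0,
                0.1, 0,
                0.3, 0,
                2, 0), ncol = 2, byrow = TRUE)
thin_sketch(pts, crit_dist = 0.5, quant.thin = 0.5)  # keeps rows 1, 3 and 4
```

A lower quant.thin gives a smaller threshold and therefore removes fewer points.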
The function hypervolume_n_occupancy_bootstrap() takes as input a path to bootstrapped hypervolumes generated with hypervolume_n_resample(). It creates a directory called Objects in the current working directory, if one does not already exist, in which the occupancy objects are stored. hypervolume_n_occupancy_bootstrap() returns the absolute path to the directory containing the bootstrapped occupancy hypervolumes. It automatically saves a log file with the volume of each input hypervolume, the recomputed volume and the ratio between the original and recomputed hypervolumes. The log file is used by occupancy_bootstrap_gof().
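The directory convention described above can be mimicked with base R file utilities. This is a hypothetical helper, not the package code. It assumes the output for a run lives in a subdirectory <name> of Objects, as the name argument and the documented return value suggest.

```r
# Hypothetical helper mirroring the convention <working dir>/Objects/<name>
occupancy_output_dir <- function(name, base = getwd()) {
  out_dir <- file.path(base, "Objects", name)
  # recursive = TRUE also creates Objects; no warning if it already exists
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  normalizePath(out_dir)  # absolute path, like the documented return value
}

# Use a temporary directory so the example leaves the working directory alone
p <- occupancy_output_dir("example_occ", base = tempdir())
basename(p)  # "example_occ"
```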
hypervolume_n_occupancy() returns a Hypervolume or HypervolumeList whose number of hypervolumes equals the number of groups in classification. hypervolume_n_occupancy_bootstrap() returns a string containing an absolute path equivalent to ./Objects/<name>.
See also: find_optimal_occupancy_thin, occupancy_bootstrap_gof
## Not run:
library(hypervolume)
data(penguins, package = 'palmerpenguins')
penguins_no_na = as.data.frame(na.omit(penguins))
# split the dataset on species and sex
penguins_no_na_split = split(penguins_no_na,
  paste(penguins_no_na$species, penguins_no_na$sex, sep = "_"))
# calculate the hypervolume for each element of the split dataset
hv_list = mapply(function(x, y)
    hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")],
                         samples.per.point = 100, name = y),
  x = penguins_no_na_split,
  y = names(penguins_no_na_split))
hv_list <- hypervolume_join(hv_list)
# calculate occupancy without groups
hv_occupancy <- hypervolume_n_occupancy(hv_list)
plot(hv_occupancy, cex.random = 1)
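# calculate occupancy with groups; split() orders the groups alphabetically
# (Adelie_female, Adelie_male, Chinstrap_female, ...), so the sexes alternate
hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list,
  classification = rep(c("female", "male"), 3))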
plot(hv_occupancy_list_sex, cex.random = 1, show.density = FALSE)
### hypervolume_n_occupancy_bootstrap ###
# bootstrap the hypervolumes
hv_list_boot = hypervolume_n_resample(name = "example", hv_list)
# calculate occupancy on bootstrapped hypervolumes
hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot,
name = "example_occ",
classification = rep(c("female", "male"), 3))
## End(Not run)