hypervolume_n_occupancy: Operations for groups of hypervolumes
In bblonder/hypervolume: High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

View source: R/hypervolume_n_occupancy.R

hypervolume_n_occupancy

R Documentation

Operations for groups of hypervolumes

Description

Computes the occupancy of hyperspace by one or more groups of hypervolumes.

Usage

hypervolume_n_occupancy(hv_list,
                        classification = NULL,
                        method = "subsample",
                        FUN = mean,
                        num.points.max = NULL,
                        verbose = TRUE,
                        distance.factor = 1,
                        check.hyperplane = FALSE,
                        box_density = 5000,
                        thin = FALSE,
                        quant.thin = 0.5,
                        seed = NULL,
                        print_log = FALSE)
                        
hypervolume_n_occupancy_bootstrap(path,
                                  name = NULL,
                                  classification = NULL,
                                  method = "subsample",
                                  FUN = mean,
                                  num.points.max = NULL,
                                  verbose = TRUE,
                                  distance.factor = 1,
                                  check.hyperplane = FALSE,
                                  box_density = 5000,
                                  thin = FALSE,
                                  quant.thin = 0.5,
                                  seed = NULL)

Arguments

`hv_list`	An `HypervolumeList`.
`classification`	A vector assigning each hypervolume in the `HypervolumeList` to a group.
`method`	Can be `subsample` or `box`. See details.
`FUN`	A function to aggregate points within each group. Default to `mean`.
`num.points.max`	Maximum number of random points to use for set operations. If `NULL` defaults to 10^(3+sqrt(n)) where n is the dimensionality of the input hypervolumes. Note that this default parameter value has been increased by a factor of 10 since the 1.2 release of this package.
`verbose`	Logical value; print diagnostic output if `TRUE`.
`distance.factor`	Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Recommended to not change this parameter.
`check.hyperplane`	Check if data is hyperplanar.
`box_density`	Density of random points to fill the hyperbox when method is equal to `box`.
`thin`	Take a subsample of random points to get a more uniform distribution of random points. Intended to be used with `method = "subsample"`, but can be used with `method = "box"` too. Can be slow, especially in high dimensions. See details.
`quant.thin`	Set quantile for using when `thin = TRUE`. See details.
`seed`	Set seed for random number generation. Useful for having reproducible results and with the use of `find_optimal_occupancy_thin()`
`print_log`	Save a log file with the volume of each input hypervolume, recomputed volume and the ratio between the original and recomputed hypervolumes. It works for `hypervolume_n_occupancy()` only.
`path`	A path to a directory of bootstrapped hypervolumes obtained with `hypervolume_n_resample()`.
`name`	File name; The function writes hypervolumes to file in "./Objects/<name>".

Details

Uses the inclusion test approach to count how many hypervolumes include each random point. Counts range from 0 (no hypervolumes contain a given random point), to the number of hypervolumes in a group (all the hypervolumes contain a given random point). A function FUN, usually mean or sum, is then applied. A hypervolume is then returned for each group and the occupancy stored in ValueAtRandomPoints. IMPORTANT: random points with ValueAtRandomPoints equal to 0 are not removed to ease downstream calculation.
When method = "subsample" the computation is performed on a random sample from input hypervolumes, constraining each to have the same point density given by the minimum of the point density of each input hypervolume and the point density calculated using the volumes of each input hypervolume divided by num.points.max.
Because this algorithm is based on distances calculated between the distributions of random points, the critical distance (point density ^ (-1/n)) can be scaled by a user-specified factor to provide more or less liberal estimates (distance_factor greater than or less than 1).
Two methods can be used for calculating the occupancy. The method subsample is based on a random sample of points from input hypervolumes. Each point is selected with a probability set to the inverse of the number of neighbour points calculated according to the critical distance. This method performs accurately when input hypervolumes have a low degree of overlap. The method box create a bounding box around the union of input hypervolumes. The bounding box is filled with points following a uniform distribution and with a density set with the argument box_density. A greater density usually provides more accurate results. The method box performs better than the method subsample in low dimensions, while in higher dimensions the method box become computationally inefficient as nearly all of the hyperbox sampling space will end up being empty and most of the points will be rejected.
When verbose = TRUE the volume of each input hypervolume will be printed to screen togheter with the recomputed volume and the ratio between the original and recomputed hypervolumes. Mean absolute error (MAE) and root mean square error (RMSE) are also provided as overall measures of the goodness of fit. A log file will be saved in the working directory with the information about the volume of input hypervolumes, the recomputed volume and the ratio between the original and recomputed hypervolumes.
When thin = TRUE an algorithm is applied to try to make the distribution of random points more uniform. Moderate departures from uniform distribution can in fact result from applying hypervolume_n_occupancy() on hypervolumes with a high overlap degree. At first, the algorithm in thin calculates the minimum distance from the neighboor points within the critical distance for each random point. A quantile (set with quant.thin) of these distances is taken and set as the threshold distance. Random points are then subset so that the distance of a point to another is greater than the threshold distance.
The function hypervolume_n_occupancy_bootstrap() takes a path of bootstrapped hypervolumes generated with hypervolume_n_resample() as input. It creates a directory called Objects in the current working directory if a directory of that name doesn't already exist where storing occupancy objects. The function hypervolume_n_occupancy_bootstrap() returns the absolute path to the directory with bootstrapped hypervolumes. It automatically saves a log file with the volume of each input hypervolume, the recomputed volume and the ratio between the original and recomputed hypervolumes. The log file is used with occupancy_bootstrap_gof().

Value

hypervolume_n_occupancy() returns a Hypervolume or HypervolumeList whose number of hypervolumes equals the number of groups in classification. hypervolume_n_occupancy_bootstrap() returns a string containing an absolute path equivalent to ./Objects/<name>.

Examples

## Not run: 
data(penguins,package='palmerpenguins')
penguins_no_na = as.data.frame(na.omit(penguins))

# split the dataset on species and sex
penguins_no_na_split = split(penguins_no_na, 
                        paste(penguins_no_na$species, penguins_no_na$sex, sep = "_"))


# calculate the hypervolume for each element of the splitted dataset
hv_list = mapply(function(x, y) 
  hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")],
                       samples.per.point=100, name = y), 
  x = penguins_no_na_split, 
  y = names(penguins_no_na_split))

hv_list <- hypervolume_join(hv_list)

# calculate occupancy without groups
hv_occupancy <- hypervolume_n_occupancy(hv_list)
plot(hv_occupancy, cex.random = 1)

# calculate occupancy with groups
hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, 
                          classification = rep(c("female", "male"), each = 3))

plot(hv_occupancy_list_sex, cex.random = 1, show.density = FALSE)


### hypervolume_n_occupancy_bootstrap  ###

# bootstrap the hypervolumes
hv_list_boot = hypervolume_n_resample(name = "example", hv_list)

# calculate occupancy on bootstrapped hypervolumes
hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot,
                                    name = "example_occ",
                                    classification = rep(c("female", "male"), 3))


## End(Not run)

bblonder/hypervolume documentation built on Jan. 21, 2025, 3:09 p.m.

bblonder/hypervolume index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bblonder/hypervolume
High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

hypervolume_n_occupancy: Operations for groups of hypervolumes
In bblonder/hypervolume: High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

Operations for groups of hypervolumes

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to hypervolume_n_occupancy in bblonder/hypervolume...

R Package Documentation

Browse R Packages

We want your feedback!

bblonder/hypervolume High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

hypervolume_n_occupancy: Operations for groups of hypervolumes In bblonder/hypervolume: High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

Operations for groups of hypervolumes

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to hypervolume_n_occupancy in bblonder/hypervolume...

R Package Documentation

Browse R Packages

We want your feedback!

bblonder/hypervolume
High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

hypervolume_n_occupancy: Operations for groups of hypervolumes
In bblonder/hypervolume: High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls