histogram_stats: Statistics for Histogram Data

histogram_stats    R Documentation

Statistics for Histogram Data

Description

Functions to compute the mean, variance, covariance, and correlation of histogram-valued data.

Usage

hist_mean(x, var_name, method = "BG", ...)

hist_var(x, var_name, method = "BG", ...)

hist_cov(x, var_name1, var_name2, method = "BG", ...)

hist_cor(x, var_name1, var_name2, method = "BG", ...)

Arguments

x

histogram-valued data object.

var_name

the name or column index of the histogram-valued variable.

method

method to calculate statistics. One of "BG" (Bertrand and Goupil, 2000; default), "BD" (Billard and Diday, 2006), "B" (Billard, 2008), or "L2W" (L2 Wasserstein). All four methods are available for all four functions.

...

additional arguments.

var_name1

the name or column index of the first histogram-valued variable.

var_name2

the name or column index of the second histogram-valued variable.

Details

Four functions are provided:

  • hist_mean: Compute the mean of histogram-valued data.

  • hist_var: Compute the variance of histogram-valued data.

  • hist_cov: Compute the covariance between two histogram-valued variables.

  • hist_cor: Compute the correlation between two histogram-valued variables.

Four methods are supported for all functions:

BG

Bertrand and Goupil (2000) method. Uses histogram bin boundaries and probabilities to compute first and second moments.
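The BG first and second moments can be illustrated with a small from-scratch R sketch. This is not the dataSDA implementation: the data layout (one list per observation with hypothetical fields breaks and probs) and the function names bg_mean and bg_var are assumptions made for illustration. Values are assumed to be uniformly distributed within each bin, so a bin [a, b) with probability p contributes p(a + b)/2 to the first moment and p(a^2 + ab + b^2)/3 to the second.

```r
# Sketch of the Bertrand-Goupil (BG) mean and variance for one
# histogram-valued variable. Hypothetical layout, not the dataSDA API:
# each observation is list(breaks = K + 1 bin boundaries,
# probs = K bin probabilities summing to 1); values uniform within bins.

bg_mean <- function(hists) {
  # Per-observation mean: sum_k p_k * (a_k + b_k) / 2
  unit_means <- vapply(hists, function(h) {
    a <- head(h$breaks, -1)  # lower bin boundaries
    b <- tail(h$breaks, -1)  # upper bin boundaries
    sum(h$probs * (a + b) / 2)
  }, numeric(1))
  mean(unit_means)
}

bg_var <- function(hists) {
  # Per-observation second moment: sum_k p_k * (a_k^2 + a_k b_k + b_k^2) / 3
  unit_m2 <- vapply(hists, function(h) {
    a <- head(h$breaks, -1)
    b <- tail(h$breaks, -1)
    sum(h$probs * (a^2 + a * b + b^2) / 3)
  }, numeric(1))
  mean(unit_m2) - bg_mean(hists)^2
}

# Two toy observations (hypothetical data)
hists <- list(
  list(breaks = c(0, 1, 2), probs = c(0.5, 0.5)),  # Uniform(0, 2)
  list(breaks = c(1, 3),    probs = 1)             # Uniform(1, 3)
)
bg_mean(hists)  # 1.5
bg_var(hists)   # 0.5833333 (= 7/12)
```

For this toy data the BG mean and variance coincide with the mean and variance of an equal mixture of Uniform(0, 2) and Uniform(1, 3).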

BD

Billard and Diday (2006) method. A signed decomposition using the sign of each bin's midpoint deviation from the overall mean and a quadratic form on the bin boundaries.

B

Billard (2008) method. Uses cross-products of deviations of the bin boundaries from the overall mean.

L2W

L2 Wasserstein method. Uses optimal-transport (Wasserstein) distances between the quantile functions of the histogram distributions.
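A minimal sketch of an L2W variance, assuming the Irpino and Verde (2015) formulation in which the variance is the average squared L2 Wasserstein distance between each observation and the barycenter, whose quantile function is the pointwise average of the observations' quantile functions. The list layout and the names quantile_fun and l2w_var are hypothetical, not part of dataSDA; bin probabilities are assumed strictly positive and values uniform within bins, so each quantile function is piecewise linear, and the integral over t in (0, 1) is approximated on a midpoint grid.

```r
# L2 Wasserstein variance sketch (hypothetical helpers, not dataSDA code).
# Each observation: list(breaks = K + 1 boundaries, probs = K positive
# probabilities summing to 1); values assumed uniform within bins.

# Piecewise-linear quantile function evaluated at probabilities t
quantile_fun <- function(h, t) {
  approx(x = c(0, cumsum(h$probs)), y = h$breaks, xout = t, rule = 2)$y
}

l2w_var <- function(hists, n_grid = 10000) {
  t <- (seq_len(n_grid) - 0.5) / n_grid    # midpoint grid on (0, 1)
  Q <- sapply(hists, quantile_fun, t = t)  # n_grid x n matrix of quantiles
  Q_bar <- rowMeans(Q)                     # barycenter quantile function
  d2 <- colMeans((Q - Q_bar)^2)            # squared W2 distance of each unit to the barycenter
  mean(d2)                                 # average over observations
}

hists <- list(
  list(breaks = c(0, 1, 2), probs = c(0.5, 0.5)),  # Uniform(0, 2)
  list(breaks = c(1, 3),    probs = 1)             # Uniform(1, 3)
)
l2w_var(hists)  # 0.25
```

Here the two units are shifted copies of each other, so each lies at squared distance 0.25 from the barycenter and the L2W variance is 0.25. The variance of their equal mixture, which is what the BG formula measures, is 7/12, which illustrates why the methods can disagree.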

For the mean, BG, BD, and B return the same value because they share the same first-order moment definition; only L2W uses a different (quantile-based) mean. For variance, covariance, and correlation, all four methods generally produce different results.

For hist_cor, the BG, BD, and B correlations all use the Bertrand-Goupil standard deviation S(Y) in the denominator, following Irpino and Verde (2015, Eqs. 30–32). Only the L2W method uses its own Wasserstein-based standard deviation in the denominator.
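The denominator rule above can be sketched in R. This is an illustration under assumptions, not dataSDA code: the list layout and every function name are hypothetical, values are uniform within bins, the L2W covariance is taken as the average over observations of the integral of the product of the two variables' barycenter-centered quantile functions, and for BG, BD and B the covariance is passed in as cov_value rather than recomputed, because only the choice of denominator is being illustrated.

```r
# Correlation-denominator sketch (hypothetical helpers, not dataSDA code).

# BG standard deviation S(Y) from bin boundaries and probabilities
bg_sd <- function(hists) {
  m <- vapply(hists, function(h) {
    a <- head(h$breaks, -1)
    b <- tail(h$breaks, -1)
    c(sum(h$probs * (a + b) / 2),             # first moment
      sum(h$probs * (a^2 + a * b + b^2) / 3)) # second moment
  }, numeric(2))                              # 2 x n matrix
  sqrt(mean(m[2, ]) - mean(m[1, ])^2)
}

# Quantile functions on a midpoint grid, centered on the barycenter
centered_quantiles <- function(hists, n_grid = 10000) {
  t <- (seq_len(n_grid) - 0.5) / n_grid
  Q <- sapply(hists, function(h)
    approx(c(0, cumsum(h$probs)), h$breaks, xout = t, rule = 2)$y)
  Q - rowMeans(Q)
}

# L2W covariance: average over units of the integral of the product
l2w_cov <- function(hists1, hists2) {
  mean(colMeans(centered_quantiles(hists1) * centered_quantiles(hists2)))
}

l2w_sd <- function(hists) sqrt(l2w_cov(hists, hists))

hist_cor_sketch <- function(hists1, hists2, method, cov_value = NULL) {
  if (method == "L2W") {
    # L2W: its own covariance over its own standard deviations
    return(l2w_cov(hists1, hists2) / (l2w_sd(hists1) * l2w_sd(hists2)))
  }
  # BG, BD, B: supplied covariance over the BG standard deviations
  stopifnot(!is.null(cov_value))
  cov_value / (bg_sd(hists1) * bg_sd(hists2))
}

h1 <- list(list(breaks = c(0, 2), probs = 1), list(breaks = c(1, 3), probs = 1))
h2 <- list(list(breaks = c(0, 4), probs = 1), list(breaks = c(2, 6), probs = 1))
hist_cor_sketch(h1, h2, "L2W")  # 1
bg_sd(h1) * bg_sd(h2)           # 1.166667 (= 7/6)
```

In the toy data each variable's two units differ only by a shift, so the L2W correlation is 1. The product of the BG standard deviations for the same data is 7/6, and that product is what would divide whichever BG, BD or B covariance is supplied.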

Value

A numeric value or vector for hist_mean and hist_var; a single numeric value for hist_cov and hist_cor.

Author(s)

Po-Wei Chen, Han-Ming Wu

See Also

int_mean, int_var, int_cov, int_cor

Examples

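library(dataSDA)
library(HistDAWass)
x <- HistDAWass::BLOOD
# BG, BD and B share the same mean; L2W uses the Wasserstein-based mean
hist_mean(x, var_name = "Cholesterol", method = "BG")
hist_mean(x, var_name = "Cholesterol", method = "BD")
hist_mean(x, var_name = "Cholesterol", method = "L2W")
# Variances generally differ across methods
hist_var(x, var_name = "Cholesterol", method = "BG")
hist_var(x, var_name = "Cholesterol", method = "BD")
hist_var(x, var_name = "Cholesterol", method = "L2W")
hist_cov(x, var_name1 = "Cholesterol", var_name2 = "Hemoglobin", method = "BG")
hist_cor(x, var_name1 = "Cholesterol", var_name2 = "Hemoglobin", method = "BG")
hist_cor(x, var_name1 = "Cholesterol", var_name2 = "Hemoglobin", method = "L2W")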

dataSDA documentation built on June 12, 2026, 9:06 a.m.