optbins: Optimal Numbers of Bins Calculation

optbins-methods    R Documentation

Description

Returns a matrix of size n_{\mathrm{D}} \times d containing the optimal numbers of bins v_{1}, \ldots, v_{d}, one row for each processed dataset.

Usage

## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal",
        ymin = numeric(), ymax = numeric(), kmin = numeric(),
        kmax = numeric(), ...)
## ... and for other signatures

Arguments

Dataset

a list of length n_{\mathrm{D}} of data frames of size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. The number of observations n equals the number of rows in each dataset.

Rule

a character string giving the histogram binning rule. One of "Sturges", "Log10", "RootN", "Knuth equal" (the default) or "Knuth unequal".

ymin

a vector of length d containing the minimum observation of each variable. The default value is numeric().

ymax

a vector of length d containing the maximum observation of each variable. The default value is numeric().

kmin

lower limit of the number of bins. The default value is numeric().

kmax

upper limit of the number of bins. The default value is numeric().

...

currently not used.
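
The closed-form rules named in Rule, together with the kmin and kmax limits, can be illustrated with a short base R sketch. The formulas below are the commonly cited textbook forms and are an assumption here: this page does not state which constants or rounding rebmix applies, and simple_bins is a hypothetical helper, not part of the package.

```r
# Closed-form bin-count rules in their commonly cited textbook form.
# ASSUMPTION: rebmix may use different constants or rounding; simple_bins()
# is a hypothetical helper written for illustration only.
simple_bins <- function(n, rule = c("Sturges", "Log10", "RootN"),
                        kmin = 1, kmax = Inf) {
  rule <- match.arg(rule)
  k <- switch(rule,
    Sturges = ceiling(1 + log2(n)),      # Sturges: 1 + log2(n)
    Log10   = ceiling(10 * log10(n)),    # 10 * log10(n)
    RootN   = ceiling(2 * sqrt(n)))      # 2 * sqrt(n); plain sqrt(n) is also common
  # Clamp the result to the requested range [kmin, kmax].
  min(max(k, kmin), kmax)
}

n <- nrow(iris)  # number of observations in the Iris dataset

# One bin count per rule, limited to the range 2..20.
k.simple <- sapply(c("Sturges", "Log10", "RootN"), simple_bins,
                   n = n, kmin = 2, kmax = 20)
k.simple
```

These rules depend only on n, so they return the same number of bins for every variable. The Knuth rules instead use the observations themselves.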

Methods

signature(Dataset = "list")

a list of data frames.

Author(s)

Branislav Panic, Marko Nagode

References

K. K. Knuth. Optimal data-based binning for histograms and histogram-based probability density models. Digital Signal Processing, 95:102581, 2019. doi:10.1016/j.dsp.2019.102581.

B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.

Examples

# Generate multivariate normal datasets.

n <- c(750, 1000)

Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)

a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)

sim2d <- RNGMIX(model = "RNGMVNORM", 
  Dataset.name = paste("sim2d_", 1:5, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))

# Calculate optimal numbers of bins.

opt.k <- optbins(Dataset = sim2d@Dataset,
  Rule = "Knuth equal",
  ymin = sim2d@ymin,
  ymax = sim2d@ymax,
  kmin = 2, 
  kmax = 20)

opt.k

# Create an object of class EM.Control.

EM <- new("EM.Control", strategy = "exhaustive", variant = "EM",
  acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
  maximum.iterations = 1000)

# Estimate number of components, component weights and component parameters.

sim2dest <- REBMIX(model = "REBMVNORM", 
  Dataset = a.Dataset(sim2d),
  Preprocessing = "h",
  cmax = 10,
  ymin = a.ymin(sim2d),
  ymax = a.ymax(sim2d),
  K = opt.k,
  Criterion = "BIC",
  EMcontrol = EM)

# Plot finite mixture.

plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))

# Estimate number of components, component weights and component 
# parameters for the well-known Iris dataset.

Dataset <- list(iris[, 1:4])

# Calculate optimal numbers of bins, allowing a different number of bins in each dimension.

opt.k <- optbins(Dataset = Dataset,
  Rule = "Knuth unequal",
  kmin = 2, 
  kmax = 20)

opt.k

# Estimate number of components, component weights and component parameters.

irisest <- REBMIX(model = "REBMVNORM", 
  Dataset = Dataset,
  Preprocessing = "h",
  cmax = 10,
  K = opt.k,
  Criterion = "BIC",
  EMcontrol = EM)
  
irisest
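
# The search between kmin and kmax can be sketched in base R for one variable.

The sketch below evaluates the log posterior from Knuth (2019) for equal-width histograms of a single variable and picks the number of bins that maximizes it over kmin, ..., kmax. It is a minimal illustration under stated assumptions, not the rebmix implementation: knuth_logp and knuth_bins are hypothetical helper names, the data are simulated, and the multivariate handling behind "Knuth equal" and "Knuth unequal" is not reproduced.

```r
# Knuth's log posterior (up to an additive constant) for k equal-width bins.
# ASSUMPTION: univariate case only; helper names are hypothetical.
knuth_logp <- function(y, k, ymin = min(y), ymax = max(y)) {
  n <- length(y)
  # k equal-width bins spanning [ymin, ymax].
  breaks <- seq(ymin, ymax, length.out = k + 1)
  # all.inside = TRUE keeps every index in 1..k, including y == ymax.
  idx <- findInterval(y, breaks, all.inside = TRUE)
  counts <- tabulate(idx, nbins = k)
  n * log(k) + lgamma(k / 2) - k * lgamma(0.5) - lgamma(n + k / 2) +
    sum(lgamma(counts + 0.5))
}

# Exhaustive search over kmin..kmax for the maximizing number of bins.
knuth_bins <- function(y, kmin = 2, kmax = 20) {
  ks <- kmin:kmax
  logp <- vapply(ks, function(k) knuth_logp(y, k), numeric(1))
  ks[which.max(logp)]
}

set.seed(1)
y <- rnorm(500)  # simulated data, for illustration only

k.opt <- knuth_bins(y, kmin = 2, kmax = 20)
k.opt
```

A wider kmin to kmax range costs one histogram per candidate k, so the limits trade search time against the chance of excluding the best value.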

rebmix documentation built on Feb. 9, 2024, 3:01 p.m.