optbins: Optimal Numbers of Bins Calculation
In rebmix: Finite Mixture Modeling, Clustering & Classification

optbins-methods

R Documentation

Optimal Numbers of Bins Calculation

Description

Returns the matrix of size n_{\mathrm{D}} \times d containing optimal numbers of bins v_{1}, \ldots, v_{d} for all processed datasets.

Usage

## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal",
        ymin = numeric(), ymax = numeric(), kmin = numeric(),
        kmax = numeric(), ...)
## ... and for other signatures

Arguments

`Dataset`	a list of length `n_{\mathrm{D}}` of data frames of size `n \times d` containing d-dimensional datasets. Each of the `d` columns represents one random variable. Numbers of observations `n` equal the number of rows in the datasets.
`Rule`	a character giving the histogram binning rule. One of `"Sturges"`, `"Log10"`, `"RootN"`, default `"Knuth equal"` or `"Knuth unequal"`.
`ymin`	a vector of length `d` containing minimum observations. The default value is `numeric()`.
`ymax`	a vector of length `d` containing maximum observations. The default value is `numeric()`.
`kmin`	lower limit of the number of bins. The default value is `numeric()`.
`kmax`	upper limit of the number of bins. The default value is `numeric()`.
`...`	currently not used.

Methods

signature(x = "list"): a list of data frames.

Author(s)

Branislav Panic, Marko Nagode

References

K. K. Knuth. Optimal data-based binning for histograms and histogram-based probability density models. Digital Signal Processing, 95:102581, 2019. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.dsp.2019.102581")}.

B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3390/math8030373")}.

Examples

# Generate multivariate normal datasets.

n <- c(750, 1000)

Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)

a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)

sim2d <- RNGMIX(model = "RNGMVNORM", 
  Dataset.name = paste("sim2d_", 1:5, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))

# Calculate optimal numbers of bins.

opt.k <- optbins(Dataset = sim2d@Dataset,
  Rule = "Knuth equal",
  ymin = sim2d@ymin,
  ymax = sim2d@ymax,
  kmin = 2, 
  kmax = 20)

opt.k

# Create object of class EM.Control.

EM <- new("EM.Control", strategy = "exhaustive", variant = "EM",
  acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
  maximum.iterations = 1000)

# Estimate number of components, component weights and component parameters.

sim2dest <- REBMIX(model = "REBMVNORM", 
  Dataset = a.Dataset(sim2d),
  Preprocessing = "h",
  cmax = 10,
  ymin = a.ymin(sim2d),
  ymax = a.ymax(sim2d),
  K = opt.k,
  Criterion = "BIC",
  EMcontrol = EM)

# Plot finite mixture.

plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))

# Estimate number of components, component weights and component 
# parameters for well known Iris dataset.

Dataset <- list(iris[, c(1:4)])

# Calculate optimal numbers of bins using non-equal number of bins in each dimension.

opt.k <- optbins(Dataset = Dataset,
  Rule = "Knuth unequal",
  kmin = 2, 
  kmax = 20)

opt.k

# Estimate number of components, component weights and component parameters.

irisest <- REBMIX(model = "REBMVNORM", 
  Dataset = Dataset,
  Preprocessing = "h",
  cmax = 10,
  K = opt.k,
  Criterion = "BIC",
  EMcontrol = EM)
  
irisest

rebmix documentation built on Sept. 11, 2024, 6:30 p.m.