Object of class `REBMIX`.

Objects can be created by calls of the form `new("REBMIX", ...)`.
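As a rough illustration of the `new()` call form, the sketch below defines a hypothetical toy S4 class, `ToyMix`, that copies only three slot names from this page. It is not the real `REBMIX` class and does not run the REBMIX algorithm; it only shows how slots are supplied to `new()` and read back with `@`.

```r
library(methods)

# Hypothetical toy class for illustration only, NOT the real REBMIX class.
# It mirrors three of the slot names documented on this page.
setClass("ToyMix",
  representation(Preprocessing = "character",
                 cmax = "numeric",
                 K = "numeric"),
  prototype(cmax = 15))

# Slots are passed to new() as named arguments.
obj <- new("ToyMix",
           Preprocessing = "histogram",
           K = c(10, 20, 40, 60))

obj@Preprocessing  # slot supplied in the call
obj@cmax           # slot not supplied, so the prototype default is used
obj@K
```

Slots of a real `REBMIX` object would be read with `@` in the same way, e.g. `obj@summary`, assuming `obj` was created by the package itself.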

`Dataset`

:-
a list of data frames of size *n \times d* containing *d*-dimensional datasets. Each of the *d* columns represents one random variable. The number of observations *n* equals the number of rows in the datasets.

`Preprocessing`

:-
a character vector giving the preprocessing types. One of `"histogram"`, `"Parzen window"` or `"k-nearest neighbour"`.

`cmax`

:-
maximum number of components *c_{\mathrm{max}} > 0*. The default value is `15`.

`Criterion`

:-
a character vector giving the information criterion types. One of Akaike `"AIC"` (the default), `"AIC3"`, `"AIC4"` or `"AICc"`, Bayesian `"BIC"`, consistent Akaike `"CAIC"`, Hannan-Quinn `"HQC"`, minimum description length `"MDL2"` or `"MDL5"`, approximate weight of evidence `"AWE"`, classification likelihood `"CLC"`, integrated classification likelihood `"ICL"` or `"ICL-BIC"`, partition coefficient `"PC"`, total of positive relative deviations `"D"` or sum of squares error `"SSE"`.

`Variables`

:-
a character vector of length *d* containing types of variables. One of `"continuous"` or `"discrete"`.

`pdf`

:-
a character vector of length *d* containing continuous or discrete parametric family types. One of `"normal"`, `"lognormal"`, `"Weibull"`, `"gamma"`, `"binomial"`, `"Poisson"` or `"Dirac"`.

`theta1`

:-
a vector of length *d* containing initial component parameters. One of *n_{il} = \textrm{Number of categories} - 1* for the `"binomial"` distribution or `"NA"` otherwise.

`theta2`

:-
a vector of length *d* containing initial component parameters. Currently not used.

`K`

:-
a vector or a list of vectors containing numbers of bins *v* for the histogram and the Parzen window or numbers of nearest neighbours *k* for the *k*-nearest neighbour. There is no genuine rule to identify *v* or *k*. Consequently, the REBMIX algorithm identifies them from the set `K` of input values by minimizing the information criterion. The Sturges rule *v = 1 + \log_{2}(n)*, the *\mathrm{Log}_{10}* rule *v = 10 \log_{10}(n)* or the RootN rule *v = 2 \sqrt{n}* can be applied to estimate the limiting numbers of bins, and the rule of thumb *k = \sqrt{n}* to guess the intermediate number of nearest neighbours. If, e.g., `K = c(10, 20, 40, 60)` and the minimum `IC` coincides with, e.g., `40`, the brackets are set to `20` and `60` and the golden section is applied to refine the minimum search. See also `kseq` for generating sequences of numbers of bins or nearest neighbours.

`y0`

:-
a vector of length *d* containing origins. The default value is `numeric()`.

`ymin`

:-
a vector of length *d* containing minimum observations. The default value is `numeric()`.

`ymax`

:-
a vector of length *d* containing maximum observations. The default value is `numeric()`.

`ar`

:-
acceleration rate *0 < a_{\mathrm{r}} \leq 1*. The default value is `0.1` and in most cases does not have to be altered.

`Restraints`

:-
a character giving the restraints type. One of `"rigid"` or `"loose"` (the default). The rigid restraints are obsolete and applicable to well separated components only.

`w`

:-
a list of vectors of length *c* containing component weights *w_{l}* summing to 1.

`Theta`

:-
a list of lists each containing *c* parametric family types `pdfi`. One of `"normal"`, `"lognormal"`, `"Weibull"`, `"gamma"`, `"binomial"`, `"Poisson"` or `"Dirac"`. Component parameters `theta1.i` follow the parametric family types. One of *μ_{il}* for normal and lognormal distributions and *θ_{il}* for Weibull, gamma, binomial, Poisson and Dirac distributions. Component parameters `theta2.i` follow `theta1.i`. One of *σ_{il}* for normal and lognormal distributions, *β_{il}* for Weibull and gamma distributions and *p_{il}* for the binomial distribution.

`summary`

:-
a data frame with additional information about dataset, preprocessing, *c_{\mathrm{max}}*, information criterion type, *a_{\mathrm{r}}*, restraints type, optimal *c*, optimal *v* or *k*, *K*, *y_{i0}*, *y_{i\mathrm{min}}*, *y_{i\mathrm{max}}*, optimal *h_{i}*, information criterion *\mathrm{IC}*, log likelihood *\mathrm{log}\, L* and degrees of freedom *M*.

`pos`

:-
position in the `summary` data frame at which the log likelihood *\mathrm{log}\, L* attains its maximum.

`opt.c`

:-
a list of vectors containing numbers of components for optimal *v* for the histogram and the Parzen window or for optimal number of nearest neighbours *k* for the *k*-nearest neighbour.

`opt.IC`

:-
a list of vectors containing information criteria for optimal *v* for the histogram and the Parzen window or for optimal number of nearest neighbours *k* for the *k*-nearest neighbour.

`opt.logL`

:-
a list of vectors containing log likelihoods for optimal *v* for the histogram and the Parzen window or for optimal number of nearest neighbours *k* for the *k*-nearest neighbour.

`opt.D`

:-
a list of vectors containing totals of positive relative deviations for optimal *v* for the histogram and the Parzen window or for optimal number of nearest neighbours *k* for the *k*-nearest neighbour.

`all.K`

:-
a list of vectors containing all processed numbers of bins *v* for the histogram and the Parzen window or all processed numbers of nearest neighbours *k* for the *k*-nearest neighbour.

`all.IC`

:-
a list of vectors containing information criteria for all processed numbers of bins *v* for the histogram and the Parzen window or for all processed numbers of nearest neighbours *k* for the *k*-nearest neighbour.
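The rules quoted for the `K` slot can be evaluated in base R to choose candidate values. The sketch below assumes a sample size of `n = 1000` and uses a hypothetical helper, `bracket()`, with made-up `IC` values. It only mimics the bracket selection described for `K = c(10, 20, 40, 60)`; it is not the package's implementation, and the golden section refinement itself is not shown.

```r
n <- 1000  # assumed number of observations (rows in the dataset)

v_sturges <- 1 + log2(n)     # Sturges rule
v_log10   <- 10 * log10(n)   # Log10 rule
v_rootn   <- 2 * sqrt(n)     # RootN rule
k_guess   <- sqrt(n)         # rule of thumb for nearest neighbours

# Hypothetical helper (not part of the package): given candidate values K
# and their information criteria IC, return the candidate with minimum IC
# together with its neighbouring candidates, which serve as brackets.
bracket <- function(K, IC) {
  i <- which.min(IC)
  c(lower = K[max(i - 1, 1)],
    best  = K[i],
    upper = K[min(i + 1, length(K))])
}

K  <- c(10, 20, 40, 60)
IC <- c(5200, 5100, 5050, 5080)  # made-up values for illustration only
b  <- bracket(K, IC)
b  # minimum IC at 40, brackets at 20 and 60
```

When the minimum falls on the first or last candidate, this helper repeats that candidate as its own bracket, which suggests widening the set `K`.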

Marko Nagode
