tclustregIC: Computes 'tclustreg' for different numbers of groups 'k' and...
In fsdaR: Robust Data Analysis Through Monitoring and Dynamic Visualization

tclustregIC

R Documentation

Computes `tclustreg` for different numbers of groups `k` and restriction factors `c`.

Description

`tclustregIC` (the last two letters stand for 'Information Criterion') computes the values of BIC (MIXMIX), ICL (MIXCLA) or CLA (CLACLA) for different values of k (number of groups) and different values of c (restriction factor for the variances of the residuals), for a prespecified level of trimming. To reduce randomness, for a given k the same subsets are used for each value of c.
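
The grid of (k, c) combinations and the reuse of subsets can be pictured with a small base-R sketch. It does not call fsdaR and is not the package's internal code. The vectors `kk` and `cc`, the helper `draw_subsets()`, and all numeric values are hypothetical names and settings chosen only for illustration. The sketch shows one idea: subsets are drawn once per k and then shared by every c tried for that k.

```r
# Hypothetical candidate values (illustrative only, not fsdaR defaults)
kk <- 1:3             # numbers of groups to try
cc <- c(1, 4, 16)     # restriction factors to try

n <- 50               # number of observations
p <- 2                # number of regression coefficients per group
nsamp <- 5            # number of subsets per value of k

# Hypothetical helper: draw nsamp subsets, each holding k * p row indexes
draw_subsets <- function(n, k, p, nsamp) {
  idx <- replicate(nsamp, sample.int(n, k * p), simplify = FALSE)
  do.call(rbind, idx)   # one subset per row
}

set.seed(1)

# One row per (k, c) combination; c varies fastest
grid <- expand.grid(c = cc, k = kk)

# Draw the subsets once for each k ...
subsets_by_k <- lapply(kk, function(k) draw_subsets(n, k, p, nsamp))
names(subsets_by_k) <- kk

# ... and let every c with the same k point at the same set of subsets
grid$subset_id <- match(grid$k, kk)

dim(subsets_by_k[["3"]])   # 5 rows (nsamp) and 6 columns (k * p)
```

Because only c changes within a block of rows sharing the same `subset_id`, differences in the criterion across c for a fixed k are not caused by different random starts.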

Usage

tclustregIC(
  y,
  x,
  alphaLik,
  alphaX,
  intercept = TRUE,
  plot = FALSE,
  nsamp,
  refsteps = 10,
  reftol = 1e-13,
  equalweights = FALSE,
  wtrim = 0,
  we,
  msg = TRUE,
  RandNumbForNini,
  trace = FALSE,
  ...
)

Arguments

`y`	Response variable. A vector with `n` elements that contains the response variable.
`x`	An n x p data matrix (n observations and p variables). Rows of x represent observations, and columns represent variables. Missing values (NA's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.
`alphaLik`	Trimming level, a scalar between 0 and 0.5 or an integer specifying the number of observations which have to be trimmed. If `alphaLik=0`, there is no trimming. In more detail, if `0 < alphaLik < 1`, clustering is based on `h = floor(n * (1 - alphaLik))` observations. If `alphaLik` is an integer greater than 1, clustering is based on `h = n - floor(alphaLik)` observations. Likelihood contributions are sorted and the units associated with the smallest `n - h` contributions are trimmed.
`alphaX`	Second-level trimming or constrained weighted model for `x`.
`intercept`	Whether to include a constant term (default is `intercept=TRUE`).
`plot`	If `plot=FALSE` (default) or `plot=0` no plot is produced. If `plot=TRUE` a plot with the final allocation is shown (using the `spmplot` function). If `x` is 2-dimensional, the lines associated with the groups are shown too.
`nsamp`	If a scalar, it contains the number of subsamples which will be extracted. If `nsamp = 0` all subsets will be extracted. Remark: if the number of all possible subsets is smaller than 300, the default is to extract all subsets, otherwise just 300. If `nsamp` is a matrix, its rows contain the indexes of the subsets which have to be extracted. In this case `nsamp` can be conveniently generated by function `subsets()`. `nsamp` must have `k * p` columns. The first `p` columns are used to estimate the regression coefficients of group 1, ..., the last `p` columns are used to estimate the regression coefficients of group `k`.
`refsteps`	Number of refining iterations in each subsample. Default is `refsteps=10`. `refsteps = 0` means "raw-subsampling" without iterations.
`reftol`	Tolerance of the refining steps. The default value is `reftol=1e-13`.
`equalweights`	A logical specifying whether cluster weights in the concentration and assignment steps shall be considered. If `equalweights=TRUE` we are (ideally) assuming equally sized groups, else if `equalweights=FALSE` (default) we allow for different group weights. Please check in the given references which functions are maximized in both cases.
`wtrim`	How to apply the weights on the observations: a flag taking values in `c(0, 1, 2, 3, 4)`. If `wtrim==0` (no weights), the algorithm reduces to the standard `tclustreg` algorithm. If `wtrim==1`, trimming is done by weighting the observations using values specified in vector `we`. In this case, vector `we` must be supplied by the user. If `wtrim==2`, trimming is again done by weighting the observations using values specified in vector `we`. In this case, vector `we` is computed from the data as a function of the density estimate `pdfe`. Specifically, the weight of each observation is the probability of retaining the observation, computed as `pretain_{ig} = 1 - pdfe_{ig}/max_{ig}(pdfe_{ig})`. If `wtrim==3`, trimming is again done by weighting the observations using values specified in vector `we`. In this case, each element `we_i` of vector `we` is a Bernoulli random variable with probability of success `pdfe_{ig}`. In the clustering framework this is done under the constraint that no group is empty. If `wtrim==4`, trimming is done with the tandem approach of Cerioli and Perrotta (2014).
`we`	Weights. A vector of size n-by-1 containing application-specific weights. Default is a vector of ones.
`msg`	Controls whether messages are displayed on the screen. If `msg==TRUE` (default) messages are displayed on the screen. If `msg=2`, detailed messages are displayed, for example the information at iteration level.
`RandNumbForNini`	Pre-extracted random numbers to initialize proportions. Matrix of size k-by-nrow(nsamp) containing the random numbers which are used to initialize the proportions of the groups. This option is effective only if `nsamp` is a matrix which contains pre-extracted subsamples. The purpose of this option is to enable the user to replicate the results when the function `tclustreg()` is called using a parfor instruction (as happens, for example, in routine IC, where `tclustreg()` is called through a parfor for different values of the restriction factor). The default is that `RandNumbForNini` is empty, in which case uniform random numbers are used.
`trace`	Whether to print intermediate results. Default is `trace=FALSE`.
`...`	Potential further arguments passed to lower-level functions.
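
The arithmetic behind `alphaLik` and the `wtrim==2` weights can be sketched in base R. This is only an illustration of the formulas quoted in the argument descriptions, not the package's internal code. The function names `n_retained()` and `retain_prob()` and the numeric vectors `contrib` and `pdfe` are made up for the example.

```r
# Number of observations h kept after trimming, following the alphaLik rules
n_retained <- function(n, alphaLik) {
  if (alphaLik == 0) return(n)                                   # no trimming
  if (alphaLik > 0 && alphaLik < 1) return(floor(n * (1 - alphaLik)))  # proportion
  if (alphaLik > 1) return(n - floor(alphaLik))                  # count of units
  stop("this value of alphaLik is not covered by the description")
}

n_retained(200, 0)      # 200
n_retained(200, 0.25)   # 150
n_retained(200, 10)     # 190

# Trim the units with the smallest n - h likelihood contributions
contrib <- c(2.1, 0.3, 1.7, 0.9, 2.5, 0.1)    # made-up contributions
h <- n_retained(length(contrib), 2)           # trim 2 units, so h = 4
kept <- sort(order(contrib, decreasing = TRUE)[seq_len(h)])
kept                                          # 1 3 4 5

# Retention probability used when wtrim == 2:
# pretain = 1 - pdfe / max(pdfe)
retain_prob <- function(pdfe) 1 - pdfe / max(pdfe)

pdfe <- c(0.10, 0.40, 0.25, 0.05)             # made-up density estimates
retain_prob(pdfe)
```

Under this formula the observation with the largest density estimate receives retention probability 0, while observations in sparse regions receive probabilities close to 1.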

Value

An S3 object of class tclustreg.object

Author(s)

FSDA team, valentin.todorov@chello.at

References

Torti F., Perrotta D., Riani M. and Cerioli A. (2019). Assessing Robust Methodologies for Clustering Linear Regression Data, Advances in Data Analysis and Classification, Vol. 13, pp. 227-257.

fsdaR documentation built on May 29, 2024, 5:35 a.m.