cosa2: COSA 2 Dissimilarities
In mkampert/rCOSA: Clustering Objects on Subsets of Attributes (COSA)


This function returns a matrix of dissimilarities between the rows of a data matrix, computed by the COSA 2 algorithm. Users are assumed to be familiar with the COSA paper(s) or the vignette that comes with the rCOSA package; see the references below.

cosa2(
  X,
  lX = NULL,
  targ = NULL,
  targ2 = NULL,
  knear = sqrt(nrow(X)),
  xmiss = NULL,
  lambda = 0.2,
  qntls = c(0.05, 0.95),
  wtcomb = "each",
  relax = 0.1,
  conv = 1e-05,
  niter = 1,
  noit = 100,
  stand = 1,
  pwr = 1
)

`X`	input data.frame, or matrix object in numeric mode. COSA calculates the dissimilarities for the rows in X.
`lX`	either an integer or a vector with as many elements as there are columns in `X`. This argument is ignored if `X` is a data.frame. Each element `lX[k]` is a label for `X[,k]`, as follows: `0` => ignore `X[,k]`, `1` => `X[,k]` is a numeric attribute, no target value, `2` => `X[,k]` is a numeric attribute, single target value, `3` => `X[,k]` is a numeric attribute, dual target values, `4` => `X[,k]` is a categorical attribute, no target value, `5` => `X[,k]` is a categorical attribute, single target value, `6` => `X[,k]` is a categorical attribute, dual target values. If only an integer scalar is given for `lX`, then that integer is taken as the label for all columns in `X`. Target values are specified in the `targ`, `targ2` or `qntls` arguments (see below).
`targ`	target values for computing targeted dissimilarities. `targ[k]` is the target value for `X[,k]`. The value is ignored if `lX[k] = 0`, `1`, or `4`. If `lX[k] = 2` or `5`, then `targ[k]` contains the single target value. If `lX[k] = 3` or `6`, then `targ[k]` contains one of the two target values. Special values: `targ = "low"` => use the `qntls[1]` quantile of `X[,k]` as a single target value. `targ = "high"` => use the `qntls[2]` quantile of `X[,k]` as a single target value. `targ = "high/low"` => use the `qntls[1]` and `qntls[2]` quantiles of `X[,k]` as dual target values (see the `qntls` argument described below).
`targ2`	the second target value when computing dual targeted dissimilarities. `targ2[k]` is the second target value for `X[,k]`. The value is ignored if `lX[k] = 0`, `1`, `2`, `4`, `5`, or when `targ = "low"`, `"high"`, or `"high/low"`.
`knear`	number of objects in each near-neighborhood used to calculate the attribute weights for each object. By default `knear = sqrt(nrow(X))`, which is truncated to an integer inside the function.
`xmiss`	numeric value that codes missing values in the data; by default it is set to `NULL`. If missing values are coded in the data by a number, supply that number in this argument. The coded value for missing values must be larger than any data value on any input variable.
`lambda`	multiple attribute clustering incentive parameter. By default, the regularization parameter lambda is set to `0.2`.
`qntls`	quantiles used for calculating high and/or low targets (ignored if `lX[k] = 0`, `1`, `4`, or if `targ` is not set to `"low"`, `"high"`, or `"high/low"`). `qntls[1]` is the data quantile on each attribute used for the low target; `qntls[2]` is the data quantile on each attribute used for the high target. NOTE: The `lX` vector and/or the value of `xmiss` can be specified as attributes of the input data matrix before invoking cosa: `attr(X,"lX") <- lX` and `attr(X,"xmiss") <- xmiss`. When present, the attribute values are used whenever the corresponding arguments are missing. Specifying an argument value overrides the corresponding attribute value if present. If the attribute and its corresponding argument are both missing, the default values above are used.
`wtcomb`	by default set to 'each', meaning that the maximum of the weights of object `X[i,]` and object `X[j,]` is chosen for each attribute `X[,k]`. The alternative is 'whole', meaning that the whole weight vector of either object `i` or object `j` is chosen, depending on which of the two vectors gives the maximum weighted dissimilarity between objects `i` and `j`.
`relax`	the number by which the homotopy parameter eta is incremented at each outer iteration (for more information see `noit`).
`conv`	the convergence threshold that can reduce the maximum number of inner iterations.
`niter`	the maximum number of inner iterations to stabilize the weights and dissimilarities given the homotopy parameter.
`noit`	the number of outer iterations (make sure `relax > 0`) used to move from the inverse exponential distance towards the sum of the weighted dissimilarities, which is approached when the number of outer iterations is large enough. Starting from the initial value of the homotopy parameter (equal to `lambda`) and using the increments determined by `relax`, one can calculate the value at which the homotopy parameter ends.
`stand`	equals `0` for no standardisation, `1` for robust standard scaling of the data, and `2` for standard scaling of the data. The default equals `1`.
`pwr`	`1` for L_1 norm attribute distances, and `2` for L_2 norm attribute distances. The default equals `1`. This argument is ignored for categorical/ordinal attributes.
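
The input conventions described for `lX`, `xmiss`, `knear` and `qntls` can be illustrated with base R alone. The sketch below is not part of rCOSA: the toy matrix, the object names (`X_coded`, `targets`) and the use of `quantile()` with its default settings are assumptions made for illustration, and the quantile definition used inside the package may differ.

```r
# Toy numeric matrix (hypothetical data): two numeric attributes and one
# integer-coded categorical attribute.
X <- cbind(height = c(1.62, 1.75, NA, 1.80),
           weight = c(55, 72, 68, NA),
           group  = c(1, 2, 2, 1))

# lX labels: 1 = numeric attribute, no target; 4 = categorical, no target.
lX <- c(1, 1, 4)

# If missing values are coded as a number, that number must be larger
# than any data value on any input variable.
xmiss <- max(X, na.rm = TRUE) + 1
X_coded <- X
X_coded[is.na(X_coded)] <- xmiss

# Attach the labels and the missing-value code as attributes of the matrix,
# as described in the NOTE under qntls.
attr(X_coded, "lX") <- lX
attr(X_coded, "xmiss") <- xmiss

# Default neighborhood size, truncated to an integer
# (12 for the 150 rows of iris).
knear <- trunc(sqrt(nrow(iris)))

# Low and high quantiles matching the default qntls = c(0.05, 0.95),
# computed for one attribute with base R's default quantile type.
targets <- quantile(iris$Sepal.Length, probs = c(0.05, 0.95))
```

With rCOSA installed, a matrix prepared this way could be passed as `X`, in which case the attached attributes stand in for the `lX` and `xmiss` arguments when those are left unspecified.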

This function returns a list whose first element is the call and whose second element is the dissimilarity matrix out$D of class dist. By default the list also contains the weights, of class matrix, and, if crit is set to TRUE, the values of the tuning parameters are included in the output as well.
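
To make the link between the returned weights and the dissimilarities more concrete, the two `wtcomb` options described under the arguments can be sketched for a single pair of objects. This is only an illustration of the rule as worded there, not the package's implementation: the weights `w_i`, `w_j` and the per-attribute distances `d_ij` are made-up values, and any scaling or normalisation applied inside rCOSA is left out.

```r
# Hypothetical attribute weights for two objects i and j (three attributes).
w_i <- c(0.7, 0.2, 0.1)
w_j <- c(0.1, 0.3, 0.6)

# Hypothetical per-attribute distances between objects i and j.
d_ij <- c(0.5, 1.0, 2.0)

# "each": for every attribute separately, take the larger of the two weights.
D_each <- sum(pmax(w_i, w_j) * d_ij)

# "whole": use one object's whole weight vector, namely the one that gives
# the larger weighted dissimilarity.
D_whole <- max(sum(w_i * d_ij), sum(w_j * d_ij))

c(each = D_each, whole = D_whole)
```

For non-negative distances, the 'each' combination in this sketch can never be smaller than the 'whole' combination, because it picks the larger weight attribute by attribute.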

The output dissimilarity matrix can be used as input to most dissimilarity based clustering procedures in R in the same manner as the output of the procedure dist.
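
As a minimal sketch of that workflow, the code below feeds an object of class dist to base R's `hclust()`. To keep the example runnable without rCOSA, `dist()` on the scaled iris measurements is used as a stand-in; the choice of average linkage and of three clusters is arbitrary.

```r
# Stand-in dissimilarities of class "dist". With rCOSA installed, the D
# element of the list returned by cosa2() would take this place.
D <- dist(scale(iris[, 1:4]))

# Hierarchical clustering on the dissimilarities (average linkage).
hc <- hclust(D, method = "average")

# Cut the dendrogram into three groups and tabulate the group sizes.
groups <- cutree(hc, k = 3)
table(groups)
```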

Maarten M.D. Kampert, Jacqueline J. Meulman, and Jerome H. Friedman.
Correspondence: mkampert@math.leidenuniv.nl

Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes.
URL: http://www-stat.stanford.edu/~jhf/ftp/cosa.pdf
Kampert, M. M., Meulman, J. J., and Friedman, J. H. (2017). rCOSA: A Software Package for Clustering Objects on Subsets of Attributes.
URL: https://link.springer.com/article/10.1007/s00357-017-9240-z

hierclust, getclust, and smacof.

library(rCOSA)
data(ApoE3)
rslt_dflt_cosa2 <- cosa2(X = ApoE3)
# The weight of object 1 on attribute 1 in the NxP weight matrix W
rslt_dflt_cosa2$W[1, 1]
# COSA procedure for dual targeted dissimilarities:
rslt_duotrg_cosa <- cosa2(X = iris[, 1:4], lX = rep(3, ncol(iris[, 1:4])), targ = 'high/low')