cosa2: COSA 2 Dissimilarities


View source: R/cosa2.R

Description

This function computes a matrix of dissimilarities between the rows of a data matrix using the COSA 2 algorithm. It is assumed that users are familiar with the COSA paper(s) or the vignette that comes with the rCOSA package; see the references below.

Usage

cosa2(
  X,
  lX = NULL,
  targ = NULL,
  targ2 = NULL,
  knear = sqrt(nrow(X)),
  xmiss = NULL,
  lambda = 0.2,
  qntls = c(0.05, 0.95),
  wtcomb = "each",
  relax = 0.1,
  conv = 1e-05,
  niter = 1,
  noit = 100,
  stand = 1,
  pwr = 1
)

Arguments

X

input data.frame, or matrix object in numeric mode. COSA calculates the dissimilarities for the rows in X.

lX

either an integer or a vector with as many elements as there are columns in X. This argument is ignored if X is a data.frame. Each element lX[k] represents a label for X[,k] as indicated below:

0 => ignore X[, k],
1 => X[, k] is a numeric attribute, no target value,
2 => X[, k] is a numeric attribute, single target value,
3 => X[, k] is a numeric attribute, dual target values,
4 => X[, k] is a categorical attribute, no target value,
5 => X[, k] is a categorical attribute, single target value,
6 => X[, k] is a categorical attribute, dual target values.

If only an integer scalar is given for lX, then that integer is taken as the label for all columns in X.
Target values are specified in the targ, targ2, or qntls arguments (see below).
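
As a minimal sketch of how such a label vector might be built, the snippet below assigns one label per column of a small numeric matrix. The matrix X_demo and its columns are hypothetical, made up for illustration; only the label codes come from the list above.

```r
# Hypothetical numeric matrix: an ID column, two numeric attributes,
# and one integer-coded categorical attribute.
X_demo <- cbind(
  id     = 1:6,
  height = c(1.62, 1.75, 1.80, 1.68, 1.91, 1.55),
  weight = c(58, 72, 85, 64, 95, 50),
  group  = c(1, 2, 1, 3, 2, 3)
)

# One label per column: 0 = ignore, 1 = numeric (no target),
# 4 = categorical (no target).
lX_demo <- c(0, 1, 1, 4)

# The vector form needs exactly one label per column of X.
stopifnot(length(lX_demo) == ncol(X_demo))

# A single integer applies the same label to every column;
# this is the explicit equivalent of passing lX = 1.
lX_all_numeric <- rep(1, ncol(X_demo))
```

With these objects, the corresponding call would be cosa2(X = X_demo, lX = lX_demo).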

targ

target values for computing targeted dissimilarities. The targ[k] is the target value for X[,k]. The value is ignored if lX[k] = 0, 1, or 4.
If lX[k]=2 or 5, then targ[k] contains the single target value.
If lX[k]=3 or 6, then targ[k] contains one of the two target values.

Special values:

targ = "low" => use the qntls[1] quantile of X[,k] as a single target value.
targ = "high" => use the qntls[2] quantile of X[,k] as a single target value.
targ = "high/low" => use the qntls[1] and qntls[2] quantiles of X[,k] as dual target values (see qntls argument described below).

targ2

the second target value when computing dual targeted dissimilarities. The targ2[k] is the second target value for X[,k]. The value is ignored if lX[k] = 0, 1, 2, 4, or 5, or when targ = "low", "high", or "high/low".

knear

number of objects in the near-neighborhoods used to calculate the attribute weights for each object. By default knear = sqrt(nrow(X)), which is truncated to an integer inside the function.
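
To make the default concrete, the sketch below works out the truncated neighborhood size for a data set with 150 rows. Using as.integer() for the truncation is an assumption about the mechanism; the documentation only states that the value is truncated.

```r
# Default neighborhood size for a data set with 150 rows (iris).
n <- nrow(iris)
knear_raw <- sqrt(n)                 # about 12.25
knear_int <- as.integer(knear_raw)   # truncates toward zero: 12

# trunc() gives the same value. round() would differ whenever
# the fractional part is 0.5 or more.
stopifnot(knear_int == trunc(knear_raw))
```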

xmiss

numeric value that codes missing values in the data; by default it is set to NULL. If missing values in the data are coded with a number, pass that number in this argument. The coded value must be larger than any data value on any input variable.
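
The constraint on the coded value can be checked before calling the function. The following is a small sketch on a hypothetical matrix in which 9999 was used as the missing-value code; the names X_coded and xmiss_code are illustrative only.

```r
# Hypothetical data in which missing values were coded as 9999.
X_coded <- matrix(
  c(5.1, 3.5, 9999,
    4.9, 9999, 1.4,
    6.3, 3.3, 6.0),
  nrow = 3, byrow = TRUE
)
xmiss_code <- 9999

# Largest value among the observed (non-missing) entries.
max_observed <- max(X_coded[X_coded != xmiss_code])

# The code must be strictly larger than any real data value.
stopifnot(xmiss_code > max_observed)

# Number of entries flagged as missing in each column.
n_missing <- colSums(X_coded == xmiss_code)
```

If the check passes, the code can be supplied as cosa2(X = X_coded, xmiss = xmiss_code).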

lambda

multiple attribute clustering incentive parameter. By default, the regularization parameter lambda is set to equal 0.2.

qntls

quantiles used for calculating high and/or low targets (ignored if lX[k] = 0,1,4 or targ is NOT set to "low", "high", or "high/low").

qntls[1] = data quantile on each attribute used for low target
qntls[2] = data quantile on each attribute used for high target
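
To get a feel for which target values the default quantiles imply, they can be computed directly with base R. This is only an approximation of what happens inside the function: quantile() is used here with its default settings, and the package may compute quantiles with a different definition.

```r
X_num <- iris[, 1:4]
qntls_demo <- c(0.05, 0.95)

# One column per attribute; row 1 holds the low quantile,
# row 2 the high quantile.
targets <- sapply(X_num, quantile, probs = qntls_demo, names = FALSE)
rownames(targets) <- c("low", "high")
targets
```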

NOTE: The lX vector and/or the value of xmiss can be specified as attributes of the input data matrix before invoking the function:
attr(X,"lX")<-lX
attr(X,"xmiss")<-xmiss

When present, the attribute values are used whenever the corresponding arguments are missing. Specifying an argument value overrides the corresponding attribute value if present. If the attribute and its corresponding argument are both missing, the default values above are used.
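
A short sketch of the attribute mechanism in plain R follows. The missing-value code 1e10 is an arbitrary assumption chosen to exceed every value in the data. The last lines show a base R behavior worth keeping in mind: custom attributes are dropped when a matrix is subset.

```r
X_attr <- as.matrix(iris[, 1:4])

# Attach the labels and the missing-value code to the matrix itself.
attr(X_attr, "lX") <- rep(1, ncol(X_attr))   # all columns numeric, no target
attr(X_attr, "xmiss") <- 1e10                # assumed code, larger than any value

# The attributes can be read back from the object ...
attr(X_attr, "lX")
attr(X_attr, "xmiss")

# ... but base R drops custom attributes when the matrix is subset,
# so they have to be set again on the subset.
X_sub <- X_attr[1:10, ]
is.null(attr(X_sub, "lX"))
```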

wtcomb

by default set to 'each', meaning that the maximum of the weights of object X[i,] and object X[j,] is chosen for each attribute X[,k]. The alternative is 'whole', meaning that the whole weight vector of either object i or object j is chosen, depending on which of the two vectors gives the maximum weighted dissimilarity between objects i and j.
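
The difference between the two options can be illustrated for a single pair of objects. The sketch below uses made-up weights and per-attribute distances and a plain weighted sum; it leaves out everything else the algorithm does (for example the role of lambda), so it shows the weight-combination rule only, not the package's actual computation.

```r
# Hypothetical weights of objects i and j on three attributes,
# and hypothetical per-attribute distances between i and j.
w_i  <- c(0.7, 0.2, 0.1)
w_j  <- c(0.1, 0.3, 0.6)
d_ij <- c(1, 2, 3)

# "each": take the larger weight attribute by attribute.
D_each <- sum(pmax(w_i, w_j) * d_ij)

# "whole": use one object's full weight vector, whichever
# gives the larger weighted dissimilarity.
D_whole <- max(sum(w_i * d_ij), sum(w_j * d_ij))

c(each = D_each, whole = D_whole)
```

Because the element-wise maximum is at least as large as either weight vector, the 'each' value in this simplified setting is never smaller than the 'whole' value.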

relax

the amount by which the homotopy parameter eta is incremented at each outer iteration (for more information, see noit).

conv

the convergence threshold; meeting it can stop the inner iterations before the maximum number (niter) is reached.

niter

the maximum number of inner iterations used to stabilize the weights and dissimilarities for a given value of the homotopy parameter.

noit

the number of outer iterations (make sure relax > 0) used to move from the inverse exponential distance toward the sum of the weighted dissimilarities, which is approached when a large enough number of outer iterations is chosen. Starting from the initial value of the homotopy parameter (equal to lambda) and using the increments determined by relax, one can calculate the value at which the homotopy parameter will end.
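
The calculation mentioned above can be sketched as follows, under a literal reading of the argument descriptions: eta starts at lambda and grows by exactly relax once per outer iteration. This is an assumption, not a statement about the implementation; if the package scales the increment differently or does not increment on the first iteration, the end value changes accordingly. The helper eta_end is hypothetical.

```r
# Assumption: eta starts at lambda and is increased by `relax`
# once per outer iteration, for `noit` iterations in total.
eta_end <- function(lambda = 0.2, relax = 0.1, noit = 100) {
  lambda + noit * relax
}

# End value under the documented defaults.
eta_default <- eta_end()

# With relax = 0 the parameter never moves away from lambda,
# which is why the documentation asks for relax > 0.
eta_fixed <- eta_end(relax = 0)
```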

stand

equals 0 for no standardisation, 1 for robust standard scaling of the data, and 2 for standard scaling of the data. The default equals 1.

pwr

1 for L_1 norm attribute distances, and 2 for L_2 norm attribute distances. The default equals 1. This argument is ignored for categorical/ordinal attributes.

Value

This function outputs a list that has as the first element the call, as the second element the dissimilarity matrix out$D of class dist, and by default also the weights of class matrix, and, if crit is set to TRUE, the values of the tuning parameters are also given in the output.

Note

The output dissimilarity matrix can be used as input to most dissimilarity based clustering procedures in R in the same manner as the output of the procedure dist.
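
As a sketch of that workflow, the snippet below feeds an object of class dist to hierarchical clustering from base R. To keep the example runnable without the package, a Euclidean distance from dist() stands in for the COSA dissimilarities; with a cosa2 result, the D element of the returned list would take its place. The choice of average linkage and three clusters is arbitrary.

```r
# Stand-in for the COSA dissimilarity matrix: any object of class "dist".
D_demo <- dist(scale(iris[, 1:4]))

# Hierarchical clustering on the dissimilarities.
hc_demo <- hclust(D_demo, method = "average")

# Cut the tree into three groups; one cluster label per row of the data.
groups_demo <- cutree(hc_demo, k = 3)
table(groups_demo)
```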

Author(s)

Maarten M.D. Kampert, Jacqueline J. Meulman, and Jerome H. Friedman.
Correspondence: mkampert@math.leidenuniv.nl

References

Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes.
URL: http://www-stat.stanford.edu/~jhf/ftp/cosa.pdf

Kampert, M. M., Meulman, J. J., and Friedman, J. H. (2017). rCOSA: A Software Package for Clustering Objects on Subsets of Attributes.
URL: https://link.springer.com/article/10.1007/s00357-017-9240-z

See Also

hierclust, getclust, and smacof.

Examples

data(ApoE3)
rslt_dflt_cosa2 <- cosa2(X = ApoE3)
# The weight of object 1 on attribute 1 in the NxP weight matrix W
rslt_dflt_cosa2$W[1, 1]
# COSA procedure for dual targeted dissimilarities:
rslt_duotrg_cosa <- cosa2(X = iris[, 1:4], lX = rep(3, ncol(iris[, 1:4])), targ = 'high/low')

mkampert/rCOSA documentation built on Dec. 23, 2019, 8:21 p.m.