copstressMin: Fitting a COPS-C Model (COPS Variant 1).

copstressMinR Documentation

Fitting a COPS-C Model (COPS Variant 1).

Description

Minimizing Copstress to obtain a clustered ratio, interval or ordinal PS configuration with given explicit power transformations theta. The function allows mix-and-match of explicit (via theta) and implicit (via type) transformations by setting the kappa, lambda, nu (or theta) and type arguments.

Usage

copstressMin(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

copsc(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

copStressMin(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

Arguments

delta

numeric matrix or dist object of a matrix of proximities

kappa

power transformation for fitted distances

lambda

power transformation for proximities (only used if type="ratio" or "interval")

nu

power transformation for weights

theta

the theta vector of powers; the first is kappa (for the fitted distances if it exists), the second lambda (for the observed proximities if it exist and type="ratio" or "interval"), the third is nu (for the weights if it exists). If less than three elements are is given as argument, it will be recycled. Defaults to 1 1 1. Will override any kappa, lambda, nu parameters if they are given and do not match.

type

what type of MDS to fit. Currently one of "ratio", "interval" or "ordinal". Default is "ratio".

ties

the handling of ties for ordinal (nonmetric) MDS. Possible are "primary" (default), "secondary" or "tertiary".

weightmat

(optional) a matrix of nonnegative weights; defaults to 1 for all off diagonals

ndim

number of dimensions of the target space

init

(optional) initial configuration

stressweight

weight to be used for the fit measure; defaults to 0.975

cordweight

weight to be used for the cordillera; defaults to 0.025

q

the norm of the cordillera; defaults to 1

minpts

the minimum points to make up a cluster in OPTICS, see [dbscan::optics()] where it is called minPts; defaults to ndim+1.

epsilon

the epsilon parameter of OPTICS, the neighbourhood that is checked, see [dbscan::optics()]; defaults to 10. Note this means we do not expect any noise objects per default. This number will rarely be exceeded if we standardize the configuration as is the default in cops. However if no standardization is applied or there is a procrustes adjustment to a configuration with variance of 10 or more on any of the axes, it can have the effect of being too small. In that case just set a much higher epsilon.

dmax

The winsorization limit of reachability distances in the OPTICS Cordillera. If supplied, it should be either a numeric value that matches max(rang) or NULL; if NULL it is found as 1.5 times (for kappa >1) or 1 times (for kappa <=1) the maximum reachbility value of the power torgerson model with the same lambda. If dmax and rang are supplied and dmax is not max(rang), a warning is given and rang takes precedence.

rang

range of the reachabilities to be considered. If missing it is found from the initial configuration by taking 0 as the lower boundary and dmax (see above) as upper boundary. See also cordillera

optimmethod

What optimizer to use? Choose one string of 'Newuoa' (from package minqa), 'NelderMead', 'hjk' (Hooke-Jeeves algorithm from dfoptim), 'solnl' (from nlcOptim), 'solnp' (from Rsolnp), 'subplex' (from subplex), 'SANN' (simulated annealing), 'BFGS', 'snomadr' (from crs), 'genoud' (from rgenoud), 'gensa' (from GenSA), 'cmaes' (from cmaes) and 'direct' (from nloptr). See the according R packages for details on these solvers. There are also combinations that proved to work well good, like 'hjk-Newuoa', 'hjk-BFGS', 'BFGS-hjk', 'Newuoa-hjk', 'direct-Newuoa' and 'direct-BFGS'. Usually everything with hjk, BFGS, Newuoa, subplex and solnl in it work rather well in an acceptable time frame (depending on the smoothness of copstress). Default is 'hjk-Newuoa'.

verbose

numeric value hat prints information on the fitting process; >2 is very verbose

scale

Scale the configuration (in MDS stress is invariant up to a scaling factor). One of "none" (so no extra scaling of the configuration but normalized to sum delta^2=1), "sd" (configuration divided by the highest standard deviation of any the columns), "proc" (procrustes adjustment to the initial fit) and "rmsq" (configuration divided by the maximum root mean square of the columns). Default is "sd" which often gives a nicer spread on the axes. Note that the scaled configuration is returned as $conf and the unscaled as $usconf, so manual calculation of the OC should be done with $conf.

normed

should the Cordillera be normed; defaults to TRUE.

accuracy

numerical accuracy, defaults to 1e-7.

itmax

maximum number of iterations. Defaults to 10000. For the two-step algorithms if itmax is exceeded by the first solver, the second algorithm is run for at least 0.1*itmax (so overall itmax may be exceeded by a factor of 1.1).

stresstype

which stress to use in the copstress. Defaults to stress-1. If anything else is set, explicitly normed stress which is (stress-1)^2 is used. Using stress-1 puts more weight on MDS fit.

principal

If ‘TRUE’, principal axis transformation is applied to the final configuration.

...

additional arguments to be passed to the optimization procedure

Details

This is an extremely flexible approach to least squares proximity scaling: It supports ratio power stress; ratio, interval and ordinal r stress and ratio, interval and ordinal MDS with or without a COPS penalty. Famous special cases of these models that can be fitted are multiscale MDS if kappa->0 and delta=log(delta), Alscal MDS (sstress) with lambda=kappa=2, sammon type mapping with weightmat=delta and nu=-1, elastic scaling with weightmat=delta and nu=-2. Due to mix-and-match this function also allows to fit models that have not yet been published, such as for example an "elastic scaling ordinal s-stress with cops penalty".

If one wants to fit these models without the cops penalty, we recommend to use powerStressMin (for ratio MDS with any power transformation for weights, dissimilarities and distances) or rStressMin (for interval and ordinal MDS with power transformations for distances and weights) as these use majorization.

Some optimizers (including the default hjk-Newuoa) will print a warning if itmax is (too) small or if there was no convergence. Consider increasing itmax then.

For some solvers theresometimes may be an error [NA/NaN/Inf in foreign function call (arg 3)] stemming from smacof::transform(). This happens when the algorithm places two object at exactly the same place so their fitted distance is 0. This is good from an OPTICS Cordillera point of view (as it is more clustered) which is why some solvers like to pick that up, but it can lead to an issue in the optimal scaling in smacof. This can usually be mitigated when specifying the model by either using less cordweight, less itmax, less accuracy or combining the two offending objects into one (so include them as a combined row in the distance matrix).

We might eventually switch to newuoa in nloptr.

Value

A copsc object (inheriting from smacofP). A list with the components

  • delta: the original untransformed dissimilarities

  • tdelta: the explicitly transformed dissimilarities

  • dhat: the explicitly transformed dissimilarities (dhats), optimally scaled and normalized (which are approximated by the fit)

  • confdist: Configuration distances, the fitted distances

  • conf: the configuration (normed) and scaled as specified in scale.

  • usconf: the unscaled configuration (normed to sum delta^2=1). Scaling applied to usconf gives conf.

  • parameters, par, pars : the theta vector of powers tranformations (kappa, lambda, nu)

  • niter: number of iterations of the optimizer.

  • stress: the square root of explicitly normalized stress (calculated for confo).

  • spp: stress per point

  • ndim: number of dimensions

  • model: Fitted model name

  • call: the call

  • nobj: the number of objects

  • type, loss, losstype: stresstype

  • stress.m: The stress used for copstress. If stresstype="stress-1" this is like $stress else it is stress^2

  • copstress: the copstress loss value

  • resmat: the matrix of residuals

  • weightmat: the matrix of untransformed weights

  • tweightmat: the transformed weighting matrix (here weightmat^nu)

  • OC: the (normed) OPTICS Cordillera object (calculated for scaled conf)

  • OCv: the (normed) OPTICS Cordillera value alone (calculated for scaled conf)

  • optim: the object returned from the optimization procedure

  • stressweight, cordweight: the weights of the stress and OC respectively (v_1 and v_2)

  • optimmethod: The solver used

  • type: the type of MDS fitted

Examples

dis<-as.matrix(smacof::kinshipdelta)

set.seed(1)
## Copstress with equal weight to stress and cordillera 
res1<-copstressMin(dis,stressweight=0.5,cordweight=0.5,
                  itmax=100) #use higher itmax about 10000 
res1
summary(res1)
plot(res1)  #super clustered 

##Alias name 
res1<-copsc(dis,stressweight=0.5,
                  cordweight=0.5,itmax=100) 


## Elastic scaling ordinal s-stress with cops penalty
res1<-copsc(dis,type="ordinal",kappa=2,nu=-2,weightmat=dis,
            stressweight=0.5, cordweight=0.5,itmax=100)



cops documentation built on Feb. 2, 2024, 3:02 p.m.