ssMRCD: Spatially Smoothed MRCD Estimator

ssMRCD R Documentation

Spatially Smoothed MRCD Estimator

Description

The ssMRCD function calculates the spatially smoothed MRCD estimator from Puchhammer and Filzmoser (2023).

Usage

ssMRCD(
  X,
  groups = NULL,
  weights,
  lambda = 0.5,
  tuning = list(method = NULL, plot = FALSE, k = 10, repetitions = 5, cont = 0.05),
  TM = NULL,
  alpha = 0.75,
  maxcond = 50,
  maxcsteps = 200,
  n_initialhsets = NULL
)

Arguments

X

a list of matrices, sorted by neighborhood, each containing the observations of one neighborhood; or a matrix or data frame containing all observations. If X is a matrix or data frame, groups has to be given.

groups

vector of neighborhood assignments

weights

weighting matrix; it must be symmetric, its rows must sum to one, and its diagonal entries must be zero (see also geo_weights or time_weights).

lambda

numeric between 0 and 1; the smoothing factor. If more than one value is given, lambda is tuned (see tuning).

tuning

list of tuning specifications, used if lambda contains more than one value. By default, the element method is NULL (see Usage). See Details.

TM

target matrix (optional); by default, the covMcd estimate from robustbase is used.

alpha

numeric between 0.5 and 1; proportion of observations included.

maxcond

optional, maximal condition number used for rho-estimation.

maxcsteps

maximal number of c-steps before algorithm stops.

n_initialhsets

number of initial h-sets, default is 6 times number of neighborhoods.
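
The constraints on weights stated above can be checked before fitting. The following is a minimal base-R sketch and not part of the package; the helper name weights_properties and the tolerance tol are hypothetical and chosen for illustration only.

```r
# Hypothetical helper (not part of ssMRCD): report which of the documented
# properties a candidate weighting matrix satisfies.
weights_properties <- function(W, tol = 1e-8) {
  W <- as.matrix(W)
  c(
    square          = nrow(W) == ncol(W),
    zero_diagonal   = all(abs(diag(W)) < tol),
    rows_sum_to_one = all(abs(rowSums(W) - 1) < tol),
    symmetric       = isSymmetric(unname(W), tol = tol)
  )
}

# Two neighborhoods: each one is the only neighbor of the other.
W <- matrix(c(0, 1, 1, 0), ncol = 2)
weights_properties(W)
```

The helper returns a named logical vector instead of stopping with an error, so each property can be inspected separately.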

Details

The necessary list elements for the parameter tuning depend on the method specified. The element method needs to be set to "residuals" for the residual-based tuning approach or to "local contamination" for the contamination-based approach. The boolean list element plot is available for both methods and specifies whether a plot should be constructed after tuning.

For tuning$method = "local contamination", additional information needs to be passed. The number of nearest neighbors used for the local outlier detection method, tuning$k, is 10 by default. The proportion of exchanged/contaminated observations is specified by tuning$cont and is 0.05 by default. The coordinates must also be given in tuning$coords, and the number of repetitions of the switching procedure in tuning$repetitions (5 by default).

For tuning$method = "local contamination", no optimal value is returned; the choice has to be made by the user. Be aware that the false negative rate (FNR) does not take into account that the data set also contains natural outliers, which might or might not be found. The best parameter selection depends on the goal of the analysis: whether false negatives should be avoided or whether the number of flagged outliers should be low.
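
As a sketch of the tuning specifications described above, the two lists below use only the element names given in this section. The layout of coords (one row of coordinates per observation) and the commented call at the end are assumptions that this page does not confirm.

```r
# Residual-based tuning: only method (and optionally plot) is described above.
tuning_residuals <- list(method = "residuals", plot = FALSE)

# Contamination-based tuning: k, repetitions and cont mirror the defaults
# shown in Usage; coords is ASSUMED to hold one row per observation.
coords <- cbind(x = runif(200), y = runif(200))  # illustrative coordinates
tuning_contamination <- list(
  method      = "local contamination",
  plot        = FALSE,
  k           = 10,    # nearest neighbors for local outlier detection
  repetitions = 5,     # repetitions of the switching procedure
  cont        = 0.05,  # proportion of exchanged/contaminated observations
  coords      = coords
)

# Tuning applies when lambda holds more than one value, e.g. (not run;
# X, groups and W are placeholders for the user's own data):
# ssMRCD(X = X, groups = groups, weights = W,
#        lambda = c(0.25, 0.5, 0.75), tuning = tuning_contamination)
```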

Value

The output depends on whether parameters are tuned. If there is no tuning, the output is an object of class "ssMRCD" containing the following elements:

MRCDcov List of ssMRCD-covariance matrices sorted by neighborhood.
MRCDicov List of inverse ssMRCD-covariance matrices sorted by neighborhood.
MRCDmu List of ssMRCD-mean vectors sorted by neighborhood.
mX List of data matrices sorted by neighborhood.
N Number of neighborhoods.
mT Target matrix.
rho Vector of regularization values sorted by neighborhood.
alpha Scalar, proportion of observations that should be used.
h Vector with the number of observations used per neighborhood, sorted by neighborhood.
numiter The number of iterations for the best initial h-set combination.
c_alpha Consistency factor for normality.
weights The weighting matrix.
lambda Smoothing factor.
obj_fun_values A matrix with objective function values for all initial h-set combinations (rows) and iterations (columns).
best6pack Initial h-set combinations with the best objective function value after c-step iterations.
Kcov MRCD estimates without smoothing.

If parameters are tuned, the output consists of:

ssMRCD Object of class ssMRCD with optimally selected parameter lambda.
tuning_grid Vector of lambda to tune over given by the input.
tuning_values If tuning$method = "residuals", a vector with the values of the residual criterion for the corresponding values of lambda in tuning_grid. If tuning$method = "local contamination", a matrix with false negative rates and the total number of flagged outliers.
plot If tuning$plot = TRUE, then a plot for parameter tuning is added.
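
The elements of a non-tuned "ssMRCD" object can be combined in follow-up computations. The sketch below computes squared Mahalanobis distances for one neighborhood from mX, MRCDmu and MRCDcov. The helper neighborhood_distances is hypothetical, and the stand-in list fit only mimics the documented element names with classical estimates so that the code runs without the package.

```r
# Hypothetical helper (not part of ssMRCD): squared Mahalanobis distances of
# the observations in neighborhood i, using the documented list elements.
neighborhood_distances <- function(fit, i) {
  stats::mahalanobis(
    x      = fit$mX[[i]],
    center = as.numeric(fit$MRCDmu[[i]]),
    cov    = fit$MRCDcov[[i]]
  )
}

# Stand-in object with the same element names; the classical mean and
# covariance are placeholders for the ssMRCD estimates.
set.seed(1)
x1 <- matrix(rnorm(200), ncol = 2)
fit <- list(
  mX      = list(x1),
  MRCDmu  = list(colMeans(x1)),
  MRCDcov = list(cov(x1))
)

d2 <- neighborhood_distances(fit, 1)
head(d2)
```

With a fitted object, fit would be replaced by the output of ssMRCD; large distances then point to observations that deviate from the robust local estimate.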

References

Puchhammer P. and Filzmoser P. (2023). Spatially Smoothed Robust Covariance Estimation for Local Outlier Detection. Journal of Computational and Graphical Statistics, 33(3), 928–940. doi:10.1080/10618600.2023.2277875

See Also

plot.ssMRCD

Examples

# load the package
library(ssMRCD)

# create data set
x1 = matrix(runif(200), ncol = 2)
x2 = matrix(rnorm(200), ncol = 2)
x = list(x1, x2)

# create weighting matrix
W = matrix(c(0, 1, 1, 0), ncol = 2)

# calculate ssMRCD
out = ssMRCD(X = x, weights = W, lambda = 0.5)
str(out)

ssMRCD documentation built on Nov. 5, 2025, 7:44 p.m.