MOE: Detecting cellwise outliers using Shapley values based on...

View source: R/MOE.R

MOE R Documentation

Detecting cellwise outliers using Shapley values based on local outlyingness.

Description

The MOE function flags outlying cells in a numeric data vector x with p entries, or in a numeric data matrix x of dimension n × p, for a given center mu and covariance matrix Sigma, using Shapley values. It is a more sophisticated alternative to the SCD algorithm: MOE uses the information in the regular cells to derive an alternative reference point \insertCite{Mayrhofer2022ShapleyOutlier}.
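As a rough illustration of the quantity involved, the sketch below splits the squared Mahalanobis distance of one observation into per-cell contributions phi_j = (x_j - mu_j) * [Sigma^{-1} (x - mu)]_j, which add up to the squared distance. It uses base R only. The helper name shapley_sketch is hypothetical, and the block is a sketch of the decomposition under that assumption, not the package's implementation.

```r
# Hypothetical helper (not part of ShapleyOutlier): per-cell contributions
# to the squared Mahalanobis distance of a single observation.
shapley_sketch <- function(x, mu, Sigma) {
  d <- x - mu
  # solve(Sigma, d) computes Sigma^{-1} d without forming the inverse
  as.vector(d * solve(Sigma, d))
}

p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
x <- c(0, 1, 2, 2.3, 2.5)

phi <- shapley_sketch(x, mu, Sigma)

# The contributions add up to the squared Mahalanobis distance.
all.equal(sum(phi), mahalanobis(x, mu, Sigma))
```

Cells with large contributions are the natural candidates for closer inspection; the cutoff and imputation logic of MOE are not reproduced here.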

Usage

MOE(
  x,
  mu,
  Sigma,
  Sigma_inv = NULL,
  step_size = 0.1,
  min_deviation = 0,
  max_step = NULL,
  local = TRUE,
  max_iter = 1000,
  q = 0.99,
  check_outlyingness = FALSE,
  check = TRUE,
  cells = NULL,
  method = "cellMCD"
)

Arguments

x

Numeric data vector with p entries, or numeric data matrix of dimension n × p.

mu

Either NULL or the mean vector of x. If NULL, method is used for parameter estimation.

Sigma

Either NULL or the p × p covariance matrix of x. If NULL, method is used for parameter estimation.

Sigma_inv

Either NULL (default) or the p × p inverse of Sigma. If NULL, the inverse of Sigma is computed using solve(Sigma).
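If the same Sigma is reused across several calls, the inverse can be computed once and passed as Sigma_inv. The base-R sketch below computes and sanity-checks such an inverse. The chol2inv route is an optional alternative for symmetric positive-definite matrices, shown here as an assumption about what may be convenient rather than something this documentation prescribes.

```r
p <- 5
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1

# Same computation the documentation describes for Sigma_inv = NULL
Sigma_inv <- solve(Sigma)

# Alternative for symmetric positive-definite matrices
Sigma_inv_chol <- chol2inv(chol(Sigma))

# Both should invert Sigma up to numerical tolerance.
all.equal(Sigma %*% Sigma_inv, diag(p))
all.equal(Sigma_inv, Sigma_inv_chol)
```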

step_size

Numeric. Step size for the imputation of outlying cells, with step_size in [0, 1]. Defaults to 0.1.

min_deviation

Numeric. Detection threshold, with min_deviation in [0, 1]. Defaults to 0.

max_step

Either NULL (default) or an integer. The maximum number of steps in each iteration. If NULL, max_step = p.

local

Logical. If TRUE (default), the non-central Chi-Squared distribution is used to determine the cutoff value based on mu_tilde.

max_iter

Integer. The maximum number of iterations. Defaults to 1000.

q

Numeric. The quantile of the Chi-squared distribution for detection and imputation of outliers. Defaults to 0.99.
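To give a feel for the cutoff that q and local control, the sketch below compares a central chi-squared quantile with a non-central one. The degrees of freedom (taken as p) and the value ncp_example are illustrative assumptions. How MOE derives the non-centrality parameter from mu_tilde is not spelled out on this page.

```r
p <- 5
q <- 0.99

# Central chi-squared cutoff
cutoff_central <- qchisq(q, df = p)

# Non-central cutoff; ncp_example is an assumed value for illustration only
ncp_example <- 2
cutoff_noncentral <- qchisq(q, df = p, ncp = ncp_example)

c(central = cutoff_central, noncentral = cutoff_noncentral)
```

A positive non-centrality parameter shifts the distribution to the right, so the non-central cutoff is the larger of the two.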

check_outlyingness

Logical. If TRUE, the outlyingness is rechecked after applying min_deviation. Defaults to FALSE.

check

Logical. If TRUE (default), inputs are checked before running the function and an error message is returned if one of the inputs is not as expected.

cells

Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. The matrix must contain only zeros and ones, or TRUE/FALSE.
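A minimal sketch of building such an indicator matrix in base R, assuming a small toy matrix X. The marked positions are arbitrary and chosen only to show the required shape and encoding.

```r
set.seed(1)
n <- 4
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Logical indicator matrix with the same dimension as X
cells <- matrix(FALSE, nrow = n, ncol = p)

# Mark cells (1, 2) and (3, 3) as outlying (arbitrary choice)
cells[cbind(c(1, 3), c(2, 3))] <- TRUE

# Equivalent 0/1 encoding
cells_01 <- cells * 1

stopifnot(identical(dim(cells), dim(X)))
```

Either cells or cells_01 has the form described above and could be supplied as the cells argument.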

method

Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.

Value

A list of class shapley_algorithm (new_shapley_algorithm) containing the following:

x

A p-dimensional vector (or an n × p matrix) containing the imputed data.

phi

A p-dimensional vector (or an n × p matrix) containing the Shapley values (outlyingness scores) of x; see shapley.

mu_tilde

A p-dimensional vector (or an n × p matrix) containing the alternative reference points based on the regular cells of the original observations.

x_original

A p-dimensional vector (or an n × p matrix) containing the original data.

non_centrality

The non-centrality parameters for the Chi-Squared distribution.

x_history

A list with n elements, each containing the path of how the original data vector was modified.

phi_history

A list with n elements, each containing the Shapley values corresponding to x_history.

mu_tilde_history

A list with n elements, each containing the alternative reference points corresponding to x_history.

S_history

A list with n elements, each containing the indices of the outlying cells in each iteration.
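One way to inspect a result is to compare x with x_original. The sketch below assumes that imputation only changes cells flagged as outlying. The helper changed_cells is hypothetical, and the mock list stands in for a real MOE result, so that the block runs without the package.

```r
# Hypothetical helper: positions where the imputed data differ from the original.
# Returns indices for a vector, or (row, col) pairs for a matrix.
changed_cells <- function(res) {
  which(res$x != res$x_original, arr.ind = is.matrix(res$x))
}

# Mock object with the two documented fields, for illustration only
res_mock <- list(x = c(0, 1, 2), x_original = c(0, 1, 5))
changed_cells(res_mock)
```

With a real result, the same call would be changed_cells(MOE_x) for an object returned by MOE.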

References

\insertAllCited

Examples

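library(ShapleyOutlier)

# Single observation: flag outlying cells of one vector
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)  # precomputed inverse, passed to MOE below
x <- c(0, 1, 2, 2.3, 2.5)
MOE_x <- MOE(x = x, mu = mu, Sigma = Sigma, Sigma_inv = Sigma_inv)
plot(MOE_x)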

# Data matrix: contaminate 100 randomly chosen cells, then run MOE
library(MASS)  # for mvrnorm()
set.seed(1)
n <- 100
p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p)
diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
X[sample(1:(n * p), 100, replace = FALSE)] <- rep(c(-5, 5), 50)
MOE_X <- MOE(X, mu, Sigma)
plot(MOE_X, subset = 20)

ShapleyOutlier documentation built on Oct. 17, 2024, 5:08 p.m.