gen_glob_outl: Contamination with Global Outliers

View source: R/outl_gen.R

gen_glob_outl R Documentation

Description

Generates synthetic global outliers and contaminates a given p-variate random field with them.

Usage

gen_glob_outl(x, alpha = 0.05, h = 10, random_sign = FALSE)

Arguments

x

a numeric matrix of dimension c(n, p) where the p columns correspond to the entries of the random field and the n rows are the observations.

alpha

a numerical value between 0 and 1 giving the proportion of observations to contaminate.

h

a numerical constant that determines how large the contaminated outliers are. See Details.

random_sign

logical. If TRUE, the sign of each component of the outlier is randomly selected. Default is FALSE. See Details.

Details

gen_glob_outl generates outliers for a given field by randomly selecting round(alpha * n) observations x_i and contaminating them by setting x^{out}_i = (c^i)'x_i, where the elements c^i_j of the vector c^i are determined by the parameter random_sign. If random_sign = TRUE, c^i_j is either h or -h with P(c^i_j = h) = P(c^i_j = -h) = 0.5. If random_sign = FALSE, c^i_j = h for all j = 1, ..., p and i = 1, ..., n. The parameter alpha sets the contamination rate and the parameter h sets the size of the outliers.
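
The scheme described above can be sketched in base R. This is an illustration only, not the package's implementation: the function name sketch_glob_outl and the column name is_outlier are hypothetical, and the sketch assumes that the multipliers c^i_j are applied component-wise to x_i, which is one reading of the formula suggested by the description of random_sign.

```r
# Hypothetical helper, not part of SpatialBSS: a base-R sketch of the
# contamination scheme described in Details.
sketch_glob_outl <- function(x, alpha = 0.05, h = 10, random_sign = FALSE) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  n_outl <- round(alpha * n)
  # Randomly pick the rows that become outliers.
  idx <- sample.int(n, n_outl)
  # Multipliers c^i_j: all equal to h, or h with a random sign per component.
  mult <- matrix(h, nrow = n_outl, ncol = p)
  if (random_sign) {
    signs <- sample(c(-1, 1), n_outl * p, replace = TRUE)
    mult <- mult * matrix(signs, nrow = n_outl, ncol = p)
  }
  # Assumption: the multipliers act component-wise on each selected row.
  x[idx, ] <- x[idx, , drop = FALSE] * mult
  # Logical indicator in column p + 1 (the name is made up for this sketch).
  data.frame(x, is_outlier = seq_len(n) %in% idx)
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
x_cont <- sketch_glob_outl(x, alpha = 0.1, h = 15, random_sign = TRUE)
sum(x_cont$is_outlier)  # round(0.1 * 200) = 20 rows are flagged
```

With random_sign = TRUE, the contaminated components keep their magnitude h * |x_ij| but may flip sign, so the outliers are spread in all directions rather than only pushed further along the direction of the original observation.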

Value

gen_glob_outl returns a data.frame containing the contaminated field in its first p columns. Column p + 1 contains a logical indicator of whether the observation is an outlier.
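
A minimal sketch of how a result with this layout might be split into clean and contaminated observations. The data.frame below is a hand-built stand-in for the function's output, and its column names are made up. Columns are accessed by position because the documentation fixes the layout but does not state the names.

```r
# Stand-in for the output of gen_glob_outl(): p = 3 field columns followed
# by a logical outlier indicator. Values and column names are invented.
p <- 3
field_cont <- data.frame(
  f1 = c(0.1, 1.5, -0.2, 0.3),
  f2 = c(0.0, -2.1, 0.1, 0.2),
  f3 = c(0.2, 1.8, -0.1, 0.0),
  flag = c(FALSE, TRUE, FALSE, FALSE)
)

# First p columns hold the field values, column p + 1 the indicator.
values <- as.matrix(field_cont[, seq_len(p)])
is_outl <- field_cont[[p + 1]]

# Split the observations by the indicator.
clean <- values[!is_outl, , drop = FALSE]
outliers <- values[is_outl, , drop = FALSE]
```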

See Also

gen_loc_outl

Examples

# simulate coordinates
coords <- runif(1000 * 2) * 20
dim(coords) <- c(1000, 2)
coords_df <- as.data.frame(coords)
names(coords_df) <- c("x", "y")
# simulate random field
if (!requireNamespace('gstat', quietly = TRUE)) {
  message('Please install the package gstat to run the example code.')
} else {
  library(gstat)
  model_1 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Exp'), nmax = 20)
  model_2 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, kappa = 2, model = 'Mat'), 
                   nmax = 20)
  model_3 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Gau'), nmax = 20)
  field_1 <- predict(model_1, newdata = coords_df, nsim = 1)$sim1
  field_2 <- predict(model_2, newdata = coords_df, nsim = 1)$sim1
  field_3 <- predict(model_3, newdata = coords_df, nsim = 1)$sim1
  field <- cbind(field_1, field_2, field_3)
  # Contaminate 10 % of the observations with global outliers of size h = 15.
  field_cont <- gen_glob_outl(field, alpha = 0.1, h = 15)

  # Contaminate 5 % of the observations with global outliers of size h = 10 and random signs.
  field_cont2 <- gen_glob_outl(field, alpha = 0.05, h = 10, random_sign = TRUE)
}

SpatialBSS documentation built on July 26, 2023, 5:37 p.m.