classCleaner: A package for cleaning outliers when data is...
In melissakey/classCleaner: Tools to Identify Outliers in grouped (class) Data


Source file: R/clean_classes.R

Test whether each instance assigned to a class actually belongs to that class.

classCleaner(
  D,
  assignment,
  classes = "all",
  alpha0 = 0.05,
  q = 0.5,
  labels = NULL,
  exclude_classes = NULL
)

`D`	A distance matrix of pairwise dissimilarity scores between instances.
`assignment`	The class assigned to each instance, with one entry per row of D.
`classes`	The subset of classes on which filtering is performed, or "all" if all classes should be analyzed.
`alpha0`	Desired global type I (v1) or type II (v2) error rate.
`q`	(v2 only) The proportion of distances expected to be "close enough" for an instance to be kept. Defaults to 0.5.
`labels`	A vector of labels, one per instance. Its length must equal nrow(D) (not length(D), which counts every matrix entry). If NULL, the function looks for row names and then column names of D; if neither is found, instances are labeled with the numbers 1:nrow(D).
`exclude_classes`	Names of "mega" classes that should be ignored when determining whether classCleaner2 is appropriate. By default, these classes are also excluded from the analysis.
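The fallback order described for `labels` can be sketched as a small helper. `resolve_labels()` is a hypothetical name used only for illustration; it is not a function exported by classCleaner, and the order it follows (explicit labels, then row names, then column names, then 1:nrow(D)) is an assumption read from the argument description.

```r
# Hypothetical helper, NOT part of classCleaner: resolves instance labels
# using the fallback order described for the `labels` argument.
resolve_labels <- function(D, labels = NULL) {
  D <- as.matrix(D)
  stopifnot(nrow(D) == ncol(D))            # D must be a square distance matrix
  if (!is.null(labels)) {
    # one label per instance, i.e. per row of D
    stopifnot(length(labels) == nrow(D))
    return(labels)
  }
  if (!is.null(rownames(D))) return(rownames(D))
  if (!is.null(colnames(D))) return(colnames(D))
  seq_len(nrow(D))                         # fall back to 1:nrow(D)
}

# A 4 x 4 distance matrix without dimnames
x <- c(0, 1, 5, 6)
D <- abs(outer(x, x, "-"))
resolve_labels(D)                          # 1 2 3 4
resolve_labels(D, c("a", "b", "c", "d"))   # "a" "b" "c" "d"
```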

For each instance in an analyzed class, this function will estimate the probability that it was correctly placed in that class.
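The idea of scoring each instance against its assigned class can be illustrated with a simplified test. The sketch below is hypothetical and is not the algorithm implemented in classCleaner: `class_membership_sketch()` counts how many of an instance's distances to its own class fall at or below the q-quantile of its distances to other classes. Assuming the instance does not belong, that count is treated as approximately Binomial(n, q), a small one-sided p-value is taken as evidence of membership, and a Bonferroni correction is used so that alpha0 roughly bounds the overall chance of keeping any non-member.

```r
# Hypothetical sketch, NOT classCleaner's actual algorithm.
class_membership_sketch <- function(D, assignment, q = 0.5, alpha0 = 0.05) {
  D <- as.matrix(D)
  N <- nrow(D)
  idx <- seq_len(N)
  p_value <- rep(NA_real_, N)
  for (i in idx) {
    same <- assignment == assignment[i]
    same[i] <- FALSE                         # exclude the instance itself
    # skip instances with no classmates or no instances outside their class
    if (!any(same) || all(same | idx == i)) next
    within  <- D[i, same]                    # distances to own class
    between <- D[i, !same & idx != i]        # distances to other classes
    threshold <- quantile(between, probs = q, names = FALSE)
    x <- sum(within <= threshold)
    n <- length(within)
    # one-sided p-value P(X >= x) for X ~ Binomial(n, q)
    p_value[i] <- pbinom(x - 1, n, q, lower.tail = FALSE)
  }
  m <- sum(!is.na(p_value))                  # number of instances tested
  keep <- !is.na(p_value) & p_value < alpha0 / m   # Bonferroni correction
  data.frame(instance = idx, class = assignment,
             p_value = p_value, keep = keep)
}

# Two well-separated 1-D clusters with one deliberately mislabeled point
set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 10))
assignment <- rep(1:2, each = 20)
assignment[1] <- 2                           # instance 1 really belongs to class 1
D <- abs(outer(x, x, "-"))
res <- class_membership_sketch(D, assignment)
res[1, ]
```

Because instance 1 lies far from every other member of class 2, none of its within-class distances fall below the threshold, so it should receive a p-value of 1 and not be kept, while the correctly labeled instances should be kept.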

library(classCleaner)

set.seed(23)

X <- simulate_clustered_data(
  n = 200,
  Nk = rep(50, 100),
  s = rep(1, 100),
  rho = .2,
  tau = 1,
  method = "by-class"
)
# true assignment
a <- rep(1:100, each = 50)

# random labels used to corrupt the assignment
b <- sample(100, 50 * 100, replace = TRUE)

# corrupt 10% of samples
a.corrupt <- ifelse(runif(50 * 100) < 0.1, b, a)

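# correlation-based dissimilarity between the columns (instances) of X
D <- 1 - cor(X)

# test each instance against its (possibly corrupted) assigned class
result <- classCleaner(D, a.corrupt, labels = colnames(D))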
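The comment "corrupt 10% of samples" slightly overstates the realized mislabeling rate: because `b` is drawn from all 100 classes, a replaced label matches the true class about 1 time in 100, so only about 0.1 × 0.99 = 0.099 of instances end up mislabeled. This standalone check repeats only the corruption step, without `simulate_clustered_data()`, so its random draws differ from the example above.

```r
set.seed(23)
n_classes <- 100
n_per_class <- 50
N <- n_classes * n_per_class

a <- rep(seq_len(n_classes), each = n_per_class)  # true assignment
b <- sample(n_classes, N, replace = TRUE)         # random replacement labels
a.corrupt <- ifelse(runif(N) < 0.1, b, a)         # replace roughly 10% of labels

mislabeled <- a.corrupt != a
mean(mislabeled)   # expected value 0.1 * (1 - 1/100) = 0.099
```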