MAXLEN_est: Estimate MAXLEN
In SONO: Scores of Nominal Outlyingness (SONO)

View source: R/MAXLEN_est.R

MAXLEN_est

R Documentation

Estimate MAXLEN

Description

Function estimating the value of MAXLEN (stopping criterion) prior to running the SONO algorithm. The estimation is done using the ideas described in \insertCitecosta_novel_2025;textualSONO, using simultaneous confidence intervals for Multinomial proportions, as done by \insertCitesison_simultaneous_1995;textualSONO.

Usage

MAXLEN_est(data, probs, alpha = 0.01, frequent = FALSE)

Arguments

`data`	Dataset; needs to be of class data.frame and consist of factor variables only.
`probs`	List of probability vectors for each variable. Each element of the list must include as many probabilities as the number of levels associated with it in the dataset.
`alpha`	Significance level for the simultaneous Multinomial confidence intervals constructed, determining what the frequency thresholds should be for itemsets of different length, used for outlier detection for discrete features. Must be a positive real, at most equal to 0.50. A greater value leads to a much more conservative algorithm. Default value is 0.01.
`frequent`	Logical determining whether highly frequent or highly infrequent itemsets are considered as outliers. Defaults to FALSE, treating highly infrequent itemsets as outlying.

Value

Estimated MAXLEN value.

References

\insertRef

costa_novel_2025SONO

\insertRef

sison_simultaneous_1995SONO

Examples

dt <- as.data.frame(sample(c(1:2), 100, replace = TRUE, prob = c(0.5, 0.5)))
dt <- cbind(dt, sample(c(1:3), 100, replace = TRUE, prob = c(0.5, 0.3, 0.2)))
dt[, 1] <- as.factor(dt[, 1])
dt[, 2] <- as.factor(dt[, 2])
colnames(dt) <- c('V1', 'V2')
MAXLEN_est(data = dt, probs = list(c(0.5, 0.5), c(1/3, 1/3, 1/3)), alpha = 0.01, frequent = FALSE)

SONO documentation built on April 12, 2025, 1:13 a.m.