| bayeswise | R Documentation |
The function uses a Bayesian approach to determine if a data entry is an outlier or not.
The function takes a long-format data.frame object as input and returns it with two appended vectors.
The first vector contains the posterior probabilities for a cell to be anomalous, and the second vector provides
a set of logical values indicating whether the data entry is an outlier (TRUE) or not (FALSE).
bayeswise(a, prior = NULL, epochs = 1000L)
a |
A long-format |
prior |
A numerical value or vector of cell-level prior probabilities of observing an outlier. It is |
epochs |
Number of epochs used to train a nontrivial robust linear model via the lion algorithm. By default, the algorithm will run 1000 iterations. |
The argument a must be an object of class data.frame.
It is treated as a long-format data.frame and must have at least five columns, with the names listed below (the sixth column, "prior", is optional):
"strata": a character or factor column containing the information on the stratification.
"unit_id": a character or factor column containing the ID of the statistical unit in the survey sample.
"master_varname": a character column containing the name of the observed variable.
"current_value_num": a numeric column containing the observed value, i.e., a data entry.
"pred_value": a numeric column containing a value observed in a previous survey for the same variable, if available. If not available, the value can be set to NA or NaN. When working with longitudinal data, the value can be set to a time-series forecast or a filtered value.
"prior": a numeric column containing the prior probability of observing an outlier for the cell. If this column is omitted from the dataset, the function uses the values provided through the argument prior.
The input data.frame can contain additional columns; these are ignored in the analysis.
They are nevertheless preserved and returned along with the results of the cellwise outlier-detection analysis.
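The fallback between a "prior" column in the data and the prior argument, as described above, can be illustrated with a short sketch. The helper resolve_prior() is hypothetical and is not part of the package; in particular, stopping when neither source is available and checking the length of the vector are assumptions made for the illustration, and the function itself may behave differently.

```r
# Hypothetical helper (NOT part of HRTnomaly): attach a cell-level prior
# to a long-format data.frame when the "prior" column is absent.
resolve_prior <- function(a, prior = NULL) {
  if ("prior" %in% names(a)) {
    return(a)  # a "prior" column in the data takes precedence
  }
  if (is.null(prior)) {
    # Assumption for this sketch only; the package may use a default instead
    stop("no 'prior' column and no 'prior' argument supplied")
  }
  if (!(length(prior) %in% c(1L, nrow(a)))) {
    stop("'prior' must be a single value or one value per cell")
  }
  if (any(prior < 0 | prior > 1)) {
    stop("'prior' must contain probabilities in [0, 1]")
  }
  a$prior <- rep_len(prior, nrow(a))  # recycle a scalar over all cells
  a
}

# Toy long-format cells with the five required columns
cells <- data.frame(
  strata            = c("A", "A", "B"),
  unit_id           = c("u1", "u2", "u3"),
  master_varname    = c("x", "x", "x"),
  current_value_num = c(1.2, 150, 0.9),
  pred_value        = c(1.0, 1.1, NA),
  stringsAsFactors  = FALSE
)

with_scalar <- resolve_prior(cells, 0.5)                # same prior everywhere
with_vector <- resolve_prior(cells, c(0.1, 0.9, 0.1))   # cell-level priors
```

A scalar is convenient when nothing is known about individual cells, whereas a vector lets suspicious cells start from a higher prior probability.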
The use of the R-packages dplyr, purrr, and tidyr is highly recommended to simplify the conversion of datasets between long and wide formats.
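As a rough illustration of the conversion, the sketch below reshapes a small wide-format table into the long format expected by the argument a. It uses only base R so that it runs without extra packages; tidyr::pivot_longer() would be the more concise route. The data, the variable names acres and yield, and the previous table are invented for the example.

```r
# Hypothetical wide-format survey data: one row per unit, one column per variable
wide <- data.frame(
  strata  = c("north", "north", "south"),
  unit_id = c("u01", "u02", "u03"),
  acres   = c(120, 95, 400),
  yield   = c(3.1, 2.8, 55),
  stringsAsFactors = FALSE
)

# Hypothetical values from a previous survey (NA where not available)
previous <- data.frame(
  unit_id = c("u01", "u02", "u03"),
  acres   = c(118, 97, 390),
  yield   = c(3.0, NA, 2.9),
  stringsAsFactors = FALSE
)

vars <- c("acres", "yield")

# Stack one block of rows per variable, using the required column names
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(
    strata            = wide$strata,
    unit_id           = wide$unit_id,
    master_varname    = v,
    current_value_num = wide[[v]],
    # align previous values to the current units by ID
    pred_value        = previous[[v]][match(wide$unit_id, previous$unit_id)],
    stringsAsFactors  = FALSE
  )
}))
```

The resulting long object has one row per cell (unit by variable) and carries the five required columns, so it has the layout described for the argument a.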
A data frame with the same columns as the input data frame, plus the following additional columns:
The prior probability used for the cell (either input or derived).
The z-score of the cell.
The h-score of the cell.
The r-score of the cell.
The t-score of the cell.
The final outlier score of the cell, representing the posterior probability of being anomalous.
A boolean indicating whether the cell is an outlier.
A character string indicating the type of anomaly detected, if any.
Luca Sartore drwolf85@gmail.com
# Load the package
library(HRTnomaly)
set.seed(2025L)
# Load the 'toy' data
data(toy)
# Detect cellwise outliers using Bayesian Analysis
res <- bayeswise(toy[sample.int(100), ], 0.5, 10L)