Attribute-wise Learning for Scoring Outliers (ALSO) is an unsupervised anomaly detection algorithm for multidimensional data. The main goal is to automate separate predictive models for each feature in the dataset. In each model, one feature in the dataset is the target and the remaining are predictors. A classifier or regressor is called to model the relationships. Observations are scored and compared to true values, and a numerical outlier score is returned for each observation based on the magnitude of deviation from expected values. Given the vector of numeric outlier scores, simple univariate techniques can be used to extract insights. Larger outlier scores suggest greater outlierness.

Install

devtools::install_github("dannymorris/ALSO")

In Outlier Analysis (C.C Aggarwal. Springer, 2017), the author recommends the use of random forests as the base regressor/classifier. Random forest are highly robust regressors and classifiers due to bagging and ensembling.

Here we'll examine the datasets::state.x77 dataset. It contains 50 observations and 8 variables. The dataset is rather small, but it works well for illustration.

library(dplyr)
library(ALSO)

Quick data prep

Currently only data frame or tibbles are supported as inputs to the function.

# z-standardize all columns
zstd <- function(x) {
    (x - mean(x)) / sd(x)
}
state_tbl <- state.x77 %>%
    dplyr::as_tibble() %>%
    dplyr::mutate_all(zstd)

Using the ALSO_RF() function

rf_also <- ALSO::ALSO_RF(data = state_tbl,
                      cross_validate = TRUE,
                      n_folds = 5,
                      scores_only = TRUE)

rf_also

Here we return only the outlier scores. Observations are given out-of-sample scores via 5-fold cross validation.

state_tbl %>%
    mutate(also_score = rf_also) %>%
    mutate(state = rownames(state.x77)) %>%
    arrange(desc(also_score)) %>%
    select(state, also_score) 


dannymorris/ALSO documentation built on May 4, 2019, 7:42 p.m.