View source: R/api_fit_outlier.R
fit_outlier | R Documentation |
Detecting outliers within a dataset or test if a new (novel) observation is an outlier.
fit_outlier( A, adj, z = NULL, alpha = 0.05, nsim = 10000, ncores = 1, validate = TRUE )
A |
Character matrix or data.frame. All values must be limited to a single character. |
adj |
Adjacency list or |
z |
Named vector (same names as |
alpha |
Significance level |
nsim |
Number of simulations |
ncores |
Number of cores to use in parallelization |
validate |
Logical. If true, it checks if |
If the goal is to detect outliers within A
set z
to NULL
;
this procedure is most often just referred to as outlier detection. Once fit_outlier
has been called in this situation, one can exploit the outliers
function to get the
indicies for which observations in A
that are outliers. See the examples.
On the other hand, if the goal is test if the new unseen observation z
is an outlier
inA
, then supply a named vector to z
.
All values must be limited to a single character representation; if not, the function will
internally convert to one such representation. The reason for this, is a speedup in runtime
performance. One can also use the exported function to_chars
on A
in
advance and set validate
to FALSE
.
The adj
object is most typically found using fit_graph
from the ess
package. But the user can supply an adjacency list, just a named list
, of their own
choice if needed.
A outlier_model
object with either novelty
or outlier
as child classes. These are used for different purposes. See the details
fit_mixed_outlier
, fit_multiple_models
,
outliers
, pval
, deviance
library(dplyr) library(ess) # For the fit_graph function set.seed(7) # For reproducibility # Psoriasis patients d <- derma %>% filter(ES == "psoriasis") %>% select(1:20) %>% # only a subset of data is used to exemplify as_tibble() # Fitting the interaction graph # see package ess for details g <- fit_graph(d, trace = FALSE) plot(g) # ----------------------------------------------------------- # EXAMPLE 1 # Testing which observations within d are outliers # ----------------------------------------------------------- # Only 500 simulations is used here to exeplify # The default number of simulations is 10,000 m1 <- fit_outlier(d, g, nsim = 500) print(m1) outs <- outliers(m1) douts <- d[which(outs), ] douts # Notice that m1 is of class 'outlier'. This means, that the procedure has tested which # observations _within_ the data are outliers. This method is most often just referred to # as outlier detection. The following plot is the distribution of the test statistic. Think # of a simple t-test, where the distribution of the test statistic is a t-distribution. # In order to conclude on the hypothesis, one finds the critical value and verify if the # test statistic is greater or less than this. # Retrieving the test statistic for individual observations x1 <- douts[1, ] %>% unlist() x2 <- d[1, ] %>% unlist() dev1 <- deviance(m1, x1) # falls within the critical region in the plot (the red area) dev2 <- deviance(m1, x2) # falls within the acceptable region in the plot dev1 dev2 # Retrieving the pvalues pval(m1, dev1) pval(m1, dev2) # ----------------------------------------------------------- # EXAMPLE 2 # Testing if a new observation is an outlier # ----------------------------------------------------------- # An observation from class "chronic dermatitis" z <- derma %>% filter(ES == "chronic dermatitis") %>% select(1:20) %>% slice(1) %>% unlist() # Test if z is an outlier in class "psoriasis" # Only 500 simulations is used here to exeplify # The default number of simulations is 10,000 m2 <- fit_outlier(d, g, z, nsim = 500) print(m2) plot(m2) # Try using more simulations and the complete derma data # Notice that m2 is of class 'novelty'. The term novelty detection # is sometimes used in the litterature when the goal is to verify # if a new unseen observation is an outlier in a homogen dataset. # Retrieving the test statistic and pvalue for z dz <- deviance(m2, z) pval(m2, dz)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.