Description Usage Arguments Details Value See Also Examples
View source: R/api_fit_outlier.R
Description

Detect outliers within a dataset, or test whether a new (novel) observation is an outlier.
Usage

fit_outlier(
  A,
  adj,
  z = NULL,
  alpha = 0.05,
  nsim = 10000,
  ncores = 1,
  validate = TRUE
)
Arguments

A         Character matrix or data.frame. All values must be limited to a single character.
adj       Adjacency list or …
z         Named vector (same names as …
alpha     Significance level
nsim      Number of simulations
ncores    Number of cores to use in parallelization
validate  Logical. If true, it checks if …
Details

If the goal is to detect outliers within A, set z to NULL; this procedure is most often just referred to as outlier detection. Once fit_outlier has been called in this situation, one can use the outliers function to get the indices of the observations in A that are outliers. See the examples.
On the other hand, if the goal is to test whether a new, unseen observation z is an outlier in A, then supply a named vector to z.
All values must be limited to a single-character representation; if not, the function will internally convert them to such a representation. The reason for this is a speedup in runtime performance. One can also use the exported function to_chars on A in advance and set validate to FALSE.
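As a rough illustration of the single-character requirement, the base-R sketch below checks whether every value in a data frame is exactly one character long. The helper name `is_single_char` and the toy data are hypothetical and not part of the package; the package's own validation and `to_chars` may well work differently.

```r
# Hypothetical helper (not part of the package): TRUE if every cell of A
# is exactly one character long once A is coerced to a character matrix.
is_single_char <- function(A) {
  M <- as.matrix(A)
  storage.mode(M) <- "character"
  all(!is.na(M) & nchar(M) == 1L)
}

# Toy data, invented for illustration only
ok  <- data.frame(x = c("a", "b"),  y = c("0", "1"),  stringsAsFactors = FALSE)
bad <- data.frame(x = c("a", "bb"), y = c("10", "1"), stringsAsFactors = FALSE)

is_single_char(ok)   # every value has one character
is_single_char(bad)  # "bb" and "10" violate the requirement
```

A check of this kind only tells you whether conversion would be needed; per the paragraph above, validate should be set to FALSE only when A is already in single-character form.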
The adj object is most typically found using fit_graph from the ess package. But the user can supply an adjacency list, just a named list, of their own choice if needed.
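A hand-written adjacency list could look like the sketch below. The variable names a, b and c are hypothetical stand-ins (presumably the names should match the column names of A, which is an assumption here), and the symmetry check is only a sanity check added for illustration; the source does not say that fit_outlier performs it.

```r
# Hypothetical three-variable graph a - b - c, written as a named list:
# each element holds the neighbours of the variable it is named after.
adj <- list(
  a = "b",
  b = c("a", "c"),
  c = "b"
)

# Sanity check (illustrative only): the graph is undirected, so u must
# appear among v's neighbours whenever v appears among u's neighbours.
is_symmetric <- all(vapply(names(adj), function(u) {
  all(vapply(adj[[u]], function(v) u %in% adj[[v]], logical(1)))
}, logical(1)))

is_symmetric
```

A list like this would then be passed as the adj argument in place of the object returned by fit_graph.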
Value

An outlier_model object with either novelty or outlier as child classes. These are used for different purposes. See the details.
See Also

fit_mixed_outlier, fit_multiple_models, outliers, pval, deviance
Examples

library(dplyr)
library(molic) # For fit_outlier, outliers, pval and the derma data
library(ess) # For the fit_graph function
set.seed(7) # For reproducibility
# Psoriasis patients
d <- derma %>%
  filter(ES == "psoriasis") %>%
  select(1:20) %>% # only a subset of data is used to exemplify
  as_tibble()
# Fitting the interaction graph
# see package ess for details
g <- fit_graph(d, trace = FALSE)
plot(g)
# -----------------------------------------------------------
# EXAMPLE 1
# Testing which observations within d are outliers
# -----------------------------------------------------------
# Only 500 simulations are used here to exemplify
# The default number of simulations is 10,000
m1 <- fit_outlier(d, g, nsim = 500)
print(m1)
outs <- outliers(m1)
douts <- d[which(outs), ]
douts
# Notice that m1 is of class 'outlier'. This means that the procedure has tested which
# observations _within_ the data are outliers. This method is most often just referred to
# as outlier detection. The following plot is the distribution of the test statistic. Think
# of a simple t-test, where the distribution of the test statistic is a t-distribution.
# To conclude on the hypothesis, one finds the critical value and checks whether the
# test statistic is greater or less than it.
# Retrieving the test statistic for individual observations
x1 <- douts[1, ] %>% unlist()
x2 <- d[1, ] %>% unlist()
dev1 <- deviance(m1, x1) # falls within the critical region in the plot (the red area)
dev2 <- deviance(m1, x2) # falls within the acceptable region in the plot
dev1
dev2
# Retrieving the pvalues
pval(m1, dev1)
pval(m1, dev2)
# -----------------------------------------------------------
# EXAMPLE 2
# Testing if a new observation is an outlier
# -----------------------------------------------------------
# An observation from class "chronic dermatitis"
z <- derma %>%
  filter(ES == "chronic dermatitis") %>%
  select(1:20) %>%
  slice(1) %>%
  unlist()
# Test if z is an outlier in class "psoriasis"
# Only 500 simulations are used here to exemplify
# The default number of simulations is 10,000
m2 <- fit_outlier(d, g, z, nsim = 500)
print(m2)
plot(m2) # Try using more simulations and the complete derma data
# Notice that m2 is of class 'novelty'. The term novelty detection
# is sometimes used in the literature when the goal is to verify
# whether a new, unseen observation is an outlier in a homogeneous dataset.
# Retrieving the test statistic and pvalue for z
dz <- deviance(m2, z)
pval(m2, dz)