R/TestPointIsAnomaly_TDist.R

Defines functions TestPointIsAnomaly_TDist

Documented in TestPointIsAnomaly_TDist

#' @title Check Whether Test Point is Anomaly
#'
#' @description Assuming the \code{training} set was generated by a process
#'   that follows a t-distribution with \code{degF} degrees of freedom, this
#'   function checks whether to reject the null hypothesis that the \code{test}
#'   point was generated by the same process. If the probability of obtaining a
#'   result at least as extreme as \code{test} (in the upper tail) is lower than
#'   \code{p}, then this function returns \code{TRUE}, meaning the null
#'   hypothesis is rejected and the \code{test} point is likely to be an
#'   anomaly. If argument \code{exclude} is specified, elements at those
#'   designated positions are removed from the training set.
#'
#' @param training A numeric vector containing the samples used to fit the
#'   t-distribution
#' @param test A numeric value to be tested
#' @param exclude A logical vector with length equal to
#'   \code{length(training)}. It is used to remove elements at designated
#'   positions from fitting the t-distribution. By default, \code{exclude =
#'   NULL}, which means no element is excluded when fitting the t-distribution.
#' @param p p-value threshold, a number in \emph{[0, 1]}.
#' @param degF Degrees of freedom (> 0, may be non-integer)
#' @return \code{TRUE} if the test point is likely to be an anomaly and
#'   \code{FALSE} otherwise. For debugging purposes, the returned value also
#'   carries the attributes \code{stdev} and \code{tscore}, which hold the
#'   sample standard deviation calculated from the \code{training} set and the
#'   t-score of the \code{test} point, respectively.
#' @examples
#' set.seed(1)
#' training <- runif(1000)
#' test <- 0.95
#' exclude <- sample(c(TRUE, FALSE), 1000, replace = TRUE, prob = c(0.005, 0.995))
#' TestPointIsAnomaly_TDist(training, test, exclude)
#' TestPointIsAnomaly_TDist(training, test, exclude, 0.1)
#' @importFrom stats qt sd
#' @export
#'
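TestPointIsAnomaly_TDist <- function(training, test, exclude = NULL, p = 0.01, degF = 10) {
  # Drop the training elements flagged by the logical vector `exclude`
  if (!is.null(exclude)) {
    training <- training[!exclude]
  }
  # Sample standard deviation of the (filtered) training set
  stdev <- sd(training)
  # Standardized distance of the test point from the training mean
  tscore <- (test - mean(training)) / stdev
  # Upper-tail critical value of the t-distribution at level p
  tT <- qt(1 - p, degF)
  # One-sided test: only unusually large values are flagged
  isAnomaly <- (tscore > tT)
  # Attach metadata for debugging
  attr(isAnomaly, "stdev") <- stdev
  attr(isAnomaly, "tscore") <- tscore
  isAnomaly
}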
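
The decision rule in the function body can be read in two equivalent ways: compare the standardized score against the `1 - p` quantile, or compare the upper-tail probability of that score against `p`. The sketch below walks through both forms on hypothetical toy data (the values of `training`, `test_high`, and `test_low` are illustrative assumptions, not taken from the package). It also shows that, because the comparison is one-sided, a point far below the training mean is not flagged.

```r
# Hypothetical toy data for illustration only.
set.seed(1)
training <- rnorm(200, mean = 10, sd = 2)
test_high <- 20   # far above the training mean
test_low <- 0     # equally far below the training mean
p <- 0.01
degF <- 10

# Standardize a point with the sample mean and standard deviation,
# mirroring the computation in the function body.
score <- function(x) (x - mean(training)) / stats::sd(training)

# Form 1: compare the score against the (1 - p) quantile.
threshold <- stats::qt(1 - p, df = degF)
by_quantile <- score(test_high) > threshold

# Form 2: compare the upper-tail probability of the score against p.
upper_tail <- stats::pt(score(test_high), df = degF, lower.tail = FALSE)
by_pvalue <- upper_tail < p

# The rule is one-sided, so a very low point is not flagged.
low_flagged <- score(test_low) > threshold

print(c(by_quantile = by_quantile, by_pvalue = by_pvalue, low_flagged = low_flagged))
```

If low values should also count as anomalies, one option (an assumption here, not something the package provides) would be to compare `abs(score(x))` against `qt(1 - p / 2, degF)`.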
jingjin1018/anetimeseries documentation built on May 19, 2019, 10:35 a.m.