statistics: Compute generalized Kolmogorov-Smirnov test statistics

GKSStat    R Documentation

Compute generalized Kolmogorov-Smirnov test statistics

Description

Compute the Kolmogorov-Smirnov, Berk-Jones, or higher criticism statistic to test whether the data come from a uniform(0,1) distribution. The function GKSStat provides a uniform interface for computing the different test statistics. To be consistent with the other statistics, the traditional higher criticism statistic is named HC+, and the statistic HCStat computes the two-sided higher criticism statistic.

Usage

GKSStat(
  x,
  index = NULL,
  indexL = NULL,
  indexU = NULL,
  statName = c("KS", "KS+", "KS-", "BJ", "BJ+", "BJ-", "HC", "HC+", "HC-", "Simes"),
  pvalue = TRUE
)

Arguments

x

Numeric, the samples that the test statistics will be based on.

index

Integer, the indices i of the ordered samples X_(i) that will be used in the statistic; see Details.

indexL

Integer, the indices i of the lower bounds l_i that will be used in the statistic; see Details.

indexU

Integer, the indices i of the upper bounds u_i that will be used in the statistic; see Details.

statName

Character, the name of the statistic that will be computed. The default is "KS".

pvalue

Logical, whether to compute the p-value of the statistic. The default is TRUE.

alpha0

Numeric, controlling which ordered samples will be used in the statistics. The default value is 1. See Details.

Details

statistics definitions

The function computes test statistics that aggregate the significant signals from the order statistics of the samples. That is, if T is a statistic and X_1,X_2,...,X_n are the samples, the value of T depends only on X_(1),X_(2),...,X_(n), where X_(i) is the ith smallest value among X_1,X_2,...,X_n. Moreover, the rejection region of T can be written as a set of rejection regions for the ordered samples X_(1),X_(2),...,X_(n). In other words, there exist two sequences {l_i} and {u_i} for i=1,...,n such that the test based on T rejects the null hypothesis if and only if there exists at least one i such that X_(i) < l_i or X_(i) > u_i.
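
The rejection rule above can be sketched in base R. This is only an illustration: rejects() is a hypothetical helper, not a function of the package, and the bounds l and u are made-up values rather than the boundaries of any particular statistic.

```r
## Hypothetical helper (not part of the package): apply the rule
## "reject if some X_(i) < l_i or X_(i) > u_i" to a sample.
rejects <- function(x, l, u) {
  xs <- sort(x)  # order statistics X_(1) <= ... <= X_(n)
  stopifnot(length(l) == length(xs), length(u) == length(xs))
  any(xs < l | xs > u)
}

## Illustrative bounds only: a band of half-width 0.4 around i/n,
## clipped to [0, 1]. These are assumptions made for the example.
n <- 5
l <- pmax((1:n) / n - 0.4, 0)
u <- pmin((1:n) / n + 0.4, 1)

## Every ordered sample lies inside its band
rejects(c(0.30, 0.10, 0.90, 0.50, 0.70), l, u)

## The third ordered sample (0.03) falls below its lower bound
rejects(c(0.01, 0.02, 0.03, 0.04, 0.05), l, u)
```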

The most well-known statistic which takes this form is the Kolmogorov-Smirnov statistic. Other statistics like Berk-Jones or the higher criticism also have similar formulas but define different sets of {l_i} and {u_i}.
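
As a concrete instance, the Kolmogorov-Smirnov statistic for the uniform(0,1) null can be written directly in terms of the order statistics. The sketch below uses only base R; ks_stat() is a hypothetical helper, and whether its "plus" and "minus" components correspond to the package's "KS+" and "KS-" is an assumption that the text above does not confirm. For a critical value c, the two-sided statistic reaches c exactly when some X_(i) <= i/n - c or some X_(i) >= (i-1)/n + c, which gives the {l_i} and {u_i} form.

```r
## Hypothetical helper: KS statistic against uniform(0,1), computed
## from the order statistics only.
ks_stat <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  i <- seq_len(n)
  d_plus  <- max(i / n - xs)        # largest gap of the ECDF above the uniform CDF
  d_minus <- max(xs - (i - 1) / n)  # largest gap of the ECDF below the uniform CDF
  c(plus = d_plus, minus = d_minus, two_sided = max(d_plus, d_minus))
}

x <- c(0.05, 0.12, 0.20, 0.31, 0.85)
ks_stat(x)

## Cross-check of the two-sided value against stats::ks.test
unname(ks.test(x, "punif")$statistic)
```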

alpha0, index, indexL and indexU

As mentioned previously, whether a test rejects is determined by the sequences {l_i} and {u_i}. The parameters alpha0, index, indexL and indexU therefore provide a way to control which l_i and u_i are considered in the test procedure. If none of these arguments is provided, every l_i and u_i is compared with its corresponding sorted sample X_(i), which yields the traditional test statistics. If alpha0 is used, only X_(1),...,X_(k) are used in the test, where k is the integer nearest to alpha0*n. If index is provided, only X_(i) for i in index are considered in the test. If indexL and/or indexU is not NULL, only l_i for i in indexL and u_i for i in indexU are used as rejection boundaries. These arguments can be used to generate a one-sided version of the test statistic. For example, if indexL runs from 1 to the length of x and indexU is NULL, the result is a test that is specifically sensitive to small sample values. Test statistics such as KS+, HC+ and BJ+ are implemented by calling GKSStat(..., indexU = NULL), that is, with indexU always NULL.
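
The effect of restricting the bounds can be sketched in base R as follows. This is not the package's implementation: rejects_subset() is a hypothetical helper, the bounds l and u are made-up values, and round() is only one possible reading of "nearest integer" (R rounds halves to the even digit).

```r
## Hypothetical helper: check only the lower bounds listed in indexL
## and the upper bounds listed in indexU. Passing NULL disables a side.
rejects_subset <- function(x, l, u,
                           indexL = seq_along(x),
                           indexU = seq_along(x)) {
  xs <- sort(x)
  any(xs[indexL] < l[indexL]) || any(xs[indexU] > u[indexU])
}

## Illustrative bounds only (assumed for the example)
n <- 5
l <- pmax((1:n) / n - 0.4, 0)
u <- pmin((1:n) / n + 0.4, 1)

x <- c(0.95, 0.96, 0.97, 0.98, 0.99)  # unusually large samples

## All bounds checked: X_(1) = 0.95 exceeds u_1 = 0.6
rejects_subset(x, l, u)

## Lower bounds only (indexU = NULL): large samples are not flagged
rejects_subset(x, l, u, indexL = seq_len(n), indexU = NULL)

## alpha0-style truncation: keep the first k = round(alpha0 * n) indices
alpha0 <- 0.4
k <- round(alpha0 * n)
rejects_subset(x, l, u, indexL = seq_len(k), indexU = seq_len(k))
```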

Value

A GKSStat S3 object.

Examples

## Generate samples
x <- rbeta(10, 1, 2)

## Perform KS test
GKSStat(x = x, statName = "KS")

## Perform one-sided KS test
GKSStat(x = x, statName = "KS+")
GKSStat(x = x, statName = "KS-")


Jiefei-Wang/exceedance documentation built on May 11, 2022, 1:43 a.m.