m_test: Two sample location test based on M-estimators

View source: R/mTest.R

m_test R Documentation

Description

m_test performs a two-sample location test based on an M-estimator.

Usage

m_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  psi = c("huber", "hampel", "bisquare"),
  k = robustbase::.Mpsi.tuning.default(psi),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble.seed = NULL,
  ...
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = 0 for a location test and delta = 1 for a scale test. In case of scale.test = TRUE, delta represents the ratio of the squared scale parameters.

method

a character string specifying how the p-value is computed with possible values "asymptotic" for an asymptotic test based on a normal approximation, "permutation" for a permutation test, and "randomization" for a randomization test. The permutation test uses all splits of the joint sample into two samples of sizes m and n, while the randomization test draws n.rep random splits with replacement. The values m and n denote the sample sizes. If not specified explicitly, defaults to "permutation" if m < 30, n < 30 and n.rep >= choose(m + n, m), "randomization" if m < 30, n < 30 and n.rep < choose(m + n, m), and "asymptotic" if m >= 30 and n >= 30.

psi

kernel used for optimization. Must be one of "bisquare", "hampel" and "huber". The default is "huber".

k

tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". This argument is ignored if method = "permutation" or method = "asymptotic". The default is n.rep = 10000.

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

wobble.seed

an integer value used as a seed for the random number generation in case that scale.test = TRUE and one of the vectors x and y contains zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE.

...

additional arguments c1 and c2 that can be passed to the function scaleTau2(), which is used internally for estimating the within-sample dispersion, in order to account for non-normal distributions; see Maronna and Zamar (2002).

Details

The test statistic for this test is based on the difference of the M-estimates of location of x and y, see m_est.

Three different psi-functions can be used: huber, hampel, and bisquare. The corresponding tuning parameter(s) can be set by the argument k of the function.
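As a rough illustration of what an M-estimate of location is, the following base R sketch computes a Huber-type estimate by iteratively reweighted means. The helper huber_location is hypothetical and not part of the package, the scale is fixed at the MAD (assumed to be non-zero), and the tuning constant k = 1.345 is an assumed conventional choice for the Huber function; the package itself relies on m_est and the psi-functions from robustbase.

```r
# Hypothetical helper, not part of robnptests: Huber M-estimate of location
# computed by iteratively reweighted means with a fixed MAD scale.
huber_location <- function(x, k = 1.345, tol = 1e-8, max_iter = 100) {
  s <- mad(x)       # robust scale estimate, held fixed (assumed > 0)
  mu <- median(x)   # robust starting value
  for (i in seq_len(max_iter)) {
    r <- (x - mu) / s          # standardized residuals
    w <- pmin(1, k / abs(r))   # Huber weights: 1 inside [-k, k], k/|r| outside
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol * s) break
    mu <- mu_new
  }
  mu_new
}

# One gross outlier pulls the mean far more than the M-estimate
x_out <- c(1.2, 0.8, 1.1, 0.9, 1.0, 25)
est <- huber_location(x_out)
c(mean = mean(x_out), huber = est)
```

The outlier is downweighted rather than discarded, which is why the estimate stays near the bulk of the data.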

The estimate for the location difference is scaled by a pooled estimate for the standard deviation. This estimate is based on the tau-estimate of scale and is computed with the default parameter settings of the function scaleTau2. These can be changed by setting c1 and c2.

More details on the construction of the test statistic are given in the vignettes vignette("robnptests") and vignette("m_tests").

Three versions of the test are implemented: randomization, permutation, and asymptotic.
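The rule for the default value of method, as described under Arguments, can be sketched in base R as follows. The helper default_method is hypothetical and only mirrors the documented rule; the documentation does not state what happens when one sample has fewer than 30 observations and the other has at least 30, so the sketch returns NA in that case instead of guessing.

```r
# Hypothetical helper mirroring the documented default rule for `method`;
# this is not the package's internal code.
default_method <- function(m, n, n.rep = 10000) {
  if (m >= 30 && n >= 30) {
    "asymptotic"
  } else if (m < 30 && n < 30) {
    # choose(m + n, m) is the number of distinct splits of the joint sample
    if (n.rep >= choose(m + n, m)) "permutation" else "randomization"
  } else {
    NA_character_  # mixed case is not covered by the documented rule
  }
}

default_method(5, 5)    # choose(10, 5) = 252 does not exceed n.rep
default_method(20, 20)  # choose(40, 20) is far larger than n.rep
default_method(40, 50)
```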

The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth, 2010) is used to calculate the p-value. The psi-function for the M-estimate is computed with the implementations in the package robustbase.
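The idea of a randomization distribution can be illustrated with a small base R sketch. It is a simplification under stated assumptions: the helper randomization_pvalue is hypothetical, the difference of sample medians stands in for the scaled M-statistic, only the two-sided case is covered, and the p-value uses the simple estimate (b + 1) / (n.rep + 1) instead of calling permp.

```r
# Sketch only, not the package's implementation: the difference of medians
# stands in for the scaled difference of M-estimates.
randomization_pvalue <- function(x, y, n.rep = 1000) {
  m <- length(x)
  z <- c(x, y)                               # joint sample
  stat <- function(a, b) median(a) - median(b)
  observed <- abs(stat(x, y))
  exceed <- 0
  for (i in seq_len(n.rep)) {
    # One random split of the joint sample into sizes m and n; the same
    # split may be drawn again in a later repetition.
    idx <- sample.int(length(z), m)
    if (abs(stat(z[idx], z[-idx])) >= observed) exceed <- exceed + 1
  }
  (exceed + 1) / (n.rep + 1)                 # never exactly zero
}

set.seed(108)
p <- randomization_pvalue(rnorm(20), rnorm(20, mean = 3), n.rep = 500)
p
```

Adding one to numerator and denominator keeps the p-value strictly positive, which is the concern addressed by Phipson and Smyth (2010).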

For the asymptotic test, the distribution of the test statistic is approximated by a standard normal distribution. However, this is only justified under the normality assumption. When the observations do not come from a normal distribution, the tests might not keep the desired significance level. Simulations indicate that the level is kept under symmetric distributions if the variance exists. Under skewed distributions, it tends to be anti-conservative, see the vignette vignette("m_tests"). The test statistic can be corrected by a factor which has to be determined individually for a specific distribution in such cases.

For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The samples should not contain zeros to prevent problems with the necessary log-transformation. If they contain zeros, uniform noise is added to all observations in order to remove the zeros and a message is printed.
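A small base R illustration of this transformation, using made-up data without zeros: if y has twice the scale of x, the ratio of squared scales is 4, and after the transformation every observation is shifted by log(4).

```r
# Illustrative data (assumed, not from the package); no zeros are present
x_raw <- c(-1.5, -0.4, 0.3, 0.9, 2.1)
y_raw <- 2 * x_raw          # same location (0), twice the scale

tx <- log(x_raw^2)          # transformed first sample
ty <- log(y_raw^2)          # transformed second sample

shift <- ty - tx            # each element equals log(2^2) = log(4)
all.equal(shift, rep(log(4), length(x_raw)))
median(ty) - median(tx)     # location difference of the transformed samples
```

A scale difference in the original data has thus become a pure location shift, which a location test can detect.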

If the sample has been modified because of zeros when scale.test = TRUE, the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y)

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

parameter

the degrees of freedom for the test statistic.

p.value

the p-value for the test.

estimate

the M-estimates of x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE).

null.value

the specified hypothesized value of the mean difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

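Fried R (2012). "On the online estimation of local constant volatilities." Computational Statistics & Data Analysis, 56(11), 3080-3090.

Maronna RA, Zamar RH (2002). "Robust estimates of location and dispersion for high-dimensional datasets." Technometrics, 44(4), 307-317.

Phipson B, Smyth GK (2010). "Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn." Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39.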

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic test based on Huber M-estimator
m_test(x, y, method = "asymptotic", psi = "huber")

## Not run: 
# Randomization test based on Hampel M-estimator with 1000 random permutations
# drawn with replacement

m_test(x, y, method = "randomization", n.rep = 1000, psi = "hampel")

## End(Not run)


s-abbas/robTests documentation built on Feb. 20, 2023, 10:14 a.m.