monotonicity_test | R Documentation |
Performs a monotonicity test between the vectors X and Y as described in Hall and Heckman (2000).
This function uses a bootstrap approach to test for monotonicity
in a nonparametric regression setting.
monotonicity_test(
X,
Y,
bandwidth = bw.nrd(X) * (length(X)^-0.1),
boot_num = 200,
m = floor(0.05 * length(X)),
ncores = 1,
negative = FALSE,
seed = NULL
)
X |
Numeric vector of predictor variable values. Must not contain missing or infinite values. |
Y |
Numeric vector of response variable values. Must not contain missing or infinite values. |
bandwidth |
Numeric value for the kernel bandwidth used in the
Nadaraya-Watson estimator. Default is calculated as
bw.nrd(X) * (length(X)^-0.1). |
boot_num |
Integer specifying the number of bootstrap samples.
Default is 200. |
m |
Integer parameter used in the calculation of the test statistic.
Corresponds to the minimum window size over which the test
statistic is calculated, and acts as a "smoothing" parameter. Lower values increase
the sensitivity of the test to local deviations from monotonicity.
Default is floor(0.05 * length(X)). |
ncores |
Integer specifying the number of cores to use for parallel
processing. Default is 1. |
negative |
Logical value indicating whether to test for a monotonic
decreasing (negative) relationship. Default is FALSE. |
seed |
Optional integer for setting the random seed. If NULL (default), the global random state is used. |
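To make the default arguments concrete, the sketch below evaluates the default bandwidth and m expressions from the usage signature on simulated data, then applies a hand-written Nadaraya-Watson estimator. The helper nw_estimate and the Gaussian kernel are assumptions for illustration only; the kernel used internally by monotonicity_test is not stated here.

```r
# Simulated data, matching the style of the examples on this page
set.seed(1)
X <- runif(500)
Y <- 4 * X + rnorm(500, sd = 1)

# Defaults exactly as written in the usage signature
bandwidth <- bw.nrd(X) * (length(X)^-0.1)
m <- floor(0.05 * length(X))

# Hypothetical helper: Nadaraya-Watson estimate at the points x0.
# The Gaussian kernel (dnorm) is an assumption, not the package's
# documented choice.
nw_estimate <- function(x0, X, Y, h) {
  sapply(x0, function(x) {
    w <- dnorm((x - X) / h)  # kernel weights around x
    sum(w * Y) / sum(w)      # locally weighted average of Y
  })
}

grid <- seq(0.1, 0.9, by = 0.2)
fitted <- nw_estimate(grid, X, Y, bandwidth)
print(bandwidth)
print(m)
print(fitted)
```

A larger bandwidth averages over more neighbouring points and gives a smoother fit; a smaller one follows the data more closely.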
The test evaluates the following hypotheses:
H_0: The regression function is monotonic
Non-decreasing if negative = FALSE
Non-increasing if negative = TRUE
H_A: The regression function is not monotonic
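For intuition about what is being tested, the following is a brute-force sketch of a statistic in the spirit of Hall and Heckman (2000): over every window of the sorted data spanning at least m index steps, fit a least-squares slope and record the negated slope scaled by the spread of X in that window, then take the maximum. The function name hh_statistic and the indexing convention are assumptions made for this sketch; it omits the bootstrap calibration and is not necessarily how monotonicity_test is implemented.

```r
# Hypothetical helper: brute-force statistic over all windows r..s
# of the sorted data with s - r >= m (illustrative only).
hh_statistic <- function(X, Y, m) {
  stopifnot(m >= 1, length(X) > m, length(X) == length(Y))
  ord <- order(X)
  x <- X[ord]
  y <- Y[ord]
  n <- length(x)
  best <- -Inf
  best_interval <- c(NA_integer_, NA_integer_)
  for (r in 1:(n - m)) {
    for (s in (r + m):n) {
      xi <- x[r:s]
      yi <- y[r:s]
      sxx <- sum((xi - mean(xi))^2)
      if (sxx == 0) next  # slope undefined when all x are equal
      slope <- sum((xi - mean(xi)) * (yi - mean(yi))) / sxx
      value <- -slope * sqrt(sxx)  # large when the local slope is negative
      if (value > best) {
        best <- value
        best_interval <- c(r, s)
      }
    }
  }
  list(stat = best, interval = best_interval)
}

# Noise-free check: an increasing line never has a negative slope,
# a decreasing line has one everywhere.
x <- seq(0, 1, length.out = 40)
inc <- hh_statistic(x, 2 * x, m = 5)
dec <- hh_statistic(x, -2 * x, m = 5)
print(inc$stat)
print(dec$stat)
print(dec$interval)
```

Large positive values point to a window where the data trend downward, which is evidence against a non-decreasing regression function.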
A list with the following components:
p
The p-value of the test. A small p-value (e.g., < 0.05) suggests evidence against the null hypothesis of monotonicity.
dist
The distribution of the test statistic under the null from bootstrap samples. The length of dist is equal to boot_num.
stat
The test statistic T_m calculated from the original data.
plot
A ggplot object with a scatter plot where the points of the "critical interval" are highlighted. This critical interval is the interval where T_m is greatest.
interval
Numeric vector containing the indices of the "critical interval". The first index indicates where the interval starts, and the second indicates where it ends in the sorted X vector.
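Because interval holds positions in the sorted X vector rather than in the original input order, mapping it back to data values takes one extra step. The sketch below uses a made-up interval of c(120, 180) as a stand-in for result$interval, so the numbers are purely illustrative.

```r
set.seed(42)
X <- runif(500)
Y <- 4 * X + rnorm(500, sd = 1)

# Hypothetical stand-in for result$interval (not real output)
interval <- c(120, 180)

# Sort both vectors by X so positions match the documented convention
ord <- order(X)
X_sorted <- X[ord]
Y_sorted <- Y[ord]

# X values at which the critical interval starts and ends
critical_range <- X_sorted[interval]

# All observations that fall inside the critical interval
idx <- interval[1]:interval[2]
critical_points <- data.frame(X = X_sorted[idx], Y = Y_sorted[idx])

print(critical_range)
print(head(critical_points))
```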
For large datasets (e.g., n ≥ 6500) this function may require
significant computation time because the statistic must be computed
for every possible interval. Consider reducing boot_num, using
a subset of the data, or using parallel processing with ncores
to improve performance.
In addition, a minimum of 300 observations is recommended for the kernel estimates to be reliable.
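A rough count shows why the cost grows quickly with n. Assuming the candidate intervals are all index pairs (r, s) in the sorted data with s - r >= m (an assumption about the interval set, not a documented detail), there are (n - m)(n - m + 1) / 2 of them. The helpers count_intervals and default_m below are hypothetical names for this sketch, which also shows one way to draw a random subset before testing.

```r
# Hypothetical helper: number of pairs (r, s) with 1 <= r < s <= n
# and s - r >= m.
count_intervals <- function(n, m) {
  (n - m) * (n - m + 1) / 2
}

# Default m from the usage signature, as a function of n
default_m <- function(n) floor(0.05 * n)

n_full <- 6500
n_sub <- 1500
print(count_intervals(n_full, default_m(n_full)))
print(count_intervals(n_sub, default_m(n_sub)))

# Drawing a random subset of the data before running the test
set.seed(42)
X <- runif(n_full)
Y <- 4 * X + rnorm(n_full, sd = 1)
keep <- sample(n_full, n_sub)
X_sub <- X[keep]
Y_sub <- Y[keep]
```

The subset X_sub and Y_sub can then be passed to monotonicity_test in place of the full vectors, keeping in mind the recommended minimum of 300 observations.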
Hall, P., & Heckman, N. E. (2000). Testing for monotonicity of a regression mean by calibrating for linear functions. The Annals of Statistics, 28(1), 20–39.
# Example 1: Usage on monotonic increasing function
# Generate sample data
seed <- 42
set.seed(seed)
X <- runif(500)
Y <- 4 * X + rnorm(500, sd = 1)
result <- monotonicity_test(X, Y, boot_num = 25, seed = seed)
print(result$p)
print(result$stat)
print(result$dist)
print(result$interval)
# Example 2: Usage on non-monotonic function
seed <- 42
set.seed(seed)
X <- runif(500)
Y <- (X - 0.5) ^ 2 + rnorm(500, sd = 0.5)
result <- monotonicity_test(X, Y, boot_num = 25, seed = seed)
print(result$p)
print(result$stat)
print(result$dist)
print(result$interval)