#' Testing the equality of the \emph{M} curves specific to each level
#'@description This function can be used to test the equality of the
#'\eqn{M} curves specific to each level.
#' @param formula An object of class \code{formula}: a sympbolic
#' description of the model to be fitted. The details of model
#' specification are given under 'Details'.
#' @param data An optional data frame, matrix or list required by
#' the formula. If not found in data, the variables are taken from
#' \code{environment(formula)}, typically the environment from which
#' \code{globaltest} is called.
#' @param na.action A function which indicates what should happen when the
#' data contain 'NA's. The default is 'na.omit'.
#' @param der Number which determines any inference process.
#' By default \code{der} is \code{NULL}. If this term is \code{0},
#' the testing procedures is applied for the estimate. If it is \code{1} or
#' \code{2}, it is designed for the first or second derivative, respectively.
#' @param smooth Type smoother used: \code{smooth = "kernel"} for local polynomial
#' kernel smoothers and \code{smooth = "splines"} for splines using the
#' \code{mgcv} package.
#' @param weights Prior weights on the data.
#' @param nboot Number of bootstrap repeats.
#' @param h0 The kernel bandwidth smoothing parameter for the global effect (see
#' references for more details at the estimation). Large values of the bandwidth lead
#' to smoothed estimates; smaller values of the bandwidth lead lo undersmoothed estimates.
#' By default, cross validation is used to obtain the bandwidth.
#' @param h The kernel bandwidth smoothing parameter for the partial effects.
#' @param nh Integer number of equally-spaced bandwidth on which the
#' \code{h} is discretised, to speed up computation.
#' @param kernel A character string specifying the desired kernel.
#' Defaults to \code{kernel = "epanech"}, where the Epanechnikov
#' density function kernel will be used. Also, several types of kernel funcitons
#' can be used: triangular and Gaussian density function,
#' with \code{"triang"} and \code{"gaussian"} term, respectively.
#' @param p Degree of polynomial to be used. Its value must be the value of
#' derivative + 1. The default value is 3 due to the function
#' returns the estimation, first and second derivative.
#' @param kbin Number of binning nodes over which the function
#' is to be estimated.
#' @param seed Seed to be used in the bootstrap procedure.
#' @param cluster A logical value. If \code{TRUE} (default), the
#' bootstrap procedure is parallelized (only for \code{smooth = "splines"}.
#' Note that there are cases
#' (e.g., a low number of bootstrap repetitions) that R will gain in
#' performance through serial computation. R takes time to distribute tasks
#' across the processors also it will need time for binding them all together
#' later on. Therefore, if the time for distributing and gathering pieces
#' together is greater than the time need for single-thread computing, it does
#' not worth parallelize.
#'@param ncores An integer value specifying the number of cores to be used
#' in the parallelized procedure. If \code{NULL} (default), the number of cores
#' to be used is equal to the number of cores of the machine - 1.
#' @param \ldots Other options.
#'
#'
#' @details \code{globaltest} can be used to test the equality of the \eqn{M}
#' curves specific to each level. This bootstrap based test assumes the
#' following null hypothesis:
#'
#' \deqn{H_0^r: m_1^r(\cdot) = \ldots = m_M^r(\cdot)}
#'
#' versus the general alternative
#'
#' \deqn{H_1^r: m_i^r (\cdot) \ne m_j^r (\cdot) \quad \rm{for} \quad \rm{some}
#' \quad i, j \in \{ 1, \ldots, M\}. }
#'
#' Note that, if \eqn{H_0} is not rejected, then the equality of critical points
#' will also accepted.
#'
#' To test the null hypothesis, it is used a test statistic,
#' \eqn{T}, based on direct nonparametric estimates of the curves.
#'
#' If the null hypothesis is true, the \eqn{T} value should be close to zero
#' but is generally greater. The test rule based on \eqn{T} consists of
#' rejecting the null hypothesis if \eqn{T > T^{1- \alpha}}, where \eqn{T^p}
#' is the empirical \eqn{p}-percentile of \eqn{T} under the null hypothesis. To
#' obtain this percentile, we have used bootstrap techniques. See details in
#' references.
#'
#' Note that the models fitted by \code{globaltest} function are specified
#' in a compact symbolic form. The ~ operator is basic in the formation
#' of such models. An expression of the form \code{y ~ model} is interpreted as
#' a specification that the response \code{y} is modelled by a predictor
#' specified symbolically by \code{model}. The possible terms consist of a
#' variable name or a variable name and a factor name separated by : operator.
#' Such a term is interpreted as the interaction of the continuous variable and
#' the factor. However, if \code{smooth = "splines"}, the formula is based on the function
#' formula.gam of the mgcv package.
#'@return The \eqn{T} value and the \eqn{p}-value are returned. Additionally,
#'it is shown the decision, accepted or rejected, of the global test.
#'The null hypothesis is rejected if the \eqn{p}-value\eqn{< 0.05}.
#'
#'@author Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
#'
#' @references
#' Sestelo, M. (2013). Development and computational implementation of
#' estimation and inference methods in flexible regression models.
#' Applications in Biology, Engineering and Environment. PhD Thesis, Department
#' of Statistics and O.R. University of Vigo.
#'
#'
#' Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017).
#' npregfast: An R Package for Nonparametric Estimation and Inference in Life
#' Sciences. Journal of Statistical Software, 82(12), 1-27.
#'
#'@examples
#' library(npregfast)
#' data(barnacle)
#' globaltest(DW ~ RC : F, data = barnacle, der = 1, seed = 130853, nboot = 100)
#'
#' # globaltest(height ~ s(age, by = sex), data = children,
#' # seed = 130853, der = 0, smooth = "splines")
#'
#'
#' @importFrom stats na.omit runif
#' @importFrom mgcv interpret.gam gam predict.gam
#' @importFrom sfsmisc D1D2
#' @importFrom doParallel registerDoParallel
#' @importFrom parallel detectCores
#' @importFrom foreach foreach %dopar%
#' @export
globaltest <- function(formula, data, na.action = "na.omit",
der, smooth = "kernel", weights = NULL,
nboot = 500, h0 = -1, h = -1, nh = 30,
kernel = "epanech", p = 3, kbin = 100, seed = NULL,
cluster = TRUE, ncores = NULL, ...) {
# utils::globalVariables("i") # for the note at checking --as-cran
# globaltest: no visible binding for global variable 'i'
# utils::suppressForeignCheck("i")
if (missing(der)) {
stop("Argument \"der\" is missing, with no default")
}
if (missing(formula)) {
stop("Argument \"formula\" is missing, with no default")
}
# if (missing(data)) {
# stop("Argument \"data\" is missing, with no default")
# }
if(!isTRUE(der %in% c(0, 1, 2))) {
stop("",paste(der)," is not a r-th derivative implemented, only
permitted 0, 1 or 2.")
}
if (kernel == "gaussian")
kernel <- 3
if (kernel == "epanech")
kernel <- 1
if (kernel == "triang")
kernel <- 2
if (!(kernel %in% 1:3)) {
stop("Kernel not suported")
}
if (!(smooth %in% c("kernel", "splines"))) {
stop("Smoother not suported")
}
# if (is.null(seed))
# seed <- -1
if (!is.null(seed)) {
set.seed(seed)
}
if (isTRUE(cluster) & smooth == "splines") {
if (is.null(ncores)) {
num_cores <- detectCores() - 1
}else{
num_cores <- ncores
}
registerDoParallel(cores = num_cores)
}
if (smooth != "splines") {
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
m <- match(x = c("formula", "data", "subset", "weights",
"na.action", "offset"), table = names(mf), nomatch = 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(expr = mf, envir = parent.frame())
mt <- attr(mf, "terms")
y <- model.response(mf, "numeric")
w <- as.vector(model.weights(mf))
if (!is.null(w) && !is.numeric(w))
stop("'weights' must be a numeric vector")
terms <- attr(mt, "term.labels")
aux <- unlist(strsplit(terms,split = ":"))
varnames <- aux[1]
namef <- aux[2]
response <- as.character(attr(mt, "variables")[2])
if (unlist(strsplit(varnames,split = ""))[1] == "s") {
stop("Argument \"formula\" is wrong specified, see details of
model specification in 'Details' of the frfast help." )
}
data <- mf
if (na.action == "na.omit"){ # ver la f, corregido
data <- na.omit(data)
}else{
stop("The actual version of the package only supports 'na.omit' (observations are removed
if they contain any missing values)")
}
if (length(aux) == 1) {f <- NULL}else{f <- data[ ,namef]}
n <- nrow(data)
}else{
ffr <- interpret.gam(formula)
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
mf$formula <- ffr$fake.formula
m <- match(x = c("formula", "data", "subset", "weights",
"na.action", "offset"), table = names(mf), nomatch = 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(expr = mf, envir = parent.frame())
mt <- attr(mf, "terms")
y <- model.response(mf, "numeric")
w <- as.vector(model.weights(mf))
if (!is.null(w) && !is.numeric(w))
stop("'weights' must be a numeric vector")
terms <- attr(mt, "term.labels")
response <- as.character(attr(mt, "variables")[2])
varnames <- terms[1]
if (":" %in% unlist(strsplit(ffr$fake.names,split = ""))) {
stop("Argument \"formula\" is wrong specified, see details of
model specification in 'Details' of the frfast help." )
}
if (length(ffr$smooth.spec) == 0) {
warning("Argument \"formula\" could be wrong specified without an 's', see details of
model specification in 'Details' of the frfast help." )
}
datam <- mf
if (na.action == "na.omit"){
datam <- na.omit(datam)
}else{
stop("The actual version of the package only supports 'na.omit'
(observations are removed if they contain any missing values)")
}
if (length(terms) == 1) {
f <- NULL
namef <- 1
}else{
namef <- terms[2]
f <- mf[ ,namef]
}
n <- nrow(datam)
}
if(missing(data)) {
response <- strsplit(response, "\\$")[[1]][2]
terms2 <- strsplit(terms, "\\$")
if(length(terms) == 1){
formula <- as.formula(paste0(response, "~s(",terms2[[1]][2],")"))
names(datam) <- c(response, terms2[[1]][2])
varnames <- terms2[[1]][2]
namef <- "F"
}else{
formula <- as.formula(paste0(response, "~s(",terms2[[1]][2],", by = ", terms2[[2]][2],")"))
#data2 <- data
names(datam) <- c(response, terms2[[1]][2], terms2[[2]][2])
varnames <- terms2[[1]][2]
namef <- terms2[[2]][2]
}
data <- datam
}
#
#
# if (smooth != "splines") {
#
# ffr <- interpret.frfastformula(formula, method = "frfast")
# varnames <- ffr$II[2, ]
# aux <- unlist(strsplit(varnames,split = ":"))
# varnames <- aux[1]
# if (unlist(strsplit(varnames,split = ""))[1] == "s") {
# stop("Argument \"formula\" is wrong specified, see details of
# model specification in 'Details' of the frfast help." )
# }
# namef <- aux[2]
# if (length(aux) == 1) {f <- NULL}else{f <- data[ ,namef]}
# newdata <- data
# data <- data[ ,c(ffr$response, varnames)]
#
#
# if (na.action == "na.omit"){ # ver la f
# data <- na.omit(data)
# }else{
# stop("The actual version of the package only supports 'na.omit' (observations are removed
# if they contain any missing values)")
# }
# #newdata <- na.omit(newdata[ ,varnames])
# n <- nrow(data)
#
# }else{
# ffr <- interpret.gam(formula)
# varnames <- ffr$pred.names[1]
# if (":" %in% unlist(strsplit(ffr$fake.names,split = ""))) {
# stop("Argument \"formula\" is wrong specified, see details of
# model specification in 'Details' of the frfast help." )
# }
# if (length(ffr$smooth.spec) == 0) {
# warning("Argument \"formula\" could be wrong specified without an 's', see details of
# model specification in 'Details' of the frfast help." )
# }
#
# namef <- ffr$pred.names[2]
# if (length(ffr$pred.names) == 1) {f <- NULL}else{f <- data[ ,namef]}
# newdata <- data
#
# if (length(ffr$pred.names) == 1) {
# data <- data[ ,c(ffr$response, varnames)]
# }else{
# data <- data[ ,c(ffr$response, varnames, namef)]
# }
#
# if (na.action == "na.omit"){
# data <- na.omit(data)
# }else{
# stop("The actual version of the package only supports 'na.omit' (observations are removed
# if they contain any missing values)")
# }
#
# n <- nrow(data)
# }
#
#
#
if (is.null(f))
f <- rep(1, n)
etiquetas <- unique(f)
nf <- length(etiquetas)
if (nf == 1) {
stop("Function not supported.
There is not factor in the model.")
}
if (is.null(h0)) {
h0 <- -1
}
if (is.null(h)) {
h <- rep(-1, nf)
} else {
if (length(h) == 1)
h <- rep(h, nf)
}
if (is.null(weights)) {
weights <- rep(1, n)
} else {
if (sum(weights) <= 0 || any(weights) < 0 || length(weights) != n)
stop("The specified weights are not correct")
}
if (smooth != "splines") {
umatrix <- matrix(runif(n*nboot), ncol = nboot, nrow = n)
globaltest <- .Fortran("globaltest_",
f = as.integer(f),
x = as.double(data[ ,varnames]),
y = as.double(data[ ,response]),
w = as.double(weights),
n = as.integer(n),
h0 = as.double(h0),
h = as.double(h),
nh = as.integer(nh),
p = as.integer(p),
kbin = as.integer(kbin),
#fact = as.integer(c(1:nf)),
fact = unique(as.integer(f)),
nf = as.integer(nf),
kernel = as.integer(kernel),
nboot = as.integer(nboot),
r = as.integer(der),
T = as.double(rep(-1, 4)),
pvalor = as.double(rep(-1, 4)),
# umatrix = as.double(umatrix)
#seed = as.integer(seed),
umatrix = array(umatrix, c(n, nboot)),
PACKAGE = "npregfast"
)
decision <- character(4)
for (j in 1:4){
if (globaltest$pvalor[j] < 0.05) {
decision[j] <- "Rejected"
} else {
decision[j] <- "Accepted"
}
}
res <- data.frame(cbind(Statistic = globaltest$T[1], pvalue = globaltest$pvalor[1]),
Decision = I(decision)[1])
# res=cbind(Statistic=round(globaltest$T,digits=4),pvalue=round(globaltest$pvalor,digits=4),Decision=I(decision))
# res=as.numeric(res) res=as.data.frame(res) class(res) <- 'globaltest'
#
}else{
mainfun_globaltest <- function(formula, data, weights, ...){
# grid
xgrid <- seq(min(data[ ,varnames]), max(data[ ,varnames]), length.out = kbin)
newd <- expand.grid(xgrid, unique(f))
names(newd) <- c(varnames, namef)
# estimations
data0 <- data
data0[,namef] <- 1
p <- array(NA, dim = c(kbin, 3, nf))
m <- gam(formula, weights = weights, data = data.frame(data0, weights), ...)
muhat <- as.vector(predict(m, type = "response"))
e <- data0[ ,response] - muhat
data2 <- data
data2[ ,response] <- e
m <- gam(formula, weights = weights, data = data.frame(data2, weights), ...)
pred1 <- as.vector(predict(m, newdata = newd, type = "response"))
p[, 1, 1:nf] <- pred1
t <- sum(abs(pred1))
if (der == 1) {
d1 <- apply(p, 3, function(z){D1D2(x = xgrid, y = z[, 1], deriv = 1)$D1})
t <- sum(abs(as.vector(d1)))
}
if (der == 2) {
d2 <- apply(p, 3, function(z){D1D2(x = xgrid, y = z[, 1], deriv = 2)$D2})
t <- sum(abs(as.vector(d2)))
}
return(t)
}
d <- mainfun_globaltest(formula, data = data, weights = weights, ...)
data0 <- data
data0[,namef] <- 1
m <- gam(formula, weights = weights, data = data.frame(data0, weights), ...)
muhatg <- as.vector(predict(m, type = "response"))
errg <- data0[ ,response] - muhatg
#estimo polinomios
#
if (der == 0) {pred1 <- rep(0, n)}
if (der == 1) {
data2 <- data
data2[ ,response] <- errg
formula0 <- paste(response, "~", namef)
m <- lm(formula0, weights = weights, data = data.frame(data2, weights), ...)
pred1 <- as.vector(predict(m, type = "response"))
}
if (der == 2) {
data2 <- data
data2[ ,response] <- errg
formula0 <- paste(response,"~", varnames, "*",namef)
m <- lm(formula0, weights = weights, data = data.frame(data2, weights), ...)
pred1 <- as.vector(predict(m, type = "response"))
}
muhatg2 <- muhatg + pred1
errg2 <- data0[ ,response] - muhatg2
yboot <- replicate(nboot, muhatg2 + errg2 *
sample(c(-sqrt(5) + 1, sqrt(5) + 1)/2, size = n,
replace = TRUE,
prob = c(sqrt(5) + 1, sqrt(5) - 1)/(2 * sqrt(5))))
i <- NULL
dboot <- foreach(i = 1:nboot) %dopar% {
datab <- data
datab[ ,response] <- yboot[, i]
aux <- mainfun_globaltest(formula, data = data.frame(datab, weights),
weights = weights, ...)
return(aux)
}
pvalue <- sum(dboot > d)/nboot
if (pvalue < 0.05) {
decision <- "Rejected"
} else {
decision <- "Acepted"
}
res <- data.frame(cbind(Statistic = d, pvalue = pvalue),
Decision = I(decision))
}
return(res)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.