R/cv-select.R
In cv: Cross-Validating Regression Models

Documented in coef.cvSelect compareFolds cv.function cvInfo.cvSelect selectModelList selectStepAIC selectTrans selectTransStepAIC

#' Cross-Validate a Model-Selection Procedure
#'
#' The \code{cv()} \code{"function"} method
#' is a general function to cross-validate a model-selection procedure,
#' such as the following:
#' \code{selectStepAIC()} is a procedure that applies the \code{\link[MASS]{stepAIC}()}
#' model-selection function in the \pkg{MASS} package; \code{selectTrans()} is a procedure
#' for selecting predictor and response transformations in regression, which
#' uses the \code{\link[car]{powerTransform}()} function in the
#' \pkg{car} package; \code{selectTransAndStepAIC()} combines predictor and response
#' transformations with predictor selection; and \code{selectModelList()}
#' uses cross-validation to select a model from a list of models created by
#' \code{\link{models}()} and employs (meta) cross-validation to assess the predictive
#' accuracy of this procedure.
#'
#' @param data full data frame for model selection.
#' @param y.expression normally the response variable is found from the
#' \code{model} or \code{working.model} argument; but if, for a particular selection procedure, the
#' \code{model} or \code{working.model} argument is absent, or if the response can't be inferred from the
#' model, the response can be specified by an expression, such as \code{expression(log(income))},
#' to be evaluated within the data set provided by the \code{data} argument.
#' @param k perform k-fold cross-validation (default is 10); \code{k}
#' may be a number or \code{"loo"} or \code{"n"} for n-fold (leave-one-out)
#' cross-validation.
#' @param confint if \code{TRUE} (the default if the number of cases is 400
#' or greater), compute a confidence interval for the bias-corrected CV
#' criterion, if the criterion is the average of casewise components.
#' @param level confidence level (default \code{0.95}).
#' @param details if \code{TRUE}, save detailed information about the value of the
#' CV criterion for the cases in each fold and the regression coefficients
#' (and possibly other information)
#' with that fold deleted; default is \code{TRUE} if \code{k} is 10 or smaller,
#' \code{FALSE} otherwise.
#' @param save.model save the model that's selected using the \emph{full} data set
#' (default, \code{FALSE}).
#' @param reps number of times to replicate k-fold CV (default is \code{1})
#' @param seed for R's random number generator; not used for n-fold cross-validation.
#' If not explicitly set, a seed is randomly generated and saved to make the results
#' reproducible. In some cases, for internal use only, \code{seed} is set to
#' \code{FALSE} to suppress automatically setting the seed.
#' @param ncores number of cores to use for parallel computations
#'        (default is \code{1}, i.e., computations aren't done in parallel)
#' @param ... for \code{cvSelect()} and the \code{cv()} \code{"function"} method,
#' arguments to be passed to \code{procedure()};
#' for \code{selectStepAIC()} and \code{selectTransStepAIC()},
#' arguments to be passed to \code{stepAIC()}.
#' @importFrom MASS stepAIC
#' @returns An object of class \code{"cvSelect"},
#' inheriting from class \code{"cv"}, with the CV criterion
#' (\code{"CV crit"}), the bias-adjusted CV criterion (\code{"adj CV crit"}),
#' the criterion for the model applied to the full data (\code{"full crit"}),
#' the confidence interval and level for the bias-adjusted CV criterion (\code{"confint"}),
#' the number of folds (\code{"k"}), the seed for R's random-number
#' generator (\code{"seed"}), and (optionally) a list of coefficients
#' (or, in the case of \code{selectTrans()}, estimated transformation
#' parameters, and in the case of \code{selectTransAndStepAIC()}, both regression coefficients
#' and transformation parameters) for the selected models
#' for each fold (\code{"coefficients"}).
#' If \code{reps} > \code{1}, then an object of class \code{c("cvSelectList", "cvList")} is returned,
#' which is literally a list of \code{c("cvSelect", "cv")} objects.
#' @details
#' The model-selection function supplied as the \code{procedure} (for \code{cvSelect()})
#' or \code{model} (for \code{cv()}) argument
#' should accept the following arguments:
#' \describe{
#'  \item{\code{data}}{set to the \code{data} argument to \code{cvSelect()} or \code{cv()}.}
#'  \item{\code{indices}}{the indices of the rows of \code{data} defining the current fold; if missing,
#'  the model-selection procedure is applied to the full \code{data}.}
#'   \item{other arguments}{to be passed via \code{...}
#'   from \code{cvSelect()} or \code{cv()}.}
#' }
#' \code{procedure()} or \code{model()} should return a list with the following
#' named elements: \code{fit.i}, the vector of predicted values for the cases in
#' the current fold computed from the model omitting these cases;
#' \code{crit.all.i}, the CV criterion computed for all of the cases using
#' the model omitting the current fold; and (optionally) \code{coefficients},
#' parameter estimates from the model computed omitting the current fold.
#'
#' When the \code{indices} argument is missing, \code{procedure()} returns the cross-validation criterion for all of the cases based on
#' the model fit to all of the cases.
#'
#' For examples of model-selection functions for the \code{procedure}
#' argument, see the code for \code{selectStepAIC()},
#' \code{selectTrans()}, and \code{selectTransAndStepAIC()}.
#'
#' For additional information, see the "Cross-validating model selection"
#' vignette (\code{vignette("cv-select", package="cv")})
#' and the "Extending the cv package" vignette
#' (\code{vignette("cv-extend", package="cv")}).
#'
#' @seealso \code{\link[MASS]{stepAIC}}, \code{\link[car]{bcPower}},
#' \code{\link[car]{powerTransform}}, \code{\link{cv}}.
#'
#' @param indices indices of cases in data defining the current fold.
#' @param model a regression model object fit to data, or for the
#' \code{cv()} \code{"function"} method, a model-selection procedure function
#' (see Details).
#' @param working.model a regression model object fit to data, typically
#' to begin a model-selection process; for use with \code{selectModelList()},
#' a list of competing models created by \code{\link{models}()}.
#' @param criterion a CV criterion ("cost" or lack-of-fit) function.
#' @param AIC if \code{TRUE} (the default) use the AIC as the
#' model-selection criterion; if \code{FALSE}, use the BIC.
#' The \code{k} argument to \code{\link[MASS]{stepAIC}()}
#' is set accordingly (note that this is distinct from the number of
#' folds \code{k}).
#' @examples
#' if (requireNamespace("ISLR2", quietly=TRUE)){
#' withAutoprint({
#' data("Auto", package="ISLR2")
#' m.auto <- lm(mpg ~ . - name - origin, data=Auto)
#' cv(selectStepAIC, Auto, seed=123, working.model=m.auto)
#' cv(selectStepAIC, Auto, seed=123, working.model=m.auto,
#'          AIC=FALSE, k=5, reps=3) # via BIC
#' })
#' } else {
#' cat("\n install the 'ISLR2' package to run these examples\n")
#' }

#' @describeIn cv.function \code{cv()} method for applying a model
#' model-selection (or specification) procedure.
#' @export
cv.function <- function(model,
                        data,
                        criterion = mse,
                        k = 10L,
                        reps = 1L,
                        seed = NULL,
                        working.model = NULL,
                        y.expression = NULL,
                        confint = n >= 400L,
                        level = 0.95,
                        details = k <= 10L,
                        save.model = FALSE,
                        ncores = 1L,
                        ...) {
  n <- nrow(data)
  cvSelect(
    procedure = model,
    data = data,
    criterion = criterion,
    criterion.name = deparse(substitute(criterion)),
    model = working.model,
    y.expression = y.expression,
    k = k,
    confint = confint,
    level = level,
    reps = reps,
    details = details,
    save.model = save.model,
    seed = seed,
    ncores = ncores,
    ...
  )
}

#' @describeIn cv.function select a regression model using the
#' \code{\link[MASS]{stepAIC}()} function in the \pkg{MASS} package.
#' @export
selectStepAIC <- function(data,
                          indices,
                          model,
                          criterion = mse,
                          AIC = TRUE,
                          details = TRUE,
                          save.model = FALSE,
                          ...) {
  y <- GetResponse(model)
  if (missing(indices)) {
    k. = if (AIC)
      2
    else
      log(nrow(data))
    model <- MASS::stepAIC(model, trace = FALSE, k = k., ...)
    yhat <- predict(model, newdata = data, type = "response")
    return(list(
      criterion = criterion(y, yhat),
      model = if (save.model)
        model
      else
        NULL
    ))
  }
  k. <- if (AIC)
    2
  else
    log(nrow(data) - length(indices))
  model <- update(model, data = data[-indices,])
  model.i <- MASS::stepAIC(model, trace = FALSE, k = k., ...)
  fit.all.i <- predict(model.i, newdata = data, type = "response")
  fit.i <- fit.all.i[indices]
  list(
    fit.i = fit.i,
    crit.all.i = criterion(y, fit.all.i),
    coefficients = if (details)
      coef(model.i)
    else
      NULL
  )
}


transX <- function(model,
                   predictors,
                   family = "bcPower",
                   rounded = TRUE,
                   data = insight::get_data(model)) {
  # Find transformations of predictors towards normality
  # Returns: a list of the selected transformations, with $lamdas and
  #          $gammas components (the latter may be NULL)
  # Args:
  #   model: e.g., an "lm" model object
  #   predictors: names of predictors to transform
  #   rounded: use "rounded" transformations, see ?car::powerTransform
  #   family: transformation family recognized by car::powerTransform()
  #   data: data to which the model was fit
  trans <- car::powerTransform(data[, predictors], family = family)
  lambdas <- if (rounded)
    trans$roundlam
  else
    trans$lambda
  names(lambdas) <- paste0("lam.", predictors)
  gammas <- trans$gamma
  if (!is.null(gammas)) {
    names(gammas) <- paste0("gam.", predictors)
  }
  list(lambdas = lambdas, gammas = gammas) # gammas may be NULL
}

transy <-  function(model,
                    family = "bcPower",
                    rounded = TRUE) {
  trans <- car::powerTransform(model, family = family)
  lambda <- if (rounded)
    trans$roundlam
  else
    trans$lambda
  gamma <- trans$gamma
  c(lambda = as.vector(lambda), gamma = as.vector(gamma))
}

bcPowerInverse <- function (y, lambda) {
  if (abs(lambda) < sqrt(.Machine$double.eps)) {
    exp(y)
  } else {
    (y * lambda + 1) ^ (1 / lambda)
  }
}

basicPowerInverse <- function (y, lambda) {
  if (abs(lambda) < sqrt(.Machine$double.eps)) {
    exp(y)
  } else {
    y ^ (1 / lambda)
  }
}

yjPowerInverse <- function(y, lambda) {
  neg <- y < 0
  y[!neg] <- if (abs(lambda) < sqrt(.Machine$double.eps)) {
    exp(y[!neg]) - 1
  } else {
    (y[!neg] * lambda + 1) ^ (1 / lambda) - 1
  }
  y[neg] <- if (abs(lambda - 2) < sqrt(.Machine$double.eps)) {
    -expm1(-y[neg])
  } else {
    1 - (-(2 - lambda) * y[neg] + 1) ^ (1 / (2 - lambda))
  }
  y
}

#' @param predictors character vector of names of the predictors in the model
#' to transform; if missing, no predictors will be transformed.
#' @param response name of the response variable; if missing, the response
#' won't be transformed.
#' @param family transformation family for the predictors, one of
#' \code{"bcPower", "bcnPower", "yjPower", "basicPower"},
#' with \code{"bcPower"} as the default. These are the names of transformation
#' functions in the \pkg{car} package; see \code{\link[car]{bcPower}()}.
#' @param family.y transformation family for the response,
#' with \code{"bcPower"} as the default.
#' @param rounded if \code{TRUE} (the default) use nicely rounded versions
#' of the estimated transformation parameters (see \code{\link[car]{bcPower}()}).
#' @examples
#' if (requireNamespace("carData", quietly=TRUE)){
#' withAutoprint({
#' data("Prestige", package="carData")
#' m.pres <- lm(prestige ~ income + education + women,
#'              data=Prestige)
#' cvt <- cv(selectTrans, data=Prestige, working.model=m.pres, seed=123,
#'           predictors=c("income", "education", "women"),
#'           response="prestige", family="yjPower")
#' cvt
#' compareFolds(cvt)
#' coef(cvt, average=median, NAs=1) # NAs not really needed here
#' cv(m.pres, seed=123)
#' })
#' } else {
#' cat("install the 'carData' package to run these examples\n")
#' }
#' @describeIn cv.function select transformations of the predictors and response
#' using \code{\link[car]{powerTransform}()} in the \pkg{car} package.
#' @export
selectTrans <- function(data,
                        indices,
                        details = TRUE,
                        save.model = FALSE,
                        model,
                        criterion = mse,
                        predictors,
                        response,
                        family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
                        family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
                        rounded = TRUE,
                        ...) {
  if (missing(predictors) && missing(response))
    stop(
      "'predictors' and 'response' arguments both missing;",
      "\n no transformations specified"
    )
  y <- GetResponse(model)
  family <- match.arg(family)
  family.y <- match.arg(family.y)
  powertrans <- switch(
    family,
    bcPower = car::bcPower,
    bcnPower = car::bcnPower,
    yjPower = car::yjPower,
    basicPower = car::basicPower
  )

  powertrans.y <- switch(
    family.y,
    bcPower = car::bcPower,
    bcnPower = car::bcnPower,
    yjPower = car::yjPower,
    basicPower = car::basicPower
  )

  inverse.pt.y <- switch(
    family.y,
    bcPower = bcPowerInverse,
    bcnPower = car::bcnPowerInverse,
    yjPower = yjPowerInverse,
    basicPower = basicPowerInverse
  )

  if (missing(indices)) {
    indices <- nrow(data) + 1L # will use full sample
    full.sample <- TRUE
  } else {
    # refit model omitting current fold
    model <- update(model, data = data[-indices,])
    full.sample <- FALSE
  }

  # find predictor transformations:
  if (!missing(predictors)) {
    trans <- transX(model, predictors, family, rounded)
    lambdas <- trans$lambdas
    gammas <- trans$gammas # nb: there may be no gammas
    # transform predictors:
    for (pred in predictors) {
      data[, pred] <- if (is.null(gammas)) {
        powertrans(data[, pred], lambdas[paste0("lam.", pred)])
      } else {
        powertrans(data[, pred],
                   lambda = lambdas[paste0("lam.", pred)],
                   gamma = gammas[paste0("gam.", pred)])
      }
    }
    model <- update(model, data = data[-indices,])
  } else {
    lambdas <- gammas <- NULL
  }

  # transform response:
  if (!missing(response)) {
    transy <- transy(model, family.y)
    data[, response] <- if (is.na(transy["gamma"])) {
      powertrans.y(data[, response], lambda = transy["lambda"])
    } else {
      powertrans.y(data[, response],
                   lambda = transy["lambda"],
                   gamma = transy["gamma"])
    }
    # refit model with transformed predictors and response,
    #   omitting current fold
    model <- update(model, data = data[-indices,])
  } else {
    transy <- NULL
  }

  # get predicted values for *all* cases:

  fit.o.i <- if (!missing(response)) {
    if (is.na(transy["gamma"])) {
      inverse.pt.y(predict(model, newdata = data),
                   lambda = transy["lambda"])
    } else {
      inverse.pt.y(predict(model, newdata = data),
                   lambda = transy["lambda"],
                   gamma = transy["gamma"])
    }
  } else {
    predict(model, newdata = data)
  }

  if (full.sample) {
    return(list(
      criterion = criterion(y, fit.o.i),
      model = if (save.model){
        if (details){
          model$additional.coefficients <- c(lambdas, gammas, transy)
        }
        model
      }
      else
        NULL
    ))
  }
  # ... and for current fold only:
  # compute and return CV info and transformation parameters:
  list(
    fit.i = fit.o.i[indices],
    crit.all.i = criterion(y, fit.o.i),
    coefficients = if (details)
      c(lambdas, gammas, transy)
  )
}


#' @describeIn cv.function select transformations of the predictors and response,
#' and then select predictors.
#' @examples
#' if (requireNamespace("ISLR2", quietly=TRUE)){
#' withAutoprint({
#' Auto$year <- as.factor(Auto$year)
#' Auto$origin <- factor(Auto$origin,
#'                       labels=c("America", "Europe", "Japan"))
#' rownames(Auto) <- make.names(Auto$name, unique=TRUE)
#' Auto$name <- NULL
#' m.auto <- lm(mpg ~ . , data=Auto)
#' cvs <- cv(selectTransStepAIC, data=Auto, seed=76692, working.model=m.auto,
#'           criterion=medAbsErr,
#'           predictors=c("cylinders", "displacement", "horsepower",
#'                        "weight", "acceleration"),
#'           response="mpg", AIC=FALSE)
#' cvs
#' compareFolds(cvs)
#' })
#' }
#' @export
selectTransStepAIC <- function(data,
                               indices,
                               details = TRUE,
                               save.model = FALSE,
                               model,
                               criterion = mse,
                               predictors,
                               response,
                               family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
                               family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
                               rounded = TRUE,
                               AIC = TRUE,
                               ...) {
  family <- match.arg(family)
  family.y <- match.arg(family.y)

  powertrans <- switch(
    family,
    bcPower = car::bcPower,
    bcnPower = car::bcnPower,
    yjPower = car::yjPower,
    basicPower = car::basicPower
  )

  powertrans.y <- switch(
    family.y,
    bcPower = car::bcPower,
    bcnPower = car::bcnPower,
    yjPower = car::yjPower,
    basicPower = car::basicPower
  )

  inverse.pt.y <- switch(
    family.y,
    bcPower = bcPowerInverse,
    bcnPower = car::bcnPowerInverse,
    yjPower = yjPowerInverse,
    basicPower = basicPowerInverse
  )

  y <- GetResponse(model) # untransformed response

  # find tranformations of predictors and/or response:

  # if indices is missing, use full data set;
  # otherwise remove cases in current fold (indices)
  inds <- if (missing(indices))
    length(y) + 1L
  else
    indices

  trans <- if (!missing(predictors) && !missing(response)) {
    selectTrans(
      data = data,
      indices = inds,
      save.coef = TRUE,
      model = model,
      criterion = criterion,
      predictors = predictors,
      response = response,
      family = family,
      family.y = family.y,
      rounded = rounded,
      ...
    )

  } else if (!missing(predictors)) {
    selectTrans(
      data = data,
      indices = inds,
      save.coef = TRUE,
      model = model,
      criterion = criterion,
      predictors = predictors,
      family = family,
      family.y = family.y,
      rounded = rounded,
      ...
    )

  } else if (!missing(response)) {
    selectTrans(
      data = data,
      indices = inds,
      save.coef = TRUE,
      model = model,
      criterion = criterion,
      response = response,
      family = family,
      family.y = family.y,
      rounded = rounded,
      ...
    )

  } else {
    stop(
      "'predictors' and 'response' arguments both missing;",
      "\n no transformations specified"
    )
  }

  # apply transformations to the data:

  powers <- coef(trans)
  if (!missing(predictors)) {
    if (family == "bcnPower") {
      for (predictor in predictors) {
        lambda <- powers[paste0("lam.", predictor)]
        gamma <- powers[paste0("gam.", predictor)]
        data[, predictor] <- car::bcnPower(data[, predictor],
                                           lambda = lambda, gamma = gamma)
      }
    } else {
      for (predictor in predictors) {
        lambda <- powers[paste0("lam.", predictor)]
        data[, predictor] <-
          do.call(powertrans, list(U = data[, predictor],
                                   lambda = lambda))
      }
    }
  }

  if (!missing(response)) {
    if (family.y == "bcnPower") {
      lambda <- powers["lambda"]
      gamma <- powers["gamma"]
      data[, response] <- car::bcnPower(data[, response],
                                        lambda = lambda, gamma = gamma)
    } else {
      lambda <- powers["lambda"]
      data[, response] <-
        do.call(powertrans.y, list(U = data[, response],
                                   lambda = lambda))
    }
  }

  # re-estimate with transformed data:
  model <- update(model, data = data)

  # perform variable selection:

  if (missing(indices)) {
    k. = if (AIC)
      2
    else
      log(nrow(data))
    model.i <- MASS::stepAIC(model, trace = FALSE, k = k., ...)
    fit.all.i <- predict(model.i, newdata = data, type = "response")
  } else {
    k. <- if (AIC)
      2
    else
      log(nrow(data) - length(indices))
    model <- update(model, data = data[-indices,])
    model.i <- MASS::stepAIC(model, trace = FALSE, k = k., ...)
    fit.all.i <- predict(model.i, newdata = data, type = "response")
  }

  # express fitted values on original response scale
  if (!missing(response)) {
    if (is.na(powers["gamma"])) {
      fit.all.i  <- inverse.pt.y(fit.all.i,
                                 lambda = powers["lambda"])
    } else {
      fit.all.i  <- inverse.pt.y(fit.all.i,
                                 lambda = powers["lambda"],
                                 gamma = powers["gamma"])
    }
  }

  # for full sample:
  if (missing(indices)) {
    return(list(
      criterion = criterion(y, fit.all.i),
      model = if (save.model){
        model.i$additional.coefficients <- powers
        model.i
      }
      else
        NULL
    ))
  }

  # ... and for current fold only:
  # compute and return CV info, transformation parameters,
  #   and regression coefficients:
  list(
    fit.i = fit.all.i[indices],
    crit.all.i = criterion(y, fit.all.i),
    coefficients = if (details)
      c(powers, coef(model.i))
    else
      NULL
  )
}

#' @describeIn cv.function select a model using (meta) CV.
#' @param quietly if \code{TRUE} (the default), simple messages (for example about the
#' value to which the random-number generator seed is set), but not warnings or
#' errors, are suppressed.
#' @param k.meta the number of folds for meta CV; defaults
#' to the value of \code{k}; may be specified as \code{"loo"} or
#' \code{"n"} as well as an integer.
#' @export
selectModelList <-
  function(data,
           indices,
           model,
           criterion = mse,
           k = 10L,
           k.meta = k,
           details =  k <= 10L,
           save.model = FALSE,
           seed = FALSE,
           quietly = TRUE,
           ...) {
    if (missing(indices)) {
      if ((!isFALSE(seed)) && is.null(seed)) {
        seed <- sample(1e6, 1L)
        set.seed(seed)
        message("R RNG seed set to ", seed)
      }
      result <-
        cv(
          model,
          data,
          criterion = criterion,
          k = k.meta,
          details = FALSE,
          quietly = quietly,
          seed = FALSE,
          ...
        )
      cv.min <- which.min(sapply(result, function(x)
        x$"CV crit"))
      return(list(criterion = result[[cv.min]][["CV crit"]],
                  model = if (save.model)
                    model[[cv.min]]
                  else
                    NULL))
    }
    y <- GetResponse(model[[1L]])
    for (i in seq_along(model)) {
      mod <- model[[i]]
      model[[i]] <- update(mod, data = data[-indices,])
    }
    result <-
      cv(
        model,
        data[-indices,],
        criterion = criterion,
        k = k.meta,
        details = details,
        quietly = quietly,
        seed = FALSE,
        ...
      )
    cv.min <- which.min(sapply(result, function(x)
      x$"CV crit"))
    model.name <- names(result)[cv.min]
    fit.all.i <-
      predict(model[[cv.min]], newdata = data, type = "response")
    fit.i <- fit.all.i[indices]
    list(
      fit.i = fit.i,
      crit.all.i = criterion(y, fit.all.i),
      coefficients = if (details)
        coef(model[[cv.min]])
      else
        NULL,
      model.name = model.name
    )
  }
#' @examples
#' data("Duncan", package="carData")
#' m1 <- lm(prestige ~ income + education, data=Duncan)
#' m2 <- lm(prestige ~ income + education + type, data=Duncan)
#' m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
#' summary(cv.sel <- cv(selectModelList, data=Duncan, seed=5963,
#'                      working.model=models(m1, m2, m3),
#'                      save.model=TRUE)) # meta CV
#' cvInfo(cv.sel, "selected model")
#'
#' @describeIn cv.function print the coefficients from the selected models
#' for the several folds.
#' @param object an object of class \code{"cvSelect"}.
#' @param digits significant digits for printing coefficients
#' (default \code{3}).
#' @export
compareFolds <- function(object, digits = 3, ...) {
  UseMethod("compareFolds")
}

#' @export
compareFolds.default <- function(object, digits = 3, ...) {
  coefficients <- object$details$coefficients
  if (is.null(coefficients))
    stop("details for folds not available")
  cat("CV criterion by folds:\n")
  print(object$details$criterion)
  names <- unlist(lapply(coefficients, names))
  counts <- table(names)
  counts <- sort(counts, decreasing = TRUE)
  table <- matrix(NA, length(coefficients), length(counts))
  colnames(table) <- names(counts)
  rownames(table) <- paste("Fold", seq(along = coefficients))
  for (i in seq(along = coefficients)) {
    table[i, names(coefficients[[i]])] <- coefficients[[i]]
  }
  cat("\nCoefficients by folds:\n")
  printCoefmat(table, na.print = "", digits = digits)
}

#' @describeIn cv.function extract the coefficients from the selected models
#' for the several folds and possibly average them.
#' @param average if supplied, a function, such as \code{mean} or \code{median},
#' to use us in averaging estimates across folds; if missing, the
#' estimates for each fold are returned.
#' @param NAs values to substitute for \code{NA}s in calculating
#' averaged estimates; the default, \code{0}, is appropriate, e.g.,
#' for regression coefficients; the value \code{1} might be appropriate
#' for power-transformation estimates.
#' @importFrom utils capture.output
#' @export
coef.cvSelect <- function(object, average, NAs = 0, ...) {
  capture.output(coef <- compareFolds(object))
  if (missing(average)) {
    return(coef)
  }
  coef[is.na(coef)] <- NAs
  apply(coef, 2, average)
}

#' @param what the information to extract from a \code{"cvSelect"} object,
#' one of: \code{"CV criterion"}, \code{"adjusted CV criterion"},
#' \code{"full CV criterion"} (the CV criterion applied to the model fit to the
#' full data set), \code{"SE"} (the standard error of the adjusted CV criterion),
#' \code{"confint"} (confidence interval for the adjusted CV criterion),
#' \code{"k"}, (the number of folds), \code{"seed"} (the seed employed for
#' R's random-number generator), \code{"method"} (the computational method
#' employed, e.g., for a \code{"lm"} model object), \code{"criterion name"}
#' (the CV criterion employed), or \code{"selected model"} (the model object
#' for the model that was selected); not all of these elements may be present, in
#' which case \code{cvInfo()} would return \code{NULL}.
#' @rdname cv.function
#' @export
cvInfo.cvSelect <- function(object,
                            what=c("CV criterion",
                                   "adjusted CV criterion",
                                   "full CV criterion",
                                   "confint", "SE", "k", "seed",
                                   "method", "criterion name",
                                   "selected model"),
                            ...){
  what <- match.arg(what)
  if (what == "selected model"){
    object$selected
  } else {
    NextMethod()
  }
}
Any scripts or data that you put into this service are public.
cv documentation built on Sept. 23, 2024, 1:08 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
cv
Cross-Validating Regression Models

R/cv-select.R
In cv: Cross-Validating Regression Models

Defines functions cvInfo.cvSelect coef.cvSelect compareFolds.default compareFolds selectModelList selectTransStepAIC selectTrans yjPowerInverse transy transX selectStepAIC cv.function

Documented in coef.cvSelect compareFolds cv.function cvInfo.cvSelect selectModelList selectStepAIC selectTrans selectTransStepAIC

Try the cv package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

cv Cross-Validating Regression Models

R/cv-select.R In cv: Cross-Validating Regression Models

Defines functions cvInfo.cvSelect coef.cvSelect compareFolds.default compareFolds selectModelList selectTransStepAIC selectTrans yjPowerInverse transy transX selectStepAIC cv.function

Documented in coef.cvSelect compareFolds cv.function cvInfo.cvSelect selectModelList selectStepAIC selectTrans selectTransStepAIC

Try the cv package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

cv
Cross-Validating Regression Models

R/cv-select.R
In cv: Cross-Validating Regression Models