cv.cmls: Cross-Validation for cmls


cv.cmls    R Documentation

Cross-Validation for cmls

Description

Performs k-fold or generalized cross-validation (GCV) to tune the constraint options for cmls. The model can be tuned over any combination of the arguments const, df, degree, and intercept.
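The k-fold idea can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes unconstrained least squares (via qr.solve) as a stand-in for a cmls fit, and `kfold_ls_loss` is a hypothetical helper name. Each fold is held out once, B is estimated on the remaining rows, and the loss on the held-out rows is averaged over folds (whether cv.cmls averages per fold or pools all held-out residuals is an assumption here).

```r
# Hypothetical helper: k-fold CV loss for Y ~ X B, using unconstrained
# least squares (qr.solve) as a stand-in for cmls. Illustration only.
kfold_ls_loss <- function(X, Y, nfolds = 5, foldid = NULL, mse = TRUE) {
  n <- nrow(X)
  if (is.null(foldid)) {
    # balanced random assignment of the n rows to nfolds folds
    foldid <- sample(rep(seq_len(nfolds), length.out = n))
  }
  folds <- unique(foldid)
  fold_loss <- numeric(length(folds))
  for (k in seq_along(folds)) {
    test <- foldid == folds[k]
    # estimate B on the training rows only
    B <- qr.solve(X[!test, , drop = FALSE], Y[!test, , drop = FALSE])
    # residuals on the held-out rows
    resid <- Y[test, , drop = FALSE] - X[test, , drop = FALSE] %*% B
    fold_loss[k] <- if (mse) mean(resid^2) else mean(abs(resid))
  }
  mean(fold_loss)  # average of the per-fold losses
}

set.seed(1)
X <- matrix(rnorm(50 * 2), nrow = 50, ncol = 2)
B <- matrix(c(1, -1, 0.5, 2), nrow = 2)
Y <- X %*% B + matrix(rnorm(50 * 2, sd = 0.1), nrow = 50)
kfold_ls_loss(X, Y, nfolds = 5)
```

In cv.cmls the same loop is repeated once for every row of the tuning grid, with a constrained cmls fit in place of qr.solve.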

Usage

cv.cmls(X, Y, nfolds = 2, foldid = NULL, parameters = NULL,
        const = "uncons", df = 10, degree = 3, intercept = TRUE,
        mse = TRUE, parallel = FALSE, cl = NULL, verbose = TRUE, ...)

Arguments

X

Matrix of dimension n x p.

Y

Matrix of dimension n x m.

nfolds

Number of folds for k-fold cross-validation. Ignored if foldid argument is provided. Set nfolds=1 for generalized cross-validation (GCV).
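With nfolds = 1 no data are held out; GCV instead inflates the in-sample error according to the model's effective degrees of freedom. The sketch below applies the classic GCV formula for a linear smoother, mean squared residual / (1 - tr(H)/n)^2, to an unconstrained least squares fit, where tr(H) equals the rank of X. How cv.cmls counts effective degrees of freedom for constrained or spline-based fits is not shown here, and `gcv_ls` is a hypothetical helper.

```r
# Hypothetical helper: GCV score for an unconstrained least squares fit
# Y ~ X B. For OLS the hat matrix H = X (X'X)^{-1} X' has trace rank(X).
gcv_ls <- function(X, Y) {
  n <- nrow(X)
  fit <- qr(X)
  resid <- qr.resid(fit, Y)   # in-sample residuals, one column per response
  edf <- fit$rank             # tr(H) for ordinary least squares
  mean(resid^2) / (1 - edf / n)^2
}

set.seed(1)
X <- matrix(rnorm(50 * 2), nrow = 50, ncol = 2)
Y <- X %*% matrix(c(1, -1, 0.5, 2), nrow = 2) +
  matrix(rnorm(50 * 2, sd = 0.5), nrow = 50)
gcv_ls(X, Y)
```

Because GCV needs only a single fit per parameter combination, it is cheaper than k-fold CV, which needs nfolds fits per combination.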

foldid

Factor or integer vector of length n giving the fold identification for each observation.
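A balanced foldid can be built with sample(rep(...)), as in the Examples. The sketch below shows what happens when n is not a multiple of the number of folds: rep(..., length.out = n) recycles the fold labels, so fold sizes differ by at most one.

```r
# Balanced random folds for n = 53 observations and 5 folds.
set.seed(2)
n <- 53
foldid <- sample(rep(1:5, length.out = n))
table(foldid)   # folds 1-3 have 11 observations, folds 4-5 have 10
```

Supplying the same foldid to several cv.cmls calls keeps the fold split fixed, so their cvloss values are directly comparable.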

parameters

Parameters for tuning. Data frame with columns const, df, degree, and intercept. See Details.
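Option (A) in Details lets you list exactly which combinations to evaluate instead of crossing every value. The rows below are a hypothetical hand-picked set; storing const as a character column (rather than a factor) is an assumption about what cv.cmls accepts.

```r
# A hand-picked set of tuning combinations (option A in Details).
# Each row is one candidate; columns are named const, df, degree, intercept.
params <- data.frame(
  const     = c("smooth", "smooth", "smooth", "uncons"),
  df        = c(5L, 10L, 15L, 10L),
  degree    = 3L,
  intercept = TRUE,
  stringsAsFactors = FALSE
)
params
```

The data frame would then be supplied as cv.cmls(X = Xmat, Y = Ymat, nfolds = 5, parameters = params).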

const

Parameters for tuning. Character vector of constraint options to try (see const for the available labels). See Details.

df

Parameters for tuning. Integer vector of degrees of freedom to try. See Details.

degree

Parameters for tuning. Integer vector of polynomial degrees to try. See Details.

intercept

Parameters for tuning. Logical vector of intercept options to try (TRUE and/or FALSE). See Details.

mse

If TRUE (default), the mean squared error is used as the CV loss function. Otherwise the mean absolute error is used.
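The two loss choices differ only in how residuals are aggregated. The sketch applies both to one hypothetical held-out residual matrix; `cv_loss` is a hypothetical helper, and how cv.cmls pools the loss across folds and responses is an assumption not shown here.

```r
# Illustrative only: the two CV loss choices applied to held-out predictions.
cv_loss <- function(Y, Yhat, mse = TRUE) {
  resid <- Y - Yhat
  if (mse) mean(resid^2) else mean(abs(resid))
}

Y    <- matrix(c(1, 2, 3, 4), nrow = 2)
Yhat <- matrix(c(1, 2, 5, 4), nrow = 2)
cv_loss(Y, Yhat)               # MSE: (0 + 0 + 4 + 0) / 4 = 1
cv_loss(Y, Yhat, mse = FALSE)  # MAE: (0 + 0 + 2 + 0) / 4 = 0.5
```

Squaring makes MSE weight the single large error more heavily than MAE does, so mse = FALSE is less sensitive to a few poorly predicted observations.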

parallel

Logical indicating whether the tuning should be computed in parallel via parSapply (parallel package). See Examples.

cl

Cluster created by makeCluster. Only used when parallel = TRUE. Recommended usage: cl = makeCluster(detectCores()).
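The cleanup pattern below is a suggestion rather than part of cv.cmls: creating the cluster inside a function and registering stopCluster with on.exit releases the worker processes even if the tuning call fails. The per-row function `toy_loss` is a hypothetical stand-in for the cmls fits that would run on each worker.

```r
library(parallel)

# Hypothetical per-row "loss"; in cv.cmls this would be the CV fit
# for one row of the parameter grid.
toy_loss <- function(i, grid) (grid$df[i] - 10)^2

grid <- expand.grid(df = 5:15, degree = 3)

run_grid <- function(grid, ncores = 2L) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl), add = TRUE)  # cluster is released even on error
  parSapply(cl, seq_len(nrow(grid)), toy_loss, grid = grid)
}

losses <- run_grid(grid)
grid$df[which.min(losses)]   # 10
```

Note that detectCores() can return NA on platforms where the core count is unknown, so a fixed worker count such as 2L is the safer default in scripts.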

verbose

If TRUE, tuning progress is printed via txtProgressBar. Ignored if parallel = TRUE.

...

Additional arguments passed to the cmls function, e.g., z, struc, or backfit.

Details

The parameters for tuning can be supplied via one of two options:

(A) Using the parameters argument. In this case, the argument parameters must be a data frame with columns const, df, degree, and intercept, where each row gives a combination of parameters for the CV tuning.

(B) Using the const, df, degree, and intercept arguments. In this case, the expand.grid function is used to create the parameters data frame, which contains all combinations of the arguments const, df, degree, and intercept. Duplicates are removed before the CV tuning.
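Option (B) can be mimicked in base R to see how large a grid a call implies. This is an illustration of the expand.grid-then-deduplicate step described above, not the package's internal code; the duplicated "smooth" entry is deliberate.

```r
# Cross all supplied values, then drop duplicate rows (option B).
grid_all <- expand.grid(const = c("smooth", "smooth", "uncons"),
                        df = c(5, 10),
                        degree = 3,
                        intercept = TRUE,
                        stringsAsFactors = FALSE)
nrow(grid_all)         # 3 * 2 * 1 * 1 = 6 combinations
grid <- unique(grid_all)
nrow(grid)             # 4 after removing the duplicated "smooth" rows
```

Each remaining row is evaluated once per fold with k-fold CV, so the Examples' call with 20 constraints and df = 5:15 implies 20 * 11 = 220 combinations, or 1,100 model fits with 5 folds.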

Value

best.parameters

Best combination of parameters, i.e., the combination that minimizes the cvloss.

top5.parameters

Top five combinations of parameters, i.e., the combinations that give the five smallest values of the cvloss.

full.parameters

Full set of parameters. Data frame with cvloss (GCV, MSE, or MAE) for each combination of parameters.
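The best and top-five components are the rows of full.parameters with the smallest cvloss. The table below uses made-up cvloss values purely for illustration, and ordering with order() is an assumption about how ties are broken.

```r
# Hypothetical full.parameters table with made-up cvloss values.
full <- data.frame(const = "smooth", df = 5:11, degree = 3, intercept = TRUE,
                   cvloss = c(0.40, 0.31, 0.27, 0.26, 0.28, 0.30, 0.33))
ord  <- order(full$cvloss)
best <- full[ord[1], ]
top5 <- full[ord[seq_len(min(5, nrow(full)))], ]
best$df    # 8
top5$df    # 8 7 9 10 6
```

The same ordering of full.parameters can be used to inspect more than five candidates, for example to prefer a smaller df whose cvloss is close to the minimum.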

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Helwig, N. E. (in prep). Constrained multivariate least squares in R.

See Also

See the cmls and const functions for further details on the available constraint options.

Examples

# make X
set.seed(1)
n <- 50
m <- 20
p <- 2
Xmat <- matrix(rnorm(n*p), nrow = n, ncol = p)


# make B (which satisfies all constraints except monotonicity)
x <- seq(0, 1, length.out = m)
Bmat <- rbind(sin(2*pi*x), sin(2*pi*x+pi)) / sqrt(4.75)
struc <- rbind(rep(c(TRUE, FALSE), each = m / 2),
               rep(c(FALSE, TRUE), each = m / 2))
Bmat <- Bmat * struc


# make noisy data
Ymat <- Xmat %*% Bmat + rnorm(n*m, sd = 0.5)


# 5-fold CV:  tune df (5,...,15) for const = "smooth"
kcv <- cv.cmls(X = Xmat, Y = Ymat, nfolds = 5,
               const = "smooth", df = 5:15)
kcv$best.parameters
kcv$top5.parameters
plot(kcv$full.parameters$df, kcv$full.parameters$cvloss, type = "b",
     xlab = "df", ylab = "CV loss")


## Not run: 

# sample foldid for 5-fold CV
set.seed(2)
foldid <- sample(rep(1:5, length.out = n))


# 5-fold CV:  tune df (5,...,15) w/ all 20 relevant constraints (no struc)
#             using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15)
})
kcv$best.parameters
kcv$top5.parameters


# 5-fold CV:  tune df (5,...,15) w/ all 20 relevant constraints (no struc)
#             using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
library(parallel)  # provides makeCluster() and stopCluster()
system.time({
  cl <- makeCluster(2L)  # using 2 cores
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15,
                 parallel = TRUE, cl = cl)
  stopCluster(cl)
})
kcv$best.parameters
kcv$top5.parameters


# 5-fold CV:  tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
#             using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc)
})
kcv$best.parameters
kcv$top5.parameters


# 5-fold CV:  tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
#             using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
library(parallel)  # provides makeCluster() and stopCluster()
system.time({
  cl <- makeCluster(2L)  # using 2 cores
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc,
                 parallel = TRUE, cl = cl)
  stopCluster(cl)
})
kcv$best.parameters
kcv$top5.parameters


## End(Not run) 


CMLS documentation built on April 3, 2023, 5:24 p.m.