cv.pengls: Perform cross-validation for pengls
In sthawinke/pengls: Fit Penalised Generalised Least Squares models

View source: R/cv.pengls.R

cv.pengls

R Documentation

Perform cross-validation for pengls

Description

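Perform cross-validation for a penalised generalised least squares (pengls) model, in order to select the penalty parameter lambda.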

Usage

cv.pengls(
  data,
  glsSt,
  xNames,
  outVar,
  corMat,
  nfolds,
  foldid,
  scale = FALSE,
  center = FALSE,
  cvType = "blocked",
  lambdas,
  transFun = "identity",
  exclude = NULL,
  transFunArgs = list(),
  loss = c("R2", "MSE"),
  verbose = FALSE,
  ...
)

Arguments

`data`	A data matrix or data frame
`glsSt`	a correlation structure, as supplied to the "correlation" argument of nlme::gls
`xNames`	names of the regressors in data
`outVar`	name of the outcome variable in data
`corMat`	a starting value for the correlation matrix. Taken to be a diagonal matrix if missing
`nfolds`	an integer, the number of folds used in cv.glmnet to find lambda
`foldid`	An optional vector giving the fold assignment of each observation
`scale, center`	booleans: should regressors be centered to zero mean (center) and scaled to unit variance (scale)? Both default to FALSE
`cvType`	A character string defining the type of cross-validation, either "random" or "blocked". Ignored if foldid is provided
`lambdas`	an optional lambda sequence
`transFun`	a transformation function to apply to predictions and outcome in the cross-validation
`exclude`	indices of predictors to be excluded from intercept + xNames
`transFunArgs`	Additional arguments passed on to transFun
`loss`	a character string, currently either "R2" or "MSE", indicating the loss function (strictly speaking, R2 is not a proper loss)
`verbose`	a boolean, should output be printed?
`...`	passed on to glmnet::glmnet
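The optional foldid and lambdas arguments are plain vectors that can be built before calling cv.pengls. The sketch below is an illustration under stated assumptions, not the package's internal code: it assumes that "blocked" folds should be spatially contiguous and builds them with k-means clustering on the coordinates (a technique chosen here purely for illustration; cv.pengls may form its blocks differently), builds balanced random folds for comparison, and sets up a decreasing, log-spaced lambda grid. The helpers makeBlockedFolds and makeRandomFolds are hypothetical.

```r
# Hypothetical helper: spatially blocked folds via k-means on the coordinates
# (an illustrative choice; not necessarily how cv.pengls builds "blocked" folds)
makeBlockedFolds <- function(coords, nfolds) {
  km <- kmeans(coords, centers = nfolds, nstart = 10)
  km$cluster # nearby points share a cluster, hence a fold
}

# Hypothetical helper: balanced random folds, ignoring spatial position
makeRandomFolds <- function(n, nfolds) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
coords <- expand.grid(x = seq_len(10), y = seq_len(10))
blockedFolds <- makeBlockedFolds(coords, nfolds = 5)
randomFolds <- makeRandomFolds(nrow(coords), nfolds = 5)
table(randomFolds) # 20 observations in each fold

# A decreasing, log-spaced lambda grid (the range is an arbitrary example)
lambdaGrid <- exp(seq(log(1), log(1e-3), length.out = 20))
```

Either fold vector could then be supplied as foldid (in which case cvType is ignored), and lambdaGrid as lambdas.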

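The transFun and transFunArgs arguments follow a "function plus extra arguments" pattern. A minimal base R sketch of that pattern, assuming that transFun may be given either as a function or as a function name (the default is the string "identity") and that the contents of transFunArgs are appended after the values being transformed; the helper applyTransFun is hypothetical and not part of pengls:

```r
# Hypothetical helper: resolve `transFun` and call it with extra arguments
# from `transFunArgs`, as would be done for both predictions and outcome
applyTransFun <- function(x, transFun = "identity", transFunArgs = list()) {
  f <- match.fun(transFun) # accepts a function or the name of one
  do.call(f, c(list(x), transFunArgs))
}

obs <- c(0.2, 1.5, -0.3)
pred <- c(0.1, 1.2, -0.4)
applyTransFun(pred) # identity: values unchanged
# Example transformation: truncate both vectors at zero before scoring
obsT <- applyTransFun(obs, pmax, list(0))
predT <- applyTransFun(pred, pmax, list(0))
```

The transformed outcome and predictions would then be compared with the chosen loss.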
Value

A list with components

`lambda`	The series of lambdas
`cvm`	The vector of mean R2 values (mean MSE values if loss = "MSE"), one per lambda
`cvsd`	The standard error of the loss at the optimum (maximal R2 or minimal MSE)
`cvOpt`	The R2 (or MSE) according to the 1 standard error rule
`coefs`	The matrix of coefficients for every lambda value
`bestFit`	The best fitting pengls model according to the 1 standard error rule
`lambda.min`	Lambda value with maximal R2 (or minimal MSE)
`lambda.1se`	Smallest lambda value within 1 standard error from the optimum
`foldid`	The folds
`glsSt`	The nlme correlation object
`loss`	The loss function used
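The relation between per-fold results and the summaries cvm, cvsd and lambda.min can be sketched with toy numbers. This sketch assumes the conventional definitions R2 = 1 - SSE/SST and MSE = mean squared error, a standard error computed as the standard deviation over folds divided by the square root of the number of folds, and made-up per-fold R2 values; cv.pengls may differ in these details. It stops at the set of lambdas within one standard error of the optimum, from which lambda.1se is picked as described in the list above.

```r
# Conventional definitions of the two criteria named by `loss`
# (assumption: cv.pengls may compute them differently in detail)
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
mse <- function(obs, pred) mean((obs - pred)^2)

# Made-up per-fold R2 values: rows = folds, columns = lambdas
lambdas <- c(1, 0.5, 0.25, 0.1)
foldR2 <- rbind(c(0.10, 0.30, 0.35, 0.35),
                c(0.12, 0.28, 0.33, 0.33),
                c(0.08, 0.31, 0.36, 0.34))

cvm <- colMeans(foldR2)                            # mean R2 per lambda
cvsd <- apply(foldR2, 2, sd) / sqrt(nrow(foldR2))  # standard error per lambda
iMax <- which.max(cvm)                             # R2 is maximised
lambda.min <- lambdas[iMax]

# Lambdas whose mean R2 lies within one standard error of the maximum
within1se <- lambdas[cvm >= cvm[iMax] - cvsd[iMax]]
```

With loss = "MSE" the direction flips: which.min replaces which.max, and the candidate set becomes cvm <= cvm[iMin] + cvsd[iMin].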

Examples

library(pengls)
library(nlme)
library(BiocParallel)
set.seed(42) # For reproducibility
n <- 20 # Sample size
p <- 50 # Number of features
g <- 10 # Size of the grid
# Generate grid
Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g))
# Sample points from grid without replacement
GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE), ]
# Generate outcome and regressors
b <- matrix(rnorm(p * n), n, p)
a <- rnorm(n, mean = b %*% rbinom(p, size = 1, prob = 0.2)) # 20% signal
# Combine into a data frame
df <- data.frame("a" = a, "b" = b, GridSample)
# Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5
corStruct <- corGaus(form = ~ x + y, nugget = TRUE,
                     value = c("range" = 5, "nugget" = 0.5))
# Fit the pengls model, choosing lambda by 5-fold cross-validation
register(MulticoreParam(3)) # Prepare multicore parallel computation
penglsFitCV <- cv.pengls(data = df, outVar = "a",
                         xNames = grep(names(df), pattern = "b", value = TRUE),
                         glsSt = corStruct, nfolds = 5)
penglsFitCV$lambda.1se # Lambda for 1 standard error rule
penglsFitCV$cvOpt # Corresponding R2
coef(penglsFitCV)
penglsFitCV$foldid # The folds used
# With MSE as loss function
penglsFitCVmse <- cv.pengls(data = df, outVar = "a",
                            xNames = grep(names(df), pattern = "b", value = TRUE),
                            glsSt = corStruct, nfolds = 5, loss = "MSE")
penglsFitCVmse$lambda.1se # Lambda for 1 standard error rule
penglsFitCVmse$cvOpt # Corresponding MSE
coef(penglsFitCVmse)
predict(penglsFitCVmse)

sthawinke/pengls documentation built on July 2, 2023, 7:27 a.m.