ge_imCV: Cross validation comparison of geim models


ge_imCV    R Documentation

Cross validation comparison of geim models

Description

This function performs cross-validation (CV) to find the best-performing model among the given formulas. Parameter values are explored through the list of formulas (e.g., 'df=3' or 'df=4' must be specified in the formulas themselves).
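Because parameter values are only explored through the formulas, a list with one formula per candidate value can be built programmatically. The following is a minimal sketch, assuming pheno data columns named age and cov (as in the Examples below) and using splines::ns() as one possible spline term; whether a given term is accepted depends on the chosen method.

```r
# Sketch: one formula per candidate df value.
# `age` and `cov` are assumed pheno data columns;
# splines::ns() is only one possible choice of spline term.
dfs <- 3:5
flist <- as.list(paste0("X ~ splines::ns(age, df = ", dfs, ") + cov"))

# every formula must start with 'X ~'
stopifnot(all(startsWith(unlist(flist), "X ~")))
```

The resulting flist could then be passed as formula_list.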

Usage

ge_imCV(
  X,
  p,
  formula_list,
  cv.n = 50,
  cv.s = 0.8,
  method = c("gam", "glm", "limma"),
  dim_red = c("pca", "ica"),
  nc = ncol(X),
  to_compute = c("aRE", "MSE", "aRMSE"),
  nb.cores = 2,
  verbose = T,
  ...
)

Arguments

X

the gene expression matrix (genes as rows, samples as columns)

p

a data frame with the pheno data used in the formulas (samples as rows), e.g. time, covariates.

formula_list

a list of model formulas to compare, which must start with 'X ~' (as passed on to ge_im).

cv.n

number of cross-validation repeats.

cv.s

proportion of samples to use for the training set. If cv.s > 1, it is instead taken as the number of samples to use for the training set.

method

the model type to fit, one of c("gam", "glm", "limma").

dim_red

the dimension reduction method to use for interpolation, one of c("pca", "ica"), ignored when method is "limma".

nc

the number of components to extract from X for interpolation, defaults to ncol(X), ignored when method is "limma".

to_compute

the model performance indices to compute during CV (see mperf)

nb.cores

the number of cores to use for parallel execution.

verbose

logical; if TRUE, displays messages for the various steps of the method.

...

extra arguments passed on to model functions.
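
The two interpretations of cv.s described above (a proportion, or a number of samples when cv.s > 1) can be illustrated with a small sketch. The helper n_train is hypothetical and not part of the package, and the rounding rule is an assumption.

```r
# Hypothetical helper (not part of the package): number of training
# samples implied by cv.s when n samples are available.
n_train <- function(n, cv.s) {
  if (cv.s > 1) {
    # cv.s given as an absolute number of samples
    cv.s
  } else {
    # cv.s given as a proportion of the samples (rounding is an assumption)
    round(n * cv.s)
  }
}

n_train(50, 0.8) # 40 training samples out of 50
n_train(50, 30)  # 30 training samples out of 50
```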

Details

The CV training sets are defined to be representative of all variables included in the models. This is done with a function attributed to GitHub user mrdwab https://gist.github.com/mrdwab/6424112.
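
As an illustration of what a representative training set means, here is a minimal sketch of stratified sampling in base R. It is not the function used by the package: stratified_idx is a hypothetical helper, it handles a single categorical variable only, and the toy pheno data are made up.

```r
# Sketch of stratified sampling (not the package's internal function):
# sample a proportion `size` of the rows within each level of `group`.
stratified_idx <- function(group, size) {
  idx_by_group <- split(seq_along(group), group)
  picked <- lapply(idx_by_group, function(idx) {
    # keep at least one sample per level
    n <- max(1, round(length(idx) * size))
    idx[sample.int(length(idx), n)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
# toy pheno data with an unbalanced covariate (made-up values)
p_toy <- data.frame(age = runif(30, 0, 48),
                    cov = rep(c("a", "b", "c"), times = c(15, 10, 5)))
train <- stratified_idx(p_toy$cov, size = 0.8)
table(p_toy$cov[train]) # every level of cov is represented
```

Sampling within each level keeps rare covariate levels in every training set, which a plain random draw over all samples does not guarantee.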

Note that only one method/dimension reduction can be used at a time through this function.

Examples


requireNamespace('wormRef', quietly = TRUE)
requireNamespace('stats', quietly = TRUE)

# gene expression data
X <- wormRef::Cel_larval$g

# pheno data (e.g. time, batch)
p <- wormRef::Cel_larval$p

# run a PCA and select the number of components to use for interpolation
# (components needed to reach 99.9% of cumulative variance explained)
pca <- stats::prcomp(X, rank. = 20)
nc <- sum(summary(pca)$importance[3, ] < .999) + 1


# find optimal spline type
# setup formulas
smooths <- c('bs', 'tp', 'cr', 'ds')
flist <- as.list(paste0('X ~ s(age, bs = \'', smooths, '\') + cov'))
# do CV
cvres <- ge_imCV(X = scale(X), p = p, formula_list = flist,
                 cv.n = 20, nc = nc)
# check results
plot(cvres, names.arrange = 4) # lowest pred error with 'ds' spline

# build model & make reference
m <- ge_im(X = X, p = p, formula = 'X ~ s(age, bs = \'ds\') + cov', nc = nc)

ref <- make_ref(m, cov.levels = list('cov'='O.20'), n.inter = 100, 
                t.unit='h past egg-laying (20C)')

# check model interpolation on pca components
par(mfrow = c(2,2))
plot(m, ref, ncs=1:4) # showing first 4 PCs


# test the reference: estimate sample ages and compare them with the known ages
ae_X <- ae(X, ref)
par(mfrow = c(1,2))
plot(p$age, ae_X$age.estimates[,1])
plot(ae_X, groups = p$cov)




LBMC/wormAge documentation built on April 6, 2023, 3:52 a.m.