ge_imCV | R Documentation |
This function performs cross-validation (CV) to find the optimal model among the given formulas. Parameters are explored only through the list of formulas: each candidate value (e.g., 'df=3' or 'df=4') must be written out in its own formula.
ge_imCV(
  X,
  p,
  formula_list,
  cv.n = 50,
  cv.s = 0.8,
  method = c("gam", "glm", "limma"),
  dim_red = c("pca", "ica"),
  nc = ncol(X),
  to_compute = c("aRE", "MSE", "aRMSE"),
  nb.cores = 2,
  verbose = T,
  ...
)
X |
the gene expression matrix (genes as rows, samples as columns) |
p |
a data frame with the pheno data used in the formulas (samples as rows), e.g. time, covariates. |
formula_list |
a list of model formulas to compare, which must start with 'X ~' (as passed on to |
cv.n |
number of cross-validation repeats. |
cv.s |
ratio of samples to use for training set. If |
method |
the model type to fit, one of c("gam", "glm", "limma"). |
dim_red |
the dimension reduction method to use for interpolation, one of c("pca", "ica"), ignored when method is "limma". |
nc |
the number of components to extract from |
to_compute |
the model performance indices to compute during CV (see |
nb.cores |
the number of cores to use for parallel execution. |
verbose |
logical; if TRUE, displays messages for the various steps of the method. |
... |
extra arguments passed on to model functions. |
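Since candidate parameter values must be spelled out in the formulas themselves, a formula list is usually built by pasting the values into a template. The sketch below is only an illustration: it assumes a natural spline term from the splines package and pheno columns named age and cov (as in the example on this page), and whether a given term is accepted depends on the chosen method.

```r
# Candidate degrees of freedom to compare (illustrative values)
dfs <- 3:5

# One formula per candidate value; each must start with 'X ~'.
# 'age' and 'cov' are assumed column names in the pheno data p.
flist <- as.list(paste0("X ~ splines::ns(age, df = ", dfs, ") + cov"))

# Check that every string parses as a formula with X as the response
ok <- vapply(flist, function(f) {
  fo <- stats::as.formula(f)
  identical(fo[[2]], quote(X))
}, logical(1))

print(unlist(flist))
print(ok)
```

The resulting flist could then be supplied as formula_list, with the spline term adapted to whatever the selected method supports.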
The CV training sets are defined to be representative of all variables included in the models. This is done with a function attributed to GitHub user mrdwab (https://gist.github.com/mrdwab/6424112).
Note that only one method/dimension reduction can be used at a time through this function.
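The idea of repeated, representative training sets can be sketched in base R. This is not the package's implementation, nor the mrdwab function: it is a simplified illustration that stratifies on a single factor only, with a hypothetical helper stratified_split and made-up pheno data, and it reuses the names cv.n and cv.s for the number of repeats and the training ratio.

```r
# Hypothetical pheno data: a continuous age and an unbalanced batch factor
set.seed(1)
p <- data.frame(
  age = runif(40, 0, 50),
  cov = factor(rep(c("A", "B"), times = c(30, 10)))
)

# Hypothetical helper: draw one training set holding roughly `ratio`
# of the samples within each level of `strata`
stratified_split <- function(strata, ratio) {
  idx_by_group <- split(seq_along(strata), strata, drop = TRUE)
  train <- lapply(idx_by_group, function(idx) {
    n_train <- max(1, round(ratio * length(idx)))
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(train, use.names = FALSE))
}

# Repeat the draw cv.n times, each time keeping a cv.s share per level
cv.n <- 5
cv.s <- 0.8
splits <- replicate(cv.n, stratified_split(p$cov, cv.s), simplify = FALSE)

# Each column shows the per-level sample counts of one training set
print(sapply(splits, function(tr) table(p$cov[tr])))
```

Sampling within each level keeps the smaller group present in every training set, which plain random sampling does not guarantee.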
requireNamespace('wormRef', quietly = TRUE)
requireNamespace('stats', quietly = TRUE)
# gene expression data
X <- wormRef::Cel_larval$g
# pheno data (e.g time, batch)
p <- wormRef::Cel_larval$p
# do a PCA & select the number of components to use for interpolation
pca <- stats::prcomp(X, rank = 20)
nc <- sum(summary(pca)$importance[3, ] < .999) + 1
# find optimal spline type
# setup formulas
smooths <- c('bs', 'tp', 'cr', 'ds')
flist <- as.list(paste0('X ~ s(age, bs = \'', smooths, '\') + cov'))
# do CV
cvres <- ge_imCV(X = scale(X), p = p, formula_list = flist,
                 cv.n = 20, nc = nc)
# check results
plot(cvres, names.arrange = 4) # lowest pred error with 'ds' spline
# build model & make reference
m <- ge_im(X = X, p = p, formula = 'X ~ s(age, bs = \'ds\') + cov', nc = nc)
ref <- make_ref(m, cov.levels = list('cov'='O.20'), n.inter = 100,
                t.unit='h past egg-laying (20C)')
# check model interpolation on pca components
par(mfrow = c(2,2))
plot(m, ref, ncs=1:4) # showing first 4 PCs
# test
ae_X <- ae(X, ref)
par(mfrow = c(1,2))
plot(p$age, ae_X$age.estimates[,1])
plot(ae_X, groups = p$cov)
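The rule used in the example to pick nc (count the components whose cumulative variance is below a threshold, then add one) can be tried without wormRef. The sketch below uses the built-in USArrests data purely as a stand-in for an expression matrix and a looser threshold of 0.9; both are assumptions for illustration.

```r
# Built-in data standing in for an expression matrix (illustration only)
pca <- stats::prcomp(USArrests, scale. = TRUE)

# Third row of the importance table: cumulative proportion of variance
cum_var <- summary(pca)$importance[3, ]

# Smallest number of components whose cumulative variance reaches the threshold
threshold <- 0.9
nc <- sum(cum_var < threshold) + 1

print(cum_var)
print(nc)
```

Raising the threshold toward 1 keeps more components, so the interpolation retains more of the variance at the cost of fitting more models.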