ge_im: Gene expression interpolation model

View source: R/ge_im.R

ge_im R Documentation

Gene expression interpolation model

Description

Builds a model to interpolate a gene expression dataset. The model is either a gam or a glm fit on the components of a PCA or ICA of the data, or a linear model fit directly on the expression of each gene (using limma).
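The per-gene option fits one linear model per gene on the phenotypic data. limma's lmFit does this with its own design-matrix interface; as a rough base-R stand-in (a swapped-in technique, not limma), `stats::lm` accepts a matrix response and fits one regression per response column, so transposing X (genes as rows) yields one set of coefficients per gene. The toy data and the `age` column are hypothetical.

```r
# Hypothetical toy data: 5 genes (rows) x 8 samples (columns)
set.seed(1)
age <- seq(0, 14, by = 2)
p <- data.frame(age = age)                          # pheno data, samples as rows
X <- t(sapply(1:5, function(g) g * age + rnorm(length(age), sd = 0.1)))
rownames(X) <- paste0("gene", 1:5)

# A matrix response makes lm() fit one least-squares model per column,
# so t(X) (samples x genes) gives one regression per gene
Y <- t(X)
fit <- stats::lm(Y ~ age, data = p)
coef(fit)                                           # 2 x 5: intercept and slope per gene

# Predict every gene at new ages, back in genes-as-rows orientation
new_p <- data.frame(age = c(1, 5, 13))
pred <- t(predict(fit, newdata = new_p))            # 5 genes x 3 new samples
```

Unlike this sketch, limma adds its own machinery on top of the gene-wise fits; the example only shows the "one linear model per gene" structure.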

Usage

ge_im(
  X,
  p,
  formula,
  method = c("gam", "glm", "limma"),
  dim_red = c("pca", "ica"),
  nc = ncol(X),
  ...
)

Arguments

X

the gene expression matrix (genes as rows, samples as columns)

p

a data frame with the phenotypic data used in the formula (samples as rows), e.g. time and covariates.

formula

the model formula, which must start with 'X ~'. See the gam, glm or lmFit documentation for the formula syntax.

method

the model to fit, one of c("gam", "glm", "limma").

dim_red

the dimension reduction method to use for interpolation, one of c("pca", "ica"), ignored if method is "limma".

nc

the number of components to extract from X for interpolation, defaults to ncol(X), ignored if method is "limma".

...

extra arguments passed on to the model function (gam, glm or lmFit, depending on method).
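Taken together, the arguments imply a few consistency requirements: p needs one row per sample, i.e. per column of X; the formula's left-hand side must be X; and the variables on its right-hand side should be columns of p. A minimal sketch of such checks follows; the helper `check_ge_inputs` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): check ge_im-style inputs
check_ge_inputs <- function(X, p, formula) {
  f <- stats::as.formula(formula)               # accepts a string or a formula
  # the response (left-hand side) must be the symbol X
  if (length(f) != 3 || !identical(f[[2]], as.name("X")))
    stop("formula must start with 'X ~'")
  # one row of pheno data per sample, i.e. per column of X
  if (ncol(X) != nrow(p))
    stop("p must have one row per column (sample) of X")
  # every variable named on the right-hand side must be a column of p
  missing_vars <- setdiff(all.vars(f[[3]]), names(p))
  if (length(missing_vars) > 0)
    stop("variables not found in p: ", paste(missing_vars, collapse = ", "))
  invisible(TRUE)
}

# Usage with toy data: 3 genes x 4 samples
p <- data.frame(age = 1:4, cov = factor(c("a", "a", "b", "b")))
X <- matrix(rnorm(12), nrow = 3)
check_ge_inputs(X, p, "X ~ s(age, bs = 'ds') + cov")  # passes silently
```

Parsing the formula string does not evaluate `s()`, so the check works without mgcv loaded.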

Details

We use the components as "eigengenes": models are fit on the component scores, rather than on each gene separately, to find parameters that describe the whole gene set (Storey et al., 2005).
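The eigengene idea can be sketched in base R: run a PCA with samples as observations, fit one model per retained component on the pheno data, predict the component scores at new time points, and map them back to gene space through the PCA rotation. This is an illustrative sketch under simplifying assumptions (toy sine-shaped data, a hypothetical `age` column, a gaussian glm with a cubic polynomial, PCA on t(X)), not the package's actual implementation. Naming the response column X lets one 'X ~ ...' formula be reused for every component; whether ge_im handles the formula this way internally is an assumption.

```r
set.seed(42)
# Hypothetical toy data: 50 genes x 12 samples following smooth trends in age
age <- seq(0, 11, length.out = 12)
p <- data.frame(age = age)
X <- t(sapply(1:50, function(g) sin(age / 3 + g) * g / 10 + rnorm(12, sd = 0.05)))

nc <- 3                                                  # number of components kept
pca <- stats::prcomp(t(X), center = TRUE, rank. = nc)    # samples as observations
scores <- pca$x                                          # 12 samples x nc "eigengenes"

# One gaussian glm per component; the response column is named X
f <- X ~ poly(age, 3)
fits <- lapply(seq_len(nc), function(i)
  stats::glm(f, data = data.frame(X = scores[, i], p)))

# Interpolate: predict component scores on a finer age grid...
new_p <- data.frame(age = seq(0, 11, length.out = 100))
new_scores <- sapply(fits, predict, newdata = new_p)     # 100 x nc

# ...then map back to gene space and add the gene means: genes x new samples
X_interp <- t(sweep(new_scores %*% t(pca$rotation), 2, pca$center, "+"))
```

Only nc models are fit instead of one per gene. Keeping fewer components smooths away low-variance directions, which is why the Examples choose nc from the cumulative proportion of variance explained.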

Value

a 'geim' model object, which has a predict method.
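In R, having a predict method means the object carries an S3 class, so the generic `predict()` dispatches to a class-specific function. A minimal sketch of that pattern follows; the class `toy_geim`, its fields and its constructor are hypothetical and only illustrate S3 dispatch, not the internals of a 'geim' object. The toy model is a per-gene linear fit, and predictions are returned with genes as rows, matching the orientation of X.

```r
# Hypothetical S3 class (not the package's internals): a per-gene
# linear fit wrapped in an object that predict() can dispatch on
toy_geim <- function(X, p, formula) {
  f <- stats::as.formula(formula)
  dat <- p
  dat$X <- t(X)                                   # matrix column: samples x genes
  fit <- stats::lm(f, data = dat)                 # matrix response: one fit per gene
  structure(list(fit = fit, genes = rownames(X)), class = "toy_geim")
}

# Method picked up by the generic predict() for objects of class "toy_geim"
predict.toy_geim <- function(object, newdata, ...) {
  pred <- stats::predict(object$fit, newdata = newdata)  # new samples x genes
  pred <- t(pred)                                        # genes x new samples
  rownames(pred) <- object$genes
  pred
}

# Usage with toy data: 2 genes x 6 samples
set.seed(7)
p <- data.frame(age = 1:6)
X <- rbind(geneA = 2 * p$age, geneB = 10 - p$age) +
  matrix(rnorm(12, sd = 0.01), nrow = 2)
m <- toy_geim(X, p, "X ~ age")
predict(m, newdata = data.frame(age = c(2.5, 7)))        # 2 genes x 2 new samples
```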

Examples


requireNamespace('wormRef', quietly = TRUE)
requireNamespace('stats', quietly = TRUE)

# gene expression data
X <- wormRef::Cel_larval$g

# pheno data (e.g. time, batch)
p <- wormRef::Cel_larval$p

# run a PCA and select the number of components to use for interpolation
# (smallest number whose cumulative proportion of variance reaches 99.9%)
pca <- stats::prcomp(X, rank = 20)
nc <- sum(summary(pca)$importance[3, ] < .999) + 1


# find optimal spline type
# setup formulas
smooths <- c('bs', 'tp', 'cr', 'ds')
flist <- as.list(paste0('X ~ s(age, bs = \'', smooths, '\') + cov'))
# do CV
cvres <- ge_imCV(X = scale(X), p = p, formula_list = flist,
                 cv.n = 20, nc = nc)
# check results
plot(cvres, names.arrange = 4) # 'ds' spline gives the lowest prediction error

# build model & make reference
m <- ge_im(X = X, p = p, formula = 'X ~ s(age, bs = \'ds\') + cov', nc = nc)

ref <- make_ref(m, cov.levels = list('cov'='O.20'), n.inter = 100, 
                t.unit='h past egg-laying (20C)')

# check model interpolation on pca components
par(mfrow = c(2,2))
plot(m, ref, ncs=1:4) # showing first 4 PCs


# test: estimate sample ages on the reference
ae_X <- ae(X, ref)
par(mfrow = c(1,2))
plot(p$age, ae_X$age.estimates[,1])
plot(ae_X, groups = p$cov)




LBMC/wormAge documentation built on April 6, 2023, 3:52 a.m.