corrSeq_fit: Function to fit various types of models to longitudinal...
In ewynn610/corrRNASeq: Model Fitting for Correlated RNA-Seq Count Data

View source: R/corrSeq_fit.R

corrSeq_fit

R Documentation

Function to fit various types of models to longitudinal RNA-Seq data

Description

Wrapper function that fits one of five types of models to RNA-Seq data. Available model fitting methods are linear mixed models (lmm) (using transformed data), generalized estimating equations with an optional small sample adjustment (gee), and negative binomial models using a pseudo-likelhood approach (nbmm_pl), a maximum likelihood approach with Laplace approximation (nbmm_lp), or a maximum likelihood approach with adaptive Gaussian quadrature (nbmm_agq).

Usage

corrSeq_fit(
  formula = NULL,
  expr_mat = NULL,
  gene_names = NULL,
  sample_data = NULL,
  method,
  id,
  small.samp.method = NULL,
  parallel = F,
  cores = 2,
  ...
)

Arguments

`formula`	A one-sided linear formula describing the model variables. For models that include them, random effects should be included in the formula using the syntax of the lme4 package.
`expr_mat`	A (G x N) numeric matrix RNA-seq expression data with genes in rows and samples in columns. For `method="gee"`, `method="nbmm_pl"`, `method="nbmm_lp"` and `method="nbmm_agq"`, the matrix should contain raw counts and for `method="lmm"` the matrix should contain transformed counts (e.g. using VST from DESeq2). G = number of genes. N = number of samples.
`gene_names`	An optional character vector of gene names (length G).
`sample_data`	Data frame with N rows containing the fixed- and random-effects terms included in the formula. The rows of the data frame must correspond (and be in the same order as) the columns of the expression matrix.
`method`	Method to use to fit the models. Possible options are `"lmm"`, `"gee"`, `"nbmm_pl"`, `"nbmm_lp"` and `"nbmm_agq"`.
`id`	Only applicable for models fit using `method="gee"`. A vector or data column name which identifies the clusters. The length of ‘id’ should be the same as the number of observations. Data are assumed to be sorted so that observations on each cluster appear as contiguous rows in data. If data is not sorted this way, the function will not identify the clusters correctly. If `sort=TRUE` (default), the dataframe from the `data` argument is sorted by the id column to avoid this issue.
`small.samp.method`	Only applicable for models fit using `method="gee"`. A character string specifying the small sample method. The following are permitted: "pan" for the Pan (2001) method, "md" for the Mancl and Derouen (2001) method, and "wl" for the Wang and Long (2011) method. If `small.samp.method` is null, small sample variance estimates are not computed.
`parallel`	A logical variable indicating whether forking (via mclapply) should be used to parallelize fits. Only available on Mac or linux machines.
`cores`	Number of cores to use if parallelizing (default is 2)
`...`	additional arguments passed on to `lmer` (`method="lmm"`), `gee_small_sample` (`method="gee"`), `glmm_nb_lmer` (`method="nbmm_pl"`), `glmmadmb` (`method="nbmm_lp"`), or `mixed_model` (method="nbmm_agq").

Value

A list of length G of model objects from the following functions: lmer (method="lmm"), gee_small_sample(method="lmm"), glmm_nb_lmer (method="nbmm_pl"), glmmadmb (method="nbmm_lp"), or mixed_model (method="nbmm_agq").

Author(s)

Elizabeth Wynn

Examples

data("simdata")
sample_meta_data <- simdata$metadata

## Subset down to 10 observation (i.e. gene)
counts=simdata$counts[1:10,]

## Fit GEE models using Wang-Long small sample size estimator
## log_offset is the log size factors from the DESeq2 package
gee_fit <- corrSeq_fit(formula = ~ group * time+offset(log_offset),
                           expr_mat = counts,
                           sample_data = sample_meta_data,
                           method="gee",
                           id=ids,
                           small.samp.method="wl")

## Fit NBMM-PL models
nbmm_pl_fit <- corrSeq_fit(formula = ~ group * time+(1|ids)+offset(log_offset),
                           expr_mat = counts,
                           sample_data = sample_meta_data,
                           method="nbmm_pl")


## Fit NBMM-AGQ models
nbmm_agq_fit <- corrSeq_fit(formula = ~ group * time+(1|ids)+offset(log_offset),
                           expr_mat = counts,
                           sample_data = sample_meta_data,
                           method="nbmm_agq")

## Fit NBMM-LP models
## Random effects must be factors
sample_meta_data$ids<-factor(sample_meta_data$ids)
nbmm_lp_fit <- corrSeq_fit(formula = ~ group * time+(1|ids)+offset(log_offset),
                           expr_mat = counts,
                           sample_data = sample_meta_data,
                           method="nbmm_lp")

## Fit LMM models to transformed data
## Use the variance transformed counts in the simdata object
## Subset down to 10 genes
vst_expr<-simdata$vst_expr[1:10,]

##  Fit the Models
lmm_fit<- corrSeq_fit(formula = ~ group * time + (1|ids),
                           expr_mat = vst_expr,
                           sample_data = sample_meta_data,
                           method="lmm")

ewynn610/corrRNASeq documentation built on Sept. 29, 2023, 10:37 a.m.