createFormula: Create model formula and corresponding data frame of...

View source: R/createFormula.R

createFormulaR Documentation

Create model formula and corresponding data frame of variables

Description

Create model formula and corresponding data frame of variables for model fitting

Usage

createFormula(experiment_info, cols_fixed = NULL, cols_random = NULL)

Arguments

experiment_info

data.frame, DataFrame, or tbl_df of experiment information (which was also previously provided to prepareData). This should be a data frame containing all factors and covariates of interest; e.g. group IDs, block IDs, batch IDs, and continuous covariates.

cols_fixed

Argument specifying columns of experiment_info to include as fixed effect terms in the model formula. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector.

cols_random

Argument specifying columns of experiment_info to include as random intercept terms in the model formula. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector. Default = none.

Details

Creates a model formula and corresponding data frame of variables specifying the models to be fitted. (Alternatively, createDesignMatrix can be used to generate a design matrix instead of a model formula.)

The output is a list containing the model formula and corresponding data frame of variables (one column per formula term). These can then be provided to differential testing functions that require a model formula, together with the main data object and contrast matrix.

The experiment_info input (which was also previously provided to prepareData) should be a data frame containing all factors and covariates of interest. For example, depending on the experimental design, this may include the following columns:

  • group IDs (e.g. groups for differential testing)

  • block IDs (e.g. patient IDs in a paired design; these may be included as either fixed effect or random effects)

  • batch IDs (batch effects)

  • continuous covariates

  • sample IDs (e.g. to include random intercept terms for each sample, to account for overdispersion typically seen in high-dimensional cytometry data; this is known as an 'observation-level random effect' (OLRE); see see Nowicka et al., 2017, F1000Research for more details)

The arguments cols_fixed and cols_random specify the columns in experiment_info to include as fixed effect terms and random intercept terms respectively. These can be provided as character vectors of column names, numeric vectors of column indices, or logical vectors. The names for each formula term are taken from the column names of experiment_info.

Note that for some methods, random effect terms (e.g. for block IDs) must be provided directly to the differential testing function instead (testDA_voom and testDS_limma).

If there are no random effect terms, it will usually be simpler to use a design matrix instead of a model formula; see createDesignMatrix.

Value

formula: Returns a list with three elements:

  • formula: model formula

  • data: data frame of variables corresponding to the model formula

  • random_terms: TRUE if model formula contains any random effect terms

Examples

# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline, 
# see the package vignette.

# Example: model formula
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:8)), 
  group_id = factor(rep(paste0("group", 1:2), each = 4)), 
  patient_id = factor(rep(paste0("patient", 1:4), 2)), 
  stringsAsFactors = FALSE
)
createFormula(experiment_info, cols_fixed = "group_id", cols_random = c("sample_id", "patient_id"))


lmweber/diffcyt documentation built on March 19, 2024, 5:24 a.m.