
Create model formula and corresponding data frame of variables


Create model formula and corresponding data frame of variables for model fitting


createFormula(
  experiment_info,
  cols_fixed = NULL,
  cols_random = NULL,
  event_indicator = NULL
)



experiment_info: data.frame, DataFrame, or tbl_df of experiment information (which was also previously provided to prepareData). This should be a data frame containing all factors and covariates of interest; e.g. group IDs, block IDs, batch IDs, and continuous covariates.


cols_fixed: Argument specifying columns of experiment_info to include as fixed effect terms in the model formula. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector.


cols_random: Argument specifying columns of experiment_info to include as random intercept terms in the model formula. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector. Default = none.


event_indicator: Argument specifying the column of experiment_info to use as the event indicator for the censored covariate in the model formula. The censored covariate is assumed to be the first element of argument cols_fixed. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector. Default = none.
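The three accepted ways of specifying columns mirror ordinary data frame indexing in base R. The sketch below only illustrates that equivalence on a toy data frame; the column names are made up for the illustration, and it does not call createFormula itself.

```r
# Toy experiment_info; the column names here are illustrative assumptions
experiment_info <- data.frame(
  group_id = factor(rep(c("group1", "group2"), each = 2)),
  batch_id = factor(rep(c("batch1", "batch2"), times = 2)),
  sample_id = factor(paste0("sample", 1:4))
)

# Three ways to select the columns group_id and batch_id
by_name  <- experiment_info[, c("group_id", "batch_id"), drop = FALSE]  # character names
by_index <- experiment_info[, c(1, 2), drop = FALSE]                    # numeric indices
by_logic <- experiment_info[, c(TRUE, TRUE, FALSE), drop = FALSE]       # logical vector

# All three selections refer to the same columns
identical(by_name, by_index)
identical(by_name, by_logic)
```

Column names are usually the safest choice, since they keep working if the column order of experiment_info changes.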


Creates a model formula and corresponding data frame of variables specifying the models to be fitted. Extends createFormula from diffcyt.

The output is a list containing the model formula and corresponding data frame of variables (one column per formula term). These can then be provided to differential testing functions that require a model formula, together with the main data object and contrast matrix.

The experiment_info input (which was also previously provided to prepareData) should be a data frame containing all factors and covariates of interest. For example, depending on the experimental design, this may include the following columns:

  • group IDs (e.g. groups for differential testing)

  • block IDs (e.g. patient IDs in a paired design; these may be included as either fixed effects or random effects)

  • batch IDs (batch effects)

  • continuous covariates

  • sample IDs (e.g. to include random intercept terms for each sample, to account for overdispersion typically seen in high-dimensional cytometry data; this is known as an 'observation-level random effect' (OLRE); see Nowicka et al., 2017, F1000Research for more details)

The arguments cols_fixed and cols_random specify the columns in experiment_info to include as fixed effect terms and random intercept terms respectively. These can be provided as character vectors of column names, numeric vectors of column indices, or logical vectors. The names for each formula term are taken from the column names of experiment_info. The argument event_indicator specifies the column in experiment_info that holds the event indicator ('0' represents censored and '1' represents observed) for the censored covariate, which is assumed to be the first element of cols_fixed.
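To make the roles of the three arguments concrete, here is a minimal base R sketch of how such a formula could be assembled from column names. It is not the package's implementation: the helper name sketch_formula, the response name y, and the use of Surv() to pair the censored covariate with its event indicator are all assumptions made for illustration.

```r
# Hypothetical helper for illustration only; not the package's createFormula
sketch_formula <- function(cols_fixed, cols_random = NULL, event_indicator = NULL) {
  fixed_terms <- cols_fixed
  if (!is.null(event_indicator)) {
    # Assumption: the censored covariate (first fixed term) is paired with
    # its event indicator via a Surv() term
    fixed_terms[1] <- paste0("Surv(", cols_fixed[1], ", ", event_indicator, ")")
  }
  # Random intercept terms in the usual (1 | group) notation
  random_terms <- if (length(cols_random) > 0) {
    paste0("(1 | ", cols_random, ")")
  } else {
    character(0)
  }
  # Assumption: the response variable is called y
  as.formula(paste("y ~", paste(c(fixed_terms, random_terms), collapse = " + ")))
}

f <- sketch_formula(
  cols_fixed = c("survival_time", "group_id"),
  cols_random = c("sample_id", "patient_id"),
  event_indicator = "observed"
)
print(f)
all.vars(f)
```

Building the formula as text and parsing it with as.formula keeps the sketch short; the variables it references still have to exist in the accompanying data frame when a model is fitted.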


Returns a list with three elements:

  • formula: model formula

  • data: data frame of variables corresponding to the model formula

  • random_terms: TRUE if model formula contains any random effect terms


# model formula with censored variable
experiment_info <- data.frame(
  survival_time = rexp(8),
  sample_id = factor(paste0("sample", 1:8)),
  group_id = factor(rep(paste0("group", 1:2), each = 4)),
  # event indicator: 0 = censored, 1 = observed
  observed = factor(rep(c(0, 1), 4)),
  patient_id = factor(rep(paste0("patient", 1:4), 2)),
  stringsAsFactors = FALSE
)
createFormula(experiment_info, cols_fixed = c("survival_time", "group_id"),
  cols_random = c("sample_id", "patient_id"), event_indicator = "observed")

