createDesignMatrix: Create design matrix

View source: R/createDesignMatrix.R

createDesignMatrixR Documentation

Create design matrix

Description

Create design matrix for model fitting

Usage

createDesignMatrix(experiment_info, cols_design = NULL)

Arguments

experiment_info

data.frame, DataFrame, or tbl_df of experiment information (which was also previously provided to prepareData). This should be a data frame containing all factors and covariates of interest; e.g. group IDs, block IDs, batch IDs, and continuous covariates.

cols_design

Argument specifying the columns of experiment_info to include in the design matrix. This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector. Default = all columns.

Details

Creates a design matrix specifying the models to be fitted. (Alternatively, createFormula can be used to generate a model formula instead of a design matrix.)

The design matrix can then be provided to the differential testing functions, together with the data object and contrast matrix.

The experiment_info input (which was also previously provided to prepareData) should be a data frame containing all factors and covariates of interest. For example, depending on the experimental design, this may include the following columns:

  • group IDs (e.g. groups for differential testing)

  • block IDs (e.g. patient IDs in a paired design)

  • batch IDs (batch effects)

  • continuous covariates

The argument cols_design specifies which columns in experiment_info to include in the design matrix. (For example, there may be an additional column of sample IDs, which should not be included.) This can be provided as a character vector of column names, a numeric vector of column indices, or a logical vector. By default, all columns are included.

Columns of indicator variables (e.g. group IDs, block IDs, and batch IDs) in experiment_info must be formatted as factors (otherwise they will be treated as numeric values). The indicator columns will be expanded into the design matrix format. The names for each parameter are taken from the column names of experiment_info.

All factors provided here will be included as fixed effect terms in the design matrix. Alternatively, to use random effects for some factors (e.g. for block IDs), see createFormula; or, depending on the method used, provide them directly to the differential testing function (testDA_voom and testDS_limma).

Value

design: Returns a design matrix (numeric matrix), with one row per sample, and one column per model parameter.

Examples

# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline, 
# see the package vignette.

# Example: simple design matrix
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)
createDesignMatrix(experiment_info, cols_design = "group_id")

# Example: more complex design matrix: patient IDs and batch IDs
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:8)), 
  group_id = factor(rep(paste0("group", 1:2), each = 4)), 
  patient_id = factor(rep(paste0("patient", 1:4), 2)), 
  batch_id = factor(rep(paste0("batch", 1:2), 4)), 
  stringsAsFactors = FALSE
)
createDesignMatrix(experiment_info, cols_design = c("group_id", "patient_id", "batch_id"))

# Example: more complex design matrix: continuous covariate
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  age = c(52, 35, 71, 60), 
  stringsAsFactors = FALSE
)
createDesignMatrix(experiment_info, cols_design = c("group_id", "age"))


lmweber/diffcyt documentation built on March 19, 2024, 5:24 a.m.