flm_def: Builds (simple) design and contrast matrices for use with...
In facilebio/FacileAnalysis: Modularized and interactive analyses over a FacileDataStore

flm_def

R Documentation

Builds (simple) design and contrast matrices for use with `fdge()`

Description

This simplifies the design and contrast building process by allowing for simple model definitions that are, essentially, functions of a single covariate. More elaborate models can be analysed, but the user must then define the design and the coefficient or contrast to test manually, and pass those into `fdge()`.

Usage

flm_def(
  x,
  covariate,
  numer = NULL,
  denom = NULL,
  batch = NULL,
  block = NULL,
  on_missing = c("warning", "error"),
  ...,
  metadata = list()
)

## S3 method for class 'data.frame'
flm_def(
  x,
  covariate,
  numer = NULL,
  denom = NULL,
  batch = NULL,
  block = NULL,
  on_missing = c("warning", "error"),
  ...,
  metadata = list(),
  contrast. = NULL,
  .fds = NULL
)

## S3 method for class 'tbl'
flm_def(
  x,
  covariate,
  numer = NULL,
  denom = NULL,
  batch = NULL,
  block = NULL,
  on_missing = c("warning", "error"),
  ...,
  metadata = list()
)

## S3 method for class 'facile_frame'
flm_def(
  x,
  covariate,
  numer = NULL,
  denom = NULL,
  batch = NULL,
  block = NULL,
  on_missing = c("warning", "error"),
  ...,
  metadata = list(),
  custom_key = NULL
)

## S3 method for class 'FacileDataStore'
flm_def(
  x,
  covariate,
  numer = NULL,
  denom = NULL,
  batch = NULL,
  block = NULL,
  on_missing = c("warning", "error"),
  ...,
  metadata = list(),
  samples = NULL,
  custom_key = NULL
)

Arguments

`x`	a dataset
`covariate`	the name of the "main effect" sample_covariate we are performing a contrast against.
`numer`	character vector defining the covariate/groups that make up the numerator
`denom`	character vector defining the covariate/groups that make up the denominator
`batch`	character vector defining the covariate/groups to use as batch effects
`block`	a string that names the covariate to use for the blocking factor in a random effects model.
`on_missing`	when a covariate level is missing (NA) for a sample, the setting of this parameter (default `"warning"`) dictates the behavior of this function. When `"warning"`, a warning is raised and the message is stored in the `$warnings` element of the result. When `"error"`, an error is thrown instead. See the "Missing Covariates" section for more information.
`contrast.`	A custom contrast vector can be passed in for extra tricky comparisons that we haven't figured out how to put a GUI in front of.

Details

Note: a (likely) small modification of this function could let it support the "ratio of ratios" model setup.

Value

a list with:

$test: "ttest" or "anova"
$covariates: the pData over the samples (dataset, sample_id, ...)
$design: the design matrix (always 0-intercept)
$contrast: the contrast vector that defines the comparison asked for
$messages: A character vector of messages generated
$warnings: A character vector of warnings generated
$errors: A character vector of errors generated
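
To make the $design and $contrast elements concrete, here is a minimal base-R sketch of what a 0-intercept design matrix and a matching contrast vector could look like for a stage IV vs. stage II & III comparison with sex as a batch covariate. It uses only `stats::model.matrix()` on a made-up sample table. It illustrates the idea and is not the code `flm_def()` runs internally; in particular, the equal weighting of the two denominator groups is an assumption.

```r
# Made-up sample table; the values are illustrative only
samples <- data.frame(
  dataset   = "BLCA",
  sample_id = paste0("s", 1:6),
  stage     = factor(c("II", "II", "III", "III", "IV", "IV")),
  sex       = factor(c("f", "m", "f", "m", "f", "m"))
)

# 0-intercept design: one column per stage level, plus the batch term
design <- model.matrix(~ 0 + stage + sex, data = samples)
colnames(design)

# Contrast: numerator (IV) minus the average of the denominator groups
# (II and III). Equal weights are an assumption of this sketch.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["stageIV"] <- 1
contrast[c("stageII", "stageIII")] <- -1 / 2
contrast
```

The batch column gets a weight of zero, so it is adjusted for in the fit but does not enter the comparison itself.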

Missing Covariates

Given the "ragged" nature of sample annotations in a FacileDataStore, some samples may have NA's as their values for the covariates under test. In this case, if on_missing is set to "error", an error will be thrown; otherwise a message will be stored in the $warnings element of the result.

The samples that the differential expression should be run on will be enumerated by the (dataset, sample_id) pair in the result$covariates tibble.
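
The warn-or-error decision described above can be sketched in base R. The helper `check_missing()` below is hypothetical and is not part of FacileAnalysis. It only illustrates the `on_missing` logic, and it assumes that samples with an NA covariate are dropped from the returned table.

```r
# Hypothetical helper, not part of FacileAnalysis
check_missing <- function(x, covariate, on_missing = c("warning", "error")) {
  on_missing <- match.arg(on_missing)
  is_na <- is.na(x[[covariate]])
  warnings <- character()
  if (any(is_na)) {
    msg <- sprintf("%d sample(s) have NA for covariate '%s'",
                   sum(is_na), covariate)
    if (on_missing == "error") stop(msg)
    warning(msg)
    warnings <- msg
  }
  # Assumption: only samples with a non-missing covariate are kept
  list(covariates = x[!is_na, , drop = FALSE], warnings = warnings)
}

# Made-up sample table with one missing stage annotation
samples <- data.frame(
  dataset   = "BLCA",
  sample_id = c("s1", "s2", "s3"),
  stage     = c("II", NA, "IV")
)

res <- suppressWarnings(check_missing(samples, "stage"))
res$covariates[, c("dataset", "sample_id")]
res$warnings
```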

Alignment with assay data

This builds a linear model by working with the covariates that are defined over the samples. It does not ask which assay will be used downstream in combination with this linear model to run the fit and test. It is the responsibility of the downstream users/functions of this linear model to ensure that the samples defined in it have data in the assay that the actual measurements come from.

data.frame

The *.data.frame function definition assumes that x is a data.frame of samples (dataset, sample_id), and that the covariates defined on these samples (i.e. all the other columns of x) contain a superset of the variable names used in the construction of the design matrix for the model definition.
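
As a rough illustration of the expected input shape, the base-R sketch below builds a made-up sample table and checks the superset condition by hand. The column values and the `model_vars` vector are illustrative assumptions; the check is not taken from the package source.

```r
# Made-up sample table: identifier columns plus covariate columns
x <- data.frame(
  dataset     = c("BLCA", "BLCA", "BLCA", "BLCA"),
  sample_id   = c("s1", "s2", "s3", "s4"),
  sample_type = c("tumor", "tumor", "normal", "normal"),
  sex         = c("f", "m", "f", "m")
)

# Variables the model definition would refer to (illustrative choice)
model_vars <- c("sample_type", "sex")

# Every variable used in the design must be among the covariate columns
covariate_cols <- setdiff(colnames(x), c("dataset", "sample_id"))
missing_vars <- setdiff(model_vars, covariate_cols)
if (length(missing_vars) > 0) {
  stop("Missing covariates: ", paste(missing_vars, collapse = ", "))
}
```

With FacileAnalysis loaded, a table like this could presumably be passed along the lines of `flm_def(x, covariate = "sample_type", numer = "tumor", denom = "normal", batch = "sex")`, mirroring the calls in the Examples section.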

facile_frame

When we define a model off of a facile_frame, we expect this to look like a wide covariate table. This defines the samples we will build a model on in its (dataset, sample_id) columns, as well as any covariates defined on these samples.

If there are covariates used in the covariate or batch parameters that are not found in colnames(x), we will attempt to retrieve them from the FacileDataStore fds(x). If they cannot be found, this function will raise an error.

Examples

efds <- FacileData::exampleFacileDataSet()

# Look for tumor vs normal differences, controlling for sex
model_info <- efds |>
  FacileData::filter_samples(indication == "BLCA") |>
  flm_def(covariate = "sample_type", numer = "tumor", denom = "normal",
          batch = "sex")

# Same comparison, controlling for both sex and stage
m2 <- efds |>
  FacileData::filter_samples(indication == "BLCA") |>
  flm_def(covariate = "sample_type", numer = "tumor", denom = "normal",
          batch = c("sex", "stage"))

# stageIV vs stageII & stageIII
m3 <- efds |>
  FacileData::filter_samples(indication == "BLCA", sample_type == "tumor") |>
  flm_def(covariate = "stage", numer = "IV", denom = c("II", "III"),
          batch = "sex")

# Incomplete ttest to help with custom contrast vector
mi <- efds |>
  FacileData::filter_samples(indication == "BLCA", sample_type == "tumor") |>
  flm_def(covariate = "stage", batch = "sex", contrast. = "help")

# ANOVA across stage in BLCA, control for sex
m4 <- efds |>
  FacileData::filter_samples(indication == "BLCA") |>
  flm_def(covariate = "stage", batch = "sex")

facilebio/FacileAnalysis documentation built on April 5, 2025, 2:42 p.m.