model-flagging: Functions to flag/exclude models

flag_model_residual    R Documentation

Functions to flag/exclude models

Description

The functions listed on this help page (flag_*) flag potentially problematic model fits. They were designed for models that estimate an injection-order-dependent signal drift seen in LC-MS-based untargeted metabolomics data. Such models can be fitted with the rowFitModel function of the xcms package. Each function is expected to return TRUE for a potentially problematic model fit and FALSE otherwise.

The functions are:

  • flag_model_residual: tests whether the difference between the 25% and 75% quantiles of the residuals is larger than the user-defined value diff_residual. This function identifies model fits in which the individual data points deviate, on average, strongly from the fitted line.

  • flag_model_mean_residual: tests whether the mean of the absolute residuals is larger than the provided value cut_off.

  • flag_model_inj_range: tests whether the values on which the model was fitted span a minimum required injection index range. This requires x to be a model of the type y ~ inj_idx.

  • flag_model_cat_count: checks the number of replicated measurements in each category of a categorical variable (e.g. batch) against min_count, the minimum required number of values per category.

  • flag_model_coef_count: tests whether the number of estimated coefficients matches the expected number. This is useful for linear models that adjust for a batch effect but for which a coefficient was not estimated for every batch (i.e. every level of the categorical variable representing the batch). This can happen if a batch contains only missing values.
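
The two residual-based criteria in the list above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the helper names sketch_flag_residual and sketch_flag_mean_residual, the simulated data, and the use of quantile() with its default settings are all assumptions made for illustration.

```r
# Hypothetical helpers mirroring the two residual criteria described above;
# they are NOT the package's own code.
sketch_flag_residual <- function(x, diff_residual = 1) {
  # Difference between the 75% and 25% quantiles of the residuals
  q <- quantile(residuals(x), probs = c(0.25, 0.75), names = FALSE)
  (q[2] - q[1]) > diff_residual
}

sketch_flag_mean_residual <- function(x, cut_off = 0.5) {
  # Mean of the absolute residuals
  mean(abs(residuals(x))) > cut_off
}

# Simulated abundances with a linear drift along the injection index:
# one series close to the line, one scattered far from it.
inj_idx <- 1:30
y_tight <- 10 + 0.05 * inj_idx + rep(c(-0.1, 0.1), 15)
y_noisy <- 10 + 0.05 * inj_idx + rep(c(-2, 2), 15)

fit_tight <- lm(y_tight ~ inj_idx)
fit_noisy <- lm(y_noisy ~ inj_idx)

# Apply both criteria to both fits
flags <- rbind(
  tight = c(residual = sketch_flag_residual(fit_tight),
            mean_residual = sketch_flag_mean_residual(fit_tight)),
  noisy = c(residual = sketch_flag_residual(fit_noisy),
            mean_residual = sketch_flag_mean_residual(fit_noisy))
)
print(flags)
```

Under these assumptions only the widely scattered series exceeds the default cut-offs of 1 and 0.5, so only that fit would be flagged.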

Usage

flag_model_residual(x, diff_residual = 1)

flag_model_mean_residual(x, cut_off = 0.5)

flag_model_inj_range(x, min_range = 1, column = "inj_idx")

flag_model_cat_count(x, variable, min_count = 4)

flag_model_coef_count(x, n_coef)
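
As a rough illustration of how these signatures might be called, the sketch below fits a simple drift model with lm() and passes it to three of the functions. It assumes that the CompMetaboTools package is installed and exports these functions; the data are simulated, and x = TRUE is set so that the model matrix is available as fit$x. The calls are skipped if the package is not available.

```r
# Simulated data (hypothetical): signal drifting with the injection index
inj_idx <- 1:20
y <- 10 + 0.05 * inj_idx + rep(c(-0.1, 0.1), 10)

# x = TRUE stores the model matrix in fit$x
fit <- lm(y ~ inj_idx, x = TRUE)

# Assumption: the flag_* functions are exported by CompMetaboTools
if (requireNamespace("CompMetaboTools", quietly = TRUE)) {
  flags <- c(
    residual = CompMetaboTools::flag_model_residual(fit, diff_residual = 1),
    mean_residual = CompMetaboTools::flag_model_mean_residual(fit, cut_off = 0.5),
    inj_range = CompMetaboTools::flag_model_inj_range(fit, min_range = 1,
                                                      column = "inj_idx")
  )
  print(flags)
}
```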

Arguments

x

a linear model object such as generated by lm() or robustbase::lmrob().

diff_residual

for flag_model_residual: numeric(1) defining the cut-off to flag models with large residuals. TRUE is reported for all models with a difference between the 25% and 75% quantiles of the residuals larger than this value.

cut_off

for flag_model_mean_residual: numeric(1) defining the cut-off to flag models with large average residuals.

min_range

for flag_model_inj_range: numeric(1) defining the minimum range. This is an absolute value, not a percentage: the function compares diff(range(x$x[, column])) with min_range and flags models that do not meet this criterion (i.e. have a smaller range).

column

for flag_model_inj_range: character(1) specifying the column containing the injection index.

variable

for flag_model_cat_count: character(1) with the name of the categorical variable. Should be one of colnames(x$model).

min_count

for flag_model_cat_count: integer(1) with the minimum required number of values within each category.

n_coef

for flag_model_coef_count: integer(1) with the expected number of coefficients.
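
The quantities that min_range, min_count and n_coef are compared against can be inspected directly on a fitted lm() object. The sketch below uses base R only and is not the package's implementation: the data are simulated, the thresholds are chosen for illustration, and the comparison rules (flag if the range is below min_range, if any category has fewer than min_count values, or if the number of estimated coefficients differs from n_coef) are assumptions derived from the descriptions above.

```r
# Hypothetical data: three batches; batch "c" contains only missing values
d <- data.frame(
  y = c(10.1, 10.4, 10.2, 10.6, 10.5, 11.3, 11.1, 11.5, NA, NA, NA, NA),
  inj_idx = 1:12,
  batch = factor(rep(c("a", "b", "c"), times = c(5, 3, 4)))
)

# x = TRUE keeps the model matrix in fit$x; the model frame is in fit$model.
# Rows with missing y are dropped before fitting (default na.action).
fit <- lm(y ~ inj_idx + batch, data = d, x = TRUE)

# Injection index range covered by the data used in the fit
inj_range <- diff(range(fit$x[, "inj_idx"]))
flag_inj_range <- inj_range < 10            # illustrative min_range = 10

# Number of measurements per category of the variable "batch"
counts <- table(fit$model[, "batch"])
flag_cat_count <- any(counts < 4)           # min_count = 4

# Expected coefficients: intercept + slope + (number of batches - 1)
n_expected <- 1 + 1 + (nlevels(d$batch) - 1)
n_estimated <- sum(!is.na(coef(fit)))
flag_coef_count <- n_estimated != n_expected

print(c(inj_range = flag_inj_range,
        cat_count = flag_cat_count,
        coef_count = flag_coef_count))
```

Because batch "c" has no usable values, no coefficient is estimated for it, which is the situation flag_model_coef_count is described as detecting.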

Value

logical(1): TRUE if the model fit is potentially problematic, FALSE otherwise, or NA if no model was provided.

Author(s)

Johannes Rainer


EuracBiomedicalResearch/CompMetaboTools documentation built on Jan. 31, 2024, 1:14 p.m.