linear_regres: Linear Regression

View source: R/managing_batch_effects.R

linear_regres R Documentation

Linear Regression

Description

This function fits a linear regression (linear model or linear mixed model) to each microbial variable, with treatment and batch effects included as covariates. It returns p-values, p-values adjusted for multiple comparisons, and evaluation metrics of model quality.
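The per-variable fitting described above can be illustrated with base R alone. This is a minimal sketch on simulated data, not the package's internal code: the data frame `dat`, its column names, and the choice to read the p-value from the treatment coefficient's t-test are all assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical): 20 samples, 3 response variables
set.seed(1)
n <- 20
trt   <- factor(rep(c("ctrl", "treated"), each = n / 2))
batch <- factor(rep(c("b1", "b2"), times = n / 2))
dat <- data.frame(
  var1 = rnorm(n) + 2 * (trt == "treated"),  # treatment effect
  var2 = rnorm(n) + 1.5 * (batch == "b2"),   # batch effect only
  var3 = rnorm(n)                            # no effect
)

# Fit one linear model per column: response ~ treatment + batch
fits <- lapply(dat, function(y) lm(y ~ trt + batch))

# Assumption: take the p-value of the treatment coefficient from each fit
raw.p <- vapply(
  fits,
  function(fit) summary(fit)$coefficients["trttreated", "Pr(>|t|)"],
  numeric(1)
)

# Adjust for multiple comparisons across the response variables
adj.p <- p.adjust(raw.p, method = "fdr")
print(cbind(raw.p, adj.p))
```

Each response variable gets its own model, so the adjustment step is applied across variables rather than within a single fit.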

Usage

linear_regres(
    data,
    trt,
    batch.fix = NULL,
    batch.fix2 = NULL,
    batch.random = NULL,
    type = "linear model",
    p.adjust.method = "fdr"
)

Arguments

data

A data frame that contains the response variables for the linear regression. Samples as rows and variables as columns.

trt

A factor or a class vector for the treatment grouping information (categorical variable).

batch.fix

A factor or a class vector for the batch grouping information (categorical variable), treated as a fixed effect in the model.

batch.fix2

A factor or a class vector for a second batch grouping (categorical variable), treated as a fixed effect in the model.

batch.random

A factor or a class vector for the batch grouping information (categorical variable), treated as a random effect in the model.

type

The type of model to be used for fitting, either 'linear model' or 'linear mixed model'.

p.adjust.method

The method to be used for p-value adjustment, either "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none".
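The method names listed for p.adjust.method match those accepted by base R's `stats::p.adjust`. The sketch below, using made-up p-values, shows how a few of these methods differ; it calls `p.adjust` directly and assumes nothing about how linear_regres applies the adjustment internally.

```r
# Hypothetical raw p-values for five response variables
raw.p <- c(0.001, 0.012, 0.03, 0.2, 0.6)

# Compare a few of the supported adjustment methods side by side
adjusted <- sapply(
  c("none", "fdr", "BH", "holm", "bonferroni"),
  function(m) p.adjust(raw.p, method = m)
)
print(round(adjusted, 3))

# In stats::p.adjust, "fdr" is an alias for "BH"
print(identical(adjusted[, "fdr"], adjusted[, "BH"]))
```

"bonferroni" is the most conservative of these, while "fdr"/"BH" controls the false discovery rate and is usually less strict when many variables are tested.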

Value

linear_regres returns a list that contains the following components:

type

The type of model used for fitting.

model

The fitted model object for each response variable.

raw.p

The p-values for each response variable.

adj.p

The adjusted p-values for each response variable.

p.adjust.method

The method used for p-value adjustment.

R2

The proportion of variation in the response variable that is explained by the predictor variables. A higher R2 indicates a better model. Results for 'linear model' only.

adj.R2

R2 adjusted for the number of predictor variables in the model. Results for 'linear model' only.

cond.R2

The proportion of variation in the response variable that is explained by the "complete" model with all covariates, analogous to R2 in a linear model. Results for 'linear mixed model' only.

marg.R2

The proportion of variation in the response variable that is explained by the fixed effects part only. Results for 'linear mixed model' only.

RMSE

The root mean squared error: the average error made by the model in predicting the outcome for an observation. A lower RMSE indicates a better model.

RSE

The residual standard error, also known as the model sigma. It is a variant of the RMSE adjusted for the number of predictors in the model. A lower RSE indicates a better model.

AIC

The Akaike information criterion, which penalises the inclusion of additional predictor variables in a model. A lower AIC indicates a better model.

BIC

The Bayesian information criterion, a variant of AIC with a stronger penalty for including additional variables in the model. A lower BIC indicates a better model.
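For the 'linear model' case, the metrics listed above can be reproduced for a single fit with base R. The sketch below uses simulated data and is only an illustration of the definitions; whether linear_regres computes each value in exactly this way (for example, dividing by n for RMSE) is an assumption.

```r
# Simulated stand-in data (hypothetical): one response, treatment and batch
set.seed(42)
n <- 30
trt   <- factor(rep(c("ctrl", "treated"), each = n / 2))
batch <- factor(rep(c("b1", "b2", "b3"), times = n / 3))
y <- rnorm(n) + 1 * (trt == "treated") + 0.8 * (batch == "b2")

fit <- lm(y ~ trt + batch)
s <- summary(fit)

metrics <- c(
  R2     = s$r.squared,
  adj.R2 = s$adj.r.squared,
  RMSE   = sqrt(mean(residuals(fit)^2)),  # squared residuals averaged over n
  RSE    = sigma(fit),                    # uses residual degrees of freedom
  AIC    = AIC(fit),
  BIC    = BIC(fit)
)
print(round(metrics, 3))
```

Because RSE divides by the residual degrees of freedom rather than by n, it is always somewhat larger than RMSE for the same fit, and the gap widens as predictors are added.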

Note

R2, adj.R2, cond.R2, marg.R2, RMSE, RSE, AIC and BIC are each reported for two models: (i) the full input model; (ii) a model without batch effects. Comparing the two can help decide whether batch effects should be included in the model.
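The two-model comparison described in this note can be sketched with base R on simulated data. The variable names and the simulated batch effect are hypothetical, and the layout of the comparison table is an assumption rather than the structure returned by linear_regres.

```r
# Simulated stand-in data (hypothetical) with a strong batch effect
set.seed(7)
n <- 40
trt   <- factor(rep(c("ctrl", "treated"), each = n / 2))
batch <- factor(rep(c("b1", "b2"), times = n / 2))
y <- rnorm(n) + 1 * (trt == "treated") + 2 * (batch == "b2")

full     <- lm(y ~ trt + batch)  # (i) treatment + batch
no.batch <- lm(y ~ trt)          # (ii) treatment only

# Put the same metrics for both models side by side
comparison <- data.frame(
  model  = c("trt + batch", "trt only"),
  adj.R2 = c(summary(full)$adj.r.squared, summary(no.batch)$adj.r.squared),
  RSE    = c(sigma(full), sigma(no.batch)),
  AIC    = c(AIC(full), AIC(no.batch)),
  BIC    = c(BIC(full), BIC(no.batch))
)
print(comparison)
```

When the batch effect is real, the model with batch would be expected to show higher adjusted R2 and lower RSE, AIC and BIC; if the metrics barely change, the batch term may not be needed for that variable.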

Author(s)

Yiwen Wang, Kim-Anh Lê Cao

References

\insertRef{daniel2020performance}{PLSDAbatch}

See Also

percentile_norm and PLSDA_batch, the other methods for batch effect management.

Examples

library(TreeSummarizedExperiment) # for functions assays(), rowData()
data('AD_data')

# centered log ratio transformed data
ad.clr <- assays(AD_data$EgData)$Clr_value
ad.batch <- rowData(AD_data$EgData)$Y.bat # batch information
ad.trt <- rowData(AD_data$EgData)$Y.trt # treatment information
names(ad.batch) <- names(ad.trt) <- rownames(AD_data$EgData)
ad.lm <- linear_regres(data = ad.clr, trt = ad.trt,
                        batch.fix = ad.batch,
                        type = 'linear model')
ad.p.adj <- ad.lm$adj.p
head(ad.lm$AIC)


EvaYiwenWang/PLSDAbatch documentation built on Sept. 25, 2024, 8:54 p.m.