mbecModelVarianceSCOEF: Estimate Explained Variance with Silhouette Coefficient


View source: R/mbecs_analyses.R

Description

The function offers a selection of methods/algorithms to estimate the proportion of variance that can be attributed to covariates of interest. This shows how much variation is explained by the treatment effect, how much is introduced by processing in batches, and how much remains as residual variance that is not currently explained. Covariates of interest (CoI) are selected by the user, and the function incorporates them into the model.

Usage

mbecModelVarianceSCOEF(model.vars, tmp.cnts, tmp.meta, type)

Arguments

model.vars

Covariates to use for model building.

tmp.cnts

Abundance matrix in 'sample x feature' orientation.

tmp.meta

Covariate table that contains at least the variables used in the model.

type

String that denotes the data source, i.e., one of "otu", "clr" or "tss" for the transformed counts, or the label of the batch-corrected count matrix.
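As a rough illustration of the argument shapes described above, the following sketch builds toy inputs. All values, the covariate names `group` and `batch`, and the feature labels are hypothetical. The final call is left commented out because it assumes the MBECS package is loaded and that the function is accessible in the session.

```r
set.seed(42)

# Hypothetical abundance matrix in 'sample x feature' orientation:
# 6 samples (rows) and 5 features (columns).
tmp.cnts <- matrix(
  rpois(30, lambda = 20), nrow = 6,
  dimnames = list(paste0("S", 1:6), paste0("OTU", 1:5))
)

# Hypothetical covariate table; it must contain at least the model variables.
tmp.meta <- data.frame(
  group = factor(rep(c("treated", "control"), each = 3)),
  batch = factor(rep(c("B1", "B2"), times = 3)),
  row.names = rownames(tmp.cnts)
)

model.vars <- c("batch", "group")  # covariates of interest (assumed names)
type <- "otu"                      # label of the data source

# Basic consistency checks before handing the inputs to the function.
stopifnot(
  all(model.vars %in% colnames(tmp.meta)),
  identical(rownames(tmp.cnts), rownames(tmp.meta))
)

# Assumes MBECS is loaded and the function is accessible:
# res <- mbecModelVarianceSCOEF(model.vars, tmp.cnts, tmp.meta, type)
```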

Details

Silhouette Coefficient (s.coef): Calculate principal components and get sample-wise distances on the resulting (sxPC) matrix. Then iterate over all covariates and calculate the cluster silhouette, treating the levels of each covariate as clusters. For a single sample, the silhouette is zero if its cluster contains only that sample; otherwise it is the mean distance to the closest other cluster minus the mean distance to the samples within its own cluster, divided (scaled) by the larger of the two distances. Averaging over the elements of each cluster, and then over all clusters, yields a measure of how good the clustering is.

This shows how well a particular covariate characterizes the data. A treatment variable, for instance, may separate the samples into treated and untreated groups, which implies two clusters. In an ideal scenario, the treatment variable, i.e., the indicator for some biological effect, would produce a perfect clustering. In reality, confounding variables, e.g., batch, sex or age, will also influence the ordination of samples. Hence, the clustering coefficient is somewhat similar to the explained-variance metric that the other methods report. If used to compare an uncorrected data-set to a batch-corrected set, the expected result is an increase of the clustering coefficient for the biological effect (and for all other covariates, because a certain amount of uncertainty was removed from the data) and a decrease for the batch effect.
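The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data, the covariate names `group` and `batch`, and the helper `silhouette_by_covariate` are hypothetical, and Euclidean distance on all principal components is assumed.

```r
set.seed(1)

# Toy data: 6 samples x 4 features; the two groups differ strongly in mean.
cnts <- rbind(
  matrix(rnorm(12, mean = 0), nrow = 3),
  matrix(rnorm(12, mean = 5), nrow = 3)
)
rownames(cnts) <- paste0("S", 1:6)
meta <- data.frame(
  group = factor(rep(c("A", "B"), each = 3)),
  batch = factor(rep(c("B1", "B2", "B3"), times = 2)),
  row.names = rownames(cnts)
)

# Principal components, then sample-wise distances on the (s x PC) matrix.
pca <- prcomp(cnts, center = TRUE, scale. = FALSE)
d <- as.matrix(dist(pca$x))

# Hypothetical helper: mean silhouette for one covariate, using its levels
# as clusters. Averages within each cluster first, then across clusters.
silhouette_by_covariate <- function(d, labels) {
  labels <- droplevels(as.factor(labels))
  stopifnot(nlevels(labels) >= 2)
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- labels == labels[i]
    own[i] <- FALSE                      # exclude the sample itself
    if (!any(own)) next                  # single-element cluster: s stays 0
    a <- mean(d[i, own])                 # mean distance within own cluster
    others <- setdiff(levels(labels), as.character(labels[i]))
    # mean distance to the closest other cluster
    b <- min(vapply(others, function(l) mean(d[i, labels == l]), numeric(1)))
    if (max(a, b) > 0) s[i] <- (b - a) / max(a, b)
  }
  mean(tapply(s, labels, mean))
}

# One coefficient per covariate.
res <- sapply(meta, function(v) silhouette_by_covariate(d, v))
print(res)
```

In this toy setup the `group` covariate matches the simulated structure, so its coefficient should come out clearly higher than the one for `batch`, whose levels cut across the two groups.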

Value

Data.frame that contains proportions of variance for given covariates in a silhouette coefficient analysis approach.


buschlab/MBECS documentation built on Jan. 21, 2022, 1:27 a.m.