tree_and_independent_features: identify independent features in a numeric matrix

View source: R/tree_and_independent_features.R

tree_and_independent_features    R Documentation

identify independent features in a numeric matrix

Description

This function identifies independent features by hierarchically clustering features on Spearman's rho correlation distances and then cutting the resulting dendrogram at a chosen height.
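The general idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_independent_features is hypothetical, the distance is taken as 1 - |rho|, complete linkage is assumed, and the first feature in each cluster is arbitrarily kept as its representative.

```r
## Hypothetical helper (not part of metaboprep): a base-R sketch of
## "correlation distance -> hierarchical clustering -> tree cut".
sketch_independent_features <- function(x, cut_height = 0.5) {
  ## Spearman's rho between all pairs of columns (features)
  rho <- cor(x, method = "spearman", use = "pairwise.complete.obs")
  ## assumption: distance is 1 - |rho|, so strong negative correlation also counts
  d <- as.dist(1 - abs(rho))
  ## assumption: complete linkage
  tree <- hclust(d, method = "complete")
  ## features joined below the cut height share a cluster id
  k <- cutree(tree, h = cut_height)
  ## assumption: keep the first feature of each cluster as its representative
  keep <- !duplicated(k)
  list(tree = tree, clusters = k, independent = colnames(x)[keep])
}

## toy data: "a" and "b" are near copies, "c" is unrelated noise
set.seed(1)
z <- rnorm(200)
x <- cbind(a = z, b = z + rnorm(200, sd = 0.1), c = rnorm(200))
res <- sketch_independent_features(x, cut_height = 0.5)
res$independent
```

In this toy data "a" and "b" fall into one cluster, so only one of them is retained alongside "c".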

Usage

tree_and_independent_features(
  wdata,
  minimum_samplesize = 50,
  tree_cut_height = 0.5,
  feature_names_2_exclude = NA
)

Arguments

wdata

the metabolite data matrix, with samples in rows and metabolites in columns

minimum_samplesize

the minimum sample size required for a feature to be included in the analysis (default 50)

tree_cut_height

the tree cut height (default 0.5). A value of 0.2 (1 - Spearman's rho) means that features with a rho >= 0.8 are treated as NOT independent.

feature_names_2_exclude

A vector of feature (metabolite) names to exclude from this analysis. These might be features that are largely present or absent, such as xenobiotics, or variables derived from two or more variables already in the dataset.
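To make the relationship between tree_cut_height and rho concrete, the sketch below cuts one tree at several heights. It is an illustration only: the rho matrix is typed in by hand rather than estimated from data, and complete linkage on 1 - |rho| is an assumption rather than a statement about the package's internals.

```r
## a hand-written matrix of rho values for four features (illustrative only)
rho <- matrix(c(1.0, 0.70, 0.40,  0.20,
                0.70, 1.0, 0.20,  0.05,
                0.40, 0.20, 1.0,  0.375,
                0.20, 0.05, 0.375, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("var", 1:4), paste0("var", 1:4)))

## assumption: distance = 1 - |rho|, complete linkage
tree <- hclust(as.dist(1 - abs(rho)), method = "complete")

## cut the same tree at three heights and count the clusters at each
heights <- c(0.2, 0.5, 0.7)
n_clusters <- sapply(heights, function(h) max(cutree(tree, h = h)))
data.frame(tree_cut_height = heights,
           rho_threshold = 1 - heights,
           n_clusters = n_clusters)
```

At a cut height of 0.5 only var1 and var2 (rho = 0.7) are grouped together; a lower height leaves every feature on its own, and a higher one merges more of them.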

Value

a list containing (1) an hclust object, (2) the independent features, and (3) a data frame of feature IDs, k-cluster identifiers, and a binary indicator of independent features

Examples

## define a covariance matrix
cmat = matrix(1, 4, 4 )
cmat[1,] = c(1, 0.7, 0.4, 0.2)
cmat[2,] = c(0.7, 1, 0.2, 0.05)
cmat[3,] = c(0.4, 0.2, 1, 0.375)
cmat[4,] = c(0.2, 0.05, 0.375,1)

## simulate the data (multivariable random normal)
set.seed(1110)
ex_data = MASS::mvrnorm(n = 500, mu = c(5, 45, 25, 15), Sigma = cmat )
rownames(ex_data) = paste0("ind", 1:nrow(ex_data))
colnames(ex_data) = paste0("var", 1:ncol(ex_data))

## run the function to identify independent variables at a tree cut height
## of 0.5, which is equivalent to clustering variables with a Spearman's
## rho > 0.5, i.e. (1 - tree_cut_height)
ind = tree_and_independent_features(ex_data, tree_cut_height = 0.5)


MRCIEU/metaboprep documentation built on Jan. 28, 2023, 7:29 p.m.