selectRatios: Pairwise Ratio Selection

View source: R/5-selectRatios.R

selectRatios    R Documentation

Description

This function finds the pairwise feature ratios that explain the most variance in the data. An exhaustive search over all ratios is computationally expensive, so the function approximates it with the heuristic described under Details.

Usage

selectRatios(
  counts,
  ndim = 3,
  nclust = 2 * round(sqrt(ncol(counts))),
  nsearch = 3,
  ndenom = 4
)
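A minimal usage sketch, assuming the propr package and vegan (which Details names as the source of rda) are both installed. The simulated matrix and its feature names are hypothetical and serve only to illustrate the expected input shape and the default value of nclust.

```r
set.seed(1)

# Hypothetical input: 20 subjects (rows) by 50 strictly positive features (columns).
counts <- matrix(rlnorm(20 * 50), nrow = 20, ncol = 50)
colnames(counts) <- paste0("feature", seq_len(ncol(counts)))

# Default number of clusters, as written in the Usage signature:
# sqrt(50) is about 7.07, which rounds to 7, so the default is 14.
nclust_default <- 2 * round(sqrt(ncol(counts)))

# Only call the function when both packages are available.
if (requireNamespace("propr", quietly = TRUE) &&
    requireNamespace("vegan", quietly = TRUE)) {
  res <- propr::selectRatios(counts, ndim = 3, nclust = nclust_default)
  str(res$best)  # the selected ratios and the variance they explain
}
```

Lowering nclust or nsearch should reduce the amount of searching, at the cost of a coarser approximation.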

Arguments

counts

A data.frame or matrix. A "count matrix" with subjects as rows and features as columns. Note that this matrix does not necessarily have to contain counts.

ndim

An integer. The number of ratios to find.

nclust

An integer. The number of clusters to build from the data.

nsearch

An integer. The number of clusters to search exhaustively.

ndenom

An integer. The number of best denominators to use when searching for the best numerators.

Details

This function resembles the method described by Michael Greenacre in "Variable Selection in Compositional Data Analysis Using Pairwise Logratios", except that we have modified the method to use a heuristic that scales to high-dimensional data.
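To illustrate what "explaining variance" with a logratio can mean, the sketch below scores every pairwise logratio by the share of total CLR variance captured when the CLR data are regressed on that logratio. This is a base R illustration of the general idea under stated assumptions, not the package's implementation: the helper explained() and the simulated data are hypothetical, and the package itself relies on vegan::rda.

```r
set.seed(1)

# Hypothetical data: 10 subjects by 5 strictly positive features.
x <- matrix(rlnorm(10 * 5), nrow = 10, ncol = 5)
logx <- log(x)

# Column-centred CLR data, used here as the response matrix.
Z <- scale(logx - rowMeans(logx), center = TRUE, scale = FALSE)

# Share of the total sum of squares in Z captured by a least-squares
# fit on the constraint(s) Y plus an intercept.
explained <- function(Z, Y) {
  design <- cbind(1, Y)
  fitted <- qr.fitted(qr(design), Z)
  sum(fitted^2) / sum(Z^2)
}

# Exhaustive search over all pairs of features (feasible only for small inputs).
pairs <- t(combn(ncol(x), 2))
scores <- apply(pairs, 1, function(p) {
  explained(Z, logx[, p[1]] - logx[, p[2]])
})

# Indices of the numerator and denominator of the best-scoring ratio.
best_pair <- pairs[which.max(scores), ]
```

The number of pairs grows quadratically with the number of features, which is why an exhaustive search of this kind becomes impractical for high-dimensional data.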

For each ratio, the heuristic first searches CLR-based clusters for the best denominator, then searches ALR-based clusters for the best numerator. At each stage it divides the transformed data into nclust clusters, runs vegan::rda on the geometric mean of each cluster, and then searches the nsearch best clusters exhaustively. The ndenom argument sets how many of the best denominators are carried forward to the numerator search. This process is repeated ndim times, yielding ndim ratios selected to explain the most variance.
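The transforms and the per-cluster geometric mean mentioned above can be sketched in base R as follows. This is an illustration only: the choice of hclust with cutree for clustering, the reference feature, and all variable names are assumptions, and the clustering method the package actually uses is not stated here.

```r
set.seed(1)

# Hypothetical data: 6 subjects by 8 strictly positive features.
x <- matrix(rlnorm(6 * 8), nrow = 6, ncol = 8)
colnames(x) <- paste0("f", 1:8)
logx <- log(x)

# CLR: log of each feature minus the subject's mean log
# (that is, the log of the subject's geometric mean).
clr <- logx - rowMeans(logx)

# ALR: log of each feature relative to one reference feature
# (column 1 here, chosen arbitrarily for the sketch).
ref <- 1
alr <- logx[, -ref, drop = FALSE] - logx[, ref]

# Group the features into clusters based on their CLR profiles.
nclust <- 3
groups <- cutree(hclust(dist(t(clr))), k = nclust)

# Geometric mean of each cluster, per subject: exp of the mean log.
cluster_gm <- sapply(seq_len(nclust), function(k) {
  exp(rowMeans(logx[, groups == k, drop = FALSE]))
})
```

Each column of cluster_gm summarizes one cluster as a single pseudo-feature, so candidate clusters can be scored before any of their individual members are searched.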

Value

A list of: (1) "best", the best ratios and the variance they explain, (2) "all", all ratios tested and the variance they explain, (3) "Z", the standardized data used by the constrained PCA, and (4) "Y", the final ratios used to constrain the PCA.


tpq/propr documentation built on April 21, 2024, 12:50 p.m.