rowGroupMeans: Calculate row group means, or other statistics

rowGroupMeans R Documentation

Calculate row group means, or other statistics

Description

Calculate row group means, or other statistics. rowGroupMeans() calculates one summary statistic per group for each row; rowGroupRmOutliers() is a convenience function that calls rowGroupMeans(..., rmOutliers=TRUE, returnType="input").

Usage

rowGroupMeans(
  x,
  groups,
  na.rm = TRUE,
  useMedian = TRUE,
  rmOutliers = FALSE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("output", "input"),
  rowStatsFunc = NULL,
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)

rowGroupRmOutliers(
  x,
  groups,
  na.rm = TRUE,
  rmOutliers = TRUE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("input"),
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)

Arguments

x

numeric data matrix

groups

character vector or factor of group labels, one label per column of x. See the parameter groupOrder for ordering of group labels in the output data matrix.

useMedian

logical indicating whether the default stat should be the "median" (TRUE) or the "mean" (FALSE). It applies only when rowStatsFunc is not supplied.

rmOutliers

logical indicating whether to apply outlier detection and removal.

crossGroupMad

logical indicating whether to calculate row MAD values using the median across groups for each row. The median is calculated using non-NA and non-zero row group MAD values. When crossGroupMad=TRUE it also calculates the non-NA, non-zero median row MAD across all rows, which defines the minimum difference from median applied across all values to be considered an outlier.

madFactor

numeric value indicating the multiple of the MAD value used to define outliers. For example, madFactor=5 multiplies the MAD value for a group by 5, giving 5*MAD as the outlier threshold. Any points more than 5*MAD from the group median are considered outliers.

returnType

character value indicating the return data type, "output" returns one summary stat value per group, per row; "input" is useful when rmOutliers=TRUE in that it returns a matrix with the same dimensions as the input, except with outlier points replaced with NA.

rowStatsFunc

optional function which takes a numeric matrix as input, and returns a numeric vector with length equal to the number of rows of the input data matrix. Examples: base::rowMeans(), matrixStats::rowMedians(), matrixStats::rowMads().

groupOrder

character string indicating how character group labels are ordered in the final data matrix, when returnType="output". Note that when groups is a factor, the factor levels are kept in that order. Otherwise, "same" keeps groups in the same order they appear in the input matrix; "sort" applies jamba::mixedSort() to the labels.

keepNULLlevels

logical indicating whether to keep factor levels even when there are no corresponding columns in x. When TRUE and returnType="output" the output matrix will contain one colname for each factor level, with NA values used to fill empty factor levels. This mechanism can be helpful to ensure that output matrices have consistent colnames.

includeAttributes

logical indicating whether to include attributes with "n" number of replicates per group, and "nLabel" with replicate label in n=# form.

verbose

logical indicating whether to print verbose output.

...

additional parameters are passed to rowStatsFunc, and if rmOutliers=TRUE to jamba::rowRmMadOutliers().

Details

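This function calculates one summary value per group for each row in a numeric matrix: the group median when useMedian=TRUE (the default shown in Usage), or the group mean when useMedian=FALSE. The stat function can also be changed with rowStatsFunc to calculate other row statistics, such as row MADs.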
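The per-group calculation described above can be sketched in base R. This is only an illustration of the idea, not the code used by rowGroupMeans(): the helper name row_group_stat_sketch and its statFunc argument are hypothetical, and it assumes groups has one label per column of x.

```r
# Hypothetical base-R sketch of per-group row summaries (not the jamba code).
row_group_stat_sketch <- function(x, groups, statFunc = rowMeans, na.rm = TRUE) {
  # keep group labels in order of first appearance
  groupLabels <- unique(as.character(groups))
  out <- sapply(groupLabels, function(g) {
    # subset the columns for one group, then summarize each row
    statFunc(x[, groups == g, drop = FALSE], na.rm = na.rm)
  })
  # sapply() simplifies to a vector when x has one row; restore matrix shape
  matrix(out, nrow = nrow(x),
    dimnames = list(rownames(x), groupLabels))
}

x <- matrix(c(1, 2, 3, 10, 20, 30,
              4, 5, 6, 40, 50, 60), nrow = 2, byrow = TRUE)
colnames(x) <- c("A", "B", "C", "D", "E", "F")
groups <- rep(c("a", "b"), each = 3)

# group means per row
groupMeans <- row_group_stat_sketch(x, groups)

# a custom statistic: any function returning one value per row
rowMaxSketch <- function(m, na.rm = TRUE) apply(m, 1, max, na.rm = na.rm)
groupMaxs <- row_group_stat_sketch(x, groups, statFunc = rowMaxSketch)
```

A function with the same shape as rowMaxSketch, taking a numeric matrix and returning one value per row, is the kind of function the rowStatsFunc argument describes.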

An added purpose of this function is optional outlier filtering, via calculation of MAD values and applying a MAD threshold cutoff. The intention is to identify technical outliers that otherwise adversely affect the calculated group mean or median values. To inspect the data after outlier removal, use the parameter returnType="input" which will return the input data matrix with NA substituted for outlier points. Outlier detection and removal is performed by jamba::rowRmMadOutliers().
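As a rough illustration of the MAD threshold described above, the sketch below flags points in a single group of replicate values using base R only. It is not the implementation in jamba::rowRmMadOutliers(): the helper name flag_mad_outliers is hypothetical, it ignores the crossGroupMad logic, and it assumes stats::mad() with its default scaling constant, which may differ from the package's internal calculation.

```r
# Hypothetical helper: flag values farther than madFactor * MAD from the median.
flag_mad_outliers <- function(values, madFactor = 5) {
  med <- stats::median(values, na.rm = TRUE)
  # stats::mad() scales by 1.4826 by default; assumed here for illustration
  madValue <- stats::mad(values, na.rm = TRUE)
  abs(values - med) > madFactor * madValue
}

# Four replicates in one group, the last one far from the others
values <- c(10.1, 9.9, 10.0, 25.0)
isOutlier <- flag_mad_outliers(values, madFactor = 5)

# Replace flagged points with NA, as described for returnType="input"
values[isOutlier] <- NA
values
```

Only the fourth value is flagged here, so the remaining three replicates would still contribute to the group summary.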

Value

When returnType="output" the output is a numeric matrix with one column per unique group label. When groups is a factor and keepNULLlevels=TRUE, the number of columns will be the number of factor levels, otherwise it will be the number of factor levels used in groups.
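The effect of keepNULLlevels=TRUE on the output columns can be pictured with a small base-R sketch. It only mimics the described result, padding a summary matrix with NA columns for unused factor levels; the object names are hypothetical and the values are made up for illustration.

```r
# groups is a factor where level "b" has no corresponding columns
groups <- factor(c("a", "a", "c", "c"), levels = c("a", "b", "c"))
usedLevels <- levels(droplevels(groups))   # levels actually present
allLevels <- levels(groups)                # all factor levels

# Made-up summary matrix with one column per used level
out <- matrix(c(1, 2, 3, 4), nrow = 2,
  dimnames = list(NULL, usedLevels))

# Pad to all factor levels, filling empty levels with NA
padded <- matrix(NA_real_, nrow = nrow(out), ncol = length(allLevels),
  dimnames = list(NULL, allLevels))
padded[, usedLevels] <- out
padded
```

The padded matrix has consistent colnames "a", "b", "c" regardless of which levels were present in the data.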

When returnType="input" the output is a numeric matrix with the same dimensions as the input data. This output is intended for use with rmOutliers=TRUE which will replace outlier points with NA values. Therefore, this matrix can be used to see the location of outliers.

When includeAttributes=TRUE, the function also returns attributes that describe the number of samples per group overall:

attr(out, "n")

The attribute "n" is used to describe the number of replicates per group.

attr(out, "nLabel")

The attribute "nLabel" is a simple text label in the form "n=3".

Note that when rmOutliers=TRUE the number of replicates per group will vary depending upon the outliers removed. In that case, remember that the reported "n" is always the total possible columns available prior to outlier removal.
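The replicate counts and labels can be approximated in base R as follows. This is a sketch of how such values could be derived from groups alone, which matches the note that "n" reflects the columns available before outlier removal; the exact structure of the attributes returned by the package (for example whether they are named) is an assumption.

```r
# Unequal replicate counts across three groups
groups <- rep(letters[1:3], times = c(3, 2, 4))

# Count columns per group, in order of first appearance
n <- table(factor(groups, levels = unique(groups)))

# Build labels in "n=#" form (names are an assumption for readability)
nLabel <- paste0("n=", n)
names(nLabel) <- names(n)
nLabel
```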

See Also

Other jam numeric functions: deg2rad(), fix_matrix_ratio(), noiseFloor(), normScale(), rad2deg(), rowRmMadOutliers(), warpAroundZero()

Examples

x <- matrix(ncol=9, rnorm(90));
colnames(x) <- LETTERS[1:9];
rowGroupMeans(x, groups=rep(letters[1:3], each=3))


jmw86069/jamba documentation built on Oct. 9, 2024, 10:52 a.m.