jammacalc: Calculate MA-plot data

jammacalc R Documentation

Calculate MA-plot data

Description

Calculate MA-plot data

Usage

jammacalc(
  x,
  na.rm = TRUE,
  controlSamples = NULL,
  centerGroups = NULL,
  controlFloor = NA,
  naControlAction = c("row", "floor", "min", "na"),
  naControlFloor = 0,
  groupedX = TRUE,
  useMedian = TRUE,
  useMean = NULL,
  whichSamples = NULL,
  noise_floor = -Inf,
  noise_floor_value = noise_floor,
  naValue = NA,
  mad_row_min = 0,
  grouped_mad = TRUE,
  centerFunc = centerGeneData,
  useRank = FALSE,
  returnType = c("ma_list", "tidy"),
  verbose = FALSE,
  ...
)

Arguments

x

numeric matrix typically containing log-normal measurements, with measurements in rows and samples in columns.

na.rm

logical indicating whether to ignore NA values during numeric summary functions.

controlSamples

character vector containing values in colnames(x) to define control samples used during centering. These values are passed to centerGeneData().

centerGroups

character vector with length equal to ncol(x) which defines the group for each column in x. Data will be centered within each group.

groupedX

logical indicating how to calculate the x-axis value when centerGroups contains multiple groups. When groupedX=TRUE, the mean of the group medians is used, which has the effect of representing each group equally. When groupedX=FALSE, the median across all columns is used, which can have the effect of favoring sample groups with a larger number of columns.
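The difference between the two groupedX settings can be illustrated with a minimal base R sketch. It does not call jammacalc(); the toy matrix, the group labels, and the variable names are assumptions made for illustration only.

```r
# Toy matrix: 3 rows, 5 samples; group A has 4 columns, group B has 1
x <- matrix(
  c(1, 1, 1, 1, 9,
    2, 2, 2, 2, 6,
    5, 5, 5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("row", 1:3), paste0("s", 1:5)))
groups <- c("A", "A", "A", "A", "B")

# Analogue of groupedX=FALSE: one median across all columns,
# which is dominated by the larger group A
x_all <- apply(x, 1, median)

# Analogue of groupedX=TRUE: median within each group,
# then the mean of those group medians
group_medians <- sapply(split(seq_len(ncol(x)), groups), function(idx) {
  apply(x[, idx, drop = FALSE], 1, median)
})
x_grouped <- rowMeans(group_medians)

print(rbind(all_columns = x_all, grouped = x_grouped))
```

In this toy case the single-column group B shifts the grouped x-axis value for the first two rows, while the all-column median ignores it.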

useMedian

logical indicating whether to use the median values when calculating the x-axis and during data centering. Compared with the mean, the median naturally reduces the effect of outlier points on the resulting MA-plots. When useMedian=FALSE, the mean value is used.
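As a small illustration of why the median is less sensitive to outliers, the following base R sketch centers one row of made-up values both ways. The values and variable names are hypothetical and the sketch does not call jammacalc().

```r
# One measurement across five samples; s5 is an outlier
values <- c(s1 = 5, s2 = 5, s3 = 5, s4 = 5, s5 = 25)

# Centering by the median leaves the four typical samples at zero
y_median <- values - median(values)

# Centering by the mean shifts every sample away from zero
y_mean <- values - mean(values)

print(rbind(median_centered = y_median, mean_centered = y_mean))
```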

useMean

(deprecated) logical indicating whether to use the mean instead of the median value. This argument is being removed in order to improve consistency with other Jam package functions.

whichSamples

character vector containing colnames(x), or integer vector referencing column numbers in x. This argument specifies which columns to return, but does not change the columns used to define the group centering values. For example, the group medians are calculated using all the data, but only the samples in whichSamples are centered to produce MA-plot data.
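The distinction between "columns used for centering" and "columns returned" can be sketched in base R as follows. This is an illustration under assumed toy data, not the package's implementation; the subset-only center is computed purely for contrast.

```r
x <- matrix(
  c(1, 2, 3, 6,
    4, 4, 8, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("r1", "r2"), paste0("s", 1:4)))
whichSamples <- c("s1", "s4")

# Centering values use all columns (the behavior described above)
center_all <- apply(x, 1, median)

# Center every column, then return only the requested samples
y_all <- x - center_all
y_sub <- y_all[, whichSamples, drop = FALSE]

# For contrast only: a center computed from the subset alone differs
center_sub <- apply(x[, whichSamples, drop = FALSE], 1, median)

print(y_sub)
print(rbind(center_all = center_all, center_sub = center_sub))
```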

noise_floor

numeric value indicating the minimum numeric value allowed in the input matrix x. When NULL or -Inf, no noise floor is applied. It is common to set noise_floor=0 to limit MA-plot data to values of zero and above.

noise_floor_value

single numeric value used to replace numeric values at or below noise_floor when noise_floor is not NULL. By default, noise_floor_value=noise_floor, which means values at or below the noise floor are set to the floor. Another useful option is noise_floor_value=NA, which has the effect of removing the point from the MA-plot altogether. This option is recommended for sparse data matrices where values at or below zero indicate missing data (zero-inflated data) and do not necessarily reflect an actual value of zero.
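The interaction of noise_floor and noise_floor_value can be sketched with a small helper. The function apply_floor() is hypothetical and written only for this illustration; it follows the description above and is not taken from the package.

```r
# Hypothetical helper mirroring the documented behavior
apply_floor <- function(x, noise_floor = -Inf, noise_floor_value = noise_floor) {
  # NULL or -Inf means no floor is applied
  if (is.null(noise_floor) || identical(noise_floor, -Inf)) {
    return(x)
  }
  # Replace values at or below the floor, leaving existing NA untouched
  below <- !is.na(x) & x <= noise_floor
  x[below] <- noise_floor_value
  x
}

x <- matrix(c(-1, 0, 2.5,
               0, 3, 4),
            nrow = 2, byrow = TRUE)

x_floor <- apply_floor(x, noise_floor = 0)                       # floored to 0
x_na <- apply_floor(x, noise_floor = 0, noise_floor_value = NA)  # dropped as NA
print(x_floor)
print(x_na)
```

With noise_floor_value=NA the three values at or below zero become NA, so they would contribute no point to a plot.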

naValue

single numeric value used to replace any NA values in the input matrix x. This argument can be useful to replace NA values with something like zero.

mad_row_min

numeric value defining the minimum group value, corresponding to the x-axis position on the MA-plot, required for a row to be included in the MAD calculation. This threshold is useful to filter outlier data below a noise threshold, so that the MAD calculation will include only the data above that value. For example, with count data, it is useful to filter out counts below roughly 8, where Poisson noise is a more dominant component than real count data. Remember that count data should already be log2-transformed, so the threshold should be transformed identically, for example using log2(1 + 8) to require a minimum count of 8.
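The following sketch shows how a count threshold translates to the log2 scale and how it restricts the rows that enter a MAD calculation. It uses stats::mad() as a stand-in; the comparison operator, the toy values, and the use of stats::mad() are assumptions, since the exact internal calculation is not described here.

```r
# Threshold of 8 counts, expressed on the same log2(1 + count) scale as the data
mad_row_min <- log2(1 + 8)

# Hypothetical x-axis values (group summaries) and y-axis values for one sample
counts <- c(0, 3, 8, 100, 1000)
x_axis <- log2(1 + counts)
y <- c(4, -2, 0.5, -0.5, 0.1)

# Rows at or above the threshold are kept for the MAD calculation (assumed ">=")
keep <- x_axis >= mad_row_min

mad_all <- stats::mad(y)         # includes noisy low-count rows
mad_kept <- stats::mad(y[keep])  # restricted to rows above the threshold
print(c(all_rows = mad_all, kept_rows = mad_kept))
```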

grouped_mad

logical indicating whether the MAD value should be calculated per group when centerGroups is supplied, from which the MAD factor values are derived. When TRUE it has the effect of highlighting outliers within each group using the variability in that group. When FALSE the overall MAD is calculated, and a particularly high variability group may have all its group members labeled with a high MAD factor.
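The effect of grouped_mad can be illustrated with made-up per-sample MAD values. In this sketch the MAD factor is assumed to be each sample's MAD divided by a median MAD, taken either across all samples or within each group; that definition is an assumption for illustration and may differ from the package's calculation.

```r
# Hypothetical per-sample MAD values; group B is more variable overall
sample_mad <- c(s1 = 0.2, s2 = 0.2, s3 = 0.8, s4 = 1.0, s5 = 1.0, s6 = 1.0)
groups <- c("A", "A", "A", "B", "B", "B")

# Analogue of grouped_mad=FALSE: compare each sample to the overall median MAD
factor_overall <- sample_mad / median(sample_mad)

# Analogue of grouped_mad=TRUE: compare each sample to its own group median MAD
factor_grouped <- sample_mad / ave(sample_mad, groups, FUN = median)

print(round(rbind(overall = factor_overall, grouped = factor_grouped), 2))
```

Under the grouped calculation only s3 stands out, as an outlier within group A. Under the overall calculation every member of the more variable group B receives a higher factor than its group A counterpart.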

centerFunc

function used for centering data, either centerGeneData() (the default) or centerGeneData_v1(). This argument will be removed in the near future and is mainly intended to allow testing the two centering functions. The following arguments are passed to this function:

  • x: the input numeric data matrix

  • na.rm: logical whether to ignore NA values. Always use na.rm=TRUE.

  • controlSamples: character optional subset of colnames(x) to use as reference controls during centering

  • centerGroups: character vector of groups for colnames(x)

  • controlFloor: numeric optional minimum allowed value for control summary prior to centering

  • naControlAction: character string for how to handle entirely NA control groups during centering

  • naControlFloor: numeric used when naControlAction="floor" and all control values are NA. One numeric value is inserted into the control group.

  • useMedian: logical whether to use median (TRUE) or mean (FALSE)

  • returnGroups: logical whether to return summary of group assignment in attribute "center_df"

  • returnGroupedValues: logical whether to return group summary values in attribute "x_group"

  • ...: other arguments are passed along unchanged.

returnType

character string indicating the format of data to return: "ma_list" is a list of MA-plot two-column numeric matrices with colnames c("x","y"); "tidy" returns a tall data.frame suitable for use in ggplot2.
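The relationship between the two formats can be sketched by converting a list of two-column matrices into a tall data.frame. The column names "sample" and "row" below are hypothetical, and the actual columns returned with returnType="tidy" may differ.

```r
# A hand-built stand-in for "ma_list" output: one x/y matrix per sample
ma_list <- list(
  s1 = cbind(x = c(r1 = 2, r2 = 5), y = c(-0.5, 1)),
  s2 = cbind(x = c(r1 = 2, r2 = 5), y = c(0.5, -1)))

# Stack the matrices into one tall data.frame, one row per point
tidy <- do.call(rbind, lapply(names(ma_list), function(nm) {
  m <- ma_list[[nm]]
  data.frame(
    sample = nm,          # hypothetical column name
    row = rownames(m),    # hypothetical column name
    x = m[, "x"],
    y = m[, "y"],
    row.names = NULL)
}))
print(tidy)
```

A tall layout like this can be passed to ggplot2 with one panel per sample, for example by faceting on the sample column.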

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function takes a numeric matrix as input, and calculates data sufficient to produce MA-plots. The default output is a list of two-column numeric matrices with "x" and "y" coordinates, representing the group median and difference from median, respectively.
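The basic calculation can be sketched in base R, assuming no groups and no control samples. This is a simplified illustration of the x and y coordinates described above, with made-up data, and not the package's implementation.

```r
x <- matrix(
  c(1, 2, 6,
    4, 5, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("r1", "r2"), c("s1", "s2", "s3")))

# x-axis: the median of each row across samples
row_center <- apply(x, 1, median)

# y-axis: each sample's difference from that row median;
# one two-column matrix per sample, with colnames c("x", "y")
ma_list <- lapply(colnames(x), function(s) {
  cbind(x = row_center, y = x[, s] - row_center)
})
names(ma_list) <- colnames(x)

print(ma_list$s3)
```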

The mean value can be used by setting useMedian=FALSE.

Samples can be grouped using the argument centerGroups. In this case, the y-axis value will be "difference from group median".

Control samples can be specified for centering using the argument controlSamples. In this case, the y-axis value will be "difference from control median".

Sample grouping and control samples can be combined, in which case the y-axis values will be "difference from the control median within the centering group".
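Combining groups and controls can be sketched as a loop over centering groups, subtracting the control median found within each group. The sample names and values are made up, and the loop is an illustration of the idea rather than the centerGeneData() implementation.

```r
x <- matrix(
  c(1, 3, 10, 14),
  nrow = 1,
  dimnames = list("r1", c("a_ctrl", "a_trt", "b_ctrl", "b_trt")))
centerGroups <- c("A", "A", "B", "B")
controlSamples <- c("a_ctrl", "b_ctrl")

y <- x
for (g in unique(centerGroups)) {
  in_group <- centerGroups == g
  # Controls are restricted to the current centering group
  is_ctrl <- in_group & colnames(x) %in% controlSamples
  ref <- apply(x[, is_ctrl, drop = FALSE], 1, median)
  # Difference from the control median within this group
  y[, in_group] <- x[, in_group, drop = FALSE] - ref
}
print(y)
```

Each treated sample is compared only with the control in its own group, so the large offset between groups A and B does not appear in the y values.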

See Also

Other jam matrix functions: centerGeneData(), jammanorm(), matrix_to_column_rank()


jmw86069/jamma documentation built on Oct. 11, 2024, 7:08 a.m.