w4m_filter_by_sample_class: Filter W4M data matrix by sample-class
In HegemanLab/w4mclassfilter: W4M Class Filter

Filter a set of retention-corrected W4M files (dataMatrix, sampleMetadata, variableMetadata) by sample-class or feature-attributes

w4m_filter_by_sample_class(
  dataMatrix_in,
  sampleMetadata_in,
  variableMetadata_in,
  dataMatrix_out,
  sampleMetadata_out,
  variableMetadata_out,
  classes = c(),
  include = FALSE,
  class_column = "class",
  samplename_column = "sampleMetadata",
  name_varmetadata_col1 = TRUE,
  name_smplmetadata_col1 = TRUE,
  variable_range_filter = c(),
  data_imputation = w4m_filter_zero_imputation,
  order_vrbl = "variableMetadata",
  order_smpl = "sampleMetadata",
  centering = c("none", "centroid", "median", "medoid")[1],
  failure_action = function(...) {     cat(paste(..., SEP = "\n")) }
)

`dataMatrix_in`	input data matrix (rows are feature names, columns are sample names)
`sampleMetadata_in`	input sample metadata (rows are sample names, one column's name matches class_column)
`variableMetadata_in`	input variable metadata (rows are variable names)
`dataMatrix_out`	output data matrix (rows are feature names, columns are sample names)
`sampleMetadata_out`	output sample metadata (rows are sample names, one column's name matches class_column)
`variableMetadata_out`	output variable metadata (rows are variable names)
`classes`	character vector or csv string: names of sample classes to include or exclude; default is an empty vector
`include`	logical: TRUE, include named sample classes; FALSE (the default), exclude named sample classes
`class_column`	character: name of "class" column, defaults to "class"
`samplename_column`	character: name of column with sample name, defaults to "sampleMetadata"
`name_varmetadata_col1`	logical: TRUE, name column 1 of variable metadata as "variableMetadata"; FALSE, no change; default is TRUE
`name_smplmetadata_col1`	logical: TRUE, name column 1 of sample metadata as "sampleMetadata"; FALSE, no change; default is TRUE
`variable_range_filter`	character vector or csv string: vector of filters specified as 'variableMetadataColumnName:min:max'; default is empty vector
`data_imputation`	function(m): imputation function applied to the data matrix; the default, w4m_filter_zero_imputation, is intended for 'intb' data (intensities with background subtracted) and imputes zero for NA
`order_vrbl`	character vector or csv string: name(s) of column(s) of variableMetadata on which to sort, defaults to "variableMetadata" (i.e., the first column)
`order_smpl`	character vector or csv string: name(s) of column(s) of sampleMetadata on which to sort, defaults to "sampleMetadata" (i.e., the first column)
`centering`	character: center samples by class column (which names treatment). Possible choices: "none", "centroid", "medoid", or "median"
`failure_action`	function(...): action to take upon failure; the default (see the usage above) writes the failure message(s) with cat(paste(...))
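
To illustrate the 'variableMetadataColumnName:min:max' format accepted by variable_range_filter, here is a minimal sketch. It is not the package's own parser: the helper name parse_range_filter, the column names "mz" and "rt", and the choice of inclusive bounds are all assumptions made for illustration.

```r
# Hypothetical helper (not part of w4mclassfilter): split one filter
# specification of the form 'column:min:max' into its three parts.
parse_range_filter <- function(spec) {
  parts <- strsplit(spec, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3)
  list(
    column = parts[1],
    min    = as.numeric(parts[2]),
    max    = as.numeric(parts[3])
  )
}

# Assumed example columns "mz" and "rt" in the variable metadata
filters <- c("mz:200:800", "rt:150:1100")
parsed  <- lapply(filters, parse_range_filter)

# Toy variable metadata
vm <- data.frame(
  variableMetadata = c("v1", "v2", "v3"),
  mz = c(150, 400, 790),
  rt = c(300, 1200, 600)
)

# Keep only the features whose values lie within every range
# (inclusive bounds are an assumption of this sketch)
keep <- rep(TRUE, nrow(vm))
for (f in parsed) {
  keep <- keep & vm[[f$column]] >= f$min & vm[[f$column]] <= f$max
}
vm_filtered <- vm[keep, ]
```

In this toy data only the feature that satisfies both ranges remains in vm_filtered.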

The W4M files dataMatrix, sampleMetadata, and variableMetadata must be a consistent set, i.e., there must be metadata in the latter two files for all (and only for) the samples and variables named in the columns and rows of dataMatrix.
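
The consistency requirement can be checked before calling the function. The following is a rough sketch on toy data; the helper is_consistent_set is hypothetical, and it assumes the first metadata columns are named "sampleMetadata" and "variableMetadata".

```r
# Toy data: rows of the data matrix are features, columns are samples
dm <- matrix(
  c(1, 2, 3, 4, 5, 6), nrow = 2,
  dimnames = list(c("v1", "v2"), c("s1", "s2", "s3"))
)
sm <- data.frame(sampleMetadata = c("s1", "s2", "s3"),
                 class = c("A", "A", "B"))
vm <- data.frame(variableMetadata = c("v1", "v2"),
                 mz = c(100.1, 200.2))

# Hypothetical helper: TRUE when the metadata cover all, and only,
# the samples and variables named in the data matrix
is_consistent_set <- function(dm, sm, vm) {
  setequal(colnames(dm), sm$sampleMetadata) &&
    setequal(rownames(dm), vm$variableMetadata)
}

ok_full    <- is_consistent_set(dm, sm, vm)        # complete metadata
ok_missing <- is_consistent_set(dm, sm[-1, ], vm)  # metadata for s1 dropped
```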

For multivariate statistics functions, samples and variables with zero variance must be eliminated, and missing values are problematic.
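
As an illustration of why this matters, the sketch below imputes zero for NA and then drops zero-variance features and samples from a toy matrix. It is written in base R only and is not the package's code; the order of the steps is an assumption.

```r
# Toy data matrix: rows are features, columns are samples
dm <- matrix(
  c(5, 5, 5,      # v1: zero variance across samples
    1, NA, 3,     # v2: has a missing value
    2, 4, 6),     # v3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("v1", "v2", "v3"), c("s1", "s2", "s3"))
)

# Impute zero for NA (mirrors the documented behavior of the default
# data_imputation; this is not the package's own code)
dm[is.na(dm)] <- 0

# Drop features (rows) and then samples (columns) with zero variance
dm <- dm[apply(dm, 1, var) > 0, , drop = FALSE]
dm <- dm[, apply(dm, 2, var) > 0, drop = FALSE]
```

Here the constant feature v1 is removed, while all three samples are retained because each still varies across the remaining features.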

Furthermore, frequently, it is desirable to analyze a subset of samples (or features) in the dataMatrix.

First, this function produces a set of files with imputed missing values, omitting features and samples that are not consistently present within the set or that have zero variance. Secondly, it provides a selection capability for samples based on whether their sample names match a regular expression pattern; this capability can be used either to select samples with matching sample names or to exclude them. Thirdly, it provides a selection capability for features based on whether their metadata lie within the ranges specified by 'variable_range_filter'.
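
The include-versus-exclude logic can be sketched as follows. This is an illustration only: the helper select_by_class is hypothetical, and the sketch assumes that patterns are anchored and matched against the values in the class column, which may differ from what the package does internally.

```r
sm <- data.frame(
  sampleMetadata = c("s1", "s2", "s3", "s4"),
  class = c("blankpos", "M", "F", "blankneg"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep (include = TRUE) or drop (include = FALSE)
# the rows whose class matches any of the patterns
select_by_class <- function(sm, classes, include = FALSE,
                            class_column = "class") {
  if (length(classes) == 0) return(sm)
  # Anchor the alternatives so that "M" does not match "blankneg" etc.
  pattern <- paste0("^(", paste(classes, collapse = "|"), ")$")
  hit <- grepl(pattern, sm[[class_column]])
  sm[if (include) hit else !hit, , drop = FALSE]
}

excluded_blanks <- select_by_class(sm, c("blank.*"), include = FALSE)
only_males      <- select_by_class(sm, c("M"), include = TRUE)
```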

Finally, this function provides, as an advanced option, the ability to compute one of three types of centers for each treatment:

"centroid" - Return only treatment-centers computed for each treatment as the mean intensity for each feature.
"median" - Return only treatment-centers computed for each treatment as the median intensity for each feature.
"medoid" - Return only treatment-centers computed for each treatment as the sample most similar to the other samples (the medoid).
- By definition, the medoid is the sample having the smallest sum of its distances from other samples in the treatment.
- Distances are computed in principal-components space.
  - Principal components are uncorrelated, so they are used here to minimize the distortion of computed distances by correlated features.
"none" - Return all samples; do not compute centers.

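The three kinds of centers described above can be sketched in base R as follows. This is an illustration on random toy data, not the package's implementation; in particular, running the principal-components analysis separately within each treatment and using Euclidean distance are assumptions of this sketch.

```r
set.seed(1)
# Toy data: 4 features (rows) by 6 samples (columns), two treatments
dm <- matrix(rnorm(24, mean = 10), nrow = 4,
             dimnames = list(paste0("v", 1:4), paste0("s", 1:6)))
trt <- c("A", "A", "A", "B", "B", "B")
samples_by_trt <- split(colnames(dm), trt)

# "centroid" and "median": one column per treatment
centroid <- sapply(samples_by_trt,
                   function(s) rowMeans(dm[, s, drop = FALSE]))
med <- sapply(samples_by_trt,
              function(s) apply(dm[, s, drop = FALSE], 1, median))

# "medoid": within each treatment, the sample with the smallest sum of
# distances to the other samples, with distances computed in
# principal-components space (assumed: PCA run per treatment)
medoid <- sapply(samples_by_trt, function(s) {
  scores <- prcomp(t(dm[, s, drop = FALSE]))$x  # samples as rows
  d <- as.matrix(dist(scores))                  # Euclidean distances
  rownames(d)[which.min(rowSums(d))]
})
```

The centroid and median results are feature-by-treatment matrices, whereas medoid holds the name of one actual sample per treatment.
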
Inputs (dataMatrix_in, sampleMetadata_in, variableMetadata_in) may be:

character: path to input tab-separated-values-file (TSV)
data.frame
matrix: allowed for dataMatrix_in only
list: must have a member named "dataMatrix", "sampleMetadata", or "variableMetadata" for dataMatrix_in, sampleMetadata_in, or variableMetadata_in, respectively.
environment: must have a member named "dataMatrix", "sampleMetadata", or "variableMetadata" for dataMatrix_in, sampleMetadata_in, or variableMetadata_in, respectively.

Outputs (dataMatrix_out, sampleMetadata_out, variableMetadata_out) may be:

character: path to write a tab-separated-values-file (TSV)
list: will add a member named "dataMatrix", "sampleMetadata", or "variableMetadata" for dataMatrix_out, sampleMetadata_out, or variableMetadata_out, respectively.
environment: will add a member named "dataMatrix", "sampleMetadata", or "variableMetadata" for dataMatrix_out, sampleMetadata_out, or variableMetadata_out, respectively.

Please see the package vignette for further details.

logical: TRUE only if filtration succeeded

Art Eschenlauer, esch0041@umn.edu

https://github.com/HegemanLab/w4mclassfilter

http://workflow4metabolomics.org/

## Not run: 
  # set the paths to your input files
  dataMatrix_in <- "tests/testthat/input_dataMatrix.tsv"
  sampleMetadata_in <- "tests/testthat/input_sampleMetadata.tsv"
  variableMetadata_in <- "tests/testthat/input_variableMetadata.tsv"

  # set the paths to your (nonexistent) output files
  #    in a directory that DOES need to exist
  dataMatrix_out <- "tests/testthat/output_dataMatrix.tsv"
  sampleMetadata_out <- "tests/testthat/output_sampleMetadata.tsv"
  variableMetadata_out <- "tests/testthat/output_variableMetadata.tsv"

  # Example: running the filter to keep only the wanted samples
  #   include = TRUE means keep only samples whose class
  #   (here, the value in the "gender" column) is "M"
  w4m_filter_by_sample_class(
    dataMatrix_in = dataMatrix_in
  , dataMatrix_out = dataMatrix_out
  , variableMetadata_in = variableMetadata_in
  , variableMetadata_out = variableMetadata_out
  , sampleMetadata_out = sampleMetadata_out
  , sampleMetadata_in = sampleMetadata_in
  , classes = c("M")
  , include = TRUE
  , class_column = "gender"
  , samplename_column = "sampleMetadata"
  , name_varmetadata_col1 = TRUE
  , name_smplmetadata_col1 = TRUE
  , variable_range_filter = c()
  , data_imputation = w4m_filter_zero_imputation
  , order_vrbl = "variableMetadata"
  , order_smpl = "sampleMetadata"
  , centering  = "none"
  , failure_action = function(...) { cat(paste(..., sep = "\n")) }
  )

## End(Not run)