w4m_filter_by_sample_class: Filter W4M data matrix by sample-class


View source: R/ClassFilter.R

Description

Filter a set of retention-corrected W4M files (dataMatrix, sampleMetadata, variableMetadata) by sample-class or feature-attributes

Usage

w4m_filter_by_sample_class(
  dataMatrix_in,
  sampleMetadata_in,
  variableMetadata_in,
  dataMatrix_out,
  sampleMetadata_out,
  variableMetadata_out,
  classes = c(),
  include = FALSE,
  class_column = "class",
  samplename_column = "sampleMetadata",
  name_varmetadata_col1 = TRUE,
  name_smplmetadata_col1 = TRUE,
  variable_range_filter = c(),
  data_imputation = w4m_filter_zero_imputation,
  order_vrbl = "variableMetadata",
  order_smpl = "sampleMetadata",
  centering = c("none", "centroid", "median", "medoid")[1],
  failure_action = function(...) {     cat(paste(..., SEP = "\n")) }
)

Arguments

dataMatrix_in

input data matrix (rows are feature names, columns are sample names)

sampleMetadata_in

input sample metadata (rows are sample names, one column's name matches class_column)

variableMetadata_in

input variable metadata (rows are variable names)

dataMatrix_out

output data matrix (rows are feature names, columns are sample names)

sampleMetadata_out

output sample metadata (rows are sample names, one column's name matches class_column)

variableMetadata_out

output variable metadata (rows are variable names)

classes

character vector or csv string: names of sample classes to include or exclude; default is an empty vector

include

logical: TRUE, include named sample classes; FALSE (the default), exclude named sample classes

class_column

character: name of the sampleMetadata column that holds sample classes; defaults to "class"

samplename_column

character: name of column with sample name, defaults to "sampleMetadata"

name_varmetadata_col1

logical: TRUE, name column 1 of variable metadata as "variableMetadata"; FALSE, no change; default is TRUE

name_smplmetadata_col1

logical: TRUE, name column 1 of sample metadata as "sampleMetadata"; FALSE, no change; default is TRUE

variable_range_filter

character vector or csv string: vector of filters specified as 'variableMetadataColumnName:min:max'; default is empty vector

data_imputation

function(m): function that imputes missing values in the data matrix m; the default, w4m_filter_zero_imputation, is suited to 'intb' data (intensities with background subtracted) and imputes zero for NA

order_vrbl

character vector or csv string: name(s) of column(s) of variableMetadata on which to sort, defaults to "variableMetadata" (i.e., the first column)

order_smpl

character vector or csv string: name(s) of column(s) of sampleMetadata on which to sort, defaults to "sampleMetadata" (i.e., the first column)

centering

character: center samples by the class column (which names the treatment). Possible choices: "none" (the default), "centroid", "medoid", or "median"

failure_action

function(...): action to take upon failure; the default pastes its arguments together and writes the result to the console with cat (see Usage)
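To make the 'variableMetadataColumnName:min:max' format of variable_range_filter concrete, here is a minimal sketch of how such specifications could be parsed and applied to a variable-metadata table. The helper apply_range_filters and the toy columns mz and rt are hypothetical; inclusive bounds, and treating missing metadata as out of range, are assumptions rather than documented package behavior.

```r
# Hypothetical helper (not part of w4mclassfilter): keep the rows of a
# variable-metadata data frame whose values fall inside every
# 'column:min:max' range. Inclusive bounds are an assumption.
apply_range_filters <- function(vm, variable_range_filter) {
  if (length(variable_range_filter) == 0) return(vm)  # c() means no filtering
  # accept either a character vector or a single csv string
  filters <- unlist(strsplit(variable_range_filter, ",", fixed = TRUE))
  keep <- rep(TRUE, nrow(vm))
  for (f in filters) {
    parts <- strsplit(f, ":", fixed = TRUE)[[1]]  # name, min, max
    values <- vm[[parts[1]]]
    keep <- keep & values >= as.numeric(parts[2]) & values <= as.numeric(parts[3])
  }
  keep[is.na(keep)] <- FALSE  # assumption: missing metadata is out of range
  vm[keep, , drop = FALSE]
}

# made-up variable metadata for illustration
vm <- data.frame(
  variableMetadata = c("M100T30", "M250T60", "M700T90"),
  mz = c(100.1, 250.3, 700.2),
  rt = c(30, 60, 90),
  stringsAsFactors = FALSE
)
filtered <- apply_range_filters(vm, "mz:100:500,rt:0:45")
print(filtered$variableMetadata)  # [1] "M100T30"
```

Because every filter is combined with a logical AND, a feature survives only if it lies within all of the specified ranges.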

Details

The W4M files dataMatrix, sampleMetadata, and variableMetadata must be a consistent set, i.e., there must be metadata in the latter two files for all (and only for) the samples and variables named in the columns and rows of dataMatrix.
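The consistency requirement can be checked directly. The sketch below assumes the usual W4M tabular layout: column 1 of dataMatrix holds variable names, the remaining columns are samples, and column 1 of each metadata table holds the sample or variable names. The helper w4m_is_consistent is hypothetical, and the package's own checks may differ.

```r
# Hypothetical consistency check (not the package's code).
w4m_is_consistent <- function(dm, sm, vm) {
  dm_samples <- colnames(dm)[-1]              # sample names from the header
  dm_variables <- as.character(dm[[1]])       # variable names from column 1
  setequal(dm_samples, as.character(sm[[1]])) &&
    setequal(dm_variables, as.character(vm[[1]])) &&
    anyDuplicated(dm_samples) == 0 &&         # setequal ignores duplicates,
    anyDuplicated(dm_variables) == 0          # so check for them explicitly
}

dm <- data.frame(dataMatrix = c("v1", "v2"), s1 = c(1, 2), s2 = c(3, 4),
                 stringsAsFactors = FALSE)
sm <- data.frame(sampleMetadata = c("s2", "s1"), class = c("A", "B"),
                 stringsAsFactors = FALSE)
vm <- data.frame(variableMetadata = c("v1", "v2"), stringsAsFactors = FALSE)

w4m_is_consistent(dm, sm, vm)                     # TRUE: same names, any order
w4m_is_consistent(dm, sm[1, , drop = FALSE], vm)  # FALSE: s1 has no metadata
```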

For multivariate statistics functions, samples and variables with zero variance must be eliminated, and missing values are problematic.
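As a rough illustration of these two requirements, the sketch below imputes zero for NA (the behavior described for the default data_imputation) and then drops zero-variance features (rows) and samples (columns) from a numeric matrix. The helpers impute_zero and drop_zero_variance are hypothetical stand-ins, not the package's implementation. Removal is repeated because dropping a sample can leave a feature constant, and vice versa.

```r
# Hypothetical stand-in for zero imputation: replace every NA with 0.
impute_zero <- function(m) {
  m[is.na(m)] <- 0
  m
}

# Hypothetical helper: repeatedly drop rows (features) and columns
# (samples) whose variance is zero or undefined, until none remain.
drop_zero_variance <- function(m) {
  repeat {
    if (nrow(m) < 2 || ncol(m) < 2) return(m)  # variance needs >= 2 values
    rv <- apply(m, 1, var)
    cv <- apply(m, 2, var)
    row_ok <- !is.na(rv) & rv > 0
    col_ok <- !is.na(cv) & cv > 0
    if (all(row_ok) && all(col_ok)) return(m)
    m <- m[row_ok, col_ok, drop = FALSE]
  }
}

# made-up intensities: rows are features, columns are samples
m <- matrix(c(1, NA, 3,
              5,  5, 5,
              2,  4, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("v1", "v2", "v3"), c("s1", "s2", "s3")))
result <- drop_zero_variance(impute_zero(m))
rownames(result)  # "v1" "v3": v2 is constant and is removed
```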

Furthermore, it is frequently desirable to analyze a subset of the samples (or features) in the dataMatrix.

First, this function produces a set of files with imputed missing values, omitting features and samples that are not consistently present within the set or that have zero variance. Second, it provides a selection capability for samples based on whether their sample names match a regular expression pattern; this capability can be used either to select samples with matching names or to exclude them. Third, it provides a selection capability for features based on whether their metadata lie within the ranges specified by 'variable_range_filter'.
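The include/exclude logic can be sketched as follows. This sketch assumes, following the classes and class_column arguments, that each pattern in classes is matched as a regular expression against the values in class_column; exactly which column the package matches and how it anchors patterns is not settled here. The helper select_samples is hypothetical, and returning every sample when classes is empty (even with include = TRUE) is a simplification.

```r
# Hypothetical helper: return the names of the samples to keep.
select_samples <- function(sm, classes, include = FALSE,
                           class_column = "class",
                           samplename_column = "sampleMetadata") {
  if (length(classes) == 0) return(sm[[samplename_column]])  # no selection
  # accept either a character vector or a single csv string
  patterns <- unlist(strsplit(classes, ",", fixed = TRUE))
  # TRUE where the class value matches at least one pattern
  hit <- Reduce(`|`, lapply(patterns, grepl, x = sm[[class_column]]))
  keep <- if (include) hit else !hit
  sm[[samplename_column]][keep]
}

# made-up sample metadata
sm <- data.frame(sampleMetadata = c("s1", "s2", "s3", "s4"),
                 gender = c("M", "F", "M", "F"),
                 stringsAsFactors = FALSE)
select_samples(sm, "M", include = TRUE, class_column = "gender")   # "s1" "s3"
select_samples(sm, "M", include = FALSE, class_column = "gender")  # "s2" "s4"
```

Because the patterns are regular expressions, "M" would also match a class such as "MQC"; anchoring the pattern as "^M$" restricts the match to the exact class name.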

Finally, as an advanced option, this function can compute one of three types of center for each treatment, selected with the centering argument: "centroid", "median", or "medoid".
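For intuition about the "centroid" and "median" choices, the sketch below collapses the samples (columns) of a feature-by-sample matrix to one center per class, taking the per-feature mean or median. The helper center_by_class is hypothetical; the "medoid" option is omitted, and the package's actual output layout may differ.

```r
# Hypothetical helper: one column of per-feature centers per class.
center_by_class <- function(m, classes, method = c("centroid", "median")) {
  method <- match.arg(method)
  f <- if (method == "centroid") mean else stats::median
  groups <- split(seq_len(ncol(m)), classes)  # column indices per class
  # cbind keeps a matrix even when m has a single row
  do.call(cbind, lapply(groups, function(idx) apply(m[, idx, drop = FALSE], 1, f)))
}

# made-up intensities: two features, five samples in classes A and B
m <- matrix(c(1, 2, 9, 10, 20,
              4, 4, 7, 30, 50),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("v1", "v2"), paste0("s", 1:5)))
cls <- c("A", "A", "A", "B", "B")

cen <- center_by_class(m, cls, "centroid")  # A: v1 = 4, v2 = 5
med <- center_by_class(m, cls, "median")    # A: v1 = 2, v2 = 4
```

The two methods agree for class B, which has only two samples, but differ for class A, where the outlying third sample pulls the mean away from the median.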

Inputs (dataMatrix_in, sampleMetadata_in, variableMetadata_in) may be:

Outputs (dataMatrix_out, sampleMetadata_out, variableMetadata_out) may be:

Please see the package vignette for further details.

Value

logical: TRUE only if filtration succeeded

Author(s)

Art Eschenlauer, esch0041@umn.edu

See Also

https://github.com/HegemanLab/w4mclassfilter

http://workflow4metabolomics.org/

Examples

## Not run: 
  # set the paths to your input files
  dataMatrix_in <- "tests/testthat/input_dataMatrix.tsv"
  sampleMetadata_in <- "tests/testthat/input_sampleMetadata.tsv"
  variableMetadata_in <- "tests/testthat/input_variableMetadata.tsv"

  # set the paths to your output files; the files need not exist yet,
  #    but the directory that will contain them MUST exist
  dataMatrix_out <- "tests/testthat/output_dataMatrix.tsv"
  sampleMetadata_out <- "tests/testthat/output_sampleMetadata.tsv"
  variableMetadata_out <- "tests/testthat/output_variableMetadata.tsv"

  # Example: running the filter to keep only the wanted samples
  #   include = TRUE means keep only samples whose "gender" class is M
  w4m_filter_by_sample_class(
    dataMatrix_in = dataMatrix_in
  , dataMatrix_out = dataMatrix_out
  , variableMetadata_in = variableMetadata_in
  , variableMetadata_out = variableMetadata_out
  , sampleMetadata_out = sampleMetadata_out
  , sampleMetadata_in = sampleMetadata_in
  , classes = c("M")
  , include = TRUE
  , class_column = "gender"
  , samplename_column = "sampleMetadata"
  , name_varmetadata_col1 = TRUE
  , name_smplmetadata_col1 = TRUE
  , variable_range_filter = c()
  , data_imputation = w4m_filter_zero_imputation
  , order_vrbl = "variableMetadata"
  , order_smpl = "sampleMetadata"
  , centering  = "none"
  , failure_action = function(...) { cat(paste(...), "\n", sep = "") }
  )

## End(Not run)

HegemanLab/w4mclassfilter documentation built on March 14, 2021, 1:19 a.m.