FS: Filter and Feature Selection based on Wilcoxon Rank-Sum test


View source: R/FS.r

Description

Given a training set, this function filters samples and features using several thresholds: (1) average relative abundance in each cohort class (the function signature under Usage gives a default minimum of 0.001, i.e. 0.1%); (2) total reads per sample (minimum 500 by default); (3) non-zero ratio across all samples (by default, at least 10% of the samples must have a non-zero value). The retained features are then ranked by their Wilcoxon rank-sum test P values.
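The three thresholds can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy data, the variable names, and the choice to keep a feature when its mean relative abundance reaches the cutoff in at least one class are all assumptions made for illustration.

```r
# Toy training set (made-up values): column 1 = sample ID, column 2 = group,
# columns 3+ = read counts per feature.
training <- data.frame(
  ID     = paste0("S", 1:6),
  type   = c("A", "A", "A", "B", "B", "B"),
  taxon1 = c(400, 500, 450, 300, 350, 20),
  taxon2 = c(200, 100, 150, 400, 300, 10),
  taxon3 = c(0, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)
type_col <- 2
col_start <- 3
Cutoff_mean <- 0.001
Cutoff_ratio <- 0.1
totalReadsCutoff <- 500

counts <- as.matrix(training[, col_start:ncol(training)])

# (2) Remove samples with fewer than totalReadsCutoff total reads.
keep_samples <- rowSums(counts) >= totalReadsCutoff
counts <- counts[keep_samples, , drop = FALSE]
groups <- training[keep_samples, type_col]

# Convert counts to relative abundance within each sample.
rel <- counts / rowSums(counts)

# (1) Mean relative abundance of each feature within each class.
# Assumption: a feature passes if it reaches the cutoff in at least one class.
group_means <- sapply(split(as.data.frame(rel), groups), colMeans)
pass_mean <- apply(group_means >= Cutoff_mean, 1, any)

# (3) Fraction of remaining samples with a non-zero count for each feature.
pass_ratio <- colMeans(counts > 0) >= Cutoff_ratio

kept <- colnames(counts)[pass_mean & pass_ratio]
kept
```

In this toy data, sample S6 falls below the read cutoff and taxon3 fails the mean-abundance filter even though it passes the non-zero ratio filter.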

Usage

FS(
  training = data,
  type_col = 2,
  col_start = 3,
  Cutoff_mean = 0.001,
  Cutoff_ratio = 0.1,
  totalReadsCutoff = 500,
  Cutoff_pvalue = 1
)

Arguments

training

A data frame of training set.

type_col

The index of the column that holds the group/type variable. Default is the 2nd column.

col_start

The index of the first column of bacteria (feature) data. Default is the 3rd column.

Cutoff_mean

The minimum average relative abundance allowed in the filtering step. Default is 0.001.

Cutoff_ratio

The non-zero ratio cutoff used when filtering features. Default is 0.1.

totalReadsCutoff

The minimum allowed total reads per sample. Any sample with fewer total reads than this is removed. Default is 500.

Cutoff_pvalue

The maximum P value allowed for a given feature to remain on the list of selected features. Default is 1, so no feature is excluded by P value unless this is lowered.
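As a rough illustration of how a P value cutoff and ranking of this kind can work, the sketch below applies base R's wilcox.test to each feature and keeps those at or below the cutoff, sorted by P value. It is a sketch under assumptions, not the function's own code: the data are made up, and the two-sided test and the "less than or equal" comparison are illustrative choices.

```r
# Made-up relative abundances for two groups of four samples each.
type <- c("A", "A", "A", "A", "B", "B", "B", "B")
rel <- data.frame(
  taxon1 = c(0.10, 0.12, 0.11, 0.13, 0.30, 0.32, 0.31, 0.33),  # groups fully separated
  taxon2 = c(0.20, 0.25, 0.22, 0.24, 0.21, 0.23, 0.26, 0.19)   # groups overlap
)
Cutoff_pvalue <- 0.05

# Two-sided Wilcoxon rank-sum test per feature (group A vs. group B).
pvals <- sapply(rel, function(x) {
  wilcox.test(x[type == "A"], x[type == "B"])$p.value
})

# Keep features at or below the cutoff, sorted by increasing P value.
selected <- sort(pvals[pvals <= Cutoff_pvalue])
names(selected)
```

With Cutoff_pvalue = 1, every feature would be kept and the step would reduce to sorting by P value.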

Value

A list with two elements: Feature and CountData.

Feature A list of selected features sorted by their Wilcoxon P values.

CountData A data frame containing balanced data.

Examples
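The page gives no worked example, so the following is a hypothetical one. The toy data frame, its column names, and its values are invented; the layout (sample ID in column 1, group in column 2, features from column 3) is inferred from the defaults in the usage, and the call is guarded so that it only runs if the DMBC package is installed.

```r
# Hypothetical toy training set laid out to match the usage defaults:
# column 1 = sample ID, column 2 = group label, columns 3+ = read counts.
toy <- data.frame(
  ID     = paste0("S", 1:8),
  type   = rep(c("case", "control"), each = 4),
  taxon1 = c(900, 850, 800, 950, 300, 350, 250, 400),
  taxon2 = c(100, 150, 200,  50, 700, 650, 750, 600),
  taxon3 = c(  5,   0,   3,   0,   4,   0,   2,   0),
  stringsAsFactors = FALSE
)

# Call FS() with the arguments shown in the usage, only if DMBC is available.
if (requireNamespace("DMBC", quietly = TRUE)) {
  res <- DMBC::FS(
    training = toy,
    type_col = 2,
    col_start = 3,
    Cutoff_mean = 0.001,
    Cutoff_ratio = 0.1,
    totalReadsCutoff = 500,
    Cutoff_pvalue = 1
  )
  str(res$Feature)     # selected features, sorted by Wilcoxon P value
  head(res$CountData)  # the returned data frame
}
```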


qunfengdong/DMBC documentation built on April 22, 2020, 7:27 p.m.