feature_removal: Stepwise feature removal method

View source: R/feature_removal.R

Description

This function screens features iteratively, removing one feature at a time while balancing overall performance against the risk of overfitting.

Usage

feature_removal(g1 = NULL, g0 = NULL, cutoff1, cutoff0, lt = ">",
  offset = 1, weight.method = reciprocal_colSums,
  scoreStandardization.method = min_max,
  scoreCombine.method = linear_combine, SE = NULL, g0.filter = NULL, ...)

Arguments

g1

A dataframe with features as rows and observations as columns. Cells are numeric or logical. If NULL, the input data should be given as SE and g0.filter.

g0

A dataframe with the same row names as g1. Normally, the observations in g0 belong to a different group from those in g1. If NULL, the input data should be given as SE and g0.filter.

cutoff1

The cutoff used, together with lt, to convert g1 into a dataframe of 1s and 0s called g1.signal. For example, if lt=">", this step computes g1.signal <- g1 > cutoff1. To skip the conversion, set lt="skip".

cutoff0

The cutoff used, together with lt, to convert g0 into a dataframe of 1s and 0s. It works the same way as cutoff1. Choosing different values for cutoff1 and cutoff0 influences overfitting.

lt

The operator used to compare gx with cutoffx. Default is ">". Other options include ">=", "<=", "<", etc. With lt="skip", the comparison is skipped and cutoffx is ignored.

offset

A parameter of scoreCombine.method that adjusts the relative contribution of the g1 and g0 scores. offset can be a single number or a numeric vector; if it is a vector, the whole iteration is run once for each offset value. See the scoreCombine.method parameter for more.

weight.method

gx.weight, the weight of gx, is computed with weight.method. Weights apply to observations (columns), not features (rows). The default is reciprocal_colSums, i.e. 1 / (1 + colSums(gx.signal, na.rm=TRUE)). You can supply your own function; its first parameter must be named exactly gx.signal.
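As an illustration of a user-supplied weight function, here is a minimal sketch. The name sqrt_reciprocal_weight and the square-root damping are hypothetical choices made for this example and are not part of the package; the only requirement taken from the text is that the first parameter is named gx.signal.

```r
# Hypothetical custom weight: like the default, but damped with a square root.
# The first parameter is named gx.signal, as the documentation requires.
sqrt_reciprocal_weight <- function(gx.signal) {
  1 / sqrt(1 + colSums(gx.signal, na.rm = TRUE))
}

# Toy 1/0 signal matrix: rows are features, columns are observations.
toy.signal <- matrix(c(1, 0, 0,
                       1, 1, 1),
                     nrow = 3,
                     dimnames = list(c("featA", "featB", "featC"),
                                     c("obs1", "obs2")))

# One weight per observation (column).
custom.weight <- sqrt_reciprocal_weight(toy.signal)
print(custom.weight)
```

Such a function would then presumably be supplied through the weight.method argument of feature_removal.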

scoreStandardization.method

The function used to standardize the summed scores. The default is Min-Max, i.e. normalizing the vector to the 0-1 range. You can supply your own function; its first parameter receives the summed-up dataframe. See the Details section for more.

scoreCombine.method

The function used to combine the feature score vectors of g1 and g0. It must take three parameters, in this order: g1.score.feature, g0.score.feature, and offset. The default is linear_combine, in which offset scales g1.score.feature: g1.score.feature * offset + g0.score.feature. offset can be a single number or a vector; if it is a vector, the whole iteration is run once for each offset value.
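A custom combine function only has to follow the three-parameter order described above. The sketch below uses a hypothetical name, normalized_combine, and a made-up formula that divides the default linear combination by offset + 1; neither is part of the package.

```r
# Hypothetical custom combine method with the required parameter order.
# It rescales the default linear combination so the result stays within the
# range of the two inputs.
normalized_combine <- function(g1.score.feature, g0.score.feature, offset) {
  (g1.score.feature * offset + g0.score.feature) / (offset + 1)
}

# Toy feature scores (made-up values).
toy.g1 <- c(featA = 1.0, featB = 0.0)
toy.g0 <- c(featA = 0.2, featB = 1.0)

toy.combined <- normalized_combine(toy.g1, toy.g0, offset = 1)
print(toy.combined)
```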

SE

a SummarizedExperiment object. If NULL, input data should be g1 and g0.

g0.filter

A logical vector marking which columns of SE belong to g0. If NULL, the input data should be given as g1 and g0.

...

Other parameters passed to the method of the expression class.

Details

The method removes one feature (row) in each iteration. It requires either (A) two dataframes, g1 and g0, with identical row names; or (B) a SummarizedExperiment object SE and a logical vector g0.filter marking which columns of SE belong to g0. Normally, g0 is the control set. SE is divided into g1 and g0 automatically.

In each iteration, g1 and g0 are first converted to dataframes of 1s and 0s using cutoff1, cutoff0, and lt. The converted dataframes are called gx.signal, where x stands for 1 or 0. To skip the conversion, set lt="skip"; the cutoffs are then ignored.
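The conversion step can be sketched in base R as follows. The toy values and feature names are made up, and the sketch produces a numeric matrix rather than whatever object the package builds internally.

```r
# Toy data: rows are features, columns are observations (values are made up).
g1 <- data.frame(obs1 = c(0.99, 0.50, 0.97),
                 obs2 = c(0.96, 0.98, 0.10),
                 row.names = c("featA", "featB", "featC"))
cutoff1 <- 0.95

# lt = ">" corresponds to an element-wise comparison against the cutoff.
g1.signal <- g1 > cutoff1

# Multiplying by 1 turns the logical result into 1/0 values.
g1.signal <- g1.signal * 1
print(g1.signal)
```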

Second, gx.weight, the weight of gx, is computed with weight.method. Weights apply to observations (columns), not features (rows). The default is reciprocal_colSums, i.e. 1 / (1 + colSums(gx.signal, na.rm=TRUE)). You can supply your own function; its first parameter must be named exactly gx.signal.
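The default weight formula quoted above can be tried on a toy matrix. reciprocal_colSums_sketch is a stand-in written for this example from the formula in the text; it is not the package's own reciprocal_colSums.

```r
# Toy 1/0 signal matrix: rows are features, columns are observations.
gx.signal <- matrix(c(1, 0, 0,
                      1, 1, 1),
                    nrow = 3,
                    dimnames = list(c("featA", "featB", "featC"),
                                    c("obs1", "obs2")))

# Stand-in for the default weight method, following the documented formula.
reciprocal_colSums_sketch <- function(gx.signal) {
  1 / (1 + colSums(gx.signal, na.rm = TRUE))
}

# Observations with more positive signals receive a smaller weight.
gx.weight <- reciprocal_colSums_sketch(gx.signal)
print(gx.weight)
```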

Third, gx.score, the score dataframe over features and observations, is computed as the dot product of gx.signal and gx.weight.
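One way to read this step, offered here as an assumption rather than as the package's actual code, is that each observation column of gx.signal is scaled by that observation's weight. Under that reading, the row sums of the resulting score matrix equal the matrix product of gx.signal and gx.weight.

```r
# Toy 1/0 signal matrix: rows are features, columns are observations.
gx.signal <- matrix(c(1, 0, 0,
                      1, 1, 1),
                    nrow = 3,
                    dimnames = list(c("featA", "featB", "featC"),
                                    c("obs1", "obs2")))

# Weights per observation, using the documented default formula.
gx.weight <- 1 / (1 + colSums(gx.signal, na.rm = TRUE))

# Assumed reading: scale every column (observation) by its weight.
gx.score <- sweep(gx.signal, 2, gx.weight, "*")

# Summing each row gives one total per feature ...
row_totals <- rowSums(gx.score)

# ... which matches the matrix product of signal and weight.
matrix_product <- drop(gx.signal %*% gx.weight)
print(gx.score)
print(row_totals)
```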

Then gx.score is summed by row, and the result is standardized with scoreStandardization.method. The default standardization method is Min-Max, i.e. normalizing the vector to the 0-1 range. You can supply your own function; its first parameter receives the summed-up dataframe.
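A minimal Min-Max sketch is shown below. min_max_sketch is a stand-in written for this example, not the package's min_max, and it takes a plain numeric vector for simplicity, whereas the package passes its own summed-up object.

```r
# Stand-in Min-Max standardization: rescale values to the 0-1 range.
# Note: a constant input would divide by zero and return NaN.
min_max_sketch <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy row sums per feature (made-up values).
row_totals <- c(featA = 0.75, featB = 0.25, featC = 0.5)

standardized <- min_max_sketch(row_totals)
print(standardized)
```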

At this point gx.score.feature, the feature scores of gx, have been calculated. scoreCombine.method then combines the feature score vectors of g1 and g0. It must take three parameters, in this order: g1.score.feature, g0.score.feature, and offset. The default is linear_combine, in which offset scales g1.score.feature: g1.score.feature * offset + g0.score.feature. offset can be a single number or a vector; if it is a vector, the whole iteration is run once for each offset value.
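The default combination formula, and the effect of passing several offsets, can be sketched as follows. linear_combine_sketch is a stand-in built from the formula in the text, the scores are made up, and the lapply loop only mimics the idea of one run per offset; it does not reproduce the full iteration.

```r
# Stand-in for the default combine method, following the documented formula.
linear_combine_sketch <- function(g1.score.feature, g0.score.feature, offset) {
  g1.score.feature * offset + g0.score.feature
}

# Toy standardized feature scores (made-up values).
g1.score.feature <- c(featA = 1.0, featB = 0.0, featC = 0.5)
g0.score.feature <- c(featA = 0.2, featB = 1.0, featC = 0.4)

# One combined score vector per offset value.
offsets <- c(0.5, 2)
combined <- lapply(offsets, function(offset) {
  linear_combine_sketch(g1.score.feature, g0.score.feature, offset)
})
names(combined) <- paste0("offset_", offsets)
print(combined)
```

A larger offset gives the g1 scores more influence on the combined score.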

Value

a list with names "offset", "removed.feature_names", "removed.scores", and "max.scores".

Other usages

feature_removal(g1, g0, cutoff1, cutoff0, lt = ">", offset = 1, weight.method = reciprocal_colSums, scoreStandardization.method = min_max, scoreCombine.method = linear_combine, ...)

feature_removal(SE, g0.filter, cutoff1, cutoff0, lt = ">", offset = 1, weight.method = reciprocal_colSums, scoreStandardization.method = min_max, scoreCombine.method = linear_combine, ...)

Author(s)

Jiacheng CHUAN

Examples

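library(iteremoval)

# (A) Two dataframes with identical row names
g1 <- SWRG1
g0 <- SWRG0
result.simple.A <- feature_removal(g1, g0, cutoff1=0.95, cutoff0=0.95)

# (B) A SummarizedExperiment object plus a logical vector selecting the g0 columns
result.simple.B <- feature_removal(SummarizedData, SummarizedData$Group==0,
    cutoff1=0.95, cutoff0=0.95)

# Separate cutoffs for g1 and g0, and one full run per offset value
result.complex <- feature_removal(g1, g0,
    cutoff1=0.95, cutoff0=0.925, lt=">",
    offset=c(0.5, 2),
    weight.method="reciprocal_colSums",
    scoreStandardization.method="min_max",
    scoreCombine.method="linear_combine")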

cihga39871/iteremoval documentation built on May 17, 2019, 10:12 p.m.