feature_removal: Stepwise feature removal method
In cihga39871/iteremoval: Iteration removal method for feature selection


View source: R/feature_removal.R

This function screens features iteratively, balancing overall performance against the risk of overfitting.

feature_removal(g1 = NULL, g0 = NULL, cutoff1, cutoff0, lt = ">",
  offset = 1, weight.method = reciprocal_colSums,
  scoreStandardization.method = min_max,
  scoreCombine.method = linear_combine, SE = NULL, g0.filter = NULL, ...)

`g1`	A dataframe with features as rows and observations as columns. Cells are numeric or logical. If NULL, the input data must be supplied through `SE` and `g0.filter`.
`g0`	A dataframe with the same row names as `g1`. Normally, the observations in `g0` belong to a different group from those in `g1`. If NULL, the input data must be supplied through `SE` and `g0.filter`.
`cutoff1`	`g1` is converted to a dataframe filled with 1 or 0 by `cutoff1` and `lt`. The result is called `g1.signal`. For example, if `lt=">"`, the result of this step is `g1.signal <- g1 > cutoff1`. To skip the conversion, set `lt="skip"`.
`cutoff0`	`g0` is converted to a dataframe of 1 or 0 by `cutoff0` and `lt`, in the same way as `cutoff1`. Using different values for `cutoff1` and `cutoff0` influences overfitting.
`lt`	An operator used to compare `gx` with `cutoffx`. Default is ">". Other options include ">=", "<=", "<", etc. Additionally, `lt="skip"` skips the comparison, and `cutoffx` is ignored.
`offset`	A parameter of `scoreCombine.method`. It adjusts the relative contribution of the g1 and g0 scores. `offset` can be a single number or a numeric vector. If it is a vector, the whole iteration is run once for each offset. See the parameter `scoreCombine.method` for more.
`weight.method`	`gx.weight`, the weight of gx, is computed using `weight.method`. The weight applies to the observations/columns, not the features/rows. The default weight method is `reciprocal_colSums`, i.e. `1 / (1 + colSums(gx.signal, na.rm=T))`. You can supply your own function; its first parameter must be named exactly `gx.signal`.
`scoreStandardization.method`	The default standardization method is Min-Max, i.e. normalizing the vector to the 0-1 range. You can supply your own function; its first parameter is the summed-up dataframe. See the Details section for more.
`scoreCombine.method`	The method used to combine the feature score vectors of g1 and g0. It must take three parameters in this order: `g1.score.feature`, `g0.score.feature`, and `offset`. The default method is `linear_combine`, in which `offset` adjusts the proportion of `g1.score.feature`. Specifically, it computes `g1.score.feature * offset + g0.score.feature`. `offset` can be a single number or a vector. If it is a vector, the whole iteration is run once for each offset.
`SE`	A SummarizedExperiment object. If NULL, the input data must be supplied through `g1` and `g0`.
`g0.filter`	A logical vector defining which columns of `SE` belong to `g0`. If NULL, the input data must be supplied through `g1` and `g0`.
`...`	Other parameters passed to the method of the expression class.

The method removes one feature/row in each iteration, and requires either (A) two dataframes, g1 and g0, with identical row names; or (B) a SummarizedExperiment object SE, and a logical vector g0.filter defining which of SE's columns belong to g0. Normally, g0 is the control set. SE is divided into g1 and g0 automatically.

In each iteration, first, g1 and g0 are converted to dataframes of 1 or 0 by cutoff1, cutoff0, and lt. The converted dataframes are called gx.signal, where x stands for 1 or 0. To skip the conversion, set lt="skip"; the cutoffs are then ignored.
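The conversion step can be illustrated with a small sketch. The toy values and feature names below are made up, and looking the operator up with `match.fun` is only one plausible way a string such as ">=" could be applied; it is not confirmed to be how the package does it.

```r
# Toy data: rows are features, columns are observations (values are made up).
g1 <- data.frame(obs1 = c(0.99, 0.10, 0.97),
                 obs2 = c(0.96, 0.98, 0.20),
                 row.names = c("featA", "featB", "featC"))
cutoff1 <- 0.95

# Equivalent of lt = ">": each cell becomes TRUE (1) or FALSE (0).
g1.signal <- g1 > cutoff1

# Assumption: an operator given as a string can be resolved by name.
g1.signal.ge <- match.fun(">=")(g1, cutoff1)

print(g1.signal)
```

Comparing a dataframe with a number returns a logical matrix that keeps the row and column names, so the feature names remain available in later steps.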

Second, gx.weight, the weight of gx, is computed using weight.method. The weight applies to the observations/columns, not the features/rows. The default weight method is reciprocal_colSums, i.e. 1 / (1 + colSums(gx.signal, na.rm=T)). You can supply your own function; its first parameter must be named exactly gx.signal.
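As a rough illustration of the weighting formula quoted above, the sketch below re-implements it on a toy signal matrix. The name `sketch_weight` and the data are hypothetical; only the formula and the `gx.signal` parameter name come from this page.

```r
# Toy signal matrix: 3 features (rows) x 2 observations (columns).
gx.signal <- matrix(c(1, 1, 1,
                      0, 1, 0),
                    nrow = 3,
                    dimnames = list(c("featA", "featB", "featC"),
                                    c("obs1", "obs2")))

# Hypothetical stand-in for the default weight method.
# The first parameter is named gx.signal, as the documentation requires.
sketch_weight <- function(gx.signal) {
  1 / (1 + colSums(gx.signal, na.rm = TRUE))
}

gx.weight <- sketch_weight(gx.signal)
print(gx.weight)  # obs1 = 0.25, obs2 = 0.5
```

An observation in which many features are positive (obs1) receives a smaller weight than one in which few features are positive (obs2).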

Third, gx.score, the score dataframe for observations and features, is computed. It is the dot product of gx.signal and gx.weight.
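One way to picture this step is sketched below. Reading "dot product" as scaling each observation column by its weight is an assumption made for illustration, as are the toy values; the package's actual computation may differ.

```r
# Toy signal matrix: 4 features (rows) x 3 observations (columns).
gx.signal <- matrix(c(1, 1, 1, 0,
                      1, 0, 1, 0,
                      0, 0, 1, 0),
                    nrow = 4,
                    dimnames = list(c("featA", "featB", "featC", "featD"),
                                    c("obs1", "obs2", "obs3")))

# Observation weights from the formula 1 / (1 + colSums(gx.signal)).
gx.weight <- 1 / (1 + colSums(gx.signal, na.rm = TRUE))

# Assumed interpretation: multiply every column by its observation weight,
# giving a score for each feature/observation pair.
gx.score <- sweep(gx.signal, 2, gx.weight, `*`)
print(gx.score)
```

Under this reading, a cell contributes its observation's weight when the signal is 1 and nothing when it is 0.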

Then, gx.score is summed up by row, and the result is standardized with the function scoreStandardization.method. The default standardization method is Min-Max, i.e. normalizing the vector to the 0-1 range. You can supply your own function; its first parameter is the summed-up dataframe.
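The row sum and Min-Max standardization can be sketched as follows. `sketch_min_max` is a hypothetical re-implementation of ordinary Min-Max scaling, not the package's `min_max`, and the score values are invented.

```r
# Toy score matrix: 4 features (rows) x 2 observations (columns).
gx.score <- matrix(c(1.5, 1.0, 2.0, 0.0,
                     0.5, 0.0, 2.0, 0.0),
                   nrow = 4,
                   dimnames = list(c("featA", "featB", "featC", "featD"),
                                   c("obs1", "obs2")))

# Sum the scores of each feature over all observations.
feature.sum <- rowSums(gx.score, na.rm = TRUE)

# Hypothetical Min-Max scaling to the 0-1 range.
# Note: this sketch divides by zero when all values are equal.
sketch_min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

gx.score.feature <- sketch_min_max(feature.sum)
print(gx.score.feature)  # featA 0.50, featB 0.25, featC 1.00, featD 0.00
```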

After that, gx.score.feature, the feature scores of gx, are available. scoreCombine.method is then used to combine the feature score vectors of g1 and g0. This method must take three parameters in this order: g1.score.feature, g0.score.feature, and offset. The default method is linear_combine, in which offset adjusts the proportion of g1.score.feature. Specifically, it computes g1.score.feature * offset + g0.score.feature. offset can be a single number or a vector. If it is a vector, the whole iteration is run once for each offset.
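The combination formula can be tried out on toy vectors. `sketch_linear_combine` below is a hypothetical function that follows the three-parameter signature and the formula given above; the scores are made up, and looping over offsets with `sapply` only mimics the idea of running once per offset.

```r
# Toy standardized feature scores for the two groups.
g1.score.feature <- c(featA = 0.5, featB = 0.25, featC = 1)
g0.score.feature <- c(featA = 0.25, featB = 1, featC = 0)

# Hypothetical combiner with the required parameter order.
sketch_linear_combine <- function(g1.score.feature, g0.score.feature, offset) {
  g1.score.feature * offset + g0.score.feature
}

# One combined score vector per offset value.
offsets <- c(0.5, 2)
combined <- sapply(offsets, function(off) {
  sketch_linear_combine(g1.score.feature, g0.score.feature, off)
})
colnames(combined) <- paste0("offset=", offsets)
print(combined)
```

A larger offset gives the g1 scores more influence on the combined score, which is why different offsets can lead to different features being removed.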

a list with names "offset", "removed.feature_names", "removed.scores", and "max.scores".

feature_removal(g1, g0, cutoff1, cutoff0, lt = ">", offset = 1, weight.method = reciprocal_colSums, scoreStandardization.method = min_max, scoreCombine.method = linear_combine, ...)

feature_removal(SE, g0.filter, cutoff1, cutoff0, lt = ">", offset = 1, weight.method = reciprocal_colSums, scoreStandardization.method = min_max, scoreCombine.method = linear_combine, ...)

Jiacheng CHUAN

g1 <- SWRG1; g0 <- SWRG0
result.simple.A <- feature_removal(g1, g0, cutoff1=0.95, cutoff0=0.95)

result.simple.B <- feature_removal(SummarizedData, SummarizedData$Group==0,
    cutoff1=0.95, cutoff0=0.95)

result.complex <- feature_removal(g1, g0,
    cutoff1=0.95, cutoff0=0.925, lt=">",
    offset=c(0.5, 2),
    weight.method="reciprocal_colSums",
    scoreStandardization.method="min_max",
    scoreCombine.method="linear_combine")