big.fbed.reg: Forward Backward Early Dropping selection regression for big...

View source: R/big.fbed.reg.R


Forward Backward Early Dropping selection regression for big data

Description

Forward Backward Early Dropping selection regression for big data.

Usage

big.fbed.reg(target = NULL, dataset, threshold = 0.01, ini = NULL,
test = "testIndLogistic", K = 0, backward = FALSE) 

Arguments

target

The class variable. Provide either a string, an integer, a numeric value, a vector, a factor, an ordered factor or a Surv object. This can also be NULL, in which case it is extracted from the big.matrix object "dataset". If you use the test "censIndWR" (for survival data, for example), the target must not contain any censored values.

dataset

The dataset; this is a big.matrix object, where rows denote the samples and columns the features. If "target" is NULL, the first column must be the target. Only continuous variables are allowed. Note: if the test is "gSquare", the dataset should contain the target variable in the last column.

threshold

Threshold (suitable values in (0, 1)) for assessing the significance of p-values. The default value is 0.01.

ini

If you already have the test statistics and the p-values of the univariate associations (the first step of FBED), supply them as a list with the names "stat" and "pvalue", respectively.
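A minimal base-R sketch of how such a list could be built for a continuous target, assuming the Fisher z statistic (the statistic behind "testIndFisher") is the one wanted. The function name univariate_ini_sketch and its log_p argument are hypothetical. This page does not say whether ini expects ordinary or logged p-values (the res output reports logged p-values), so the sketch can return either.

```r
# Hypothetical sketch: univariate Fisher z statistics and two-sided p-values
# for every column of X, packed as list(stat = ..., pvalue = ...).
univariate_ini_sketch <- function(y, X, log_p = FALSE) {
  n <- length(y)
  r <- as.vector(cor(X, y))          # one correlation per column of X
  stat <- atanh(r) * sqrt(n - 3)     # Fisher z transform, scaled to N(0, 1)
  pvalue <- if (log_p) {
    log(2) + pnorm(-abs(stat), log.p = TRUE)  # log of the two-sided p-value
  } else {
    2 * pnorm(-abs(stat))
  }
  list(stat = stat, pvalue = pvalue)
}
```

Computing the log p-value directly with log.p = TRUE avoids underflow to 0 for very strong associations, which is common with millions of rows.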

test

The available tests: "testIndFisher", "testIndPois", "testIndLogistic", "censIndWR", "testIndQPois", "testIndMultinom", "gSquare".

K

The number of times the forward phase is repeated. The default value is 0, i.e. a single run. You can also specify a range of values of K, for example 0:4.

backward

After the Forward Early Dropping phase, the algorithm can proceed with the usual Backward Selection phase. The default value is FALSE. It is advisable to perform this step, since some of the selected variables may be false positives. Pay attention to this, as it converts the big.matrix object with the selected features into a matrix object in R.

The backward phase using the likelihood ratio test is a separate function and can be called directly by the user. So if, for example, you want to perform a backward regression with a different threshold value, just use that function separately.
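For a Gaussian linear model, the likelihood ratio statistic for dropping one variable reduces to n log(RSS_reduced / RSS_full), compared with a chi-squared distribution on one degree of freedom. The base-R sketch below applies that rule repeatedly, removing the least significant variable until all remaining ones are significant. It illustrates the idea under that Gaussian assumption only and is not the MXM backward function; the names rss_fit and backward_lr_sketch are hypothetical.

```r
# Hypothetical helper: residual sum of squares of a fit with an intercept
# plus the columns of Z (Z may have zero columns).
rss_fit <- function(y, Z) sum(lm.fit(cbind(1, Z), y)$residuals^2)

# Hypothetical sketch of backward elimination with a likelihood ratio test
# for a Gaussian linear model: the statistic for dropping variable i is
# n * log(RSS_without_i / RSS_full), referred to a chi-squared(1).
backward_lr_sketch <- function(y, X, selected, alpha = 0.01) {
  n <- length(y)
  while (length(selected) > 0) {
    rss_full <- rss_fit(y, X[, selected, drop = FALSE])
    stat <- vapply(seq_along(selected), function(i)
      n * log(rss_fit(y, X[, selected[-i], drop = FALSE]) / rss_full),
      numeric(1))
    p <- pchisq(stat, df = 1, lower.tail = FALSE)
    if (max(p) < alpha) break            # every remaining variable is significant
    selected <- selected[-which.max(p)]  # remove the least significant variable
  }
  selected
}
```

Because only the already selected columns are needed, this step works on a small in-memory matrix, which is why the backward phase materializes the selected features.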

Details

The algorithm is a variation of the usual forward selection. At every step, the most significant variable enters the selected set. In addition, only the significant variables stay and are examined further; the non-significant ones are dropped. This continues until no variable can enter the set. The user has the option to repeat this process one or more times (the argument K). Finally, if backward = TRUE, a backward selection is performed to remove falsely selected variables. Note that you may specify, for example, K = 10, but the largest value FBED actually uses may be smaller, e.g. 4, because the process stops once a repetition adds no new variables.
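The forward phase with early dropping can be sketched in base R for a continuous target, using Fisher's z test for partial correlations as the conditional independence test. The function names are hypothetical, and the sketch works on an ordinary in-memory matrix rather than a big.matrix; it only illustrates the selection logic described above: early dropping within a run, up to K extra runs, and stopping once a run adds nothing.

```r
# Hypothetical helper: p-value of Fisher's z test for the partial correlation
# of x and y given the columns of Z (Z may have zero columns).
fisher_z_pvalue <- function(y, x, Z) {
  n <- length(y)
  k <- ncol(Z)
  if (k == 0) {
    r <- cor(x, y)
  } else {
    Z1 <- cbind(1, Z)  # intercept plus conditioning variables
    r <- cor(lm.fit(Z1, x)$residuals, lm.fit(Z1, y)$residuals)
  }
  z <- atanh(r) * sqrt(n - k - 3)
  2 * pnorm(-abs(z))
}

# Hypothetical sketch of the forward phase with early dropping, run once
# and then repeated up to K more times.
fbed_forward_sketch <- function(y, X, alpha = 0.01, K = 0) {
  selected <- integer(0)
  for (run in 0:K) {
    n_added <- 0
    # Each run starts again from every variable not yet selected.
    candidates <- setdiff(seq_len(ncol(X)), selected)
    while (length(candidates) > 0) {
      Z <- X[, selected, drop = FALSE]
      p <- vapply(candidates, function(j) fisher_z_pvalue(y, X[, j], Z),
                  numeric(1))
      # Early dropping: non-significant candidates leave for the rest of this run.
      keep <- p < alpha
      candidates <- candidates[keep]
      p <- p[keep]
      if (length(candidates) == 0) break
      # The most significant remaining candidate enters the selected set.
      best <- candidates[which.min(p)]
      selected <- c(selected, best)
      candidates <- setdiff(candidates, best)
      n_added <- n_added + 1
    }
    # A run that adds nothing would be repeated identically, so stop here.
    if (n_added == 0) break
  }
  selected
}
```

For example, with y driven by columns 3 and 7 of a 500 x 10 matrix, fbed_forward_sketch(y, X, alpha = 0.01, K = 1) selects column 3 first and also picks up column 7.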

Notes: The backward phase needs caution, because the big.matrix object with the selected features is turned into a matrix and only then does the backward selection take place. In general, this algorithm is intended for a few tens or hundreds of features and millions of rows; it is designed for thin matrices only. The big.gomp function, on the other hand, is designed for thin and fat matrices, as well as for matrices with many rows and many columns.
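The caution about the backward phase is largely about memory. R stores each double-precision value in 8 bytes, so converting n rows and k selected columns into an ordinary matrix needs roughly 8nk bytes. The hypothetical helper below does that arithmetic, ignoring the small per-object overhead.

```r
# Hypothetical helper: approximate size in MB (10^6 bytes) of a dense R matrix
# of doubles, assuming 8 bytes per value and ignoring the object header.
dense_matrix_mb <- function(n_rows, n_cols) n_rows * n_cols * 8 / 1e6

dense_matrix_mb(10^6, 50)  # all 50 columns of the simulated Examples dataset
dense_matrix_mb(10^6, 5)   # e.g. if only 5 features were selected
```

With 10^6 rows, all 50 columns take about 400 MB, while 5 selected columns take about 40 MB, which is why the conversion is only sensible for a thin set of selected features.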

Value

If K is a single number, a list including:

univ

If the log-likelihood ratio test was used, this list contains the test statistics and the associated p-values of the univariate association tests. If the eBIC was used, this list contains the eBIC values of the univariate associations. Note that the "gam" argument must be the same, though.

res

A matrix with the selected variables, their test statistic and the associated logged p-value.

info

A matrix with the number of variables and the number of tests performed (or models fitted) at each round (value of K). This refers to the forward phase only.

runtime

The runtime required.

back.rem

The variables removed in the backward phase.

back.n.tests

The number of models fitted in the backward phase.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr

References

Borboudakis G. and Tsamardinos I. (2019). Forward-backward selection with early dropping. Journal of Machine Learning Research, 20(8): 1-39.

See Also

fs.reg, ebic.bsreg, bic.fsreg, MMPC

Examples

## Not run: 
library(MXM)
library(bigmemory)
# simulate a dataset with 10^6 rows and 50 continuous variables
x <- matrix( runif(10^6 * 50, 1, 100), ncol = 50 )
dataset <- bigmemory::as.big.matrix(x)
# simulate a continuous target and use Fisher's test
target <- rt(10^6, 10)
a1 <- big.fbed.reg(target, dataset, test = "testIndFisher") 
# simulate a count target and use Poisson regression
y <- rpois(10^6, 10)
a2 <- big.fbed.reg(y, dataset, test = "testIndPois") 

## End(Not run)

MXM documentation built on Aug. 25, 2022, 9:05 a.m.