random.subset: Selects a random subset of the input.

View source: R/random.subset.R

Description

If a subset of samples is selected randomly, the negative or positive class might be too sparse or even empty. This function repeats the sampling until both classes are adequately represented.

Usage

random.subset(F_, L_, gamma, persistence = 1000, minimum.class.size=2, replace)

Arguments

F_

The feature matrix; each column is a feature.

L_

The vector of labels, named according to the rows of F_.

gamma

A value in the range 0-1 that determines the size of the sampled subset relative to the full set of samples.

persistence

Maximum number of attempts at randomly choosing samples. If this many attempts are made and the obtained labels are all the same, the function gives up (perhaps all the labels are the same) with the error message: " Not enough variation in the labels...".

minimum.class.size

A lower bound on the number of samples in each class.

replace

If TRUE, sampling is done with replacement.

Details

The function also returns a refined feature matrix in which features that are too sparse after sampling are ignored.
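The behaviour described above can be illustrated with a minimal base-R sketch. This is not the FeaLect implementation: the helper name resample_until_balanced is hypothetical, the subset size is assumed to be ceiling(gamma * number of rows), and "too sparse" is assumed here to mean a feature that is zero in every sampled row.

```r
# Hypothetical helper illustrating the retry logic; NOT the FeaLect source.
resample_until_balanced <- function(F_, L_, gamma, persistence = 1000,
                                    minimum.class.size = 2, replace = FALSE) {
  k <- ceiling(gamma * nrow(F_))          # assumed subset size
  classes <- unique(L_)
  for (attempt in seq_len(persistence)) {
    picked <- sample(rownames(F_), size = k, replace = replace)
    # Count sampled members of every class, including classes that were missed
    counts <- table(factor(L_[picked], levels = classes))
    if (all(counts >= minimum.class.size)) {
      X_ <- F_[picked, , drop = FALSE]
      keep <- colSums(X_ != 0) > 0        # assumed meaning of "too sparse"
      return(list(X_ = X_[, keep, drop = FALSE],
                  Y_ = L_[picked],
                  remainder.samples = setdiff(rownames(F_), picked)))
    }
  }
  stop("Not enough variation in the labels...")
}

# Toy data: 10 samples, 4 features, two classes of 5 samples each
set.seed(1)
F <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("s", 1:10), paste0("f", 1:4)))
L <- setNames(rep(c(0, 1), each = 5), rownames(F))
XY <- resample_until_balanced(F, L, gamma = 0.5)
table(XY$Y_)
```

The loop gives up with an error once persistence attempts have failed, which is what keeps it from running forever when the labels cannot satisfy minimum.class.size.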

Value

Returns a list of:

X_

The sampled feature matrix; each column is a feature, with the redundant features removed.

Y_

The vector of labels, named according to the rows of X_.

remainder.samples

The names of the rows of F_ that do not appear in X_. These can be used later for validation.
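As a sketch of how remainder.samples might be used for validation, the held-out rows can be selected by name from the original inputs. The data below are toy stand-ins, and XY is a mock list containing only the remainder.samples element rather than a real return value of random.subset; the names F_valid and L_valid are illustrative.

```r
# Toy stand-ins for the feature matrix and the named label vector
F <- matrix(1:12, nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("f1", "f2", "f3")))
L <- setNames(c(0, 1, 0, 1), rownames(F))

# Mock of the returned list (assumption: only this element is needed here)
XY <- list(remainder.samples = c("b", "d"))

# Select the held-out samples by row name
F_valid <- F[XY$remainder.samples, , drop = FALSE]
L_valid <- L[XY$remainder.samples]
```

Indexing by name rather than by position keeps the features and labels aligned, since both are named according to the rows of the feature matrix.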

Author(s)

Habil Zare

References

"Statistical Analysis of Overfitting Features", manuscript in preparation.

See Also

FeaLect, train.doctor, doctor.validate, random.subset, compute.balanced, compute.logistic.score, ignore.redundant, input.check.FeaLect

Examples

library(FeaLect)
data(mcl_sll)
F <- as.matrix(mcl_sll[, -1])  # The feature matrix
L <- as.numeric(mcl_sll[, 1])  # The labels
names(L) <- rownames(F)
message(dim(F)[1], " samples and ", dim(F)[2], " features.")

XY <- random.subset(F_ = F, L_ = L, gamma = 3/4, replace = TRUE)
XY$remainder.samples

FeaLect documentation built on Feb. 26, 2020, 1:06 a.m.