Description

RaSE is a general ensemble classification framework for sparse classification problems. In the RaSE algorithm, for each of the B1 weak learners, B2 random subspaces are generated and the optimal one is chosen to train the model according to some criterion.
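The procedure above can be sketched in pseudocode (the helper names `sample_subspace`, `fit_base` and `criterion_value` are illustrative only, not functions exported by the package):

```
# For each of the B1 weak learners:
for (j in 1:B1) {
  # generate B2 candidate random subspaces of size at most D
  candidates <- replicate(B2, sample_subspace(p, max_size = D, dist))
  # fit the base classifier on each candidate subspace and score it
  scores <- sapply(candidates, function(S)
    criterion_value(fit_base(base, xtrain[, S], ytrain), criterion))
  # keep the best-scoring subspace and its fitted classifier
  subspace[[j]] <- candidates[[which.min(scores)]]
}
# Predict by aggregating the votes of the B1 weak learners,
# thresholded at the (empirically optimal) cutoff
```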
Usage

Rase(
  xtrain,
  ytrain,
  xval = NULL,
  yval = NULL,
  B1 = 200,
  B2 = 500,
  D = NULL,
  dist = NULL,
  base = NULL,
  super = list(type = c("separate"), base.update = TRUE),
  criterion = NULL,
  ranking = TRUE,
  k = c(3, 5, 7, 9, 11),
  cores = 1,
  seed = NULL,
  iteration = 0,
  cutoff = TRUE,
  cv = 5,
  scale = FALSE,
  C0 = 0.1,
  kl.k = NULL,
  lower.limits = NULL,
  upper.limits = NULL,
  weights = NULL,
  ...
)

Arguments

xtrain 
n * p observation matrix. n observations, p features. 
ytrain 
vector of n 0/1 observations (class labels). 
xval 
observation matrix for validation. Default = NULL. 
yval 
0/1 observations for validation. Default = NULL. 
B1 
the number of weak learners. Default = 200. 
B2 
the number of subspace candidates generated for each weak learner. Default = 500. 
D 
the maximal subspace size when generating random subspaces. Default = NULL. 
dist 
the distribution for features when generating random subspaces. Default = NULL. 
base 
the type of base classifier. Default = 'lda'. Can be either a single string (e.g. 'lda', 'qda', 'knn', 'logistic', 'svm', 'randomforest') or a string/probability vector. When it indicates a single type of base classifier, the classical RaSE model (Tian, Y. and Feng, Y., 2021(b)) will be fitted. When it is a string vector including multiple base classifier types, a super RaSE model (Zhu, J. and Feng, Y., 2021) will be fitted, by sampling base classifiers with equal probability. It can also be a probability vector with names corresponding to the classifier types, in which case a super RaSE model will be trained by sampling base classifiers with the given probabilities.
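The three forms of base can be sketched as follows (a schematic fragment, assuming xtrain and ytrain are defined as in the Examples; the remaining arguments take their defaults):

```r
# Classical RaSE: one base classifier type
fit1 <- Rase(xtrain, ytrain, base = 'lda')

# Super RaSE: sample from several base classifiers with equal probability
fit2 <- Rase(xtrain, ytrain, base = c("knn", "lda", "logistic"))

# Super RaSE: sample base classifiers with the given named probabilities
fit3 <- Rase(xtrain, ytrain, base = c(knn = 0.3, lda = 0.4, logistic = 0.3))
```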

super 
a list of control parameters for super RaSE (Zhu, J. and Feng, Y., 2021). Not used when base is a single string. Should be a list object with components type (default = 'separate') and base.update (logical; default = TRUE), as shown in the Usage section.

criterion 
the criterion to choose the best subspace for each weak learner. Default = NULL. For the classical RaSE (when base indicates a single classifier type), the default depends on the chosen base classifier; the Examples below use 'ric' for 'lda' and 'qda', 'loo' for 'knn', 'bic' for 'logistic', 'training' for 'svm', and 'cv' for 'randomforest'.

ranking 
whether the function outputs the selected percentage of each feature in the B1 subspaces. Logical, default = TRUE. 
k 
the number of nearest neighbors considered when base = 'knn'. Default = c(3, 5, 7, 9, 11). 
cores 
the number of cores used for parallel computing. Default = 1. 
seed 
the random seed assigned at the start of the algorithm, which can be a real number or NULL. Default = NULL. 
iteration 
the number of iterations. Default = 0. 
cutoff 
whether to use the empirically optimal threshold. Logical, default = TRUE. If it is FALSE, the threshold will be set as 0.5. 
cv 
the number of cross-validation folds used. Default = 5. Only useful when criterion = 'cv'. 
scale 
whether to normalize the data. Logical, default = FALSE. 
C0 
a positive constant used when iteration > 0. Default = 0.1. 
kl.k 
the number of nearest neighbors used to estimate RIC in a nonparametric way. Default = NULL. 
lower.limits 
the vector of lower limits for each coefficient in logistic regression. Should be a vector of length equal to the number of variables (the column number of xtrain). Default = NULL. 
upper.limits 
the vector of upper limits for each coefficient in logistic regression. Should be a vector of length equal to the number of variables (the column number of xtrain). Default = NULL. 
weights 
observation weights. Should be a vector of length equal to the training sample size (the length of ytrain). Default = NULL. 
... 
additional arguments. 
Value

An object with S3 class 'RaSE' if base indicates a single base classifier, with the following components:
marginal 
the marginal probability for each class. 
base 
the type of base classifier. 
criterion 
the criterion to choose the best subspace for each weak learner. 
B1 
the number of weak learners. 
B2 
the number of subspace candidates generated for each weak learner. 
D 
the maximal subspace size when generating random subspaces. 
iteration 
the number of iterations. 
fit.list 
sequence of B1 fitted base classifiers. 
cutoff 
the empirically optimal threshold. 
subspace 
sequence of subspaces corresponding to B1 weak learners. 
ranking 
the selected percentage of each feature in B1 subspaces. 
scale 
a list of scaling parameters, including the scaling center and the scale parameter for each feature. Equal to NULL when scale = FALSE. 
An object with S3 class 'super_RaSE' if base includes multiple base classifiers or a sampling probability over classifier types, with the following components:
marginal 
the marginal probability for each class. 
base 
the list of B1 base classifier types. 
criterion 
the criterion to choose the best subspace for each weak learner. 
B1 
the number of weak learners. 
B2 
the number of subspace candidates generated for each weak learner. 
D 
the maximal subspace size when generating random subspaces. 
iteration 
the number of iterations. 
fit.list 
sequence of B1 fitted base classifiers. 
cutoff 
the empirically optimal threshold. 
subspace 
sequence of subspaces corresponding to B1 weak learners. 
ranking.feature 
the selected percentage of each feature corresponding to each type of classifier. 
ranking.base 
the selected percentage of each classifier type in the selected B1 learners. 
scale 
a list of scaling parameters, including the scaling center and the scale parameter for each feature. Equal to NULL when scale = FALSE. 
Author(s)

Ye Tian (maintainer, ye.t@columbia.edu) and Yang Feng. The authors thank Yu Cao (Exeter Finance) and his team for many helpful suggestions and discussions.
References

Tian, Y. and Feng, Y., 2021(a). RaSE: A variable screening framework via random subspace ensembles. Journal of the American Statistical Association, (just-accepted), pp. 1-30.

Tian, Y. and Feng, Y., 2021(b). RaSE: Random subspace ensemble classification. Journal of Machine Learning Research, 22(45), pp. 1-93.

Zhu, J. and Feng, Y., 2021. Super RaSE: Super Random Subspace Ensemble Classification. https://www.preprints.org/manuscript/202110.0042

Chen, J. and Chen, Z., 2008. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), pp. 759-771.

Chen, J. and Chen, Z., 2012. Extended BIC for small-n-large-P sparse GLM. Statistica Sinica, pp. 555-574.

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, 1973 (pp. 267-281). Akademiai Kiado.

Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics, 6(2), pp. 461-464.
See Also

predict.RaSE, RaModel, print.RaSE, print.super_RaSE, RaPlot, RaScreen.
Examples

set.seed(0, kind = "L'Ecuyer-CMRG")
train.data <- RaModel("classification", 1, n = 100, p = 50)
test.data <- RaModel("classification", 1, n = 100, p = 50)
xtrain <- train.data$x
ytrain <- train.data$y
xtest <- test.data$x
ytest <- test.data$y
# test RaSE classifier with LDA base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'lda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
## Not run:
# test RaSE classifier with LDA base classifier and 1 iteration round
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 1, base = 'lda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with QDA base classifier and 1 iteration round
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 1, base = 'qda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with kNN base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'knn',
cores = 2, criterion = 'loo')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with logistic regression base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'logistic',
cores = 2, criterion = 'bic')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with SVM base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'svm',
cores = 2, criterion = 'training')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with random forest base classifier
fit <- Rase(xtrain, ytrain, B1 = 20, B2 = 10, iteration = 0, base = 'randomforest',
cores = 2, criterion = 'cv', cv = 3)
mean(predict(fit, xtest) != ytest)
# fit a super RaSE classifier by sampling base learner from kNN, LDA and logistic
# regression in equal probability
fit <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 100,
base = c("knn", "lda", "logistic"), super = list(type = "separate", base.update = T),
criterion = "cv", cv = 5, iteration = 1, cores = 2)
mean(predict(fit, xtest) != ytest)
# fit a super RaSE classifier by sampling base learner from random forest, LDA and
# SVM with probability 0.2, 0.5 and 0.3
fit <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 100,
base = c(randomforest = 0.2, lda = 0.5, svm = 0.3),
super = list(type = "separate", base.update = F),
criterion = "cv", cv = 5, iteration = 0, cores = 2)
mean(predict(fit, xtest) != ytest)
## End(Not run)
