title: "A demonstration of the RaSEn package"
author: "Ye Tian and Yang Feng"
date: "r Sys.Date()
"
header-includes:
- \usepackage{dsfont}
- \usepackage{bm}
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{RaSEn demo}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
We provide a detailed demo of the usage of the `RaSEn` package. This package implements the random subspace ensemble classification (RaSE) method (@tian2021rase), the variable screening approach via RaSE (@tian2021variable), and the super RaSE method (@zhu2021super).
Suppose we have training data $\{\bm{x}_i, y_i\}_{i=1}^n \in \mathbb{R}^p \times \{0, 1\}$, where each $\bm{x}_i$ is a $1 \times p$ vector.
Based on the training data, the RaSE algorithm aims to generate $B_1$ weak learners $\{C_n^{S_j}\}_{j=1}^{B_1}$, each of which is constructed in a feature subspace $S_j \subseteq \{1, \ldots, p\}$ instead of using all $p$ features. To obtain each weak learner, $B_2$ candidates $\{C_n^{S_{jk}}\}_{k=1}^{B_2}$ are trained in subspaces $\{S_{jk}\}_{k=1}^{B_2}$, respectively. To choose the optimal one among these $B_2$ candidates, some criterion needs to be applied, such as minimizing the ratio information criterion (RIC, @tian2021rase), minimizing the extended Bayesian information criterion (eBIC, @chen2008extended, @chen2012extended), minimizing the training error, minimizing the validation error (if validation data is available), minimizing the cross-validation error, or minimizing the leave-one-out error. The type of weak learner can be quite flexible.
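Concretely, the $B_1$ selected weak learners are aggregated into the final RaSE classifier via a thresholded average vote; schematically (see @tian2021rase for the precise construction),
$$
C_n^{RaSE}(\bm{x}) = \mathds{1}\left(\frac{1}{B_1}\sum_{j=1}^{B_1} C_n^{S_j}(\bm{x}) > \hat{\alpha}\right),
$$
where $\hat{\alpha}$ is a threshold estimated from the training data; it corresponds to the `cutoff` component of a fitted RaSE object described below.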
To better adapt RaSE to the sparse setting, we can update the distribution of the random feature subspaces according to the selected percentage of each feature in the $B_1$ subspaces in each round. This can be viewed as an adaptive strategy that increases the probability of covering the signals that contribute to the model, which can improve the performance of RaSE classifiers in sparse settings.
The selected percentage of each of the $p$ features in the $B_1$ subspaces can also be used for feature ranking, and we can plot the selected percentages to intuitively rank the importance of each feature in a RaSE model.
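As a toy illustration of this ranking idea (independent of the package; the subspace list and sizes below are made up for illustration), the selected percentage of a feature is simply the proportion of subspaces that contain it:

```r
# Toy illustration: selected percentage of p = 5 features over B1 = 4 subspaces
subspaces <- list(c(1, 2), c(1, 5), c(2, 3), c(1, 2, 5))
p <- 5
selected.perc <- sapply(seq_len(p), function(j) {
  mean(sapply(subspaces, function(S) j %in% S))
})
selected.perc  # features 1 and 2 obtain the highest percentages
```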
The `RaSEn` package can be installed from CRAN.

```r
install.packages("RaSEn", repos = "http://cran.us.r-project.org")
```
Then we can load the package:

```r
library(RaSEn)
```
We will show in this section how to fit RaSE classifiers based on different types of base classifiers. First we generate the data from a binary Gaussian mixture model (model 1 in @tian2021rase): $$ \bm{x} \sim (1-y)N(\bm{\mu}^{(0)}, \Sigma) + yN(\bm{\mu}^{(1)}, \Sigma), $$ where $\bm{\mu}^{(0)}, \bm{\mu}^{(1)}$ are both $1 \times p$ vectors and $\Sigma$ is a $p \times p$ symmetric positive-definite matrix. Here $y$ follows a Bernoulli distribution: $$ y \sim \mathrm{Bernoulli}(\pi_1), $$ where $\pi_1 \in (0,1)$ and we denote $\pi_0 = 1-\pi_1$.
Here we follow the setting of @mai2012direct, letting $\Sigma = (0.5^{|i-j|})_{p \times p}$, $\bm{\mu}^{(0)} = \bm{0}_{p \times 1}$, and $\bm{\mu}^{(1)} = \Sigma^{-1}\times 0.556(3, 1.5, 0, 0, 2, \bm{0}_{1 \times (p-5)})^T$. Let $n = 100$ and $p = 50$. According to the definition of the minimal discriminative set in @tian2021rase, here the minimal discriminative set is $S^* = \{1, 2, 5\}$, i.e., only these features contribute to the classification.
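For intuition, this model can also be simulated directly. Below is a minimal sketch (not the implementation of `RaModel`), assuming the `MASS` package is available and taking $\pi_1 = 0.5$; `simulate_model1` is just an illustrative helper name:

```r
library(MASS)  # for mvrnorm; assumes MASS is installed

p <- 50
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))  # (0.5^{|i-j|})_{p x p}
mu0 <- rep(0, p)
mu1 <- as.vector(solve(Sigma, 0.556 * c(3, 1.5, 0, 0, 2, rep(0, p - 5))))

# Draw n observations: y ~ Bernoulli(pi1), then x | y from the corresponding Gaussian
simulate_model1 <- function(n, pi1 = 0.5) {
  y <- rbinom(n, 1, pi1)
  x <- t(vapply(y, function(yi) mvrnorm(1, if (yi == 1) mu1 else mu0, Sigma),
                numeric(p)))
  list(x = x, y = y)
}

toy <- simulate_model1(100)
dim(toy$x)
```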
Apply the function `RaModel` to generate training data and test data of size 100 with dimension 50.

```r
set.seed(0, kind = "L'Ecuyer-CMRG")
train.data <- RaModel("classification", 1, n = 100, p = 50)
test.data <- RaModel("classification", 1, n = 100, p = 50)
xtrain <- train.data$x
ytrain <- train.data$y
xtest <- test.data$x
ytest <- test.data$y
```
We can visualize the first two dimensions, i.e., features 1 and 2, as follows:

```r
library(ggplot2)
ggplot(data = data.frame(xtrain, y = factor(ytrain)),
       mapping = aes(x = X1, y = X2, color = y)) +
  geom_point()
```
Similarly, we can visualize features 6 and 7:

```r
ggplot(data = data.frame(xtrain, y = factor(ytrain)),
       mapping = aes(x = X6, y = X7, color = y)) +
  geom_point()
```
It is clear that the data from the two classes are more linearly separable in dimensions 1 and 2 than in dimensions 6 and 7. We then call the `Rase` function to fit RaSE classifiers with LDA, QDA, and logistic regression base classifiers using the criterion of minimizing RIC, and a RaSE classifier with a $k$NN base classifier using the criterion of minimizing the leave-one-out error. To use different types of base classifiers, we set `base` to "lda", "qda", "knn", and "logistic", respectively. `B1` is set to 100 to generate 100 weak learners, and `B2` is set to 50 to generate 50 subspace candidates for each weak learner. Without using iterations, we set `iteration` to 0. `criterion` is set to "ric" for the RaSE classifiers with LDA, QDA, and logistic regression base classifiers, and to "loo" for the RaSE classifier with the $k$NN base classifier. To speed up the computation, we apply parallel computing with 2 cores by setting `cores = 2`.
```r
fit.lda <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0,
                base = "lda", cores = 2, criterion = "ric")
fit.qda <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0,
                base = "qda", cores = 2, criterion = "ric")
fit.knn <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0,
                base = "knn", cores = 2, criterion = "loo")
fit.logistic <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0,
                     base = "logistic", cores = 2, criterion = "ric")
```
We can print a summary of a fitted RaSE model by calling the `print` function. For instance, we print the RaSE model with the LDA base classifier:

```r
print(fit.lda)
```
To evaluate the performance of the four models, we calculate their test errors on the test data:

```r
er.lda <- mean(predict(fit.lda, xtest) != ytest)
er.qda <- mean(predict(fit.qda, xtest) != ytest)
er.knn <- mean(predict(fit.knn, xtest) != ytest)
er.logistic <- mean(predict(fit.logistic, xtest) != ytest)
cat("LDA:", er.lda, "QDA:", er.qda, "knn:", er.knn, "logistic:", er.logistic)
```
The output of the `Rase` function is an object belonging to the S3 class "RaSE". It contains the following components (a short access example follows the list):

- `marginal`: the marginal probability for each class.
- `base`: the type of base classifier.
- `criterion`: the criterion used to choose the best subspace for each weak learner.
- `B1`: the number of weak learners.
- `B2`: the number of subspace candidates generated for each weak learner.
- `D`: the maximal subspace size when generating random subspaces.
- `iteration`: the number of iterations.
- `fit.list`: a list of `B1` fitted base classifiers.
- `cutoff`: the empirically optimal threshold.
- `subspace`: a list of subspaces corresponding to the `B1` weak learners.
- `ranking`: the selected percentage of each feature in the `B1` subspaces.
- `scale`: a list of scaling parameters, including the scaling center and the scale parameter for each feature.
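For instance, a few of these components can be inspected directly from the fitted object:

```r
fit.lda$base           # type of base classifier
fit.lda$cutoff         # empirically optimal threshold
head(fit.lda$ranking)  # selected percentages of the first few features
```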
The selected percentages of features in the $B_1$ subspaces for the four RaSE classifiers are contained in the output and can be used for feature ranking. We can plot them using the `RaPlot` function:

```r
library(gridExtra)
plot_lda <- RaPlot(fit.lda)
plot_qda <- RaPlot(fit.qda)
plot_knn <- RaPlot(fit.knn)
plot_logistic <- RaPlot(fit.logistic)
grid.arrange(plot_lda, plot_qda, plot_knn, plot_logistic, ncol = 2)
```
From the four figures, we can see that features 1, 2, and 5 obtain high selected percentages among all $p = 50$ features under the LDA, QDA, and $k$NN models, implying their importance in the classification model. We can set a positive iteration number to increase the selected percentages of the three signals among the $B_1$ subspaces, which may improve the performance.
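For example, a sketch of refitting the LDA-based RaSE classifier with one iteration round (all other arguments kept as above; `fit.lda.it1` is just an illustrative name):

```r
# One iteration round updates the subspace sampling distribution
fit.lda.it1 <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 1,
                    base = "lda", cores = 2, criterion = "ric")
mean(predict(fit.lda.it1, xtest) != ytest)
```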
In RaSE, we consider only a single type of base classifier (e.g., LDA, QDA, $k$NN, etc.). @zhu2021super extends the idea of RaSE by combining classifiers of different types. For each of the $B_1 B_2$ weak learners, the base classifier type is sampled randomly from some given types with corresponding probabilities. For iterative super RaSE, not only is the feature sampling probability updated based on the feature selected frequencies, but the classifier type sampling probability is updated according to the type selected frequencies as well. Note that here the feature sampling probability is updated on the basis of the feature frequencies in the last iteration for the corresponding base classifier type. The user can also fix the classifier type sampling probability; this is controlled by the component `base.update` in the parameter `super`. A super RaSE will be fitted when the parameter `base` is a string vector of base classifier types or a numeric probability vector named with classifier types. In the first case, the base classifier types are sampled uniformly, while in the second case, they are sampled according to the provided probabilities.
The following example shows how to fit a super RaSE classifier which mixes $k$NN, LDA, and logistic regression.

```r
fit.super <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 50,
                  base = c("knn", "lda", "logistic"),
                  super = list(type = "separate", base.update = TRUE),
                  criterion = "cv", cv = 5, iteration = 0, cores = 2)
ypred <- predict(fit.super, xtest)
mean(ypred != ytest)
```
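Alternatively, as described above, the classifier types can be sampled non-uniformly by supplying a named probability vector to `base`. A sketch, assuming the names give the classifier types and the values their sampling probabilities (`fit.super.prob` is an illustrative name):

```r
# Sample the base classifier types with unequal probabilities
fit.super.prob <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 50,
                       base = c(knn = 0.2, lda = 0.5, logistic = 0.3),
                       super = list(type = "separate", base.update = TRUE),
                       criterion = "cv", cv = 5, iteration = 0, cores = 2)
mean(predict(fit.super.prob, xtest) != ytest)
```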
We can look at the selected percentage of each feature under each classifier type, as well as the selected percentage of each base classifier type.

```r
fit.super$ranking.feature
fit.super$ranking.base
```
In this section, we describe how to apply RaSE for variable screening.
We follow the aforementioned notation. Although @tian2021rase only discusses the classification problem, the RaSE framework can be immediately applied to a continuous response with no extra effort. As before, we would like to select $B_1$ subspaces $\{S_j\}_{j=1}^{B_1}$, for each of which $B_2$ candidate subspaces $\{S_{jk}\}_{k=1}^{B_2}$ are generated. Some specific criterion is required for subspace selection. The selected percentage of each feature in $\{S_j\}_{j=1}^{B_1}$ can then be used for variable screening (@tian2021variable).
First, we generate the data from the following model (model 1 in @tian2021variable, model II in @fan2008sure): $$ y = 5x_1 + 5x_2 + 5x_3 - \frac{15}{\sqrt{2}}x_4 + \epsilon, $$ where $\bm{x} = (x_1, \ldots, x_p)^T \sim N(\bm{0}, \Sigma)$, $\Sigma = (\sigma_{ij})_{p \times p}$, $\sigma_{ij} = 0.5^{\mathds{1}(i \neq j)}$, $\epsilon \sim N(0, 1)$, and $\epsilon \perp \!\!\! \perp \bm{x}$. The signal set is $S^* = \{1, 2, 3, 4\}$.
Let $n = 100$ and $p = 100$. Call the function `RaModel` to generate the data.

```r
train.data <- RaModel("screening", 1, n = 100, p = 100)
xtrain <- train.data$x
ytrain <- train.data$y
```
As @tian2021variable describes, here $x_4$ is marginally independent of $y$. We first apply RaSE equipped with the linear regression model and BIC by calling the function `RaScreen`. Set `B1 = B2 = 100`, `model = "lm"`, and `criterion = "bic"`. To demonstrate the power of iterations, we set `iteration = 1`.

```r
fit.bic <- RaScreen(xtrain, ytrain, B1 = 100, B2 = 100, model = "lm",
                    criterion = "bic", cores = 2, iteration = 1)
```
The output of `RaScreen` is a list including the selected percentages of variables, the model we use, and other information. Note that the selected percentages of variables are stored in a list called `selected.perc` (when `iteration = 0`, it is a vector), and the results from all iteration rounds are available. The function `RaRank` provides a convenient and automatic way to select variables from the output of `RaScreen`. We set `selected.num = "n/logn"` to select $[n/\log n] = 21$ variables. Let's compare the results from vanilla RaSE (no iteration) and RaSE with 1 iteration round as follows.
```r
RaRank(fit.bic, selected.num = "n/logn", iteration = 0)
RaRank(fit.bic, selected.num = "n/logn", iteration = 1)
```
Observe that RaSE with the linear regression model and BIC captures the signals very well. Iteration indeed improves vanilla RaSE by ranking the four signals at the top.
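We can also check explicitly how many of the true signals in $S^* = \{1, 2, 3, 4\}$ are among the selected variables; a small sketch, assuming (as in the calls above) that `RaRank` returns the indices of the selected variables:

```r
selected <- RaRank(fit.bic, selected.num = "n/logn", iteration = 1)
sum(c(1, 2, 3, 4) %in% selected)  # number of true signals captured
```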
Note that there could be some variables with zero selected percentage, especially in iterative RaSE. In this case, such variables are indistinguishable from each other. When the user requests more variables than the number of variables with positive selected percentages, the `RaRank` function will randomly sample from the variables with zero selected percentage and issue a warning message, as below.

```r
RaRank(fit.bic, selected.num = "n-1", iteration = 1)
```