View source: R/seq_bin_model.R
seq_bin_model
estimates the effective variables and selects the
subjects sequentially using a logistic regression model for the binary
classification case with the adaptive shrinkage estimate method.
seq_bin_model(startnum, data.clust, xfix, yfix, d = 0.5,
  criterion = "BIC", pho = 0.05, ptarget = 0.5)
startnum |
The initial number of subjects drawn from the original dataset. |
data.clust |
Large list obtained through k-means clustering. The samples within one element (e.g. data.clust[[1]]) are closer to each other than to the samples in any other element. |
xfix |
A data frame in which each row is a sample and each column is an independent variable. Each sample is the one with the minimum variance in its cluster of data.clust and represents all samples of that cluster. |
yfix |
Numeric vector consisting of 0s and 1s. The length of yfix must equal the number of rows of xfix. |
d |
A number specifying the length of the fixed-size confidence set for the model. Note that the smaller d is, the larger the sample size and the longer the running time. The default value is 0.5. |
criterion |
A character string that determines the model selection criterion to be used, matching one of 'BIC' or 'AIC'. The default value is 'BIC'. |
pho |
A number used in subject selection according to D-optimality. That is, the first (pho * length(data)) samples from the unlabeled data set are selected and added to the uncertainty set. The default value is 0.05. |
ptarget |
A number that helps to choose the samples. The default value is 0.5. |
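As a rough illustration of the kind of input data.clust describes, the sketch below builds a list of clusters with base R's kmeans. The simulated data, the choice of 5 clusters, and the decision to keep y alongside the covariates in each element are all assumptions made for illustration; the exact structure seq_bin_model expects is the one returned by gen_bin_data in the Examples.

```r
set.seed(123)

# Simulated covariates and labels (hypothetical stand-in for a real data pool)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(-1 + X$x1))

# Base R k-means; 5 clusters is an arbitrary choice for this sketch
km <- kmeans(X, centers = 5, nstart = 10)

# One list element per cluster, each holding that cluster's rows.
# Keeping y in each element is an assumption, not the package's documented layout.
clusters <- split(cbind(X, y = y), km$cluster)

length(clusters)        # number of clusters
sapply(clusters, nrow)  # cluster sizes
```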
seq_bin_model fits a binary logistic regression model that estimates the effective variables and selects samples sequentially from the original training data set, using the adaptive shrinkage estimate for a given fixed-size confidence set. It is a sequential method: samples are selected one by one from the data pool. When the procedure stops, enough samples have been selected to satisfy the stopping criterion, and we can conclude which variables are effective, what their estimated values are, and how many samples were selected.
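The general shape of such a one-at-a-time procedure can be sketched with base R alone. This is not the package's algorithm: the random choice of the next sample stands in for the D-optimality-based selection, a plain glm fit stands in for the adaptive shrinkage estimate, and the stopping rule (all 95% Wald intervals no wider than d) is a toy substitute for the fixed-size confidence set criterion. All data and names are hypothetical.

```r
set.seed(1)

# Simulated pool (hypothetical data, not produced by the package)
N <- 2000
X <- cbind(1, matrix(rnorm(N * 2), ncol = 2))
beta_true <- c(-0.5, 1, 0)
y <- rbinom(N, 1, plogis(X %*% beta_true))

d <- 1.0                   # target interval width for the toy stopping rule
labeled <- sample(N, 50)   # initial subjects (plays the role of startnum)
is_stopped <- 0

while (is_stopped == 0 && length(labeled) < N) {
  # Ordinary logistic regression on the samples selected so far
  fit <- glm(y[labeled] ~ X[labeled, -1], family = binomial())
  se <- sqrt(diag(vcov(fit)))

  if (max(2 * qnorm(0.975) * se) <= d) {
    # Toy rule: every 95% Wald interval is no wider than d
    is_stopped <- 1
  } else {
    # Add one more sample; a random pick stands in for the package's selection rule
    pool <- setdiff(seq_len(N), labeled)
    labeled <- c(labeled, pool[sample.int(length(pool), 1)])
  }
}

length(labeled)  # sample size at stopping
```

A smaller d forces narrower intervals and therefore more iterations, which mirrors the trade-off noted for the d argument.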
a list containing the following components
d |
the length of the fixed size confidence set that we specify |
n |
the current sample size when the stopping criterion is satisfied |
is_stopped |
a flag indicating whether the sequential iterations have stopped. A value of 1 means the iteration has stopped |
beta_est |
the parameters estimated when the iteration is finished |
cov |
the covariance matrix between the estimated parameters |
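A minimal sketch of how the returned list might be inspected. The list below is a hand-made placeholder using the documented component names; its values are invented and do not come from a real run. Treating non-zero entries of beta_est as the effective variables is an assumption about how the shrinkage estimate reports dropped variables.

```r
# Placeholder result with the documented component names (values are made up)
results <- list(
  d = 0.75,
  n = 120,
  is_stopped = 1,
  beta_est = c(-0.9, 1.1, 0, 0),
  cov = diag(c(0.04, 0.05, 0, 0))
)

if (results$is_stopped == 1) {
  # Assumption: variables shrunk out of the model have an estimate of exactly 0
  effective <- which(results$beta_est != 0)
  se <- sqrt(diag(results$cov))[effective]

  cat("Stopped with n =", results$n, "samples\n")
  print(data.frame(index = effective,
                   estimate = results$beta_est[effective],
                   std_error = se))
}
```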
Wang Z, Kwon Y, Chang YcI (2019). Active learning for binary classification with variable selection. arXiv preprint arXiv:1901.10079.
seq_GEE_model
for generalized estimating equations case
seq_bin_model
for binary classification case
seq_ord_model
for ordinal case.
# generate the toy example. You should remove '#' to
# run the following command.
# library(doMC)
# registerDoMC(9)
# library(foreach)
beta <- c(-1,1,0,0)
N <- 10000
nclass <- 1000
seed <- 123
data <- gen_bin_data(beta,N,nclass,seed)
xfix <- data[['X']]
yfix <- data[['y']]
data.clust <- data[['data.clust']]
startnum <- 24
d <- 0.75
# use seq_bin_model to binary classification problem. You can remove '#' to
# run the command.
# results <- seq_bin_model(startnum, data.clust, xfix, yfix, d,
# criterion = "BIC", pho = 0.05, ptarget = 0.5)