seq_bin_model: The sequential logistic regression model for binary...

View source: R/seq_bin_model.R

Description

seq_bin_model estimates the effective variables and selects subjects sequentially, using a logistic regression model for binary classification with an adaptive shrinkage estimate.

Usage

seq_bin_model(startnum, data.clust, xfix, yfix, d = 0.5,
  criterion = "BIC", pho = 0.05, ptarget = 0.5)

Arguments

startnum

The initial number of subjects drawn from the original dataset.

data.clust

A large list obtained through k-means clustering. The samples within one element (for example, data.clust[[1]]) are closer to each other than to the samples in any other element.

xfix

A data frame in which each row is a sample and each column is an independent variable. Each row is the minimum-variance sample of one cluster in data.clust and represents all samples in that cluster.

yfix

A numeric vector consisting of 0s and 1s. Its length must equal the number of rows of xfix.

d

A number specifying the length of the fixed-size confidence set for the model. The smaller d is, the larger the required sample size and the longer the run time. The default value is 0.5.

criterion

A character string specifying the model selection criterion to use, one of 'BIC' or 'AIC'. The default value is 'BIC'.

pho

A number used in subject selection according to D-optimality: the first (pho * length(data)) observations from the unlabeled data set are selected and added to the uncertainty set. The default value is 0.05.

ptarget

A number that helps to choose the samples. The default value is 0.5.
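The data.clust and xfix arguments described above can be illustrated with base R. The following is a minimal sketch, not the package's own preprocessing: the simulated data, the variable names, the number of clusters, the layout of each list element, and the use of "closest to the cluster centre" as the representative sample are all assumptions made for illustration. In practice, gen_bin_data (see Examples) returns objects in the layout seq_bin_model expects.

```r
# Illustrative sketch only; the layout used by seqest may differ.
set.seed(42)

# Simulated covariates and binary response (hypothetical data).
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rbinom(n, 1, plogis(-1 + X$x1))

# Cluster the covariates with k-means (k chosen arbitrarily here).
k <- 10
km <- kmeans(X, centers = k, nstart = 5, iter.max = 50)

# One list element per cluster, holding that cluster's rows
# (response stored next to the covariates; an assumed layout).
data.clust <- split(cbind(y = y, X), km$cluster)

# Representative of each cluster: the member closest to the cluster centre
# (an assumed reading of "minimum variance").
rep_idx <- sapply(seq_len(k), function(j) {
  members <- which(km$cluster == j)
  centred <- sweep(as.matrix(X[members, , drop = FALSE]), 2, km$centers[j, ])
  members[which.min(rowSums(centred^2))]
})

xfix <- X[rep_idx, ]  # one representative row per cluster
yfix <- y[rep_idx]    # matching 0/1 labels

str(xfix)
table(yfix)
```

Each element of this data.clust holds samples that are close to one another, and xfix and yfix have one entry per cluster, which mirrors the relationship between the three arguments described above.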

Details

seq_bin_model is a binary logistic regression model that estimates the effective variables and selects samples sequentially from the original training data set, using an adaptive shrinkage estimate and a fixed-size confidence set. Samples are selected one at a time from the data pool. Once the procedure stops, enough samples have been selected to satisfy the stopping criterion, and the result identifies the effective variables, their estimated coefficients, and the number of samples selected.
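The one-at-a-time loop with a stopping criterion can be sketched in base R. This is a simplified illustration under stated assumptions and not the seqest algorithm: subjects are added by simple random sampling rather than by the package's selection rule, no adaptive shrinkage or variable selection is applied, and the stopping rule (the longest semi-axis of a 95% Wald confidence ellipsoid is at most d) is an assumed stand-in for the criterion in Wang, Kwon and Chang (2019).

```r
# Illustrative sketch only; this is not the seqest implementation.
set.seed(1)

# Simulated pool with the same true coefficients as the Examples section
# (intercept -1, one active covariate, two inactive ones).
n_pool <- 2000
dat <- data.frame(x1 = rnorm(n_pool), x2 = rnorm(n_pool), x3 = rnorm(n_pool))
dat$y <- rbinom(n_pool, 1, plogis(-1 + dat$x1))

d <- 0.75                  # target size of the confidence set
q <- qchisq(0.95, df = 4)  # 95% Wald ellipsoid for 4 coefficients (assumed level)

labeled <- sample(n_pool, 24)              # initial subjects (startnum = 24)
pool <- setdiff(seq_len(n_pool), labeled)  # subjects not yet selected
is_stopped <- 0

while (length(pool) > 0) {
  # Refit the logistic regression on the subjects selected so far.
  fit <- suppressWarnings(
    glm(y ~ x1 + x2 + x3, data = dat[labeled, ], family = binomial())
  )
  V <- vcov(fit)

  # Assumed stopping rule: longest semi-axis of the Wald ellipsoid <= d.
  lambda_max <- max(eigen(V, symmetric = TRUE, only.values = TRUE)$values)
  semi_axis <- sqrt(q * lambda_max)
  if (semi_axis <= d) {
    is_stopped <- 1
    break
  }

  # Add one more subject (random choice stands in for the real selection rule).
  pick <- pool[sample.int(length(pool), 1)]
  labeled <- c(labeled, pick)
  pool <- setdiff(pool, pick)
}

n_final <- length(labeled)
beta_est <- coef(fit)
cat("stopped:", is_stopped, " n:", n_final, "\n")
```

A smaller d forces more iterations before the ellipsoid is small enough, which matches the trade-off noted for the d argument.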

Value

a list containing the following components

d

the length of the fixed size confidence set that we specify

n

the current sample size when the stopping criterion is satisfied

is_stopped

a flag indicating whether the sequential iteration has stopped; a value of 1 means the iteration has stopped

beta_est

the parameter estimates obtained when the iteration is finished

cov

the covariance matrix of the estimated parameters
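The components listed above can be combined into Wald-type intervals. The sketch below does not call seq_bin_model: it builds a stand-in list with the same component names from an ordinary glm fit on simulated data, and it assumes that beta_est is a numeric vector and cov a square matrix in the same coefficient order.

```r
# Illustrative sketch only; 'results' is a stand-in, not seq_bin_model output.
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + x))
fit <- glm(y ~ x, family = binomial())

# Stand-in list using the documented component names (assumed shapes).
results <- list(d = 0.75, n = length(y), is_stopped = 1,
                beta_est = coef(fit), cov = vcov(fit))

# Only interpret the estimates once the procedure reports that it has stopped.
if (results$is_stopped == 1) {
  se <- sqrt(diag(results$cov))  # standard errors from the covariance matrix
  z <- qnorm(0.975)              # 95% two-sided normal quantile
  wald <- cbind(estimate = results$beta_est,
                lower = results$beta_est - z * se,
                upper = results$beta_est + z * se)
  print(round(wald, 3))
}
```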

References

Wang Z, Kwon Y, Chang YcI (2019). Active learning for binary classification with variable selection. arXiv preprint arXiv:1901.10079.

See Also

seq_GEE_model for generalized estimating equations case

seq_bin_model for binary classification case

seq_ord_model for ordinal case.

Examples

# Generate the toy example. Remove the leading '#' from the next three
# lines to run them (they load doMC and foreach and register 9 cores).
# library(doMC)
# registerDoMC(9)
# library(foreach)
beta <- c(-1, 1, 0, 0)
N <- 10000
nclass <- 1000
seed <- 123
data <- gen_bin_data(beta, N, nclass, seed)
xfix <- data[['X']]
yfix <- data[['y']]
data.clust <- data[['data.clust']]
startnum <- 24
d <- 0.75

# Apply seq_bin_model to the binary classification problem. Remove the
# leading '#' from the call below to run it.
# results <- seq_bin_model(startnum, data.clust, xfix, yfix, d,
#                          criterion = "BIC", pho = 0.05, ptarget = 0.5)

seqest documentation built on July 2, 2020, 2:28 a.m.
