View source: R/quantile_main_function.R
ssp.quantreg | R Documentation |
Draw a subsample from the full dataset and fit a quantile regression model. For a quick start, refer to the vignette.
ssp.quantreg(
formula,
data,
subset = NULL,
tau = 0.5,
n.plt,
n.ssp,
B = 5,
boot = TRUE,
criterion = "optL",
sampling.method = "withReplacement",
likelihood = c("weighted"),
control = list(...),
contrasts = NULL,
...
)
formula |
A model formula object of class "formula" that describes the model to be fitted. |
data |
A data frame containing the variables in the model. Denote |
subset |
An optional vector specifying a subset of observations from |
tau |
The quantile of interest. Defaults to 0.5. |
n.plt |
The pilot subsample size (first-step subsample size). This subsample is used to compute the pilot estimator and estimate the optimal subsampling probabilities. |
n.ssp |
The expected size of the optimal subsample (second-step subsample). For |
B |
The number of subsamples for the iterative sampling algorithm. Each subsample contains |
boot |
If TRUE then perform iterative sampling algorithm and estimate the covariance matrix. If FALSE then only one subsample with size |
criterion |
It determines how subsampling probabilities are computed.
Choices include
|
sampling.method |
The sampling method for drawing the optimal subsample.
Choices include |
likelihood |
The type of the maximum likelihood function used to
calculate the optimal subsampling estimator. Currently |
control |
The argument
|
contrasts |
An optional list. It specifies how categorical variables are represented in the design matrix. For example, |
... |
A list of parameters which will be passed to |
Most of the arguments and returned values have the same meaning as in ssp.glm. Refer to the vignette.
A pilot estimator for the unknown parameter \beta is required because the optL subsampling probabilities depend on \beta. There is no "free lunch" when determining optimal subsampling probabilities. For quantile regression, the pilot estimator is computed from a subsample of size n.plt drawn with replacement from the full dataset using uniform sampling probabilities.
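The pilot step described above can be sketched in a few lines. This is a minimal illustration, not the package's internal code: it assumes the quantreg package is installed, uses made-up simulated data, and computes probabilities proportional to |tau - I(residual < 0)| * ||x_i||, which is the L-optimal form as stated in Wang & Ma (2021). The names full, idx.plt, beta.plt and p.ssp are hypothetical.

```r
# Sketch only: assumes quantreg is installed; all object names are illustrative.
library(quantreg)

set.seed(2)
N <- 5000
tau <- 0.75
n.plt <- 200

# Simulated full data with two covariates (made up for this sketch)
X <- matrix(rnorm(N * 2), N, 2)
y <- as.vector(1 + X %*% c(1, 1) + rnorm(N))
full <- data.frame(y = y, x1 = X[, 1], x2 = X[, 2])

# Step 1: uniform pilot subsample, drawn with replacement
idx.plt <- sample(N, n.plt, replace = TRUE)
beta.plt <- coef(rq(y ~ x1 + x2, tau = tau, data = full[idx.plt, ]))

# Step 2: plug the pilot estimate into the probability formula.
# Assumed form: |tau - I(residual < 0)| * ||x_i||, normalised to sum to 1.
Xd <- cbind(1, X)                          # design matrix with intercept
res <- y - as.vector(Xd %*% beta.plt)      # residuals under the pilot fit
score <- abs(tau - (res < 0)) * sqrt(rowSums(Xd^2))
p.ssp <- score / sum(score)
```

The probabilities cannot be formed without beta.plt, which is why the pilot subsample is needed before the second-step subsample can be drawn.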
If boot=TRUE, the returned value subsample.size.expect equals B*n.ssp, and the covariance matrix for coef is calculated.
If boot=FALSE, the returned value subsample.size.expect equals B*n.ssp, but the covariance matrix is not estimated.
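To make the role of B concrete, the following sketch shows one textbook way to combine B per-subsample estimates into a point estimate and a covariance estimate. It is an assumption about the general idea, not the package's exact formula, which may apply further corrections (for example for rows drawn more than once under sampling with replacement). The matrix beta.b is a hypothetical stand-in filled with random numbers, not real output.

```r
set.seed(3)
B <- 5
n.ssp <- 100
p <- 3   # number of coefficients (illustrative)

# Hypothetical stand-in for B per-subsample estimates, one row per subsample
beta.b <- matrix(rnorm(B * p, mean = 1, sd = 0.1), nrow = B)

# Total expected subsample size across the B subsamples
subsample.size.expect <- B * n.ssp

# Combined estimate: average of the B estimates
beta.hat <- colMeans(beta.b)

# Covariance of the average: empirical covariance across subsamples, divided by B
cov.hat <- cov(beta.b) / B
```

With a single subsample (the boot=FALSE case) there is no between-subsample variation to use, which is consistent with the covariance matrix not being estimated in that case.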
ssp.quantreg returns an object of class "ssp.quantreg" containing the following components (some are optional):
The original function call.
The pilot estimator. See Details for more information.
The estimator obtained from the optimal subsample.
The covariance matrix of coef.
Row indices of the pilot subsample in the full dataset.
Row indices of the optimal subsample in the full dataset.
The number of observations in the full dataset.
The expected subsample size.
The terms object for the fitted model.
Wang, H., & Ma, Y. (2021). Optimal subsampling for quantile regression in big data. Biometrika, 108(1), 99-112.
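# Quantile regression
set.seed(1)
N <- 1e4      # full data size
B <- 5        # number of subsamples
tau <- 0.75   # quantile level
beta.true <- rep(1, 7)
d <- length(beta.true) - 1
corr <- 0.5
# Covariate covariance matrix: sigmax[i, j] = corr^|i - j|
sigmax <- corr^abs(outer(1:d, 1:d, "-"))
X <- MASS::mvrnorm(N, rep(0, d), sigmax)
# Shift the errors so that their tau-th quantile is zero
err <- rnorm(N, 0, 1) - qnorm(tau)
# Heteroscedastic response: the error scale depends on the covariates
Y <- beta.true[1] + X %*% beta.true[-1] +
  err * rowMeans(abs(X))
data <- as.data.frame(cbind(Y, X))
colnames(data) <- c("Y", paste0("V", 1:ncol(X)))
formula <- Y ~ .
n.plt <- 200   # pilot subsample size
n.ssp <- 100   # expected size of each second-step subsample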
# Optimal (optL) subsampling
optL.results <- ssp.quantreg(formula, data, tau = tau, n.plt = n.plt,
  n.ssp = n.ssp, B = B, boot = TRUE, criterion = 'optL',
  sampling.method = 'withReplacement', likelihood = 'weighted')
summary(optL.results)
# Uniform subsampling, for comparison
uni.results <- ssp.quantreg(formula, data, tau = tau, n.plt = n.plt,
  n.ssp = n.ssp, B = B, boot = TRUE, criterion = 'uniform',
  sampling.method = 'withReplacement', likelihood = 'weighted')
summary(uni.results)