probe_frontier: Estimate the problem dimension where two classes become...

View source: R/probe_frontier.R

probe_frontier    R Documentation

Estimate the problem dimension where two classes become linearly separable

Description

This function estimates the sample size n_s, or equivalently the problem dimension \kappa_s = p/n_s, at which the two classes in the data become linearly separable. To locate \kappa_s, it bisects the interval [p/n, 0.5] until the window is smaller than eps. For each candidate sample size nn, it draws B subsamples of size nn and estimates the separation probability \hat{\pi} as the proportion of subsamples that are separable. Finally, it fits a logistic regression with \hat{\pi} as the response and \kappa = p/nn as the covariate, and returns the \hat{\kappa} at which the fitted separation probability equals 0.5.
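The subsampling step can be illustrated with a minimal base-R sketch. The helpers is_separable_glm and estimate_pi_hat are hypothetical names, not functions exported by the package, and the separability check used here (a logistic fit with an intercept that classifies every point correctly) is an assumed stand-in for whatever check probe_frontier performs internally. Drawing subsamples without replacement is also an assumption.

```r
# Hypothetical helper (not part of glmhd): returns TRUE when a logistic fit
# with an intercept classifies every observation correctly. A TRUE result
# certifies separability; a FALSE result may occasionally be a false negative.
is_separable_glm <- function(X, Y) {
  y01 <- (Y + 1) / 2  # map labels from {-1, +1} to {0, 1} for glm()
  fit <- suppressWarnings(
    glm(y01 ~ X, family = binomial, control = glm.control(maxit = 50))
  )
  all(Y * fit$linear.predictors > 0)
}

# Hypothetical helper: proportion of B random subsamples of size nn
# (drawn without replacement, an assumption) that are flagged as separable.
estimate_pi_hat <- function(X, Y, nn, B = 10) {
  flags <- replicate(B, {
    idx <- sample(nrow(X), nn)
    is_separable_glm(X[idx, , drop = FALSE], Y[idx])
  })
  mean(flags)
}
```

Calling estimate_pi_hat(X, Y, nn = 300) on the data from the Examples section would give one (\kappa, \hat{\pi}) pair with \kappa = p/300. A linear-programming check would be a more reliable separability test than the glm-based shortcut used in this sketch.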

Usage

probe_frontier(X, Y, B = 10, eps = 0.001, verbose = FALSE)

Arguments

X

Covariate matrix. Each row in X is one observation.

Y

Response vector of +1 and -1 representing the two classes. Y has the same length as the number of rows in X.

B

Numeric. Number of subsamples to generate for each sample size.

eps

Numeric. Minimum window size. The search terminates when the interval is smaller than eps.

verbose

Logical. Print progress if TRUE.

Value

Numeric. Estimated \hat{\kappa}.
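As described above, the returned value is the point where a logistic curve fitted to the (\kappa, \hat{\pi}) pairs crosses 0.5. With intercept b0 and slope b1 on the logit scale, that point is -b0/b1. The sketch below illustrates the calculation; kappa_at_half is a hypothetical name, the quasibinomial family is an assumed choice that avoids warnings for fractional responses, and the input values are synthetic rather than output from probe_frontier.

```r
# Hypothetical helper: fit pi_hat ~ kappa with a logit link and solve
# b0 + b1 * kappa = 0, i.e. the kappa where the fitted probability is 0.5.
kappa_at_half <- function(kappa, pi_hat) {
  fit <- glm(pi_hat ~ kappa, family = quasibinomial)
  b <- coef(fit)
  unname(-b[1] / b[2])
}

# Illustrative inputs only: proportions placed exactly on a logistic curve
# that crosses 0.5 at kappa = 0.3 (not output from probe_frontier).
kappa <- seq(0.2, 0.4, by = 0.02)
pi_hat <- plogis(40 * (kappa - 0.3))
kappa_at_half(kappa, pi_hat)
```

If every \hat{\pi} is exactly 0 or 1, the logistic fit itself is separated and the slope estimate becomes unstable, so a sketch like this is only meaningful when some proportions fall strictly between 0 and 1.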

References

Sur, P. and Candès, E. J. (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences, 116(29), 14516-14525.

Examples

# Y is independent of X, kappa_s is approximately 0.5
n <- 1000; p <- 200
X <- matrix(rnorm(n*p, 0, 1), n, p)
Y <- 2 * rbinom(n, 1, 0.5) - 1
probe_frontier(X, Y, verbose = TRUE)

zq00/glmhd documentation built on April 7, 2023, 7:45 a.m.