summary.mdyplFit: Summary method for "mdyplFit" objects

View source: R/mdyplFit.R

Description

Summary method for "mdyplFit" objects

Usage

## S3 method for class 'mdyplFit'
summary(
  object,
  hd_correction = FALSE,
  se_start,
  gh = NULL,
  prox_tol = 1e-10,
  transform = TRUE,
  init_iter = 50,
  init_method = "Nelder-Mead",
  ...
)

## S3 method for class 'summary.mdyplFit'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  ...
)

Arguments

object

an object of class "mdyplFit", typically the result of a call to glm() with method = "mdyplFit".

hd_correction

if FALSE (default), the summary corresponding to standard asymptotics is computed. If TRUE, the high-dimensionality corrections in Sterzinger & Kosmidis (2024) are employed to update the estimates, estimated standard errors, and z-statistics. See Details.

se_start

a vector of starting values for the state evolution equations. See the start argument in solve_se().

gh

A list with the Gauss-Hermite quadrature nodes and weights, as returned from statmod::gauss.quad() with kind = "hermite". Default is NULL, in which case gh is set to statmod::gauss.quad(200, kind = "hermite").

prox_tol

tolerance for the computation of the proximal operator; default is 1e-10.

transform

if TRUE (default), the optimization is with respect to log(mu), log(b), log(sigma) (and iota if intercept is numeric). If FALSE, it is over mu, b, sigma (and iota if intercept is numeric). The solution is returned in terms of the latter set, regardless of how the optimization took place.

init_iter

the number of optim() iterations used to compute starting values for nleqslv::nleqslv(). Default is 50; it can also be 0, in which case start is passed directly to nleqslv::nleqslv(). init_iter = "only" results in only optim() being used. See Details.

init_method

The method to be passed to optim(). Default is "Nelder-Mead".

...

further arguments to be passed to summary.glm().

x

an object of class "summary.mdyplFit", usually a result of a call to summary.mdyplFit().

digits

the number of significant digits to use when printing.

symbolic.cor

logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.

signif.stars

logical. If TRUE, ‘significance stars’ are printed for each coefficient.

Details

If hd_correction = TRUE, the square root of the corrupted signal strength is estimated from object using the sloe() estimator, as are the conditional variances of each covariate given the others (excluding the intercept). The latter are estimated using the residual sums of squares from the linear regression of each covariate on all the others, as proposed in Zhao et al. (2022, Section 5.1). Then the appropriate state evolution equations are solved using solve_se() with corrupted = TRUE, and the resulting constants are used to rescale the estimates and to adjust the estimated standard errors and z-statistics as in Sterzinger & Kosmidis (2024).
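The conditional-variance step above can be sketched in a few lines of base R. This is an illustrative reconstruction of the recipe (regress each covariate on all the others and use the residual sum of squares), not brglm2's internal code; in particular, the divisor n is an assumption, and the internal implementation may scale the residual sums of squares differently.

```r
## Illustrative sketch: estimate the conditional variance of each
## covariate given the others from the residual sums of squares of
## column-wise linear regressions (the divisor n is an assumption).
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
cond_vars <- vapply(seq_len(p), function(j) {
    ## regress covariate j on an intercept and all other covariates
    fit <- lm.fit(x = cbind(1, X[, -j, drop = FALSE]), y = X[, j])
    sum(fit$residuals^2) / n
}, numeric(1))
cond_vars  # one estimated conditional variance per covariate
```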

The key assumptions under which the rescaled estimates and corrected standard errors and z-statistics are asymptotically valid are that the covariates have sub-Gaussian distributions, and that the signal strength, which is the limit

\gamma^2

of var(X \beta), is finite as p / n \to \kappa \in (0, 1). See Sterzinger & Kosmidis (2024).

If hd_correction = TRUE, and the model has an intercept, then the result provides only a corrected estimate of the intercept with no accompanying standard error, z-statistic, and p-value. Also, vcov(summary(object, hd_correction = TRUE)) is always NULL. Populating those objects with appropriate estimates is the subject of current work.

Value

A list with objects as in the result of stats::summary.glm(), with extra component se_parameters, which is the vector of the solution to the state evolution equations with extra attributes (see solve_se()).
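A minimal usage sketch for inspecting the extra component (assuming fit_mdypl is a fitted "mdyplFit" object, as constructed in the Examples):

```r
## Not run:
## hd_summary <- summary(fit_mdypl, hd_correction = TRUE)
## hd_summary$se_parameters              # solution of the state evolution equations
## attributes(hd_summary$se_parameters)  # extra attributes from solve_se()
## End(Not run)
```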

Author(s)

Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk

References

Zhao Q, Sur P, Candès E J (2022). The asymptotic distribution of the MLE in high-dimensional logistic models: Arbitrary covariance. Bernoulli, 28, 1835–1861. doi:10.3150/21-BEJ1401.

Sterzinger P, Kosmidis I (2024). Diaconis-Ylvisaker prior penalized likelihood for p/n \to \kappa \in (0,1) logistic regression. arXiv:2311.07419v2, https://arxiv.org/abs/2311.07419.

See Also

mdyplFit(), solve_se()

Examples


## Not run: 

set.seed(123)
n <- 2000
p <- 400
betas <- c(rnorm(p / 2, mean = 7, sd = 1), rep(0, p / 2))
X <- matrix(rnorm(n * p, 0, 1/sqrt(n)), nrow = n, ncol = p)
probs <- plogis(drop(X %*% betas))
y <- rbinom(n, 1, probs)
fit_mdypl <- glm(y ~ -1 + X, family = binomial(), method = "mdyplFit")

st_summary <- summary(fit_mdypl)
hd_summary <- summary(fit_mdypl, hd_correction = TRUE)

cols <- hcl.colors(3, alpha = 0.2)
par(mfrow = c(1, 2))
plot(betas, type = "l", ylim = c(-3, 14),
     main = "MDYPL estimates",
     xlab = "Parameter index", ylab = NA)
points(coef(st_summary)[, "Estimate"], col = NA, bg = cols[1], pch = 21)

plot(betas, type = "l", ylim = c(-3, 14),
     main = "rescaled MDYPL estimates",
     xlab = "Parameter index", ylab = NA)
points(coef(hd_summary)[, "Estimate"], col = NA, bg = cols[2], pch = 21)

## z-statistics
z_mdypl <- coef(st_summary)[betas == 0, "z value"]
qqnorm(z_mdypl, col = NA, bg = cols[1], pch = 21, main = "z value")
abline(0, 1, lty = 2)
z_c_mdypl <- coef(hd_summary)[betas == 0, "z value"]
qqnorm(z_c_mdypl, col = NA, bg = cols[2], pch = 21, main = "corrected z value")
abline(0, 1, lty = 2)


## End(Not run)


brglm2 documentation built on Aug. 29, 2025, 5:25 p.m.