mdyplFit                R Documentation
glm() for maximum Diaconis-Ylvisaker prior penalized likelihood estimation of logistic regression models

mdyplFit() is a fitting method for glm() that fits logistic
regression models using maximum Diaconis-Ylvisaker prior penalized
likelihood estimation.
mdyplFit(
x,
y,
weights = rep(1, nobs),
start = NULL,
etastart = NULL,
mustart = NULL,
offset = rep(0, nobs),
family = binomial(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
mdypl_fit(
x,
y,
weights = rep(1, nobs),
start = NULL,
etastart = NULL,
mustart = NULL,
offset = rep(0, nobs),
family = binomial(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
x
a design matrix of dimension n * p.
y
a vector of observations of length n.
weights
an optional vector of ‘prior weights’ to be used
in the fitting process. Should be NULL or a numeric vector.
start
starting values for the parameters in the linear predictor.
etastart
starting values for the linear predictor.
mustart
starting values for the vector of means.
offset
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be NULL or a numeric vector of length equal to the
number of cases.
family
a description of the error distribution and link
function to be used in the model. For mdyplFit() this should be
binomial(), as only logistic regression models are supported.
control
a list of parameters controlling the fitting
process. See mdyplControl().
intercept
logical. Should an intercept be included in the null model?
singular.ok
logical; if FALSE, a singular fit is an error.
mdyplFit() uses stats::glm.fit() to fit a logistic regression
model on responses alpha * y + (1 - alpha) / 2, where y are the
original binomial responses scaled by the binomial totals. This is
equivalent to penalizing the likelihood by the Diaconis-Ylvisaker
prior with shrinkage parameter alpha and regression parameters
set to zero. See Rigon & Aliverti (2023) and Sterzinger & Kosmidis
(2024).
By default, alpha = m / (p + m) is used, where m is the sum of
the binomial totals. Alternative values of alpha can be passed to
the control argument; see mdyplControl() for setting up the
list passed to control. If alpha = 1 then mdyplFit() will
simply do maximum likelihood estimation.
Note that null.deviance, deviance and aic in the resulting
object are computed at the adjusted responses. Hence, methods such
as logLik() and AIC() use the
penalized log-likelihood. With the default alpha, the inferential
procedures based on penalized likelihood are asymptotically
equivalent to the ones that use the unpenalized likelihood when
p/n is vanishing asymptotically.
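As a schematic illustration of the adjusted-response construction above (base R only, with made-up data; this is not the package's implementation, and mdyplFit() itself also handles prior weights, offsets, and non-unit binomial totals), the adjustment and the default alpha for binary responses can be written as:

```r
## Sketch of the MDYPL response adjustment for binary 0/1 responses,
## where the binomial totals are all 1 and hence m = n.
set.seed(123)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix with intercept
y <- rbinom(n, 1, plogis(X[, 2]))                    # binary responses

m <- n                                 # m = sum of the binomial totals
alpha <- m / (p + m)                   # default shrinkage parameter
y_adj <- alpha * y + (1 - alpha) / 2   # adjusted responses, strictly inside (0, 1)

## MDYPL estimates are the ML estimates computed at the adjusted responses;
## glm.fit() warns about non-integer successes, which is expected here.
fit <- suppressWarnings(glm.fit(X, y_adj, family = binomial()))
fit$coefficients
```

With alpha = 1 the adjustment is the identity, so the fit reduces to plain maximum likelihood, matching the behaviour described above.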
For high-dimensionality corrected estimates, standard errors and z
statistics, use the summary method for
"mdyplFit" objects with hd_correction = TRUE.
mdypl_fit() is an alias to mdyplFit().
An object inheriting from class "mdyplFit", which
is a list with the same elements as the list that
stats::glm.fit() returns, plus a few extra components.
Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk
Sterzinger P, Kosmidis I (2024). Diaconis-Ylvisaker prior
penalized likelihood for p/n → κ ∈ (0, 1) logistic
regression. arXiv:2311.07419v2, https://arxiv.org/abs/2311.07419.
Rigon T, Aliverti E (2023). Conjugate priors and bias reduction for logistic regression models. Statistics & Probability Letters, 202, 109901. doi:10.1016/j.spl.2023.109901.
mdyplControl(), summary.mdyplFit(), plrtest.mdyplFit(), glm()
data("lizards", package = "brglm2")
liz_fm <- cbind(grahami, opalinus) ~ height + diameter + light + time
## ML fit = MDYPL fit with `alpha = 1`
liz_ml <- glm(liz_fm, family = binomial(), data = lizards,
method = "mdyplFit", alpha = 1)
liz_ml0 <- glm(liz_fm, family = binomial(), data = lizards)
## liz_ml is the same fit as liz_ml0
summ_liz_ml <- summary(liz_ml)
summ_liz_ml0 <- summary(liz_ml0)
all.equal(coef(summ_liz_ml), coef(summ_liz_ml0))
## MDYPL fit with default `alpha` (see `?mdyplControl`)
liz_mdypl <- glm(liz_fm, family = binomial(), data = lizards,
method = "mdyplFit")
## Comparing outputs from ML and MDYPL, with and without
## high-dimensionality corrections.
summary(liz_mdypl)
summary(liz_mdypl, hd_correction = TRUE)
summ_liz_ml
summary(liz_ml, hd_correction = TRUE)
## Not much difference in the fits here, as this is a low-dimensional
## problem with a small dimensionality constant
(liz_ml$rank - 1) / sum(weights(liz_ml))
## The case study in Section 8 of Sterzinger and
## Kosmidis (2024)
data("MultipleFeatures", package = "brglm2")
## Center the fou.* and kar.* features
vars <- grep("fou|kar", names(MultipleFeatures), value = TRUE)
train_id <- which(MultipleFeatures$training)
MultipleFeatures[train_id, vars] <- scale(MultipleFeatures[train_id, vars], scale = FALSE)
## Compute the MDYPL fits
kappa <- length(vars) / sum(MultipleFeatures$training)
full_fm <- formula(paste("I(digit == 7) ~", paste(vars, collapse = " + ")))
nest_vars <- grep("fou", vars, value = TRUE)
nest_fm <- formula(paste("I(digit == 7) ~", paste(nest_vars, collapse = " + ")))
full_m <- glm(full_fm, data = MultipleFeatures, family = binomial(),
method = mdyplFit, alpha = 1 / (1 + kappa), subset = training)
nest_m <- update(full_m, nest_fm)
## With a naive penalized likelihood ratio test we get no evidence
## against the hypothesis that the model with only `fou` features
## is as good a description of `7` as the model with both `fou` and
## `kar` features.
plrtest(nest_m, full_m)
## With a high-dimensionality correction there is strong evidence
## against the model with only `fou` features
plrtest(nest_m, full_m, hd_correction = TRUE)
## Not run:
## A simulated data set as in Rigon & Aliverti (2023, Section 4.3)
set.seed(123)
n <- 1000
p <- 500
gamma <- sqrt(5)
X <- matrix(rnorm(n * p, 0, 1), nrow = n, ncol = p)
betas0 <- rep(c(-1, -1/2, 0, 2, 3), each = p / 5)
betas <- gamma * betas0 / sqrt(sum(betas0^2))
probs <- plogis(drop(X %*% betas))
y <- rbinom(n, 1, probs)
fit_mdypl <- glm(y ~ -1 + X, family = binomial(), method = "mdyplFit")
## The default value of `alpha` is `n / (n + p)` here
identical(n / (n + p), fit_mdypl$alpha)
## Aggregate bias of MDYPL and rescaled MDYPL estimators
ag_bias <- function(estimates, beta) mean(estimates - beta)
ag_bias(coef(summary(fit_mdypl))[, "Estimate"], betas)
ag_bias(coef(summary(fit_mdypl, hd_correction = TRUE))[, "Estimate"], betas)
## End(Not run)