mdyplFit                                                R Documentation

glm() method for maximum Diaconis-Ylvisaker prior penalized
likelihood estimation of logistic regression models

Description

mdyplFit() is a fitting method for glm() that fits logistic
regression models using maximum Diaconis-Ylvisaker prior penalized
likelihood estimation.

Usage
mdyplFit(
x,
y,
weights = rep(1, nobs),
start = NULL,
etastart = NULL,
mustart = NULL,
offset = rep(0, nobs),
family = binomial(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
mdypl_fit(
x,
y,
weights = rep(1, nobs),
start = NULL,
etastart = NULL,
mustart = NULL,
offset = rep(0, nobs),
family = binomial(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
Arguments

x
    a design matrix of dimension n * p.

y
    a vector of observations of length n.

weights
    an optional vector of 'prior weights' to be used in the
    fitting process. Should be NULL or a numeric vector.

start
    starting values for the parameters in the linear predictor.

etastart
    starting values for the linear predictor.

mustart
    starting values for the vector of means.

offset
    this can be used to specify an a priori known component to be
    included in the linear predictor during fitting. This should
    be NULL or a numeric vector of length equal to the number of
    cases.

family
    a description of the error distribution and link function to
    be used in the model. For mdyplFit() this must be binomial().

control
    a list of parameters controlling the fitting process. See
    mdyplControl().

intercept
    logical. Should an intercept be included in the null model?

singular.ok
    logical; if FALSE, a singular fit is an error.
Details

mdyplFit() uses stats::glm.fit() to fit a logistic regression
model on the adjusted responses alpha * y + (1 - alpha) / 2, where
y are the original binomial responses scaled by the binomial
totals. This is equivalent to penalizing the likelihood by the
Diaconis-Ylvisaker prior with shrinkage parameter alpha and
regression parameters set to zero. See Rigon & Aliverti (2023) and
Sterzinger & Kosmidis (2024).
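The adjusted-response construction can be sketched directly with
stats::glm.fit(); the snippet below is an illustration of the idea on
simulated Bernoulli data, not the internals of mdyplFit():

```r
## Illustrative sketch: the MDYPL estimate is the ML fit of a standard
## logistic regression on responses shrunk towards 1/2.
set.seed(1)
n <- 200; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- rbinom(n, 1, plogis(drop(X %*% c(0.5, 1, -1, 0.2))))
alpha <- n / (p + n)                   # default: m / (p + m), with m = n here
y_adj <- alpha * y + (1 - alpha) / 2   # adjusted responses, strictly in (0, 1)
fit <- suppressWarnings(glm.fit(X, y_adj, family = binomial()))
fit$coefficients                       # MDYPL estimates
```

The shrinkage keeps all adjusted responses strictly inside (0, 1), which
is what guarantees finite estimates even under data separation.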
By default, alpha = m / (p + m) is used, where m is the sum of the
binomial totals. Alternative values of alpha can be passed through
the control argument; see mdyplControl() for setting up the list
passed to control. If alpha = 1, then mdyplFit() simply performs
maximum likelihood estimation.
Note that null.deviance, deviance and aic in the resulting object
are computed at the adjusted responses. Hence, methods such as
logLik() and AIC() use the penalized log-likelihood. With the
default alpha, inferential procedures based on the penalized
likelihood are asymptotically equivalent to those based on the
unpenalized likelihood whenever p/n vanishes asymptotically.
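Concretely, the penalized log-likelihood is the ordinary logistic
log-likelihood evaluated at the adjusted responses. The helper below is
an illustrative sketch for Bernoulli responses with unit weights (it is
not part of the package); maximizing it numerically recovers the fit
that glm.fit() produces on the adjusted responses:

```r
## Illustrative: the MDYPL objective is the logistic log-likelihood
## with y replaced by the adjusted responses alpha * y + (1 - alpha) / 2.
dy_loglik <- function(beta, X, y, alpha) {
  eta <- drop(X %*% beta)
  y_adj <- alpha * y + (1 - alpha) / 2
  sum(y_adj * eta - log1p(exp(eta)))
}

set.seed(2)
X <- cbind(1, rnorm(50)); y <- rbinom(50, 1, 0.5); alpha <- 50 / 52
opt <- optim(c(0, 0), function(b) -dy_loglik(b, X, y, alpha))
ml <- suppressWarnings(glm.fit(X, alpha * y + (1 - alpha) / 2,
                               family = binomial()))
max(abs(opt$par - ml$coefficients))    # small, up to optimizer tolerance
```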
For high-dimensionality corrected estimates, standard errors and z
statistics, use the summary() method for "mdyplFit" objects with
hd_correction = TRUE.
mdypl_fit() is an alias for mdyplFit().
Value

An object inheriting from class "mdyplFit", which is a list with
the same elements as the list that stats::glm.fit() returns, plus
a few extra components.
Author(s)

Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk
References

Sterzinger P, Kosmidis I (2024). Diaconis-Ylvisaker prior
penalized likelihood for p/n -> kappa in (0, 1) logistic
regression. arXiv:2311.07419v2, https://arxiv.org/abs/2311.07419.

Rigon T, Aliverti E (2023). Conjugate priors and bias reduction
for logistic regression models. Statistics & Probability Letters,
202, 109901. doi:10.1016/j.spl.2023.109901.
See Also

mdyplControl(), summary.mdyplFit(), plrtest.mdyplFit(), glm()
Examples

data("lizards", package = "brglm2")
liz_fm <- cbind(grahami, opalinus) ~ height + diameter + light + time
## ML fit = MDYPL fit with `alpha = 1`
liz_ml <- glm(liz_fm, family = binomial(), data = lizards,
method = "mdyplFit", alpha = 1)
liz_ml0 <- glm(liz_fm, family = binomial(), data = lizards)
## liz_ml is the same fit as liz_ml0
summ_liz_ml <- summary(liz_ml)
summ_liz_ml0 <- summary(liz_ml0)
all.equal(coef(summ_liz_ml), coef(summ_liz_ml0))
## MDYPL fit with default `alpha` (see `?mdyplControl`)
liz_mdypl <- glm(liz_fm, family = binomial(), data = lizards,
                 method = "mdyplFit")
## Comparing outputs from ML and MDYPL, with and without
## high-dimensionality corrections.
summary(liz_mdypl)
summary(liz_mdypl, hd_correction = TRUE)
summ_liz_ml
summary(liz_ml, hd_correction = TRUE)
## Not much difference between the fits here, as this is a
## low-dimensional problem with a small dimensionality constant
(liz_ml$rank - 1) / sum(weights(liz_ml))
## The case study in Section 8 of Sterzinger and
## Kosmidis (2024)
data("MultipleFeatures", package = "brglm2")
## Center the fou.* and kar.* features
vars <- grep("fou|kar", names(MultipleFeatures), value = TRUE)
train_id <- which(MultipleFeatures$training)
MultipleFeatures[train_id, vars] <- scale(MultipleFeatures[train_id, vars], scale = FALSE)
## Compute the MDYPL fits
kappa <- length(vars) / sum(MultipleFeatures$training)
full_fm <- formula(paste("I(digit == 7) ~", paste(vars, collapse = " + ")))
nest_vars <- grep("fou", vars, value = TRUE)
nest_fm <- formula(paste("I(digit == 7) ~", paste(nest_vars, collapse = " + ")))
full_m <- glm(full_fm, data = MultipleFeatures, family = binomial(),
method = mdyplFit, alpha = 1 / (1 + kappa), subset = training)
nest_m <- update(full_m, nest_fm)
## With a naive penalized likelihood ratio test we get no evidence
## against the hypothesis that the model with only `fou` features
## is as good a description of the digit `7` as the model with both
## `fou` and `kar` features.
plrtest(nest_m, full_m)
## With a high-dimensionality correction there is strong evidence
## against the model with only `fou` features
plrtest(nest_m, full_m, hd_correction = TRUE)
## Not run:
## A simulated data set as in Rigon & Aliverti (2023, Section 4.3)
set.seed(123)
n <- 1000
p <- 500
gamma <- sqrt(5)
X <- matrix(rnorm(n * p, 0, 1), nrow = n, ncol = p)
betas0 <- rep(c(-1, -1/2, 0, 2, 3), each = p / 5)
betas <- gamma * betas0 / sqrt(sum(betas0^2))
probs <- plogis(drop(X %*% betas))
y <- rbinom(n, 1, probs)
fit_mdypl <- glm(y ~ -1 + X, family = binomial(), method = "mdyplFit")
## The default value of `alpha` is `n / (n + p)` here
identical(n / (n + p), fit_mdypl$alpha)
## Aggregate bias of MDYPL and rescaled MDYPL estimators
ag_bias <- function(estimates, beta) mean(estimates - beta)
ag_bias(coef(summary(fit_mdypl))[, "Estimate"], betas)
ag_bias(coef(summary(fit_mdypl, hd_correction = TRUE))[, "Estimate"], betas)
## End(Not run)