DDPstar | R Documentation |
This function estimates the DDPstar model proposed by Rodriguez-Alvarez, Inacio, et al. (2025).
DDPstar(formula, data, standardise = TRUE, compute.lpml = FALSE, compute.WAIC = FALSE,
compute.DIC = FALSE, prior = priorcontrol(), mcmc = mcmccontrol(), verbose = FALSE)
formula |
A |
data |
A data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE both the responses and the continuous covariates are standardised (i.e., the resulting variables have mean zero and standard deviation of one). The default is TRUE. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion (DIC)(Celeux et al., 2006, Spiegelhalter et al., 2002) is computed. |
prior |
Hyparameter specification. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
verbose |
A logical value indicating whether additional information should be displayed while the MCMC algorithm is running. . The default is FALSE |
Our DDPstar model for the conditional density takes the following form
p(y|\boldsymbol{x}) = \sum_{l=1}^{L}\omega_{l}\phi(y\mid\mu_{l}(\boldsymbol{x}),\sigma_{l}^2),
where \phi(y|\mu, \sigma^2)
denotes the density function of the normal distribution, evaluated at y
, with mean \mu
and variance \sigma^2
. The regression function \mu_{l}(\boldsymbol{x})
can incorportate both linear and nonlinear (through P-splines) effects of continuous covariates, random effects, categorical covariates (factors) as well as interactions. Interactions between categorical and (nonlinear) continuous covariates are also allowed (factor-by curve interactions).
The mean (regression function) of each mixture component, \mu_l(\cdot)
, is compactly specified (see Rodriguez-Alvarez, Inacio, et al. (2025) for details) as
\mu_{l}(\boldsymbol{x}) = \underbrace{\mathbf{x}^{\top}\boldsymbol{\beta}_{l}}_{\mbox{Parametric}} + \underbrace{\sum_{r = 1}^{R}\mathbf{z}_{r}(\boldsymbol{v}_r)^{\top}\boldsymbol{\gamma}_{lr}}_{\mbox{Smooth/Random}}
where \mathbf{x}^{\top}\boldsymbol{\beta}_{l}
contains all parametric effects, including, by default, the intercept, parametric effects of categorical covariates and linear or parametric effects of continuous covariates and their interactions, as well as, the unpenalised terms associated with the P-splines formulation, and \sum_{r = 1}^{R}\mathbf{z}_{r}(\boldsymbol{v}_r)^{\top}\boldsymbol{\gamma}_{lr}
contains the P-splines' penalised terms, as well as, the random effects.
In our DDPstar model, L
is a pre-specified upper bound on the number of mixture components. The \omega_{l}
's result from a truncated version of the stick-breaking construction (\omega_{1} = \eta_{1}
; \omega_{l} = \eta_{l}\prod_{r<l}(1-\eta{r})
, l=2,\ldots,L
; \eta_{1},\ldots,\eta_{L-1}\sim
Beta (1,\alpha)
; \eta_{L} = 1
). Further, for \boldsymbol{\beta}_{l}
we consider
p\left(\boldsymbol{\beta}_{l} \mid \mathbf{m}, \mathbf{S}\right) \propto \exp\left(-\frac{1}{2}\left(\boldsymbol{\beta}_{l} - \mathbf{m}\right)^{\top}\mathbf{S}^{-1}\left(\boldsymbol{\beta}_{l} - \mathbf{m}\right)\right),
with conjugate hyperpriors \mathbf{m} \sim N_{Q}(\mathbf{m}_{0},\mathbf{S}_{0})
and \mathbf{S}^{-1}\sim W_{Q}(\nu,(\nu\Psi)^{-1})
, where W(\nu,(\nu\Psi)^{-1})
denotes a Wishart distribution with \nu
degrees of freedom and expectation \Psi^{-1}
, and Q
denotes the length of vector \boldsymbol{\beta}_{l}
. Regarding the prior distributions for the P-splines' penalised coefficients and random effect coefficients we have
p(\boldsymbol{\gamma}_{lr}\mid \tau_{lr}^{2}) \propto \exp\left(-\frac{1}{2\tau_{lr}^{2}}\boldsymbol{\gamma}_{lr}^{\top}\mathbf{K}_{r}\boldsymbol{\gamma}_{lr}\right),
where the specific forms of \mathbf{K}_{r}
depend on the components type (see Rodriguez-Alvarez, Inacio, et al. (2025) for details). Finally, \sigma_{l}^{-2}\sim\Gamma(a_{\sigma^2},b_{\sigma^2})
, \tau_{lr}^{-2}\sim\Gamma(a_{\tau^2},b_{\tau^2})
and \alpha \sim \Gamma(a_{\alpha}, b_{\alpha})
, where \Gamma(a,b)
denotes a Gamma distribution with shape parameter a
and rate parameter b
. For a detailed description, we refer to Rodriguez-Alvarez, Inacio, et al. (2025).
It is worth referring that with respect to the computation of the DIC, when L=1
, it is computed as in Spiegelhalter et al. (2002), and when L>1
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
An object of class DDPstar
. It is a list containing the following objects:
call |
The matched call. |
data |
The original supplied data argument. |
data_model |
A list with the data used in the fit: response variable and design matrix. |
missing.ind |
A logical value indicating whether for each pair of observations (responses and covariates) missing values occur. |
mcmc |
A list of control values for the Markov chain Monte Carlo (MCMC) |
fit |
Named list with the following information: (1) |
lpml |
If computed, a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
The input argument formula
is similar to that used for the glm
function, except that flexible specifications can be added by means of function f
. For instance, specification y ~ x1 + f(x2)
would assume a linear effect of x1
(if x1
is continuous) and the effect of x2
(assumed continuous)) would be modeled using Bayesian P-splines. Categorical variables (factors) can be also incorporated, as well as random effects (see rae
) and interaction terms. For example, to include the factor-by-curve interaction between x4
(continuous covariate) and x3
(categorical covariate) we need to specify y ~ x3 + f(x4, by = x3)
.
Celeux, G., Forbes, F., Robert C. P., and Titerrington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651-674.
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89-121.
Eilers, P.H.C. and Marx, B.D. (2003). Multidimensional calibration with temperature interaction using two- dimensional penalized signal regression. Chemometrics and Intelligence Laboratory Systems, 66, 159-174.
De Iorio, M., Muller, P., Rosner, G. L., and MacEachern, S. N. (2004). An anova model for dependent random measures. Journal of the American Statistical Association, 99(465), 205-215
De Iorio, M., Johnson, W. O., Muller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762–775.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153-160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1010.
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1), 183-212.
Lee, D.-J., Durban, M., and Eilers, P. (2013). Efficient two-dimensional smoothing with P- spline ANOVA mixed models and nested bases. Computational Statistics & Data Analysis, 61, 22-37.
Rodriguez-Alvarez, M. X, Inacio, V. and Klein N. (2025). Density regression via Dirichlet process mixtures of normal structured additive regression models. Accepted for publication in Statistics and Computing (DOI: 10.1007/s11222-025-10567-0).
Speigelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model comparison and fit. Journal of the Royal Statistical Society, Ser. B, 64, 583-639.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571-3594.
f
, rae
, predict.DDPstar
library(DDPstar)
data(dde)
dde$GAD <- dde$GAD/7 # GAD in weeks
formula.dde <- GAD ~ f(DDE, bdeg = 3, nseg = 20, pord = 2, atau = 1, btau = 0.005)
mcmc <- mcmccontrol(nburn = 20000, nsave = 15000, nskip = 1)
prior <- priorcontrol(a = 2, b = 0.5, aalpha = 2, balpha = 2, L = 20)
set.seed(10) # For reproducibility
fit_dde <- DDPstar(formula = formula.dde,
data = dde, mcmc = mcmc, prior = prior,
standardise = TRUE, compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE)
summary(fit_dde)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.