View source: R/MAP.estimation.R
MAP.estimation | R Documentation |
MAP.estimation
function is used (in local centers) to compute Maximum A Posterior (MAP) estimators of the parameters for Generalized Linear Models (GLM) and Survival models.
MAP.estimation(y,
X,
family = c("gaussian", "binomial", "survival"),
Lambda,
intercept = TRUE,
basehaz = c("weibul", "exp", "gomp", "poly", "pwexp", "unspecified"),
treatment = NULL,
treat_round = NULL,
refer_treat,
gamma_bfi = NULL,
RCT_propens = NULL,
initial = NULL,
alpha = 0.1,
max_order = 2,
n_intervals = 4,
min_max_times,
center_zero_sample = FALSE,
zero_sample_cov,
refer_cat,
zero_cat,
control = list())
y |
response vector. If the “ |
X |
design matrix of dimension |
family |
a description of the error distribution. This is a character string naming a family of the model. In the current version of the package, the family of model can be “ |
Lambda |
the inverse variance-covariance matrix of the Gaussian distribution that is used as prior distribution for the model parameters. The dimension of the matrix depends on the number of columns of |
intercept |
logical flag for fitting an intercept. If |
basehaz |
a character string representing one of the available baseline hazard functions; |
treatment |
a character string representing the name or place of the binary covariate, respectively. This covariate indicates whether the patient got the new treatment ( |
treat_round |
a character string representing the |
refer_treat |
a character string representing the reference category of the treatment variable. The reference category is considered as |
gamma_bfi |
a vector specifying the BFI estimates of the coefficients received from the central server in the first round. It can be defined by the output of |
RCT_propens |
a vector specifying the propensity scores, which represent the probability of receiving the treatment given the covariates, which are known in the randomized studies (RCTs). For example, in a 1:1 randomized trial, the propensity scores are, by definition, equal to 1/2 (or 0.5), whereas in an unbalanced randomized trial, e.g., a 2:1 trial, the propensity scores are now 2/3 and 1/3 for the two arms, respectively. The length of |
initial |
a vector specifying initial values for the parameters to be optimized over. The length of |
alpha |
a significance level used in the chi-squared distribution (with one degree of freedom and 1- |
max_order |
an integer representing the maximum value of |
n_intervals |
an integer representing the number of intervals in the piecewise exponential baseline hazard function. This argument is only used when |
min_max_times |
a scalar representing the minimum of the maximum event times observed in the centers. The value of this argument should be defined by the central server (which has access to the maximum event times of all the centers) and is only used when |
center_zero_sample |
logical flag indicating whether the center has a categorical covariate with no observations/individuals in one of the categories. Default is |
zero_sample_cov |
either a character string or an integer representing the categorical covariate that has no samples/observations in one of its categories. This covariate should have at least two categories, one of which is the reference. It is used when |
refer_cat |
a character string representing the reference category. The category with no observations (the argument |
zero_cat |
a character string representing the category with no samples/observations. It is used when |
control |
a list of control parameters. See ‘Details’. |
MAP.estimation
function finds the Maximum A Posteriori (MAP) estimates of the model parameters by maximizing the log-posterior density with respect to the parameters, i.e., the estimates equal the values for which the log-posterior density is maximal (the posterior mode).
In other words, MAP.estimation()
optimizes the log-posterior density with respect to the parameter vector to obtain its MAP estimates.
In addition to the model parameters (i.e., coefficients (\boldsymbol{\beta}
) and variance error (\sigma^2_e
) for gaussian
or the parameters of the baseline hazard (\boldsymbol{\omega}
) for survival
), the curvature matrix (Hessian of the log-posterior) is estimated around the mode.
The MAP.estimation
function returns an object of class 'bfi
'. Therefore, summary()
can be used for the object returned by MAP.estimation()
.
For the case where family = "survival"
and basehaz = "poly"
, we assume that in all centers the q_\ell
's are equal.
However, the order of the estimated polynomials may vary across the centers so that each center can have different number of parameters, say q_\ell
+1.
After obtaining the estimates within the local centers (by using MAP.estimation()
) and having all estimates in the central server, we choose the order of the polynomial approximation for the combined data to be the maximum of the orders of the local polynomial functions, i.e., \max \{q_1, \ldots, q_L \}
, to approximate the global baseline hazard (exponentiated polynomial) function more accurately. This is because the higher-order polynomial approximation can capture more complex features and details in the combined data. Using the higher-order approximation ensures that we account for the higher-order moments and features present in the data while maintaining accuracy.
As a result, all potential cases are stored in the theta_A_poly
argument to be used in bfi()
by the central server.
For further information on the survival
family, refer to the 'References' section.
The three arguments 'treatment'
, 'treat_round'
, 'refer_treat'
, 'gamma_bfi'
, and 'RCT_propens'
are related to the estimation of the treatment effect.
For observational and non-randomized studies, the treatment effect is estimated in two rounds; In the first round, \hat{\boldsymbol{\beta}}_{\ell}
(or \hat{\boldsymbol{\gamma}}_{\ell}
) are estimated locally and in the central server \hat{\boldsymbol{\beta}}_{BFI}
(or \hat{\boldsymbol{\gamma}}_{BFI}
) is estimated and then is sent to all local centers for the second round to estimate propensity scores, weights, treatment effect and ATEs.
In the first round, the argument treatment
should not be 'NULL
' and treat_round = "first"
, while gamma_bfi = NULL
and RCT_propens = NULL
. Moreover, in the first round, the family must be set to binomial
, however this is handled automatically.
In the second round, local weighted MAP estimate of the treatment effects and propensity scores are estimated, and along with some summary statistics are sent to the central server to estimate the average treatment effects ATEs (in this case treatment
and gamma_bfi
should not be 'NULL
' and treat_round = "second"
, but RCT_propens = NULL
).
In contrast, for the randomized control trial (RCT), the treatment effect can be estimated by only one round as the propensity scores are known (in this case treatment
and RCT_propens
should not be 'NULL
', but gamma_bfi = NULL
).
NOTE: the argument gamma_bfi
should not include estimates of the nuisance parameter \sigma
in the gaussian
family or any parameters of the baseline hazard (\boldsymbol{\omega}
) and the intercept for survival
.
For more examples on treatment effect estimation, see the ‘Examples’ section of bfi
.
To solve unconstrained and bound-constrained optimization problems, the MAP.estimation
function utilizes an optimization algorithm called Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bound Constraints (L-BFGS-B), Byrd et. al. (1995).
The L-BFGS-B algorithm is a limited-memory “quasi-Newton” method that iteratively updates the parameter estimates by approximating the inverse Hessian matrix using gradient information from the history of previous iterations. This approach allows the algorithm to approximate the curvature of the posterior distribution and efficiently search for the optimal solution, which makes it computationally efficient for problems with a large number of variables.
By default, the algorithm uses a relative change in the objective function as the convergence criterion. When the change in the objective function between iterations falls below a certain threshold ('factr
') the algorithm is considered to have converged.
The convergence can be checked with the argument convergence
in the output. See ‘Value’.
In case of convergence issue, it may be necessary to investigate and adjust optimization parameters to facilitate convergence. It can be done using the initial
and control
arguments. By the argument initial
the initial points of the interative optimization algorithm can be changed, and the argument control
is a list that can supply any of the following components:
maxit
:is the maximum number of iterations. Default is 150;
factr
:controls the convergence of the 'L-BFGS-B'
method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default for factr
is 1e7, which gives a tolerance of about 1e-9. The exact tolerance can be checked by multiplying this value by .Machine$double.eps
;
pgtol
:helps to control the convergence of the 'L-BFGS-B'
method. It is a tolerance on the projected gradient in the current search direction, i.e., the iteration will stop when the maximum component of the projected gradient is less than or equal to pgtol
, where pgtol
\geq 0
. Default is zero, when the check is suppressed;
trace
:is a non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values may produce more tracing information: for the method 'L-BFGS-B'
there are six levels of tracing. To understand exactly what these do see the source code of optim
function in the stats package;
REPORT
:is the frequency of reports for the 'L-BFGS-B'
method if 'control$trace'
is positive. Default is every 10 iterations;
lmm
:is an integer giving the number of BFGS
updates retained in the 'L-BFGS-B'
method. Default is 5.
MAP.estimation
returns a list containing the following components:
theta_hat |
the vector corresponding to the maximum a posteriori (MAP) estimates of the parameters. For the |
A_hat |
minus the curvature (or Hessian) matrix around the point |
sd |
the vector of (posterior) standard deviation of the MAP estimates in |
Lambda |
the inverse variance-covariance matrix of the Gaussian distribution that is used as prior distribution for the parameters. It's exactly the same as the argument |
formula |
the formula applied; |
names |
the names of the model parameters; |
n |
sample size, |
np |
the number of coefficients; |
q_l |
the order/degree minus 1 of the exponentiated polynomial baseline hazard function determined for the current center by the likelihood ratio test. This output argument, |
theta_A_poly |
an array where the first component is a matrix with columns representing the MAP estimates of the parameters for different |
lev_no_ref_zer |
a vector containing the names of the levels of the categorical covariate that has no samples/observations in one of its categories. The name of the category with no samples and the name of the reference category are excluded from this vector. This argument is shown when |
treatment |
a character string representing the name or place of the binary covariate, respectively. If it is set to ' |
refer_treat |
the reference category of the treatment. It is shown when |
gamma_bfi |
a vector specifying the BFI estimates of the coefficients received from the central server in the first round. If |
RCT_propens |
a vector specifying the propensity scores, which represent the probability of receiving the treatment given the covariates, which are known in the randomized studies (RCTs). If |
propensity |
a vector specifying the propensity scores (the probability a patient gets the treatment given the characteristics measured at baseline) calculated by |
for_ATE |
a vector used in the central server to calculate the average treatment effect (ATE). For
but for |
zero_sample_cov |
the categorical covariate that has no samples/observations in one of its categories. It is shown when |
refer_cat |
the reference category. It is shown when |
zero_cat |
the category with no samples/observations. It is shown when |
value |
the value of minus the log-likelihood posterior density evaluated at |
family |
the |
basehaz |
the baseline hazard function used; |
intercept |
logical flag used to fit an intercept if |
convergence |
an integer value used to encode the warnings and the errors related to the algorithm used to fit the model. The values returned are:
|
control |
the list of control parameters used to compute the MAP estimates. |
Hassan Pazira and Marianne Jonker
Maintainer: Hassan Pazira hassan.pazira@radboudumc.nl
Jonker M.A., Pazira H. and Coolen A.C.C. (2024). Bayesian federated inference for estimating statistical models based on non-shared multicenter data sets, Statistics in Medicine, 43(12): 2421-2438. <https://doi.org/10.1002/sim.10072>
Pazira H., Massa E., Weijers J.A.M., Coolen A.C.C. and Jonker M.A. (2025b). Bayesian Federated Inference for Survival Models, Journal of Applied Statistics (Accepted). <https://arxiv.org/abs/2404.17464>
Jonker M.A., Pazira H. and Coolen A.C.C. (2025a). Bayesian Federated Inference for regression models based on non-shared medical center data, Research Synthesis Methods, 1-41. <https://doi.org/10.1017/rsm.2025.6>
Byrd R.H., Lu P., Nocedal J. and Zhu C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16, 1190-1208. <https://doi.org/10.1137/0916069>
bfi
, inv.prior.cov
and summary.bfi
###--------------###
### y ~ Gaussian ###
###--------------###
# Setting a seed for reproducibility
set.seed(11235)
# model parameters: coefficients and sigma2 = 1.5
theta <- c(1, 2, 2, 2, 1.5)
#----------------
# Data Simulation
#----------------
n <- 30 # sample size
p <- 3 # number of coefficients without intercept
X <- data.frame(matrix(rnorm(n * p), n, p)) # continuous variables
# linear predictor:
eta <- theta[1] + theta[2] * X$X1 + theta[3] * X$X2 + theta[4] * X$X3
# inverse of the link function ( g^{-1}(\eta) = \mu ):
mu <- gaussian()$linkinv(eta)
y <- rnorm(n, mu, sd = sqrt(theta[5]))
# Load the BFI package
library(BFI)
#-----------------------------------------------
# MAP estimations for theta and curvature matrix
#-----------------------------------------------
# MAP estimates with 'intercept'
Lambda <- inv.prior.cov(X, lambda = c(0.1, 1), family = "gaussian")
(fit <- MAP.estimation(y, X, family = "gaussian", Lambda))
class(fit)
summary(fit, cur_mat = TRUE)
# MAP estimates without 'intercept'
Lambda <- inv.prior.cov(X, lambda = c(0.1, 1), family = 'gaussian',
intercept = FALSE)
(fit1 <- MAP.estimation(y, X, family = 'gaussian', Lambda, intercept = FALSE))
summary(fit1, cur_mat = TRUE)
###-----------------###
### Survival family ###
###-----------------###
# Setting a seed for reproducibility
set.seed(112358)
#-------------------------
# Simulating Survival data
#-------------------------
n <- 50
beta <- 1:4
p <- length(beta)
X <- data.frame(matrix(rnorm(n * p), n, p)) # continuous (normal) variables
## Simulating survival data from Weibull with a predefined censoring rate of 0.3
y <- surv.simulate(Z = list(X), beta = beta, a = 5, b = exp(1.8), u1 = 0.1,
cen_rate = 0.3, gen_data_from = "weibul")$D[[1]][, 1:2]
#---------------------------------------
# MAP estimations with "weibul" function
#---------------------------------------
Lambda <- inv.prior.cov(X, lambda = c(0.1, 1), family = 'survival',
basehaz = "weibul")
fit2 <- MAP.estimation(y, X, family = 'survival', Lambda = Lambda,
basehaz = "weibul")
fit2
summary(fit2, cur_mat = TRUE)
#-------------------------------------
# MAP estimations with "poly" function
#-------------------------------------
Lambda <- inv.prior.cov(X, lambda = c(0.1, 1), family = 'survival',
basehaz = 'poly')
fit3 <- MAP.estimation(y, X, family = "survival", Lambda = Lambda,
basehaz = "poly")
# Degree of the exponentiated polynomial baseline hazard
fit3$q_l + 1
# theta_hat for (beta_1, ..., beta_p, omega_0, ..., omega_{q_l})
fit3$theta_A_poly[,,1][,fit3$q_l+1] # equal to fit3$theta_hat
# A_hat
fit3$theta_A_poly[,,fit3$q_l+2] # equal to fit3$A_hat
summary(fit3, cur_mat = TRUE)
#------------------------------------------------------
# MAP estimations with "pwexp" function with 3 intervals
#-------------------------------------------------------
# Assume we have 4 centers
Lambda <- inv.prior.cov(X, lambda = c(0.1, 1), family = 'survival',
basehaz = 'pwexp', n_intervals = 3)
# For this baseline hazard ("pwexp"), we need to know
# maximum survival times of the 3 other centers:
max_times <- c(max(rexp(30)), max(rexp(50)), max(rexp(70)))
# Minimum of the maximum values of the survival times of all 4 centers is:
min_max_times <- min(max(y$time), max_times)
fit4 <- MAP.estimation(y, X, family = "survival", Lambda = Lambda,
basehaz = "pwexp", n_intervals = 3,
min_max_times=max(y$time))
summary(fit4, cur_mat = TRUE)
#--------------------------
# Semi-parametric Cox model
#--------------------------
Lambda <- inv.prior.cov(X, lambda = c(0.1), family = 'survival',
basehaz = "unspecified")
fit5 <- MAP.estimation(y, X, family = 'survival', Lambda = Lambda,
basehaz = "unspecified")
summary(fit5, cur_mat = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.