knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

The Bayesian Weighted Quantile Sum (BWQS) regression is a statistical model for accommodating mixtures. The algorithm provides a weighted index of the mixture components and its overall association with an outcome, which can be continuous or binary. The overall association of the mixture index (BWQS index) is estimated adjusting for a set of covariates. The prior distributions of all parameters are considered uninformative. Parameters mapped to the overall mixture, and the covariates follow uaninformative Normal distributions, while we assumed a Dirichlet prior density for the weights. The BWQS package has several dependencies, which are uploaded automatically with the BWQS package.

The algorithms are applied to a simulated dataset and all the steps are listed in order to make reproducible analyses. In this first part of the code the environment is set up with all the useful libraries.

library(BWQS)
library(rstan)
library(MASS)
library(knitr)
library(clusterGeneration)
library(kableExtra)

Bayesian Quantile Weighted Sums

The function bwqs fits the Bayesian Quantile Weighted Sums model. The object family allows the model to be used for both continuous or binary outcomes. The model is the following $$y_i = \beta_0 + \beta_1 \sum_{i=0}^5w_jq_{ij} + \sum_{k=0}^2 \delta_kc_{ik}$$ where $\beta_0$ is the intercept, $\beta_1$ is the overall association of the mixture with the outcome, $\sum_{i=0}^5w_jq_{ij}$ is the mixture composition made by the product between weights and quartiles (q = 4 in bwqs function) of the metals concentrations.

Synthetic dataset made of 1000 observations with a mixture composed by 5 metals (Al, As, Cd, Pb, V). The coefficients are $\beta_0 = -2$, $\beta_1 = 0.8$, the set of weights $w = [0.50,0.20,0.20,0.05,0.05]$ and the overall $\sigma = 1$. Two additional covariates are included in the model with two coefficients $\delta_1 = 0.5$ and $\delta_2 = 1.1$. Here the dataset:

# fix the seed
set.seed(1990)

# Sample sizes
N = 1000

# Mean & SD of variables
mu = c(0,0,0,1,3)
sd = c(1,1,1,3,1)
sd2 = sd %*% t(sd)

# Correlation Matrix
rho = 0.65
corMat = cbind(c(1,rho,rho^2,rho^2,rho^2),
               c(rho,1,rho^2, rho^2, rho^2),
               c(rho^2,rho^2,1,rho^2,rho^2),
               c(rho^2,rho^2,rho^2,1, rho),
               c(rho^2,rho^2,rho^2,rho,1))

# Covariance Matrix
Sigma = sd2*corMat

# Simulating five correlated exposure variables (Metals)
X = as.data.frame(mvrnorm(N, mu = mu, Sigma = Sigma, empirical = TRUE))
colnames(X) = c('Al','As','Cd','Pb','V')

# Quantile trasformation
Xq = as.matrix(quantile_split(X, mix_name = colnames(X), q=4))

# Intercept coefficient
beta0 = -2

# Overall effect
beta1 = -0.8

# Weights
W = c(0.5,0.20,0.2,0.05,0.05)

# sigma of the model
sigma = 1

# Simulate covariates
D = cbind(rnorm(N,1.3,1), rnorm(N,0.5,0.5))
colnames(D) = c("cov1","cov2")

# coefficient of covariates
delta = c(0.5, 1.1)

# Outcome simulation (continuos)
y = rnorm(N, beta0 + beta1*(Xq %*% W) + D %*% delta, sd = sigma)

# Aggregate data in a data.frame
Data = as.data.frame(cbind(y,X,D))

head(round(Data,3))

Now that we have the dataset we can fit the BWQS model with the following code:

# we run the model ans save results in "fit_bwqs" variable
fit_bwqs = bwqs(y ~ cov1 + cov2, mix_name = c('Al','As','Cd','Pb','V'),
                data = Data, q = 4, family = "gaussian")

The first function object is the underlying model, i.e. formula y ~ cov1 + cov2; here we have the outcome y and two covariates. Additional covariates can be included in the formula separated by +. If no covariates are included in the model, the formula has to be specified as y ~ NULL. The mix_name object contains a vector of strings with the name of the mixture components. Data is the name of the dataset, containing the outcome, the mixture components, and the covariates, note that the dataset should ve in data.frame format. The q object indicates how the mixture components are categorized (as quantiles), by default is set on 4 (quartiles), but also allows deciles (q = 10), centiles (q = 100) or continuous values (q = NULL). Other parameters can be modified; please check the BWQS package documentation for details.

Results are saved in the fit_bwqs object, which can be used to plot the results. In the results table: the first two columns are the mean and the standard deviation of the mean, the two following columns contain the values of the credible interval at 95%. The credible interval level can be chosen using the c_int parameter. The last two columns report values to evaluate the fitness of parameters and the convergence of the algorithm.

fit_bwqs$summary_fit %>%
  kbl() %>%
  kable_classic(full_width = F, html_font = "Cambria")

The following plots summarize the results: estimates and 95% credible intervals. The first object of the function contains the results of the model. parms is a vector with the name of the variables to be plotted: beta0 and beta1 for the main parameters of the model, W for the weights and delta for the covariates. The size identifies the magnitude of the means dots.

# bwqs_plot(fit_bwqs, parms = "W", size = 2)
# bwqs_plot(fit_bwqs, parms = c("beta0","beta1",'delta',"sigma"), size = 2)

The bwqs_waic function evaluates the goodness of fit for the BWQS model. Several measures are included in the function. For further details check here.

bwqs_waic(fit_bwqs$fit)

The bwqs function allows the user to fit model even with different type of outcomes like binary or poisson. In order to fit model with different outcomes we should specify the family parameter in the bwqs function, this parameter can be "binomial" or "poisson" based on the outcome that we want to use to build our model. Note that binomial only allows ${0,1}$ values while poisson allows integer number; in both cases the misspecification of the values will lead to errors.

Another important feature of bwqs function is the possibility to set the direction of the prior of the $\beta_1$ coefficient (the coefficient that explain the effect of the mixture). Developing model can happen that we have prior strong information on the constraint that the $\beta_1$ can have, in this case we only have to specify the parameter prior. The default value is "None" but it allows also "positive" and "negative" based on the assumptions that we have on the overall effect of the mixture.

Hierarchical Bayesian Quantile Weighted Sums

Here an example for the regression model:

$$\begin{gather} Y_{i} \sim \mathcal{N}( \alpha_{j[i]} + \beta_{j[i]}BWQS_{i} + \gamma Z_{i}, \sigma_c) \quad where \quad BWQS_{i} = \sum_{k=1}^K x_{ik}w_k \ \alpha \sim \mathcal{N}( \mu_{\alpha},\sigma_{\alpha}), \quad \beta \sim \mathcal{N}( \mu_{\beta},\sigma_{\beta}) \quad where: \ \begin{pmatrix} \mu_{\alpha} \ \mu_{\beta} \end{pmatrix} \sim \mathcal{N}2\left( \begin{pmatrix} 0 \ 0 \end{pmatrix}, \begin{bmatrix} 100 && 0 \ 0 && 100 \end{bmatrix} \right) \quad and \quad \sigma{\alpha},\sigma_{\beta} \sim \mathcal{IG}(0.01,0.01) \ \sigma_c \sim \mathcal{IG}(0.01,0.01), \quad w \sim \mathcal{Dir}k((1,\ldots,1)) \quad and \quad \gamma \sim \mathcal{N}{z} \left( \begin{pmatrix} 0 \ \vdots \ 0 \end{pmatrix} ,\begin{bmatrix} 100 && \ & \ddots & \ && 100 \end{bmatrix} \right) \ \end{gather}$$

Notation:

The HBWQS frameworks is pretty similar to the BWQS. The main different is the flexibility on multiple cohorts. In order to specify the cohort we need to use and additional parameter cohort which is a vector of integer that needs to point every observation to the specific cohort.

We generate a synthetic dataset to test the HBWQS model: we want 5 different cohorts with 250,200,150,100,100 people for each cohort. We choose 4 chemical for the mixture and we are using quartiles. Note that STAN doesn't allow factors so we have to re-map the categorical variables into dummy variables (excluding the reference value). The index of cohort should be an integer starting from 1 and we'll specify the column name in code. Here the synthetic dataset:

set.seed(1990)
N = c(250,150,100,100,200)
C = 4
names_ch = paste0("Ch_",1:C)
# Mean and variance of each chemical
mu <- rnorm(C)
sd <- runif(C,0,0.2)
sd2 <- sd %*% t(sd)

# Correlation Matrix
corMat = rcorrmatrix(C,alphad=1)

# Simulating three correlated exposure varMatrix
Sigma <- sd2*corMat
X <- as.data.frame(mvrnorm(sum(N), mu=mu, Sigma = Sigma, empirical=TRUE))
colnames(X)<-names_ch

# Quantile transformation
Xq = matrix(NA,sum(N),C)
for(i in 1:C) Xq[,i] = ecdf(X[[names_ch[i]]])(X[[names_ch[i]]])*4

beta0_1 = 0.65           # intercept 1st cohort
beta1_1 = (-1.7)         # slope 1st cohort
beta0_2 = -0.38          # intercept 2nd cohort
beta1_2 = -0.94          # slope 2nd cohort
beta0_3 = 0.15           # intercept 3rd cohort
beta1_3 = -0.11          # slope 3rd cohort
beta0_4 = (-0.98)        # intercept 4th cohort
beta1_4 = -1.45          # slope 4th cohort
beta0_5 = 1.13           # intercept 5th cohort
beta1_5 = -1.88          # slope 5th cohort
W = c(0.1,0.4,0.3,0.2)   # weights
sigma = 1                # sigma

K = cbind(runif(sum(N),20,40),
        rbinom(sum(N),1,prob = 0.5),
        runif(sum(N),11,18))
colnames(K) = c("m_age","sex","education")
d = c(-0.4,0.5,0.2)

Nh = cumsum(N)
y1 = rnorm(N[1], beta0_1 + beta1_1 * (Xq %*% W)[1:Nh[1]] + (K %*% d)[1:Nh[1]],sigma)
y2 = rnorm(N[2], beta0_2 + beta1_2 * (Xq %*% W)[(Nh[1]+1):Nh[2]] + (K %*% d)[(Nh[1]+1):Nh[2]],sigma)
y3 = rnorm(N[3], beta0_3 + beta1_3 * (Xq %*% W)[(Nh[2]+1):Nh[3]] + (K %*% d)[(Nh[2]+1):Nh[3]],sigma)
y4 = rnorm(N[4], beta0_4 + beta1_4 * (Xq %*% W)[(Nh[3]+1):Nh[4]] + (K %*% d)[(Nh[3]+1):Nh[4]],sigma)
y5 = rnorm(N[5], beta0_5 + beta1_5 * (Xq %*% W)[(Nh[4]+1):Nh[5]] + (K %*% d)[(Nh[4]+1):Nh[5]],sigma)

dt = data.frame(y = c(y1,y2,y3,y4,y5), K, X,
                cohort = rep(c(1,2,3,4,5),N))
head(round(dt,3))

Now that we have the dataset we can fit the HBWQS model with the following code:

# we run the model ans save results in "fit_hbwqs" variable
fit_hbwqs = bwqs_r(y ~ m_age + sex + education, 
                   mix_name = c("Ch_1","Ch_2","Ch_3","Ch_4"),
                   cluster_name = "cohort", Dalp = rep(1,4),
                   iter = 10000, thin = 3, data = dt, 
                   q = 4, family = "gaussian")
fit_hbwqs$summary_fit %>%
  kbl() %>%
  kable_classic(full_width = F, html_font = "Cambria")


ElenaColicino/bwqs documentation built on Feb. 26, 2023, 12:13 a.m.