estimate.wqs: Weighted Quantile Sum (WQS) Regression

Description

Performs weighted quantile sum (WQS) regression for continuous, binary, and count outcomes. The function extends wqs.est (author: Czarnota) in the wqs package. By default, any missing values in X are assumed to be censored and are placed in the first quantile. Accessory functions (print, coef, plot) accompany each WQS object.

Usage

estimate.wqs(
  y,
  X,
  Z = NULL,
  proportion.train = 1L,
  n.quantiles = 4L,
  place.bdls.in.Q1 = if (anyNA(X)) TRUE else FALSE,
  B = 100L,
  b1.pos = TRUE,
  signal.fn = c("signal.none", "signal.converge.only", "signal.abs",
    "signal.test.stat"),
  family = c("gaussian", "binomial", "poisson"),
  offset = NULL,
  verbose = FALSE
)

Arguments

y

Outcome: numeric vector or factor. Assumed to be complete, and missing outcomes are ignored. Assumed to follow an exponential family distribution given in family.

X

Components/chemicals to be combined into an index; a numeric matrix or data-frame.

Z

Any covariates used. Ideally, a numeric matrix, but Z can be a factor, vector or data-frame. Assumed to be complete; observations with missing covariate values are ignored with a warning printed. If none, enter NULL.

proportion.train

The proportion of the data, between 0 and 1, used to train the model. If proportion.train = 1L, all the data are used both to train and to validate the model. Default: 1L.
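The training/validation split can be sketched in base R as follows. This is a minimal illustration, assuming a simple random split without stratification; the helper split_train_valid() is hypothetical and is not part of miWQS, which may select train.index differently.

```r
# Hypothetical helper: split subject indices into training and validation sets.
# Assumption: a simple random split without stratification; estimate.wqs() may
# select train.index differently.
split_train_valid <- function(n, proportion.train = 1) {
  stopifnot(proportion.train > 0, proportion.train <= 1)
  all.idx <- seq_len(n)
  if (proportion.train == 1) {
    # All data are used both to train and to validate the model.
    return(list(train = all.idx, valid = all.idx))
  }
  train <- sort(sample.int(n, size = round(proportion.train * n)))
  list(train = train, valid = setdiff(all.idx, train))
}

set.seed(1)
idx <- split_train_valid(n = 100, proportion.train = 0.4)
c(length(idx$train), length(idx$valid))  # 40 60
```

Because the split is random, results depend on the seed (see Note).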

n.quantiles

An integer specifying the number of quantiles used to categorize the columns of X, e.g. quartiles (4), deciles (10), or percentiles (100). Default: 4L.
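Quantile scoring of X can be sketched with base R's quantile() and findInterval(). This is a hypothetical helper, not miWQS code: miWQS provides make.quantile.matrix() for this purpose, and its cut points and tie handling may differ from the default quantile type used here.

```r
# Hypothetical helper: score each column of X into categories 0, ..., q - 1.
# Assumption: cut points are the sample quantiles at 1/q, ..., (q - 1)/q
# (quantile() type 7); make.quantile.matrix() in miWQS may differ.
score_quantiles <- function(X, q = 4L) {
  X <- as.matrix(X)
  apply(X, 2, function(x) {
    cuts <- quantile(x, probs = seq_len(q - 1) / q, na.rm = TRUE, names = FALSE)
    # findInterval() gives 0 below the first cut and q - 1 at or above the last.
    findInterval(x, cuts)
  })
}

X <- cbind(a = 1:8, b = c(5, 1, 8, 2, 7, 3, 6, 4))
S <- score_quantiles(X, q = 4L)
S
```

With eight distinct values and q = 4, each column ends up with two observations in each quartile.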

place.bdls.in.Q1

Logical; if TRUE, or if X has any missing values, missing values in X are placed in the first quantile of the weighted sum. Otherwise, the data are assumed to be complete (no missing data) and are split into equally sized quantiles.
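A minimal sketch of placing missing values in the first quantile, assuming the cut points are computed from the observed values only and every missing value then receives score 0. The helper score_with_na_in_q1() is hypothetical; the exact rule used by miWQS may differ.

```r
# Hypothetical helper: place missing (censored) values in the first quantile.
# Assumption: cut points come from the observed values only; missing values
# are then assigned score 0 (the first quantile).
score_with_na_in_q1 <- function(x, q = 4L) {
  cuts <- quantile(x, probs = seq_len(q - 1) / q, na.rm = TRUE, names = FALSE)
  s <- findInterval(x, cuts)  # missing values yield NA here
  s[is.na(x)] <- 0L           # move them to the first quantile
  s
}

x <- c(NA, NA, 3, 5, 7, 9, 11, 13)
score_with_na_in_q1(x)  # 0 0 0 0 1 2 3 3
```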

B

Number of bootstrap samples used to estimate the weights in the training dataset. To use WQS without bootstrapping, set B = 1. However, Carrico et al. (2014) suggest that using a large number of bootstrap samples (e.g., 100 or 1000) can improve component selection. In that spirit, we set the default to 100.
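Drawing the bootstrap samples can be sketched as below. The hypothetical helper make_boot_index() returns a matrix with one row per training observation and B columns, the shape described for boot.index under Value; it is an illustration, not the miWQS implementation.

```r
# Hypothetical helper: draw B bootstrap samples (with replacement) from the
# training indices. Each column is one bootstrap sample.
make_boot_index <- function(train.index, B = 100L) {
  n.train <- length(train.index)
  # Index via sample.int() to avoid sample()'s special case for length-1 input.
  replicate(B, train.index[sample.int(n.train, n.train, replace = TRUE)])
}

set.seed(2)
boot <- make_boot_index(train.index = 1:50, B = 10)
dim(boot)  # 50 10
```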

b1.pos

Logical; TRUE if the mixture index is expected to be positively related to the outcome (the default). If the mixture index is expected to be inversely related to the outcome, set b1.pos = FALSE.

signal.fn

A character value indicating which signal function is used in calculating the mean weight. See details.

family

The distribution of outcome y. A character value: if equal to "gaussian" a linear model is implemented; if equal to "binomial" a logistic model is implemented; if equal to "poisson", a log-link (rate or count) model is implemented. See family in the stats package. Passed to glm2. Default: "gaussian".

offset

The at-risk population, as a numeric vector of length equal to the number of subjects, used when modeling rates in Poisson regression. Passed to glm2. Default: NULL (no offset).

verbose

Logical; if TRUE, prints more information. Useful to check for any errors in the code. Default: FALSE.

Details

The solnp algorithm, a nonlinear optimization technique based on the augmented Lagrange method, is used to estimate the weights in the training set. If the log-likelihood evaluated at the current parameters is too large (NaN), it is reset to 1e24. A data-frame named train.estimates that summarizes statistics from the nonlinear regression is returned; it consists of these columns:

beta1

estimate using solnp

beta1_glm, SE_beta1, test_stat, pvalue

estimates for the WQS parameter from the model fitted with glm2.

convergence

logical, if TRUE the solnp solver has converged. See solnp.

weight estimates

estimated weights for each bootstrap sample.
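The constrained weight estimation can be sketched for a Gaussian outcome as follows. This sketch swaps in a different technique from the one estimate.wqs() uses: instead of solnp, the constraints (weights non-negative and summing to 1, beta1 > 0 as with b1.pos = TRUE) are imposed by reparameterization, and the fit uses stats::optim() with BFGS. The helper fit_wqs_weights() and the simulated data are hypothetical.

```r
# Hypothetical helper. Swapped-in technique: softmax reparameterization +
# stats::optim(), NOT solnp as used by estimate.wqs().
fit_wqs_weights <- function(y, q) {
  C <- ncol(q)
  # Softmax maps unconstrained values to non-negative weights summing to 1.
  softmax <- function(theta) {
    e <- exp(theta - max(theta))
    e / sum(e)
  }
  unpack <- function(par) {
    list(b0 = par[1],
         b1 = exp(par[2]),                # beta1 > 0 (b1.pos = TRUE)
         w  = softmax(c(0, par[-(1:2)]))) # first logit fixed at 0 for identifiability
  }
  # Gaussian negative log-likelihood up to a constant: residual sum of squares.
  rss <- function(par) {
    p <- unpack(par)
    sum((y - p$b0 - p$b1 * drop(q %*% p$w))^2)
  }
  start <- c(mean(y), 0, rep(0, C - 1))
  opt <- optim(start, rss, method = "BFGS", control = list(maxit = 500))
  p <- unpack(opt$par)
  list(beta0 = p$b0, beta1 = p$b1, weights = p$w,
       convergence = opt$convergence == 0)
}

# Simulated (hypothetical) data: 3 components scored into quartiles 0-3.
set.seed(3)
n <- 500
q <- matrix(sample(0:3, n * 3, replace = TRUE), nrow = n, ncol = 3)
y <- 1 + 2 * drop(q %*% c(0.6, 0.3, 0.1)) + rnorm(n, sd = 0.5)
fit <- fit_wqs_weights(y, q)
round(fit$weights, 2)
```

Repeating such a fit on each bootstrap sample would yield one row of weight estimates per bootstrap, analogous to the train.estimates columns listed above.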

Signal functions let the user control which bootstrap samples are used, and how much each counts, in calculating the mean weight. A histogram of the overall mixture effect, one of the elements produced when plotting a WQS object, may help in choosing a signal function. The signal.fn argument offers four signal functions:

signal.none

Uses all bootstrap-estimated weights in calculating average weight.

signal.converge.only

Uses the estimated weights for the bootstrap samples that converged.

signal.abs

Applies more weight to the absolute value of test statistic for beta1, the overall mixture effect in the trained WQS model.

signal.test.stat

Applies more weight to the absolute value of test statistic for beta1, the overall mixture effect in the trained WQS model.
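The averaging step can be sketched as a weighted mean of the bootstrap weight estimates, where each bootstrap's contribution depends on the chosen signal function. This is a hypothetical helper covering three of the options; the abs-based weighting (proportional to the absolute test statistic for beta1) is an assumption drawn from the descriptions above, and miWQS's own code may differ.

```r
# Hypothetical helper. W: B x C matrix of bootstrap weight estimates
# (one row per bootstrap). Returns the C mean weights.
average_weights <- function(W, convergence, test_stat,
                            signal.fn = c("signal.none", "signal.converge.only",
                                          "signal.abs")) {
  signal.fn <- match.arg(signal.fn)
  s <- switch(signal.fn,
    signal.none          = rep(1, nrow(W)),        # every bootstrap counts equally
    signal.converge.only = as.numeric(convergence), # drop non-converged fits
    signal.abs           = abs(test_stat)           # assumption: weight by |test stat|
  )
  # W * s scales row i of W by s[i]; then take the weighted column means.
  colSums(W * s) / sum(s)
}

W <- rbind(c(0.5, 0.5), c(0.9, 0.1), c(0.1, 0.9))
conv <- c(TRUE, TRUE, FALSE)
tstat <- c(-1, 3, 0)
average_weights(W, conv, tstat, "signal.none")           # 0.5 0.5
average_weights(W, conv, tstat, "signal.converge.only")  # 0.7 0.3
average_weights(W, conv, tstat, "signal.abs")            # 0.8 0.2
```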

This package uses the glm2 function in the glm2 package to fit the validation model.

The object is a member of the "wqs" class; accessory functions include coef(), print(), and plot().

See example 1 in the vignette for details.

Value

estimate.wqs returns an object of class "wqs": a list with the following items (the most important are marked **):

call

The function call, processed by rlist.

C

The number of chemicals in the mixture (the number of columns in X).

n

The sample size.

train.index

Vector of the numerical indices selected to form the training dataset. Useful for side-by-side comparisons.

q.train

Matrix of quantiles used in training data.

q.valid

Matrix of quantiles used in validation data.

train.comparison

Data-frame that compares the training and validation datasets to check that they are equivalent.

initial

Vector: Initial values used in WQS.

train.estimates

Data-frame with B rows (one per bootstrap sample). Summarizes statistics from the nonlinear regression in the training dataset. See Details.

processed.weights

** A C x 2 matrix of the mean bootstrapped weights (and their standard errors) after filtering with the signal function (see signal.fn). Used to calculate the WQS index.

WQS

Vector of the weighted quantile sum estimate based on the processed weights.
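For each subject, the index is the weighted sum of that subject's quantile scores, WQS_i = sum over j of w_j * q_ij. A minimal illustration with hypothetical scores and weights (weights non-negative and summing to 1):

```r
# Hypothetical quantile scores: 3 subjects (rows) x 3 components (columns).
q <- rbind(c(0, 1, 3),
           c(2, 2, 0),
           c(3, 3, 3))
w <- c(0.5, 0.3, 0.2)  # hypothetical mean weights

# Weighted quantile sum for each subject.
WQS <- drop(q %*% w)
WQS  # 0.9 1.6 3.0
```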

fit

** glm2 object of the WQS model fit using the validation data. See glm2{glm2}.

boot.index

Matrix of bootstrap indices used in the training dataset to estimate the weights. It has one row per observation in the training dataset and B columns.

Rate WQS Regression

Rates can be modelled using the offset. The offset argument of estimate.wqs() is on the original (untransformed) scale, so do not take its logarithm. The model for the mean rate λ_i of the ith individual, including the offset, is:

λ_i = offset_i * exp(η_i),

where η_i is the linear predictor of the regression.
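Taking logs gives log(λ_i) = log(offset_i) + η_i, so the offset enters the linear predictor on the log scale with a fixed coefficient of 1. The sketch below shows this with base R's glm() on simulated, hypothetical data; it is not estimate.wqs() itself. Unlike estimate.wqs(), which per the text above takes the untransformed offset, glm() expects the offset already on the log scale.

```r
# Simulated (hypothetical) rate data.
set.seed(4)
n <- 200
pop <- sample(1000:5000, n, replace = TRUE)  # at-risk population per subject
x <- rnorm(n)
# Mean count = pop * exp(eta), with eta = -6 + 0.5 * x.
y <- rpois(n, lambda = pop * exp(-6 + 0.5 * x))

# In glm(), the offset must be supplied as log(pop).
fit <- glm(y ~ x, family = poisson(), offset = log(pop))
coef(fit)  # estimates should be near -6 and 0.5
```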

Note

No seed is set in this function. Because the bootstrap sampling and the training/validation split are random, a seed should be set before every use.

References

Carrico, C., Gennings, C., Wheeler, D. C., & Factor-Litvak, P. (2014). Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting. Journal of Agricultural, Biological, and Environmental Statistics, 20(1), 100–120. https://doi.org/10.1007/s13253-014-0180-3

Czarnota, J., Gennings, C., Colt, J. S., De Roos, A. J., Cerhan, J. R., Severson, R. K., … Wheeler, D. C. (2015). Analysis of Environmental Chemical Mixtures and Non-Hodgkin Lymphoma Risk in the NCI-SEER NHL Study. Environmental Health Perspectives, 123(10), 965–970. https://doi.org/10.1289/ehp.1408630

Czarnota, J., Gennings, C., & Wheeler, D. C. (2015). Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk. Cancer Informatics, 14, 159–171. https://doi.org/10.4137/CIN.S17295

See Also

Other wqs: analyze.individually(), coef.wqs(), do.many.wqs(), estimate.wqs.formula(), make.quantile.matrix(), plot.wqs(), print.wqs()

Examples

# Example 1: Binary outcome using the example simulated dataset in this package.
data(simdata87)
set.seed(23456)
W.bin4 <- estimate.wqs(
  y = simdata87$y.scenario, X = simdata87$X.true[, 1:9],
  B = 10, family = "binomial",
  verbose = TRUE
)
W.bin4

# Example 2: Continuous outcome. Use WQSdata example from wqs package.
## Not run: 
if (requireNamespace("wqs", quietly = TRUE)) {
  library(wqs)
  data(WQSdata)
  set.seed(23456)
  W <- wqs::wqs.est(WQSdata$y, WQSdata[, 1:9], B = 10)
  Wa <- estimate.wqs(y = WQSdata$y, X = WQSdata[, 1:9], B = 10)
  Wa
} else {
  message("You need to install the package wqs for this example.")
}

## End(Not run)

## More examples are found in 02_WQS_Examples.
## Also checked vs. Czarnota code, as well as thesis data, to verify results.

miWQS documentation built on April 3, 2021, 1:06 a.m.