PSFormula: Set up a model formula for use in 'PStrata'
In PStrata: Principal Stratification Analysis in R

PSFormula

R Documentation


Description

Sets up a model formula for use in the PStrata package, allowing users to specify the treatment indicator, the post-randomization confounding variables, the outcome variable and, optionally, the covariates. For a survival outcome, a censoring indicator is also specified. Users can also define (potentially non-linear) transformations of the covariates and include random effects for clusters.

Usage

PSFormula(formula, data)

Arguments

`formula`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given in 'Details'.
`data`	a data frame containing the variables named in `formula`.

Details

Two models are required for the principal stratification analysis: the principal stratum model and the outcome model.

General formula structure

For the principal stratum model, the formula argument accepts formulas with the following syntax:

treatment + postrand ~ terms

The treatment variable refers to the name of the binary treatment indicator. The postrand variable refers to the name of the binary post-randomization confounding variable. The terms part includes all of the predictors used for the principal stratum model.
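The base R sketch below shows one way to read the left-hand side of a principal stratum formula with all.vars(): the first name is the treatment indicator and the remaining names are the post-randomization variables. It only illustrates the idea behind the response_names component listed under 'Value'; it is not PStrata's own code, and the names Z, D and X are placeholders.

```r
# Principal stratum formula: treatment Z, post-randomization variable D,
# predictors X and X^2 (all names are placeholders)
stratum_formula <- Z + D ~ X + I(X^2)

# In a two-sided formula, element 2 is the left-hand side
lhs_vars <- all.vars(stratum_formula[[2]])
treatment_name <- lhs_vars[1]    # first LHS variable: treatment indicator
postrand_name  <- lhs_vars[-1]   # remaining LHS variable(s): post-randomization

# Element 3 is the right-hand side; all.vars() returns unique variable names
rhs_vars <- all.vars(stratum_formula[[3]])
```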

For the outcome model, the formula argument accepts formulas with a similar syntax:

response [+ observed] ~ terms

The response variable refers to the name of the outcome variable, and the terms part includes all of the predictors used for the outcome model. For an ordinary (uncensored) response, the observed variable should be omitted. When the true response is subject to right censoring (called a survival outcome in the relevant literature), the response variable should refer to the observed or censored response, and the observed variable should be an indicator of whether the true response is observed. For example, suppose the true event time is T and the censoring time is C. Then the response variable should refer to min(T, C), the time of the event or of censoring, whichever comes first, and the indicator observed is 1 if T < C and 0 otherwise.
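A minimal base R sketch of constructing a right-censored response and its indicator from hypothetical event times T and censoring times C. The variable names Y and delta and all values are made up for illustration.

```r
# Hypothetical true event times (T) and censoring times (C)
true_time   <- c(2.5, 4.0, 1.2, 6.3, 3.1)
censor_time <- c(3.0, 3.5, 5.0, 6.0, 4.0)

# response: min(T, C), whichever of event or censoring comes first
Y <- pmin(true_time, censor_time)

# observed: 1 if the event occurs before censoring (T < C), 0 otherwise
delta <- as.integer(true_time < censor_time)

# Outcome formula with the censoring indicator on the left-hand side
outcome_formula <- Y + delta ~ X
```

Here Y is c(2.5, 3.5, 1.2, 6.0, 3.1) and delta is c(1, 0, 1, 0, 1): subjects 2 and 4 are censored before their event occurs.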

The terms specified in the principal stratum model and the outcome model can be different.

Multiple post-randomization confounding variables

If multiple post-randomization confounding variables exist, one can specify all of them using the following syntax:

treatment + postrand_1 + postrand_2 + ... + postrand_n ~ terms

The names of the post-randomization confounding variables take the place of postrand_1 to postrand_n. In the current version, all of these variables must be binary indicators. The order of these variables does not affect the parameter estimates, but it matters when specifying other arguments, such as strata and ER (see PStrata).
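Because the post-randomization variables must be binary and their order matters for later arguments, it can help to check them before fitting. The helper check_binary_lhs below is hypothetical and not part of PStrata; it returns the post-randomization names in the order written and stops if any of them is not coded 0/1. The data are toy values.

```r
# Toy data with treatment Z and two post-randomization variables D1, D2
df <- data.frame(
  X  = 1:6,
  Z  = c(0, 0, 0, 1, 1, 1),
  D1 = c(0, 1, 0, 1, 1, 0),
  D2 = c(1, 1, 0, 0, 1, 0)
)
f <- Z + D1 + D2 ~ X

# Hypothetical helper (not part of PStrata): returns the post-randomization
# variable names in formula order, stopping if any is not coded 0/1
check_binary_lhs <- function(formula, data) {
  lhs <- all.vars(formula[[2]])
  postrand <- lhs[-1]                    # drop the treatment indicator
  for (v in postrand) {
    if (!all(data[[v]] %in% c(0, 1))) {
      stop(sprintf("'%s' is not a binary 0/1 indicator", v))
    }
  }
  postrand
}

postrand_order <- check_binary_lhs(f, df)   # c("D1", "D2")
```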

Non-linear transformation of the predictors

The syntax for the predictors follows the conventions of formula (see ?formula). The terms part consists of a series of terms joined by +, each term being the name of a variable or the interaction of several variables separated by :.

Apart from + and :, several other operators are useful. The * operator is shorthand for factor crossing: a*b is interpreted as a + b + a:b. The ^ operator means factor crossing to a specified degree. For example, (a + b + c)^2 is interpreted as (a + b + c) * (a + b + c), which is identical to a + b + c + a:b + a:c + b:c. The - operator removes the specified terms, so that (a + b + c)^2 - a:b is identical to a + b + c + a:c + b:c. The - operator can also be used to remove the intercept term, as in x - 1. Alternatively, x + 0 also removes the intercept term.
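Since these operators follow the standard R formula conventions, base R's terms() can be used to check how a right-hand side expands. The helper labels_of is a small illustrative wrapper, not a PStrata function.

```r
# Illustrative wrapper: the term labels base R derives from a formula
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(~ a * b)                 # c("a", "b", "a:b")
labels_of(~ (a + b + c)^2)         # c("a", "b", "c", "a:b", "a:c", "b:c")
labels_of(~ (a + b + c)^2 - a:b)   # c("a", "b", "c", "a:c", "b:c")

# Intercept flag: 1 if present, 0 if removed with - 1 or + 0
attr(terms(~ x), "intercept")      # 1
attr(terms(~ x - 1), "intercept")  # 0
attr(terms(~ x + 0), "intercept")  # 0
```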

Arithmetic expressions such as a + log(b) are also allowed. However, operators such as +, *, ^ and - have special meanings in a formula. To avoid ambiguity, wrap any part that should be interpreted arithmetically in I(). For example, in x + I(y + z), the term y + z is interpreted as the sum of y and z.
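The difference I() makes can be seen in the columns of a base R design matrix built with model.matrix(). The data frame d is a toy example.

```r
d <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), z = c(1, 1, 1))

# Without I(), y + z is read as two separate terms (parentheses only group)
colnames(model.matrix(~ x + (y + z), d))  # "(Intercept)" "x" "y" "z"

# With I(), y + z is evaluated arithmetically as a single predictor
mm <- model.matrix(~ x + I(y + z), d)
colnames(mm)                              # "(Intercept)" "x" "I(y + z)"
unname(mm[, "I(y + z)"])                  # 11 21 31

# Function calls such as log() need no I()
colnames(model.matrix(~ x + log(y), d))   # "(Intercept)" "x" "log(y)"
```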

Group level random effect

To include effects that vary across the levels of a grouping variable, add terms of the form (gterms | group), where group is the grouping variable (usually a factor) and gterms specifies the terms whose coefficients are group-specific and drawn from a population-level normal distribution.

The most common use of a group-level random effect is to include group-specific intercepts to account for unmeasured confounding. For example, x + y + (1 | g) specifies a model with population-level predictors x and y, as well as a random intercept for each level of g.
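To make the structure of (1 | g) concrete, the base R sketch below builds the random-intercept design matrix (one indicator column per level of g) and simulates cluster intercepts from a normal distribution. It illustrates the model structure only; it is not how PStrata estimates the effects, and sigma_g is a hypothetical value.

```r
# Cluster labels, matching the R column used in the Examples section
g <- factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3))

# Random-intercept design matrix: one indicator column per level of g,
# with no global intercept column; each row has a single 1
Z_g <- model.matrix(~ 0 + g)

# Cluster intercepts drawn from N(0, sigma_g^2); sigma_g is hypothetical
set.seed(1)
sigma_g <- 0.5
b <- rnorm(nlevels(g), mean = 0, sd = sigma_g)

# Each observation receives the intercept of its own cluster
re_contrib <- drop(Z_g %*% b)
```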

For more complex random effect structures, refer to lme4::lmer. However, structures other than simple random intercepts and slopes may lead to unexpected behavior.

Value

PSFormula returns an object of class PSFormula, which is a list containing the following components:

full_formula: input formula as is
data: input data frame
fixed_eff_formula: input formula with only fixed effects
response_names: character vector with names of variables that appear on the left hand side of input formula
has_random_effect: logical indicating whether random effects are specified in the input formula
has_intercept: logical indicating whether the input formula has an intercept
fixed_eff_names: character vector with names of all variables included as fixed effects
fixed_eff_count: integer giving the number of fixed-effect variables (factors are converted to, and counted as, dummy variables)
fixed_eff_matrix: fixed-effect design matrix
random_eff_list: a list containing information for each random effect. Such information is a list with the corresponding design matrix, the term names and the factor levels.
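The sketch below uses base R's model.matrix() and terms() to show how a factor expands into dummy columns in a fixed-effect design matrix and how an intercept flag can be read. This is the sense in which fixed_eff_count counts factors, but it is not PStrata's code; whether the package includes the intercept column in that count is not specified on this page.

```r
d <- data.frame(
  x   = c(0.5, 1.2, 2.3, 3.1, 4.8, 5.0),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

# The 3-level factor grp becomes 2 dummy columns under the
# default treatment contrasts (level "a" is the baseline)
X_fixed <- model.matrix(~ x + grp, d)
colnames(X_fixed)   # "(Intercept)" "x" "grpb" "grpc"

# Intercept flag read from the terms object
has_intercept <- attr(terms(~ x + grp), "intercept") == 1
```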

Examples

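library(PStrata)

# Toy data: covariate X, treatment Z, post-randomization variable D,
# and cluster label R
df <- data.frame(
  X = 1:10,
  Z = c(0,0,0,0,0,1,1,1,1,1),
  D = c(0,0,0,1,1,1,0,0,1,1),
  R = c(1,1,1,1,2,2,2,3,3,3)
)

# Principal stratum model: quadratic in X with a random intercept for R
PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)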

PStrata documentation built on May 29, 2024, 8:17 a.m.