PSFormula | R Documentation |

Set up a model formula for use in the PStrata package, allowing users to specify the treatment indicator, the post-randomization confounding variables, the outcome variable, and optionally the covariates. For survival outcomes, a censoring indicator is also specified. Users can also define (potentially non-linear) transforms of the covariates and include random effects for clusters.

```
PSFormula(formula, data)
```

`formula` |
an object of class |

`data` |
a data frame containing the variables named in |

Two models are required for the principal stratification analysis: the principal stratum model and the outcome model.

For the principal stratum model, the `formula` argument accepts formulas of the following syntax:

`treatment + postrand ~ terms`

The `treatment` variable refers to the name of the binary treatment indicator.
The `postrand` variable refers to the name of the binary post-randomization confounding variable.
The `terms` part includes all of the predictors used for the principal stratum model.

For the outcome model, the `formula` argument accepts formulas of a similar syntax:

`response [+ observed] ~ terms`

The `response` variable refers to the name of the outcome variable.
The `terms` part includes all of the predictors used for the outcome model.
The `observed` variable should be omitted for an ordinary (uncensored) response.
When the true response is subject to right censoring (also called a survival outcome in the relevant literature), the `response` variable should refer to the observed or censored response, and the `observed` variable should be an indicator of whether the true response is observed.
For example, suppose the true time for an event is `T` and the time of censoring is `C`. Then the `response` variable should refer to `min(T, C)`, the actual time of the event or censoring, whichever comes earlier, and the indicator `observed` is 1 if `T < C` and 0 otherwise.
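As a minimal sketch of the censoring setup just described, the base R code below builds the `response` and `observed` columns from simulated event and censoring times. The names `event_time`, `censor_time`, `Y`, and `delta` are hypothetical and chosen only for illustration; in real data the true event time is of course unavailable for censored subjects.

```r
# Simulated (hypothetical) event and censoring times; names are illustrative only
set.seed(1)
n <- 8
event_time  <- rexp(n, rate = 0.5)   # true event time T
censor_time <- rexp(n, rate = 0.3)   # censoring time C

surv_df <- data.frame(
  Y     = pmin(event_time, censor_time),         # min(T, C)
  delta = as.integer(event_time < censor_time)   # 1 if T < C, 0 otherwise
)

# An outcome formula for these columns would take the form `Y + delta ~ terms`
head(surv_df)
```

Here `Y` plays the role of `response` and `delta` the role of `observed`.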

The `terms` specified in the principal stratum model and the outcome model can be different.

If multiple post-randomization confounding variables exist, one can specify all of them using the following syntax:

`treatment + postrand_1 + postrand_2 + ... + postrand_n ~ terms`

The post-randomization confounding variables are provided in place of `postrand_1` to `postrand_n`. In the current version, all of these variables must be binary indicators.
Note that the order of these post-randomization confounding variables does not affect the parameter estimates, but it matters when specifying other arguments, such as `strata` and `ER` (see `PStrata`).
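The following base R sketch shows that the left-hand side of such a formula records the variables in the order they were written. It only inspects the formula object with `all.vars()` and is not a description of how PStrata parses the formula internally; the variable names `Z`, `D1`, `D2`, and `X` are hypothetical.

```r
# Two formulas that differ only in the order of the post-randomization variables
f_a <- Z + D1 + D2 ~ X
f_b <- Z + D2 + D1 ~ X

# f[[2]] is the left-hand side of a two-sided formula
lhs_a <- all.vars(f_a[[2]])
lhs_b <- all.vars(f_b[[2]])

lhs_a
lhs_b
```

The two vectors contain the same names in a different order, which is why arguments that refer to the post-randomization variables by position need to follow the order used in the formula.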

The syntax for the predictors follows the conventions used in `formula`.
The `terms` part consists of a series of terms concatenated by `+`, each term being the name of a variable, or the interaction of several variables separated by `:`.

Apart from `+` and `:`, a number of other operators are also useful.
The `*` operator is a short-hand for factor crossing: `a*b` is interpreted as `a + b + a:b`.
The `^` operator means factor crossing to a specific degree. For example, `(a + b + c)^2` is interpreted as `(a + b + c) * (a + b + c)`, which is identical to `a + b + c + a:b + a:c + b:c`.
The `-` operator removes specified terms, so that `(a + b + c)^2 - a:b` is identical to `a + b + c + a:c + b:c`.
The `-` operator can also be used to remove the intercept term, as in `x - 1`. One can also use `x + 0` to remove the intercept term.
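These expansions can be checked in base R with `terms()`, which follows the same formula conventions. The sketch below uses a small helper, `expand_terms`, whose name is hypothetical; it is not part of PStrata.

```r
# Helper (hypothetical name) returning the expanded term labels of a formula
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(y ~ a * b)                 # "a" "b" "a:b"
expand_terms(y ~ (a + b + c)^2)         # "a" "b" "c" "a:b" "a:c" "b:c"
expand_terms(y ~ (a + b + c)^2 - a:b)   # "a" "b" "c" "a:c" "b:c"

# Intercept flag: 1 when the intercept is present, 0 when it has been removed
attr(terms(y ~ x), "intercept")
attr(terms(y ~ x - 1), "intercept")
attr(terms(y ~ x + 0), "intercept")
```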

Arithmetic expressions such as `a + log(b)` are also legal.
However, arithmetic expressions may contain special symbols that are defined for other use, such as `+`, `*`, `^` and `-`.
To avoid confusion, the function `I()` can be used to bracket portions where the operators should be interpreted in arithmetic sense.
For example, in `x + I(y + z)`, the term `y + z` is interpreted as the sum of `y` and `z`.
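The effect of `I()` can be seen by comparing design matrices in base R. This is only an illustration with `model.matrix()` on a made-up data frame `d`; it does not call PStrata.

```r
# Made-up data for illustration
d <- data.frame(x = 1:4, y = c(2, 4, 6, 8), z = c(1, 0, 1, 0))

# With I(): y + z is evaluated arithmetically and enters as a single column
mm_sum <- model.matrix(~ x + I(y + z), data = d)

# Without I(): `+` is the formula operator, so y and z are separate predictors
mm_sep <- model.matrix(~ x + y + z, data = d)

colnames(mm_sum)   # "(Intercept)" "x" "I(y + z)"
colnames(mm_sep)   # "(Intercept)" "x" "y" "z"
```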

Effects that are assumed to vary across grouping variables can be specified by adding terms of the form `gterms | group`, where `group` refers to the group indicator (usually a `factor`), and `gterms` specifies the terms whose coefficients are group-specific, drawn from a population normal distribution.

The most common use of group-level random effects is to include group-specific intercepts to account for unmeasured confounding.
For example, `x + y + (1 | g)` specifies a model with population-level predictors `x` and `y`, as well as a random intercept for each level of `g`.

For more complex random effect structures, refer to `lme4::lmer`.
However, structures other than simple random intercepts and slopes may lead to unexpected behavior.

`PSFormula` returns an object of class `PSFormula`, which is a `list` containing the following components.

`full_formula`

input formula as is

`data`

input data frame

`fixed_eff_formula`

input formula with only fixed effects

`response_names`

character vector with names of variables that appear on the left hand side of input formula

`has_random_effect`

logical indicating whether random effects are specified in the input formula

`has_intercept`

logical indicating whether the input formula has an intercept

`fixed_eff_names`

character vector with names of all variables included as fixed effects

`fixed_eff_count`

integer indicating the number of variables (factors are converted to and counted as dummy variables)

`fixed_eff_matrix`

fixed-effect design matrix

`random_eff_list`

a list containing information for each random effect. Such information is a list with the corresponding design matrix, the term names and the factor levels.

`formula`, `lmer`.

```
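library(PStrata)

# Toy data: covariate X, treatment Z, post-randomization variable D, cluster R
df <- data.frame(
  X = 1:10,
  Z = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  D = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
  R = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Principal stratum model with a quadratic term in X and a random intercept per cluster
PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)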
```
