sim_Kstage: Simulate a K-stage Sequential Multiple Assignment Randomized...
In DTRlearn2: Statistical Learning Methods for Optimizing Dynamic Treatment Regimes


View source: R/sim_Kstage.R

Description

This function simulates data from a K-stage sequential multiple assignment randomized trial (SMART) with (pinfo + pnoise) baseline variables drawn from a multivariate Gaussian distribution. The pinfo variables have variance 1 and pairwise correlation 0.2; the pnoise variables have mean 0 and are uncorrelated with each other and with the pinfo variables.
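As a rough illustration of this covariate design, the sketch below builds the baseline matrix in base R. The helper `simulate_baseline` is hypothetical and not part of DTRlearn2. It assumes the pnoise variables are standard normal (variance 1) and that the group centroids are supplied as an `n_cluster` by `pinfo` matrix. The correlated draws come from a Cholesky factor of the compound-symmetry covariance matrix (1 on the diagonal, 0.2 elsewhere).

```r
# Hypothetical sketch of the baseline-variable design (not DTRlearn2 code).
# ASSUMPTION: the pnoise variables are standard normal (variance 1).
simulate_baseline <- function(n, n_cluster, pinfo, pnoise, centroids) {
  stopifnot(n %% n_cluster == 0,
            nrow(centroids) == n_cluster, ncol(centroids) == pinfo)
  # Compound-symmetry covariance: variance 1, pairwise correlation 0.2
  Sigma <- matrix(0.2, pinfo, pinfo)
  diag(Sigma) <- 1
  U <- chol(Sigma)  # upper triangular, t(U) %*% U equals Sigma
  # Equal-sized latent groups, labelled 1..n_cluster
  group <- rep(seq_len(n_cluster), each = n / n_cluster)
  # Rows of Z %*% U have covariance Sigma; shift each row by its group centroid
  Z <- matrix(rnorm(n * pinfo), n, pinfo)
  X_info <- Z %*% U + centroids[group, , drop = FALSE]
  # Noise variables: mean 0, uncorrelated with everything else
  X_noise <- matrix(rnorm(n * pnoise), n, pnoise)
  list(X = cbind(X_info, X_noise), group = group)
}

set.seed(1)
cent <- matrix(rnorm(10 * 10, 0, sqrt(5)), 10, 10)
sim <- simulate_baseline(100, 10, 10, 20, cent)
dim(sim$X)  # c(100, 30)
```

Multiplying independent standard normal rows by the Cholesky factor is a standard way to impose a target covariance without extra packages; `MASS::mvrnorm` would be an equivalent choice.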

Subjects belong to n_cluster latent groups of equal size, and the groups are distinguished by their means (centroids) in the pinfo variables. Each latent group has its own optimal treatment sequence: the optimal treatment for subjects in group g at stage k is A_k^* = 2([g/(2k - 1)] mod 2) - 1, where [x] denotes the integer part of x. At each stage, each subject's assigned treatment (1 or -1) is drawn at random, with both values equally likely. The primary outcome is observed only at the end of the trial and is generated as R = ∑_{k=1}^{K} A_k A_k^* + ε, with ε ~ N(0,1).
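The treatment and outcome mechanism can be sketched as follows. `simulate_treatments` is a hypothetical helper, not the package's implementation; it assumes the latent groups are labelled 1, ..., n_cluster and reads the bracket in the optimal-treatment formula as `floor()`.

```r
# Hypothetical sketch of the treatment/outcome mechanism (not DTRlearn2 code).
# ASSUMPTIONS: groups are labelled 1..n_cluster, and [x] is read as floor(x).
simulate_treatments <- function(group, K) {
  n <- length(group)
  A <- vector("list", K)
  optA <- vector("list", K)
  for (k in seq_len(K)) {
    # Randomized assignment: 1 or -1, each with probability 1/2
    A[[k]] <- sample(c(-1, 1), n, replace = TRUE)
    # Optimal treatment for group g at stage k: 2 * ([g / (2k - 1)] mod 2) - 1
    optA[[k]] <- 2 * (floor(group / (2 * k - 1)) %% 2) - 1
  }
  # End-of-trial outcome: sum over stages of A_k * A_k^*, plus N(0, 1) noise
  R_final <- Reduce(`+`, Map(`*`, A, optA)) + rnorm(n)
  # No intermediate outcomes: stages 1..K-1 are vectors of zeros
  R <- c(replicate(K - 1, rep(0, n), simplify = FALSE), list(R_final))
  list(A = A, R = R, optA = optA)
}

set.seed(2)
tr <- simulate_treatments(rep(1:10, each = 10), K = 2)
```

Because each term A_k A_k^* equals 1 when A_k = A_k^* and -1 otherwise, the expected outcome given the treatments is ∑_k A_k A_k^*, which reaches its maximum value K only when the optimal treatment is given at every stage.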

Usage

sim_Kstage(n, n_cluster, pinfo, pnoise, centroids = NULL, K)

Arguments

`n`	sample size; should be a multiple of `n_cluster`.
`n_cluster`	number of latent groups
`pinfo`	number of informative baseline variables
`pnoise`	number of non-informative baseline variables
`centroids`	centroids of the `pinfo` variables for the `n_cluster` groups: a matrix of dimension `n_cluster` by `pinfo`, used as the means of the multivariate Gaussian distributions that generate the `pinfo` variables in each group. For a training set, leave `centroids` as `NULL`; the function then generates the centroids randomly from N(0,5). For a test set, pass the same centroids as the training set (for example, `train$centroids`).
`K`	number of stages.
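The documented handling of `centroids` (generate new ones when `NULL`, otherwise reuse the supplied matrix) can be sketched with a hypothetical helper `resolve_centroids`, which is not part of DTRlearn2. Because "N(0,5)" does not say whether 5 is the variance or the standard deviation, the sketch exposes the spread as a hypothetical `centroid_sd` argument and defaults to a variance of 5; this is an assumption, not the package's confirmed behaviour.

```r
# Hypothetical helper mirroring the documented `centroids` behaviour.
# ASSUMPTION: "N(0,5)" is read as variance 5, i.e. sd = sqrt(5); the package
# may use a different parameterization.
resolve_centroids <- function(centroids, n_cluster, pinfo, centroid_sd = sqrt(5)) {
  if (is.null(centroids)) {
    # Training set: draw a fresh n_cluster x pinfo matrix of centroids
    return(matrix(rnorm(n_cluster * pinfo, mean = 0, sd = centroid_sd),
                  nrow = n_cluster, ncol = pinfo))
  }
  # Test set: reuse the supplied centroids after checking their shape
  if (!is.matrix(centroids) || nrow(centroids) != n_cluster ||
      ncol(centroids) != pinfo) {
    stop("centroids must be an n_cluster x pinfo matrix")
  }
  centroids
}

set.seed(3)
train_centroids <- resolve_centroids(NULL, n_cluster = 10, pinfo = 10)
test_centroids <- resolve_centroids(train_centroids, n_cluster = 10, pinfo = 10)
identical(train_centroids, test_centroids)  # TRUE
```

Reusing the training centroids keeps the latent groups, and therefore the optimal treatment sequences, identical between the training and test sets.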

Value

A list with the following components:

`X`	baseline variables. It is a matrix of dimension `n` by `(pinfo + pnoise)`.
`A`	treatment assignments for the K stages: a list of K vectors.
`R`	outcomes of the K stages: a list of K vectors. In this simulation setting, no intermediate outcomes are observed, so the first K-1 vectors are all zeros.
`optA`	optimal treatments for the K stages: a list of K vectors.
`centroids`	centroids of the `pinfo` variables for the `n_cluster` groups. It is a matrix of dimension `n_cluster` by `pinfo`.
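Because `A`, `R`, and `optA` are lists of K vectors, per-subject quantities can be computed stage by stage. The sketch below uses two hypothetical helpers, not part of DTRlearn2: `follows_optimal` flags subjects whose randomized treatment matches the optimal treatment at every stage, and `final_outcome` extracts the end-of-trial outcome, which is the last element of `R`.

```r
# Hypothetical helpers (not part of DTRlearn2) that read a list shaped like
# the documented return value: A, R, optA are lists of K vectors.
follows_optimal <- function(sim) {
  # TRUE for subjects whose assigned treatment equals the optimal one at every stage
  Reduce(`&`, Map(`==`, sim$A, sim$optA))
}
final_outcome <- function(sim) {
  # The primary outcome is stored in the last (K-th) element of R
  sim$R[[length(sim$R)]]
}

# Small hand-made example with K = 2 stages and 3 subjects
toy <- list(A    = list(c(1, -1, 1), c(1, 1, -1)),
            optA = list(c(1,  1, 1), c(1, 1,  1)),
            R    = list(c(0, 0, 0), c(2.1, 0.2, -0.1)))
follows_optimal(toy)  # TRUE FALSE FALSE
final_outcome(toy)[follows_optimal(toy)]  # 2.1
```

For example, with the objects from the examples, `mean(final_outcome(train)[follows_optimal(train)])` gives the mean outcome among training subjects who happened to receive their optimal sequence; with equal randomization at each stage, roughly a fraction 1/2^K of subjects fall in this group.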

Author(s)

Yuan Chen, Ying Liu, Donglin Zeng, Yuanjia Wang

Maintainer: Yuan Chen <yc3281@columbia.edu>, <irene.yuan.chen@gmail.com>

See Also

`owl`, `ql`

Examples

library(DTRlearn2)

n_train = 100
n_test = 500
n_cluster = 10
pinfo = 10
pnoise = 20

# simulate a 2-stage training set
train = sim_Kstage(n_train, n_cluster, pinfo, pnoise, K=2)

# simulate an independent 2-stage test set with the same centroids as the training set
test = sim_Kstage(n_test, n_cluster, pinfo, pnoise, centroids = train$centroids, K = 2)