# sim_Kstage: Simulate a K-stage Sequential Multiple Assignment Randomized... In DTRlearn2: Statistical Learning Methods for Optimizing Dynamic Treatment Regimes

## Description

This function simulates K-stage SMART data with `(pinfo + pnoise)` baseline variables drawn from a multivariate Gaussian distribution. The `pinfo` variables have variance 1 and pairwise correlation 0.2; the `pnoise` variables have mean 0 and are uncorrelated with each other and with the `pinfo` variables.
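The covariance structure described above can be illustrated with a minimal base-R sketch. This is not the package's internal code: the variable names, the single zero centroid `mu`, and the unit variance of the noise variables are assumptions made only for illustration.

```r
set.seed(42)
n <- 200       # illustrative sample size
pinfo <- 3     # informative variables
pnoise <- 2    # noise variables

# Covariance of the informative block: variance 1, pairwise correlation 0.2
Sigma <- matrix(0.2, pinfo, pinfo)
diag(Sigma) <- 1

# Assumption: one group with a zero centroid (the real function uses one centroid per group)
mu <- rep(0, pinfo)

# chol() returns upper-triangular U with t(U) %*% U == Sigma,
# so rows of Z %*% U have covariance Sigma
Z <- matrix(rnorm(n * pinfo), n, pinfo)
X_info <- sweep(Z %*% chol(Sigma), 2, mu, `+`)

# Assumption: noise variables are independent N(0, 1); the source only states mean 0
X_noise <- matrix(rnorm(n * pnoise), n, pnoise)

X <- cbind(X_info, X_noise)
dim(X)
```

With a large `n`, `cor(X_info)` should be close to 0.2 off the diagonal, and the correlations involving `X_noise` should be close to 0.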

Subjects come from `n_cluster` latent groups of equal size, and these groups are distinguished by their different means in the `pinfo` feature variables. Each latent group has its own optimal treatment sequence: the optimal treatment for subjects in group g at stage k is generated as A^* = 2( [ g/(2k - 1) ] mod 2) - 1. The assigned treatment (1 or -1) for each subject at each stage is randomly generated with equal probability. The primary outcome is observed only at the end of the trial and is generated as R = ∑_{k=1}^{K} A_k A_k^* + N(0,1).
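The treatment and outcome rules above can be sketched in base R as follows. The helper `opt_treatment` is hypothetical and not part of DTRlearn2, and the sketch assumes that the square brackets in the formula denote the integer part (floor), which the description does not state explicitly.

```r
# Hypothetical helper: optimal treatment for group g at stage k.
# Assumption: [ x ] in the formula means floor(x).
opt_treatment <- function(g, k) 2 * (floor(g / (2 * k - 1)) %% 2) - 1

opt_treatment(1:4, 1)  # 1 -1 1 -1

set.seed(1)
K <- 2
g <- rep(1:4, each = 3)  # latent group label for each of 12 subjects

# Optimal and randomly assigned treatments, one vector per stage
optA <- lapply(seq_len(K), function(k) opt_treatment(g, k))
A <- lapply(seq_len(K), function(k) 2 * rbinom(length(g), 1, 0.5) - 1)

# Final outcome: sum over stages of A_k * A_k^*, plus standard normal noise
R <- Reduce(`+`, Map(`*`, A, optA)) + rnorm(length(g))
```

Under this rule each stage contributes +1 when the assigned treatment matches the optimal one and -1 otherwise, so the noise-free outcome ranges from -K to K.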

## Usage

```
sim_Kstage(n, n_cluster, pinfo, pnoise, centroids = NULL, K)
```

## Arguments

| Argument | Description |
| --- | --- |
| `n` | Sample size; should be a multiple of `n_cluster`. |
| `n_cluster` | Number of latent groups. |
| `pinfo` | Number of informative baseline variables. |
| `pnoise` | Number of non-informative baseline variables. |
| `centroids` | Centroids of the `pinfo` variables for the `n_cluster` groups, given as a matrix of dimension `n_cluster` by `pinfo`. They are used as the means of the multivariate Gaussians that generate the `pinfo` variables for the `n_cluster` groups. For a training set, do not assign centroids; the function generates them randomly from N(0,5). For a test set, assign the same centroids as the training set. |
| `K` | Number of stages. |
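The constraints on `n` and `centroids` can be illustrated with a short base-R sketch. It does not call the package; the variable names are illustrative, and reading the 5 in N(0,5) as a variance (rather than a standard deviation) is an assumption, since the description does not say which is meant.

```r
n <- 100
n_cluster <- 10
pinfo <- 4

# n should be a multiple of n_cluster so that the groups have equal size
stopifnot(n %% n_cluster == 0)

set.seed(7)
# Assumption: the 5 in N(0, 5) is a variance, hence sd = sqrt(5)
centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = sqrt(5)),
                    nrow = n_cluster, ncol = pinfo)

# Equal-sized latent groups: each label appears n / n_cluster times
group <- rep(seq_len(n_cluster), each = n / n_cluster)

dim(centroids)  # n_cluster by pinfo
table(group)
```

A matrix of this shape is what a test-set call would receive through the `centroids` argument, taken from the training-set result rather than drawn afresh.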

## Value

| Component | Description |
| --- | --- |
| `X` | Baseline variables. It is a matrix of dimension `n` by `(pinfo + pnoise)`. |
| `A` | Treatment assignments for the K stages. It is a list of K vectors. |
| `R` | Outcomes of the K stages. It is a list of K vectors. In this simulation setting no intermediate outcomes are observed, so the first K-1 vectors are vectors of 0. |
| `optA` | Optimal treatments for the K stages. It is a list of K vectors. |
| `centroids` | Centroids of the `pinfo` variables for the `n_cluster` groups. It is a matrix of dimension `n_cluster` by `pinfo`. |
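As a rough illustration of how a result with this structure might be handled, the sketch below builds a small mock list of the documented shape in base R. The object `dat` and its contents are hypothetical stand-ins, not output of `sim_Kstage`.

```r
set.seed(3)
n <- 8
K <- 2

# Mock object with the documented components (values are made up)
dat <- list(
  X = matrix(rnorm(n * 3), n, 3),
  A = replicate(K, sample(c(-1, 1), n, replace = TRUE), simplify = FALSE),
  R = list(rep(0, n), rnorm(n)),  # first K-1 outcome vectors are all 0
  optA = replicate(K, sample(c(-1, 1), n, replace = TRUE), simplify = FALSE)
)

# Total outcome per subject: sum of the stage-wise outcome vectors
total_R <- Reduce(`+`, dat$R)

# Subjects whose assigned treatment matched the optimal one at every stage
followed_all <- Reduce(`&`, Map(`==`, dat$A, dat$optA))
mean(followed_all)
```

Because the intermediate outcomes are 0 in this setting, `total_R` equals the last element of `R`.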

## Author(s)

Yuan Chen, Ying Liu, Donglin Zeng, Yuanjia Wang

Maintainer: Yuan Chen <yc3281@columbia.edu>, <irene.yuan.chen@gmail.com>

## See Also

`owl`, `ql`

## Examples

```
library(DTRlearn2)

n_train = 100
n_test = 500
n_cluster = 10
pinfo = 10
pnoise = 20

# simulate a 2-stage training set
train = sim_Kstage(n_train, n_cluster, pinfo, pnoise, K=2)

# simulate an independent 2-stage test set with the same centroids as the training set
test = sim_Kstage(n_test, n_cluster, pinfo, pnoise, train$centroids, K=2)
```