hdpGLM_simulateData: Simulate a Data Set from hdpGLM
In hdpGLM: Hierarchical Dirichlet Process Generalized Linear Models

hdpGLM_simulateData

R Documentation

Simulate a Data Set from hdpGLM

Description

Simulates a data set from a Hierarchical Dirichlet Process Generalized Linear Model (hdpGLM).

Usage

hdpGLM_simulateData(
  n,
  K,
  nCov = 2,
  nCovj = 0,
  J = 1,
  family = "gaussian",
  parameters = NULL,
  pi = NULL,
  same.K = FALSE,
  seed = NULL,
  context.effect = NULL,
  same.clusters.across.contexts = NULL,
  context.dependent.cluster = NULL
)
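
As a minimal sketch of a single-context call, the snippet below uses only arguments listed in the signature above. The argument values are arbitrary illustrations, and because the structure of the returned object is not described here, the example only inspects it with `str()`. The call is guarded so that the snippet still runs when the hdpGLM package is not installed.

```r
# Illustrative argument values; they are assumptions, not recommended settings.
sim_args <- list(
  n      = 200,         # sample size (per context)
  K      = 2,           # number of clusters
  nCov   = 2,           # covariates in each GLM component
  family = "gaussian",  # family of the mixture components
  seed   = 123          # seed for reproducibility
)

# Run the simulation only if the hdpGLM package is available.
if (requireNamespace("hdpGLM", quietly = TRUE)) {
  sim <- do.call(hdpGLM::hdpGLM_simulateData, sim_args)
  str(sim, max.level = 1)  # inspect the top-level structure of the result
}
```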

Arguments

`n`	integer, the sample size of the data. If there are multiple contexts, each context will have n cases.
`K`	integer, the number of clusters. If there are multiple contexts, K is the average number of clusters across contexts, and each context gets a number of clusters sampled from a Poisson distribution, unless `same.K` is `TRUE`.
`nCov`	integer, the number of covariates of the GLM components.
`nCovj`	an integer indicating the number of covariates determining the average parameter of the base measure of the Dirichlet process prior.
`J`	an integer representing the number of contexts.
`family`	a character string, one of 'gaussian', 'binomial', or 'multinomial'. It indicates the family of the GLM components of the mixture model.
`parameters`	either NULL or a list with the parameter values used to generate the model. The format should be the same as the output of the function hdpGLM_simulateParameters(). If not NULL, it must contain a sublist named beta, an element named tau, and a vector named pi. The sublist beta must be a list of vectors, each of size nCov+1, holding the coefficients of the GLM mixture components that will generate the data. For tau, if nCovj=0 (single-context case) it must be a 1x1 matrix containing 1; if nCovj>0, it must be a (nCov+1)x(nCovj+1) matrix. The vector pi must add up to 1 and have length K.
`pi`	either NULL or a vector of length K whose elements add up to 1. If not NULL, it determines the mixture probabilities.
`same.K`	boolean, used when data is sampled from more than one context. If `TRUE`, all contexts get the same number of clusters. If `FALSE`, each context gets a number of clusters sampled from a Poisson distribution with expectation equal to `K` (currently not implemented).
`seed`	a seed for `set.seed`
`context.effect`	either `NULL` or an integer vector of length two. If it is `NULL`, all the coefficients (`beta`) of the individual-level covariates are functions of context-level features (`tau`). If it is not `NULL`, the first component of the vector indicates the index of the lower-level covariate (`X`) whose linear effect `beta` depends on context (`tau`) (0 is the intercept). The second component indicates the index of the context-level covariate (`W`) whose linear coefficient (`tau`) is non-zero.
`same.clusters.across.contexts`	boolean, if `TRUE` all the contexts will have the same number of clusters AND each cluster will have the same coefficient `beta`.
`context.dependent.cluster`	integer, indicates which cluster will be context-dependent. If `0`, all clusters will be context-dependent.
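
The constraints stated for `parameters` in the single-context case (`nCovj = 0`) can be sketched in base R as follows. The coefficient values, the variable name `mix_prob`, the intercept-first ordering, and the choice of one `beta` vector per cluster are assumptions made for illustration. Whether such a hand-built list is accepted as-is should be checked against the output of hdpGLM_simulateParameters().

```r
K    <- 2  # number of clusters
nCov <- 2  # number of covariates in each GLM component

# Hypothetical coefficients: one vector of length nCov + 1 per cluster
# (assumed here to be the intercept followed by the nCov slopes).
beta <- list(
  c(1, -2, 3),
  c(-1, 2, 0.5)
)

# Single-context case (nCovj = 0): tau is a 1x1 matrix containing 1.
tau <- matrix(1, nrow = 1, ncol = 1)

# Mixture probabilities: length K, summing to 1.
mix_prob <- c(0.4, 0.6)

parameters <- list(beta = beta, tau = tau, pi = mix_prob)

# Checks mirroring the constraints in the argument description.
stopifnot(
  all(lengths(parameters$beta) == nCov + 1),
  identical(dim(parameters$tau), c(1L, 1L)),
  length(parameters$pi) == K,
  isTRUE(all.equal(sum(parameters$pi), 1))
)
```

The `stopifnot()` call fails early if a vector has the wrong length or the probabilities do not sum to 1, which is cheaper to diagnose than an error raised inside the simulation.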