gen_GEE_data: Generate the datasets with clusters


View source: R/gen_GEE_data.R

Description

gen_GEE_data generates the clustered data used by the sequential method for generalized estimating equations (GEE).

Usage

gen_GEE_data(numClusters, clusterSize, clusterRho, clusterCorstr, beta,
  family, intercept = TRUE, xCorstr = "ar1", xCorRho = 0.5,
  xVariance = 0.2)

Arguments

numClusters

An integer giving the number of clusters to generate. Each cluster contains several similar subjects.

clusterSize

A numeric value specifying the number of subjects in each cluster. Subjects in the same cluster are highly correlated with each other, so a cluster can be regarded as longitudinal data.

clusterRho

A numeric parameter of the correlation structure for the clusters. It is ignored when clusterCorstr is "independence".

clusterCorstr

A character string specifying the correlation structure for the clusters. Allowed structures are: "independence", "exchangeable" and "ar1".
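
The three structures named above can be written out as correlation matrices. The following is a minimal sketch in base R; make_corr is a hypothetical helper introduced only for illustration and is not part of seqest, and the package's internal construction may differ.

```r
# Hypothetical helper (not a seqest function): build the correlation
# matrix of dimension `size` for one cluster under a given structure.
make_corr <- function(size, rho,
                      corstr = c("independence", "exchangeable", "ar1")) {
  corstr <- match.arg(corstr)
  switch(corstr,
    # Identity matrix: no within-cluster correlation, rho is ignored.
    independence = diag(size),
    # Every pair of subjects shares the same correlation rho.
    exchangeable = {
      m <- matrix(rho, size, size)
      diag(m) <- 1
      m
    },
    # Correlation decays as rho^|i - j| with the distance between subjects.
    ar1 = rho^abs(outer(seq_len(size), seq_len(size), "-"))
  )
}

make_corr(3, 0.3, "exchangeable")
make_corr(3, 0.3, "ar1")
```

Under "independence" the rho argument has no effect, which matches the note on clusterRho.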

beta

A numeric vector giving the true parameters of the GEE model.

family

The type of response data, one of gaussian() or binomial(). gaussian() corresponds to the continuous case and binomial() to the discrete case.

intercept

A logical value indicating whether to add an intercept term. The default value is TRUE.

xCorstr

A character string specifying the correlation structure for the covariates. The default value is "ar1".

xCorRho

A numeric parameter giving the correlation coefficient among the covariates. It plays a role similar to that of clusterRho for the clusters. The default value is 0.5.

xVariance

A numeric value specifying the marginal variance in the correlation matrix within one cluster. The default value is 0.2.

Details

gen_GEE_data generates clustered data from one of two distributions, corresponding to the continuous and discrete cases.

In the continuous case, the covariate vector x is drawn from a multivariate normal distribution with mean 0 and an AR(1) correlation matrix, whose autocorrelation coefficient and marginal variance are set by the arguments xCorRho and xVariance. The response y is then generated by the equation y = wx + e, where w is the true parameter vector (the argument beta) and the random error vector e follows a normal distribution with mean 0 and one of three covariance structures of matching dimension: the identity matrix, the exchangeable structure, or the AR(1) autoregressive structure.

In the discrete case, a logistic model is used. The covariate vector x is generated as in the continuous case. The binary response vector for each cluster has an AR(1) correlation structure with correlation coefficient alpha, and the marginal expectation u satisfies logit(u) = wx.
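
The continuous case can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helpers ar1 and rmvn are hypothetical, the AR(1) correlation is assumed to act across the covariates, the errors are assumed to have unit variance, and no intercept is included.

```r
set.seed(1)

# Illustrative settings; the names mirror the arguments of gen_GEE_data.
numClusters <- 4
clusterSize <- 3
w           <- c(1, -1.1, 1.5)  # true coefficients (assumed: no intercept)
xCorRho     <- 0.5
xVariance   <- 0.2
clusterRho  <- 0.3

# Hypothetical helper: AR(1) correlation matrix with entries rho^|i - j|.
ar1 <- function(size, rho) {
  rho^abs(outer(seq_len(size), seq_len(size), "-"))
}

# Hypothetical helper: draw n rows from N(0, sigma) via the Cholesky factor.
rmvn <- function(n, sigma) {
  z <- matrix(rnorm(n * ncol(sigma)), nrow = n)
  z %*% chol(sigma)
}

n <- numClusters * clusterSize
p <- length(w)

# Covariates: mean 0, AR(1) correlation scaled by the marginal variance.
x <- rmvn(n, xVariance * ar1(p, xCorRho))

# Errors: one AR(1)-correlated vector of length clusterSize per cluster,
# stacked cluster by cluster (unit error variance is an assumption).
e <- as.vector(t(rmvn(numClusters, ar1(clusterSize, clusterRho))))

# Continuous response: y = wx + e.
y <- as.vector(x %*% w + e)

# Subjects in the same cluster share an id.
clusterID <- rep(seq_len(numClusters), each = clusterSize)

# Discrete case: marginal mean u with logit(u) = wx.
u <- as.vector(plogis(x %*% w))
```

Drawing correlated binary responses with marginal mean u requires an additional step that is not shown here.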

Value

A list containing the following components:

x

The covariate matrix. The number of rows is numClusters * clusterSize and the number of columns is the length of beta + 1 if intercept is TRUE.

y

The response data, which has the same number of rows as x.

clusterID

The id for each sample. Subjects in the same cluster have the same id.

References

Chen, Z., Wang, Z., & Chang, Y. I. (2019). Sequential adaptive variables and subject selection for GEE methods. Biometrics. doi:10.1111/biom.13160

See Also

gen_multi_data for the categorical and ordinal case.

gen_bin_data for binary classification case.

Examples

library(seqest)

# Cluster settings: 75 clusters of 5 subjects with AR(1) within-cluster correlation
initialSampleSize <- 75
clusterSize <- 5
responseCorstr <- "ar1"
responseCorRho <- 0.3

# Continuous response; true coefficients are sparse (four non-zero, then 50 zeros)
response <- gaussian()
beta0 <- c(1, -1.1, 1.5, -2, rep(0, 50))

# Covariate settings
xVariance <- 0.2
xCorRho <- 0.5
xCorstr <- "ar1"

data <- gen_GEE_data(numClusters = initialSampleSize,
                     clusterSize = clusterSize,
                     clusterCorstr = responseCorstr,
                     clusterRho = responseCorRho,
                     beta = beta0,
                     family = response,
                     intercept = TRUE,
                     xVariance = xVariance,
                     xCorstr = xCorstr,
                     xCorRho = xCorRho)

seqest documentation built on July 2, 2020, 2:28 a.m.
