simulateLD: Simulate a dataset for testing the performance of screenLD
In VariableScreening: High-Dimensional Screening for Semiparametric Longitudinal Regression

Description

Simulates a dataset for testing the screenLD function and for evaluating the performance of the proposed method under different scenarios. The simulated dataset has two z-covariates and p X-covariates, only a few of which have a nonzero effect. There are n subjects, each starting with J observations. Unless the user sets proportionMissing=0, a random subset of the observations is removed, so the resulting dataset is unbalanced and the observations are not necessarily evenly timed. The within-subject correlation is assumed to be AR-1.
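The two mechanisms named in the description, AR-1 within-subject correlation and random removal of observations, can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's internal code: the variable names (ar1, errors, grid, keep, unbalanced) are hypothetical, and the use of a Cholesky factor and an evenly spaced grid of occasions is an assumption made for illustration.

```r
# Illustrative sketch only; this is NOT the code used inside simulateLD.
set.seed(1)
n <- 5                    # subjects
J <- 10                   # scheduled observations per subject
rho <- 0.6                # AR-1 correlation parameter
proportionMissing <- 0.2  # share of observations to drop

# AR-1 correlation matrix: corr(e_j, e_k) = rho^|j - k|
ar1 <- rho^abs(outer(seq_len(J), seq_len(J), "-"))

# chol() returns an upper-triangular U with t(U) %*% U == ar1, so each row of
# Z %*% U is a draw of J errors with correlation matrix ar1.
errors <- matrix(rnorm(n * J), nrow = n) %*% chol(ar1)

# Balanced grid of (subject, occasion) pairs (assumed layout), then keep a
# random subset so that subjects end up with unequal numbers of observations.
grid <- expand.grid(occasion = seq_len(J), id = seq_len(n))
nKeep <- round((1 - proportionMissing) * nrow(grid))
keep <- sort(sample(nrow(grid), size = nKeep))
unbalanced <- grid[keep, ]

# Number of retained observations per subject
print(table(unbalanced$id))
```

Setting proportionMissing to 0 in this sketch keeps every row of the grid, which mirrors the balanced case mentioned above.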

Usage

 simulateLD(n = 100, J = 10, rho = 0.6, p = 500, proportionMissing = 0.2,
   trueIdx = c(5, 100, 200, 400), beta0Fun = NULL, betaFun = NULL,
   gammaFun = NULL, varFun = NULL)

Arguments

 n: Number of subjects in the simulated dataset.

 J: Number of observations per subject.

 rho: The correlation parameter for the AR-1 correlation structure.

 p: The total number of features to be screened.

 proportionMissing: The proportion of the observations to randomly remove, in order to create unequal numbers of measurements between subjects.

 trueIdx: The indexes of the active features in the simulated x matrix. This should be a vector, and its values should be a subset of 1:p.

 beta0Fun: The time-varying intercept for the data-generating model, as a function of time. If left as NULL, it defaults to f(t) = 2 * t^2 - 1. Time is assumed to be scaled to the interval [0,1].

 betaFun: The time-varying coefficients for z in the data-generating model, as a function of time. If left as NULL, it will be specified as two functions. The first is f(t) = exp(t + 1)/2. The second is f(t) = t^2 + 0.5. Time is assumed to be scaled to the interval [0,1].

 gammaFun: A list of functions of time, one function for each entry in trueIdx, giving the time-varying effect of each active feature in the simulated x matrix. If left as NULL, it will be specified as four functions. The first is a step function f(t) = (t > 0.4). The second is f(t) = -cos(2 * pi * t). The third is f(t) = (2 - 3 * t)^2/2 - 1. The fourth is f(t) = sin(2 * pi * t).

 varFun: A function of time giving the marginal variance of the error at a given time. If left as NULL, it will be specified as function(t) 0.5 + 3 * t^3.
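To make the documented defaults concrete, the formulas above can be written out as ordinary R functions and evaluated on a few time points in [0,1]. This is only a sketch: the object names betaDefaults, gammaDefaults, tGrid, and defaults are hypothetical, holding the two z-coefficient functions in a list is an assumption about their container, and the step function is converted from logical to numeric for display.

```r
# Default formulas transcribed from the argument descriptions.
beta0Fun <- function(t) 2 * t^2 - 1

# Assumption: the two z-coefficient functions are held in a list here;
# the documentation does not state the exact container betaFun expects.
betaDefaults <- list(
  function(t) exp(t + 1) / 2,
  function(t) t^2 + 0.5
)

# One function per entry of the default trueIdx = c(5, 100, 200, 400).
gammaDefaults <- list(
  function(t) as.numeric(t > 0.4),      # step function, coerced to 0/1
  function(t) -cos(2 * pi * t),
  function(t) (2 - 3 * t)^2 / 2 - 1,
  function(t) sin(2 * pi * t)
)

varFun <- function(t) 0.5 + 3 * t^3

# Evaluate a few of them on a coarse grid of scaled times.
tGrid <- seq(0, 1, by = 0.25)
defaults <- data.frame(
  t = tGrid,
  beta0 = beta0Fun(tGrid),
  gamma1 = gammaDefaults[[1]](tGrid),
  gamma3 = gammaDefaults[[3]](tGrid),
  variance = varFun(tGrid)
)
print(defaults)
```

Because gammaFun needs one function per entry in trueIdx, a custom trueIdx of a different length would call for a list of matching length.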

Value

A list with the following components:

 X: Matrix of features to be screened. It will have n*J rows and p columns.

 Y: Vector of responses. It will have length of n*J.

 z: A matrix representing covariates to be included in each of the screening models. The first column will be all ones, representing the intercept. The second will consist of random ones and zeros, representing simulated genders.

 id: Vector of integers identifying the subject to which each observation belongs.

 time: Vector of real numbers identifying observation times. It should have the same length as the number of rows of X.
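One way to sanity-check a result is to confirm that the listed components agree with each other in size. The helper below, checkSimShape, is hypothetical and not part of VariableScreening. It is demonstrated on a small hand-built stand-in list that follows the documented layout, so the sketch runs without the package installed; the stand-in contains random numbers only and is not output from simulateLD.

```r
# Hypothetical helper (not part of VariableScreening): verifies that the
# components of a simulated dataset have mutually consistent sizes.
checkSimShape <- function(sim) {
  nObs <- nrow(sim$X)
  stopifnot(
    length(sim$Y) == nObs,
    nrow(sim$z) == nObs,
    length(sim$id) == nObs,
    length(sim$time) == nObs
  )
  c(observations = nObs,
    features = ncol(sim$X),
    subjects = length(unique(sim$id)))
}

# Toy stand-in: 3 subjects, 2 observations each, 4 features.
set.seed(1)
toy <- list(
  X = matrix(rnorm(6 * 4), nrow = 6),
  Y = rnorm(6),
  z = cbind(1, rep(rbinom(3, 1, 0.5), each = 2)),  # intercept, 0/1 covariate
  id = rep(1:3, each = 2),
  time = runif(6)
)
shape <- checkSimShape(toy)
print(shape)

# With the package installed, the same check could be applied to real output:
# shape <- checkSimShape(VariableScreening::simulateLD(p = 1000))
```

The helper only compares the components against one another, so it makes no assumption about how many rows remain after observations are removed.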

Examples

 set.seed(12345678)
 results <- simulateLD(p=1000)

VariableScreening documentation built on May 2, 2019, 6:54 a.m.