simulateLD: Simulate a dataset for testing the performance of screenLD

View source: R/simulateLD.R

Description

Simulates a dataset that can be used to test the screenLD function and to assess the performance of the proposed method under different scenarios. The simulated dataset has two z-covariates and p X-covariates, only a few of which have a nonzero effect. There are n subjects, each with up to J observations. The observations are not necessarily evenly spaced in time, because a random proportion of them is removed to create an unbalanced dataset (unless the user sets proportionMissing=0). The within-subject correlation is assumed to be AR-1.
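The two mechanisms described above, AR-1 within-subject correlation and random removal of observations, can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the object names (R, U, errors, long, unbalanced) are hypothetical, the errors are given unit variance, and the observation occasions are taken to be the integers 1 to J.

```r
set.seed(1)
n   <- 4      # number of subjects
J   <- 5      # scheduled observations per subject
rho <- 0.6    # AR-1 correlation parameter
proportionMissing <- 0.2

# AR-1 correlation matrix: entry (j, k) equals rho^|j - k|
R <- rho^abs(outer(seq_len(J), seq_len(J), "-"))

# Correlated unit-variance errors, one row per subject.
# chol() returns an upper-triangular U with t(U) %*% U equal to R.
U <- chol(R)
errors <- matrix(rnorm(n * J), nrow = n, ncol = J) %*% U

# Long format: one row per (subject, occasion) pair
long <- data.frame(
  id       = rep(seq_len(n), each = J),
  occasion = rep(seq_len(J), times = n),
  error    = as.vector(t(errors))
)

# Randomly remove a proportion of the rows to unbalance the data
nKeep <- round((1 - proportionMissing) * nrow(long))
keep  <- sort(sample(nrow(long), size = nKeep))
unbalanced <- long[keep, ]

# Number of remaining observations per subject (generally unequal)
print(table(unbalanced$id))
```

Setting proportionMissing to 0 in this sketch keeps every row, which corresponds to the balanced case mentioned above.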

Usage

simulateLD(n = 100, J = 10, rho = 0.6, p = 500,
  proportionMissing = 0.2, trueIdx = c(5, 100, 200, 400), beta0Fun = NULL,
  betaFun = NULL, gammaFun = NULL, varFun = NULL)

Arguments

n

Number of subjects in the simulated dataset

J

Number of observations per subject

rho

The correlation parameter for the AR-1 correlation structure.

p

The total number of features to screen.

proportionMissing

The proportion of the observations to randomly remove, in order to create unequal numbers of measurements between subjects.

trueIdx

The indexes for the active features in the simulated x matrix. This should be a vector, and the values should be a subset of 1:p.

beta0Fun

The time-varying intercept for the data-generating model, as a function of time. If left as NULL, it defaults to f(t) = 2 * t^2 - 1. Time is assumed to be scaled to the interval [0,1].

betaFun

The time-varying coefficients for z in the data-generating model, as a function of time. If left as NULL, it will be specified as two functions. The first is f(t) = exp(t + 1)/2. The second is f(t) = t^2 + 0.5. Time is assumed to be scaled to the interval [0,1].

gammaFun

A list of functions of time, one function for each entry in trueIdx, giving the time-varying effects of each active feature in the simulated x matrix. If left as NULL, it will be specified as four functions. The first is a step function f(t) = (t > 0.4). The second is f(t) = -cos(2 * pi * t). The third is f(t) = (2 - 3 * t)^2/2 - 1. The fourth is f(t) = sin(2 * pi * t).

varFun

A function of time giving the marginal variance of the error term at a given time. If left as NULL, it will be specified as function(t) 0.5 + 3 * t^3.
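To make the documented defaults concrete, they can be written out as ordinary R functions and evaluated on a grid of times in [0,1]. This is a sketch based only on the formulas stated above. The object names (beta0Default, betaDefaults, gammaDefaults, varDefault) are hypothetical, and storing the two z-coefficient functions in a list is an assumption made for illustration, not necessarily the form that betaFun expects.

```r
# Default time-varying intercept
beta0Default <- function(t) 2 * t^2 - 1

# Default coefficient functions for the two z-covariates
# (kept in a list here purely for illustration)
betaDefaults <- list(
  function(t) exp(t + 1) / 2,
  function(t) t^2 + 0.5
)

# Default effects of the four active features (one per entry of trueIdx)
gammaDefaults <- list(
  function(t) as.numeric(t > 0.4),     # step function
  function(t) -cos(2 * pi * t),
  function(t) (2 - 3 * t)^2 / 2 - 1,
  function(t) sin(2 * pi * t)
)

# Default marginal error variance
varDefault <- function(t) 0.5 + 3 * t^3

# Evaluate everything on a coarse time grid
tGrid <- seq(0, 1, by = 0.25)
gammaValues <- sapply(gammaDefaults, function(f) f(tGrid))  # 5 x 4 matrix
print(cbind(t = tGrid, beta0 = beta0Default(tGrid), variance = varDefault(tGrid)))
print(round(gammaValues, 3))
```

A custom gammaFun would follow the same pattern: a list with one function of t for each entry in trueIdx.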

Value

A list with the following components:

X:

Matrix of features to be screened. It will have n*J rows and p columns.

Y:

Vector of responses. It will have length of n*J.

z:

A matrix representing covariates to be included in each of the screening models. The first column will be all ones, representing the intercept. The second will consist of random ones and zeros, representing simulated genders.

id:

Vector of integers identifying the subject to which each observation belongs.

time:

Vector of real numbers identifying observation times. It has the same length as the number of rows of X.
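The components listed above might be inspected as follows. This is a minimal sketch that assumes the VariableScreening package is installed; the call is skipped otherwise. The small values of n, J and p and the particular trueIdx are arbitrary choices to keep the run fast, and the object name sim is hypothetical.

```r
if (requireNamespace("VariableScreening", quietly = TRUE)) {
  set.seed(12345678)
  # Four active features, matching the four default gammaFun effects
  sim <- VariableScreening::simulateLD(n = 50, J = 5, p = 20,
                                       trueIdx = c(2, 5, 10, 15))

  print(dim(sim$X))          # rows = observations, columns = features
  print(length(sim$Y))       # one response per observation
  print(head(sim$z))         # intercept column and simulated gender
  print(table(sim$id))       # observations retained per subject
  print(range(sim$time))     # observation times
}
```

Tabulating id shows how many observations each subject retained, which is where the effect of proportionMissing becomes visible.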

Examples

set.seed(12345678)
results <- simulateLD(p=1000)

VariableScreening documentation built on May 2, 2019, 6:54 a.m.