# meerva.sim.nrm: Simulate Linear Regression Data with Measurement Errors in...

In meerva: Analysis of Data with Measurement Error Using a Validation Subsample

## Description

The meerva package is designed to analyze data with measurement error when there is a validation subsample. The meerva.sim.nrm function generates a simulated data set for the linear regression setting, demonstrating the data form expected as input to the meerva.fit function.

The simulation first generates 4 reference predictors from a multivariate normal distribution with a variance-covariance matrix specified by the user. The first two predictors are dichotomized to have probabilities specified by the user, which yields two class (yes/no) and two quantitative reference predictor variables. The response variable may have a surrogate with differential measurement error. There is one yes/no surrogate predictor variable involving error in place of one of the yes/no reference predictors, and one quantitative surrogate predictor variable involving error in place of one of the quantitative reference predictors.

The simulated data are not necessarily realistic, but their analysis shows how, even with rather strong measurement error, the method yields reasonable solutions. The method handles different types of measurement error without the user having to specify any relationship between the reference variables measured without error and the surrogate variables measured with error.
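The first simulation step described above (correlated normal draws, with the first two columns dichotomized) can be sketched in base R. This is an illustrative sketch only, not the package's internal code: the helper name `sim_reference_predictors`, the use of a Cholesky factor, and thresholding at a normal quantile are all assumptions made for the example.

```r
# Hypothetical helper, NOT part of meerva: draw 4 correlated normal
# predictors and dichotomize the first two.
sim_reference_predictors <- function(n, sigma, probs) {
  stopifnot(all(dim(sigma) == c(4, 4)), length(probs) == 2)
  # Correlated normals via the Cholesky factor: rows of z %*% chol(sigma)
  # have covariance matrix sigma.
  z <- matrix(rnorm(n * 4), nrow = n, ncol = 4)
  x <- z %*% chol(sigma)
  # Threshold columns 1 and 2 so that P(x == 1) is approximately probs[j].
  for (j in 1:2) {
    cutoff <- qnorm(1 - probs[j], mean = 0, sd = sqrt(sigma[j, j]))
    x[, j] <- as.numeric(x[, j] > cutoff)
  }
  colnames(x) <- c("x1", "x2", "x3", "x4")
  x
}

set.seed(1)
sigma <- diag(4)   # independent columns with unit variances (assumed here)
x <- sim_reference_predictors(4000, sigma, probs = c(0.25, 0.15))
colMeans(x[, 1:2]) # sample proportions, close to 0.25 and 0.15
```

Thresholding a latent normal column keeps its correlation with the remaining continuous columns, which is one plausible way to obtain correlated yes/no and quantitative predictors.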

## Usage

```r
meerva.sim.nrm(
  n = 4000,
  m = 400,
  beta = c(-0.5, 0.5, 0.2, 1, 0.5),
  alpha1 = c(0, 0, 0, 0),
  alpha2 = c(1, 1, 1, 1),
  bx3s1 = c(NA, NA, NA, NA, NA),
  bx3s2 = c(NA, NA, NA),
  sd = 1,
  fewer = 0,
  bx12 = c(0.25, 0.15),
  mncor = 0,
  sigma = NULL
)
```

## Arguments

| Argument | Description |
| --- | --- |
| `n` | The full dataset size. |
| `m` | The validation subsample size (m < n). |
| `beta` | A vector of length 5 giving the true regression parameters for the linear regression model with 5 predictors including the intercept. |
| `alpha1` | A vector of length 4 determining the measurement error for the outcome. If x1==1, the error has mean `alpha1[1]` and variance `alpha1[2]`. If x1==0, the error has mean `alpha1[3]` and variance `alpha1[4]`. |
| `alpha2` | A vector describing the correct classification probabilities for the surrogate for x1. If the outcome variable has positive error, `alpha2[1]` and `alpha2[2]` are the probabilities of correct classification when x1 is 1 or 0. If the outcome variable has negative error, `alpha2[3]` and `alpha2[4]` are the probabilities of correct classification when x1 is 1 or 0. |
| `bx3s1` | A vector of length 5 determining the relation between the reference variable x3 and the mean and SD of the surrogate x3s1. Roughly, `bx3s1[1]` determines a minimal measurement error SD; conditional on x3, `bx3s1[2]` determines a rate of increase in SD for values of x3 greater than `bx3s1[3]`; `bx3s1[4]` is a value above which the relation between x3 and the mean of x3s is determined by the power `bx3s1[5]`. The mean values for x3s1 are rescaled to have mean 0 and variance 1. |
| `bx3s2` | A vector of length 3 determining scale in x3s and potentially x3s2, a second surrogate for x3. Roughly, the mean previously determined for x3s1 using `bx3s1` is multiplied by `bx3s2[1]`. Conditional on x3, x3s2 has mean `bx3s2[2]` * x3 and variance `bx3s2[3]`. |
| `sd` | The SD of the outcome y. |
| `fewer` | When set to 1, x3s1 and x4 are collapsed to one variable in the surrogate set. This demonstrates how the method works when there are fewer surrogate variables than reference variables. If `bx3s2` is specified such that there are duplicate surrogate variables for the reference variable x3, the number of surrogate predictors is not reduced. |
| `bx12` | Bernoulli probabilities for reference variables x1 and x2. A vector of length 2; the default is c(0.25, 0.15). If `mncor` (see below) is positive, the correlations between these Bernoulli and continuous predictors remain positive. |
| `mncor` | Correlation of the columns in the x matrix before x1 and x2 are dichotomized to Bernoulli random variables. Default is 0. |
| `sigma` | A 4x4 variance-covariance matrix for the multivariate normal distribution used to derive the 4 reference predictor variables. |
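The `alpha1` and `alpha2` arguments describe a differential outcome error and a misclassified surrogate for x1. A minimal base R sketch of that mechanism follows. It is not the package's code: the helper name `add_measurement_error` is hypothetical, and the index layout of both vectors (stated in the comments) is an assumption read off the argument descriptions and the default and example values.

```r
# Hypothetical helper, NOT part of meerva. Index layout is an ASSUMPTION:
#   alpha1 = c(mean if x1==1, variance if x1==1, mean if x1==0, variance if x1==0)
#   alpha2 = c(P(correct) if x1==1, P(correct) if x1==0) for positive outcome
#            error, followed by the same pair for non-positive outcome error
add_measurement_error <- function(y, x1, alpha1, alpha2) {
  n <- length(y)
  # Outcome error whose mean and variance depend on x1 (differential error).
  err_mean <- ifelse(x1 == 1, alpha1[1], alpha1[3])
  err_sd   <- sqrt(ifelse(x1 == 1, alpha1[2], alpha1[4]))
  err <- rnorm(n, mean = err_mean, sd = err_sd)
  ys <- y + err
  # Probability that the surrogate equals x1, selected by the sign of err.
  offset <- ifelse(err > 0, 0, 2)
  p_correct <- alpha2[offset + ifelse(x1 == 1, 1, 2)]
  correct <- rbinom(n, size = 1, prob = p_correct)
  x1s <- ifelse(correct == 1, x1, 1 - x1)
  list(ys = ys, x1s = x1s)
}

set.seed(2)
x1 <- rbinom(4000, 1, 0.25)
y  <- -0.5 + 0.5 * x1 + rnorm(4000)
sur <- add_measurement_error(y, x1,
                             alpha1 = c(-0.05, 0.1, 0.05, 0.1),
                             alpha2 = c(0.95, 0.91, 0.9, 0.9))
mean(sur$x1s == x1)  # overall agreement between surrogate and reference x1
```

Under this reading, the defaults `alpha1 = c(0, 0, 0, 0)` and `alpha2 = c(1, 1, 1, 1)` produce surrogates identical to the reference values, that is, no measurement error in the outcome or in x1.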

## Value

meerva.sim.nrm returns a list containing vectors and matrices which can be used as example input to the meerva.fit function.
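Before passing the list elements on, it can help to confirm that their shapes agree. The sketch below runs basic checks on the six elements used in the package example (`x_val`, `y_val`, `xs_val`, `ys_val`, `xs_non`, `ys_non`). The helper `check_sim_list` is hypothetical, the mock list is a stand-in so the example runs without the package, and the assumption that the non-validation elements have n - m rows is ours rather than something stated on this page.

```r
# Hypothetical helper, NOT part of meerva: basic shape checks on a list
# holding the six elements used in the package example.
check_sim_list <- function(simd) {
  needed <- c("x_val", "y_val", "xs_val", "ys_val", "xs_non", "ys_non")
  missing_el <- setdiff(needed, names(simd))
  if (length(missing_el) > 0) {
    stop("missing elements: ", paste(missing_el, collapse = ", "))
  }
  m <- NROW(simd$x_val)
  stopifnot(
    NROW(simd$y_val) == m,                       # validation rows agree
    NROW(simd$xs_val) == m,
    NROW(simd$ys_val) == m,
    NROW(simd$xs_non) == NROW(simd$ys_non),      # non-validation rows agree
    NCOL(simd$xs_val) == NCOL(simd$xs_non)       # same surrogate columns
  )
  # ASSUMPTION: full sample size is validation plus non-validation rows.
  c(m = m, n = m + NROW(simd$xs_non))
}

# Mock list standing in for meerva.sim.nrm() output (placeholder values).
mock <- list(
  x_val  = matrix(0, 400, 4),  y_val  = numeric(400),
  xs_val = matrix(0, 400, 4),  ys_val = numeric(400),
  xs_non = matrix(0, 3600, 4), ys_non = numeric(3600)
)
sizes <- check_sim_list(mock)
sizes
```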

## See Also

`meerva.sim.block`, `meerva.sim.brn`, `meerva.sim.cox`, `meerva.fit`

## Examples

```r
# Simulate linear regression data with measurement errors
simd = meerva.sim.nrm(beta=c(-0.5, 0.5, 0.2, 1, 0.5),
         alpha1=c(-0.05, 0.1, 0.05, 0.1), alpha2=c(0.95, 0.91, 0.9, 0.9),
         bx3s1=c(0.05, 0, 0, NA, NA), bx3s2 = c(1.1, 0.9, 0.05) )

# Same setting with fewer surrogate than reference predictors (fewer=1);
# this call overwrites simd from the first call
simd = meerva.sim.nrm(beta=c(-0.5, 0.5, 0.2, 1, 0.5),
         alpha1=c(-0.05, 0.1, 0.05, 0.1), alpha2=c(0.95, 0.91, 0.9, 0.9),
         bx3s1=c(0.05, 0, 0, NA, NA), bx3s2 = c(1.1, NA, NA), fewer=1 )

# Copy the data vectors and matrices to input to meerva.fit
x_val  = simd$x_val
y_val  = simd$y_val
xs_val = simd$xs_val
ys_val = simd$ys_val
xs_non = simd$xs_non
ys_non = simd$ys_non

# Analyze the data and display results
nrmout = meerva.fit(x_val, y_val, xs_val, ys_val, xs_non, ys_non )
summary(nrmout)
```