# rmixreg: Simulate data from a mixture of regressions model. In mixreg: Functions to Fit Mixtures of Regressions

## Description

Simulate data from a mixture of regressions model, as specified by the user or as fitted to a data set. The simulation may be done either in a parametric or “semiparametric” manner.

## Usage

 ```1 2 3 4 5 6 7``` ```rmixreg(x, ...) ## Default S3 method: rmixreg(x, nobs, theta, seed = NULL, xNms=NULL, yNm = "y", ...) ## S3 method for class 'mixreg' rmixreg(x, semiPar = FALSE, conditional=semiPar, seed = NULL, ...) ```

## Arguments

 `x` For the default method, this is a numeric vector constituting a predictor for a regression model, or a matrix whose columns form such predictors. The number of columns of `x` (or of `as.matrix(x)`) must be equal to the number of linear coefficients, or be one less than this number, otherwise an error is thrown. If the number of columns of `x` is one less than the number of linear coefficients then it is assumed that the missing predictor is the intercept and a column of 1s gets prepended (internally). Note that the number of linear coefficients is determined from argument `theta` (see below). Note that `x` may be set equal to `NULL` in which case data are generated from a model with no predictors, i.e. from a scalar mixture model. In this case `nobs` (see below) must be specified. Data from a scalar mixture model may also be generated by specifying `x` to be a vector of 1s. (In this case `nobs` is ignored.) For the `"mixreg"` method this is an object of class `"mixreg"` as returned by the function `mixreg()`. `nobs` Integer scalar, specifying the number of observations to be generated. Used only if argument `x` in `NULL`. Otherwise the number of observations is determined from the length or number of rows of `x`. `theta` Either a list or a matrix specifying the parameters of the model from which data are to be simulated. If it is a list it should have components `beta`, `sigsq` and `lambda`. Each of these components is in turn a list of length `K` where `K` is the number of components in the model. The `k`th entry of `beta` is a vector of regression coefficients. These vectors must all have the same length. The `k`th entry of `sigsq` is a positive scalar specifying the error variance for the `k`th component. The `k`th entry of `lambda` is a probability (positive scalar, less than 1) specifying the probability that an observation corresponds to the `k`th component. Note that the `lambda` values must sum to 1. If `theta` is a matrix it is of dimension K x np where `np` is the number of parameters. Note that `np` is equal to p + 2 where p is the number of linear regression coefficients (the other 2 parameters being `sigsq` and `lambda`). The rows of this matrix correspond to components of the mixture and the columns to the parameters of the model. `seed` A numeric scalar. If not an integer it gets rounded to the nearest integer (so `seed=pi` has the same impact as `seed=3`. If not supplied, `seed` is selected randomly from ```1, 2, ..., 1e5```. The seed for random number generation is set to `seed` in `rmixreg()` before any random number generation is done. `xNms` Character vector of names for the predictors other than the intercept (if there is one). This vector must be of length equal to the number of (non-intercept) predictors. This is equal to the number of columns of `x` prior to its having a column of 1s prepended (given that this augmentation of `x` does indeed take place). If not supplied the names used are `X1`, ..., `Xn` where `n` is the number of predictors. (Except, if there is only one predictor it is named by the name of the `x` variable.) `yNm` Character scalar; a name for the response. `semiPar` Logical scalar. Should the simulation be done “semiparametrically”? (See Details.) `conditional` Logical scalar; should the component-selection probabilities be determined conditionally upon the observations? `...` Not used.

## Details

In this context “parametric” bootstrapping means that the bootstrap data sets are generated by simulating from the fitted `ncomp` model parameters, using Gaussian errors. In contrast semiparametric bootstrapping means that the errors are generated by resampling from the residuals. Since at each predictor value there are `ncomp` residuals, one for each component of the model, the errors are selected from these `ncomp` possibilities. If the argument `conditional` is `TRUE` then the selection probabilities at this step are the conditional probabilities, of the observation being generated by each component of the model, given that observation. If `conditional` is `FALSE` then these probabilities are the corresponding entries of `lambda` (see Value. The residuals are sampled independently in either case. The procedure is termed semiparametric (rather than non-parametric) since the sampling probabilities depend on the parameters of the model. Note that it makes no sense to specify `conditional=TRUE` if `semiPar` is `FALSE`. Doing so will generate an error.

## Value

A data frame whose columns consist of the predictors and the simulated response. For the default method the predictor are the columns of the matrix specified by argument `x`. They have names given by argument `xNms` if this was provided and by `X1`, `X2`, ..., `Xn` (where `n` is the number of columns of `x`) or simply `x` if there is only a single predictor. For the `"mixreg"` method the columns are the same as those of `x\$data`, with response column replaced by the simulated response.

## Author(s)

Rolf Turner r.turner@auckland.ac.nz

## References

Turner, T. R. (2000) Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Applied Statistics 49, Part 3 pp. 371 – 384.

`mixreg()`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16``` ``` fit <- mixreg(plntsInf ~ aphRel, ncomp=2, data=aphids) sim1 <- rmixreg(fit) with(sim1,plot(aphRel,plntsInf,,main="Parametric simulation")) sim2 <- rmixreg(fit,semiPar=TRUE) with(sim2,plot(aphRel,plntsInf,,main="Semiparametric simulation")) x <- cbind(1:50,rnorm(50)) pmat <- matrix(c(3,5,0.01,1600,0.7,1,2,-0.01,100,0.3),nrow=2,byrow=TRUE) sim3 <- rmixreg(x,theta=pmat,seed=42) with(sim3,plot(X1,y,main="Using rmixreg.default; predictor 1")) with(sim3,plot(X2,y,main="Using rmixreg.default; predictor 2")) pmat <- matrix(c(10,2,0.7,3,1,0.3),nrow=2,byrow=TRUE) sim4 <- rmixreg(x=rep(1,50),theta=pmat,seed=17) sim5 <- rmixreg(x=NULL,nobs=50,theta=pmat,seed=17) # Same as sim4 but # with no columns of 1s. chk4 <- mixreg(y~1,data=sim4,ncomp=2,seed=116) chk5 <- mixreg(y~1,data=sim5,ncomp=2,seed=116) # Same-same. ```