rmixreg: Simulate data from a mixture of regressions model.
In mixreg: Functions to Fit Mixtures of Regressions

Description Usage Arguments Details Value Author(s) References See Also Examples

Simulate data from a mixture of regressions model, as specified by the user or as fitted to a data set. The simulation may be done either in a parametric or “semiparametric” manner.

rmixreg(x, ...)
## Default S3 method:
rmixreg(x, nobs, theta, seed = NULL,
                          xNms=NULL, yNm = "y", ...)
## S3 method for class 'mixreg'
rmixreg(x, semiPar = FALSE, conditional=semiPar,
                         seed = NULL, ...)

`x`	For the default method, this is a numeric vector constituting a predictor for a regression model, or a matrix whose columns form such predictors. The number of columns of `x` (or of `as.matrix(x)`) must be equal to the number of linear coefficients, or be one less than this number, otherwise an error is thrown. If the number of columns of `x` is one less than the number of linear coefficients then it is assumed that the missing predictor is the intercept and a column of 1s gets prepended (internally). Note that the number of linear coefficients is determined from argument `theta` (see below). Note that `x` may be set equal to `NULL` in which case data are generated from a model with no predictors, i.e. from a scalar mixture model. In this case `nobs` (see below) must be specified. Data from a scalar mixture model may also be generated by specifying `x` to be a vector of 1s. (In this case `nobs` is ignored.) For the `"mixreg"` method this is an object of class `"mixreg"` as returned by the function `mixreg()`.
`nobs`	Integer scalar, specifying the number of observations to be generated. Used only if argument `x` in `NULL`. Otherwise the number of observations is determined from the length or number of rows of `x`.
`theta`	Either a list or a matrix specifying the parameters of the model from which data are to be simulated. If it is a list it should have components `beta`, `sigsq` and `lambda`. Each of these components is in turn a list of length `K` where `K` is the number of components in the model. The `k`th entry of `beta` is a vector of regression coefficients. These vectors must all have the same length. The `k`th entry of `sigsq` is a positive scalar specifying the error variance for the `k`th component. The `k`th entry of `lambda` is a probability (positive scalar, less than 1) specifying the probability that an observation corresponds to the `k`th component. Note that the `lambda` values must sum to 1. If `theta` is a matrix it is of dimension K x np where `np` is the number of parameters. Note that `np` is equal to p + 2 where p is the number of linear regression coefficients (the other 2 parameters being `sigsq` and `lambda`). The rows of this matrix correspond to components of the mixture and the columns to the parameters of the model.
`seed`	A numeric scalar. If not an integer it gets rounded to the nearest integer (so `seed=pi` has the same impact as `seed=3`. If not supplied, `seed` is selected randomly from `1, 2, ..., 1e5`. The seed for random number generation is set to `seed` in `rmixreg()` before any random number generation is done.
`xNms`	Character vector of names for the predictors other than the intercept (if there is one). This vector must be of length equal to the number of (non-intercept) predictors. This is equal to the number of columns of `x` prior to its having a column of 1s prepended (given that this augmentation of `x` does indeed take place). If not supplied the names used are `X1`, ..., `Xn` where `n` is the number of predictors. (Except, if there is only one predictor it is named by the name of the `x` variable.)
`yNm`	Character scalar; a name for the response.
`semiPar`	Logical scalar. Should the simulation be done “semiparametrically”? (See Details.)
`conditional`	Logical scalar; should the component-selection probabilities be determined conditionally upon the observations?
`...`	Not used.

In this context “parametric” bootstrapping means that the bootstrap data sets are generated by simulating from the fitted ncomp model parameters, using Gaussian errors. In contrast semiparametric bootstrapping means that the errors are generated by resampling from the residuals. Since at each predictor value there are ncomp residuals, one for each component of the model, the errors are selected from these ncomp possibilities. If the argument conditional is TRUE then the selection probabilities at this step are the conditional probabilities, of the observation being generated by each component of the model, given that observation. If conditional is FALSE then these probabilities are the corresponding entries of lambda (see Value. The residuals are sampled independently in either case. The procedure is termed semiparametric (rather than non-parametric) since the sampling probabilities depend on the parameters of the model. Note that it makes no sense to specify conditional=TRUE if semiPar is FALSE. Doing so will generate an error.

A data frame whose columns consist of the predictors and the simulated response. For the default method the predictor are the columns of the matrix specified by argument x. They have names given by argument xNms if this was provided and by X1, X2, ..., Xn (where n is the number of columns of x) or simply x if there is only a single predictor. For the "mixreg" method the columns are the same as those of x$data, with response column replaced by the simulated response.

Rolf Turner r.turner@auckland.ac.nz

Turner, T. R. (2000) Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Applied Statistics 49, Part 3 pp. 371 – 384.

mixreg()

   fit  <- mixreg(plntsInf ~ aphRel, ncomp=2, data=aphids)
   sim1 <- rmixreg(fit)
   with(sim1,plot(aphRel,plntsInf,,main="Parametric simulation"))
   sim2 <- rmixreg(fit,semiPar=TRUE)
   with(sim2,plot(aphRel,plntsInf,,main="Semiparametric simulation"))
   x    <- cbind(1:50,rnorm(50))
   pmat <- matrix(c(3,5,0.01,1600,0.7,1,2,-0.01,100,0.3),nrow=2,byrow=TRUE)
   sim3 <- rmixreg(x,theta=pmat,seed=42)
   with(sim3,plot(X1,y,main="Using rmixreg.default; predictor 1"))
   with(sim3,plot(X2,y,main="Using rmixreg.default; predictor 2"))
   pmat <- matrix(c(10,2,0.7,3,1,0.3),nrow=2,byrow=TRUE)
   sim4 <- rmixreg(x=rep(1,50),theta=pmat,seed=17)
   sim5 <- rmixreg(x=NULL,nobs=50,theta=pmat,seed=17) # Same as sim4 but
                                                      # with no columns of 1s.
   chk4 <- mixreg(y~1,data=sim4,ncomp=2,seed=116) 
   chk5 <- mixreg(y~1,data=sim5,ncomp=2,seed=116) # Same-same.