rmixreg: Simulate data from a mixture of regressions model.

Description Usage Arguments Details Value Author(s) References See Also Examples


Simulate data from a mixture of regressions model, as specified by the user or as fitted to a data set. The simulation may be done either in a parametric or “semiparametric” manner.


rmixreg(x, ...)
## Default S3 method:
rmixreg(x, nobs, theta, seed = NULL,
                          xNms=NULL, yNm = "y", ...)
## S3 method for class 'mixreg'
rmixreg(x, semiPar = FALSE, conditional=semiPar,
                         seed = NULL, ...) 



For the default method, this is a numeric vector constituting a predictor for a regression model, or a matrix whose columns form such predictors. The number of columns of x (or of as.matrix(x)) must be equal to the number of linear coefficients, or be one less than this number, otherwise an error is thrown. If the number of columns of x is one less than the number of linear coefficients then it is assumed that the missing predictor is the intercept and a column of 1s gets prepended (internally). Note that the number of linear coefficients is determined from argument theta (see below). Note that x may be set equal to NULL in which case data are generated from a model with no predictors, i.e. from a scalar mixture model. In this case nobs (see below) must be specified.

Data from a scalar mixture model may also be generated by specifying x to be a vector of 1s. (In this case nobs is ignored.)

For the "mixreg" method this is an object of class "mixreg" as returned by the function mixreg().


Integer scalar, specifying the number of observations to be generated. Used only if argument x in NULL. Otherwise the number of observations is determined from the length or number of rows of x.


Either a list or a matrix specifying the parameters of the model from which data are to be simulated. If it is a list it should have components beta, sigsq and lambda. Each of these components is in turn a list of length K where K is the number of components in the model. The kth entry of beta is a vector of regression coefficients. These vectors must all have the same length. The kth entry of sigsq is a positive scalar specifying the error variance for the kth component. The kth entry of lambda is a probability (positive scalar, less than 1) specifying the probability that an observation corresponds to the kth component. Note that the lambda values must sum to 1. If theta is a matrix it is of dimension K x np where np is the number of parameters. Note that np is equal to p + 2 where p is the number of linear regression coefficients (the other 2 parameters being sigsq and lambda). The rows of this matrix correspond to components of the mixture and the columns to the parameters of the model.


A numeric scalar. If not an integer it gets rounded to the nearest integer (so seed=pi has the same impact as seed=3. If not supplied, seed is selected randomly from 1, 2, ..., 1e5. The seed for random number generation is set to seed in rmixreg() before any random number generation is done.


Character vector of names for the predictors other than the intercept (if there is one). This vector must be of length equal to the number of (non-intercept) predictors. This is equal to the number of columns of x prior to its having a column of 1s prepended (given that this augmentation of x does indeed take place). If not supplied the names used are X1, ..., Xn where n is the number of predictors. (Except, if there is only one predictor it is named by the name of the x variable.)


Character scalar; a name for the response.


Logical scalar. Should the simulation be done “semiparametrically”? (See Details.)


Logical scalar; should the component-selection probabilities be determined conditionally upon the observations?


Not used.


In this context “parametric” bootstrapping means that the bootstrap data sets are generated by simulating from the fitted ncomp model parameters, using Gaussian errors. In contrast semiparametric bootstrapping means that the errors are generated by resampling from the residuals. Since at each predictor value there are ncomp residuals, one for each component of the model, the errors are selected from these ncomp possibilities. If the argument conditional is TRUE then the selection probabilities at this step are the conditional probabilities, of the observation being generated by each component of the model, given that observation. If conditional is FALSE then these probabilities are the corresponding entries of lambda (see Value. The residuals are sampled independently in either case. The procedure is termed semiparametric (rather than non-parametric) since the sampling probabilities depend on the parameters of the model. Note that it makes no sense to specify conditional=TRUE if semiPar is FALSE. Doing so will generate an error.


A data frame whose columns consist of the predictors and the simulated response. For the default method the predictor are the columns of the matrix specified by argument x. They have names given by argument xNms if this was provided and by X1, X2, ..., Xn (where n is the number of columns of x) or simply x if there is only a single predictor. For the "mixreg" method the columns are the same as those of x$data, with response column replaced by the simulated response.


Rolf Turner r.turner@auckland.ac.nz


Turner, T. R. (2000) Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Applied Statistics 49, Part 3 pp. 371 – 384.

See Also



   fit  <- mixreg(plntsInf ~ aphRel, ncomp=2, data=aphids)
   sim1 <- rmixreg(fit)
   with(sim1,plot(aphRel,plntsInf,,main="Parametric simulation"))
   sim2 <- rmixreg(fit,semiPar=TRUE)
   with(sim2,plot(aphRel,plntsInf,,main="Semiparametric simulation"))
   x    <- cbind(1:50,rnorm(50))
   pmat <- matrix(c(3,5,0.01,1600,0.7,1,2,-0.01,100,0.3),nrow=2,byrow=TRUE)
   sim3 <- rmixreg(x,theta=pmat,seed=42)
   with(sim3,plot(X1,y,main="Using rmixreg.default; predictor 1"))
   with(sim3,plot(X2,y,main="Using rmixreg.default; predictor 2"))
   pmat <- matrix(c(10,2,0.7,3,1,0.3),nrow=2,byrow=TRUE)
   sim4 <- rmixreg(x=rep(1,50),theta=pmat,seed=17)
   sim5 <- rmixreg(x=NULL,nobs=50,theta=pmat,seed=17) # Same as sim4 but
                                                      # with no columns of 1s.
   chk4 <- mixreg(y~1,data=sim4,ncomp=2,seed=116) 
   chk5 <- mixreg(y~1,data=sim5,ncomp=2,seed=116) # Same-same.

mixreg documentation built on Oct. 14, 2021, 9:12 a.m.