ExLinear: Artificial Data from a Generalized Linear Regression Mixture

View source: R/examples.R

ExLinearR Documentation

Artificial Data from a Generalized Linear Regression Mixture

Description

Generate random data mixed from k generalized linear regressions (GLMs).

Usage

ExLinear(beta, n, xdist = "runif", xdist.args = NULL,
         family = c("gaussian","poisson"), sd = 1, ...)

Arguments

beta

A matrix of regression coefficients. Each row corresponds to a variable, each column to a mixture component. The first row is used as intercept.

n

Integer, the number of observations per component.

xdist

Character, name of a random number function for the explanatory variables.

xdist.args

List, arguments for the random number functions.

family

A character string naming a GLM family.Only "gaussian" and "poisson" are implemented at the moment.

sd

Numeric, the error standard deviation for each component for Gaussian responses.

...

Used as default for xdist.args if that is not specified.

Details

First, arguments n (and sd for Gaussian response) are recycled to the number of mixture components ncol(beta), and arguments xdist and xdist.args are recycled to the number of explanatory variables nrow(beta)-1. Then a design matrix is created for each mixture component by drawing random numbers from xdist. For each component, the design matrix is multiplied by the regression coefficients to form the linear predictor. For Gaussian responses the identity link is used, for Poisson responses the log link.

The true cluster memberships are returned as attribute "clusters".

Examples

## simple example in 2d
beta <- matrix(c(1, 2, 3, -1), ncol = 2)
sigma <- c(.5, 1)
df1 <- ExLinear(beta, 100, sd = sigma, min = -1, max = 2)
plot(y~x1, df1, col = attr(df1, "clusters"))

## add a second explanatory variable with exponential distribution
beta2 <- rbind(beta, c(-2, 2))
df2 <-  ExLinear(beta2, 100, sd = c(.5, 1),
                 xdist = c("runif", "rexp"),
                 xdist.args = list(list(min = -1, max = 2),
                                   list(rate = 3)))

summary(df2)

opar = par("mfrow")
par(mfrow = 1:2)
hist(df2$x1)
hist(df2$x2)
par(opar)

f1 <- flexmix(y ~ ., data = df2, k = 2)

## sort components on Intercept
f1 <- relabel(f1, "model", "Intercept")

## the parameters should be close to the true beta and sigma
round(parameters(f1), 3)
rbind(beta2, sigma)

### A simple Poisson GLM
df3 <- ExLinear(beta/2, 100, min = -1, max = 2, family = "poisson")
plot(y ~ x1, df3, col = attr(df3, "clusters"))

f3 <- flexmix(y ~ ., data = df3, k = 2,
              model = FLXMRglm(family = "poisson"))
round(parameters(relabel(f3, "model", "Intercept")), 3)
beta/2

flexmix documentation built on March 31, 2023, 8:36 p.m.