Description Usage Arguments Value Warning Notes Author(s) References See Also Examples
Estimates the parameters for a mixture of linear regressions, assuming Gaussian errors, using the EM algorithm.
1 2 3 4 |
x |
Either a formula specifying the regression model in question
or a matrix of predictors for that regression model.
In the latter case |
y |
The vector of responses for the regression models. Not used and
should not be specified if |
ncomp |
The number of components in the mixture. If |
intercept |
Logical scalar; should the linear regressions should have
intercepts fitted? Ignored if |
eqVar |
Logical scalar; should the error variance should be the same for all component? (The alternative is that each component should be allowed a different error variance.) |
thetaStart |
Either a list or a matrix providing starting values for the
estimation procedure. If it is a list each of its entries is in
turn a list with entries If |
itmax |
The maximum number of EM steps to be undertaken. If this maximum
number of steps is exceeded then a warning is issued, but a
fit (probably unreliable) is returned anyway. The returned
value will, in this case, have the |
eps |
A value specifying the convergence criterion for the EM algorithm.
If the maximum absolute value of the change in the parameters is
less than |
verb |
Logical argument; if |
digits |
The number of digits to which the details are printed out, when
|
maxTry |
If the algorithm encounters a singularity in the likelihood (as may possibly occur when eqVar is FALSE) the algorithm is restarted using new (randomly generated) starting values. The restart is attempted a maximum of maxTry times. |
seed |
A numeric scalar used to seed the random number generators if
|
data |
A list or data frame in which the values of the data |
covMat |
Logical scalar; should the covariance matrix of the parameter estimates be calculated? (This can take some time.) |
MC |
Logical scalar; should the covariance matrix of the parameter estimates
be calculated by a Monte Carlo procedure? Ignored if |
warn |
Logical scalar. Should a warning be issued if the EM algorithm fails
to converge in |
... |
Optional arguments which are passed to |
A list, of class mixreg, with components
parmat |
The parameters of the fitted model arranged as a matrix, each row corresponding to one component of the mixture. |
theta |
The parameters of the fitted model as a list, each entry of the list being itself a list (like those in thetaStart) corresponding to one component of the mixture. |
log.like |
The log likelihood of the fitted model, based on Gaussian errors. |
aic |
The Akaike Information Criterion value for the fitted model; aic is equal to -2 * log.like + 2*M where M is the number of parameters in the model. |
intercept |
The value of the |
eqVar |
The value of the |
nsteps |
The number of steps the EM algorithm took to converge. |
converged |
Logical scalar indicating whether the algorithm did indeed 'converge or stopped because it reach the itmax EM step. |
data |
A list or data frame providing the data |
formula |
The formula used by |
The returned value has an attribute "seed"
specifying the
seed for the random number generators that was used in producing
random starting values. If thetaStart
was specified then
the attribute "seed"
is NA
.
If the x
argument is a formula, then specifying y
is not only meaningless, but will cause a (possibly mystifying)
error to be thrown.
This function (and indeed the entire mixreg
package) is
structured so as to be able to deal with mixtures of regressions
involving any number of predictors. However I know of no practical
examples in which more than one predictor is involved, and it seems
likely that models involving more than one predictor would present
substantial difficulties. Some of the functions in this package
(cband()
, visualFit()
, plot.mixreg()
, ...)
can cope only with single-predictor models.
The entries parmat
and theta
of the returned
value contain the same information, presented in a different format.
Even if eqVar
is TRUE
, each entry
of theta
still has its own sigsq
entry.
The values of these will all be equal, however, if eqVar
is TRUE
.
For scalar mixtures, allowing the components to
have different variances can have the effect of introducing
singularities in the likelihood function. It is not clear
to me what the impact of allowing different variances is in
the mixtures-of-regressions setting. In respect of the scalar
setting, Aitkin and Tunnicliffe Wilson (see References)
assert that the singularities that may arise “do not
cause any computational difficulty in practice”. However in the
mixtures of regressions setting, I have observed strange anomalies
which appear to be alleviated by setting eqVar=TRUE
.
See Examples. If results seem to be unsatisfactory, you
may be well-advised to try setting eqVar=TRUE
, to see if
that makes an improvement.
Rolf Turner r.turner@auckland.ac.nz
T. Rolf Turner (2000). Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Applied Statistics 49 Part 3, pp. 371 – 384.
A. P. Dempster, N. M. Laird and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society series B, 39, pp. 1 – 22.
M. Aitkin and G. Tunnicliffe Wilson (1980). Mixture models, outliers, and the EM algorithm. Technometrics 22, pp. 325 – 331.
ncMcTest()
, cband()
,
qqMix()
, residuals.mixreg()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | # Aphids.
fit1 <- mixreg(aphRel,plntsInf,ncomp=2,seed=42,data=aphids)
plot(fit1)
thStrt <- list(list(beta=c(3.0,0.1),sigsq=16,lambda=0.5),
list(beta=c(0.0,0.0),sigsq=16,lambda=0.5))
fit2 <- mixreg(aphRel,plntsInf,ncomp=2,thetaStart=thStrt,data=aphids)
fit3 <- mixreg(plntsInf ~ aphRel,ncomp=2,thetaStart=thStrt,data=aphids)
# Kilns.
## Not run:
vfit <- visualFit(y ~ x,ncomp=3, data=kilnAoneOut)
fit <- mixreg(y ~ x,ncomp=3,data=kilnAoneOut,thetaStart=vfit$theta)
## End(Not run)
thStrt <- list(
list(beta=c(26.07,48808),sigsq=1.1573,lambda=0.33333333),
list(beta=c(23.48,32387),sigsq=1.8730,lambda=0.33333333),
list(beta=c(-0.0597,20760),sigsq=0.2478,lambda=0.33333333)
)
# Roughly the value of vfit$theta.
fit <- mixreg(y ~ x,ncomp=3,data=kilnAoneOut,thetaStart=thStrt)
plot(fit)
# Kilns, zero intercept model.
## Not run:
vfit <- visualFit(y ~ x - 1,ncomp=3, data=kilnAoneOut)
fit <- mixreg(y ~ x - 1,ncomp=3,data=kilnAoneOut,thetaStart=vfit$theta)
## End(Not run)
thStrt <- list(list(beta=50900,sigsq=3.297,lambda=0.33333333),
list(beta=33800,sigsq=25.52,lambda=0.33333333),
list(beta=20755,sigsq=0.2477,lambda=0.33333333))
# Roughly the value of vfit$theta.
fit <- mixreg(y ~ x - 1,ncomp=3,data=kilnAoneOut,thetaStart=thStrt)
plot(fit) # Yikes!!! (But a plot of vfit looks practically perfect.)
fit <- mixreg(y ~ x - 1,ncomp=3,data=kilnAoneOut,thetaStart=thStrt,eqVar=TRUE)
plot(fit) # Looks fine.
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.