cnmms | R Documentation |
Functions cnmms, cnmpl and cnmap
can be used to compute the maximum likelihood estimate of a
semiparametric mixture model that has a one-dimensional mixing
parameter. The types of mixture models that can be computed
include finite, nonparametric and semiparametric ones.
Function cnmms can also be used to compute the maximum
likelihood estimate of a finite or nonparametric mixture model.
A finite mixture model has a density of the form
f(x; \pi, \theta, \beta) = \sum_{j=1}^k \pi_j f(x; \theta_j, \beta),
where \pi_j \ge 0 and \sum_{j=1}^k \pi_j = 1.
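As a rough illustration of the finite mixture density, the sketch below evaluates it in base R for normal components. It assumes component means \theta_j and a common standard deviation playing the role of \beta; the helper name dfinmix and its argument names are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): density of a finite
# normal mixture, sum_j pr[j] * dnorm(x, theta[j], beta).
dfinmix <- function(x, pr, theta, beta) {
  stopifnot(length(pr) == length(theta), all(pr >= 0),
            isTRUE(all.equal(sum(pr), 1)))
  # One column per component, evaluated at every x
  dens <- sapply(theta, function(th) dnorm(x, mean = th, sd = beta))
  dens <- matrix(dens, nrow = length(x))
  # Weight the columns by the mixing proportions and sum across components
  drop(dens %*% pr)
}

# Two components at 0 and 4 with proportions 0.7 and 0.3, common sd 1
x <- c(-1, 0, 2, 4)
d <- dfinmix(x, pr = c(0.7, 0.3), theta = c(0, 4), beta = 1)
```

The constraints on the mixing proportions are checked on entry, so a set of weights that does not sum to one is rejected rather than silently producing a function that is not a density.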
A nonparametric mixture model has a density of the form
f(x; G) = \int f(x; \theta) d G(\theta),
where G is a mixing distribution
that is completely unspecified. The maximum likelihood estimate of
the nonparametric G, or the NPMLE of G, is known to
be a discrete distribution function.
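Because the NPMLE is discrete, the integral reduces to a finite sum. Writing the estimate \hat G as having mass \hat\pi_j at support point \hat\theta_j, j = 1, ..., k (notation borrowed from the finite mixture form, not from the package), the fitted density can be written as:

```latex
f(x; \hat G) = \int f(x; \theta) \, d \hat G(\theta)
             = \sum_{j=1}^{k} \hat\pi_j \, f(x; \hat\theta_j),
\qquad \hat\pi_j \ge 0, \quad \sum_{j=1}^{k} \hat\pi_j = 1.
```

The fitted nonparametric mixture therefore has the same form as a finite mixture, except that the number of support points k is determined by the data rather than fixed in advance.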
A semiparametric mixture model has a density of the form
f(x; G, \beta) = \int f(x; \theta, \beta) d G(\theta),
where G is a mixing distribution that is completely
unspecified and \beta is the structural parameter.
Of the three functions, cnmms is recommended for most
problems; see Wang (2010).
Functions cnmms, cnmpl and cnmap implement
the algorithms CNM-MS, CNM-PL and CNM-AP that are described in
Wang (2010). Their implementations are generic using S3
object-oriented programming, in the sense that they can work for
an arbitrary family of mixture models that is defined by the
user. The user, however, needs to supply the implementations of
the following functions for their self-defined family of mixture
models, as they are needed internally by the functions above:
initial(x, beta, mix, kmax)
valid(x, beta)
logd(x, beta, pt, which)
gridpoints(x, beta, grid)
suppspace(x, beta)
length(x)
print(x, ...)
weight(x, ...)
While not needed by the algorithms, one may also implement
plot(x, mix, beta, ...)
so that the fitted model can be shown graphically in a way that the user desires.
For creating a new class, the user may consult the implementations
of these functions for the families of mixture models included in
the package, e.g., cvp and mlogit.
cnmms(x, init=NULL, maxit=1000, model=c("spmle","npmle"), tol=1e-6,
      grid=100, kmax=Inf, plot=c("null", "gradient", "probability"),
      verbose=0)
cnmpl(x, init=NULL, tol=1e-6, tol.npmle=tol*1e-4, grid=100, maxit=1000,
      plot=c("null", "gradient", "probability"), verbose=0)
cnmap(x, init=NULL, maxit=1000, tol=1e-6, grid=100,
      plot=c("null", "gradient"), verbose=0)
x |
a data object of some class that can be defined fully by the user |
init |
list of user-provided initial values for the mixing
distribution |
maxit |
maximum number of iterations |
model |
the type of model that is to be estimated:
non-parametric MLE ("npmle") or semiparametric MLE ("spmle") |
tol |
a tolerance value that is used to terminate an
algorithm. Specifically, the algorithm is terminated, if the
relative increase of the log-likelihood value after an iteration
is less than tol. |
grid |
number of grid points that are used by the algorithm
to locate all the local maxima of the gradient function. A
larger number increases the chance of locating all local maxima,
at the expense of an increased computational cost. The locations
of the grid points are determined by the function gridpoints. |
kmax |
upper bound on the number of support points. This is particularly useful for fitting a finite mixture model. |
plot |
whether a plot is produced at each iteration. Useful
for monitoring the convergence of the algorithm. If |
verbose |
verbosity level for printing intermediate results in each iteration, including none (= 0), the log-likelihood value (= 1), the maximum gradient (= 2), the support points of the mixing distribution (= 3), the mixing proportions (= 4), and if available, the value of the structural parameter beta (= 5). |
tol.npmle |
a tolerance value that is used to terminate the computing of the NPMLE internally. |
family |
the class of the mixture family that is used to fit to the data. |
num.iterations |
Number of iterations required by the algorithm |
grad |
For |
max.gradient |
Maximum value of the gradient function,
evaluated at the beginning of the final iteration. It is only
given by function |
convergence |
convergence code. |
ll |
log-likelihood value at convergence |
mix |
MLE of the mixing distribution, being an object of the
class |
beta |
MLE of the structural parameter |
Yong Wang <yongwang@auckland.ac.nz>
Wang, Y. (2007). On fast computation of the non-parametric maximum likelihood estimate of a mixing distribution. Journal of the Royal Statistical Society, Ser. B, 69, 185-198.
Wang, Y. (2010). Maximum likelihood computation for fitting semiparametric mixture models. Statistics and Computing, 20, 75-86.
nnls, cnm, cvp, cvps, mlogit.
## Compute the MLE of a finite mixture
x = rnpnorm(100, disc(c(0,4), c(0.7,0.3)), sd=1)
for(k in 1:6) plot(cnmms(x, kmax=k), x, add=(k>1), comp="null", col=k+1,
                   main="Finite Normal Mixtures")
legend("topright", 0.3, leg=paste0("k = ",1:6), lty=1, lwd=2, col=2:7)
## Compute a semiparametric MLE
# Common variance problem
x = rcvps(k=50, ni=5:10, mu=c(0,4), pr=c(0.7,0.3), sd=3)
cnmms(x) # CNM-MS algorithm
cnmpl(x) # CNM-PL algorithm
cnmap(x) # CNM-AP algorithm
# Logistic regression with a random intercept
x = rmlogit(k=30, gi=3:5, ni=6:10, pt=c(0,4), pr=c(0.7,0.3),
            beta=c(0,3))
cnmms(x)
data(toxo) # k = 136
cnmms(mlogit(toxo))
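The documented components of the returned object can also be inspected individually. The following is a minimal sketch, assuming that the functions documented here are provided by the nspmix package and that the fitted object is a list whose components can be accessed with `$`; the seed and the variable names are illustrative only.

```r
# Assumption: the nspmix package is installed; the block is skipped otherwise.
has_nspmix <- requireNamespace("nspmix", quietly = TRUE)
if (has_nspmix) {
  library(nspmix)
  set.seed(1)                       # illustrative seed, for reproducibility
  x <- rcvps(k=50, ni=5:10, mu=c(0,4), pr=c(0.7,0.3), sd=3)
  r <- cnmms(x)                     # CNM-MS algorithm
  print(r$convergence)              # convergence code
  print(r$num.iterations)           # number of iterations used
  print(r$ll)                       # log-likelihood value at convergence
  print(r$mix)                      # MLE of the mixing distribution
  print(r$beta)                     # MLE of the structural parameter
}
```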