pattnpml.fit: NPML estimation for paired comparison models

pattnpml.fitR Documentation

NPML estimation for paired comparison models

Description

Fits a mixture model to overdispersed paired comparison data using non-parametric maximum likelihood (Aitkin, 1996a).

Usage

pattnpml.fit(formula, random = ~1, k = 1, design,
        tol = 0.5, startp = NULL, EMmaxit = 500, EMdev.change = 0.001,
        seed = NULL, pr.it = FALSE)

Arguments

formula

A formula defining the response (the count of the number of cases of each pattern) and the fixed effects (e.g. y ~ x).

random

A formula defining the random model. If there are three objects labelled o1, o2, o3, set random = ~o1+o2+o3 to model overdispersion. For more details, see below.

k

The number of mass points (latent classes). Up to 21 mass points are supported.

design

The design data frame for paired comparison data as generated using patt.design (mandatory, even if it is attached to the workspace!).

tol

The tol scalar (usually, 0 < tol <= 1). This scalar sets the scaling factor for the locations of the initial mass points. A larger value means that the starting point locations are more widely spread.

startp

Optional numerical vector of length k specifying the starting probabilities for the mass points to initialise the EM algorithm. The default is to take Gaussian quadrature probabilities.

EMmaxit

The maximum number of EM iterations.

EMdev.change

Stops EM algorithm when deviance change falls below this value.

seed

Seed for random weights. If NULL, the seed is set using the system time.

pr.it

A dot is printed at each iteration cycle of the EM algorithm if set to TRUE.

Details

The function pattnpml.fit is a wrapper function for alldistPC which in turn is a modified version of the function alldist from the npmlreg package.

The non-parametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalised linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximised using a standard EM algorithm.

This function extends the NPML approach to allow fitting of overdispersed paired comparison models. It assumes that overdispersion arises because of dependence in the patterns. Fitting a non-parametric random effects term is equivalent to specifying distinct latent classes of response patterns.

The number of components k of the finite mixture has to be specified beforehand.

The EM algorithm used by the function takes the Gauss-Hermite masses and mass points as starting points. The position of the starting points can be concentrated or extended by setting tol smaller or larger, respectively; the initial mass point probabilities of the starting points can also be specified through startp.

Fitting models for overdispersion can be achieved by specifying the paired comparison items as additive terms in the random part of the model formula. A separate estimate for each item and for each mass point is produced.

Fitting subject covariate models with the same effect for each mass point component is achieved by specifying as part of the formula a) a subject factor giving a different estimate for each covariate combination b) an interaction of the chosen subject covariates with the objects. For models with subject factor covariates only, the first term is simply the interaction of all of the factor covariates.

Fitting subject covariate models with a different effect for each mass point component (sometimes called random coefficient models, see Aitkin, Francis, Hinde and Darnell, 2009, pp. 497) is possible by specifying an interaction of the subject covariates with the items in the random term, and also in the formula part. Thus the setting random = ~x:(o1+o2+o3 gives a model with a set of random slopes (one set for each mass point) and a set of random intercepts, one set for each mass point.

The AIC and BIC functions from the stats-package can be used.

Value

The function produces an object of class pattNPML. The object contains the following 29 components:

coefficients

a named vector of coefficients (including the mass points). In case of Gaussian quadrature, the coefficient given at z corresponds to the standard deviation of the mixing distribution.

residuals

the difference between the true response and the empirical Bayes predictions.

fitted.values

the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses.

family

the ‘family’ object used.

linear.predictors

the extended linear predictors eta_ik.

disparity

the disparity (-2logL) of the fitted mixture regression model.

deviance

the deviance of the fitted mixture regression model.

null.deviance

The deviance for the null model (just containing an intercept), comparable with ‘deviance.’

df.residual

the residual degrees of freedom of the fitted model (including the random part).

df.null

the residual degrees of freedom for the null model.

y

the (extended) response vector.

call

the matched call.

formula

the formula supplied.

random

the random term of the model formula.

data

the data argument.

model

the (extended) design matrix.

weights

the case weights initially supplied.

offset

the offset initially supplied.

mass.points

the fitted mass points.

masses

the mass point probabilities corresponding to the patterns.

sdev

a list of the two elements sdev$sdev and sdev$sdevk. The former is the estimated standard deviation of the Gaussian mixture components (estimated over all mixture components), and the latter gives the unequal or smooth component-specific standard deviations. All values are equal if lambda = 0.

shape

a list of the two elements shape$shape and shape$shapek, to be interpreted in analogy to sdev.

rsdev

estimated random effect standard deviation.

post.prob

a matrix of posteriori probabilities.

post.int

a vector of ‘posteriori intercepts’ (as in Sofroniou et al. (2006)).

ebp

the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions.

EMiter

gives the number of iterations of the EM algorithm.

EMconverged

logical value indicating if the EM algorithm converged.

lastglm

the fitted glm object from the last EM iteration.

Misc

contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories.

For further details see the help file for function alldist in package npmlreg.

Note

The mass point probabilities given in the output are the proportion of patterns estimated to contribute to each mass point. To estimate the proportion of cases contributing to each mass point the posterior probabilities need to be averaged over patterns with observed counts as weights (see example below).

Author(s)

Originally translated from the GLIM 4 functions alldist and allvc (Aitkin & Francis, 1995) to R by Ross Darnell (2002). Modified, extended, and prepared for publication by Jochen Einbeck and John Hinde (2006). Adapted for paired comparison modelling by Reinhold Hatzinger and Brian Francis (2009).

References

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6(3), 251–262. doi: 10.1007/BF00140869

Aitkin, M., Francis, B., Hinde, J., & Darnell, R. (2009). Statistical Modelling in R. Oxford: Oxford University Press.

Einbeck, J., & Hinde, J. (2006). A Note on NPML Estimation for Exponential Family Regression Models with Unspecified Dispersion Parameter. Austrian Journal of Statistics, 35(2&3), 233–243.

Sofroniou, N., Einbeck, J., & Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.

See Also

glm

Examples

# two latent classes for paired comparison data
dfr   <- patt.design(dat4, 4)
modPC <- pattnpml.fit(y ~ 1, random = ~o1 + o2 + o3, k = 2, design = dfr)
modPC

# estimated proportion of cases in each mixture component
apply(modPC$post.prob, 2, function(x){ sum(x * dfr$y / sum(dfr$y)) })

## Not run: 
# fitting a model for two latent classes and fixed categorical subject
# covariates to the Eurobarometer 55.2 data (see help("euro55.2.des"))
# on rankings of sources of information on scientific developments

model2cl <- pattnpml.fit(
  y ~ SEX:AGE4 + (SEX + AGE4):(TV + RAD + NEWSP + SCIMAG + WWW + EDINST) - 1,
  random = ~ TV + RAD + NEWSP + SCIMAG + WWW + EDINST,
  k = 2, design = euro55.2.des, pr.it = TRUE)
summary(model2cl)
BIC(model2cl)
## End(Not run)

prefmod documentation built on June 11, 2022, 3 p.m.