BinaryEPPM: Fitting of EPPM models to binary data.

BinaryEPPM R Documentation

Fitting of EPPM models to binary data.

Description

Fits regression models to under- and over-dispersed binary data using extended Poisson process models.

Usage

BinaryEPPM(formula, data, subset = NULL, na.action = NULL, 
       weights = NULL, model.type = "p only", 
       model.name = "EPPM extended binomial", link = "cloglog", 
       initial = NULL, method = "Nelder-Mead", 
       pseudo.r.squared.type = "square of correlation", control = NULL)

Arguments

formula

Formulae for the probability of a success p and the scale-factor. The object used is from the package Formula of Zeileis and Croissant (2010), which allows multiple parts and multiple responses. "formula" should consist of a left hand side (lhs) of a single response variable and a right hand side (rhs) of one or two sets of variables for the linear predictors for the mean and (if two sets) the variance. This is as used for the R function "glm" and also, for example, for the package "betareg" (Cribari-Neto and Zeileis, 2010). The function identifies from the argument "data" whether a data frame (as for use of "glm") or a list has been input. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than two vectors of paired counts of the number responding (r) out of the number tested, as for the data frame. The subordinate functions fit models where the response variables are "p.obs" or "scalef.obs", according to the model type being fitted. The values for these response variables are not input as part of "data"; they are calculated within the function from a list of grouped binary data input. If "model.type" is "p only", "formula" consists of a lhs of the response variable and a rhs of the terms of the linear predictor for the mean model. If "model.type" is "p and scale-factor", the rhs of "formula" has two sets of terms: one for the linear predictor of p (response "p.obs") and one for the linear predictor of the scale-factor (response "scalef.obs").
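As a small illustration of the one-part and two-part forms described above, the following base R sketch builds both formulae for the "ropespores.case" variables used in the Examples section. The intercept-only scale-factor part ("| 1") is an assumption chosen for illustration, not a recommended model.

```r
# One-part formula: linear predictor for p only.
f_p <- number.spores / number.tested ~ 1 + offset(logdilution)

# Two-part formula: the part after "|" is the linear predictor for the
# scale-factor. The intercept-only choice here is purely illustrative.
f_ps <- number.spores / number.tested ~ 1 + offset(logdilution) | 1

# Base R parses "|" as an ordinary operator, so both rhs parts can be inspected.
rhs <- f_ps[[3]]
p_part <- deparse(rhs[[2]])       # terms for p
scalef_part <- deparse(rhs[[3]])  # terms for the scale-factor
```

The first formula would go with model.type = "p only" and the second with model.type = "p and scale-factor".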

data

"data" should be either a data frame (as for use of "glm") or a list. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than a vector of single counts as for the data frame. Only one list is allowed within "data", as it is identified as the dependent variable. If other lists are in "data", for example for use as weights, they should be removed from "data" prior to calling this function. The extracted list can then be supplied through the "weights" argument of this function. Within the function a working list "listcounts" and data frames with components such as "p.obs", "scalef.obs", "covariates", "offset.mean", "offset.variance" are set up. The component "covariates" is a data frame of vectors of covariates in the model. The component "listcounts" is a list of vectors of frequency distributions, or the single pairs of r/n in grouped form if "data" is a data frame.
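The two input layouts can be sketched in base R as follows. The data are hypothetical, and the sketch assumes that each frequency vector has length n + 1, with element k + 1 counting the groups that showed k successes out of n. That reading of "frequency distribution" is an assumption and should be checked against the package data sets.

```r
# Hypothetical data: two dose levels, groups of size 3.
# Assumption: element k + 1 of each vector counts the groups with k successes.
freq <- list(c(2, 1, 0, 1), c(0, 1, 2, 1))
dose <- c(0, 1)

# List form: covariates plus exactly one list-valued response.
list_form <- list(dose = dose, number.success = freq)

# Equivalent data frame form: one row per group, r successes out of n.
expand_one <- function(f, d) {
  n <- length(f) - 1L
  data.frame(dose = d, r = rep(0:n, times = f), n = n)
}
frame_form <- do.call(rbind, Map(expand_one, freq, dose))
```

Under this assumption the list form is a compact summary of the data frame form, with one frequency vector per covariate pattern.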

subset

Subsetting commands.

na.action

Action taken for NAs in data.

weights

Vector or list of lists of weights.

model.type

Takes one of two values, i.e., "p only" or "p and scale-factor". The "p only" value fits a linear predictor function to the parameter a in equation (3) of Faddy and Smith (2012). If the model type being fitted is binomial, modeling a is the same as modeling the mean. For the negative binomial the mean is b (exp(a) - 1), b also being as in equation (3) of Faddy and Smith (2012). The "p and scale-factor" value fits linear predictor functions to both the probability of a success p and the scale-factor.

model.name

If model.type is "p only" the model being fitted is one of the four "binomial", "EPPM extended binomial", "beta binomial", "correlated binomial". If model.type is "p and scale-factor" the model being fitted is either "EPPM extended binomial" i.e. as equations (4) and (6) of Faddy and Smith (2012) or one of the two "beta binomial", "correlated binomial".

link

Takes one of nine values i.e., 'logit', 'probit', 'cloglog', 'cauchit', 'log', 'loglog', 'double exponential', 'double reciprocal', 'power logit'. The default is 'cloglog'. The 'power logit' has an attribute of 'power' for which the default is 1 i.e., a logit link.
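To make the default concrete, here is a minimal base R sketch of the 'cloglog' link and its inverse using "make.link" from the stats package. It only illustrates the transformation; whether BinaryEPPM uses "make.link" internally is not stated on this page.

```r
# Complementary log-log link from base R's stats package.
lk <- make.link("cloglog")

p <- c(0.1, 0.3, 0.9)
eta <- lk$linkfun(p)        # eta = log(-log(1 - p))
p_back <- lk$linkinv(eta)   # p = 1 - exp(-exp(eta))
```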

initial

This is a vector of initial values for the parameters. If this vector is NULL then initial values based on fitting binomial models using "glm" are calculated within the function.
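The idea of glm-based starting values can be sketched as below. The data are hypothetical, and this is only an approximation of what the description says: the package's internal calculation, and the number and order of any extra parameters in "initial", are not documented here.

```r
# Hypothetical grouped binary data: r successes out of n at each dose.
dat <- data.frame(dose = c(0, 1, 2, 3),
                  r = c(2, 5, 11, 17),
                  n = c(20, 20, 20, 20))

# Binomial GLM with the same link as the BinaryEPPM default.
fit0 <- glm(cbind(r, n - r) ~ dose, data = dat,
            family = binomial(link = "cloglog"))

# Coefficients that could serve as starting values for the p parameters.
start_p <- coef(fit0)
```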

method

Takes one of the two values "Nelder-Mead" or "BFGS", these being methods of "optim".

pseudo.r.squared.type

Takes one of the three values "square of correlation", "R square" or "max-rescaled R square". The default, "square of correlation", is as used in Cribari-Neto and Zeileis (2010) and is the square of the correlation between the observed and predicted values on the GLM linear predictor scale. The other two are as described in Cox and Snell (1989) and Nagelkerke (1991), and apply to logistic regression.
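The two likelihood-based measures can be sketched in base R for an ordinary logistic regression. The data are hypothetical, and the code follows the textbook definitions rather than BinaryEPPM's internal code, which is not shown on this page.

```r
# Hypothetical ungrouped binary data.
x <- 1:10
y <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1)
n_obs <- length(y)

fit1 <- glm(y ~ x, family = binomial)   # fitted model
fit0 <- glm(y ~ 1, family = binomial)   # null (intercept-only) model
ll1 <- as.numeric(logLik(fit1))
ll0 <- as.numeric(logLik(fit0))

# "R square": 1 - (L0 / L1)^(2 / n).
r2 <- 1 - exp(2 * (ll0 - ll1) / n_obs)

# "max-rescaled R square": divide by the maximum attainable value of r2.
r2_max <- r2 / (1 - exp(2 * ll0 / n_obs))
```

The rescaling is needed because the first measure cannot reach 1 for discrete data.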

control

"control" is a list of control parameters as used in "optim". If this list is NULL the defaults for "optim" are set as "control <- list(fnscale=-1, trace=0, maxit=1000)". The control parameters that can be changed by inputting a variable length list are "fnscale, trace, maxit, abstol, reltol, alpha, beta, gamma". Details of "optim" and its control parameters are available in the online R help manuals.
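The role of "fnscale = -1" in the default control list can be seen in a small stand-alone "optim" call. The log-likelihood and data below are hypothetical and unrelated to the package internals. They only show that a negative "fnscale" turns "optim" from a minimiser into a maximiser.

```r
# Hypothetical data: 7 successes out of 20 trials.
r <- 7
n <- 20

# Binomial log-likelihood as a function of the logit of p.
loglik <- function(theta) {
  dbinom(r, size = n, prob = plogis(theta), log = TRUE)
}

# fnscale = -1 makes optim maximise; the other values match the stated defaults.
ctrl <- list(fnscale = -1, trace = 0, maxit = 1000)
fit <- optim(par = 0, fn = loglik, method = "BFGS", control = ctrl)

p_hat <- plogis(fit$par)  # close to r / n
```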

Value

An object of class "BinaryEPPM" is returned. A list of object items follows.

data.type

The type of the data i.e., data frame or list

list.data

Data as a list of lists of frequency distributions

call

The call of the function

formula

The formula argument

model.type

The type of model being fitted

model.name

The model being fitted

link

The link function

covariates.matrix.p

The design matrix for the probability of a success

covariates.matrix.scalef

The design matrix for the scale-factor

offset.p

The offset vector for the probability of a success

offset.scalef

The offset vector for the scale-factor

coefficients

Estimates of model parameters

loglikelihood

Loglikelihood

vcov

The variance/covariance matrix

n

The number of observations

nobs

The number of observations

df.null

The degrees of freedom of the null model

df.residual

The degrees of freedom of the residual

vnmax

Vector of maximums of grouped count data vectors in list.counts

weights

Vector or list of weights

converged

Whether the iterative process converged, TRUE or FALSE

iterations

Number of iterations taken

method

Method for optim either Nelder-Mead or BFGS

pseudo.r.squared

Pseudo R**2 value

start

Starting values for iterative process

optim

Estimates of model parameters

control

Control parameters for optim

fitted.values

Fitted values for probability of success

y

Dependent variable

terms

Terms in model fitted

Author(s)

David M. Smith <dmccsmith@verizon.net>

References

Cox DR, Snell EJ. (1989). Analysis of Binary Data. Second Edition. Chapman & Hall.

Cribari-Neto F, Zeileis A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1-24. doi:10.18637/jss.v034.i02.

Grun B, Kosmidis I, Zeileis A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1-25. doi:10.18637/jss.v048.i11.

Faddy M, Smith D. (2012). Extended Poisson Process Modeling and Analysis of Grouped Binary Data. Biometrical Journal, 54, 426-435. doi:10.1002/bimj.201100214.

Nagelkerke NJD. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78, 691-692.

Smith D, Faddy M. (2019). Mean and Variance Modeling of Under-Dispersed and Over-Dispersed Grouped Binary Data. Journal of Statistical Software, 90(8), 1-20. doi:10.18637/jss.v090.i08.

Zeileis A, Croissant Y. (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1-13. doi:10.18637/jss.v034.i01.

See Also

CountsEPPM betareg

Examples

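library(BinaryEPPM)

# Load the rope spores data set shipped with the package.
data("ropespores.case")

# Fit a binomial model for p: intercept plus the log-dilution offset.
output.fn <- BinaryEPPM(number.spores / number.tested ~ 1 + offset(logdilution),
                        data = ropespores.case,
                        model.type = "p only", model.name = "binomial")
summary(output.fn)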

BinaryEPPM documentation built on June 22, 2024, 10:30 a.m.