# Fitting Log-Linear Models via Maximum Likelihood

### Description

`loglinML` fits log-linear models by maximum likelihood (ML).
For complete data, it is based on an object of the class `readCatdata`.
For missing data, it is based on an object of the class `satMarML` (under MAR or MCAR).
Depending on the formulation (freedom equations or constraints) and on the model
type (ordinary or generalized log-linear model), different arguments must be supplied.

### Usage


### Arguments

`obj`: an object of the class `readCatdata` (complete data) or `satMarML` (missing data).

`A`: a matrix that specifies the log-linear functions of the probabilities to be
modeled; by default, it is `diag(…)`.

`X`: a model specification matrix for the freedom equation formulation of the ordinary log-linear model.

`U`: a matrix for the constraint formulation of the ordinary log-linear model.

`XL`: a model specification matrix for the freedom equation formulation of the generalized log-linear model.

`UL`: a matrix for the constraint formulation of the generalized log-linear model.

`start`: by default, the function uses the proportions of the complete data as starting
values for the iterative process; this argument lets the user supply alternative
starting values for the model parameters when the freedom equation formulation
is used and the matrix …

`maxit`: the maximum number of iterations (the default is 100).

`trace`: the alternatives are: …

`epsilon1`: the convergence criterion of the iterative process is attained when the
absolute difference between the values of the likelihood ratio statistic in successive
iterations is less than the value of this argument.

`epsilon2`: the convergence criterion of the iterative process is attained when the
absolute differences between the estimates of all parameters of the marginal
probabilities of categorization in consecutive iterations are less than the value
of this argument.

`zeroN`: value used to replace null frequencies in the denominator of the Neyman statistic;
by default, null frequencies are replaced by …

`digits`: integer indicating the number of decimal places to which results are rounded
when shown by …
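The stopping rules described for `maxit`, `trace` and `epsilon1` resemble those of base R's `glm`, which, for a Poisson log-linear model, monitors the deviance (the likelihood ratio statistic against the saturated model). The sketch below is only an analogue, not the algorithm used by `loglinML`: `glm.control` tests the *relative* change `abs(dev - devold) / (abs(dev) + 0.1) < epsilon`, whereas `epsilon1` is described as an absolute difference, and there is no counterpart of `epsilon2`. It fits the independence model to the counts of Example 9.1 of Paulino and Singer (2006), assuming the cells are ordered (1,1), (1,2), (2,1), (2,2) as in `e91.X`.

```r
# Counts of Example 9.1 (assumed cell order: (1,1), (1,2), (2,1), (2,2))
e91 <- data.frame(
  count = c(3, 25, 32, 68),
  row   = factor(c(1, 1, 2, 2)),
  col   = factor(c(1, 2, 1, 2))
)

# Poisson log-linear independence model; maxit and trace play roles similar to
# the loglinML arguments, but epsilon is a relative deviance tolerance
fit <- glm(count ~ row + col, family = poisson, data = e91,
           control = glm.control(epsilon = 1e-10, maxit = 100, trace = FALSE))

# The deviance is the likelihood ratio statistic of the fitted model
m <- fitted(fit)
G2 <- 2 * sum(e91$count * log(e91$count / m))

c(deviance = deviance(fit), G2 = G2,
  iterations = fit$iter, converged = fit$converged)
```

Setting `trace = TRUE` prints the deviance at each iteration, which is the closest base-R counterpart of monitoring the likelihood ratio statistic between iterations.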

### Details

`loglinML` handles both ordinary and generalized log-linear models, under either
a freedom equation formulation or a constraint formulation.
`X` and `U` are used for ordinary log-linear models, and `XL` and `UL` for generalized
log-linear models; `X` and `XL` are used for the freedom equation formulation, and
`U` and `UL` for the constraint formulation.
Namely, the four ways in which the model can be specified are
`log(Theta) = nu + X %*% Beta`, `U %*% log(Theta) = 0`, `A %*% log(Theta) = XL %*% Beta`
and `UL %*% A %*% log(Theta) = 0`, where `nu` are non-estimated parameters included
only to satisfy the natural constraints of the product-multinomial distribution and
`Beta` are the parameters to be estimated.
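For a 2×2 table, the agreement between formulations can be checked by hand. The base-R sketch below (not the package's fitting algorithm) uses the closed-form ML estimates of the cell probabilities under independence for the counts of Example 9.1 of Paulino and Singer (2006), assuming the cells are ordered (1,1), (1,2), (2,1), (2,2) as in `e91.X`. It checks that the constraint `U %*% log(Theta) = 0` holds, that `log(Theta) = nu + X %*% Beta` is satisfied exactly with a common intercept playing the role of `nu`, and that, for the generalized model with `A = c(1, -1, -1, 1)`, `A %*% log(Theta)` evaluated at the observed proportions is the sample log odds ratio, which is what a saturated fit with `XL = 1` should estimate.

```r
# Counts of Example 9.1 (assumed cell order: (1,1), (1,2), (2,1), (2,2))
n <- c(3, 25, 32, 68)
tab <- matrix(n, nrow = 2, byrow = TRUE)
N <- sum(n)

# Closed-form ML estimates of the cell probabilities under independence,
# returned in the same row-major cell order as n
theta_ind <- as.vector(t(outer(rowSums(tab), colSums(tab)))) / N^2

# Constraint formulation: U %*% log(Theta) = 0 (log odds ratio equal to zero)
U <- matrix(c(1, -1, -1, 1), nrow = 1)
constraint <- drop(U %*% log(theta_ind))

# Freedom equation formulation: log(Theta) = nu + X %*% Beta; the lm intercept
# plays the role of the common normalizing constant nu
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1))
fr <- lm(log(theta_ind) ~ X)
beta <- unname(coef(fr)[-1])

# Generalized saturated model: A %*% log(Theta) = Beta with XL = 1; its
# estimate is A applied to the observed proportions (sample log odds ratio)
A <- matrix(c(1, -1, -1, 1), nrow = 1)
log_or <- drop(A %*% log(n / N))

list(constraint = constraint, beta = beta,
     max_residual = max(abs(residuals(fr))), log_or = log_or)
```

Here `beta` contains the log ratios of the row and column totals, the parameterization implied by `e91.X`.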

The generic functions `print` and `summary` are used to print the results and to
obtain a summary thereof.

### Value

An object of the class `loglinML` is a list containing most of the components of the
argument `obj`, as well as the following components:

`thetaH`: vector of ML estimates of all product-multinomial probabilities under the log-linear model for the marginal probabilities of categorization and, in the case of missing data, under the assumption of an ignorable missingness mechanism.

`VthetaH`: corresponding estimated covariance matrix.

`beta`: vector of ML estimates of the parameters of the log-linear model (freedom equation formulation only).

`Vbeta`: corresponding estimated covariance matrix (freedom equation formulation only).

`Fu`: observed log-linear functions, without model constraints.

`VFu`: corresponding estimated covariance matrix.

`FH`: ML estimates of the log-linear functions under the fitted model.

`VFH`: corresponding estimated covariance matrix.

`QvH`: likelihood ratio statistic for testing the goodness of fit of the log-linear model (for missing data, conditional on the assumed missingness mechanism).

`QpH`: Pearson statistic for testing the goodness of fit of the log-linear model (for missing data, conditional on the assumed missingness mechanism).

`QnH`: Neyman statistic for testing the goodness of fit of the log-linear model (for missing data, conditional on the assumed missingness mechanism).

`QwH`: Wald statistic for testing the goodness of fit of the log-linear model (for missing data, conditional on the assumed missingness mechanism).

`glH`: degrees of freedom for testing the goodness of fit of the log-linear model (for missing data, conditional on the assumed missingness mechanism).

`QvHMCAR`: likelihood ratio statistic for the conditional test of both the log-linear model and MCAR given a MAR assumption (missing data only).

`QpHMCAR`: Pearson statistic for the conditional test of both the log-linear model and MCAR given a MAR assumption (missing data only).

`QnHMCAR`: Neyman statistic for the conditional test of both the log-linear model and MCAR given a MAR assumption (missing data only).

`glHMCAR`: degrees of freedom for the conditional test of both the log-linear model and MCAR given a MAR assumption (missing data only).

`ystH`: for complete data, the ML estimates of the frequencies under the log-linear model; for missing data, the ML estimates of the augmented frequencies under both the log-linear model and the assumed missingness mechanism.
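Assuming that, for complete data, `QvH`, `QpH`, `QnH` and `QwH` follow the textbook definitions of the likelihood ratio, Pearson, Neyman and Wald goodness-of-fit statistics, they can be cross-checked in base R. The sketch below does this for the independence model of Example 9.1 (cells assumed ordered (1,1), (1,2), (2,1), (2,2)); the variable names only mirror the component names and this is not the package's code. `zero_n` is a hypothetical stand-in for the `zeroN` replacement of null frequencies in the Neyman denominator (the package default is not reproduced here), and the Wald statistic uses the constraint `U %*% log(Theta) = 0` with the delta-method covariance of the log proportions under multinomial sampling.

```r
# Counts of Example 9.1 (assumed cell order: (1,1), (1,2), (2,1), (2,2))
n <- c(3, 25, 32, 68)
N <- sum(n)
tab <- matrix(n, nrow = 2, byrow = TRUE)

# Fitted frequencies under independence (closed form), row-major order
m <- as.vector(t(outer(rowSums(tab), colSums(tab)))) / N

# Hypothetical replacement for null observed frequencies (Neyman denominator)
zero_n <- 0.5
n_den <- ifelse(n == 0, zero_n, n)

QvH <- 2 * sum(ifelse(n > 0, n * log(n / m), 0))  # likelihood ratio
QpH <- sum((n - m)^2 / m)                         # Pearson
QnH <- sum((n - m)^2 / n_den)                     # Neyman

# Wald statistic for U %*% log(Theta) = 0; covariance of log(p) estimated
# by the delta method as (diag(1/p) - J) / N, with J a matrix of ones
U <- matrix(c(1, -1, -1, 1), nrow = 1)
p <- n / N
f <- U %*% log(p)
V_logp <- (diag(1 / p) - matrix(1, 4, 4)) / N
QwH <- drop(t(f) %*% solve(U %*% V_logp %*% t(U)) %*% f)

glH <- nrow(U)  # one constraint, hence one degree of freedom
c(QvH = QvH, QpH = QpH, QnH = QnH, QwH = QwH, glH = glH)
```

Because the rows of `U` sum to zero, the Wald statistic here reduces to the squared sample log odds ratio divided by `sum(1 / n)`.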

### Author(s)

Frederico Zanqueta Poleto (frederico@poleto.com)

Julio da Motta Singer (jmsinger@ime.usp.br)

Carlos Daniel Paulino (daniel.paulino@math.ist.utl.pt)

with the collaboration of

Fabio Mathias Correa (fmcorrea@uesc.br)

Enio Galinkin Jelihovschi (eniojelihovs@gmail.com)

### References

Paulino, C.D. and Singer, J.M. (2006). *Análise de dados categorizados*
(in Portuguese). São Paulo: Edgard Blücher.

Poleto, F.Z. (2006). *Análise de dados categorizados com omissão* (in
Portuguese). Master's dissertation. IME-USP.
http://www.poleto.com/missing.html.

Poleto, F.Z., Singer, J.M. and Paulino, C.D. (2007). *Analyzing categorical
data with complete or missing responses using the Catdata package*. Unpublished
vignette. http://www.poleto.com/missing.html.

Poleto, F.Z., Singer, J.M. and Paulino, C.D. (2012). A product-multinomial
framework for categorical data analysis with missing responses.
To appear in *Brazilian Journal of Probability and Statistics*.
http://imstat.org/bjps/papers/BJPS198.pdf.

Singer, J.M., Poleto, F.Z. and Paulino, C.D. (2007). Catdata: software for
analysis of categorical data with complete or missing responses. *Actas
de la XII Reunión Científica del Grupo Argentino de Biometría y I Encuentro
Argentino-Chileno de Biometría*.
http://www.poleto.com/SingerPoletoPaulino2007GAB.pdf.

### Examples

```r
# Example 9.1 of Paulino and Singer (2006)
e91.TF<-c(3,25,32,68)
e91.catdata<-readCatdata(TF=e91.TF)
e91.U<-c(1,-1,-1,1)
e91.X<-rbind(c(0,0), c(0,1),
             c(1,0), c(1,1))
e91.X2<-rbind(c(0,0,0),
              c(0,1,0),
              c(1,0,0),
              c(1,1,1))
e91.loglinml1<-loglinML(e91.catdata,U=e91.U)
e91.loglinml2<-loglinML(e91.catdata,X=e91.X)
e91.loglinml3<-loglinML(e91.catdata,X=e91.X2)
e91.loglinml4<-loglinML(e91.catdata,A=c(1,-1,-1,1),XL=1)
# Independence ordinary log-linear model, constraint formulation
e91.loglinml1
# Independence ordinary log-linear model, freedom equation formulation
e91.loglinml2
# Saturated ordinary log-linear model, freedom equation formulation
e91.loglinml3
# Saturated generalized log-linear model, freedom equation formulation
e91.loglinml4
# 95% confidence interval for the log odds ratio and for the odds ratio;
# c() drops any matrix dimensions (Vbeta is a 1 x 1 covariance matrix) so that
# recycling against c(-1,1) is well defined
e91.lor<-c(e91.loglinml4$beta)
e91.ci<-e91.lor+c(-1,1)*qnorm(0.975)*sqrt(c(e91.loglinml4$Vbeta))
round(e91.ci,3)
round(exp(e91.lor),3)
round(exp(e91.ci),3)
# Example 1 of Poleto et al (2012)
smoking.TF<-rbind(c(167,17,19,10,1,3,52,10,11, 176,24,121, 28,10,12),
                  c(120,22,19, 8,5,1,39,12,12, 103, 3, 80, 31, 8,14))
smoking.Zp<-t(rep(1,2))%x%cbind(diag(3)%x%rep(1,3), rep(1,3)%x%diag(3))
smoking.Rp<-rbind(c(3,3),c(3,3))
smoking.catdata<-readCatdata(TF=smoking.TF,Zp=smoking.Zp,Rp=smoking.Rp)
smoking.catdata # Proportions of the complete data
smoking.satmarml<-satMarML(smoking.catdata)
smoking.satmcarml<-satMarML(smoking.catdata,missing="MCAR")
smoking.E<-rbind(c(1,-1,0),c(0,1,-1))
smoking.A<-diag(2)%x%smoking.E%x%smoking.E
smoking.loglin2.marml<-loglinML(smoking.satmarml,A=smoking.A,XL=rep(1,8))
smoking.loglin2.mcarml<-loglinML(smoking.satmcarml,A=smoking.A,XL=rep(1,8))
```
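In the smoking example, `smoking.A <- diag(2) %x% smoking.E %x% smoking.E` appears to build, for each of the two subpopulations, the four local log odds ratios of a 3×3 table, and `XL = rep(1, 8)` sets all eight equal to a single parameter. The base-R sketch below checks the local-log-odds-ratio reading of `E %x% E` on one 3×3 table. The counts are the first nine entries of the first row of `smoking.TF`, assumed here to be the fully classified cases of the first subpopulation in row-major order; the partially classified counts are ignored, so this illustrates only the contrast matrix, not the MAR or MCAR fits.

```r
# First subpopulation, fully classified counts (assumed row-major 3 x 3 table)
n_full <- c(167, 17, 19, 10, 1, 3, 52, 10, 11)
tab <- matrix(n_full, nrow = 3, byrow = TRUE)

# Adjacent-category contrasts, as in smoking.E
E <- rbind(c(1, -1, 0), c(0, 1, -1))

# Kronecker product: each row of E %x% E is one local log odds ratio contrast
local_lor_kron <- drop((E %x% E) %*% log(n_full))

# Direct computation: log(n[i,j] * n[i+1,j+1] / (n[i,j+1] * n[i+1,j]))
local_lor_direct <- c(
  log(tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1])),
  log(tab[1, 2] * tab[2, 3] / (tab[1, 3] * tab[2, 2])),
  log(tab[2, 1] * tab[3, 2] / (tab[2, 2] * tab[3, 1])),
  log(tab[2, 2] * tab[3, 3] / (tab[2, 3] * tab[3, 2]))
)

# With two subpopulations, the full A has 8 rows (log odds ratios)
# and 18 columns (cells)
dim(diag(2) %x% E %x% E)
```

Under this reading, `XL = rep(1, 8)` expresses a uniform association model whose common local log odds ratio is shared by both subpopulations.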