mipfp-package: Multidimensional Iterative Proportional Fitting and...


Description

An implementation of several methods for updating an initial N-dimensional array (called a seed) with respect to given target marginal distributions. Those targets can also be multi-dimensional. The procedures are also able to estimate a (multi-dimensional) contingency table (encoded as an array) matching a given set of (multi-dimensional) margins. In that case, each cell of the seed must simply be set to 1.

The package provides the iterative proportional fitting procedure (IPFP), also known as the RAS algorithm in economics and as matrix raking or matrix scaling in computer science. Additionally, several alternative estimation methods are included, namely the maximum likelihood (ML), minimum chi-squared (CHI2), and weighted least squares (WLSQ) model-based approaches.
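
To make the idea behind the IPFP concrete, here is a minimal base-R sketch for a 2 x 2 table. It is a hand-rolled illustration, not the package's Ipfp code: the loop, the tolerance and the variable name fit are assumptions made for this example. Rows and columns are rescaled in turn until both sets of margins match their targets.

```r
# Hand-rolled IPF on a 2 x 2 table (base R only; not the package's Ipfp code).
seed <- array(1, dim = c(2, 2))   # uninformative seed: every cell set to 1
target.row <- c(87, 13)           # desired row totals
target.col <- c(52, 48)           # desired column totals

fit <- seed
for (iter in 1:100) {
  # scale each row so that the row sums match target.row
  fit <- fit * (target.row / rowSums(fit))
  # scale each column so that the column sums match target.col
  fit <- sweep(fit, 2, target.col / colSums(fit), `*`)
  # stop once the row sums are also (numerically) on target
  if (max(abs(rowSums(fit) - target.row)) < 1e-10) break
}
fit
```

With a seed of ones and one-dimensional margins, the fitted table is the independence table, i.e. the outer product of the two margins divided by the grand total.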

The package also includes an application of the IPFP to simulate and estimate the parameters of multivariate Bernoulli distributions.

Finally, a function is provided that extracts the linearly independent columns from a matrix, hence returning a matrix of full rank.

Details

Package: mipfp
Type: Package
Version: 3.2.1
Date: 2018-08-29
Depends: cmm, numDeriv, Rsolnp, R(>= 2.10.0)
License: GPL-2

This package provides an implementation of several fitting procedures for updating an N-dimensional array with respect to given target marginal distributions. Those targets can also be multi-dimensional. Two fitting functions are available: Ipfp, which implements the IPFP, and ObtainModelEstimates, which implements the ML, CHI2, and WLSQ model-based approaches.

The function Estimate provides an interface to these two methods. Each of them returns an object of class mipfp, but Estimate should be the preferred constructor.
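
As a hedged illustration of the Estimate interface, the sketch below fits the same kind of 2 x 2 problem used in the Examples. It assumes that the mipfp package is installed, that Estimate accepts the arguments seed, target.list and target.data in that order together with method = "ipfp", and that the fitted array is stored in the x.hat component of the result; check ?Estimate before relying on any of these.

```r
# Assumes the mipfp package is installed; the block is skipped otherwise.
if (requireNamespace("mipfp", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mipfp))
  seed <- array(1, dim = c(2, 2))
  target.data <- list(c(52, 48), c(87, 13))  # margins for dimensions 1 and 2
  target.list <- list(1, 2)
  # method = "ipfp" is assumed to select the IPFP (see ?Estimate)
  r.est <- Estimate(seed, target.list, target.data, method = "ipfp")
  print(class(r.est))   # the text above states the class is mipfp
  print(r.est$x.hat)    # fitted table (component name assumed)
}
```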

The package provides several methods and functions to extract information from the resulting object, such as the variance-covariance matrix of the estimated cell probabilities or counts, computed with either Lang's (2004) method or the Delta method (Little and Wu, 1991) (vcov); the confidence intervals of the estimates (confint); and the comparison of the deviations (CompareMaxDev). Note that the functions starting with a lowercase letter are S3 methods for objects of class mipfp, while those starting with an uppercase letter are general functions.

The package also includes an application of the IPFP to simulate and estimate the parameters of multivariate Bernoulli distributions, in the functions RMultBinary and ObtainMultBinaryDist respectively. In addition, the functions Corr2Odds, Odds2Corr, Corr2PairProbs and Odds2PairProbs respectively convert correlation to odds ratio, odds ratio to correlation, correlation to pairwise probability, and odds ratio to pairwise probability.
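
The relationship behind these conversions can be worked out by hand for a single pair of binary variables. The sketch below uses base R only; pair_from_corr is a hypothetical helper written for this illustration and is not the package's Corr2Odds or Corr2PairProbs. It relies on the standard identity P(X1 = 1, X2 = 1) = p1 p2 + rho sqrt(p1 (1 - p1) p2 (1 - p2)).

```r
# Hypothetical helper, base R only (not the package's conversion functions).
pair_from_corr <- function(p1, p2, rho) {
  # joint probability P(X1 = 1, X2 = 1) implied by the correlation rho
  p11 <- p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  p10 <- p1 - p11              # P(X1 = 1, X2 = 0)
  p01 <- p2 - p11              # P(X1 = 0, X2 = 1)
  p00 <- 1 - p11 - p10 - p01   # P(X1 = 0, X2 = 0)
  # rho must be feasible for the given marginal probabilities
  stopifnot(min(p11, p10, p01, p00) >= 0)
  list(p11 = p11, odds = (p11 * p00) / (p10 * p01))
}

res <- pair_from_corr(p1 = 0.5, p2 = 0.5, rho = 0.2)
res$p11   # pairwise probability, 0.3
res$odds  # odds ratio, 2.25
```

A correlation of zero gives an odds ratio of 1, which is a quick sanity check for any such conversion.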

Finally, auxiliary functions are also provided. expand expands a multi-dimensional contingency table (stored in a table) into a data frame of individual records. Array2Vector and Vector2Array transform an array into a vector and vice versa. flat flattens multi-dimensional objects for pretty printing. The function GetLinInd extracts the linearly independent columns from a matrix (using a QR decomposition) and returns a matrix of full rank.
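
The QR-based extraction of linearly independent columns can be sketched in base R as follows. lin_ind_cols is a hypothetical helper written for this illustration, and its tolerance is an assumption; it is not the package's GetLinInd and may differ from it in details such as the ordering of the returned columns.

```r
# Hypothetical base-R sketch (not the package's GetLinInd).
lin_ind_cols <- function(mat, tol = 1e-7) {
  qr.mat <- qr(mat, tol = tol)   # QR decomposition with column pivoting
  # the first `rank` pivot entries index a set of independent columns
  keep <- sort(qr.mat$pivot[seq_len(qr.mat$rank)])
  mat[, keep, drop = FALSE]
}

A <- cbind(c(1, 0, 1), c(2, 0, 2), c(0, 1, 1))  # column 2 = 2 * column 1
B <- lin_ind_cols(A)
B   # keeps columns 1 and 3 of A
```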

Author(s)

Johan Barthelemy and Thomas Suesse.

Maintainer: Johan Barthelemy johan@uow.edu.au.

References

Bacharach, M. (1965). Estimating Nonnegative Matrices from Marginal Data. International Economic Review (Blackwell Publishing) 6 (3): 294-310.

Barthelemy, J., Suesse, T. (2018). mipfp: An R Package for Multidimensional Array Fitting and Simulating Multivariate Bernoulli Distributions. Journal of Statistical Software, Code Snippets 86 (2): 1-20, doi: 10.18637/jss.v086.c02.

Bishop, Y. M. M., Fienberg, S. E., Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press. ISBN 978-0-262-02113-5.

Deming, W. E., Stephan, F. F. (1940). On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known. Annals of Mathematical Statistics 11 (4): 427-444.

Fienberg, S. E. (1970). An Iterative Procedure for Estimation in Contingency Tables. Annals of Mathematical Statistics 41 (3): 907-917.

Golub, G. H., Van Loan, C. F. (2012). Matrix Computations. Third Edition. Johns Hopkins University Press.

Lang, J. B. (2004). Multinomial-Poisson homogeneous models for contingency tables. Annals of Statistics 32 (1): 340-383.

Lee, A. J. (1993). Generating Random Binary Deviates Having Fixed Marginal Distributions and Specified Degrees of Association. The American Statistician 47 (3): 209-215.

Little, R. J., Wu, M. M. (1991). Models for contingency tables with known margins when target and sampled populations differ. Journal of the American Statistical Association 86 (413): 87-95.

Qaqish, B. F., Zink, R. C., Preisser, J. S. (2012). Orthogonalized residuals for estimation of marginally specified association parameters in multivariate binary data. Scandinavian Journal of Statistics 39: 515-527.

Stephan, F. F. (1942). Iterative method of adjusting frequency tables when expected margins are known. Annals of Mathematical Statistics 13 (2): 166-178.

See Also

ipfp for a package implementing the IPFP to solve problems of the form Ax = b.

Examples

# load the package
library(mipfp)
# generation of an initial 2-way table to be updated
seed <- array(1, dim = c(2, 2))
# desired targets (margins)
target.row <- c(87, 13)
target.col <- c(52, 48)
# storing the margins in a list
target.data <- list(target.col, target.row)
# list of the dimensions of each marginal constraint:
# target.data[[k]] is the margin over the dimension(s) in target.list[[k]]
target.list <- list(1, 2)
# calling the fitting methods
r.ipfp <- Ipfp(seed, target.list, target.data)
r.ml <- ObtainModelEstimates(seed, target.list, target.data, method = "ml")
r.chi2 <- ObtainModelEstimates(seed, target.list, target.data, method = "chi2")
r.lsq <- ObtainModelEstimates(seed, target.list, target.data, method = "lsq")

Example output

Loading required package: cmm
Loading required package: Rsolnp
Loading required package: numDeriv

mipfp documentation built on May 2, 2019, 6:01 a.m.