distrom

The R package distrom provides functions for fitting a distributed multinomial regression. The main function, dmr(), takes a matrix of covariates (covars) and a matrix of multinomial counts (counts) as input. It then fits an independent Poisson log regression of the form counts ~ covars for each column of the counts matrix, that is, for each multinomial category. These regressions are estimated in parallel using the parallel and gamlr packages, which allows for easy in-memory parallelization and distribution across multiple machines. This parallelization is essential for use cases such as text analysis, where the counts matrix is built from many tokenized documents and can grow to billions of observations. In the text analysis use case, token counts are modeled as arising from a multinomial distribution that depends on the article attributes contained in the covars matrix.
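
As a minimal sketch of the workflow described above, the following R code simulates a small counts matrix, fits it with dmr() on a two-worker local cluster, and extracts the fitted coefficients. The simulated data, the variable names (x1, x2, a, b, c), and the cluster size are illustrative assumptions and do not come from the package.

```r
library(parallel)
library(Matrix)
library(distrom)

set.seed(1)
n <- 200

# Hypothetical covariates: two standard-normal columns
covars <- matrix(rnorm(n * 2), nrow = n,
                 dimnames = list(NULL, c("x1", "x2")))

# Hypothetical counts for three categories; "a" depends on x1,
# "b" depends on x2, and "c" depends on neither
counts <- cbind(
  a = rpois(n, exp(2 + 0.8 * covars[, "x1"])),
  b = rpois(n, exp(2 - 0.8 * covars[, "x2"])),
  c = rpois(n, exp(2))
)

# Every observation should have a positive total count
stopifnot(all(rowSums(counts) > 0))

# Store the counts as a sparse matrix, as is typical for token counts
counts <- Matrix(counts, sparse = TRUE)

# Start a local cluster with two workers (assumed size; adjust as needed)
cl <- makeCluster(2)

# Fit one Poisson regression per column of counts, in parallel
fits <- dmr(cl, covars, counts)

stopCluster(cl)

# Coefficient matrix: one row for the intercept and each covariate,
# one column per count category
B <- coef(fits)
print(dim(B))
```

The cluster here runs on a single machine. The same call should apply to a cluster that spans several machines, since dmr() only receives the cluster object created by the parallel package.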

To cite this package, use “Taddy (2015), Distributed Multinomial Regression, Annals of Applied Statistics”.

Links

For a description of the functions in the distrom package please read the reference manual: distrom manual

For a detailed explanation of distributed multinomial regression and example use cases see: Taddy (2015), Distributed Multinomial Regression, Annals of Applied Statistics

For information on the related gamlr package, please read the gamlr manual or visit the gamlr repository.

For information on the related textir package, please read the textir manual or visit the textir repository.

Installation

To install the stable version from CRAN:

install.packages("distrom")

To install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("TaddyLab/distrom")


TaddyLab/distrom documentation built on April 6, 2022, 3:47 p.m.