dm_nbinom: Create a data model based on a negative binomial distribution
In ONSdigital/Bayesian-demographic-accounts: Bayesian Demographic Accounts


TODO - EDIT THIS

dm_nbinom(data, ratio, disp, nm_series, nm_data = NULL)

`data`	A data frame, described in `data-arg`.
`ratio`	A data frame, identical to `data`, except that the `"count"` variable is replaced by a `"ratio"` variable giving expected coverage ratios.
`disp`	A single number or a data frame. If a data frame, it is identical to `data`, except that the `"count"` variable is replaced by a `"disp"` variable giving values for dispersion.
`nm_series`	The name of the demographic series that `data` describes.
`nm_data`	The name of the dataset. If no value supplied, then `nm_data` is assumed to equal the name of the object supplied as the `data` argument.

Create a data model where the reported value has a negative binomial distribution. The negative binomial distribution uses a mean-dispersion parameterisation.

ratio and disp can each be a data frame or a single number.

ratio can be zero, but disp cannot. Neither can be negative.

The "ratio" column in data frame ratio gives expected coverage ratios, that is, the number of people or events that the dataset is expected to report for each actual person or event. If ratio$ratio[i] is the coverage ratio, and true$count[i] is the true number of people or events, then the expected value for data$count[i] is ratio$ratio[i] * true$count[i].
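As a small illustration of the expected-value relationship described above, the following base R sketch uses made-up numbers. The data frames `true` and `ratio` and their values are hypothetical; they are not objects created by the package.

```r
# Hypothetical true counts and coverage ratios (illustrative values only)
true <- data.frame(age = c("0-9", "10-19", "20-29"),
                   count = c(100, 200, 300))
ratio <- data.frame(age = c("0-9", "10-19", "20-29"),
                    ratio = c(1, 0.9, 1.1))

# Expected reported count for each row: coverage ratio times true count
expected <- ratio$ratio * true$count
expected
```

A ratio below 1 means the dataset is expected to undercount, and a ratio above 1 means it is expected to overcount.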

All elements of ratio$ratio must be non-negative, and an element can only be NA if the corresponding value of data$count is also NA.

The disp argument measures the amount of dispersion beyond what would be expected for a Poisson distribution. It equals the reciprocal of the size argument in NegBinomial. Setting disp to 0 is equivalent to having Poisson variance, and setting disp to a higher number induces greater variability. In general, the less reliable the data source, the higher disp should be.
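The relationship between disp and the size argument of NegBinomial can be checked numerically. The sketch below assumes the standard result from the base R documentation that, with mean mu and size, the variance is mu + mu^2 / size, so that with disp = 1 / size the variance is mu + disp * mu^2. The values of `mu` and `disp` are arbitrary choices for illustration.

```r
mu <- 50          # expected value (illustrative)
disp <- 0.2       # dispersion (illustrative)
size <- 1 / disp  # corresponding 'size' argument in NegBinomial

# Theoretical variance under the mean-dispersion parameterisation,
# compared with the Poisson variance for the same mean
var_nbinom <- mu + disp * mu^2
var_pois <- mu

# Check the theoretical variance by simulation
set.seed(1)
x <- rnbinom(n = 100000, size = size, mu = mu)
c(theoretical = var_nbinom, simulated = var(x), poisson = var_pois)
```

As disp shrinks towards 0, the extra term disp * mu^2 vanishes and the variance approaches the Poisson variance mu.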

disp can be a single number, in which case all values of data have the same dispersion, or it can be a data frame with a column called "disp".

If disp is a single number, it must be non-negative, and cannot be NA. If disp is a data frame, all elements of disp$disp must be non-negative, and an element can only be NA if the corresponding value of data$count is also NA.

If ratio or disp is a data frame, it does not need to have all the variables that are in data. Values for ratio or disp are assumed to be constant across the missing variables. For instance, if disp does not have a time variable, then values for disp are assumed to be constant across time.

If ratio and disp are data frames, then every row in data must map on to them. However, not every row in ratio and disp needs to map on to a row in data: any rows that do not map on to data are silently dropped.
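The mapping rules above can be pictured with a base R join. This is only a sketch of the behaviour described, not necessarily how dm_nbinom is implemented internally; the data frames, variable names, and values are hypothetical.

```r
# Hypothetical data classified by sex and time
data <- expand.grid(sex = c("Female", "Male"),
                    time = 2021:2023,
                    stringsAsFactors = FALSE)
data$count <- c(10, 12, 11, 13, 12, 14)

# disp varies by sex only: it has no time variable, and it
# includes a category ("Other") that does not occur in data
disp <- data.frame(sex = c("Female", "Male", "Other"),
                   disp = c(1.1, 1.2, 1.5))

# Join on the shared variable. Each disp value is repeated across
# time, and the unmatched "Other" row does not appear in the result.
expanded <- merge(data, disp, by = "sex", all.x = TRUE, sort = FALSE)
expanded
```

Every row of `data` finds a match in `disp`, so no dispersion value is missing in the result. If `data` contained a category absent from `disp`, the join would produce NA for that row, which corresponds to the case the function does not allow.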

An object of class "dm_nbinom".

## Use a constant ratio across all categories
## but use higher dispersion for males than for females,
## and higher dispersion for ages 20-29 than for
## other age groups.
reg_popn <- account::gl_reg_popn
ratio <- 1
disp <- within(reg_popn, {
  rm(count)
  disp <- ifelse(gender == "Female", 1.1, 1.2)
  disp <- ifelse(age %in% 20:29, disp * 1.3, disp)
})
reg_popn_dm <- dm_nbinom(data = reg_popn,
                         ratio = ratio,
                         disp = disp,
                         nm_series = "population")
reg_popn_dm