GUCfit: Predicting the Underlying Causes of Death
In dachuwu/TBDtoolbox: The toolbox for Taiwan Burden of Disease


Fit a redistribution model to predict the Underlying Causes (UCs) from Garbage Codes (GCs). NHIRC-Usable.

## S3 method for class 'formula'
GUCfit(
  formula,
  data,
  gc_to_uc,
  nm_id = "id",
  method = c("NB", "MLR"),
  prop_valid = 0.2,
  ...
)

GUCfit(formula, ...)

## S3 method for class 'GUCfit'
print(x)

## S3 method for class 'GUCfit'
summary(x)

`formula`	an object of class "formula": a symbolic description of the model to be fitted. This should contain an outcome variable (as the underlying causes) and predictor variables if any. Predictor variables put inside `multi()` are recognized as multiple causes. See Details §1 also.
`data`	a data.frame or a list (with equal-length vectors) containing all the variables.
`gc_to_uc`	a named matrix specifying the a priori constraints for GC-UC mapping. The row names are used as the Garbage Code (GC) levels and the column names are used as the Underlying Cause (UC) levels. This is a required argument. See Details §2 also.
`nm_id`	variable name of the identity key for the individual record.
`method`	one of the following redistribution models (see Details §3). "NB": Naive Bayes classifier (default). "MLR": Multinomial Logistic Regression, implemented by `nnet::multinom()`.
`prop_valid`	proportion of the data used for validation (defaults to 0.2). See Details §4 also.
`...`	optional arguments passed to the redistribution methods. `alp = 0.1` (default): the smoothing parameter in the "NB" method, i.e. the additional count added to every stratum of the conditional probabilities. `maxit = 100` (default): the maximum number of iterations in the "MLR" method. Additional arguments for `nnet` can also be specified here; see `nnet::nnet()` for details.

§1. Specify the model formula

The formula argument takes the general model form used by lm() or glm(). It should look like GUC ~ x1 + x2 + multi(MC1, MC2), where GUC is the name of the outcome variable (the underlying causes of death, UCs). The right-hand side of ~ contains the names of the predictors used by the model. Here, x1 and x2 represent ordinary predictor variables, as in common regression models.

multi(...) specifies the multiple causes of death (here, MC1 and MC2), and each method treats it differently. In "NB", the variables in multi(...) are treated as item-sets when calculating the conditional probabilities; see the reference paper for details. In "MLR", multi(...) is transformed into a set of binary variables, each indicating whether a given cause-of-death item exists in the record.

There should be only one multi(...) term in the formula. Also, only factor or character variables are accepted.
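The binary-indicator idea described for "MLR" can be illustrated with base R. This is a minimal sketch of the general technique, not the package's internal code; the toy data frame, the cause codes, and the names `mc_cols`, `codes`, and `ind` are assumptions made for illustration.

```r
# Toy data: three records with up to two multiple-cause fields (hypothetical codes).
d <- data.frame(
  id  = 1:3,
  MC1 = c("I10", "E11", "I10"),
  MC2 = c("E11", NA, "J18"),
  stringsAsFactors = FALSE
)

# All distinct cause codes observed in any multiple-cause field.
# sort() drops NA values by default.
mc_cols <- c("MC1", "MC2")
codes <- sort(unique(unlist(d[mc_cols], use.names = FALSE)))

# One 0/1 column per code: 1 if the code appears in any field of the record.
ind <- sapply(codes, function(code) {
  as.integer(rowSums(d[mc_cols] == code, na.rm = TRUE) > 0)
})
rownames(ind) <- d$id
ind
```

Each row of `ind` marks which cause items are present in the record, regardless of the field in which they were written.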

§2. Specify the GC-UC mapping constraints

The GC-UC mapping constraints gc_to_uc should be a named matrix. Both the row names and the column names are required: the row names define the GC categories and the column names define the UC categories. The entries of this matrix $A$ should be binary: $A_{ij} = 1$ permits redistributing the $i$-th GC category to the $j$-th UC category, and $A_{ij} = 0$ forbids it.
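A constrained matrix of this shape could be built as follows. This is a sketch only: the category labels are hypothetical, and the final check (every GC has at least one permitted UC) is an assumption about what a sensible constraint matrix looks like, not a documented requirement of GUCfit().

```r
# Hypothetical category labels, for illustration only.
ucs <- c("UC_stroke", "UC_ihd", "UC_diabetes")
gcs <- c("GC_heart_failure", "GC_ill_defined")

# Start from "nothing allowed", then switch on the permitted GC -> UC pairs.
A <- matrix(0, nrow = length(gcs), ncol = length(ucs),
            dimnames = list(gcs, ucs))
A["GC_heart_failure", c("UC_stroke", "UC_ihd")] <- 1  # restricted targets
A["GC_ill_defined", ] <- 1                             # may go to any UC

# Sanity checks before passing A as gc_to_uc.
stopifnot(all(A %in% c(0, 1)))                           # binary entries
stopifnot(!is.null(rownames(A)), !is.null(colnames(A)))  # named matrix
stopifnot(all(rowSums(A) > 0))  # assumed: each GC has at least one target UC
```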

§3. Redistributing GCs to UCs

Records whose outcome variable holds a UC (as defined in gc_to_uc) are used to train the "NB" or "MLR" model. The trained model then predicts the UCs for the records whose outcome is a GC. Generally, "NB" is recommended because it handles missing data and a large number of UC categories better (it is more accurate and more efficient in that setting). However, "MLR" can perform better with more complete data and a small number of UC categories. We recommend using the validation procedure to compare the two methods before full implementation.
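One plausible way for the gc_to_uc constraints to interact with a trained classifier is to zero out the forbidden UCs and renormalize the remaining probabilities. The sketch below shows only that idea; whether GUCfit() applies the constraints in exactly this way is an assumption, and the helper `constrain()`, the labels, and the numbers are all hypothetical.

```r
# Permission matrix: rows are GCs, columns are UCs (hypothetical labels).
A <- matrix(c(1, 1, 0,
              1, 1, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("GC_a", "GC_b"), c("UC_1", "UC_2", "UC_3")))

# Unconstrained UC probabilities from some trained classifier (made-up numbers).
p <- c(UC_1 = 0.2, UC_2 = 0.3, UC_3 = 0.5)

# Zero out the UCs this GC may not be redistributed to, then renormalize
# so that the remaining probabilities sum to 1.
constrain <- function(p, gc, A) {
  masked <- p * A[gc, names(p)]
  masked / sum(masked)
}

constrain(p, "GC_a", A)  # UC_3 is excluded; UC_1 and UC_2 become 0.4 and 0.6
constrain(p, "GC_b", A)  # no restriction; probabilities are unchanged
```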

§4. Validation and the error measures

When the validation proportion (prop_valid = 0.2 by default) is greater than zero, a random subset of the records with UCs, of that proportion, is reserved for validation. Binary and cross-entropy error measures are used to evaluate the model performance. Call summary() on the returned GUCfit object to see the average errors in the training and validation partitions.
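The two error measures can be sketched in base R using their common textbook definitions. The exact formulas used inside the package may differ, so treat the definitions below as assumptions; the probability matrix and the true labels are made-up numbers.

```r
# Predicted UC probabilities for three validation records (made-up numbers).
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.6, 0.3,
                 0.5, 0.4, 0.1),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("UC_1", "UC_2", "UC_3")))
truth <- c("UC_1", "UC_3", "UC_1")

# Binary (0/1) error: share of records whose most probable UC is not the true UC.
pred <- colnames(prob)[max.col(prob, ties.method = "first")]
binary_error <- mean(pred != truth)

# Cross-entropy error: mean negative log-probability assigned to the true UC.
p_true <- prob[cbind(seq_along(truth), match(truth, colnames(prob)))]
cross_entropy <- -mean(log(p_true))

c(binary = binary_error, cross_entropy = cross_entropy)
```

The binary error only checks the top-ranked UC, whereas the cross-entropy also penalizes a model that assigns low probability to the true UC even when the ranking is correct.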

A GUCfit object containing the following components.

formula: The formula, same as the input.
pred_GUC: The predicted UC probabilities for each GC record. A data.frame whose rows identify the individual records and whose columns identify the UC categories. The key nm_id is preserved to identify individual predictions.
dat_info: A data.frame summarizing the number of records used for training, validation, and prediction.
error_info: A data.frame summarizing the error measures.
fit: The fitted model.
gcs: A character vector listing the GC levels, same as the rownames of gc_to_uc.
ucs: A character vector listing the UC levels, same as the colnames of gc_to_uc.
method: The modeling method.
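The predicted probabilities can be used either as hard assignments or as fractional redistributions. The sketch below works on a stand-in data frame; the assumed layout (one `id` column plus one probability column per UC) and all names and numbers are illustrative, so check the structure of the actual pred_GUC component before adapting it.

```r
# Stand-in for the pred_GUC component: an id column plus one
# probability column per UC (assumed layout, made-up numbers).
pred_GUC <- data.frame(
  id   = c("a01", "a02"),
  UC_1 = c(0.6, 0.1),
  UC_2 = c(0.3, 0.2),
  UC_3 = c(0.1, 0.7)
)

uc_cols <- setdiff(names(pred_GUC), "id")
probs <- as.matrix(pred_GUC[uc_cols])

# Hard assignment: the most probable UC for each GC record.
best <- max.col(probs, ties.method = "first")
top_uc <- data.frame(
  id   = pred_GUC$id,
  UC   = uc_cols[best],
  prob = probs[cbind(seq_len(nrow(probs)), best)]
)

# Fractional redistribution: expected number of GC deaths moved to each UC.
expected <- colSums(probs)
```

Summing the probabilities keeps the total number of redistributed deaths equal to the number of GC records, which a hard assignment also does but with more variance across categories.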

print(GUCfit): Print the basics (GC/UC levels, redistribution method) of a GUCfit object.
summary(GUCfit): Print the details (variable importance, errors) of a GUCfit object.

Ng, T. C., Lo, W. C., Ku, C. C., Lu, T. H., & Lin, H. H. (2020). Improving the use of mortality data in public health: A comparison of garbage code redistribution models. American Journal of Public Health, 110(2), 222-229.

multideath for the demo dataset, nnet::multinom for the underlying MLR method.

## Not run: 
# load demo dataset
data("multideath")

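# create a full gc_to_uc matrix (every GC may be redistributed to every UC);
# rows are the 10 GC levels, columns are the 97 UC levels
gucs <- sort(unique(multideath$GUC))
gc_to_uc <- matrix(1, 10, 97, dimnames = list(gucs[98:107], gucs[1:97]))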

# predictors have to be factors or characters
d <- multideath
d$x1 <- factor(d$x1)
d$x2 <- factor(d$x2)
d$x3 <- factor(d$x3)

# fit a NB model
fit1 <- GUCfit(
  formula = GUC ~ age + x1  + x2  + x3 + multi(MC1, MC2, MC3),
  data = d, gc_to_uc = gc_to_uc,
  nm_id = "id", method = "NB", prop_valid = 0.2)

# summarizing the results
summary(fit1)

## End(Not run)

dachuwu/TBDtoolbox documentation built on Dec. 27, 2021, 8:11 p.m.
