va: VA: Software for Analyzing Verbal Autopsy Data


View source: R/va_base.R

Description

Estimates cause-specific mortality rates in a population where a set of dichotomous symptoms is available, using the relationship between the symptoms and a multicategory cause-of-death variable collected from a nearby medical facility. Estimation is nonparametric.

Usage

va(formula, data=list(hospital=NA,community=NA), nsymp=16, n.subset=300,
   method="quadOpt", fix=NA, bound=NA, prob.wt=1, boot.se=FALSE,
   nboot=300, printit=TRUE, print.reg.size=FALSE, predict.S=FALSE)

Arguments

formula

A formula object. The left side of the formula is the collection of symptoms. The right side is the cause of death. For example, if there are 5 symptoms, named fever, coughing, chestpain, dizziness, and shortbreath, and the cause-of-death variable is death, then the formula can be written as:

formula=cbind(fever, coughing, chestpain, dizziness, shortbreath)~death

or, for short, as: formula=cbind(fever, ... ,shortbreath)~death

Note that the short form requires the symptom variables to be located in a consecutive block of columns in the data, starting with fever and ending with shortbreath. Note also that the current version requires the variable on the right-hand side of the formula (death in this example) to be present in the community sample. If it is unknown in the community sample, the user needs to create such a variable with arbitrary numerical values.
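As a minimal sketch of the inputs described above, the following base R code builds two small simulated data sets and the matching formula. The simulated data, the helper simulate_symptoms, the object names, and the placeholder value 0 are illustrative assumptions, not part of the package. The call to va() is shown as a comment only, using the argument names from Usage.

```r
set.seed(1)
symptoms <- c("fever", "coughing", "chestpain", "dizziness", "shortbreath")

# Hypothetical helper: simulate n deaths with binary (0/1) symptom indicators.
simulate_symptoms <- function(n) {
  as.data.frame(matrix(rbinom(n * length(symptoms), 1, 0.4),
                       nrow = n, dimnames = list(NULL, symptoms)))
}

hospital <- simulate_symptoms(200)
hospital$death <- sample(1:3, 200, replace = TRUE)  # known cause of death

community <- simulate_symptoms(150)
community$death <- 0  # cause unknown: arbitrary numeric placeholder

# Symptoms on the left-hand side, cause of death on the right-hand side.
f <- cbind(fever, coughing, chestpain, dizziness, shortbreath) ~ death

# Variable names must match exactly across the two data sets.
stopifnot(identical(names(hospital), names(community)))

# With the package loaded, the call would look like this (not run here):
# res <- va(f, data = list(hospital = hospital, community = community))
```

Creating the placeholder column in the community data keeps the variable names identical in both data sets, which the data argument requires.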

data

A list of two data sets. The first is the hospital data, which contains the known cause of death for each individual and a collection of symptoms from verbal autopsy studies. The second is the community data, where typically only the symptoms are available. The known cause of death may be available outside the hospital if the data come from a validation study, but it will not be used during estimation. Variable names must be exactly the same in the two data sets.

nsymp

A positive integer specifying the size of the subsets of symptoms drawn from the total set for estimating cause-specific mortality fractions at each iteration. A value for nsymp can be found by calling va_gcv, which uses a general cross-validation method to find the subset size that minimizes the prediction error on the training data (typically the hospital data). For more details, refer to King and Lu (2008). For practical purposes, we give the following recommendations: for a total number of causes of death D<=10, use 7-12 symptoms; for D>10, use 12-18 symptoms. If the number of observations is large in both the hospital and community samples (for example, over 1000 cases in total), use more symptoms; otherwise use fewer. Sensitivity analysis can also be used to choose nsymp. In general, the results stabilize once nsymp is in the right range. The default is 16.
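The rule of thumb above can be written down as a small helper. This is only a sketch: suggest_nsymp_range is a hypothetical function, not part of the package, and it encodes nothing beyond the two ranges stated in the text.

```r
# Hypothetical helper (not part of the VA package): suggested range of
# nsymp for a given number of causes of death D, per the rule of thumb.
suggest_nsymp_range <- function(D) {
  stopifnot(is.numeric(D), length(D) == 1, D >= 1)
  if (D <= 10) c(lower = 7, upper = 12) else c(lower = 12, upper = 18)
}

suggest_nsymp_range(8)   # range for D <= 10
suggest_nsymp_range(15)  # range for D > 10
```

Within the suggested range, larger samples favor the upper end and smaller samples the lower end; a sensitivity analysis would rerun the estimation over several values in the range and compare the results.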

n.subset

A positive integer specifying the total number of symptom subsets to draw, and thus the number of estimations performed. The default is 300.

method

A string specifying the computational procedure used to estimate the cause-specific mortality fractions (CSMF). When method="quadOpt", the CSMF is estimated via constrained quadratic programming; the function solve.QP from the quadprog package is called to perform the constrained quadratic optimization. When method="constrainLS", the CSMF is estimated via constrained least squares. The default is "quadOpt", as it is faster and more stable.
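To illustrate what a constrained quadratic program of this kind looks like, the sketch below uses solve.QP to fit least-squares weights that are non-negative and sum to one. It is not the package's internal code: the simulated X and y are stand-ins chosen for illustration, and the block only runs the solver if quadprog is installed.

```r
# Simulated inputs (assumptions for illustration only).
set.seed(42)
k <- 4                                   # number of causes of death
X <- matrix(runif(30 * k), nrow = 30)    # stand-in for P(symptoms | cause)
truth <- c(0.1, 0.2, 0.3, 0.4)           # fractions used to simulate y
y <- as.vector(X %*% truth) + rnorm(30, sd = 0.01)  # stand-in for P(symptoms)

if (requireNamespace("quadprog", quietly = TRUE)) {
  # solve.QP minimizes -d'b + 0.5 * b'Db subject to t(Amat) %*% b >= bvec,
  # treating the first meq constraints as equalities. For least squares,
  # D = X'X and d = X'y.
  fit <- quadprog::solve.QP(
    Dmat = crossprod(X),
    dvec = as.vector(crossprod(X, y)),
    Amat = cbind(rep(1, k), diag(k)),    # sum(b) = 1, then each b >= 0
    bvec = c(1, rep(0, k)),
    meq  = 1
  )
  csmf <- fit$solution
  print(round(csmf, 3))
}
```

The equality constraint forces the estimated fractions to sum to one, and the inequality constraints keep each fraction non-negative.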

fix

A vector of strings that sets a subset of the cause-specific mortality fractions to predetermined values (based on, e.g., information obtained from other sources). Suppose we would like to fix "d1" at 5% and "d2" at 15%; then fix=c("d1=0.05", "d2=0.15"). The default is NA, meaning no such constraint is imposed.

bound

A vector of strings that specifies lower and upper bounds for a subset of the cause-specific mortality fractions (based on, e.g., information obtained from other sources). Suppose we would like "d3" to be estimated between 5% and 10%, and "d4" between 1% and 2%; then bound=c("0.05<d3<0.1", "0.01<d4<0.02"). The default is NA, meaning no such constraint is imposed.
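When the constraints for fix and bound are stored as numbers, the strings in the format shown above can be assembled with paste0. This is a minimal base R sketch; the object names fix_vals, lower, and upper are illustrative, and the cause labels d1 to d4 are the ones used in the examples above.

```r
# Fixed values, named by cause of death.
fix_vals <- c(d1 = 0.05, d2 = 0.15)
fix <- paste0(names(fix_vals), "=", fix_vals)
fix

# Lower and upper bounds, named by cause of death (same order in both).
lower <- c(d3 = 0.05, d4 = 0.01)
upper <- c(d3 = 0.10, d4 = 0.02)
stopifnot(identical(names(lower), names(upper)), all(lower < upper))
bound <- paste0(lower, "<", names(lower), "<", upper)
bound
```

The resulting character vectors can then be passed as the fix and bound arguments.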

prob.wt

Either a single value (0 or 1) or a vector of weights that determines how likely each symptom is to be selected for a subset. When prob.wt is a user-supplied vector, it must be a vector of probabilities that sum to 1, and its length must equal the total number of symptoms. When prob.wt=1, binomial weights are used, which are proportional to the inverse of the variance of each reported binary symptom variable. When prob.wt=0, all symptoms are equally likely to be selected. The default is 1.
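A user-supplied weight vector of the kind described above could be computed as follows. This is a sketch under assumptions: the symptom matrix S is simulated, and whether the package computes its binomial weights in exactly this way is not confirmed by this page.

```r
set.seed(7)
# Simulated binary symptom matrix: 100 deaths, 5 symptoms with
# different prevalences (illustrative data only).
prev <- c(0.2, 0.35, 0.5, 0.65, 0.8)
S <- sapply(prev, function(p) rbinom(100, 1, p))

p_hat   <- colMeans(S)              # observed prevalence of each symptom
var_hat <- p_hat * (1 - p_hat)      # binomial variance of each symptom
stopifnot(all(var_hat > 0))         # constant symptoms would need special handling

w <- (1 / var_hat) / sum(1 / var_hat)  # inverse-variance weights, sum to 1
round(w, 3)
```

The vector w has one entry per symptom and sums to 1, so it satisfies the stated requirements for a user-supplied prob.wt.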

boot.se

Logical value. If TRUE, bootstrap standard errors of the CSMF are estimated. This typically takes a lot of computing time. The default is FALSE.

nboot

A positive integer. If boot.se=TRUE, it specifies the number of bootstrap samples drawn to estimate the standard errors of the CSMF. The default is 300.

printit

Logical value. If TRUE, the progress of the estimation procedure will be printed on the screen.

print.reg.size

Logical value. If TRUE, the size of the regression matrix is printed at each step of subsampling. This helps the user choose the number of symptoms to subsample. It is recommended to print the size of the regression matrix for different values of nsymp while using a small value of n.subset.

predict.S

Logical value. If TRUE, the predicted probabilities of each symptom in the community sample are estimated. If boot.se is TRUE, predict.S is a matrix with nboot rows and one column for each symptom used in formula. If boot.se is FALSE, predict.S is a vector with one element per symptom. The default is FALSE.

Details

For details, please refer to "Verbal Autopsy Methods with Multiple Causes of Death" (King and Lu, 2008) and http://gking.harvard.edu/va

Value

va outputs a list containing the estimated cause-specific mortality fractions est.CSMF, and the true cause-specific mortality fractions true.CSMF, whenever available.

If boot.se=TRUE, the bootstrap estimates of est.CSMF and their standard errors CSMF.se are reported.

When the causes of death are observed in validation studies, the bootstrap mean (true.CSMF.bootmean) and standard error (true.bootse) of the sample CSMF are also reported.

References

King, Gary and Ying Lu. (2008) "Verbal Autopsy Methods with Multiple Causes of Death", Statistical Science, 23(1). Also available at http://gking.harvard.edu/va


iqss-research/VA-package documentation built on Dec. 20, 2021, 7:58 p.m.