nbc: Train an NBC model

View source: R/nbc4va_main.R

nbc R Documentation

Train an NBC model

Description

Performs supervised Naive Bayes Classification on verbal autopsy data.
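
As a rough orientation, and not a statement of this package's exact estimator, a standard Naive Bayes classifier over binary symptoms scores each candidate cause c for a case with symptoms s_1, ..., s_n as shown below. The symbols c, s_j and n are introduced here for illustration only.

```latex
P(c \mid s_1, \dots, s_n) \;\propto\; P(c) \prod_{j=1}^{n} P(s_j \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} P(s_j \mid c)
```

Here P(c) is the prior probability of cause c and P(s_j | c) is the probability of symptom value s_j given that cause, both estimated from the train data. How the package estimates these terms is left to the paper listed under References.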

Usage

nbc(train, test, known = TRUE)

Arguments

train

Dataframe of verbal autopsy train data (See Data documentation).

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n..

  • ID (vectorof char): unique case identifiers

  • Cause (vectorof char): observed causes for each case

  • Symptom-n.. (vectorsof (1 OR 0)): 1 for presence, 0 for absence; any other value is treated as unknown

  • Unknown symptoms are imputed randomly from the distribution of 1s and 0s in each symptom column; if a column contains no 1s or 0s, it is removed
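
The imputation rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's own code: the helper name impute_unknown is hypothetical, and drawing with replacement from the observed 1s and 0s is one plausible reading of "imputed randomly from distributions of 1s and 0s".

```r
# Hypothetical helper (not part of nbc4va): fill unknown symptom values by
# sampling from the observed 1s and 0s of the same column, and drop columns
# that contain no 1s or 0s at all.
impute_unknown <- function(symptoms) {
  keep <- logical(ncol(symptoms))
  for (j in seq_along(symptoms)) {
    col <- symptoms[[j]]
    known <- !is.na(col) & col %in% c(0, 1)
    keep[j] <- any(known)
    if (keep[j] && !all(known)) {
      observed <- col[known]
      # Sample indices rather than values to avoid sample()'s special
      # handling of length-one numeric input
      draw <- sample.int(length(observed), sum(!known), replace = TRUE)
      col[!known] <- observed[draw]
      symptoms[[j]] <- col
    }
  }
  symptoms[, keep, drop = FALSE]
}

set.seed(1)  # make the random draws reproducible
symptoms <- data.frame(
  S1 = c(1, 0, 99, 1),     # 99 stands in for an unknown value
  S2 = c(99, 99, 99, 99),  # no 1s or 0s: the column is dropped
  S3 = c(0, NA, 0, 0)      # only 0s observed: NA becomes 0
)
imputed <- impute_unknown(symptoms)
print(imputed)
```

Because the draws are random, the value imputed into S1 can differ between runs unless a seed is set.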

Example:

ID Cause S1 S2 S3
"a1" "HIV" 1 0 0
"b2" "Stroke" 0 0 1
"c3" "HIV" 1 1 0

test

Dataframe of verbal autopsy test data in the same format as train, with one exception when the causes are not known:

  • The 2nd column (Cause) can be omitted if known is FALSE

known

TRUE if the test causes are available in the 2nd column; FALSE if they are not known.
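
As a sketch of how the three arguments fit together, the base R snippet below builds a train dataframe from the example rows shown for train, plus a test dataframe with and without the Cause column. The test case ids ("d4", "e5") and their values are made up for illustration, and the nbc() calls are shown as comments so that the snippet runs without the package installed.

```r
# Train data: ID, Cause, then one column per symptom (rows from the example)
train <- data.frame(
  ID = c("a1", "b2", "c3"),
  Cause = c("HIV", "Stroke", "HIV"),
  S1 = c(1, 0, 1),
  S2 = c(0, 0, 1),
  S3 = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Test data with known causes (illustrative, made-up cases)
test <- data.frame(
  ID = c("d4", "e5"),
  Cause = c("Stroke", "HIV"),
  S1 = c(0, 1),
  S2 = c(1, 0),
  S3 = c(1, 0),
  stringsAsFactors = FALSE
)

# Test data without causes: drop the Cause column and keep the rest in order
test_unknown <- test[, names(test) != "Cause", drop = FALSE]

# With nbc4va loaded, the corresponding calls would be (not run here):
#   nbc(train, test, known = TRUE)
#   nbc(train, test_unknown, known = FALSE)
print(names(test_unknown))
```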

Value

out: The resulting nbc list object, containing:

  • $prob.causes (vectorof double): the probabilities for each test case prediction by case id

  • $pred.causes (vectorof char): the predictions for each test case by case id

  • Additional values:

    • * indicates that the value is only available if test causes are known

    • $train (dataframe): the input train data

    • $train.ids (vectorof char): the ids of the train data

    • $train.causes (vectorof char): the causes of the train data by case id

    • $train.samples (double): the number of input train samples

    • $test (dataframe): the input test data

    • $test.ids (vectorof char): the ids of the test data

    • $test.causes* (vectorof char): the causes of the test data by case id

    • $test.samples (double): the number of input test samples

    • $test.known (logical): whether the test causes are known

    • $symptoms (vectorof char): all unique symptoms in order

    • $causes (vectorof char): all possible unique causes of death

    • $causes.train (vectorof char): all unique causes of death in the train data

    • $causes.test* (vectorof char): all unique causes of death in the test data

    • $causes.pred (vectorof char): all unique causes of death in the predicted cases

    • $causes.obs* (vectorof char): all unique causes of death in the observed cases

    • $pred (dataframe): a table of predictions for each test case, sorted by probability

      • Columns (in order): CaseID, TrueCause, Prediction-1 to Prediction-n..

      • CaseID (vectorof char): case identifiers

      • TrueCause* (vectorof char): the observed causes of death

      • Prediction-n.. (vectorsof char): the predicted causes of death, where Prediction1 is the most probable cause, and Prediction-n is the least probable cause

      Example:

      CaseID Prediction1 Prediction2
      "a1" "HIV" "Stroke"
      "b2" "Stroke" "HIV"
      "c3" "HIV" "Stroke"

    • $obs* (dataframe): a table of observed causes matching $pred for each test case

      • Columns (in order): CaseID, TrueCause

      • CaseID (vectorof char): case identifiers

      • TrueCause (vectorof char): the actual cause of death if applicable

      Example:

      CaseID TrueCause
      "a1" "HIV"
      "b2" "Stroke"
      "c3" "HIV"

    • $obs.causes* (vectorof char): all observed causes of death by case id

    • $prob (dataframe): a table of probabilities of each cause for each test case

      • Columns (in order): CaseID, Cause-1 to Cause-n..

      • CaseID (vectorof char): case identifiers

      • Cause-n.. (vectorsof double): probabilities for each cause of death

      Example:

      CaseID HIV Stroke
      "a1" 0.5 0.5
      "b2" 0.3 0.7
      "c3" 0.9 0.1

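The relationship between the $prob and $pred tables described above can be sketched in base R: ranking the causes of each case by decreasing probability gives the Prediction columns. This is an illustration built on the example $prob values, not the package's own code. Breaking ties by column order is an assumption of this sketch.

```r
# Probability table shaped like the $prob example
prob <- data.frame(
  CaseID = c("a1", "b2", "c3"),
  HIV = c(0.5, 0.3, 0.9),
  Stroke = c(0.5, 0.7, 0.1),
  stringsAsFactors = FALSE
)

causes <- names(prob)[-1]

# For each case, order the cause names from most to least probable.
# order(-p) keeps tied causes in their original column order.
ranked <- t(apply(prob[, causes], 1, function(p) causes[order(-p)]))

# Assemble a table shaped like the $pred example
pred <- data.frame(CaseID = prob$CaseID, ranked, stringsAsFactors = FALSE)
names(pred)[-1] <- paste0("Prediction", seq_along(causes))
print(pred)
```

With these inputs the ranking reproduces the rows of the $pred example, including case "a1", where the two causes are tied at 0.5.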
References

  • Miasnikof P, Giannakeas V, Gomes M, Aleksandrowicz L, Shestopaloff AY, Alam D, Tollman S, Samarikhalaj A, Jha P. Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths. BMC Medicine. 2015;13:286. doi:10.1186/s12916-015-0521-2.

See Also

Other main functions: plot.nbc(), print.nbc_summary(), summary.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run the Naive Bayes classifier on random train and test data
# Set "known" to indicate whether or not "test" causes are known
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test, known=TRUE)

# Obtain the probabilities and predictions
prob <- results$prob.causes
pred <- results$pred.causes
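
When the test causes are known, the predictions can be compared against the observed causes. The sketch below uses made-up stand-in vectors so that it runs without the package. It assumes that $pred.causes and $test.causes are character vectors named by case id, which is one reading of "by case id" in the Value section.

```r
# Stand-in values (made up for illustration); in a real run these would be
# results$pred.causes and results$test.causes from nbc(train, test, known = TRUE)
pred_causes <- c(a1 = "HIV", b2 = "Stroke", c3 = "Stroke")
test_causes <- c(a1 = "HIV", b2 = "Stroke", c3 = "HIV")

# Align both vectors by case id before comparing
ids <- names(test_causes)
correct <- pred_causes[ids] == test_causes[ids]

# Share of cases where the top prediction matches the observed cause
accuracy <- mean(correct)

# Cross-tabulate observed against predicted causes
confusion <- table(observed = test_causes[ids], predicted = pred_causes[ids])
print(accuracy)
print(confusion)
```

This is only a hand-rolled check; the package's own summary.nbc(), listed under See Also, is the documented way to summarize a result.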


rrwen/nbc4va documentation built on May 11, 2022, 9:45 p.m.