internalCheckNBC: Check arguments for nbc()

View source: R/nbc4va_validation.R

Description

Checks that the arguments passed to internalNBC are valid. Where appropriate, it also cleans the inputs automatically and issues warning messages describing each cleaning step.

Usage

internalCheckNBC(train, test, known = TRUE, assume = FALSE, unknown = 99)

Arguments

train

Dataframe of verbal autopsy training data (see the Data documentation).

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n

  • ID (vector of char): unique case identifiers

  • Cause (vector of char): observed cause for each case

  • Symptom-1 to Symptom-n (vectors of 1 or 0): 1 for presence, 0 for absence; other values are treated as unknown

  • Unknown symptom values are imputed randomly from the distribution of 1s and 0s in each symptom column; if a column contains no 1s or 0s, it is removed

Example:

ID   Cause    S1 S2 S3
"a1" "HIV"    1  0  0
"b2" "Stroke" 0  0  1
"c3" "HIV"    1  1  0

test

Dataframe of verbal autopsy test data in the same format as train, except when the causes are not known:

  • The 2nd column (Cause) can be omitted if known is FALSE
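The example table for train can be built as an R data frame as shown below. stringsAsFactors = FALSE keeps ID and Cause as character vectors (the default before R 4.0 was to convert them to factors). The test data frame, including its IDs and values, is a made-up illustration of the known = FALSE layout, where the Cause column is omitted.

```r
# Example training data from the train argument: ID, Cause, then 0/1 symptom columns
train <- data.frame(
  ID = c("a1", "b2", "c3"),
  Cause = c("HIV", "Stroke", "HIV"),
  S1 = c(1, 0, 1),
  S2 = c(0, 0, 1),
  S3 = c(0, 1, 0),
  stringsAsFactors = FALSE  # keep ID and Cause as character, not factor
)

# Hypothetical test data with unknown causes (known = FALSE): no Cause column
test <- data.frame(
  ID = c("d4", "e5"),
  S1 = c(0, 1),
  S2 = c(1, 0),
  S3 = c(0, 0),
  stringsAsFactors = FALSE
)
```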

known

TRUE to indicate that the test causes are available in the 2nd column, or FALSE to indicate that they are not known.

assume

TRUE to set all symptom values not equal to 1 to 0, or FALSE to raise an error if any symptom value is not 0 or 1. This takes priority over unknown.
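A minimal sketch of the two assume modes applied to the symptom columns only. The helper clean_symptoms() is hypothetical, not part of nbc4va, and for simplicity it ignores the unknown substitution that internalCheckNBC also performs.

```r
# Hypothetical helper illustrating the assume argument (not the nbc4va implementation)
clean_symptoms <- function(symptoms, assume = FALSE) {
  if (assume) {
    # assume = TRUE: every value other than 1 becomes 0
    symptoms[] <- lapply(symptoms, function(col) ifelse(col == 1, 1, 0))
  } else if (!all(unlist(symptoms) %in% c(0, 1))) {
    # assume = FALSE: any value other than 0 or 1 is an error
    stop("symptom values must be 0 or 1")
  }
  symptoms
}

symptoms <- data.frame(S1 = c(1, 99, 0), S2 = c(5, 1, 1))
clean_symptoms(symptoms, assume = TRUE)
```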

unknown

A single integer value that marks a symptom as unknown, i.e. it is not known whether the symptom is present or absent.

  • Unknown values are substituted according to the proportion of 1s and 0s in each column

  • Setting this to NULL skips the substitution

  • All other values that are neither the unknown value nor 1 are set to 0 after the substitution
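The substitution described above can be sketched per column with base R's sample(), drawing 1s and 0s in proportion to the observed values. impute_unknown() is a hypothetical helper, not the package's code; returning NULL stands in for the column removal described under train when a column has no 1s or 0s.

```r
# Hypothetical sketch of unknown-value substitution for one symptom column
impute_unknown <- function(x, unknown = 99) {
  if (!is.null(unknown)) {
    is_unknown <- x == unknown
    observed <- x[x %in% c(0, 1)]
    if (length(observed) == 0) return(NULL)  # no 1s or 0s: column would be removed
    p1 <- mean(observed == 1)  # proportion of 1s among observed 0/1 values
    x[is_unknown] <- sample(c(1, 0), sum(is_unknown), replace = TRUE,
                            prob = c(p1, 1 - p1))
  }
  x[!(x %in% c(0, 1))] <- 0  # remaining values that are not 0 or 1 become 0
  x
}

set.seed(1)  # the imputation is random
symptoms <- data.frame(S1 = c(1, 99, 0, 5), S2 = c(99, 99, 99, 99))
cleaned <- Filter(Negate(is.null), lapply(symptoms, impute_unknown))
as.data.frame(cleaned)  # S2 is dropped because it has no observed 1s or 0s
```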

Details

The following checks are applied to train and test to ensure they:

  • are a dataframe

  • have required number of rows and columns

  • have required data types for each column

  • have required symptom values

  • are in the same format

  • have unique ids
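A minimal sketch of how the listed structural checks might look in base R. check_va_inputs() is hypothetical and its error messages are illustrative. It assumes IDs must be unique within each data frame and that symptom columns must be numeric. It leaves symptom-value checks to the cleaning steps described under assume and unknown.

```r
# Hypothetical sketch of the checks listed above (not the nbc4va implementation)
check_va_inputs <- function(train, test, known = TRUE) {
  # are a dataframe
  if (!is.data.frame(train) || !is.data.frame(test)) {
    stop("train and test must be data frames")
  }
  # have required number of rows and columns: ID, Cause, at least one symptom
  if (nrow(train) < 1 || ncol(train) < 3) {
    stop("train needs ID, Cause and at least one symptom column")
  }
  n_lead <- if (known) 2 else 1  # test has no Cause column when known = FALSE
  if (nrow(test) < 1 || ncol(test) < n_lead + 1) {
    stop("test has too few rows or columns")
  }
  train_symptoms <- train[, -(1:2), drop = FALSE]
  test_symptoms <- test[, -seq_len(n_lead), drop = FALSE]
  # are in the same format: same number of symptom columns
  if (ncol(train_symptoms) != ncol(test_symptoms)) {
    stop("train and test must have the same symptom columns")
  }
  # have required data types: symptom columns are numeric
  if (!all(vapply(c(train_symptoms, test_symptoms), is.numeric, logical(1)))) {
    stop("symptom columns must be numeric")
  }
  # have unique ids
  if (anyDuplicated(train[[1]]) > 0 || anyDuplicated(test[[1]]) > 0) {
    stop("IDs must be unique")
  }
  invisible(TRUE)
}
```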

Value

out: a list containing the checked inputs:

  • $train: dataframe of id, cause and symptoms

  • $test: dataframe of id, cause and symptoms in the same format as train

  • $known: TRUE if the test causes are known or FALSE if not

See Also

Other validation functions: internalCheckNBCSummary()

Examples

library(nbc4va)
data(nbc4vaData)

# Check the train and test inputs; an error is raised if they do not pass the checks
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
checked <- nbc4va::internalCheckNBC(train, test)
train <- checked$train
test <- checked$test


nbc4va documentation built on May 10, 2022, 5:07 p.m.