nbc4vaIO: Run nbc4va using file input and output

View source: R/nbc4va_utility.R

nbc4vaIO    R Documentation

Run nbc4va using file input and output

Description

Runs nbc and summary.nbc on input data files or dataframes, and outputs result files or dataframes containing the predictions, probabilities, cause-level performance metrics, and overall performance metrics.

Usage

nbc4vaIO(
  trainFile,
  testFile,
  known = TRUE,
  csmfaFile = NULL,
  saveFiles = TRUE,
  outDir = dirname(testFile),
  fileHeader = strsplit(basename(testFile), "\\.")[[1]][[1]],
  fileReader = read.csv,
  fileReaderIn = "file",
  fileReaderArgs = list(as.is = TRUE),
  fileWriter = write.csv,
  fileWriterIn = "x",
  fileWriterOut = "file",
  fileWriterArgs = list(row.names = FALSE),
  outExt = "csv"
)
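The defaults of outDir and fileHeader above are ordinary R expressions evaluated on testFile. A minimal sketch of what they produce for a hypothetical path (the path is made up for illustration) is:

```r
# Hypothetical test file path, used only to show how the defaults evaluate
testFile <- "/data/va/test.2020.csv"

# Default outDir: the directory that contains testFile
outDir <- dirname(testFile)

# Default fileHeader: the file name with everything from the first period dropped
fileHeader <- strsplit(basename(testFile), "\\.")[[1]][[1]]

print(outDir)      # [1] "/data/va"
print(fileHeader)  # [1] "test"
```

Because the split happens at the first period, a name such as test.2020.csv yields the header "test" rather than "test.2020"; pass fileHeader explicitly if that matters.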

Arguments

trainFile

A character path to the data to use as the train argument for nbc, or a dataframe to use directly as the train argument.

testFile

A character path to the data to use as the test argument for nbc, or a dataframe to use directly as the test argument.

known

Set to TRUE if the test causes are known and available in the 2nd column of the test data, or FALSE if they are not known.

csmfaFile

A character path to the data to use as the csmfa.obs argument for summary.nbc, or a named vector to use directly as the csmfa.obs argument.

  • If (csmfaFile is char): the file must contain exactly 1 column, giving the cause of each case
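To illustrate what a one-column file of per-case causes encodes, the sketch below writes such a file, reads it back, and computes the observed cause-specific mortality fractions (CSMFs), i.e. the share of cases for each cause. The column name Cause and the causes themselves are made up; how summary.nbc uses csmfa.obs internally is an assumption not shown here.

```r
# Hypothetical per-case causes; the column name "Cause" is an assumption
csmfaFile <- tempfile(fileext = ".csv")
write.csv(data.frame(Cause = c("HIV", "TB", "HIV", "Stroke", "HIV", "TB")),
          csmfaFile, row.names = FALSE)

# Read the single column back as a character vector of causes
causes <- read.csv(csmfaFile, as.is = TRUE)[[1]]

# Observed CSMF: the fraction of cases attributed to each cause
csmf <- table(causes) / length(causes)
csmfVec <- setNames(as.numeric(csmf), names(csmf))
print(round(csmfVec, 3))
```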

saveFiles

Set to TRUE to save the results as files and return their paths, or FALSE to return the result objects directly

outDir

A character value of the path to the directory in which to store the output result files.

  • The default is the directory of the testFile

fileHeader

A character value of the file header name to use for the output result files.

  • The default is the name of the testFile up to its first period (e.g. "test" for "test.csv")

fileReader

A function that can read the trainFile and the testFile.

  • The default is set to read csv files using read.csv

fileReaderIn

A character value of the fileReader argument name that accepts the input file path.

fileReaderArgs

A list of additional arguments for fileReader, passed to it with do.call.

fileWriter

A function that can write data.frame objects to a file location.

  • The default is set to write csv files using write.csv

fileWriterIn

A character value of the fileWriter argument name that accepts the dataframe to write.

fileWriterOut

A character value of the fileWriter argument name that accepts the output file path.

fileWriterArgs

A list of additional arguments for fileWriter, passed to it with do.call.

outExt

A character value of the extension (without the period) to use for the result files.

  • The default is set to use the "csv" extension
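The reader and writer arguments describe a common do.call pattern: the file path (and, for writing, the dataframe) is placed in the argument list under the configured argument name, and the function is then called with that list. A minimal sketch of this pattern is below; the helpers readWith and writeWith are hypothetical and not part of nbc4va, and whether nbc4vaIO builds its calls exactly this way is an assumption.

```r
# Hypothetical helper: put the path under the argument name given by
# fileReaderIn, then call fileReader with the full argument list
readWith <- function(path, fileReader = read.csv, fileReaderIn = "file",
                     fileReaderArgs = list(as.is = TRUE)) {
  args <- fileReaderArgs
  args[[fileReaderIn]] <- path
  do.call(fileReader, args)
}

# Hypothetical helper: put the dataframe under fileWriterIn and the path
# under fileWriterOut, then call fileWriter with the full argument list
writeWith <- function(df, path, fileWriter = write.csv, fileWriterIn = "x",
                      fileWriterOut = "file",
                      fileWriterArgs = list(row.names = FALSE)) {
  args <- fileWriterArgs
  args[[fileWriterIn]] <- df
  args[[fileWriterOut]] <- path
  do.call(fileWriter, args)
  invisible(path)
}

# Swap the csv defaults for tab-separated files
df <- data.frame(ID = 1:3, Cause = c("HIV", "TB", "Stroke"),
                 stringsAsFactors = FALSE)
tsvFile <- tempfile(fileext = ".tsv")
writeWith(df, tsvFile, fileWriter = write.table,
          fileWriterArgs = list(sep = "\t", row.names = FALSE))
back <- readWith(tsvFile, fileReader = read.delim)
print(back)
```

With nbc4vaIO itself, the corresponding swap would go through the documented arguments instead, e.g. fileReader = read.delim, fileWriter = write.table, fileWriterArgs = list(sep = "\t", row.names = FALSE) and outExt = "tsv".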

Details

See Methods documentation for details on the methodology and implementation of the Naive Bayes Classifier algorithm. This function may also act as a wrapper for the main nbc4va package functions.

Value

out: A named character vector of file paths, or a list of result data, from the Naive Bayes Classifier:

  • If (saveFiles is TRUE) return a named character vector of the following:

    • Names: dir, pred, prob, causes, metrics

    • dir (char): the path to the directory of the output files

    • pred (char): the path to the prediction table file, where the columns of Pred1..PredN are ordered by the prediction probability with Pred1 being the most probable cause

    • prob (char): the path to the probability table file, where each column other than CaseID is a cause and each cell is the predicted probability of that cause for the case

    • causes (char): the path to the cause performance metrics table file, where each column is a metric and each row is a cause

    • metrics (char): the path to the overall performance metrics table file, where each column is a metric

  • If (saveFiles is FALSE) return a list of the following:

    • Names: pred, prob, causes, metrics, nbc, nbc_summary

    • pred (dataframe): the prediction table, where the columns of Pred1..PredN are ordered by the prediction probability with Pred1 being the most probable cause

    • prob (dataframe): the probability table, where each column other than CaseID is a cause and each cell is the predicted probability of that cause for the case

    • causes (dataframe): the cause performance metrics table, where each column is a metric and each row is a cause

    • metrics (dataframe): the overall performance metrics table, where each column is a metric

    • nbc (object): the returned nbc object

    • nbc_summary (object): the returned summary.nbc object
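The pred and prob tables describe the same predictions: Pred1..PredN list each case's causes in decreasing order of probability. A minimal sketch of that relationship is below, using a made-up probability table shaped like the prob output (a CaseID column plus one column per cause). The helper probToPred is hypothetical and not part of nbc4va, and ties are not handled specially here, so tie ordering may differ from the package.

```r
# Made-up probability table shaped like the "prob" output
prob <- data.frame(
  CaseID = c("a", "b"),
  HIV    = c(0.2, 0.7),
  TB     = c(0.5, 0.1),
  Stroke = c(0.3, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank each case's causes by decreasing probability
# to build a Pred1..PredN table like the "pred" output
probToPred <- function(prob) {
  causes <- setdiff(names(prob), "CaseID")
  ranked <- t(apply(as.matrix(prob[causes]), 1,
                    function(p) causes[order(p, decreasing = TRUE)]))
  colnames(ranked) <- paste0("Pred", seq_along(causes))
  data.frame(CaseID = prob$CaseID, ranked, stringsAsFactors = FALSE)
}

pred <- probToPred(prob)
print(pred)
```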

See Also

Other utility functions: nbc4vaGUI()

Examples

library(nbc4va)
data(nbc4vaData)

# Split data into train and test sets
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]

# Save train and test data as csv in temp location
trainFile <- tempfile(fileext=".csv")
testFile <- tempfile(fileext=".csv")
write.csv(train, trainFile, row.names=FALSE)
write.csv(test, testFile, row.names=FALSE)

# Use nbc4vaIO via file input and output
# Set "known" to indicate whether test causes are known
outFiles <- nbc4vaIO(trainFile, testFile, known=TRUE)

# Use nbc4vaIO as a wrapper
out <- nbc4vaIO(train, test, known=TRUE, saveFiles=FALSE)


nbc4va documentation built on May 10, 2022, 5:07 p.m.