CheckInputData: Checking input data for errors and inconsistencies

View source: R/CheckInputData.R

CheckInputData    R Documentation

Checking input data for errors and inconsistencies

Description

Tests whether the input data are adequately formatted for use with PoolPrev() and HierPoolPrev().

Usage

CheckInputData(data, result, poolSize, ...)

Arguments

data

A data.frame with one row for each pooled sample and columns for the size of the pool (i.e., the number of specimens / isolates / insects pooled to make that particular pool) and the result of the test on the pool. It may also contain additional columns (e.g. the location where the pool was taken), which can optionally be used to stratify the data into smaller groups and calculate prevalence by group (e.g. prevalence for each location).

result

The name of the column with the result of the test on each pooled sample. Results must be coded as 1 for a positive test and 0 for a negative test.

poolSize

The name of the column with the number of specimens/isolates/insects in each pool.

...

Optional name(s) of columns with variables to stratify the data by.
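
To illustrate the layout these arguments describe, the following minimal sketch builds a small data frame in base R. The column names (Site, PoolN, RawResult, PoolResult) and the "pos"/"neg" labels are hypothetical, chosen only for this example; the recoding step shows one possible way to obtain the required 0/1 coding.

```r
# Hypothetical raw data: one row per pool, results recorded as text labels
raw <- data.frame(
  Site      = c("A", "A", "B", "B"),        # optional stratifying variable
  PoolN     = c(10L, 10L, 5L, 5L),          # number of specimens in each pool
  RawResult = c("pos", "neg", "neg", "pos"),
  stringsAsFactors = FALSE
)

# Recode the text labels to the required numeric coding:
# 1 = positive pool, 0 = negative pool
raw$PoolResult <- ifelse(raw$RawResult == "pos", 1, 0)

str(raw)
```

Going by the usage shown above, data in this shape would be checked with a call of the form CheckInputData(raw, "PoolResult", "PoolN", "Site").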

Value

Returns the input data invisibly, using invisible(data).
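
The effect of an invisible return value can be seen with a small base R sketch. The function check_and_return() below is a hypothetical stand-in written for this illustration, not part of PoolTestR; it only demonstrates how invisible() behaves.

```r
# Hypothetical stand-in for a checking function that returns its input invisibly
check_and_return <- function(data) {
  stopifnot(is.data.frame(data))  # a single toy check
  invisible(data)                 # return the input without auto-printing it
}

df <- data.frame(x = 1:3)

check_and_return(df)          # nothing is printed at the console
out <- check_and_return(df)   # the value can still be captured
identical(out, df)
```

An invisible return keeps the console quiet when the function is called only for its checks, while still allowing the result to be assigned or passed on to another function.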

This function is used to check the input data for formatting problems including:

  • Incorrect class of the input data

  • Incorrect class of result and poolSize columns

  • Missing columns

  • Missing values in rows

  • Invalid values in rows

If any problems are detected, an error or warning will be raised describing the issue.
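
The kinds of checks listed above can be sketched in base R as follows. This is an illustrative approximation only, not the PoolTestR implementation: the function name sketch_check(), the error messages, and the choice to raise errors in every case (the real function may raise warnings for some problems) are all assumptions made for this example.

```r
# Illustrative only: NOT the PoolTestR implementation
sketch_check <- function(data, result, poolSize, ...) {
  strata <- c(...)  # optional stratifying column names (NULL if none given)

  # Incorrect class of the input data
  if (!is.data.frame(data)) stop("data must be a data.frame")

  # Missing columns
  missing_cols <- setdiff(c(result, poolSize, strata), names(data))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }

  # Incorrect class of result and poolSize columns
  if (!is.numeric(data[[result]])) stop("result column must be numeric")
  if (!is.numeric(data[[poolSize]])) stop("poolSize column must be numeric")

  # Missing values in rows
  if (anyNA(data[[result]]) || anyNA(data[[poolSize]])) {
    stop("missing values found")
  }

  # Invalid values in rows
  if (!all(data[[result]] %in% c(0, 1))) stop("result values must be 0 or 1")
  if (!all(data[[poolSize]] > 0)) stop("pool sizes must be positive")

  invisible(data)
}

# Toy data with hypothetical values
toy <- data.frame(Result = c(0, 1, 1), NumInPool = c(5, 5, 10))

# Passes all checks, so the input comes back unchanged
ok <- sketch_check(toy, "Result", "NumInPool")

# An invalid result value is caught; the message is captured instead of halting
bad <- tryCatch(
  sketch_check(transform(toy, Result = 2), "Result", "NumInPool"),
  error = function(e) conditionMessage(e)
)
```

Running the cheap structural checks (class, column names) before the value checks means that a badly shaped input fails with the most basic message first.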

See Also

PrepareClusterData, SimpleExampleData, PoolPrev, HierPoolPrev

Examples

# Check whether the SimpleExampleData is formatted 
# appropriately for estimating prevalence in 
# PoolTestR
SimpleExample_output <- 
  CheckInputData(
    data = SimpleExampleData, 
    result = "Result", poolSize = "NumInPool"
  )
# No errors/warnings were raised: the data are formatted properly,
# so the output is identical to the input
identical(SimpleExample_output, SimpleExampleData)


## Not run: 
  # Some examples below use dplyr for %>%, mutate() and the .data pronoun
  library(dplyr)
  # Error raised when input data is not class data.frame
  CheckInputData(
    data = 1, 
    result = "Result", poolSize = "NumInPool"
  )
  # Error raised when result/poolSize column names are incorrect
  CheckInputData(
    data = SimpleExampleData, 
    result = "WrongResultName", poolSize = "WrongNumInPoolName"
  )
  # Error raised when optional stratifying variable column names are incorrect
  CheckInputData(
    data = SimpleExampleData, 
    result = "Result", poolSize = "NumInPool",
    "WrongRegionName", "WrongYearName"
  )
  # Error raised when result/poolSize columns are not numeric/integer
  CheckInputData(
    data = SimpleExampleData %>%
      mutate(Result = as.character(.data$Result),
             .keep = "all"), 
    result = "Result", poolSize = "NumInPool"
  )
  CheckInputData(
    data = SimpleExampleData %>%
      mutate(NumInPool = as.character(.data$NumInPool),
             .keep = "all"), 
    result = "Result", poolSize = "NumInPool"
  )
  # Error raised when result column contains values other than 0 or 1
  CheckInputData(
    data = SimpleExampleData %>%
      mutate(Result = 2,
             .keep = "all"), 
    result = "Result", poolSize = "NumInPool"
  )
  # Error raised when poolSize column values are not positive
  CheckInputData(
    data = SimpleExampleData %>%
      mutate(NumInPool = (-1*.data$NumInPool),
             .keep = "all"), 
    result = "Result", poolSize = "NumInPool"
  )

## End(Not run)


AngusMcLure/PoolTestR documentation built on Jan. 16, 2025, 4:35 p.m.