replace_errors: Replace erroneous fields with NA or a suggested value

replace_errors R Documentation

Replace erroneous fields with NA or a suggested value

Description

Find erroneous fields using locate_errors() and replace these fields automatically with NA or a suggestion that is provided by the error detection algorithm.

Usage

replace_errors(
  data,
  x,
  ref = NULL,
  ...,
  cl = NULL,
  Ncpus = getOption("Ncpus", 1),
  value = c("NA", "suggestion")
)

## S4 method for signature 'data.frame,validator'
replace_errors(
  data,
  x,
  ref = NULL,
  ...,
  cl = NULL,
  Ncpus = getOption("Ncpus", 1),
  value = c("NA", "suggestion")
)

## S4 method for signature 'data.frame,ErrorLocalizer'
replace_errors(
  data,
  x,
  ref = NULL,
  ...,
  cl = NULL,
  Ncpus = getOption("Ncpus", 1),
  value = c("NA", "suggestion")
)

## S4 method for signature 'data.frame,errorlocation'
replace_errors(
  data,
  x,
  ref = NULL,
  ...,
  cl = NULL,
  Ncpus = 1,
  value = c("NA", "suggestion")
)

Arguments

data

data to be checked

x

A validator(), ErrorLocalizer, or errorlocation object. If an errorlocation is already available (from a previous call to locate_errors()), passing it is more efficient because the errors do not have to be located again.

ref

optional reference data set

...

these parameters are handed over to locate_errors()

cl

optional cluster for parallel execution (see details)

Ncpus

number of nodes to use (see details)

value

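either "NA" (the default) or "suggestion": replace each erroneous field with NA, or with the value suggested by the error detection algorithm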

Details

You can also pass the result of locate_errors() to replace_errors. This is the preferred approach when error localization takes a long time and locate_errors() has already been called, because otherwise replace_errors runs locate_errors() again. The errors that were removed from the data.frame can be retrieved with errors_removed(). For more control over error localization, see locate_errors().

replace_errors has the same parallelization options as locate_errors() (see there).
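A minimal sketch of passing a cluster through the cl argument. It assumes the errorlocate and validate packages are installed, that two local worker processes can be started with parallel::makeCluster(), and that cl accepts such a cluster object; the data and the name fixed are made up for illustration.

```r
library(parallel)
library(validate)
library(errorlocate)

rules <- validator(
  profit + cost == turnover,
  cost - 0.6 * turnover >= 0,
  cost >= 0,
  turnover >= 0
)

# Illustrative data: the first record violates the rules, the second does not
data <- data.frame(
  profit   = c(755, 30),
  cost     = c(125, 70),
  turnover = c(200, 100)
)

# Assumption: a local cluster with two workers is available
cl <- makeCluster(2)

# The cluster is handed over to the error localization step
fixed <- replace_errors(data, rules, cl = cl)

# Always release the workers when done
stopCluster(cl)

fixed
```

For small data sets like this one the overhead of starting workers will likely outweigh any gain; parallel execution is mainly of interest when many records have to be checked.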

Value

data with erroneous values replaced by NA or, if value = "suggestion", by a suggested value.

Note

In general it is better to replace the erroneous fields with NA and apply a proper imputation method. Suggested values from the error localization method may introduce an undesired bias.
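As a rough illustration of the two settings of value, the sketch below applies both to the record used in the Examples section. It assumes the errorlocate and validate packages are installed; the names fixed_na and fixed_suggested are only illustrative.

```r
library(validate)
library(errorlocate)

rules <- validator(
  profit + cost == turnover,
  cost - 0.6 * turnover >= 0,
  cost >= 0,
  turnover >= 0
)
data <- data.frame(profit = 755, cost = 125, turnover = 200)

# Default: erroneous fields become NA, ready for a separate imputation step
fixed_na <- replace_errors(data, rules, value = "NA")

# Alternative: fill erroneous fields with the value proposed during localization
fixed_suggested <- replace_errors(data, rules, value = "suggestion")

fixed_na
fixed_suggested
```

Comparing the two results side by side shows which fields were touched. The NA variant leaves the choice of replacement value to an imputation method of your own.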

See Also

errorlocation-class

Other error finding: errorlocation-class, errors_removed(), expand_weights(), locate_errors()

Examples

library(validate)
library(errorlocate)

# validation rules that every record should satisfy
rules <- validator(
  profit + cost == turnover,
  cost - 0.6 * turnover >= 0,
  cost >= 0,
  turnover >= 0
)
data <- data.frame(profit = 755, cost = 125, turnover = 200)

data_no_error <- replace_errors(data, rules)

# faulty data was replaced with NA
data_no_error

errors_removed(data_no_error)

# For more control, supply the result of locate_errors to replace_errors.
# This avoids a second localization run, because otherwise replace_errors
# calls locate_errors internally.
error_locations <- locate_errors(data, rules)
replace_errors(data, error_locations)

data-cleaning/errorlocate documentation built on Oct. 1, 2023, 1:04 p.m.