correct_typos: Correct typos in restricted numeric data


Description

Attempt to fix violations of linear (in)equality restrictions imposed on a record by replacing values with candidates that differ from the original values only by a typographical error.

Usage

correct_typos(dat, x, ...)

## S4 method for signature 'data.frame,validator'
correct_typos(dat, x, fixate = NULL, eps = 1e-08, maxdist = 1, ...)

Arguments

dat

An R object holding numeric (integer) data.

x

An R object holding linear data validation rules.

...

Options passed to stringdist, which is used to determine the typographic distance between the original value and candidate solutions. By default, the optimal string alignment distance is used, with all weights equal to one.

fixate

[character] vector of variable names that may not be changed

eps

[numeric] maximum roundoff error

maxdist

[numeric] maximum allowed typographical distance

Value

dat, with values corrected.

Details

The algorithm works by proposing candidate replacement values and checking whether they are likely to be the result of a typographical error. A value is accepted as a solution when it resolves at least one equality violation. An equality restriction a.x = b is considered satisfied when abs(a.x - b) < eps. Setting eps to one or two units of measurement makes typographical error detection robust against roundoff errors.

The algorithm is meant to be used on numeric data representing integers.
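The check described above can be illustrated by hand for a single restriction. The following is a minimal sketch, not the package's internal implementation: it takes the values of x2 and x4 from the second record in the Examples section, treats the value that would satisfy x2 == x4 as a hypothetical candidate, and measures its distance from the original with stringdist (default method "osa"). The variable names candidate, satisfied and accept are illustrative assumptions.

```r
library(stringdist)  # provides stringdist(); "osa" is its default method

# Values from a record that violates the rule x2 == x4.
x2 <- 116
x4 <- 161

# Hypothetical candidate: the value of x4 that would satisfy x2 == x4.
candidate <- x2

# Typographic distance between the original value and the candidate.
# Swapping the adjacent digits "61" -> "16" counts as one edit under OSA.
d <- stringdist(as.character(x4), as.character(candidate), method = "osa")

# Tolerance check abs(a.x - b) < eps for the restriction x2 - x4 == 0.
eps <- 1e-8
satisfied <- abs(x2 - candidate) < eps

# Accept the candidate only if it is within the allowed distance.
maxdist <- 1
accept <- satisfied && d <= maxdist
```

Here the candidate is one transposition away from the original, so it falls within the default maxdist of 1.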

References

Examples

library(validate)
library(deductive)  # provides correct_typos()

# example from section 4 in Scholtus (2009)

v <- validate::validator(
    x1 + x2 == x3
  , x2 == x4
  , x5 + x6 + x7 == x8
  , x3 + x8 == x9
  , x9 - x10 == x11
)

dat <- read.csv(textConnection(
"x1, x2 , x3  , x4 , x5 , x6, x7, x8 , x9   , x10 , x11
1452, 116, 1568, 116, 323, 76, 12, 411,  1979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12, 411,  1979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12, 411, 19979, 1842, 137
1452, 116, 1568, 161,   0,  0,  0, 411, 19979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12,   0, 19979, 1842, 137"
))
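# named 'corrected' to avoid masking stats::cor()
corrected <- correct_typos(dat, v)

# nonzero entries mark the values that were changed
dat - corrected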
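The optional arguments fixate and maxdist from the Usage section can be tried on a smaller case. The sketch below is an illustration only: the one-rule validator and the single record are hypothetical and do not come from Scholtus (2009), and no particular output is claimed. It assumes the validate and deductive packages are installed.

```r
library(validate)
library(deductive)

# Hypothetical one-rule example, not taken from the package documentation.
rules <- validator(x1 + x2 == x3)
rec <- data.frame(x1 = 123, x2 = 132, x3 = 246)  # 123 + 132 != 246

# Default settings: any variable may change, typographic distance at most 1.
fixed_default <- correct_typos(rec, rules)

# Protect x2 from modification and allow a distance of up to 2.
fixed_guarded <- correct_typos(rec, rules, fixate = "x2", maxdist = 2)

# Nonzero entries mark the values that were changed in each run.
rec - fixed_default
rec - fixed_guarded
```

Comparing the two difference tables shows which variables the function is willing to alter once x2 is fixated and the distance bound is relaxed.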

deductive documentation built on March 29, 2021, 5:12 p.m.