correctTypos: Correct records under linear restrictions using typographical...
In deducorrect: Deductive Correction, Deductive Imputation, and Deterministic Correction


This algorithm tries to detect and repair records that violate linear equality constraints by correcting simple typos, as described in Scholtus (2009). The implementation of the typing-error detection differs from the paper in that it uses the restricted Damerau-Levenshtein distance. It also solves a broader class of problems: the original paper covers equalities of the form Ex = 0 (balance edits), whereas this implementation allows Ex = a.

correctTypos(E, dat, ...)

## S3 method for class 'editset'
correctTypos(E, dat, ...)

## S3 method for class 'editmatrix'
correctTypos(E, dat, fixate = NULL, cost = c(1, 1, 1, 1),
  eps = sqrt(.Machine$double.eps), maxdist = 1, ...)

`E`	`editmatrix` or `editset`
`dat`	`data.frame` with data to be corrected.
`...`	arguments to be passed to other methods.
`fixate`	`character` with variable names that should not be changed.
`cost`	`numeric` vector with the cost of a deletion, insertion, substitution or transposition, in that order.
`eps`	`numeric`, tolerance on edit check. Default value is `sqrt(.Machine$double.eps)`. Set to 2 to allow for rounding errors. Set this parameter to 0 for exact checking.
`maxdist`	`numeric`, tolerance used in finding typographical corrections. Default value 1 allows for one error. Used in combination with `cost`.
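
To make the roles of `cost` and `maxdist` concrete, the following base R sketch computes a restricted Damerau-Levenshtein (optimal string alignment) distance with one cost per operation. It is an illustration only, not the package's implementation; the function name `restricted_dl` and its arguments `from` and `to` are hypothetical. A candidate correction would be acceptable when its distance to the observed value is at most `maxdist`.

```r
# Hypothetical helper: restricted Damerau-Levenshtein distance between two strings.
# cost = c(deletion, insertion, substitution, transposition)
restricted_dl <- function(from, to, cost = c(1, 1, 1, 1)) {
  a <- strsplit(from, "")[[1]]
  b <- strsplit(to, "")[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] is the distance between the first i characters of `from`
  # and the first j characters of `to`
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- (0:n) * cost[1]
  d[1, ] <- (0:m) * cost[2]
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub_cost <- if (a[i] == b[j]) 0 else cost[3]
      best <- min(
        d[i, j + 1] + cost[1],  # delete a[i]
        d[i + 1, j] + cost[2],  # insert b[j]
        d[i, j] + sub_cost      # substitute a[i] by b[j], or match
      )
      # transpose two adjacent characters
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        best <- min(best, d[i - 1, j - 1] + cost[4])
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[n + 1, m + 1]
}

restricted_dl("161", "116")                       # one transposition
restricted_dl("19979", "1979")                    # one deletion
restricted_dl("161", "116", cost = c(1, 1, 1, 2)) # transposition made more expensive
```

With the default unit costs, the first two calls return 1, so both corrections fall within `maxdist = 1`. Raising the transposition cost to 2 pushes the third distance to 2, which would put that correction out of reach at `maxdist = 1`.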

For each row of dat, the correction algorithm first checks whether the record x violates the equality constraints of E, taking possible rounding errors into account. A record satisfies the constraints when: |\sum_{i=1}^{n} E_{ji} x_i - a_j| \leq \varepsilon, \quad \forall j
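
The tolerance check on the equality constraints can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the coefficient matrix `A`, the constant vector `a` and the helper `violated_edits` are hypothetical names, and the matrix encodes the five edits from the Examples section in the form Ex = a with a = 0.

```r
# Hypothetical helper: which equality edits does record x violate?
# A: coefficient matrix (one row per edit), a: constants, eps: tolerance.
violated_edits <- function(A, a, x, eps = sqrt(.Machine$double.eps)) {
  residual <- as.vector(A %*% x) - a
  abs(residual) > eps
}

# Edits from the Examples section, rewritten as A x = 0:
#   x1 + x2 - x3 = 0,  x2 - x4 = 0,  x5 + x6 + x7 - x8 = 0,
#   x3 + x8 - x9 = 0,  x9 - x10 - x11 = 0
A <- rbind(
  c(1, 1, -1,  0, 0, 0, 0,  0,  0,  0,  0),
  c(0, 1,  0, -1, 0, 0, 0,  0,  0,  0,  0),
  c(0, 0,  0,  0, 1, 1, 1, -1,  0,  0,  0),
  c(0, 0,  1,  0, 0, 0, 0,  1, -1,  0,  0),
  c(0, 0,  0,  0, 0, 0, 0,  0,  1, -1, -1)
)
a <- rep(0, 5)

# Record 4.1 from the Examples section: x4 is 161 where x2 is 116.
x <- c(1452, 116, 1568, 161, 323, 76, 12, 411, 1979, 1842, 137)
violated <- violated_edits(A, a, x)
print(violated)
```

Only the second edit (x2 == x4) is flagged for this record, so only x2 and x4 would be candidates for a typographical correction.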

It then generates correction suggestions by deriving alternative values for variables that are involved only in violated edits. A suggestion is selected only if it lies within the maximum typographical edit distance (default = 1) of the observed value. If more than one solution is possible, the algorithm tries to derive a partial solution; otherwise the solution is applied to the data.
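
The suggestion step can be sketched for a single variable as follows. This is a simplified illustration, not the package's algorithm: it solves each violated edit for x9 in record 4.2 of the Examples section, and it uses `utils::adist` as a stand-in distance. Note that `adist` computes plain Levenshtein distance without a transposition operation, so it only approximates the restricted Damerau-Levenshtein distance used by `correctTypos`.

```r
# Record 4.2 from the Examples section (x9 was typed as 19979).
x <- c(x1 = 1452, x2 = 116, x3 = 1568, x4 = 161, x5 = 323, x6 = 76,
       x7 = 12, x8 = 411, x9 = 19979, x10 = 1842, x11 = 137)

# x9 appears in two edits, and both are violated. Solving each for x9:
#   x3 + x8 == x9    gives x9 = x3 + x8
#   x9 - x10 == x11  gives x9 = x10 + x11
suggestions <- c(x[["x3"]] + x[["x8"]], x[["x10"]] + x[["x11"]])

# Stand-in distance (assumption): plain Levenshtein via utils::adist.
maxdist <- 1
observed <- as.character(x[["x9"]])
dist <- as.vector(utils::adist(as.character(suggestions), observed))

# Keep the distinct suggestions that are within maxdist of the observed value.
accepted <- unique(suggestions[dist <= maxdist])
print(accepted)
```

Both edits suggest the same value, 1979, which is one deletion away from 19979, so there is a single accepted suggestion for x9.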

correctTypos returns an object of class deducorrect describing the status of each record and the corrections that have been applied.

Inequalities in editmatrix E are ignored by this algorithm. If E contains inequalities, the corrected records are valid with respect to the equality restrictions but may still violate the inequalities.

Please note that if the returned status of a record is "partial", the corrected record is still not valid. The partially corrected record will contain fewer errors and will violate fewer constraints. Also note that the statuses "valid" and "corrected" have to be interpreted in combination with eps. A common scenario is to correct for typos first and for rounding errors afterwards. This means that in the first step the algorithm should tolerate rounding errors (e.g. eps==2). A record returned as "valid" may therefore still contain rounding errors.

deducorrect object with corrected data.frame, applied corrections and status of the records.

Scholtus S (2009). Automatic correction of simple typing errors in numerical data with balance edits. Discussion paper 09046, Statistics Netherlands, The Hague/Heerlen.

Damerau F (1964). A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3).

Levenshtein VI (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10, 707-710.

A good description of the restricted DL-distance can be found on Wikipedia: http://en.wikipedia.org/wiki/Damerau

damerauLevenshteinDistance

library(deducorrect)  # provides correctTypos
library(editrules)    # provides editmatrix and editset

# example from section 4 in Scholtus (2009)

E <- editmatrix( c("x1 + x2 == x3"
                  ,"x2 == x4"
                  ,"x5 + x6 + x7 == x8"
                  ,"x3 + x8 == x9"
                  ,"x9 - x10 == x11"
                  )
               )

dat <- read.csv(txt<-textConnection(
"    , x1, x2 , x3  , x4 , x5 , x6, x7, x8 , x9   , x10 , x11
4  , 1452, 116, 1568, 116, 323, 76, 12, 411,  1979, 1842, 137
4.1, 1452, 116, 1568, 161, 323, 76, 12, 411,  1979, 1842, 137
4.2, 1452, 116, 1568, 161, 323, 76, 12, 411, 19979, 1842, 137
4.3, 1452, 116, 1568, 161,   0,  0,  0, 411, 19979, 1842, 137
4.4, 1452, 116, 1568, 161, 323, 76, 12,   0, 19979, 1842, 137"
))
close(txt)
(cor <- correctTypos(E,dat))



# example with editset
E <- editset(expression(
    x + y == z,
    x >= 0,
    y > 0,
    y < 2,
    z > 1,
    z < 3,
    A %in% c('a','b'),
    B %in% c('c','d'),
    if ( A == 'a' ) B == 'b',
    if ( B == 'b' ) x > 3
))

x <- data.frame(
    x = 10,
    y = 1,
    z = 2,
    A = 'a',
    B = 'b'
)

correctTypos(E,x)

deducorrect documentation built on May 2, 2019, 3:47 p.m.
