Correct sign errors and value interchanges in data records

Description

Correct sign errors and value interchanges in data records.

Usage

correctSigns(E, dat, ...)

## S3 method for class 'editset'
correctSigns(E, dat, ...)

## S3 method for class 'editmatrix'
correctSigns(E, dat, flip = getVars(E), swap = list(),
  maxActions = length(flip) + length(swap), maxCombinations = 1e+05,
  eps = sqrt(.Machine$double.eps), weight = rep(1, length(flip) +
  length(swap)), fixate = NA, ...)

Arguments

E

An object of class editmatrix or editset.

dat

data.frame, the records to correct.

...

arguments to be passed to other methods.

flip

A character vector of variable names whose values may be sign-flipped.

swap

A list of character 2-vectors, each naming a pair of variables whose values may be swapped.

maxActions

The maximum number of flips and swaps that may be performed

maxCombinations

The number of possible flip/swap combinations in each step of the algorithm is choose(n,k), with n the number of flips+swaps and k the number of actions taken in that step. If choose(n,k) exceeds maxCombinations, the algorithm returns the record uncorrected.

eps

Tolerance to check equalities against. Use this to account for sign errors masked by rounding errors.

weight

Weight vector. Weights can be assigned either to actions (flips and swaps) or to variables. If length(weight)==length(flip)+length(swap), weights are assigned to actions; if length(weight)==ncol(E), weights are assigned to variables. In the first case, the first length(flip) weights correspond to flips, the rest to swaps. In the second case, a warning is issued when the weight vector is not named. See the examples for more details.

fixate

A character vector with names of variables whose values may not be changed.
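
The combination count that maxCombinations guards against can be checked with base R. The following is a minimal sketch, assuming three candidate flips and no swaps (as in the Examples data); n_actions and limit are illustrative names, not package arguments.

```r
# Hypothetical setup: 3 candidate flips (x, y, z) and no swaps.
n_actions <- 3
limit <- 1e5   # default maxCombinations shown in the Usage section

# Number of combinations to test at each step k = 1, ..., n_actions
combos <- choose(n_actions, seq_len(n_actions))
print(combos)

# Steps whose combination count exceeds the limit (none in this setup)
print(which(combos > limit))

# With many candidate actions the count grows quickly
print(choose(40, 10) > limit)
```

With only three candidate actions the default limit is never reached, whereas a limit of 2 would already be exceeded at k = 1, since choose(3, 1) is 3.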

Details

This algorithm tries to correct records violating linear equalities by sign flipping and/or value interchanges. Linear inequalities are taken into account when judging possible solutions: if one or more inequality restrictions are violated, the solution is rejected. It is important to note that the status of a record has the following meaning:

valid      The record obeys all equality constraints on entry. No error correction is performed. It may therefore still contain inequality errors.
corrected  Equality errors were found, and all of them were solved without violating inequalities.
partial    Does not occur.
invalid    The record contains equality violations that could not be solved with this algorithm.
NA         The record could not be checked because it contained missing values.

The algorithm tries all (user-allowed) combinations of flips and swaps to find a solution, and minimizes the number of actions (flips+swaps) that have to be taken to correct a record. When multiple solutions are found, the solution of minimal weight is chosen. The user may provide a weight vector with weights for every flip and every swap, or a named weight vector with a weight for every variable. If the weights do not single out a solution, the first one found is chosen.
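
The search described above can be illustrated with a small base-R sketch. It is not the package's implementation: find_flips is a hypothetical helper that handles only sign flips for the single edit z == x - y, ignores swaps and inequalities, and assumes one weight per flip.

```r
# Illustrative brute-force search; NOT the package implementation.
# Tries flip sets of increasing size k and, among the solutions of the
# smallest k, returns the one of minimal total weight.
find_flips <- function(rec, flip = names(rec), weight = rep(1, length(flip)),
                       eps = sqrt(.Machine$double.eps)) {
  # Equality check for the single edit z == x - y, within tolerance eps
  ok <- function(r) abs(r[["z"]] - (r[["x"]] - r[["y"]])) <= eps
  if (ok(rec)) return(character(0))           # record already valid
  for (k in seq_along(flip)) {
    sets <- combn(length(flip), k, simplify = FALSE)
    hits <- Filter(function(i) {
      r <- rec
      r[flip[i]] <- -r[flip[i]]               # flip the signs in this set
      ok(r)
    }, sets)
    if (length(hits) > 0) {
      w <- vapply(hits, function(i) sum(weight[i]), numeric(1))
      return(flip[hits[[which.min(w)]]])      # ties: first one found
    }
  }
  NULL                                        # no solution found
}

rec <- c(x = 14, y = -4, z = 10)
print(find_flips(rec))
```

For this record, flipping the sign of y alone restores the equality, so the search stops at k = 1.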

If the arguments flip or swap contain variables not in E, these variables are ignored by the algorithm.

Value

A deducorrect-object. The status slot has the following columns for every record in dat.

status      a status factor, showing the status of the treated record
degeneracy  the number of solutions found, after applying the weight
weight      the weight of the chosen solution
nflip       the number of applied sign flips
nswap       the number of applied value interchanges

References

Scholtus S (2008). Algorithms for correcting some obvious inconsistencies and rounding errors in business survey data. Technical Report 08015, Netherlands.

See Also

deducorrect-object

Examples

# some data 
dat <- data.frame(
    x = c( 3,14,15,  1, 17,12.3),
    y = c(13,-4, 5,  2,  7, -2.1),
    z = c(10,10,-10, NA,10,10 ))
# ... which has to obey
E <- editmatrix(c("z == x-y"))

# All signs may be flipped, no swaps.

correctSigns(E, dat)

# Allow for rounding errors
correctSigns(E, dat, eps=2)

# Limit the number of combinations that may be tested 
correctSigns(E, dat, maxCombinations=2)

# fix z, flip everything else
correctSigns(E, dat,fixate="z")

# the same result is achieved with
correctSigns(E, dat, flip=c("x","y"))

# make x and y swappable, allow no flips
correctSigns(E, dat, flip=c(), swap=list(c("x","y")))

# make x and y swappable, a swap counts as one flip
correctSigns(E, dat, flip="z", swap=list(c("x","y")))

# same, but now, swapping is preferred (has lower weight)
correctSigns(E, dat, flip="z", swap=list(c("x","y")), weight=c(2,1))

# same, but now because x and y carry lower weight. Also allow for rounding errors
correctSigns(E, dat, flip="z", swap=list(c("x","y")), eps=2, weight=c(x=1, y=1, z=3))

# demand that solution has y>0
E <- editmatrix(c("z==x-y", "y>0"))
correctSigns(E,dat)

# demand that solution has y>0, taking account of rounding in equalities
correctSigns(E,dat,eps=2)

# example with editset
E <- editset(expression(
    x + y == z,
    x >= 0,
    y > 0,
    y < 2,
    z > 1,
    z < 3,
    A %in% c('a','b'),
    B %in% c('c','d'),
    if ( A == 'a' ) B == 'b',
    if ( B == 'b' ) x < 1
))

x <- data.frame(
    x = -1,
    y = 1,
    z = 2,
    A = 'a',
    B = 'b'
)

correctSigns(E,x)



   
   
