eliminate: Eliminate a variable from a set of edit rules In editrules: Parsing, Applying, and Manipulating Data Cleaning Rules

Description

Eliminating a variable amounts to deriving all (non-redundant) edits not containing that variable. Geometrically, it can be seen as a projection of the solution space (records obeying all edits) along the eliminated variable's axis. If the solution space is non-convex (as is usually the case when conditional edits are involved), multiple projections of convex subregions are performed.

For objects of class `editmatrix`, Fourier-Motzkin elimination is used to eliminate a variable from the set of linear (in)equality restrictions. An observation of Kohler (1967) is used to reduce the number of implied restrictions. Obvious redundancies of the type 0 < 1 are removed as well.
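To make the idea behind Fourier-Motzkin elimination concrete, here is a minimal base R sketch for a system of the form `A %*% x <= b`. The helper `fm_eliminate` and its `tol` argument are hypothetical names introduced for this illustration only; this is not the code used by editrules, it handles only `<=` restrictions, and it does not apply Kohler's observation.

```r
# Sketch of Fourier-Motzkin elimination for a system A %*% x <= b.
# `fm_eliminate` is a hypothetical helper written for this illustration;
# it is not part of editrules and omits Kohler's redundancy check.
fm_eliminate <- function(A, b, var, tol = 1e-12) {
  j <- match(var, colnames(A))
  stopifnot(!is.na(j))
  a <- A[, j]
  pos  <- which(a >  tol)       # rows that bound `var` from above
  neg  <- which(a < -tol)       # rows that bound `var` from below
  zero <- which(abs(a) <= tol)  # rows that do not involve `var`

  # Rows without `var` carry over unchanged.
  A_new <- A[zero, , drop = FALSE]
  b_new <- b[zero]

  # Pair every upper bound with every lower bound so that `var` cancels.
  for (p in pos) {
    for (n in neg) {
      A_new <- rbind(A_new, A[p, ] / a[p] - A[n, ] / a[n])
      b_new <- c(b_new, b[p] / a[p] - b[n] / a[n])
    }
  }

  # Drop obvious redundancies of the form 0 <= (non-negative constant).
  trivial <- rowSums(abs(A_new)) <= tol & b_new >= 0
  A_new <- A_new[!trivial, -j, drop = FALSE]
  rownames(A_new) <- NULL
  list(A = A_new, b = b_new[!trivial])
}

# x + y <= 2, -x <= 0, -y <= 0
A <- matrix(c( 1,  1,
              -1,  0,
               0, -1), ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))
b <- c(2, 0, 0)

# Eliminating x leaves -y <= 0 and y <= 2.
res <- fm_eliminate(A, b, "x")
print(res)
```

Because every upper bound is combined with every lower bound, the number of derived rows can grow quickly over consecutive eliminations, which is why a redundancy check such as the one attributed to Kohler (1967) matters in practice.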

For categorical edits in an `editarray`, the elimination method is based on repeated logical reduction on categories. See Van der Loo (2012) for a description.
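The projection view of categorical elimination can be checked by brute force on a small data model. The base R sketch below enumerates all category combinations for the household example used in the Examples section, keeps the records that satisfy the conditional edits, and then drops `maritalStatus`. It is not the logical reduction method described by Van der Loo (2012); it is only an illustration of what the result means, and the helper `key` is a hypothetical name.

```r
# Enumerate every combination of categories (the full data model).
records <- expand.grid(
  age = c("under aged", "adult"),
  positionInHousehold = c("marriage partner", "child", "other"),
  maritalStatus = c("unmarried", "married", "widowed", "divorced"),
  stringsAsFactors = FALSE
)

# Keep only the records that obey all three conditional edits.
valid <- with(records,
  !(maritalStatus %in% c("married", "widowed", "divorced") &
      positionInHousehold == "child") &
  !(maritalStatus == "unmarried" &
      positionInHousehold == "marriage partner") &
  !(age == "under aged" & maritalStatus != "unmarried")
)

# Project the solution space along maritalStatus: drop that column and
# keep the distinct (age, positionInHousehold) pairs that remain possible.
projected <- unique(records[valid, c("age", "positionInHousehold")])

# Pairs missing from the projection are forbidden by an implied edit.
all_pairs <- expand.grid(
  age = c("under aged", "adult"),
  positionInHousehold = c("marriage partner", "child", "other"),
  stringsAsFactors = FALSE
)
key <- function(d) paste(d$age, d$positionInHousehold, sep = " / ")
forbidden <- setdiff(key(all_pairs), key(projected))
print(forbidden)
# [1] "under aged / marriage partner"
```

Enumeration grows exponentially with the number of variables, so it is only practical for toy examples like this one.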

For an `editset`, `E` is transformed to an `editlist`. Each element of an `editlist` describes a convex subregion of the total solution space of the `editset`. After this, the elimination method for `editlist` is called.

For an `editlist`, the variable is eliminated from each constituent `editset`.

Usage

```
eliminate(E, var, ...)

## S3 method for class 'editmatrix'
eliminate(E, var, ...)

## S3 method for class 'editarray'
eliminate(E, var, ...)

## S3 method for class 'editset'
eliminate(E, var, ...)

## S3 method for class 'editlist'
eliminate(E, var, ...)
```

Arguments

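| Argument | Description |
|----------|-------------|
| `E` | `editmatrix`, `editarray`, `editset`, or `editlist` |
| `var` | name of the variable to be eliminated |
| `...` | arguments to be passed to or from other methods |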

Value

If `E` is an `editmatrix` or `editarray`, an object of the same class is returned. A returned `editmatrix` contains an extra `history` attribute which is used to reduce the number of generated edits in consecutive eliminations (see `getH`). If `E` is an `editset`, an object of class `editlist` is returned.

References

D.A. Kohler (1967) Projections of convex polyhedral sets, Operations Research Center Report, ORC 67-29, University of California, Berkeley.

H.P. Williams (1986) Fourier's method of linear programming and its dual, The American Mathematical Monthly 93, 681-695.

M.P.J. van der Loo (2012) Variable elimination and edit generation with a flavour of semigroup algebra (submitted).

See Also

`substValue`, `isObviouslyInfeasible`, `isObviouslyRedundant`, `generateEdits`

Examples

```
library(editrules)

# The following is an example by Williams (1986). Eliminating all variables
# except z maximizes -4x1 + 5x2 + 3x3:
P <- editmatrix(c(
    "4*x1 - 5*x2 - 3*x3 + z <= 0",
    "-x1 + x2 -x3 <= 2",
    "x1 + x2 + 2*x3 <= 3",
    "-x1 <= 0",
    "-x2 <= 0",
    "-x3 <= 0"))

# eliminate 1st variable
(P1 <- eliminate(P, "x1", fancynames=TRUE))

# eliminate 2nd variable. Note that redundant rows have been eliminated
(P2 <- eliminate(P1, "x2", fancynames=TRUE))

# finally, the answer:
(P3 <- eliminate(P2, "x3", fancynames=TRUE))

# check which original edits were used in deriving the new ones
getH(P3)

# check how many variables were eliminated
geth(P3)

# An example with an equality and two inequalities.
# The only thing to do is to solve for x in e1 and substitute in e3.
(E <- editmatrix(c(
    "2*x + y == 1",
    "y > 0",
    "x > 0"), normalize=TRUE))
eliminate(E, "x", fancynames=TRUE)

# This example has two equalities, and its solution
# is the origin (x,y) = (0,0)
(E <- editmatrix(c(
    "y <= 1 - x",
    "y >= -1 + x",
    "x == y",
    "y ==-2*x"), normalize=TRUE))
eliminate(E, "x", fancynames=TRUE)

# This example has no solution: the equalities demand (x,y) = (2,0),
# which violates the inequality y <= 1 - x
(E <- editmatrix(c(
    "y <= 1 - x",
    "y >= -1 + x",
    "y == 2 - x",
    "y == -2 + x"), normalize=TRUE))

# this happens to result in an obviously infeasible system:
isObviouslyInfeasible(eliminate(E, "x"))

# For categorical data, elimination amounts to logical derivations.
# For example:
E <- editarray(expression(
    age %in% c('under aged','adult'),
    positionInHousehold %in% c('marriage partner', 'child', 'other'),
    maritalStatus %in% c('unmarried','married','widowed','divorced'),
    if (maritalStatus %in% c('married','widowed','divorced'))
        positionInHousehold != 'child',
    if (maritalStatus == 'unmarried')
        positionInHousehold != 'marriage partner',
    if (age == 'under aged') maritalStatus == 'unmarried'
))
E

# By eliminating 'maritalStatus' we can deduce that under aged persons cannot
# be partner in a marriage.
eliminate(E, "maritalStatus")
```