Create a backtracker object for error localization
errorLocalizer(E, x, ...)

## S3 method for class 'editset'
errorLocalizer(E, x, ...)

## S3 method for class 'editmatrix'
errorLocalizer(E, x, weight = rep(1, length(x)),
  maxadapt = length(x), maxweight = sum(weight), maxduration = 600,
  tol = sqrt(.Machine$double.eps), ...)

## S3 method for class 'editarray'
errorLocalizer(E, x, weight = rep(1, length(x)),
  maxadapt = length(x), maxweight = sum(weight), maxduration = 600, ...)

## S3 method for class 'editlist'
errorLocalizer(E, x, weight = rep(1, length(x)),
  maxadapt = length(x), maxweight = sum(weight), maxduration = 600, ...)
x: a named numerical
...: Arguments to be passed to other methods (e.g. reliability weights)
maxadapt: maximum number of variables to adapt
maxweight: maximum weight of solution; if weights are not given, this is equal to the maximum number of variables to adapt.
maxduration: maximum time (in seconds), for
tol: tolerance passed to
An object of class backtracker. Each execution of $searchNext() yields a solution in the form of a list (see details). Executing $searchBest() returns the lowest-weight solution. When multiple solutions with the same weight are found, $searchBest() picks one at random.
Generate a backtracker object for error localization in numerical, categorical, or mixed data.
This function generates the workhorse program, called by
The backtracker can be used to run a branch-and-bound algorithm which finds the least (weighted) number of variables in x that need to be adapted so that all restrictions in E can be satisfied (generalized principle of Fellegi and Holt (1976)).
The B&B tree is set up so that in one branch, a variable is assumed correct and its value substituted in E, while in the other branch a variable is assumed incorrect and eliminated from E.
See De Waal (2003), chapter 8 or De Waal, Pannekoek and Scholtus (2011) for
a concise description of the B&B algorithm.
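The principle described above can be illustrated on a small categorical record. The sketch below is not the branch-and-bound algorithm used by errorLocalizer and does not call editrules: it is a brute-force enumeration in base R that tries every subset of variables. The names domains, satisfies, feasible and minimal_adaptations are hypothetical helpers introduced here; the data model, the two rules and the record are taken from the categorical example on this page.

```r
# Toy brute-force illustration of the generalized Fellegi-Holt principle.
# NOT the B&B algorithm of errorLocalizer; all helper names are hypothetical.

# Data model: allowed values per variable
domains <- list(
  age = c("under aged", "adult"),
  maritalStatus = c("unmarried", "married", "widowed", "divorced"),
  positionInHousehold = c("marriage partner", "child", "other")
)

# TRUE when a complete record satisfies both conditional rules
satisfies <- function(r) {
  rule1 <- r[["age"]] != "under aged" || r[["maritalStatus"]] == "unmarried"
  rule2 <- r[["maritalStatus"]] == "unmarried" ||
    !(r[["positionInHousehold"]] %in% c("marriage partner", "child"))
  rule1 && rule2
}

# Can the record be made valid if only the variables in `free` may change?
feasible <- function(r, free) {
  if (length(free) == 0) return(satisfies(r))
  grid <- expand.grid(domains[free], stringsAsFactors = FALSE)
  for (i in seq_len(nrow(grid))) {
    candidate <- r
    candidate[free] <- unlist(grid[i, , drop = FALSE])
    if (satisfies(candidate)) return(TRUE)
  }
  FALSE
}

# Enumerate all subsets and keep those with the lowest total weight
minimal_adaptations <- function(r, weight = rep(1, length(r))) {
  vars <- names(r)
  best_w <- Inf
  best <- list()
  for (k in 0:length(vars)) {
    subsets <- if (k == 0) list(character(0)) else combn(vars, k, simplify = FALSE)
    for (free in subsets) {
      w <- sum(weight[vars %in% free])
      if (w <= best_w && feasible(r, free)) {
        if (w < best_w) {
          best <- list()
          best_w <- w
        }
        best[[length(best) + 1]] <- free
      }
    }
  }
  list(w = best_w, adapt = best)
}

r <- c(age = "under aged", maritalStatus = "married", positionInHousehold = "child")
res <- minimal_adaptations(r)
print(res)
```

With unit weights, the sketch finds that changing maritalStatus alone (weight 1) is enough. Enumerating subsets grows exponentially with the number of variables, which is the cost that a branch-and-bound search with weight bounds is meant to reduce.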
Every call to <backtracker>$searchNext() returns one solution list, consisting of
w: The solution weight.
a logical indicating whether a variable should be adapted (TRUE) or not (FALSE).
Every subsequent call leads either to NULL, in which case either all solutions have been found or maxduration was exceeded. The property <backtracker>$maxdurationExceeded indicates if this is the case. Otherwise, a new solution with a weight w not higher than the weight of the last found solution is returned.
<backtracker>$searchBest() will return the best solution found within
If multiple equivalent solutions are found, a random one is returned.
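The iteration protocol described above can be wrapped in a small loop. The following is a sketch only: collect_solutions is a hypothetical helper, not part of editrules, and it relies solely on the $searchNext() and $maxdurationExceeded members described on this page. The usage part runs only when editrules is installed.

```r
# Hypothetical helper: call $searchNext() until it returns NULL and
# report whether the search stopped because maxduration was exceeded.
collect_solutions <- function(bt) {
  solutions <- list()
  repeat {
    sol <- bt$searchNext()
    if (is.null(sol)) break
    solutions[[length(solutions) + 1]] <- sol
  }
  list(
    solutions = solutions,
    timed_out = isTRUE(bt$maxdurationExceeded)
  )
}

# Usage with the single-edit example from this page (skipped without editrules)
if (requireNamespace("editrules", quietly = TRUE)) {
  E  <- editrules::editmatrix(c("p + c == t"))
  bt <- editrules::errorLocalizer(E, x = c(p = 755, c = 125, t = 200))
  found <- collect_solutions(bt)
  # Solution weights, in the order in which the solutions were found
  print(vapply(found$solutions, function(s) s$w, numeric(1)))
  print(found$timed_out)
}
```

Checking timed_out after the loop distinguishes an exhausted search from one that was cut short, since both end with NULL.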
The backtracker is prepared such that missing data in the input record x is already set to adapt, and missing variables have been eliminated already.
The backtracker will crash when E is an editarray and one or more values are not in the datamodel specified by E. The more user-friendly function
circumvents this. See also
For records with a large numerical range (e.g. 1-1E9), the error locations represent solutions that will allow repairing the record to within roundoff errors. We highly recommend that you round near-zero values (for example, everything <= sqrt(.Machine$double.eps)) and scale a record with values larger than or equal to 1E9 with a constant factor.
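The recommendation above could be applied with a small preprocessing step before calling errorLocalizer. This is a minimal sketch under assumptions: precondition_record is a hypothetical helper, "near-zero" is read as an absolute value at or below the tolerance, and the scale factor is taken to be a power of ten that brings the largest value below 1E9. Neither choice is prescribed by the package.

```r
# Hypothetical helper: round near-zero values to 0 and rescale large records.
precondition_record <- function(x, tol = sqrt(.Machine$double.eps), limit = 1e9) {
  # Assumption: "near-zero" means |value| <= tol
  x[!is.na(x) & abs(x) <= tol] <- 0
  largest <- suppressWarnings(max(abs(x), na.rm = TRUE))
  factor <- 1
  if (is.finite(largest) && largest >= limit) {
    # Assumption: use a power of ten so that the largest value drops below `limit`
    factor <- 10^(-(floor(log10(largest / limit)) + 1))
  }
  list(x = x * factor, factor = factor)
}

prep <- precondition_record(c(p = 1e-12, c = 2.5e9, t = 2.5e9))
print(prep$x)
print(prep$factor)
```

Rescaling a record leaves homogeneous edits such as "p + c == t" unaffected, but edits that contain constants would need the same factor applied, and any repaired values should be divided by the factor afterwards.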
This method is potentially very slow for objects of class editset that contain many conditional restrictions. Consider using localizeErrors with the option method="mip" in such cases.
I.P. Fellegi and D. Holt (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association 71, pp 17-25.
T. De Waal (2003). Processing of erroneous and unsafe data. PhD thesis, Erasmus Research Institute of Management, Erasmus University Rotterdam. http://www.cbs.nl/nl-NL/menu/methoden/onderzoek-methoden/onderzoeksrapporten/proefschriften/2008-proefschrift-de-waal.htm
T. De Waal, J. Pannekoek and S. Scholtus (2011). Handbook of Statistical Data Editing and Imputation. Wiley Handbooks on Survey Methodology.
#### examples with numerical edits
# example with a single editrule
# p = profit, c = cost, t = turnover
E <- editmatrix(c("p + c == t"))
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200))
# x obviously violates E. With all weights equal, changing any variable will do.
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# third solution:
cp$searchNext()
# there are no more solutions, since changing more variables would increase the
# weight, so the result of the next statement is NULL:
cp$searchNext()

# Increasing the reliability weight of turnover yields 2 solutions:
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200), weight=c(1,1,2))
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# no more solutions available:
cp$searchNext()

# A case with two restrictions. The second restriction demands that
# c/t >= 0.6 (cost should be at least 60% of turnover)
E <- editmatrix(c(
    "p + c == t",
    "c - 0.6*t >= 0"))
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200))
# Now, there's only one solution, but we need two runs to find it (the 1st one
# has higher weight)
cp$searchNext()
cp$searchNext()

# With the searchBest() function, the lowest-weight solution is found at once:
errorLocalizer(E, x=c(p=755, c=125, t=200))$searchBest()

# An example with missing data.
E <- editmatrix(c(
    "p + c1 + c2 == t",
    "c1 - 0.3*t >= 0",
    "p > 0",
    "c1 > 0",
    "c2 > 0",
    "t > 0"))
cp <- errorLocalizer(E, x=c(p=755, c1=50, c2=NA, t=200))
# (Note that e2 is violated.)
# There are two solutions. Both demand that c2 is adapted:
cp$searchNext()
cp$searchNext()

##### Examples with categorical edits
#
# 3 variables, recording age class, position in household, and marital status:
# We define the datamodel and the rules
E <- editarray(expression(
    age %in% c('under aged','adult'),
    maritalStatus %in% c('unmarried','married','widowed','divorced'),
    positionInHousehold %in% c('marriage partner', 'child', 'other'),
    if( age == 'under aged' ) maritalStatus == 'unmarried',
    if( maritalStatus %in% c('married','widowed','divorced'))
        !positionInHousehold %in% c('marriage partner','child')
    )
)
E

# Let's define a record with an obvious error:
r <- c(
    age = 'under aged',
    maritalStatus = 'married',
    positionInHousehold = 'child')
# The age class and position in household are consistent, while the marital
# status conflicts. Therefore, changing only the marital status (instead of
# both age class and position in household) seems reasonable.
el <- errorLocalizer(E, r)
el$searchNext()