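Reads the original dataset together with its ratio, range, and balance edit rules, and returns an EditIn.data object for edit-imputation.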
readData(Y.original, ratio, range, balance, eps.bal = 0.6)
Y.original: original dataset of dimension (n, p) containing missing and edit-failing values, where n is the number of records and p is the number of variables.
ratio: ratio edits.
range: range restrictions.
balance: balance edits.
eps.bal: threshold for balance edits. Defaults to 0.6.
Y.original has n records and p variables. The variable names (column names) of Y.original are used to specify the ratio edits.
The edit rules are either imported from text files or written using the syntax of the editrules package.
A balance edit is treated as two inequality constraints defined by the threshold, i.e., ‘A = B’ is converted to ‘-eps.bal < A - B < eps.bal’ before computation.
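A minimal sketch of that conversion in base R, assuming a single balance edit A = B checked record by record. The helper name `balance_ok` is hypothetical and is not part of the package; it only illustrates the two strict inequalities.

```r
# Hypothetical helper (not part of the package): the balance edit A = B is
# treated as the two strict inequalities -eps.bal < A - B and A - B < eps.bal.
balance_ok <- function(A, B, eps.bal = 0.6) {
  d <- A - B
  (-eps.bal < d) & (d < eps.bal)
}

# Example: the balance edit X1 = X2 + X3 checked for three records
X1 <- c(10, 10, 10)
X2 <- c(4, 4, 3)
X3 <- c(6, 5.5, 5)
balance_ok(X1, X2 + X3)  # TRUE TRUE FALSE
```

The second record passes even though X1 differs from X2 + X3 by 0.5, because the difference lies inside the default threshold of 0.6.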
For accurate computation, nested balances should be written in the form ‘total variable = sum of component variables’. For example, it is recommended to replace ‘X1 = X2 + X3’ and ‘X3 = X4 + X5’ with ‘X1 = X2 + X4 + X5’ and ‘X3 = X4 + X5’, so that ‘X3’ does not appear on both sides of the balance edits.
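The rewrite recommended above is a substitution of one balance edit into another. The sketch below assumes a hypothetical representation (not the package's internal format) in which each balance edit stores the name of its total and a named vector of component coefficients; `flatten_balance` and `format_balance` are illustrative helpers only.

```r
# Hypothetical representation: "total = sum of weighted components"
e1 <- list(total = "X1", comp = c(X2 = 1, X3 = 1))  # X1 = X2 + X3
e2 <- list(total = "X3", comp = c(X4 = 1, X5 = 1))  # X3 = X4 + X5

# Replace the total of `inner` wherever it appears as a component of `outer`.
flatten_balance <- function(outer, inner) {
  k <- inner$total
  if (!k %in% names(outer$comp)) return(outer)  # nothing to substitute
  w <- outer$comp[[k]]                          # weight of the nested total
  rest <- outer$comp[names(outer$comp) != k]
  comp <- c(rest, w * inner$comp)               # scaled inner components
  # merge repeated variable names by summing their coefficients
  comp <- vapply(split(comp, names(comp)), sum, numeric(1))
  list(total = outer$total, comp = comp)
}

# Render an edit in the string syntax used by the examples
format_balance <- function(e) {
  terms <- ifelse(e$comp == 1, names(e$comp),
                  paste0(e$comp, "*", names(e$comp)))
  paste(e$total, "==", paste(terms, collapse = " + "))
}

format_balance(flatten_balance(e1, e2))  # "X1 == X2 + X4 + X5"
```

Applied to ‘X5 = X6 + X7’ and ‘X7 = 0.4*X10 + 0.6*X11’, the same substitution yields the second balance edit used in option 2 of the examples, up to the order of terms.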
readData returns an EditIn.data object consisting of the following components:
Y.input: input dataset in which NA values of Y.original are replaced with -999 and zero values with 0.01.
Edit.editmatrix: editmatrix of the edit rules, which can be passed to functions of the editrules package.
Edit.matrix: matrix of the edit rules.
Bound.LU: range restrictions. For a variable X whose range is not specified in range, the default lower bound is max(0.1*min(X), 1e-5) and the default upper bound is 10*max(X).
ratio: ratio edits.
n.balance: number of balance edits, i.e., the number of rows of balance.
FaultyRecordID: record IDs of Y.original whose values violate the edit rules.
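Two of the rules listed above, the construction of Y.input and the default range bounds in Bound.LU, can be sketched in base R. This is not the package's code: `make_input` and `default_bounds` are hypothetical helpers, and ignoring NA values when computing the default bounds is an assumption the text above does not state.

```r
# Hypothetical sketch of Y.input: NA -> -999 and exact zeros -> 0.01
make_input <- function(Y) {
  Y <- as.matrix(Y)
  Y[Y == 0 & !is.na(Y)] <- 0.01  # the !is.na() guard keeps NA out of the index
  Y[is.na(Y)] <- -999
  Y
}

# Hypothetical sketch of the default bounds for a variable without a range edit:
# lower = max(0.1*min(X), 1e-5), upper = 10*max(X)
default_bounds <- function(x) {
  x <- x[!is.na(x)]  # assumption: missing values are ignored
  c(lower = max(0.1 * min(x), 1e-5), upper = 10 * max(x))
}

Y <- data.frame(X1 = c(5, NA, 0, 20), X2 = c(2, 3, 4, 0))
make_input(Y)               # X1: 5, -999, 0.01, 20; X2: 2, 3, 4, 0.01
default_bounds(Y$X1)        # lower 1e-05, upper 200
default_bounds(c(5, 8, 20)) # lower 0.5, upper 200
```

The 1e-5 floor matters when a variable contains zeros: without it, 0.1*min(X) would give a lower bound of 0.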
Hang J. Kim, Lawrence H. Cox, Alan F. Karr, Jerome P. Reiter and Quanli Wang (2015). "Simultaneous Edit-Imputation for Continuous Microdata", Journal of the American Statistical Association, DOI: 10.1080/01621459.2015.1040881.
Edwin de Jonge and Mark van der Loo (2013). "editrules: R package for parsing, applying, and manipulating data cleaning rules", R package version 2.7.2.
### option 1. import from text files ###
data(NestedEx)
D_obs1 = NestedEx$D.obs
Ratio1 = NestedEx$Ratio.edit
Range1 = NestedEx$Range.edit
Balance1 = NestedEx$Balance.edit
data1 = readData(Y.original=D_obs1, ratio=Ratio1, range=Range1,
balance=Balance1, eps.bal=0.6)
# print(data1$Edit.editmatrix)
# plot(data1$Edit.editmatrix) ## function of 'editrules' package
### option 2. Using the syntax of R package 'editrules' ###
library(editrules)  # provides editmatrix()
data(NestedEx)
D_obs2 = NestedEx$D.obs
Ratio2 <- editmatrix(c(
"X1 <= 1096.63*X5", "X1 <= 2980.96*X7", "X1 <= 148.41*X8", "X1 <= 7.39*X9",
"X5 <= 0.37*X1", "X5 <= 54.60*X7", "X5 <= 2.72*X8", "X5 <= 0.14*X9",
"X7 <= 0.14*X1", "X7 <= 1.65*X5", "X7 <= 7.39*X8", "X7 <= 0.05*X9",
"X8 <= 1.65*X1", "X8 <= 54.60*X5", "X8 <= 403.43*X7", "X8 <= 1.65*X9",
"X9 <= 20.09*X1", "X9 <= 403.43*X5", "X9 <= 13359.73*X7", "X9 <= 148.41*X8"
))
Range2 <- editmatrix(c(
"X1 >= 2", "X2 <= 1.2e+06", "X11 >= 0.002", "X11 <= 1.2e+04"
))
Balance2 <- editmatrix(c(
"X1 == X2+X3+X4", "X5 == X6 + 0.4*X10 + 0.6*X11", "X7 == 0.4*X10 + 0.6*X11"
))
data2 = readData(D_obs2, Ratio2, Range2, Balance2)
# print(data2$Edit.editmatrix)
# Note: data2 is equivalent to data1
# plot(data2$Edit.editmatrix) ## function of 'editrules' package