sel.edit: Influential Error Detection In SeleMix: Selective Editing via Mixture Models

Description

Computes the score function and identifies influential errors

Usage

```
sel.edit(y, ypred, wgt = rep(1, nrow(as.matrix(y))), tot = colSums(ypred * wgt), t.sel = 0.01)
```

Arguments

`y`: matrix or data frame containing the response variables
`ypred`: matrix of predicted values for the y variables
`wgt`: optional vector of sampling weights (default: 1 for every unit)
`tot`: optional vector of reference estimates of the totals of the y variables; if omitted, it is computed as the (possibly weighted) sum of the predicted values
`t.sel`: optional vector of threshold values for selective editing, one for each variable (default: 0.01)
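The default for `tot` in the Usage line, `colSums(ypred * wgt)`, relies on R's column-major recycling: `ypred * wgt` multiplies row i of `ypred` by `wgt[i]`, so each column sum is a weighted total of the predictions. The small check below uses made-up values, not package data.

```r
# Illustrative toy data: 3 units, 2 variables (not from SeleMix)
ypred <- cbind(y1 = c(10, 20, 30), y2 = c(1, 2, 3))
wgt <- c(1, 2, 1)

# Same expression as the default of `tot`: row i of ypred is scaled by wgt[i]
tot <- colSums(ypred * wgt)
tot
#> y1 y2
#> 80  8
```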

Details

This function ranks the observations (`rank`) according to the importance of their potential errors, in decreasing order of the global score (`global.score`). It also selects the units to be edited (`sel`) so that the expected residual error of every variable is below a prespecified accuracy level (`t.sel`). The global score is the maximum of the local scores computed for each variable (`y1.score, y2.score,...`). The local score of a variable is the weighted (`weights`) absolute difference between the observed value (`y1, y2,...`) and the predicted value (`y1.p, y2.p,...`), standardised with respect to the reference estimate of the corresponding total (`tot`).
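The score definitions above can be sketched in base R. This is a hypothetical helper, not the SeleMix source, and it assumes that "standardised with respect to `tot`" means dividing by the total, i.e. local score s_ij = w_i |y_ij - ŷ_ij| / tot_j and global score = max over j of s_ij. Note the use of `sweep()`: dividing an n x p matrix directly by a length-p vector would recycle down the columns and divide by the wrong totals.

```r
# Hypothetical helper, not the SeleMix implementation.
# ASSUMPTION: local score = weight * |observed - predicted| / reference total.
local_scores <- function(y, ypred, wgt = rep(1, nrow(as.matrix(y))),
                         tot = colSums(as.matrix(ypred) * wgt)) {
  y <- as.matrix(y)
  ypred <- as.matrix(ypred)
  abs.err <- wgt * abs(y - ypred)   # row i is multiplied by wgt[i]
  sweep(abs.err, 2, tot, "/")       # column j is divided by tot[j]
}

# Toy data: 3 units, 2 variables (illustrative values only)
y     <- cbind(y1 = c(10, 25, 30), y2 = c(1, 2, 9))
ypred <- cbind(y1 = c(10, 20, 30), y2 = c(1, 2, 3))

score <- local_scores(y, ypred)
global.score <- apply(score, 1, max)                   # maximum over variables
rank.pos <- rank(-global.score, ties.method = "first") # 1 = largest global score
```

Here the totals are 60 and 6, so unit 3 (error 6 on `y2`) gets global score 1 and is ranked first, unit 2 (error 5 on `y1`) gets 5/60 and is ranked second.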

Units affected by an influential error, and therefore to be edited (`sel=1`), are selected with a two-step algorithm:
1) order the observations with respect to the `global.score` (decreasing order);
2) select the first k units such that, from the (k+1)th to the last observation, the residual errors of all variables (`y1.reserr, y2.reserr,...`) are below `t.sel`.
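The two steps can be sketched as follows. This is a hypothetical helper, not the package code, and it rests on an assumption the text does not state: the residual error of variable j at ranked position r is taken to be the sum of the local scores of the units ranked r to n, i.e. the error left if those units are not edited. Under that reading the residual errors never increase down the ranking, so the units above the threshold always form the first k positions.

```r
# Hypothetical helper, not the SeleMix implementation.
# ASSUMPTION: residual error at ranked position r = sum of local scores of
# the units ranked r, r+1, ..., n (a tail sum), computed per variable.
select_units <- function(score, t.sel = 0.01) {
  score <- as.matrix(score)
  n <- nrow(score)
  p <- ncol(score)
  t.sel <- rep_len(t.sel, p)                       # one threshold per variable
  global.score <- apply(score, 1, max)
  ord <- order(global.score, decreasing = TRUE)    # step 1: decreasing global score
  # Tail sums of the ordered local scores, column by column
  reserr <- apply(score[ord, , drop = FALSE], 2, function(s) rev(cumsum(rev(s))))
  reserr <- matrix(reserr, nrow = n)               # keep matrix shape when n == 1
  # Step 2: a ranked unit is selected while any residual error is still at or
  # above its threshold; tail sums never increase, so these are the first k units
  above <- reserr >= matrix(t.sel, n, p, byrow = TRUE)
  k <- sum(apply(above, 1, any))
  sel <- integer(n)
  sel[ord[seq_len(k)]] <- 1L
  sel
}

# Local scores for 3 units and 2 variables (illustrative values only)
score <- cbind(y1 = c(0, 5/60, 0), y2 = c(0, 0, 1))
select_units(score, t.sel = 0.01)
#> [1] 0 1 1
```

With the looser threshold `t.sel = 0.1` only unit 3 is selected, because the residual error left on `y1` (5/60) is already below 0.1.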

The function also provides indicator variables (`y1.sel, y2.sel,...`) that report which variables contain an influential error in each unit selected for revision.

Value

`sel.edit` returns a data matrix containing the following columns:

`y1, y2,...`: observed variables
`y1.p, y2.p,...`: predictions of the y variables
`weights`: sampling weights
`y1.score, y2.score,...`: local scores
`global.score`: global score
`y1.reserr, y2.reserr,...`: residual errors
`y1.sel, y2.sel,...`: influential error flags
`rank`: rank according to the global score
`sel`: 1 if the observation contains an influential error, 0 otherwise
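Since `rank` orders units by decreasing global score and `sel` flags the units to edit, the two columns can be combined to list the units to review in priority order. The matrix below is a hypothetical stand-in with only three of the documented columns and made-up values; with real data you would use the matrix returned by `sel.edit`.

```r
# Hypothetical stand-in for part of the sel.edit output (values are made up)
res <- cbind(global.score = c(0, 5/60, 1),
             rank = c(3, 2, 1),
             sel = c(0, 1, 1))

# Row indices of the units flagged for editing, most influential first
to.edit <- which(res[, "sel"] == 1)
to.edit <- to.edit[order(res[to.edit, "rank"])]
to.edit
#> [1] 3 2
```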

Author(s)

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

References

Di Zio, M., Guarnera, U. (2013) "A Contamination Model for Selective Editing", Journal of Official Statistics, Volume 29, Issue 4, Pages 539-555 (http://dx.doi.org/10.2478/jos-2013-0039).

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010.

Examples

```
library(SeleMix)

# Example 1
# Parameter estimation with one contaminated variable and one covariate
data(ex1.data)
ml.par <- ml.est(y = ex1.data[, "Y1"], x = ex1.data[, "X1"])
# Detection of influential errors
sel <- sel.edit(y = ex1.data[, "Y1"], ypred = ml.par$ypred)
head(sel)
sum(sel[, "sel"])
# Order results by decreasing importance of the score
sel.ord <- sel[order(sel[, "rank"]), ]
# Add columns to the data
ex1.data <- cbind(ex1.data, tau = ml.par$tau, outlier = ml.par$outlier, sel[, c("rank", "sel")])
# Plot the data with outliers and influential errors
sel.pairs(ex1.data[, c("X1", "Y1")], outl = ml.par$outlier, sel = sel[, "sel"])

# Example 2
data(ex2.data)
par.joint <- ml.est(y = ex2.data)
sel <- sel.edit(y = ex2.data, ypred = par.joint$ypred)
sel.pairs(ex2.data, outl = par.joint$outlier, sel = sel[, "sel"])
```

SeleMix documentation built on Nov. 29, 2020, 9:09 a.m.