errorStats: Compute error components of k-NN imputations

In yaImpute: Nearest Neighbor Observation Imputation and Evaluation Tools

Description

Error properties of estimates derived from imputation differ from those of regression-based estimates because the two methods include a different mix of error components. This function computes a partitioning of error statistics as proposed by Stage and Crookston (2007).

Usage

```
errorStats(mahal,...,scale=FALSE,pzero=0.1,plg=0.5,seeMethod="lm")
```

Arguments

`mahal` An object of class `yai` computed with `method="mahalanobis"`.

`...` Other objects of class `yai` for which statistics are desired. All objects should be for the same data and variables used for the first argument.

`scale` When `TRUE`, the errors are scaled by their respective standard deviations.

`pzero` The lower tail p-value used to pick reference observations that are zero distance from each other (used to compute `rmmsd0`).

`plg` The upper tail p-value used to pick reference observations that are substantially distant from each other (used to compute `rmsdlg`).

`seeMethod` Method used to compute `SEE`: `seeMethod="lm"` uses `lm` and `seeMethod="gam"` uses `gam`. In both cases, the model formula is a simple linear combination of the X-variables.
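As a rough illustration of the arguments above, the following sketch calls `errorStats` with non-default values for `scale`, `pzero`, and `plg`. It reuses the `TallyLake` data and the column ranges from this page's own example; the particular values `pzero=0.05` and `scale=TRUE` are arbitrary choices for illustration, not recommendations, and the call is skipped when yaImpute is not installed.

```r
# Sketch only: the argument values below are arbitrary illustrations.
if (requireNamespace("yaImpute", quietly = TRUE)) {
  library(yaImpute)
  data(TallyLake)

  # Same reference data and variables for both objects, as the
  # description of the `...` argument requires.
  mal <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
             noTrgs = TRUE, method = "mahalanobis")
  msn <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
             noTrgs = TRUE, method = "msn")

  # Scale errors by their standard deviations and use a stricter
  # lower-tail p-value when picking the "zero distance" pairs.
  es <- errorStats(mal, msn, scale = TRUE, pzero = 0.05, plg = 0.5)

  # The result is a list of data frames; list their names.
  print(names(es))
}
```

The Mahalanobis-based object goes first because `rmmsd0` is always computed from the first argument.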

Value

A list that contains several data frames. The column names of each are a combination of the name of the object used to compute the statistics and the name of the statistic. The row names correspond to the Y-variables from the first argument. The data frame names are as follows:

`common` statistics used to compute other statistics.

`name of first argument` error statistics for the first `yai` object.

`names of ... arguments` error statistics for each of the remaining `yai` objects, if any.

`see` standard error of estimate for individual regressions fit for corresponding Y-variables.

`rmmsd0` root mean square difference for imputations based on `method="mahalanobis"` (always based on the first argument to the function).

`mlf` square root of the model lack of fit: sqrt(see^2 - (rmmsd0^2/2)).

`rmsd` root mean square error.

`rmsdlg` root mean square error of the observations with larger distances.

`sei` standard error of imputation: sqrt(rmsd^2 - (rmmsd0^2/2)).

`dstc` distance component: sqrt(rmsd^2 - rmmsd0^2).

Note that unlike Stage and Crookston (2007), all statistics reported here are in the natural units, not squared units.
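To make the formulas for `mlf`, `sei`, and `dstc` concrete, here is a small base-R sketch that applies them to made-up numbers for a single Y-variable. The input values are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from `TallyLake` or from Stage and Crookston (2007), and the sketch does not reproduce how `errorStats` computes its inputs.

```r
# Hypothetical inputs for one Y-variable (illustration only).
see    <- 5  # standard error of estimate from the regression
rmmsd0 <- 4  # RMSD among reference pairs at (near) zero distance
rmsd   <- 6  # RMSD of the imputations

# Derived statistics, following the formulas listed above.
mlf  <- sqrt(see^2  - rmmsd0^2 / 2)  # sqrt(25 - 8)  = sqrt(17)
sei  <- sqrt(rmsd^2 - rmmsd0^2 / 2)  # sqrt(36 - 8)  = sqrt(28)
dstc <- sqrt(rmsd^2 - rmmsd0^2)      # sqrt(36 - 16) = sqrt(20)

# The definitions imply sei^2 = dstc^2 + rmmsd0^2 / 2.
identity_holds <- isTRUE(all.equal(sei^2, dstc^2 + rmmsd0^2 / 2))

print(round(c(mlf = mlf, sei = sei, dstc = dstc), 4))
print(identity_holds)
```

With real data the differences under the square roots need not be positive, so this sketch should not be read as covering that case.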

Author(s)

Nicholas L. Crookston
Albert R. Stage

References

Stage, A.R.; Crookston, N.L. (2007). Partitioning error components for accuracy-assessment of near neighbor methods of imputation. For. Sci. 53(1):62-72. http://www.treesearch.fs.fed.us/pubs/28385

See Also

`yai`, `TallyLake`

Examples

```
require(yaImpute)

data(TallyLake)

diag(cov(TallyLake[,1:8])) # see col A in Table 3 in Stage and Crookston

mal=yai(x=TallyLake[,9:29],y=TallyLake[,1:8],
        noTrgs=TRUE,method="mahalanobis")

msn=yai(x=TallyLake[,9:29],y=TallyLake[,1:8],
        noTrgs=TRUE,method="msn")

# variable "see" for "mal" matches col B (when squared and scaled)
# other columns don't match exactly as Stage and Crookston used different
# software to compute values

errorStats(mal,msn)
```