# notablydifferent: Finds observations with large differences between observed and... In yaImpute: Nearest Neighbor Observation Imputation and Evaluation Tools

## Description

This routine identifies observations with large errors as measured by scaled root mean square error (see `rmsd.yai`). A threshold is used to detect observations with large differences.

## Usage

```r
notablyDifferent(object,vars=NULL,threshold=NULL,p=.05,...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `object` | an object of class `yai`. |
| `vars` | a vector of character strings naming the variables to use; if NULL, the X-variables from `object` are used. |
| `threshold` | a threshold; observations whose scaled difference exceeds it are listed as notably different. |
| `p` | `(1-p)*100` is the percentile point in the distribution of differences used to compute the threshold (used when `threshold` is NULL). |
| `...` | additional arguments passed to `impute.yai`. |

## Details

The scaled differences are computed as follows:

1. A matrix of differences between observed and imputed values is computed for each observation (rows) and each variable (columns).

2. These differences are scaled by dividing by the standard deviation of the observed values among the reference observations.

3. The scaled differences are squared.

4. Row means are computed resulting in one value for each observation.

5. The square root of each of these values is taken.

These values are Euclidean distances between the target observations and their nearest references as measured using specified variables. All the variables that are used must have observed and imputed values. Generally, this will be the X-variables and not the Y-variables.
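The five steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's own code: the matrices `observed` and `imputed`, the logical vector `is_ref`, and the helper `scaled_rmsd` are hypothetical names introduced here, and the data are made up.

```r
# Hypothetical toy data: rows are observations, columns are variables.
observed <- matrix(c(5.1, 3.5,
                     4.9, 3.0,
                     6.2, 2.9,
                     5.8, 2.7),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("a", "b", "c", "d"), c("v1", "v2")))
imputed <- matrix(c(5.0, 3.4,
                    5.4, 3.3,
                    6.1, 3.0,
                    5.8, 2.7),
                  ncol = 2, byrow = TRUE,
                  dimnames = dimnames(observed))

# Assumption for this sketch: the first three rows are reference observations.
is_ref <- c(TRUE, TRUE, TRUE, FALSE)

# Hypothetical helper that mirrors steps 1 to 5 of the description.
scaled_rmsd <- function(observed, imputed, is_ref) {
  diffs  <- observed - imputed                              # step 1: differences
  sds    <- apply(observed[is_ref, , drop = FALSE], 2, sd)  # SD of observed values among references
  scaled <- sweep(diffs, 2, sds, "/")                       # step 2: scale each column
  sqrt(rowMeans(scaled^2))                                  # steps 3 to 5: square, row means, root
}

rmsdS <- scaled_rmsd(observed, imputed, is_ref)
sort(rmsdS)  # one value per observation, named by row name
```

An observation whose imputed values equal its observed values (row `d` here) gets a scaled difference of zero, and larger values indicate a poorer match.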

When `threshold` is NULL, the function computes one using the `quantile` function with its default arguments and `probs=1-p`.
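As a rough illustration of the default threshold, the snippet below applies `quantile` with `probs=1-p` to a made-up vector of scaled differences and lists the values that exceed the result. The vector `rmsdS` and its values are hypothetical, and exactly which set of differences the function feeds to `quantile` is not stated here, so this sketch simply assumes a single named vector.

```r
# Hypothetical scaled differences, named by observation id.
rmsdS <- c(a = 0.21, b = 1.35, c = 0.40, d = 0.05, e = 0.77)

p <- 0.05                                    # same default as notablyDifferent()
threshold <- quantile(rmsdS, probs = 1 - p)  # 95th percentile, default quantile type

# Observations whose scaled difference exceeds the threshold, sorted.
notable <- sort(rmsdS[rmsdS > threshold])
notable
```

A smaller `p` moves the threshold further into the upper tail, so fewer observations are flagged; supplying `threshold` directly bypasses this computation.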

## Value

A named list of several items. In all cases, vectors are named using the observation ids, which are the row names of the data used to build the `yai` object.

| Item | Description |
| --- | --- |
| `call` | The call. |
| `vars` | The variables used (may be fewer than requested). |
| `threshold` | The threshold value. |
| `notablyDifferent.refs` | A sorted named vector of references that exceed the threshold. |
| `notablyDifferent.trgs` | A sorted named vector of targets that exceed the threshold. |
| `rmsdS.refs` | A sorted named vector of scaled RMSD references. |
| `rmsdS.trgs` | A sorted named vector of scaled RMSD targets. |

## Author(s)

Nicholas L. Crookston

## See Also

`notablyDistant`, `plot.notablyDifferent`, `yai`, `grmsd`

## Examples

```r
library(yaImpute)

data(iris)

set.seed(12345)

# form some test data
refs=sample(rownames(iris),50)
x <- iris[,1:3]      # Sepal.Length Sepal.Width Petal.Length
y <- iris[refs,4:5]  # Petal.Width Species

# build an msn run, first build dummy variables for species.

sp1 <- as.integer(iris$Species=="setosa")
sp2 <- as.integer(iris$Species=="versicolor")
y2 <- data.frame(cbind(iris[,4],sp1,sp2),row.names=rownames(iris))
y2 <- y2[refs,]
names(y2) <- c("Petal.Width","Sp1","Sp2")

msn <- yai(x=x,y=y2,method="msn")

notablyDifferent(msn)
```