noteworthy | R Documentation |
This function extends the logic used by showLabels to provide a more general collection of methods to identify unusual or "noteworthy" points in a two-dimensional display.
Standard methods include Mahalanobis and Euclidean distance from the centroid, absolute value of distance from the mean of X or Y, absolute value of Y, and absolute value of the residual in a model Y ~ X.
noteworthy(x, y = NULL, n = length(x), method = "mahal", level = NULL, ...)
x , y |
The x and y coordinates of a set of points. Alternatively, a single argument |
n |
Maximum number of points to identify. If set to 0, no points are identified. |
method |
Method of point identification. See Details. |
level |
Where appropriate, if supplied, the identified points are filtered so that only those for which the
criterion is |
... |
Other arguments, silently ignored |
The 'method' argument determines how the points to be identified are selected:
"mahal"
Treat (x, y) as if it were a bivariate sample, and select cases according to their Mahalanobis distance from (mean(x), mean(y)).
"dsq"
Similar to "mahal" but uses squared Euclidean distance.
"x"
Select points according to their value of abs(x - mean(x)).
"y"
Select points according to their value of abs(y - mean(y)).
"r"
Select points according to their value of abs(y), as may be appropriate in residual plots, or others with a meaningful origin at 0, such as a chi-square QQ plot.
"ry"
Fit the linear model y ~ x and select points according to their absolute residuals.
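The criteria listed above can be sketched in base R. This is only an illustration of the descriptions, not the package's implementation: the helper names `criterion()` and `top_n()` are hypothetical, and the sketch assumes that "dsq" measures squared Euclidean distance from the centroid, as "mahal" does.

```r
# Hypothetical helper (not part of any package): one criterion value per point,
# following the method descriptions above.
criterion <- function(x, y, method = c("mahal", "dsq", "x", "y", "r", "ry")) {
  method <- match.arg(method)
  switch(method,
    mahal = {
      xy <- cbind(x, y)
      # stats::mahalanobis() returns the squared distance; ranking is unaffected
      stats::mahalanobis(xy, colMeans(xy), stats::var(xy))
    },
    dsq = (x - mean(x))^2 + (y - mean(y))^2,       # assumed: distance from centroid
    x   = abs(x - mean(x)),
    y   = abs(y - mean(y)),
    r   = abs(y),
    ry  = abs(stats::residuals(stats::lm(y ~ x)))
  )
}

# Hypothetical helper: indices of the n largest criterion values
top_n <- function(crit, n) {
  order(crit, decreasing = TRUE)[seq_len(min(n, length(crit)))]
}

set.seed(47)
x <- c(runif(100), 1.5, 1.6, 0)
y <- c(2 * x[1:100] + rnorm(100, sd = 1.2), -2, 6, 6)

top_n(criterion(x, y, "mahal"), 5)   # case numbers ranked by Mahalanobis distance
top_n(criterion(x, y, "ry"), 5)      # case numbers ranked by absolute residual
```

Each criterion is a plain numeric vector of the same length as x, so any of them could also be passed where a vector of criterion values is accepted.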
method can be an integer vector of case numbers in 1:length(x), in which case those cases will be labeled.
method can be a vector of the same length as x consisting of values to determine the points to be labeled. For example, for a linear model mod, setting method=cooks.distance(mod) will label the n points corresponding to the largest values of Cook's distance. Warning: If missing data are present, points may be incorrectly selected.
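A small base R sketch may help show why missing data can cause trouble here. With the default na.omit handling, lm() drops incomplete cases, so cooks.distance() returns fewer values than there are points and positions no longer line up with case numbers. The padded vector `cd_full` is a hypothetical workaround for illustration only; whether noteworthy() accepts NA values in a criterion vector is not stated in this page.

```r
set.seed(47)
x <- c(runif(20), NA, 1.5)                 # case 21 has a missing x
y <- c(2 * x[1:20] + rnorm(20), 3, -2)

mod <- lm(y ~ x, na.action = na.omit)      # case 21 is dropped from the fit
cd  <- cooks.distance(mod)

length(cd)          # 21 values for 22 points
names(cd)           # original case numbers, with "21" absent

# Realign to the original case numbering (hypothetical workaround):
cd_full <- rep(NA_real_, length(x))
cd_full[as.integer(names(cd))] <- cd

which.max(cd_full)  # index in the original 1:length(x) numbering
```

Used positionally, element 21 of `cd` belongs to case 22, which is the kind of mismatch the warning refers to.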
In the case of method == "mahal" a value for level can be supplied. This is used as a filter to select cases whose criterion value exceeds level. In this case, the number of points identified will be less than or equal to n.
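The filtering step described above might look like the following in base R. This is a sketch under a loud assumption: the page does not say what scale level is on. Because the examples use level = .99, the sketch assumes level is a probability that is converted to a chi-square quantile with 2 degrees of freedom and compared with the squared Mahalanobis distance. The actual rule used by noteworthy() may differ.

```r
set.seed(47)
x <- c(runif(100), 1.5, 1.6, 0)
y <- c(2 * x[1:100] + rnorm(100, sd = 1.2), -2, 6, 6)

xy <- cbind(x, y)
d2 <- mahalanobis(xy, colMeans(xy), var(xy))   # squared Mahalanobis distances

n     <- 5
level <- 0.99
cutoff <- qchisq(level, df = 2)    # ASSUMPTION: level is a chi-square probability

ids <- order(d2, decreasing = TRUE)[seq_len(n)]  # the n most extreme cases
ids <- ids[d2[ids] > cutoff]                     # keep only those beyond the cutoff
ids                                              # at most n case numbers
```

Because the filter is applied after the top n cases are chosen, it can only shorten the result, which matches the statement that at most n points are identified.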
# example code
set.seed(47)
x <- c(runif(100), 1.5, 1.6, 0)
y <- c(2*x[1:100] + rnorm(100, sd = 1.2), -2, 6, 6 )
z <- y - x
mod <- lm(y ~ x)
# testing function to compare noteworthy with car::showLabels()
testnote <- function(x, y, n, method=NULL, ...) {
  plot(x, y)
  abline(lm(y ~ x))
  # print the points car::showLabels() would choose, for comparison
  if (!is.null(method))
    car::showLabels(x, y, n=n, method = method) |> print()
  # label the points chosen by noteworthy() in red
  ids <- noteworthy(x, y, n=n, method = method, ...)
  text(x[ids], y[ids], labels = ids, col = "red")
  ids
}
# Mahalanobis distance
testnote(x, y, n = 5, method = "mahal")
testnote(x, y, n = 5, method = "mahal", level = .99)
# Euclidean distance
testnote(x, y, n = 5, method = "dsq")
testnote(x, y, n = 5, method = "y")
testnote(x, y, n = 5, method = "ry")
# a vector of criterion values
testnote(x, y, n = 5, method = Mahalanobis(data.frame(x,y)))
testnote(x, y, n = 5, method = z)
# vector of case IDs
testnote(x, y, n = 4, method = seq(10, 60, 10))
testnote(x, y, n = 4, method = which(cooks.distance(mod) > .25))
# test use of xy.coords
noteworthy(data.frame(x,y), n=4)
noteworthy(y ~ x, n=4)