View source: R/distanceFunctions.R
Calculates nearest neighbor distance between points. Data are subset prior to calculating distances (see details).
neighborDist(dfv, column.nums = 1:ncol(dfv), subset = 1:nrow(dfv),
  S = NULL)
dfv
a data frame containing observations in rows and statistics in columns.
column.nums
indexes the columns of the data frame that will be used to calculate nearest neighbor distances (all other columns are ignored).
subset
indexes the rows of the data frame that will be used to calculate the covariance matrix (unless S is specified manually).
S
the covariance matrix used to normalise the data in the nearest neighbor calculation. Leave as NULL to use the ordinary covariance matrix calculated using cov(dfv[subset,column.nums]).
Takes a matrix or data frame as input, with observations in rows and statistics in columns. The parameter "column.nums" selects which columns to use in the analysis; all other columns are ignored. The covariance matrix is then calculated on a subset of these data, specified using the parameter "subset" (which defaults to all observations). All distances in the calculation are normalised by multiplying by the inverse of this covariance matrix. Alternatively, the covariance matrix can be specified manually through the argument "S". The nearest neighbor distance of a point is the smallest distance between that point and any point in the chosen subset.
Note that this method cannot handle NA values.
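The calculation described above can be sketched in base R. This is only an illustration, not the package source: the function name nnDistSketch is hypothetical, the covariance normalisation is assumed to correspond to a Mahalanobis distance (computed here with stats::mahalanobis), and a point is assumed not to count as its own neighbor.

```r
# Hypothetical re-implementation for illustration only; not the package source.
nnDistSketch <- function(dfv, column.nums = 1:ncol(dfv),
                         subset = 1:nrow(dfv), S = NULL) {
  X <- as.matrix(dfv[, column.nums, drop = FALSE])
  subset <- seq_len(nrow(X))[subset]   # normalise to integer row indices
  ref <- X[subset, , drop = FALSE]     # reference points
  if (is.null(S)) S <- cov(ref)        # default covariance, as documented
  sapply(seq_len(nrow(X)), function(i) {
    # squared Mahalanobis distances from point i to every reference point
    d2 <- mahalanobis(ref, center = X[i, ], cov = S)
    # assumption: a point is not counted as its own neighbor
    d2 <- d2[subset != i]
    sqrt(min(d2))
  })
}

set.seed(1)
pts <- data.frame(x = rnorm(50), y = rnorm(50))
d <- nnDistSketch(pts)
```

Under these assumptions, passing S = diag(2) reduces the result to the ordinary Euclidean nearest neighbor distance, which gives a simple way to check the sketch.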
Robert Verity r.verity@imperial.ac.uk
## Not run:
# create a data frame of observations
df <- data.frame(x = rnorm(100), y = rnorm(100))
# calculate nearest neighbor distances
distances <- neighborDist(df)
# use this distance to look for outliers
Q95 <- quantile(distances, 0.95)
which(distances > Q95)
## End(Not run)