distSpace | R Documentation |
Computation of distance space representation.
distSpace(trainingData, testData = NULL, type = "bagdistance", options = NULL)
trainingData |
A list of |
testData |
An |
type |
The distance used in the computations.
For multivariate data one of the following options: |
options |
A list of options to pass to the function computing the underlying distance. |
The distance space representation is a tool in supervised classification and was introduced in Hubert et al. (2016) as a generalisation of the depth-depth representation of a multivariate sample. Based on a distance transform, an observation (be it multivariate or functional) is mapped to its representation in distance space. The distance transformation maps the observation to a vector whose i-th coordinate is the distance to training group i. After transformation, any multivariate classifier may be used to classify new observations in distance space. Typically the k-nearest neighbour algorithm is used.
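The transform and the subsequent classification step can be illustrated with a minimal base R sketch. It does not use distSpace: the Mahalanobis distance is swapped in for the bagdistance or outlyingness measures purely to keep the example self-contained, and the helper names toDistanceSpace and knnPredict, as well as the simulated data, are hypothetical.

```r
# Illustrative sketch only: Mahalanobis distance stands in for the
# distances computed by distSpace(); all names below are hypothetical.
set.seed(1)

# Two simulated training groups in two dimensions (50 observations each)
group1 <- matrix(rnorm(100, mean = 0), ncol = 2)
group2 <- matrix(rnorm(100, mean = 4), ncol = 2)
trainingGroups <- list(group1 = group1, group2 = group2)

# Map each row of x to the vector of its distances to the training groups:
# column i of the result holds the distance to group i.
toDistanceSpace <- function(x, groups) {
  do.call(cbind, lapply(groups, function(g) {
    sqrt(mahalanobis(x, center = colMeans(g), cov = cov(g)))
  }))
}

# A plain k-nearest-neighbour vote, applied in distance space
knnPredict <- function(train, labels, new, k = 5) {
  apply(new, 1, function(z) {
    d <- sqrt(rowSums(sweep(train, 2, z)^2))   # Euclidean distance to z
    nearest <- labels[order(d)[seq_len(k)]]    # labels of the k closest
    as.integer(names(which.max(table(nearest))))
  })
}

trainX <- rbind(group1, group2)
trainLabels <- rep(1:2, times = c(nrow(group1), nrow(group2)))
trainDist <- toDistanceSpace(trainX, trainingGroups)

# Two new observations, one near the centre of each group
newX <- rbind(c(0, 0), c(4, 4))
newDist <- toDistanceSpace(newX, trainingGroups)

predicted <- knnPredict(trainDist, trainLabels, newDist, k = 5)
```

Whatever the dimension of the original observations, the classifier only ever sees one coordinate per training group, so any multivariate classifier could replace the k-nearest-neighbour vote here.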
Different options are available to compute the distance to each of the training groups. For multivariate data, the user may choose between the bagdistance or any of the projection-type distances, including the Stahel-Donoho outlyingness, the adjusted outlyingness or the directional outlyingness. For functional data, the user may opt to employ the functional bagdistance (fbd), the functional Stahel-Donoho outlyingness (fSDO), the functional skewness-adjusted outlyingness (fAO) or the functional directional outlyingness (fDO). Options available in each of the underlying distance routines may be passed down using the options argument.
A q by (p+1) matrix composed of two blocks, where p is the number of training groups. The first block contains the observations in the training set (rows); each of the first p columns holds the distance to one of the groups, and the last column holds a label indicating the original group membership of the observation. The second block contains the observations in the test set, if any, with the same p distance columns; its last column holds an indicator signalling that the observation was part of the test set.
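As a rough illustration of this layout, the sketch below separates a matrix of that shape into its training and test blocks by row position. The matrix mock is a hand-made stand-in, not output of distSpace; its values, its column names and the codes in its last column (1 and 2 for the groups, 0 for test observations) are placeholder assumptions.

```r
# Hypothetical stand-in for a returned matrix with two training groups
# (3 and 2 observations) followed by 2 test observations.
nTrain <- c(group1 = 3, group2 = 2)
nTest <- 2
mock <- cbind(
  distToGroup1 = c(0.5, 0.7, 0.6, 3.1, 2.9, 0.8, 3.0),
  distToGroup2 = c(3.2, 2.8, 3.0, 0.4, 0.6, 2.7, 0.5),
  label        = c(1, 1, 1, 2, 2, 0, 0)  # coding is an assumption
)

p <- ncol(mock) - 1                       # number of training groups

# The training block comes first, the test block (if any) follows it
trainRows <- seq_len(sum(nTrain))
testRows  <- sum(nTrain) + seq_len(nTest)

trainDistances <- mock[trainRows, seq_len(p), drop = FALSE]
trainGroup     <- mock[trainRows, p + 1]
testDistances  <- mock[testRows, seq_len(p), drop = FALSE]
```

Splitting by row position relies only on the block order described above, so it does not depend on how the test indicator is coded.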
P. Segaert
Hubert M., Rousseeuw P.J., Segaert P. (2017). Multivariate and functional classification using depth and distance. Advances in Data Analysis and Classification, 11, 445–466.
library(mrfDepth)
data(plane)
# Build the training data
Mirage <- plane$plane1[, 1:25, 1, drop = FALSE]
Eurofighter <- plane$plane3[, 1:25, 1, drop = FALSE]
trainingData <- list(group1 = Mirage,
                     group2 = Eurofighter)
# Build the test data
Mirage.t <- plane$plane1[, 26:30, 1, drop = FALSE]
Eurofighter.t <- plane$plane3[, 26:30, 1, drop = FALSE]
testData <- abind::abind(Mirage.t, Eurofighter.t, along = 2)
# Transform the data into distSpace
Result <- distSpace(trainingData = trainingData, testData = testData, type="fbd")
# Plot the results
plotColors <- c(rep("orange", dim(Mirage)[2]),
                rep("blue", dim(Eurofighter)[2]),
                rep("green3", dim(testData)[2]))
plot(Result[, 1:2, ],
     col = plotColors, pch = 16,
     xlab = "distance to Mirage", ylab = "distance to Eurofighter",
     main = "distSpace representation of Mirage and Eurofighter")
legend("bottomleft", legend = c("Mirage", "Eurofighter", "test data"), pch = 16,
       col = c("orange", "blue", "green3"))