Allocates the observations of a dataset to a set of pre-established cluster centers. It is intended for the test set in train-test dataset situations.
inDataFrame | A data frame or matrix with the data to be allocated to the cluster centers. This data should be scaled in the same way as the data for the original depeche run was scaled when it entered the algorithm, i.e. in the normal case, not at all.
clusterCenters | A matrix that needs to be inherited from a depeche run. It contains information about which clusters and variables have been sparsed away, and where the cluster centers are located for the remaining clusters and variables.
log2Off | Whether the automatic detection of high kurtosis, and the log2 transformation that follows from it, should be turned off.
noZeroNum | For internal use. Controls whether the internal algorithm returns a cluster with number 0.
A vector with the same length as the number of rows in inDataFrame, giving the cluster identity of each observation.
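The idea of allocating observations to fixed centers can be illustrated in base R. The following is a conceptual sketch only, not the DepecheR implementation: `allocateToNearest` is a hypothetical helper, and it assumes that allocation means picking the nearest center by Euclidean distance and that rows or columns consisting only of zeros in the center matrix mark clusters or variables that were sparsed away.

```r
# Conceptual sketch, NOT the package's internal code.
# allocateToNearest() is a hypothetical name used for illustration.
allocateToNearest <- function(inData, centers) {
    inData <- as.matrix(inData)
    centers <- as.matrix(centers)
    # Assumption: a variable that is zero in every center was sparsed away
    keepVars <- colSums(centers != 0) > 0
    # Assumption: a cluster that is zero in every variable was sparsed away
    keepClusters <- rowSums(centers != 0) > 0
    x <- inData[, keepVars, drop = FALSE]
    cen <- centers[keepClusters, keepVars, drop = FALSE]
    # Squared Euclidean distance from each observation to each kept center
    distMat <- matrix(
        vapply(seq_len(nrow(cen)), function(k) {
            rowSums(sweep(x, 2, cen[k, ])^2)
        }, numeric(nrow(x))),
        nrow = nrow(x)
    )
    # Index of the closest center, mapped back to the original row numbers
    unname(which(keepClusters))[max.col(-distMat, ties.method = "first")]
}

# Toy data: variable 2 and cluster 3 are treated as sparsed away
toyCenters <- rbind(c(1, 0, 1), c(10, 0, 10), c(0, 0, 0))
toyObs <- rbind(c(0, 5, 2), c(9, -3, 11), c(2, 100, 1))
toyAllocation <- allocateToNearest(toyObs, toyCenters)
```

As with the value described above, the result has one entry per row of the input data.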
# Retrieve some example data
data(testData)
## Not run:
# Now arbitrarily (for the sake of the example) divide the data into a
# training- and a test set.
testDataSample <- sample(1:nrow(testData), size = 10000)
testDataTrain <- testData[testDataSample, ]
testDataTest <- testData[-testDataSample, ]
# Run the depeche function for the train set
x_depeche_train <- depeche(testDataTrain[, 2:15],
    maxIter = 20,
    sampleSize = 1000
)
# Allocate the test dataset to the centers of the train dataset
x_depeche_test <- dAllocate(testDataTest[, 2:15],
    clusterCenters = x_depeche_train$clusterCenters
)
# And finally plot the two groups to see how great the overlap was:
trainTablePerId <- apply(as.matrix(table(
    testDataTrain$ids,
    x_depeche_train$clusterVector
)), 1, function(x) x / sum(x))
trainTableCollapsed <- apply(trainTablePerId, 1, sum)
trainTableFraction <- trainTableCollapsed / sum(trainTableCollapsed)
testTablePerId <- apply(
    as.matrix(table(testDataTest$ids, x_depeche_test)),
    1, function(x) x / sum(x)
)
testTableCollapsed <- apply(testTablePerId, 1, sum)
testTableFraction <- testTableCollapsed / sum(testTableCollapsed)
xmatrix <- t(cbind(trainTableFraction, testTableFraction))
library(gplots)
barplot2(xmatrix, beside = TRUE, legend = rownames(xmatrix))
title(main = "Difference between train and test set")
title(xlab = "Clusters")
title(ylab = "Fraction")
## End(Not run)
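If gplots is not installed, the same comparison can be drawn with base graphics, and the overlap can also be summarised as a number. This is a minimal sketch under stated assumptions: the two fraction vectors below are made-up stand-ins for `trainTableFraction` and `testTableFraction`, not results from a real depeche run, and `maxAbsDifference` is a hypothetical name.

```r
# Made-up fractions standing in for the vectors computed in the example
trainTableFraction <- c("1" = 0.50, "2" = 0.30, "3" = 0.20)
testTableFraction <- c("1" = 0.45, "2" = 0.35, "3" = 0.20)
xmatrix <- t(cbind(trainTableFraction, testTableFraction))

# Base-graphics equivalent of the barplot2() call
barMidpoints <- barplot(xmatrix,
    beside = TRUE,
    legend.text = rownames(xmatrix),
    main = "Difference between train and test set",
    xlab = "Clusters",
    ylab = "Fraction"
)

# Largest per-cluster gap between the train and test fractions
maxAbsDifference <- max(abs(trainTableFraction - testTableFraction))
```

A small value of `maxAbsDifference` suggests that the two sets are spread over the clusters in a similar way.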