dAllocate: Allocation of observations to pre-established cluster...

dAllocate    R Documentation

Allocation of observations to pre-established cluster centers.

Description

The observations in a dataset are allocated to a set of pre-established cluster centers. This is intended for the test set in situations where the data are split into a training set and a test set.
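As a rough illustration of what allocating observations to fixed centers means, the sketch below assigns each row to its nearest center by squared Euclidean distance in base R. It is a conceptual example only: allocateToNearest is a hypothetical helper, the toy centers and observations are made up, and the actual dAllocate function may scale the data or treat variables differently.

```r
# Conceptual sketch only: assign each row of `obs` to its nearest row of
# `centers` by squared Euclidean distance. `allocateToNearest` is a
# hypothetical helper and is not part of DepecheR.
allocateToNearest <- function(obs, centers) {
    obs <- as.matrix(obs)
    centers <- as.matrix(centers)
    stopifnot(ncol(obs) == ncol(centers))
    # One column of squared distances per center
    distMat <- vapply(seq_len(nrow(centers)), function(k) {
        rowSums(sweep(obs, 2, centers[k, ])^2)
    }, numeric(nrow(obs)))
    # vapply returns a plain vector when there is a single observation
    distMat <- matrix(distMat, nrow = nrow(obs))
    # The smallest distance is the largest negated distance
    max.col(-distMat, ties.method = "first")
}

# Made-up centers and observations with two variables each
centers <- rbind(c(0, 0), c(10, 10))
obs <- rbind(c(1, -1), c(9, 11), c(0.5, 0.2))
clusterIds <- allocateToNearest(obs, centers)
```

The result has one entry per observation, which mirrors the shape of the value described for dAllocate.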

Usage

dAllocate(inDataFrame, depModel)

Arguments

inDataFrame

A dataset that should be allocated to a set of cluster centers, for example a richer but less representative dataset that contains all data points from all donors, rather than only a fixed number of data points from each donor.

depModel

The result of the original application of the depeche function to the associated, more representative dataset.

Value

A vector with the same length as the number of rows in inDataFrame, giving the cluster identity of each observation.
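One way to work with such a vector is sketched below, assuming a made-up stand-in for the dAllocate output (clusterVec) and a small random data frame in place of real input. It checks that the vector lines up with the input rows and then summarizes the cluster sizes.

```r
# Stand-in for the output of dAllocate(): one cluster id per input row.
# Both the data frame and the cluster ids are made up for illustration.
inDataFrame <- data.frame(a = rnorm(6), b = rnorm(6))
clusterVec <- c(1, 2, 2, 3, 1, 2)

# The vector should line up row for row with the input
stopifnot(length(clusterVec) == nrow(inDataFrame))

# Number and fraction of observations per cluster
clusterCounts <- table(clusterVec)
clusterFractions <- prop.table(clusterCounts)
```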

See Also

depeche

Examples

# Retrieve some example data
data(testData)
## Not run: 
# Now arbitrarily (for the sake of the example) divide the data into a
# training set and a test set.
testDataSample <- sample(seq_len(nrow(testData)), size = 10000)
testDataTrain <- testData[testDataSample, ]
testDataTest <- testData[-testDataSample, ]

# Run the depeche function for the train set

depeche_train <- depeche(testDataTrain[, 2:15],
    maxIter = 20,
    sampleSize = 1000
)

# Allocate the test dataset to the centers of the train dataset
depeche_test <- dAllocate(testDataTest[, 2:15], depeche_train)

# And finally plot the two groups to see how great the overlap was:
clustVecList <- list(
    list("Ids" = testDataTrain$ids, "Clusters" = depeche_train$clusterVector),
    list("Ids" = testDataTest$ids, "Clusters" = depeche_test)
)
tablePerId <- do.call("rbind", lapply(seq_along(clustVecList), function(x) {
    locDat <- clustVecList[[x]]
    # Fraction of each donor's observations that falls in each cluster
    locRes <- apply(
        as.matrix(table(locDat$Ids, locDat$Clusters)),
        1, function(y) y / sum(y)
    )
    locResLong <- reshape2::melt(locRes)
    colnames(locResLong) <- c("Cluster", "Donor", "Fraction")
    locResLong$Group <- x
    locResLong
}))
tablePerId$Cluster <- as.factor(tablePerId$Cluster)
tablePerId$Group <- as.factor(tablePerId$Group)

library(ggplot2)
ggplot(data = tablePerId, aes(x = Cluster, y = Fraction, fill = Group)) +
    geom_boxplot() + theme_bw()

## End(Not run)        
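
The per-donor cluster fractions computed in the example can also be obtained with base R alone. The following is a minimal sketch under the assumption of toy donor ids and cluster assignments (ids and clusters are made-up stand-ins, not taken from testData); it uses prop.table instead of the apply and reshape2::melt combination.

```r
# Toy stand-ins for donor ids and cluster assignments (made-up values)
ids <- c("d1", "d1", "d1", "d1", "d2", "d2")
clusters <- c(1, 1, 2, 3, 2, 2)

# Rows are donors, columns are clusters; each row sums to 1
fracPerDonor <- prop.table(
    table(Donor = ids, Cluster = clusters),
    margin = 1
)

# Long format with one row per donor and cluster combination
fracLong <- as.data.frame(fracPerDonor, responseName = "Fraction")
```

The long data frame has the columns Donor, Cluster and Fraction, so it could be passed to ggplot in the same way as tablePerId after adding a Group column.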

Theorell/DepecheR documentation built on July 27, 2023, 8:13 p.m.