ClusterFunction-class: Class ClusterFunction

internalFunctionCheck    R Documentation

Class ClusterFunction

Description

ClusterFunction is a class for holding functions that the clustering algorithms in this package can use to perform clustering.

The constructor ClusterFunction creates an object of the class ClusterFunction.

Usage

internalFunctionCheck(clusterFUN, inputType, algorithmType, outputType)

ClusterFunction(clusterFUN, ...)

## S4 method for signature 'function'
ClusterFunction(
  clusterFUN,
  inputType,
  outputType,
  algorithmType,
  inputClassifyType = NA_character_,
  requiredArgs = NA_character_,
  classifyFUN = NULL,
  checkFunctions = TRUE
)

Arguments

clusterFUN

function passed to slot clusterFUN.

inputType

character for slot inputType

algorithmType

character for slot algorithmType

outputType

character for slot outputType

...

arguments passed to different methods of ClusterFunction

inputClassifyType

character for slot inputClassifyType

requiredArgs

character for slot requiredArgs

classifyFUN

function for slot classifyFUN

checkFunctions

logical for whether to check the input functions with internalFunctionCheck

Details

internalFunctionCheck is the function called by the validity check of the ClusterFunction constructor (if checkFunctions=TRUE). It is available as an S3 function so that users can test and debug their own functions, which is difficult to do with an S4 validity function.

clusterFUN: clusterFUN must accept the following arguments, because higher-level code may pass them (the function can ignore them, or simply absorb them with ...)

  • "inputMatrix": will be the matrix of data

  • "inputType": one of "X", "diss", or "cat". If "X", then inputMatrix is assumed to be nfeatures x nsamples (like assay(CEObj) would give). If "cat", then inputMatrix is also nfeatures x nsamples, but all entries should be categorical levels, encoded by positive integers, with -1/-2 types of NA (like a clusterMatrix slot, but with dimensions switched). If "diss", then inputMatrix should be an n x n dissimilarity matrix.

  • "checkArgs": logical argument. If checkArgs=TRUE, the clusterFUN should check whether the arguments passed in ... are valid and return an error if not; otherwise, no error will be given, but the check should still be done and only valid arguments in ... passed along. This is necessary for the function to work with clusterMany, which passes all arguments to all functions without checking.

  • "cluster.only": logical argument. If cluster.only=TRUE, then clusterFUN should return only the vector of cluster assignments (or list if outputType="list"). If cluster.only=FALSE, then clusterFUN should return a named list in which the element named clustering contains the vector described above (no list allowed!); anything else needed by the classifyFUN to classify new data should be contained in the output list as well. cluster.only is set internally depending on whether classifyFUN will be later used by subsampling or only for clustering the final product.

  • "...": any additional arguments specific to the algorithm used by clusterFUN should be passed via ... and NOT via named arguments to clusterFUN.

  • Other required arguments: clusterFUN must also accept the arguments required for its algorithmType (described under algorithmType below).
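As a rough illustration of the argument contract above, the following sketch wraps stats::kmeans as a "K"-type clustering function. The name kmeansFUN, the use of kmeans, and the list of allowed extra arguments are assumptions made for this example only and are not part of the package.

```r
# Hypothetical "K"-type clusterFUN built on stats::kmeans (illustration only).
kmeansFUN <- function(inputMatrix, inputType = "X", k,
                      checkArgs = TRUE, cluster.only = TRUE, ...) {
  if (inputType != "X") stop("this sketch only handles inputType 'X'")
  extra <- list(...)
  nms <- names(extra)
  if (is.null(nms)) nms <- rep("", length(extra))
  # Extra arguments that this sketch is willing to forward to kmeans.
  allowed <- c("iter.max", "nstart", "algorithm")
  invalid <- !(nms %in% allowed)
  if (any(invalid)) {
    # checkArgs = TRUE: stop with an error; otherwise silently drop them.
    if (checkArgs) {
      stop("invalid arguments: ", paste(nms[invalid], collapse = ", "))
    }
    extra <- extra[!invalid]
  }
  # inputMatrix is features x samples; kmeans expects samples in rows.
  fit <- do.call(stats::kmeans,
                 c(list(x = t(inputMatrix), centers = k), extra))
  if (cluster.only) {
    return(fit$cluster)
  }
  # Keep the centroids so that a classifyFUN could use them later.
  list(clustering = fit$cluster, centers = fit$centers)
}
```

With cluster.only=TRUE the sketch returns only the assignment vector; with cluster.only=FALSE it returns a named list whose clustering element holds that vector.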

classifyFUN: The following arguments are required to be accepted for classifyFUN (if not NULL)

  • inputMatrix: the new data that will be classified into the clusters

  • inputType: the inputType of the new data (see above)

  • clusterResult: the result of running clusterFUN on the training data, when cluster.only=FALSE. Whatever is returned by clusterFUN is assumed to be sufficient for this function to classify new objects (e.g. could return the centroids of the clustering, if clustering based on nearest centroid).
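The nearest-centroid case mentioned above could be sketched as follows. The function name nearestCentroidFUN and the assumption that clusterResult carries a centers element (clusters x features) are hypothetical choices for this example, not something the package prescribes.

```r
# Hypothetical classifyFUN: assign each new sample to its nearest centroid.
# Assumes clusterResult$centers is a clusters x features matrix.
nearestCentroidFUN <- function(inputMatrix, inputType = "X", clusterResult) {
  if (inputType != "X") stop("this sketch only handles inputType 'X'")
  centers <- clusterResult$centers
  newData <- t(inputMatrix)  # samples x features
  # Squared Euclidean distance from every sample to every centroid.
  d2 <- vapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(newData, 2, centers[j, ], "-")^2)
  }, numeric(nrow(newData)))
  # Force a samples x clusters matrix even when there is a single sample.
  d2 <- matrix(d2, nrow = nrow(newData))
  # Index of the closest centroid for each sample.
  apply(d2, 1, which.min)
}
```

The new data need not contain the samples used for training, but it must have the same features, in the same order, as the data that produced the centroids.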

algorithmType: Type "01" is for clustering functions that expect as input a dissimilarity matrix taking values between 0 and 1 (e.g. from subclustering), with 1 indicating more dissimilarity between samples. "01" algorithm types must also have inputType equal to "diss". It is also generally expected that "01" algorithms use the 0-1 nature of the input to set criteria as to where to find clusters. "01" functions must take as an argument alpha, between 0 and 1, to determine the clusters, where larger values of alpha require less similarity between samples in the same cluster. Type "K" is for clustering functions that require an argument k (the number of clusters) but accept an arbitrary inputType. "K" algorithms are assumed to need a predetermined k and to assign every sample to a cluster. If they do not, post-processing steps in mainClustering such as findBestK and removeSil may not operate correctly, since they rely on silhouette distances.
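To make the role of alpha concrete, here is a minimal sketch of a "01"-type function. It is an assumption for illustration, not one of the package's algorithms: it cuts a complete-linkage tree at height alpha, so that samples sharing a cluster have pairwise dissimilarity of at most alpha. The name cutTree01FUN is hypothetical.

```r
# Hypothetical "01"-type clusterFUN: cut a complete-linkage tree at height alpha.
cutTree01FUN <- function(inputMatrix, inputType = "diss", alpha,
                         checkArgs = TRUE, cluster.only = TRUE, ...) {
  if (inputType != "diss") stop("'01' algorithms require inputType 'diss'")
  if (alpha < 0 || alpha > 1) stop("alpha must be between 0 and 1")
  # inputMatrix is an n x n dissimilarity matrix with entries in [0, 1].
  tree <- stats::hclust(stats::as.dist(inputMatrix), method = "complete")
  # Larger alpha tolerates more dissimilarity within a cluster.
  clustering <- stats::cutree(tree, h = alpha)
  if (cluster.only) {
    return(clustering)
  }
  list(clustering = clustering, tree = tree)
}
```

In this sketch a larger alpha merges more samples into the same cluster, which matches the description that larger values require less similarity within a cluster.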

Value

internalFunctionCheck returns a logical value of TRUE if there are no problems. If there is a problem, it returns a character string describing the problem encountered.

ClusterFunction returns a ClusterFunction object.

Slots

clusterFUN

a function defining the clustering function. See details for required arguments.

inputType

a character vector defining what type(s) of input clusterFUN takes. Must consist of values "diss", "X", or "cat", indicating the set of input values that the algorithm can handle (see Details).

algorithmType

a character defining what type of clustering algorithm clusterFUN is. Must be either "01" or "K". clusterFUN must take the corresponding required arguments for its type (see Details).

classifyFUN

a function that takes as input new data and the output of clusterFUN (where the output is from when cluster.only=FALSE) and results in cluster assignments of the new data. Used in subsampling clustering. Note that the function should assume that the data given to the inputMatrix argument is not the same samples that were input to the ClusterFunction (but does assume that it is the same number of features/columns). If slot classifyFUN is given value NULL, then subsampling type can only be "InSample"; see subsampleClustering.

inputClassifyType

the input type for the classification function (if not NULL); like inputType, must be a vector containing "diss", "X", or "cat"

outputType

the type of output given by clusterFUN. Must be either "vector" or "list". If "vector", then the output should be a vector of length equal to the number of observations, with integer-valued elements identifying them to different clusters; the vector assignments should be in the same order as the original input of the data. Samples that are not assigned to any cluster should be given a '-1' value. If "list", then it must be a list of length equal to the number of clusters, and the elements of the list contain the indices of the samples in that cluster. Any indices not in any of the list elements are assumed to be -1. The main advantage of "list" is that it can preserve the order of the clusters if the clusterFUN desires to do so, in which case the orderBy argument of mainClustering can preserve this ordering (default is to order by size).

requiredArgs

Any additional required arguments for clusterFUN (beyond those required of all clusterFUN, described in details). Used to check that the user has provided the necessary arguments.

checkFunctions

logical. If TRUE, the validity check of the ClusterFunction object will check the clusterFUN with simple toy data using the function internalFunctionCheck.
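The relationship between the "list" and "vector" forms described for the outputType slot can be sketched with a small helper. The name listToVector is hypothetical, and this is not necessarily how the package performs the conversion internally.

```r
# Hypothetical helper: convert a "list" clustering into the "vector" form.
# clusterList[[i]] holds the sample indices belonging to cluster i.
listToVector <- function(clusterList, nSamples) {
  # Samples that appear in no list element stay unassigned (-1).
  assignments <- rep(-1L, nSamples)
  for (i in seq_along(clusterList)) {
    assignments[clusterList[[i]]] <- i
  }
  assignments
}

# Six samples: cluster 1 = {1, 4}, cluster 2 = {2, 5}; samples 3 and 6 unassigned.
listToVector(list(c(1, 4), c(2, 5)), nSamples = 6)
```

The cluster numbers follow the order of the list elements, which is how the "list" form can carry an ordering of the clusters.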

Examples

# Use internalFunctionCheck to check a candidate function
goodFUN <- function(inputMatrix, k, cluster.only, ...) {
  cluster::pam(x = t(inputMatrix), k = k, cluster.only = cluster.only)
}
# Passes the internal check
internalFunctionCheck(goodFUN, inputType = c("X", "diss"),
                      algorithmType = "K", outputType = "vector")
myCF <- ClusterFunction(clusterFUN = goodFUN, inputType = "X",
                        algorithmType = "K", outputType = "vector")
# Fails the check: the function does not return a vector when cluster.only = TRUE
badFUN <- function(inputMatrix, k, cluster.only, ...) {
  cluster::pam(x = inputMatrix, k = k)
}
internalFunctionCheck(badFUN, inputType = c("X", "diss"),
                      algorithmType = "K", outputType = "vector")

epurdom/clusterCells documentation built on April 28, 2024, 8:14 p.m.