ClusterFunction-class: Class ClusterFunction

Description Usage Arguments Details Value Slots Examples

Description

ClusterFunction is a class for holding functions that can be used for clustering in the clustering algorithms in this package.

The constructor ClusterFunction creates an object of the class ClusterFunction.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
internalFunctionCheck(clusterFUN, inputType, algorithmType, outputType)

ClusterFunction(clusterFUN, ...)

## S4 method for signature ''function''
ClusterFunction(
  clusterFUN,
  inputType,
  outputType,
  algorithmType,
  inputClassifyType = NA_character_,
  requiredArgs = NA_character_,
  classifyFUN = NULL,
  checkFunctions = TRUE
)

Arguments

clusterFUN

function passed to slot clusterFUN.

inputType

character for slot inputType

algorithmType

character for slot inputType

outputType

character for slot outputType

...

arguments passed to different methods of ClusterFunction

inputClassifyType

character for slot inputClassifyType

requiredArgs

character for slot requiredArgs

classifyFUN

function for slot classifyFUN

checkFunctions

logical for whether to check the input functions with internalFunctionsCheck

Details

internalFunctionCheck is the function that is called by the validity check of the ClusterFunction constructor (if checkFunctions=TRUE). It is available as an S3 function for the user to be able to test their functions and debug them, which is difficult to do with a S4 validity function.

clusterFUN: The following arguments are required to be accepted for clusterFUN – higher-level code may pass these arguments (but the function can ignore them or just have be handled with a ... )

classifyFUN: The following arguments are required to be accepted for classifyFUN (if not NULL)

algorithmType: Type "01" is for clustering functions that expect as an input a dissimilarity matrix that takes on 0-1 values (e.g. from subclustering) with 1 indicating more dissimilarity between samples. "01" algorithm types must also have inputType equal to "diss". It is also generally expected that "01" algorithms use the 0-1 nature of the input to set criteria as to where to find clusters. "01" functions must take as an argument alpha between 0 and 1 to determine the clusters, where larger values of alpha require less similarity between samples in the same cluster. "K" is for clustering functions that require an argument k (the number of clusters), but arbitrary inputType. On the other hand, "K" algorithms are assumed to need a predetermined 'k' and are also assumed to cluster all samples to a cluster. If not, the post-processing steps in mainClustering such as findBestK and removeSil may not operate correctly since they rely on silhouette distances.

Value

Returns a logical value of TRUE if there are no problems. If there is a problem, returns a character string describing the problem encountered.

A ClusterFunction object.

Slots

clusterFUN

a function defining the clustering function. See details for required arguments.

inputType

a character vector defining what type(s) of input clusterFUN takes. Must consist of values "diss","X", or "cat" indicating the set of input values that the algorithm can handle (see details below).

algorithmType

a character defining what type of clustering algorithm clusterFUN is. Must be one of either "01" or "K". clusterFUN must take the corresponding required arguments for its type (see details below).

classifyFUN

a function that has takes as input new data and the output of clusterFUN (where the output is from when cluster.only=FALSE) and results in cluster assignments of the new data. Used in subsampling clustering. Note that the function should assume that the data given to the inputMatrix argument is not the same samples that were input to the ClusterFunction (but does assume that it is the same number of features/columns). If slot classifyFUN is given value NULL then subsampling type can only be "InSample", see subsampleClustering.

inputClassifyType

the input type for the classification function (if not NULL); like inputType, must be a vector containing "diss","X", or "cat"

outputType

the type of output given by clusterFUN. Must either be "vector" or "list". If "vector" then the output should be a vector of length equal to the number of observations with integer-valued elements identifying them to different clusters; the vector assignments should be in the same order as the original input of the data. Samples that are not assigned to any cluster should be given a '-1' value. If "list", then it must be a list equal to the length of the number of clusters, and the elements of the list contain the indices of the samples in that cluster. Any indices not in any of the list elements are assumed to be -1. The main advantage of "list" is that it can preserve the order of the clusters if the clusterFUN desires to do so. In which case the orderBy argument of mainClustering can preserve this ordering (default is to order by size).

requiredArgs

Any additional required arguments for clusterFUN (beyond those required of all clusterFUN, described in details). Will be used in checking that user provided necessary arguments.

checkFunctions

logical. If TRUE, the validity check of the ClusterFunction object will check the clusterFUN with simple toy data using the function internalFunctionCheck.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#Use internalFunctionCheck to check possible function
goodFUN<-function(inputMatrix,k,cluster.only,...){
cluster::pam(x=t(inputMatrix),k=k,cluster.only=cluster.only)
}
#passes internal check
internalFunctionCheck(goodFUN,inputType=c("X","diss"),
   algorithmType="K",outputType="vector")
myCF<-ClusterFunction(clusterFUN=goodFUN, inputType="X",
   algorithmType="K", outputType="vector")
#doesn't work, because haven't made results return vector when cluster.only=TRUE
badFUN<-function(inputMatrix,k,cluster.only,...){cluster::pam(x=inputMatrix,k=k)}
internalFunctionCheck(badFUN,inputType=c("X","diss"),
   algorithmType="K",outputType="vector")

clusterExperiment documentation built on Feb. 11, 2021, 2 a.m.