repRankAggreg: Repeat Rank Aggregation
In optCluster: Determine Optimal Clustering Algorithm and Number of Clusters

Description

repRankAggreg repeats rank aggregation of the ordered validation measure lists obtained from an
object of class "optCluster" and returns a new object of class "optCluster".

Usage

  repRankAggreg(optObj, rankMethod = "same", distance = "same",
    importance = NULL, rankVerbose = FALSE, ...)

Arguments

`optObj`	An object of class `"optCluster"`.
`rankMethod`	A character string giving the method used for rank aggregation. By default ("same"), the method used to create the input `"optCluster"` object is reused. The cross-entropy Monte Carlo algorithm ("CE") or the genetic algorithm ("GA") can also be specified directly. Only one method may be selected.
`distance`	A character string giving the distance used to measure the similarity between ordered lists in rank aggregation. By default ("same"), the distance used to create the input `"optCluster"` object is reused. The weighted Spearman footrule distance ("Spearman") or the weighted Kendall's tau distance ("Kendall") can also be specified directly. Only one distance may be selected.
`importance`	A vector of weights indicating the importance of each validation measure list. The default, NULL, gives equal weight to each validation measure. See Weighted Rank Aggregation in the ‘Details’ section for more information.
`rankVerbose`	If TRUE, the current rank aggregation results are displayed at each iteration.
`...`	Additional arguments that can be passed to the internal function `RankAggreg`:
    `maxIter` - The maximum number of iterations allowed. Default = 1000.
    `k` - Size of the top-k list in aggregation.
    `convIN` - Stopping criterion for the CE and GA algorithms. The algorithm converges once the "best" solution has not changed for convIN consecutive iterations. Default: 7 for CE and 30 for GA.
    `N` - Number of samples generated by MCMC in the CE algorithm. Default = 10*k^2.
    `rho` - For the CE algorithm, (rho*N) is the quantile of the candidate lists sorted by function values.
    `weight` - For the CE algorithm, the learning factor used in the probability update. Default = 0.25.
    `popSize` - For the GA algorithm, the population size in each generation. Default = 100.
    `CP` - For the GA algorithm, the crossover probability. Default = 0.4.
    `MP` - For the GA algorithm, the mutation probability. Default = 0.01.
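
The `distance` argument chooses how two ordered lists of clustering algorithms are compared. RankAggreg's "Spearman" and "Kendall" options are weighted distances that also use the validation scores (Pihur et al., 2007); as a simplified illustration only, the sketch below computes the unweighted, position-based versions of both distances for two full orderings. The functions `footrule()` and `kendall()` are hypothetical helpers, not part of optCluster or RankAggreg, and the algorithm names are just labels.

```r
# Hypothetical helpers: UNWEIGHTED distances between two full orderings of
# the same items (character vectors ordered best first). RankAggreg's
# weighted versions also use validation scores; these do not.

# Spearman footrule: sum of absolute differences in each item's position.
footrule <- function(a, b) {
  sum(abs(seq_along(a) - match(a, b)))
}

# Kendall's tau distance: number of item pairs the two lists order differently.
kendall <- function(a, b) {
  pb <- match(a, b)          # position in b of each item, taken in a's order
  n <- length(pb)
  # For i < j, a ranks item i above item j; the pair is discordant if b does not.
  sum(outer(pb, pb, ">")[upper.tri(diag(n))])
}

a <- c("hierarchical", "kmeans", "pam", "diana")
b <- c("kmeans", "hierarchical", "diana", "pam")
footrule(a, b)  # 4
kendall(a, b)   # 2
```

Both distances are 0 for identical lists; reversing a list of four items gives the maximum Kendall distance of 6 discordant pairs.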

Details

This function tests the consistency of the rank aggregation results by repeating rank aggregation with the same rank aggregation method, distance measure, clustering algorithm lists, and validation score lists that were used to create the input object of class "optCluster". A different rank aggregation algorithm or distance measure can also be specified, but doing so may change the final results.
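
Repeating the aggregation is informative because the CE and GA algorithms are stochastic searches: each looks for the ordering that minimizes the importance-weighted sum of distances to the validation measure lists (Pihur et al., 2007), and separate runs need not return the same answer. When only a few algorithms are compared, the same objective can be minimized exactly by checking every ordering. The sketch below does this with the unweighted footrule distance on full lists; `perms()` and `aggregate_exact()` are hypothetical names, and this is not how optCluster or RankAggreg computes its result.

```r
# Hypothetical illustration: exact weighted rank aggregation by brute force.
# Each row of 'lists' is one validation measure's ordering (best first).

# Unweighted Spearman footrule distance between two full orderings.
footrule <- function(a, b) sum(abs(seq_along(a) - match(a, b)))

# All orderings of a character vector (only practical for a few items).
perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in perms(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# Minimize the importance-weighted sum of distances to every measure's list.
aggregate_exact <- function(lists, importance = rep(1, nrow(lists))) {
  cands <- perms(lists[1, ])
  objective <- function(cand) {
    sum(importance * apply(lists, 1, function(l) footrule(cand, l)))
  }
  values <- vapply(cands, objective, numeric(1))
  list(best = cands[[which.min(values)]], value = min(values))
}

lists <- rbind(
  c("kmeans", "pam", "hierarchical"),
  c("kmeans", "hierarchical", "pam"),
  c("pam", "kmeans", "hierarchical")
)
aggregate_exact(lists)$best                           # "kmeans" "pam" "hierarchical"
aggregate_exact(lists, importance = c(1, 1, 3))$best  # "pam" "kmeans" "hierarchical"
```

With equal weights the optimum has objective value 4; tripling the weight of the third list moves "pam" to the top with objective value 6.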

Weighted Rank Aggregation: A vector of weights, one for each validation measure list, can be supplied through the importance argument. The default, NULL, corresponds to rep(1, length(x)), where x is the character vector of validation measure names, so every validation measure list receives the same relative weight, 1/length(x). To set the weights manually, you need to know the order of the selected validation measures, because the weights are matched to the measures by position. The order of validation measures used in optCluster is as follows:

When selected, stability measures are ALWAYS listed first, in the following order: "APN", "AD", "ADM", "FOM".
When selected, internal measures are listed after any stability measures and before any biological measures, in the following order: "Connectivity", "Dunn", "Silhouette".
When selected, biological measures are always listed last, in the following order: "BHI", "BSI".
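
Because the weights are matched to the lists by position, it can help to write out the measure order before building `importance`. The sketch below uses a hypothetical helper, `measure_order()`, which is not part of optCluster; it lists the selected measures in the order given above, assuming every measure of each selected type is used. The unnamed vector `unname(w)` could then be supplied as `importance`.

```r
# Hypothetical helper (not part of optCluster): list the selected
# validation measures in the order described above.
measure_order <- function(validation) {
  groups <- list(
    stability  = c("APN", "AD", "ADM", "FOM"),
    internal   = c("Connectivity", "Dunn", "Silhouette"),
    biological = c("BHI", "BSI")
  )
  # intersect() keeps the order of names(groups): stability, internal, biological
  keep <- intersect(names(groups), validation)
  unname(unlist(groups[keep]))
}

meas <- measure_order(c("internal", "stability"))
w <- setNames(rep(1, length(meas)), meas)         # equal weights, as with NULL
w[c("Connectivity", "Dunn", "Silhouette")] <- 2   # up-weight the internal measures
w / sum(w)                                        # relative weights
```

Here the four stability measures each receive a relative weight of 0.1 and the three internal measures 0.2.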

Value

repRankAggreg returns an object of class "optCluster". See the help page for the "optCluster" class for a description of its contents.

References

Sekula, M., Datta, S., and Datta, S. (2017). optCluster: An R package for determining the optimal clustering algorithm. Bioinformation, 13(3), 101. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450252

Pihur, V., Datta, S., and Datta, S. (2007). Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach. Bioinformatics, 23(13), 1607-1615.

Pihur, V., Datta, S., and Datta, S. (2009). RankAggreg, an R package for weighted rank aggregation. BMC Bioinformatics, 10, 62. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-62

Examples

	## These examples may take a few minutes to run
	library(optCluster)

	## Obtain dataset
	data(arabid)

	## Normalize data with respect to library size
	obj <- t(t(arabid)/colSums(arabid))

	## Analysis of normalized data using internal and stability validation measures
	norm1 <- optCluster(obj, 2:4, clMethods = "all")
	print(norm1)

	## Repeat rank aggregation with the same method and distance as norm1
	repCE <- repRankAggreg(norm1)
	print(repCE)

	## Repeat rank aggregation using the genetic algorithm instead
	repGA <- repRankAggreg(norm1, rankMethod = "GA")
	print(repGA)

optCluster documentation built on April 16, 2022, 5:05 p.m.