Description Usage Arguments Details Value Author(s) References Examples
View source: R/partitions.bsd.R
This function obtains the optimal number of partitions for gene trees. The input should be an object of class 'bsd' with the pairwise distances among tres and the scaling factors (see bsd.matrix for more details on this class). partitions.bsd returns an object of class partitions with a matrix with the parition assignments, the range of the sBSDmin distances and an object of class cluster with the output from the clustering analysis.
1 | partitions.bsd(bsd.object, FUN = pam, find.best = T, B = 500, gap.best = "firstSEmax", kmax = "", ...)
|
bsd.object |
An object of class bsd. It can be obtained witn bsd.matrix or bsd.matrix.para. The example(bsd.matrix) for details. |
FUN |
A clustering function as defined in the package cluster. It can be pam, clara, or fanny. However note that for fanny the maximum k should be the number of data subsets / 2. |
find.best |
logical. Select T to find the optimal number of partitions. If F, the function does not find the best number of partitions. |
B |
Numeric. Number of bootstrap replicates for the Gap statistic. Not used if find.best == F |
gap.best |
A character. The criterion to select the optimal number of partitions (k). It can be any of "firstSEmax", "Tibs2001SEmax", "globalSEmax", "firstmax", or "globalmax". |
kmax |
Numeric. Maximum number of partitions to test. The default is the number of data subsets - 1. |
... |
Additional arguments passed to the clustering function. |
see the help for pacakge clustering for more details.
An object of class 'partitions'
parts.mat |
A matrix with the cluster assignments. The partitions are numbered from 1:maximum k. The columns are each value of k, and the columns are the names of the data subsets if provided. |
range.bsd |
A numeric vector with the range of the sBSDmin distances |
best.k |
The optimal number of partitions. Only returned if find.best == T |
clusterdata |
Results from the Gap statistic. Only returned if find.best == T. See ?clusGap for more details |
Sebastian Ducnene
Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. _Journal of the Royal Statistical Society B_, *63*, 411-423.
Tibshirani, R., Walther, G. and Hastie, T. (2000). Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.
Per Broberg (2006). SAGx: Statistical Analysis of the GeneChip. R package version 1.9.7. <URL: http://home.swipnet.se/pibroberg/expression_hemsida1.html>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | # Create a list of trees of class multiPhylo with four patterns of among-lineage rate variation
tr.fix <- rtree(10)
set.seed(12345)
rates1 <- abs(rnorm(18, sd = 0.1))
set.seed(123456)
rates2 <- abs(rnorm(18, sd = 0.1))
set.seed(1234567)
rates3 <- abs(rnorm(18, sd = 0.1))
set.seed(12345678)
rates4 <- abs(rnorm(18, sd = 0.1))
trees.list <- list()
for(i in 1:20){
trees.list[[i]] <- tr.fix
if(i <= 5){
trees.list[[i]]$edge.length <- abs(rates1 + rnorm(18, 0, 0.01))
}else if(i > 5 && i <= 10){
trees.list[[i]]$edge.length <- abs(rates2 + rnorm(18, 0, 0.01))
}else if(i >= 10 && i < 15){
trees.list[[i]]$edge.length <- abs(rates3 + rnorm(18, 0, 0.01))
}else{
trees.list[[i]]$edge.length <- abs(rates4 + rnorm(18, 0, 0.01))
}
}
names(trees.list) <- paste0("tree", 1:20)
class(trees.list) <- "multiPhylo"
# Estimate sBSDmin distance for all pairs of trees:
trees.bsd <- bsd.matrix(trees.list)
# Find the optimal number of partitions:
partitions.object <- partitions(trees.bsd)
# plot partitions in two graphics devices
plot(partitions.object)
## The function is currently defined as
function (bsd.object, FUN = pam, find.best = T, B = 500, gap.best = "firstSEmax",
kmax = "", ...)
{
dimat <- as.matrix(bsd.object[[1]])
if (kmax == "") {
kmax <- nrow(dimat) - 1
}
dimat.mds <- cmdscale(dimat, k = 3)
if (find.best == T) {
clusdat <- clusGap(dimat.mds, B = B, FUNcluster = FUN, K.max = kmax)
npart <- maxSE(f = clusdat$Tab[, 3], SE.f = clusdat$Tab[,
4], method = gap.best)
}
parts.list <- list()
for (i in 1:kmax) {
clus.temp <- FUN(dimat.mds, k = i, ...)
parts.list[[i]] <- clus.temp$clustering
}
parts.mat <- do.call("cbind", parts.list)
rownames(parts.mat) <- rownames(dimat.mds)
colnames(parts.mat) <- paste0("k=", 1:ncol(parts.mat))
res <- list(parts.mat, range(bsd.object[[1]]))
names(res) <- c("parts.mat", "range.bsd")
if (exists("npart")) {
colnames(res[[1]])[npart] <- paste0(colnames(parts.mat)[npart],
"BEST")
res[[3]] <- npart
names(res)[3] <- "best.k"
}
if (find.best == T) {
res[[4]] <- clusdat
names(res)[4] <- "clusterdata"
}
class(res) <- "partitions"
return(res)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.