partitions.bsd: Obtain optimal partitioning scheme

Description Usage Arguments Details Value Author(s) References Examples

View source: R/partitions.bsd.R

Description

This function obtains the optimal number of partitions for gene trees. The input should be an object of class 'bsd' with the pairwise distances among tres and the scaling factors (see bsd.matrix for more details on this class). partitions.bsd returns an object of class partitions with a matrix with the parition assignments, the range of the sBSDmin distances and an object of class cluster with the output from the clustering analysis.

Usage

1
partitions.bsd(bsd.object, FUN = pam, find.best = T, B = 500, gap.best = "firstSEmax", kmax = "", ...)

Arguments

bsd.object

An object of class bsd. It can be obtained witn bsd.matrix or bsd.matrix.para. The example(bsd.matrix) for details.

FUN

A clustering function as defined in the package cluster. It can be pam, clara, or fanny. However note that for fanny the maximum k should be the number of data subsets / 2.

find.best

logical. Select T to find the optimal number of partitions. If F, the function does not find the best number of partitions.

B

Numeric. Number of bootstrap replicates for the Gap statistic. Not used if find.best == F

gap.best

A character. The criterion to select the optimal number of partitions (k). It can be any of "firstSEmax", "Tibs2001SEmax", "globalSEmax", "firstmax", or "globalmax".

kmax

Numeric. Maximum number of partitions to test. The default is the number of data subsets - 1.

...

Additional arguments passed to the clustering function.

Details

see the help for pacakge clustering for more details.

Value

An object of class 'partitions'

parts.mat

A matrix with the cluster assignments. The partitions are numbered from 1:maximum k. The columns are each value of k, and the columns are the names of the data subsets if provided.

range.bsd

A numeric vector with the range of the sBSDmin distances

best.k

The optimal number of partitions. Only returned if find.best == T

clusterdata

Results from the Gap statistic. Only returned if find.best == T. See ?clusGap for more details

Author(s)

Sebastian Ducnene

References

Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. _Journal of the Royal Statistical Society B_, *63*, 411-423.

Tibshirani, R., Walther, G. and Hastie, T. (2000). Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.

Per Broberg (2006). SAGx: Statistical Analysis of the GeneChip. R package version 1.9.7. <URL: http://home.swipnet.se/pibroberg/expression_hemsida1.html>

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# Create a list of trees of class multiPhylo with four patterns of among-lineage rate variation 
tr.fix <- rtree(10)

set.seed(12345)
rates1 <- abs(rnorm(18, sd = 0.1))
set.seed(123456)
rates2 <- abs(rnorm(18, sd = 0.1))
set.seed(1234567)
rates3 <- abs(rnorm(18, sd = 0.1))
set.seed(12345678)
rates4 <- abs(rnorm(18, sd = 0.1))

trees.list <- list()

for(i in 1:20){
      trees.list[[i]] <- tr.fix
      if(i <= 5){
      	   trees.list[[i]]$edge.length <- abs(rates1 + rnorm(18, 0, 0.01)) 
      }else if(i > 5 && i <= 10){
      	    trees.list[[i]]$edge.length <- abs(rates2 + rnorm(18, 0, 0.01)) 
      }else if(i >= 10 && i < 15){
      	    trees.list[[i]]$edge.length <- abs(rates3 + rnorm(18, 0, 0.01))  
      }else{
	    trees.list[[i]]$edge.length <- abs(rates4 + rnorm(18, 0, 0.01)) 
      }
}

names(trees.list) <- paste0("tree", 1:20)
class(trees.list) <- "multiPhylo"

# Estimate sBSDmin distance for all pairs of trees:
trees.bsd <- bsd.matrix(trees.list)

# Find the optimal number of partitions:
partitions.object <- partitions(trees.bsd)

# plot partitions in two graphics devices
plot(partitions.object)





## The function is currently defined as
function (bsd.object, FUN = pam, find.best = T, B = 500, gap.best = "firstSEmax", 
    kmax = "", ...) 
{
    dimat <- as.matrix(bsd.object[[1]])
    if (kmax == "") {
        kmax <- nrow(dimat) - 1
    }
    dimat.mds <- cmdscale(dimat, k = 3)
    if (find.best == T) {
        clusdat <- clusGap(dimat.mds, B = B, FUNcluster = FUN, K.max = kmax)
        npart <- maxSE(f = clusdat$Tab[, 3], SE.f = clusdat$Tab[, 
            4], method = gap.best)
    }
    parts.list <- list()
    for (i in 1:kmax) {
        clus.temp <- FUN(dimat.mds, k = i, ...)
        parts.list[[i]] <- clus.temp$clustering
    }
    parts.mat <- do.call("cbind", parts.list)
    rownames(parts.mat) <- rownames(dimat.mds)
    colnames(parts.mat) <- paste0("k=", 1:ncol(parts.mat))
    res <- list(parts.mat, range(bsd.object[[1]]))
    names(res) <- c("parts.mat", "range.bsd")
    if (exists("npart")) {
        colnames(res[[1]])[npart] <- paste0(colnames(parts.mat)[npart], 
            "BEST")
        res[[3]] <- npart
        names(res)[3] <- "best.k"
    }
    if (find.best == T) {
        res[[4]] <- clusdat
        names(res)[4] <- "clusterdata"
    }
    class(res) <- "partitions"
    return(res)
  }

sebastianduchene/ClockstaR documentation built on May 29, 2019, 4:57 p.m.