scaleboot: Multiscale Bootstrap Resampling

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/relltest.R

Description

Performs multiscale bootstrap resampling for a specified statistic.

Usage

1
2
3
4
5
6
7
8
scaleboot(dat,nb,sa,fun,parm=NULL,count=TRUE,weight=TRUE,
          cluster=NULL,onlyboot=FALSE,seed=NULL,...)

countw.assmax(x,w,ass)

countw.shtest(x,w,obs)

countw.shtestass(x,w,assobs)

Arguments

dat

data matrix or data-frame. Row vectors are to be resampled.

nb

vector of the numbers of bootstrap replicates.

sa

vector of scales in sigma squared (σ^2).

fun

function for a statistic.

parm

parameter to be passed to fun above.

count

logical. Should only the accumulative counts be returned? Otherwise, raw statistic vectors are returned.

weight

logical. In fun above, resampling is specified by a weight vector. Otherwise, resampling is specified by a vector of indices.

cluster

parallel cluster object which may be generated by function makeCluster.

onlyboot

logical. Should only bootstrap resampling be performed? Otherwise, sbfit or sbconf is called internally.

seed

If non NULL, random seed is set. Specifying a seed is particularly important when cluster is non NULL, in which case seed + seq(along=cluster) are set to cluster nodes.

...

further arguments passed to and from other methods.

x

data matrix or data-frame passed from scaleboot.

w

weight vector for resampling.

ass

a list of association vectors. An example of parm above.

obs

a vector of observed test statistics. An example of parm above.

assobs

a list of ass and obs above. An example of parm above.

Details

These functions are used internally by relltest for computing raw bootstrap probabilities of phylogenetic inference. Alternatively, we used pvclust to get raw bootstrap probabilities of hierarchical clustering. In other cases, users may utilize scaleboot function or prepare their own functions.

scaleboot performs multiscale bootstrap resampling for a statistic defined by fun, which should be one of the two possible forms fun(x,w,parm) and fun(x,i,parm). The former is used when weight=TRUE, and the weight vector w is generated by a multinomial distribution. The latter is used when weight=FALSE, and the index vector i is generated by resampling n' elements from \{1,...,n\}. When count=TRUE, fun should return a logical, or a vector of logicals.

Examples of fun(x,w,parm) are countw.assmax for AU p-values, countw.shtest for SH-test of trees, and countw.shtestass for SH-test of both trees and edges. The definitions are given below.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
countw.assmax <- function(x,w,ass) {
  y <- maxdif(wsumrow(x,w)) <= 0 # countw.max
  if(is.null(ass)) y
  else {
    z <- vector("logical",length(ass))
    for(i in seq(along=ass)) z[i] <- any(y[ass[[i]]])
    z
  }
}

countw.shtest <- function(x,w,obs)  maxdif(wsumrow(x,w)) >= obs

countw.shtestass <- function(x,w,assobs)
  unlist(assmaxdif(wsumrow(x,w),assobs$ass)) >= assobs$obs
    
### weighted sum of row vectors
##
## x = matrix (array of row vectors)
## w = weight vector (for rows)
##
wsumrow <- function(x,w) {
  apply(w*x,2,sum)*nrow(x)/sum(w)
}

### calc max diff
##
## y[i] := max_{j neq i} x[j] - x[i]
##
maxdif <- function(x) {
  i1 <- which.max(x)  # the largest element
  x <- -x + x[i1]
  x[i1] <- -min(x[-i1])  # the second largest value
  x
}

### calc assmaxdif
##
## y[[i]][j] := max_{k neq ass[[i]]} x[k] - x[ass[[i]][j]]
##
assmaxdif <-  function(x,a) {
  y <- vector("list",length(a))
  names(y) <- names(a)
  for(i in seq(along=a))  y[[i]] <- max(x[-a[[i]]]) - x[a[[i]]]
  y
}

When count=TRUE, the summation of outputs from fun is calculated. This gives the frequencies for how many times the hypotheses are supported by the bootstrap replicates.

Value

If onlyboot=TRUE, then a list of raw results from the multiscale bootstrap resampling is returned. The components are "stat" for list vectors of outputs from fun (only when count=FALSE), "bps" for a matrix of multiscale bootstrap probabilities (only when count=FALSE), "nb" for the number of bootstrap replicates used, and "sa" for the scales used. Note that scales are redefined by sa <- nsize/round(nsize/sa), where nsize is the sample size.

If onlyboot=FALSE, then the result of a call to sbfit is returned when count=TRUE, otherwise the result of sbconf is returned when count=FALSE.

Author(s)

Hidetoshi Shimodaira

See Also

sbfit, sbconf, relltest.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
## Not run: 
## An example to calculate AU p-values for phylogenetic trees
## See also the Examples of "sbconf"
data(mam15) # load mam15.mt
sa <- 9^seq(-1,1,length=13) # parameter for multiscale bootstrap
nb <- 1000 # nb=10000 is better but slower
# Now compute bootstrap probabilities and fit models to them
sim <- scaleboot(mam15.mt,nb,sa,countw.assmax) # takes some time (< 1 min)
sim # show bootstrap probabilities and model fitting
summary(sim) # show AU p-vaslues

## End(Not run)

## Not run: 
## The following lines are only for illustration purpose
## a line from the definition of relltest
scaleboot(dat,nb,sa,countw.assmax,ass,cluster=cluster,
                 names.hp=na,nofit=nofit,models=models,seed=seed)

## two lines from rell.shtest (internal function)
scaleboot(z,nb,1,countw.shtest,tobs,cluster=cluster,
                 onlyboot=TRUE,seed=seed)
scaleboot(z,nb,1,countw.shtestass,pa,cluster=cluster,
                 onlyboot=TRUE,seed=seed)

## End(Not run)

scaleboot documentation built on Dec. 4, 2019, 5:07 p.m.