A function to select an optimal partition (clustering) from a large number of candidates and calculate its p-value.

Description

For a given set of partitions (each composed of non-overlapping clusters), this function evaluates every partition using two types of data and selects the optimal one, i.e. the partition that ranks highest with respect to both data types (score1 and score2 are presumed to come from two different data sources). A permutation approach is used to calculate the corrected p-value of the selected partition.

Usage

perm_test(partitions, surv.time, status, score1 = NULL, score2, method = "BIC", nperm = 1000)

Arguments

partitions

A matrix in which rows represent partitions and columns represent samples

surv.time

A numeric vector containing the follow-up times of the patients in the partition.

status

A binary vector containing the survival status of the patients in the partition: 0 = alive, 1 = dead.

score1

A numeric vector containing the quality score of each partition. Scores are assumed to have been calculated from the follow-up data. Note: prepare this vector so that a high value corresponds to a good-quality partition.

score2

A numeric vector containing the quality score of each partition, calculated from any data type other than the follow-up data. As with score1, this vector must be prepared so that a high value corresponds to a good-quality partition.

method

The type of partition evaluation measure to use. Must be the same as the measure used to calculate score1. Default is 'BIC'.

nperm

The number of permutations.
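
The layout and orientation of the inputs can be illustrated with a small toy sketch. All values below are made up, and the `bic` vector is a hypothetical smaller-is-better criterion; it is not produced by any function on this page. The point is only that a criterion where smaller is better should be negated before being passed as a score, and that each score vector needs one entry per row of `partitions`.

```r
# Hypothetical toy inputs: 3 candidate partitions (rows) of 6 samples (columns).
partitions <- rbind(c(1, 1, 1, 2, 2, 2),
                    c(1, 2, 1, 2, 1, 2),
                    c(1, 1, 2, 2, 3, 3))

# Made-up values of a criterion where SMALLER is better (assumption for illustration).
bic <- c(210.4, 225.1, 214.9)

# Negate so that a HIGH value corresponds to a good-quality partition.
score1 <- -bic

# One score per partition (row) is expected.
stopifnot(length(score1) == nrow(partitions))

# Index of the partition this score favours.
which.max(score1)
```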

Details

When studying the association of cluster membership with follow-up data, we cannot use standard testing procedures, because score1 has already been computed from the follow-up data. We would thus use the follow-up data twice, and the resulting p-value is likely to be too small. We avoid this bias by also applying the semi-supervised partition selection under the null hypothesis, which is simply the absence of association between the follow-up data and the data type used to generate score2. Our partition selection, in combination with a suitable test statistic, is designed to detect associations that can be represented by groups of samples. We adapt the p-value computation as follows:

  1. Use a suitable test statistic (e.g. log-rank for time-to-event data and chi-square for nominal data) to compute the conditional p-value given the cluster labels in the selected partition: p_obs.

  2. For i = 1...nperm:

    1. Randomly permute follow-up data among the samples.

    2. Apply exactly the same type of evaluation measure to evaluate all partitions, i.e. generate a new score1 while keeping score2 fixed. Select the best partition as before.

    3. Conditional on the resulting partition, compute the p-value p_i.

  3. Finally, the p-value of interest equals the number of times p_i is smaller than or equal to p_obs, divided by the number of permutations run.

Here, p satisfies a crucial property of a p-value: it is uniformly distributed when the null hypothesis is true, because then p_obs and p_i are exchangeable random variables. The exchangeability results from the null hypothesis and from the use of exactly the same procedure to compute p_obs and p_i.
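
The procedure described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the internals of perm_test: `surv_score`, `logrank_p`, `select_best` and `perm_pvalue_sketch` are hypothetical helpers, the log-rank chi-square stands in for score1, and a simple rank sum stands in for the selection rule. It relies on the `survival` package for `Surv()` and `survdiff()`.

```r
library(survival)  # provides Surv() and survdiff()

# Hypothetical stand-in for score1: log-rank chi-square (higher = better).
surv_score <- function(labels, time, status) {
  survdiff(Surv(time, status) ~ factor(labels))$chisq
}

# Log-rank p-value conditional on the cluster labels of one partition.
logrank_p <- function(labels, time, status) {
  fit <- survdiff(Surv(time, status) ~ factor(labels))
  1 - pchisq(fit$chisq, df = length(unique(labels)) - 1)
}

# Hypothetical selection rule (assumption): highest sum of the two ranks.
select_best <- function(score1, score2) which.max(rank(score1) + rank(score2))

perm_pvalue_sketch <- function(partitions, time, status, score2, nperm = 100) {
  # Step 1: select on the observed data and compute p_obs.
  score1 <- apply(partitions, 1, surv_score, time = time, status = status)
  best   <- select_best(score1, score2)
  p_obs  <- logrank_p(partitions[best, ], time, status)

  # Step 2: repeat selection and testing on permuted follow-up data.
  p_perm <- replicate(nperm, {
    idx <- sample(length(time))          # time and status are permuted together
    t_i <- time[idx]
    s_i <- status[idx]
    score1_i <- apply(partitions, 1, surv_score, time = t_i, status = s_i)
    best_i   <- select_best(score1_i, score2)  # score2 stays fixed
    logrank_p(partitions[best_i, ], t_i, s_i)
  })

  # Step 3: proportion of permutation p-values not larger than p_obs.
  list(obs.p = p_obs, perm.p = p_perm, p = mean(p_perm <= p_obs))
}

# Toy usage with simulated (made-up) data.
set.seed(1)
n <- 40
time   <- rexp(n)
status <- rbinom(n, 1, 0.7)
partitions <- rbind(rep(1:2, each = 20), rep(1:2, times = 20), rep(1:4, each = 10))
score2 <- c(0.3, 0.1, 0.2)
res <- perm_pvalue_sketch(partitions, time, status, score2, nperm = 50)
```

Because the selection step is repeated inside every permutation, the permutation p-values carry the same selection bias as p_obs, which is what makes the comparison in step 3 fair.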

Value

A list containing the following components:

obs.p

Observed p-value

perm.p

A vector of p-values from permutations.

best

Selected optimal partition
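
As a hedged illustration of how these components might be combined, the sketch below builds a mock list with the same component names and applies step 3 from the Details section. The numbers are made up, and `best` is assumed here to be a vector of cluster labels, one per sample.

```r
# Mock of the returned list (made-up values, for illustration only).
result <- list(
  obs.p  = 0.012,
  perm.p = c(0.20, 0.008, 0.45, 0.03, 0.012, 0.60, 0.09, 0.31, 0.002, 0.75),
  best   = c(1, 1, 2, 2, 1, 2)
)

# Corrected p-value: share of permutation p-values <= the observed p-value.
corrected_p <- mean(result$perm.p <= result$obs.p)
corrected_p

# Cluster sizes in the selected partition (assuming `best` holds labels).
table(result$best)
```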

Author(s)

Askar Obulkasim

References

Obulkasim, A. et al. (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

See Also

TwoHC_perm

Examples

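library(HCsnip)  # assumed: the package that provides perm_test and the helpers used below

data(BullingerLeukemia)
attach(BullingerLeukemia)

# Generate candidate partitions for the first 30 samples and keep the rows indexed by cl$id.
cl <- HCsnipper(em[, 1:30], min = 5)
cl <- cl$partitions[cl$id, ]

# score2: quality of each partition based on the expression data (correlation distance).
m <- apply(cl, 1, function(x) measure(parti = x, dis = 1 - cor(em[, 1:30])))

# score1: quality of each partition based on the follow-up data.
s <- apply(cl, 1, function(x) surv_measure(x, surv.time[1:30], status[1:30]))

# Select the optimal partition; nperm is kept small here to keep the example fast.
result <- perm_test(cl, surv.time[1:30], status[1:30], score1 = s, score2 = m, nperm = 10)

### Visualize cluster differences in terms of Entropy.
H <- EnvioPlot(X = em[, 1:30], parti = result$best)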