TwoHC_perm: Function to assess the significance of group assignmetn from...
In HCsnip: Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree

Description Usage Arguments Details Value Author(s) References See Also Examples

Function to evaluate the significance of the group assignments generated by TwoHC_assign.

1	TwoHC_perm(TwoHC, nperm = 1000)

`TwoHC`	Output from the 'TwoHC_assign' function
`nperm`	The number of permutations.

Significance of group assignment for each patient is calculated as follows: for a given patient, examine the previously found optimal partition in each HC tree and identify two clusters to which this patient is most similar. Say these two competing clusters are cluster1 and cluster2 of size n1 and n2, respectively. Create a binary vector (call it x) of size n1 + n2 which has n2 ones and n1 zeros . Construct a Cox model using follow-up information of samples in cluster1 and cluster2 as response, and x as covariate. The absolute value of the estimated group parameter ('beta_obs') in the Cox model that compares the survival times of the other patients in the two competing clusters expresses the predicted gain in survival from the assignment by 'TwoHC_assign' with respect to random. The beta_obs will be transformed to

r^i_{obs} = exp≤ft(-|\hatβ^i_{obs}|\right),

which quantifies the gain in relative risk in the Cox model. The problem is that this is biased, because 'TwoHC_assign' already used 'beta_obs'. Hence, even when the two groups would be equally good for the molecular profiles in the two competing clusters, we obtain 'r_obs' < 1. To correct for this bias this function uses a permutation argument. For each new patient it applies 'nperm' permutations of the surival data among the two competing clusters. As above we compute 'r_perm' for each permutation which contains the same bias as 'r_obs'. Let Z_i = median(r_perm(1), ... , r_perm(nperm)), then risk-ratio rr_obs(i) = r_obs(i) / Z_i quantifies the biased-corrected reduction in relative risk. The permuted version 'rr_perm(i)' is defined analogously (see vignette). Finally it defines a test statistic:

{T}_{obs} = \frac{\frac{1}{n}∑_{i=1}^{n}log≤ft(rr^i_{obs}\right)}{stdev≤ft(log≤ft(rr^1_{obs},…,rr^n_{obs}\right)\right)}

'T_obs' compared with the background of its null-distribution as obtained by permutation to calculate p-value.

A list object contains following objects:

`Obs.betas`	A numeric vector contains the coefficient from the Cox model corresponding to each test sample.
`Perm.betas`	A matrix contains the coefficient from the Cox model trained with permuted follow-up data. Columns represent test samples, rows represent permutations
`Ranks`	A numeric vector contains the rank of each observed coefficient among the nperm coefficients generated by permutations.
`RiskRatios`	A numeric vector contains the ratio of relative risk for the test set.
`Pvalue`	p-value of the overall group assignment.

Askar Obulkasim

Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

Harrel,E.F. et al., (1982). "Evaluating the yield of medical tests", JAMA, 247, 2543-2546.

Obulkasim,A. et al., (2011). "Stepwise classification of cancer samples using clinical and molecular data", BMC Bioinformatics, 12, 422.

See also TwoHC_assign, cluster_pred

data(TcgaGBM)
attach(TcgaGBM)
id1 <- which(drugs == "Avastin")
id2 <- which(drugs == "Temodar") 
twoHC <- TwoHC_assign(X = em[ ,c(id1[1:50], id2[1:50])], index1 = 1:50, index2 = 51:100, 
                     new.X = em[, c(id1[51:60], id2[51:60])], minclus = 4,
                     surv.time = surv.time[c(id1[1:50], id2[1:50])], 
                     status = status[c(id1[1:50], id2[1:50])])  
result <- TwoHC_perm(twoHC, nperm = 100)
                   
## Not run: 
### Examples with a larger number of permutations (not run).
result <- TwoHC_perm(twoHC, nperm = 10000)
par(mfrow = c(1, 2))
plot(density(result$Ranks), xlab = "Ranks")
plot(density(result$RiskRatios), xlab = "Observed relative risk-ratios")

## End(Not run)