For given molecular data sets from two non-overlapping groups of patients, this function constructs two independent hierarchical clustering (HC) trees and assigns new samples to one of them in a semi-supervised way. See details.
TwoHC_assign(X, index1, index2, new.X, dis.method = "cor", link.method = "ward",
             minclus = 4, maxmiss = 30, surv.time, status, method1 = "BIC",
             method2 = "g2")
X |
An object of class |
index1 |
Column indices of the patients in X that belong to the first group. |
index2 |
Column indices of the patients in X that belong to the second group. |
new.X |
An object of class |
dis.method |
The distance measure to be used. This must be one of the methods acceptable for |
link.method |
The agglomeration method to be used. This should be one of "ward" (default), "single", "complete", "average", "mcquitty", "median" or "centroid". |
minclus |
The minimum number of samples allowed to form a cluster. This parameter is inversely related to the number of partitions returned from an HC tree: a large value yields a small number of partitions, and vice versa. |
maxmiss |
Maximum percentage of missing values per row in X. |
surv.time |
A numeric vector containing the follow-up information of the patients in X.
status |
A binary vector containing the survival status of the patients in X, normally 0 = alive, 1 = dead. |
method1 |
Type of partition evaluation measure used to assess the relationship between the follow-up data and a partition. Default is "BIC". |
method2 |
Type of partition evaluation measure used to assess the relationship between the data matrix X and a partition. Default is the Goodman and Kruskal index, "g2". |
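To make the roles of dis.method, link.method and minclus more concrete, the following sketch uses base R only. It is not the package's internal code: the simulated matrix `X.toy`, the use of `1 - cor()` as a correlation distance, `hclust()` with `"ward.D"`, and `cutree()` as a simple way to enumerate partitions are all assumptions made for illustration.

```r
set.seed(1)
# Toy data (hypothetical): 50 features in rows, 20 samples in columns
X.toy <- matrix(rnorm(50 * 20), nrow = 50,
                dimnames = list(NULL, paste0("s", 1:20)))

# Correlation-based distance between samples (assumed reading of "cor")
d <- as.dist(1 - cor(X.toy))

# Ward linkage; base R calls the classic variant "ward.D"
hc <- hclust(d, method = "ward.D")

# Enumerate candidate partitions by cutting the tree into k = 2..8 clusters
ks <- 2:8
parts <- sapply(ks, function(k) cutree(hc, k = k))  # samples x partitions

# Size of the smallest cluster in each candidate partition
min.size <- apply(parts, 2, function(p) min(table(p)))

# Keep only partitions whose smallest cluster has at least minclus samples
minclus <- 4
keep <- which(min.size >= minclus)
```

Because the cuts are nested, the smallest cluster can only shrink as k grows, so raising `minclus` in this sketch leaves fewer admissible partitions, which matches the behaviour described for minclus.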
Suppose molecular profiles are available for two non-overlapping groups of patients who were treated with two different drugs, or with the same drugs in different combinations, and that follow-up information is available for these patients as well. When a new patient arrives for whom only a molecular profile is available, the question is to which group this patient should be assigned so that he or she benefits most from the treatment that group received.
This function is designed for this problem. It works as follows: first, two independent HC trees are derived from the given data; second, partitions are extracted from each HC tree and the optimal partition is selected for each tree separately; third, the new patient's molecular profile is compared with each cluster in each optimal partition to calculate the average similarity and to identify the most similar cluster in each of the two HC trees (the two competing clusters); finally, the new sample is assigned to whichever of the two competing clusters has the better overall survival.
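The four steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the function's actual implementation: the toy data, the helper names `build_tree` and `avg_sim`, the fixed two-cluster cut in place of the optimal-partition search, and the use of mean follow-up time (ignoring censoring) as a stand-in for "better overall survival" are all hypothetical.

```r
set.seed(42)
n.feat <- 40
# Toy training data (hypothetical): 24 patients, 12 per treatment group
X.toy  <- matrix(rnorm(n.feat * 24), nrow = n.feat)
index1 <- 1:12
index2 <- 13:24
time   <- rexp(24, rate = 0.1)  # toy follow-up times, censoring ignored
new.x  <- rnorm(n.feat)         # molecular profile of one new patient

# Step 1: one HC tree per group (correlation distance, Ward linkage)
build_tree <- function(M) hclust(as.dist(1 - cor(M)), method = "ward.D")
hc1 <- build_tree(X.toy[, index1])
hc2 <- build_tree(X.toy[, index2])

# Step 2 (simplified): use a fixed 2-cluster cut instead of searching
# for the optimal partition
best1 <- cutree(hc1, k = 2)
best2 <- cutree(hc2, k = 2)

# Step 3: average correlation between the new profile and each cluster
avg_sim <- function(x, M, part) {
  sims <- as.vector(cor(x, M))  # one similarity per training sample
  tapply(sims, part, mean)      # mean similarity per cluster
}
sim1  <- avg_sim(new.x, X.toy[, index1], best1)
sim2  <- avg_sim(new.x, X.toy[, index2], best2)
comp1 <- as.integer(names(which.max(sim1)))  # competing cluster, tree 1
comp2 <- as.integer(names(which.max(sim2)))  # competing cluster, tree 2

# Step 4 (simplified): choose the competing cluster with the longer
# mean follow-up time
surv1 <- mean(time[index1][best1 == comp1])
surv2 <- mean(time[index2][best2 == comp2])
assigned.tree <- if (surv1 >= surv2) 1L else 2L
```

A real comparison of the competing clusters would need to account for censoring through the status vector; the mean follow-up time is used here only to keep the sketch free of extra packages.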
A list object containing the following components:
hc1 |
HC tree derived from the data corresponding to the first group. |
hc2 |
HC tree derived from the data corresponding to the second group. |
partitions.hc1 |
A matrix containing the partitions extracted from hc1. Rows represent partitions and columns represent samples. |
partitions.hc2 |
A matrix containing the partitions extracted from hc2. Rows represent partitions and columns represent samples. |
best.hc1 |
Optimal partition found on hc1.
best.hc2 |
Optimal partition found on hc2.
score.hc1 |
A matrix with two columns. The first column contains the quality scores of partitions.hc1 calculated using the follow-up data. The second column contains the quality scores of partitions.hc1 calculated using X. |
score.hc2 |
The same as score.hc1, but for partitions.hc2. |
Assign |
A matrix with three columns. The first column contains the indices of HC trees to which a test sample was assigned. The second column contains the indices of clusters in best.hc1 to which a test sample was most similar. The third column contains the indices of clusters in best.hc2 to which a test sample was most similar. |
surv.time |
The same as input |
status |
The same as input |
index1 |
The same as input |
index2 |
The same as input |
new.X |
The same as input |
X |
The same as input |
method1 |
The same as input |
method2 |
The same as input |
minclus |
The same as input |
id1 |
Indices of the partitions obtained from hc1 in which the minimum cluster size is equal to or larger than minclus. |
id2 |
Indices of the partitions obtained from hc2 in which the minimum cluster size is equal to or larger than minclus. |
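As an illustration of how the Assign component might be read, the sketch below works on a hand-made stand-in rather than real output. The values, the row names, the column names and the assumption that trees are coded as 1 and 2 with one row per new sample are all hypothetical; only the meaning of the three columns is taken from the description above.

```r
# Hand-made stand-in for the Assign component: one row per new sample.
# Columns: assigned tree, closest cluster in best.hc1, closest cluster in best.hc2
Assign.toy <- matrix(c(1, 2, 1,
                       2, 1, 3,
                       1, 1, 2),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(paste0("new", 1:3),
                                     c("tree", "cluster.hc1", "cluster.hc2")))

# Number of new samples assigned to each tree
n.per.tree <- table(Assign.toy[, "tree"])

# Cluster each sample ends up in: the column matching its assigned tree
final.cluster <- ifelse(Assign.toy[, "tree"] == 1,
                        Assign.toy[, "cluster.hc1"],
                        Assign.toy[, "cluster.hc2"])
```

With a real result, the same indexing would be applied to the Assign element of the returned list.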
Askar Obulkasim
Harrell,F.E. et al., (1982). "Evaluating the yield of medical tests", JAMA, 247, 2543-2546.
Obulkasim,A. et al., (2011). "Stepwise classification of cancer samples using clinical and molecular data", BMC Bioinformatics, 12, 422.
Troyanskaya,O. et al., (2001). "Missing value estimation methods for DNA microarrays", Bioinformatics, 17, 520-525.
Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.
See also TwoHC_perm, cluster_pred
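# Load the example data and make its components (em, drugs, surv.time, status) visible
data(TcgaGBM)
attach(TcgaGBM)
# Column indices of the patients treated with each drug
id1 <- which(drugs == "Avastin")
id2 <- which(drugs == "Temodar")
# Build the two trees from the first 30 patients of each group and
# assign the next 30 patients of each group as new samples
result <- TwoHC_assign(X = em[, c(id1[1:30], id2[1:30])], index1 = 1:30, index2 = 31:60,
                       new.X = em[, c(id1[31:60], id2[31:60])], minclus = 4,
                       surv.time = surv.time[c(id1[1:30], id2[1:30])],
                       status = status[c(id1[1:30], id2[1:30])])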