For a given partition, this function assigns new samples to one of the clusters in the partition. A partition (clustering) is composed of non-overlapping clusters.
X | An object of class
partition | A numeric vector containing the non-overlapping cluster labels.
surv.time | A numeric vector containing the follow-up times of the patients in the partition.
status | A binary vector containing the survival status of the patients in the partition, 0 = alive, 1 = dead.
te.index | A numeric vector containing the indices of the columns in X that correspond to the test samples.
minclus | The minimum number of samples allowed to form a cluster in the test set. This avoids returning tiny clusters and reduces the effect of outliers.
te.surv.time | An optional vector containing the follow-up times of the patients in the test set. If supplied together with te.status, the logrank test p-value is calculated for the test set.
te.status | An optional vector containing the survival status of the patients in the test set. If supplied together with te.surv.time, the logrank test p-value is calculated for the test set.
method | The method used to assign test samples to one of the clusters in the partition. Must be either the Ward distance 'ward' or Harrell's concordance index 'conc' (default).
maxmiss | Maximum percentage of missing values per row in X.
plot.it | If TRUE and follow-up data for the test samples are given, Kaplan-Meier curve(s) will be generated for each cluster in the test set.
... | Arguments for
The user has two options for assigning test samples to one of the clusters in the partition. The first option is to use the Ward distance. Specifically, the average distance between a test sample and the samples in each cluster of the partition is calculated, one cluster at a time. The test sample is assigned to the cluster for which the average distance is smallest. Follow-up data are not required for this option.
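The average-distance rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name assign_by_mean_distance is hypothetical, the distance is assumed to be Euclidean, samples are assumed to be stored in the columns of X, and cluster labels are assumed to be numeric.

```r
# Hypothetical helper (not part of the package): assign each test column of X
# to the cluster whose members have the smallest mean distance to it.
# Assumptions: samples are columns of X, Euclidean distance, numeric labels.
assign_by_mean_distance <- function(X, partition, te.index) {
  train <- X[, -te.index, drop = FALSE]
  sapply(te.index, function(j) {
    # Euclidean distance from test sample j to every training sample
    d <- sqrt(colSums((train - X[, j])^2))
    # Mean distance to each cluster, then pick the closest cluster
    avg <- tapply(d, partition, mean)
    as.numeric(names(avg)[which.min(avg)])
  })
}

# Toy data: two well-separated clusters of five samples plus two test samples
set.seed(1)
X <- cbind(matrix(rnorm(20, mean = 0), nrow = 4),
           matrix(rnorm(20, mean = 10), nrow = 4),
           c(0, 0, 0, 0), c(10, 10, 10, 10))
partition <- rep(1:2, each = 5)
pred <- assign_by_mean_distance(X, partition, te.index = 11:12)
```

Here the first test sample lies near the first cluster and the second near the second, so pred holds one label per test sample in the order given by te.index.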
The second option is to use Harrell's concordance index (Harrell et al., 1982). This option requires both the main data and the follow-up data corresponding to the given partition. The main data are used to find the pseudo nearest neighbours (PNN) of a test sample (Obulkasim et al., 2011), and the follow-up data are used to check how concordant the follow-up information of the PNN is with that of the samples in each cluster. The test sample is assigned to the cluster for which the average concordance is highest.
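For readers unfamiliar with the concordance index itself, a minimal base R sketch is given below. It shows only the index computed between a risk score and right-censored follow-up data; it does not reproduce the PNN search or the cluster assignment rule, whose details are not given here. The function name harrell_c is hypothetical, and pairs with tied follow-up times are simply skipped as a simplification.

```r
# Hypothetical helper (not part of the package): Harrell's concordance index
# between a risk score and right-censored follow-up data.
# status: 1 = event observed, 0 = censored. Tied times are skipped.
harrell_c <- function(time, status, risk) {
  conc <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (time[i] == time[j]) next
      # 'a' is the sample with the shorter follow-up time, 'b' the other one
      if (time[i] < time[j]) { a <- i; b <- j } else { a <- j; b <- i }
      # A pair is usable only if the shorter time is an observed event
      if (status[a] != 1) next
      usable <- usable + 1
      if (risk[a] > risk[b]) {
        conc <- conc + 1        # higher risk failed earlier: concordant
      } else if (risk[a] == risk[b]) {
        conc <- conc + 0.5      # tied risk scores count as half
      }
    }
  }
  conc / usable
}

time   <- c(2, 4, 6, 8)
status <- c(1, 1, 0, 1)
risk   <- c(4, 3, 2, 1)
c_good <- harrell_c(time, status, risk)       # risk decreases with time
c_bad  <- harrell_c(time, status, rev(risk))  # risk increases with time
```

In this toy example every usable pair is concordant for the first call and discordant for the second, so the two values sit at the extremes of the index.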
Before selecting one of the two options, we recommend that the user check the correlation between the main data and the follow-up information (e.g. using a global test). If the correlation is relatively strong, we recommend the 'conc' option; otherwise, we recommend the 'ward' option.
If plot.it is FALSE, the function returns a vector of predicted cluster labels for the test set. If plot.it is TRUE and follow-up data for the test set are given, the function returns a list object containing the following components:
St | a data frame with the following five columns:
value | logrank test p-value for the test set
Askar Obulkasim
Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.
Harrell,F.E. et al., (1982). "Evaluating the yield of medical tests", JAMA, 247, 2543-2546.
Obulkasim,A. et al., (2011). "Stepwise classification of cancer samples using clinical and molecular data", BMC Bioinformatics, 12, 422.
Troyanskaya,O. et al., (2001). "Missing value estimation methods for DNA microarrays", Bioinformatics, 17, 520-525.
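library(HCsnip)  # assumption: the package that provides these functions and data
# Load the example data and make its components (em, surv.time, status) visible
data(BullingerLeukemia)
attach(BullingerLeukemia)
# Cluster the first 30 samples and keep the partitions indexed by cl$id
cl <- HCsnipper(em[, 1:30], min = 5)
cl <- cl$partitions[cl$id, ]
# Assign samples 31 to 50 to the clusters of the first retained partition
result <- cluster_pred(X = em[, 1:50], partition = cl[1, ],
                       surv.time = surv.time[1:30],
                       status = status[1:30], te.index = 31:50)
names(result)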