# Function to assign new samples to one of the two given hierarchical clustering trees in a semi-supervised way

### Description

For given molecular data sets from two non-overlapping groups of patients, this function constructs two independent HC trees and assigns new samples to one of them in a semi-supervised way. See Details.

### Usage

```
TwoHC_assign(X, index1, index2, new.X, dis.method = "cor", link.method = "ward",
             minclus = 4, maxmiss = 30, surv.time, status, method1 = "BIC",
             method2 = "g2")
```

### Arguments

`X` |
An object of class |

`index1` |
Column indices of patients in |

`index2` |
Column indices of patients in |

`new.X` |
An object of class |

`dis.method` |
The distance measure to be used. This must be one of the methods acceptable for |

`link.method` |
The agglomeration method to be used. This should be one of "ward" (default), "single", "complete", "average", "mcquitty", "median" or "centroid". |

`minclus` |
The minimum number of samples allowed to form a cluster. This parameter is inversely related to the number of partitions returned from an HC tree: a large value returns a small number of partitions, and vice versa. |

`maxmiss` |
Maximum percentage of missing values per row in |

`surv.time` |
A numeric vector containing follow-up information of patients in |

`status` |
A binary vector containing the survival status of patients in |

`method1` |
Type of partition evaluation measure to use for assessing the relationship between follow-up and a partition. Default is "BIC". |

`method2` |
Type of partition evaluation measure to use for assessing the relationship between data matrix |
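As an illustration of what a per-row missing-value threshold such as `maxmiss` measures, the following base R sketch computes the percentage of missing values in each row of a toy matrix and flags the rows that stay within the limit. The toy data, the variable names, and the assumption that rows above the threshold are excluded are all illustrative; this page does not state how the function itself treats such rows.

```r
# Toy matrix: rows are features, columns are samples (assumed orientation).
set.seed(1)
X <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)
X[1, 1:4] <- NA   # row 1: 4 of 10 values missing
X[2, 1:2] <- NA   # row 2: 2 of 10 values missing

maxmiss <- 30     # same value as the default in the usage above

# Percentage of missing values in each row.
pct_missing <- 100 * rowMeans(is.na(X))

# Rows at or below the threshold; dropping the others is an assumption
# made for this sketch, not documented behaviour.
keep <- pct_missing <= maxmiss
X_filtered <- X[keep, , drop = FALSE]
```

With this toy input, only the first row exceeds the 30 percent limit.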

### Details

Suppose molecular profiles are available for two non-overlapping groups of patients treated with two different drugs, or with the same drugs in different combinations, and that follow-up information is also available for these patients. When a new patient arrives (for whom only a molecular profile is available), the question is to which group this patient should be assigned so that he/she benefits most from the type of treatment that group received.

This function is designed for this problem. It works as follows: first, two independent HC trees are derived from the given data; second, partitions are extracted from each HC tree and the optimal partition is selected for each tree separately; third, the new patient's molecular profile is compared with each cluster in each optimal partition to calculate the average similarity and identify the most similar cluster in each of the two HC trees (the two competing clusters); finally, the new sample is assigned to whichever of the two competing clusters has the better overall survival.
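The first three steps can be sketched in base R on toy data. This is a simplified illustration under stated assumptions, not the function's implementation: the dissimilarity is taken to be one minus the Pearson correlation between samples, `"ward.D"` is the current base R name for the older `"ward"` linkage, the partition-selection step is replaced by a fixed `cutree(k = 2)`, and the helper names `build_tree` and `avg_similarity` are hypothetical. The final survival-based choice between the competing clusters is not shown.

```r
set.seed(42)
# Toy data: 50 features (rows) x 20 patients (columns).
X <- matrix(rnorm(50 * 20), nrow = 50)
index1 <- 1:10      # columns of the first treatment group
index2 <- 11:20     # columns of the second treatment group
new_x <- rnorm(50)  # molecular profile of one new patient

# Step 1: one HC tree per group. The dissimilarity 1 - correlation
# is an assumption about what a correlation-based distance means here.
build_tree <- function(M) {
  hclust(as.dist(1 - cor(M)), method = "ward.D")
}
hc1 <- build_tree(X[, index1])
hc2 <- build_tree(X[, index2])

# Step 2 (simplified stand-in): cut each tree into a fixed number of
# clusters instead of extracting and scoring many partitions.
part1 <- cutree(hc1, k = 2)
part2 <- cutree(hc2, k = 2)

# Step 3: average correlation between the new profile and the members
# of each cluster, computed separately for each tree.
avg_similarity <- function(M, partition, profile) {
  sims <- as.vector(cor(M, profile))  # one correlation per sample in M
  tapply(sims, partition, mean)
}
sim1 <- avg_similarity(X[, index1], part1, new_x)
sim2 <- avg_similarity(X[, index2], part2, new_x)

# The most similar cluster in each tree: the two competing clusters.
competing <- c(tree1 = unname(which.max(sim1)),
               tree2 = unname(which.max(sim2)))
```

Here `competing` holds one cluster index per tree; choosing between the two would then require a survival comparison using `surv.time` and `status`.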

### Value

A list object containing the following components:

`hc1` |
HC tree derived from the data corresponding to the first group. |

`hc2` |
HC tree derived from the data corresponding to the second group. |

`partitions.hc1` |
A matrix containing the partitions extracted from |

`partitions.hc2` |
A matrix containing the partitions extracted from |

`best.hc1` |
Optimal partition found on the |

`best.hc2` |
Optimal partition found on the |

`score.hc1` |
A matrix with two columns. The first column contains the quality scores of |

`score.hc2` |
The same as |

`Assign` |
A matrix with three columns. The first column contains the indices of HC trees to which a test sample was assigned. The second column contains the indices of clusters in |

`surv.time` |
The same as input |

`status` |
The same as input |

`index1` |
The same as input |

`index2` |
The same as input |

`new.X` |
The same as input |

`X` |
The same as input |

`method1` |
The same as input |

`method2` |
The same as input |

`minclus` |
The same as input |

`id1` |
Indices of the partitions obtained from the |

`id2` |
Indices of the partitions obtained from the |

### Author(s)

Askar Obulkasim

### References

Harrell,F.E. et al., (1982). "Evaluating the yield of medical tests", *JAMA*, 247, 2543-2546.

Obulkasim,A. et al., (2011). "Stepwise classification of cancer samples using clinical and molecular data", *BMC Bioinformatics*, 12, 422.

Troyanskaya,O. et al., (2001). "Missing value estimation methods for DNA microarrays", *Bioinformatics*, 17, 520-525.

Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

### See Also

See also `TwoHC_perm`, `cluster_pred`

### Examples

```
# Assumes the package providing TwoHC_assign() and TcgaGBM is already loaded.
data(TcgaGBM)
attach(TcgaGBM)
# Column indices of patients treated with each drug
id1 <- which(drugs == "Avastin")
id2 <- which(drugs == "Temodar")
# Build the trees on the first 30 patients per drug; assign the next 30 per drug
train <- c(id1[1:30], id2[1:30])
result <- TwoHC_assign(X = em[, train], index1 = 1:30, index2 = 31:60,
                       new.X = em[, c(id1[31:60], id2[31:60])], minclus = 4,
                       surv.time = surv.time[train],
                       status = status[train])
```