Following the standard K-Means algorithm, we start by assigning each sample a random cluster value. Howewver, the known control values are assigned their own dosage. Then this algorithm performs consistent iterations of computing the mean of each cluster and then relabeling each sample to it's closest cluster.
1 2 |
data |
Dataset used for clustering. Must be an n x p numeric matrix; where n is the number of samples and p is the number of features. |
nClust |
Expected total number of clusters (i.e the possible dosage values). Default is 5 for tetraploids. |
Labels |
The initial labels of all the data. Unknown samples are marked NA, while the controls are dosage +1 (since 0 is invalid). |
max.iter |
Max number of iterations allowed to run for this algorithm. If set too low a warning will occur. |
converge.threshold |
If the percent change between two iterations is within this value then the algorithm will stop. Makes this more effecient. |
outlier.threshold |
Used to determine the probability values. Samples outside with standard deviation larger than this value will have very low probability and will be thrown out in determining the remaining samples' probability. |
get.Prob |
TRUE/FALSE value indicating if probabilities are desired alongside dosage calls. Defaults to TRUE, but use FALSE for more efficiency if probability is not needed.f |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.