MyClustering: Semi-Supervised Clustering Algorithm


Description

Following the standard K-Means algorithm, we start by assigning each sample a random cluster value. However, the known control values are assigned their own dosage. The algorithm then repeatedly computes the mean of each cluster and relabels each sample to its closest cluster.
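The steps described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's implementation: the helper name `semi_kmeans_sketch`, the use of squared Euclidean distance, the re-seeding of empty clusters, and keeping the controls fixed at their given label on every iteration are all assumptions made for the example.

```r
# Hypothetical helper illustrating the description; not MyClustering itself.
semi_kmeans_sketch <- function(data, nClust, Labels, max.iter = 100) {
  known <- !is.na(Labels)
  labels <- Labels
  # Unknown samples start in a random cluster; controls keep their label.
  labels[!known] <- sample.int(nClust, sum(!known), replace = TRUE)

  for (iter in seq_len(max.iter)) {
    # Mean of each cluster (row k holds the centre of cluster k).
    centers <- matrix(NA_real_, nrow = nClust, ncol = ncol(data))
    for (k in seq_len(nClust)) {
      members <- data[labels == k, , drop = FALSE]
      centers[k, ] <- if (nrow(members) > 0) {
        colMeans(members)
      } else {
        data[sample.int(nrow(data), 1), ]  # assumption: re-seed empty cluster
      }
    }

    # Squared Euclidean distance from every sample to every centre (n x nClust).
    d2 <- sapply(seq_len(nClust), function(k) {
      rowSums(sweep(data, 2, centers[k, ])^2)
    })

    # Relabel each sample to its closest cluster; controls stay fixed (assumed).
    new.labels <- max.col(-d2, ties.method = "first")
    new.labels[known] <- Labels[known]

    if (all(new.labels == labels)) break  # no label changed, so stop
    labels <- new.labels
  }
  labels
}

# Toy data: two groups of 10 samples, one control in each group.
set.seed(1)
data <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
              matrix(rnorm(20, mean = 10), ncol = 2))
Labels <- rep(NA, nrow(data))
Labels[c(1, 11)] <- c(1, 2)
calls <- semi_kmeans_sketch(data, nClust = 2, Labels = Labels)
```

The sketch stops when no label changes, which is stricter than the converge.threshold rule documented below.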

Usage

MyClustering(data, nClust = 5, Labels = rep(NA, nrow(X)), max.iter = 1000,
  converge.threshold = 0.05, outlier.threshold = 2, get.Prob = TRUE)

Arguments

data

Dataset used for clustering. Must be an n x p numeric matrix, where n is the number of samples and p is the number of features.

nClust

Expected total number of clusters (i.e., the number of possible dosage values). Default is 5 for tetraploids.

Labels

The initial labels of all the data. Unknown samples are marked NA, while each control is labeled with its dosage + 1 (since 0 is not a valid label).
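As a small illustration of this encoding, the following sketch builds a Labels vector for six samples. The variable names `control.idx` and `control.dosage`, and the values in them, are hypothetical and chosen only for the example.

```r
# Hypothetical example: 6 samples, two of which are controls.
n <- 6
control.idx    <- c(2, 5)  # rows of the data that are controls (assumed)
control.dosage <- c(0, 4)  # their known dosages (assumed)

Labels <- rep(NA, n)                       # every sample starts as unknown
Labels[control.idx] <- control.dosage + 1  # dosage 0 becomes label 1, 4 becomes 5

Labels
# [1] NA  1 NA NA  5 NA
```

Subtracting 1 from a label converts it back to a dosage.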

max.iter

Maximum number of iterations the algorithm is allowed to run. If set too low, a warning is issued.

converge.threshold

If the percent change between two iterations is within this value, the algorithm stops. This makes the algorithm more efficient.
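The documentation does not say which quantity the percent change is measured on, or whether the threshold is a fraction or a percentage. The sketch below assumes it is the relative change in the cluster means, expressed as a fraction; the helper `percent_change` and the toy centre matrices are hypothetical.

```r
# Hypothetical helper: relative change between two sets of cluster means.
# Assumes `old` is not all zeros.
percent_change <- function(old, new) {
  sum(abs(new - old)) / sum(abs(old))
}

# Toy cluster means from two consecutive iterations (2 clusters x 2 features).
old.centers <- matrix(c(1.00, 2.00, 3.00, 4.00), nrow = 2)
new.centers <- matrix(c(1.01, 2.02, 3.03, 4.04), nrow = 2)

converge.threshold <- 0.05
change <- percent_change(old.centers, new.centers)  # about 0.01
converged <- change <= converge.threshold           # TRUE, so iteration would stop
```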

outlier.threshold

Used to determine the probability values. Samples more than this many standard deviations out will have very low probability and will be excluded when determining the remaining samples' probabilities.

get.Prob

TRUE/FALSE value indicating whether probabilities are desired alongside dosage calls. Defaults to TRUE; use FALSE for better efficiency if probabilities are not needed.


dsherma7/PolyploidDosageCalling documentation built on May 23, 2019, 6:06 p.m.