ssClust: ssClust
In Noobivsho/ssClust: GMM for semi-supervised case a-la Mclust

Description Usage Arguments Value Author(s) References Examples

View source: R/ssClust.R

Performs GMM using some supervision

ssClust(X, knownLabels = NULL, trueLabels = NULL,
                 knownCannotLink = NULL, cannotLinkWithIdx = NULL,
                 Grange = 2, modelNames = c("VVV"), runParallel = TRUE,
                 fracOfCores2Use = 1, initClassAssignments = NULL,
                 initializationStrategy = "kpp",
         penalizeSupervised=T)

`X`	data
`knownLabels`	vector of indices of rows of x whose labels are known
`trueLabels`	length nrow(x) vector of true labels (only trueLabels[knownLabels] matter)
`knownCannotLink`	vector of indices of data with cannot link with constraints
`cannotLinkWithIdx`	cannotLinkWithIdx[[“i"]] = data i cannot link with these indices (in vector) ex: Suppose cannotLinkWithIdx[[“2"]]=c(1,3). Then, 2 cannot link with 1 and 3. Constructed with, for example: knownCannotLink = c(3,4,5,6,7,8) cannotLinkWithIdx = new.env() for(i in knownCannotLink) cannotLinkWithIdx[[as.character(i)]]=c(1,2)
`Grange`	number of clsuters considered
`modelNames`	models considered
`runParallel`	boolean for whether to run in parallel
`fracOfCores2Use`	Fraction of all cores to use (if in parallel)
`initClassAssignments`	Hard initial class assignment which overrides the default hiearchical clustering initialization scheme.
`initializationStrategy`	Strategy to initialize cluster labels for EM algorithm. Currently only supports kpp for semi-supervised k-means++
`penalizeSupervised`	Boolean for whether or not to include supervised data in the penalty term in the BIC. Defaults to TRUE.

`selectedModel`	Model selected (num components and covariance constraints)
`BIC`	Bayesian Information Criteria values for all models, adjusted for alternative penalty (dont use this number unless you know what you're doing))
`BICunadjusted`	Traditional Bayesian Information Criteria for all models
`allModels`	BIC and likelihood for all models
`z`	posterior probabilities of class memberships
`parameters`	winning model's parameters, like Mclust's parameters. Item:pro–mixing parameters Item: mean–means Item: variance–variance of the form mclustVariance
`classes`	vector of class assignments for winning model
`numParams`	number of parameters estimated for all models
`loglik`	loglik of winning model
`initPart`	initial partition
`modelsClasses`	matrix of class assignments for all models considered (generally ignore this)

Jordan Yoder

http://www.stat.washington.edu/mclust/

library('MASS')
library('mclust')

########### GENERATE DATA#######
n = 500 #total number of data points
C = 3 #total number of clusters
c = 5 #total number of labeled data point
meanOfControlCluster=-1
varOfControlCluster = 1/5
trueLabels = sort(sample(1:C,n,replace=TRUE)-1) #equal prior proportions

#generate the data
tableLabel = table(trueLabels)
cumSumTableLabel = cumsum(tableLabel)
cumSumTableLabel = c(0,cumSumTableLabel)
dataDim = 5
X = matrix(0, n,dataDim)
#save the true parameters
trueMeans = NULL
trueCovMat = NULL
spreadFactor=5 
trueMeans[[1]]=rep(meanOfControlCluster,dataDim)
trueMeans[[2]]=rep(1,dataDim)
trueMeans[[3]]=rep(-1,dataDim)
trueCovMat[[1]]=diag(varOfControlCluster,dataDim)
trueCovMat[[2]]=diag(1,dataDim)
trueCovMat[[3]]=diag(1,dataDim)
for(i in 1:C){
  X[(cumSumTableLabel[i]+1):(cumSumTableLabel[i+1]),] = 
  mvrnorm(n=tableLabel[i],mu=trueMeans[[i]],Sigma=trueCovMat[[i]])
}

##### CLUSTER USING ssClust and Mclust ######
mclust.out <- Mclust(X,G = 2:5, modelNames = c('VVV','EEE','VVI','VII') )
ssClust.out<-ssClust(X,knownLabels=1:c,trueLabels=trueLabels,Grange=2:5,
modelNames=c('VVV','EEE','VVI','VII'), runParallel=FALSE)
retVal <- c(adjustedRandIndex(mclust.out$cl[-(1:c)],trueLabels[-(1:c)]),
            adjustedRandIndex(ssClust.out$cl[-(1:c)],trueLabels[-(1:c)]))
retVal <- ifelse(is.nan(retVal),-1,retVal)