findPolyploidClusters: K-means clustering
In beadarrayMSV: Analysis of Illumina BeadArray SNP data including MSV markers

Description Usage Arguments Details Value Note Author(s) See Also Examples

Wrapper for kmeans, allows samples of low presicion to be left out from the clustering and subsequently assigned to clusters

1 2	findPolyploidClusters(X, indSE = rep(TRUE, nrow(X)), centers, plot = FALSE, wss.update = TRUE, ...)

`X`	Matrix with data for a single marker to be clustered, with three columns holding “theta”, “intensity”, and “SE” vectors (in that order) as from the `assayData` slot of an `"AlleleSetIllumina"` object
`indSE`	Logical vector of indexes to samples on which to base the clustering
`centers`	Numeric vector with “theta” starting values for the clustering
`plot`	If `TRUE`, histogram with bins encompassing the initial centre points is plotted
`wss.update`	The within-cluster sums of squares are returned from `kmeans` but not actually used in the genotype calling. If `FALSE`, time is saved by not recalculating the sums of squares after including initially left-out samples in the clusters
`...`	Additional arguments to `hist`

Usually called from within the function callGenotypes or relatives. There the column of intensities is scaled with twice its median value times a scaling factor “rPenalty” (see setGenoOptions) to ensure (by default) relatively higher weight to the “theta” dimension during clustering.

All samples left out from the clustering are subsequently incorporated into the clusters. By leaving out samples of low precision, the resulting clusters may be more accurate.

Object of class "kmeans"

The “Hartigan-Wong” algorithm (see kmeans) is used by default, however this method returns an error if no points are closest to one or more centres. If such an error is returned it will be catched, and a second attempt at clustering will be performed using the “MacQueen” algorithm. A warning will be issued in those cases

Lars Gidskehaug

callGenotypes, getCenters, kmeans

## Not run: 
#Read pre-processed data directly into AlleleSetIllumina object
rPath <- system.file("extdata", package="beadarrayMSV")
normOpts <- setNormOptions()
dataFiles <- makeFilenames('testdata',normOpts,rPath)
beadFile <- paste(rPath,'beadData_testdata.txt',sep='/')
beadInfo <- read.table(beadFile,sep='\t',header=TRUE,as.is=TRUE)
BSRed <- createAlleleSetFromFiles(dataFiles[1:4],markers=1:10,beadInfo=beadInfo)

#Generate list of marker categories
gO <- setGenoOptions()
polyCent <- generatePolyCenters(ploidy=gO$ploidy)
print(polyCent)

#Estimate list of likely center points for an MSV-5 marker
ind <- 2
dev.new(); par(mfrow=c(3,1),mai=c(.5,.5,.5,.1))
polyCl <- findClusters(assayData(BSRed)$theta[ind,],
    breaks=seq(-0.25,1.25,0.04),plot=TRUE)
print(polyCl)

#Clustering using all samples
sclR <- median(assayData(BSRed)$intensity[ind,],na.rm=TRUE)*ind*gO$rPenalty
X <- matrix(cbind(assayData(BSRed)$theta[ind,],
                  assayData(BSRed)$intensity[ind,]/sclR,
                  assayData(BSRed)$SE[ind,]),ncol=3)
clObj <- findPolyploidClusters(X,centers=polyCl$clPeaks,plot=TRUE)
plot(X[,1],X[,2],col=clObj$cluster)
print(clObj)

## End(Not run)