findPolyploidClusters: K-means clustering

Description Usage Arguments Details Value Note Author(s) See Also Examples

View source: R/beadarrayMSV.R

Description

Wrapper for kmeans, allows samples of low presicion to be left out from the clustering and subsequently assigned to clusters

Usage

1
2
findPolyploidClusters(X, indSE = rep(TRUE, nrow(X)), centers,
    plot = FALSE, wss.update = TRUE, ...)

Arguments

X

Matrix with data for a single marker to be clustered, with three columns holding “theta”, “intensity”, and “SE” vectors (in that order) as from the assayData slot of an "AlleleSetIllumina" object

indSE

Logical vector of indexes to samples on which to base the clustering

centers

Numeric vector with “theta” starting values for the clustering

plot

If TRUE, histogram with bins encompassing the initial centre points is plotted

wss.update

The within-cluster sums of squares are returned from kmeans but not actually used in the genotype calling. If FALSE, time is saved by not recalculating the sums of squares after including initially left-out samples in the clusters

...

Additional arguments to hist

Details

Usually called from within the function callGenotypes or relatives. There the column of intensities is scaled with twice its median value times a scaling factor “rPenalty” (see setGenoOptions) to ensure (by default) relatively higher weight to the “theta” dimension during clustering.

All samples left out from the clustering are subsequently incorporated into the clusters. By leaving out samples of low precision, the resulting clusters may be more accurate.

Value

Object of class "kmeans"

Note

The “Hartigan-Wong” algorithm (see kmeans) is used by default, however this method returns an error if no points are closest to one or more centres. If such an error is returned it will be catched, and a second attempt at clustering will be performed using the “MacQueen” algorithm. A warning will be issued in those cases

Author(s)

Lars Gidskehaug

See Also

callGenotypes, getCenters, kmeans

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
## Not run: 
#Read pre-processed data directly into AlleleSetIllumina object
rPath <- system.file("extdata", package="beadarrayMSV")
normOpts <- setNormOptions()
dataFiles <- makeFilenames('testdata',normOpts,rPath)
beadFile <- paste(rPath,'beadData_testdata.txt',sep='/')
beadInfo <- read.table(beadFile,sep='\t',header=TRUE,as.is=TRUE)
BSRed <- createAlleleSetFromFiles(dataFiles[1:4],markers=1:10,beadInfo=beadInfo)

#Generate list of marker categories
gO <- setGenoOptions()
polyCent <- generatePolyCenters(ploidy=gO$ploidy)
print(polyCent)

#Estimate list of likely center points for an MSV-5 marker
ind <- 2
dev.new(); par(mfrow=c(3,1),mai=c(.5,.5,.5,.1))
polyCl <- findClusters(assayData(BSRed)$theta[ind,],
    breaks=seq(-0.25,1.25,0.04),plot=TRUE)
print(polyCl)

#Clustering using all samples
sclR <- median(assayData(BSRed)$intensity[ind,],na.rm=TRUE)*ind*gO$rPenalty
X <- matrix(cbind(assayData(BSRed)$theta[ind,],
                  assayData(BSRed)$intensity[ind,]/sclR,
                  assayData(BSRed)$SE[ind,]),ncol=3)
clObj <- findPolyploidClusters(X,centers=polyCl$clPeaks,plot=TRUE)
plot(X[,1],X[,2],col=clObj$cluster)
print(clObj)

## End(Not run)

beadarrayMSV documentation built on May 1, 2019, 6:33 p.m.