FBN.kmeans: K-Means clustering of SNP microarray data


FBN.kmeans R Documentation

K-Means clustering of SNP microarray data

Description

Performs k-means clustering of SNP microarray data and returns clusters of values putatively characterized by different copy numbers (CN).

Usage

FBN.kmeans(inputData = NULL, minSpan = 0.2, breaksData = NULL)

Arguments

inputData

A vector of values containing the SNP microarray data.

minSpan

The minimum distance separating consecutive local maxima to be detected on the histogram of inputData. These maxima are used to initialize the k-means clustering process. For details on the local maxima detection, see the documentation of FBN.histogramMaxima.

breaksData

One of:

  • a vector giving the breakpoints between histogram cells,

  • a single number giving the number of cells for the histogram,

  • a character string naming an algorithm to compute the number of cells (see Details section of hist),

  • a function to compute the number of cells.
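The four forms listed above match the forms accepted by the breaks argument of hist. The following minimal sketch illustrates each form by calling hist directly; it assumes that breaksData is handed to hist unchanged, which this page suggests but does not state. The simulated vector x and the object names h_vec, h_num, h_chr and h_fun are illustrative only.

```r
set.seed(1)
x <- c(rnorm(1000, 1, 0.2), rnorm(1000, 2, 0.2))

# 1. A vector giving the breakpoints between histogram cells
h_vec <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 0.1),
              plot = FALSE)

# 2. A single number giving the (suggested) number of cells
h_num <- hist(x, breaks = 50, plot = FALSE)

# 3. A character string naming an algorithm to compute the number of cells
h_chr <- hist(x, breaks = "Scott", plot = FALSE)

# 4. A function to compute the number of cells
h_fun <- hist(x, breaks = function(v) ceiling(sqrt(length(v))), plot = FALSE)
```

Note that hist treats a single number only as a suggestion: the actual breakpoints are rounded to "pretty" values, so the resulting number of cells may differ from the value requested.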

Details

This function takes the vector of raw SNP microarray values as input and performs a k-means clustering to identify groups of raw values characterized by different CNs. The clustering is initialized with the local maxima detected on the histogram of the input data (see the documentation of FBN.histogramMaxima). To make the clustering more robust and to remove small or noisy clusters, two filters are applied: first, clusters containing fewer than 1% of the values in inputData are removed; then, because the histogram may be noisy, clusters whose centers are closer than 0.2 in nominal value are merged.
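The procedure described above can be approximated with base R alone. The sketch below is not the FBN implementation: the function name sketch_fbn_kmeans, its arguments min_frac and min_sep, the simple neighbour-based peak detection (which ignores minSpan), and the choice to re-run kmeans after each filtering step are all assumptions made for illustration.

```r
# Hypothetical sketch of the described steps; not the package's own code.
sketch_fbn_kmeans <- function(x, breaks = "Sturges",
                              min_frac = 0.01, min_sep = 0.2) {
  # Seed the centers with the local maxima of the histogram counts.
  h <- hist(x, breaks = breaks, plot = FALSE)
  cnt <- h$counts
  n_bins <- length(cnt)
  left <- c(-Inf, cnt[-n_bins])
  right <- c(cnt[-1], -Inf)
  centers <- h$mids[cnt > left & cnt >= right]

  repeat {
    # kmeans() reads a length-one `centers` as a cluster count, so pass 1.
    init <- if (length(centers) == 1) 1 else matrix(centers, ncol = 1)
    km <- kmeans(x, centers = init, iter.max = 50)
    ord <- order(km$centers[, 1])
    ctr <- km$centers[ord, 1]
    size <- km$size[ord]

    # Filter 1: drop clusters holding fewer than min_frac of all values.
    keep <- size >= min_frac * length(x)
    keep[which.max(size)] <- TRUE  # always retain the largest cluster
    if (!all(keep)) {
      centers <- ctr[keep]
      next
    }

    # Filter 2: merge the first pair of neighbouring centers closer than min_sep.
    close <- which(diff(ctr) < min_sep)
    if (length(close) > 0) {
      i <- close[1]
      merged <- weighted.mean(ctr[c(i, i + 1)], size[c(i, i + 1)])
      centers <- c(ctr[-c(i, i + 1)], merged)
      next
    }
    return(km)
  }
}

set.seed(123)
x <- c(rnorm(1000, 1, 0.2), rnorm(1000, 2, 0.2))
fit <- sketch_fbn_kmeans(x)
fit$centers
```

Each pass through the loop either returns or reduces the number of centers by at least one, so the sketch always terminates. Merging by size-weighted mean is one plausible choice; the package may combine close clusters differently.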

Value

An object of class kmeans
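Objects of class kmeans, as produced by stats::kmeans, are lists with documented components such as centers, cluster and size. The sketch below builds such an object with stats::kmeans on simulated data to show how those components are read. That FBN.kmeans fills every component in the same way is an assumption; the data and the starting centers are illustrative.

```r
set.seed(42)
x <- c(rnorm(1000, 1, 0.2), rnorm(1000, 2, 0.2))

# Build a kmeans object from two explicit starting centers.
km <- kmeans(x, centers = matrix(c(1, 2), ncol = 1))

class(km)          # class of the returned object
km$centers         # matrix of cluster centers, one row per cluster
km$size            # number of values assigned to each cluster
head(km$cluster)   # cluster index assigned to each input value
km$tot.withinss    # total within-cluster sum of squares
```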

Author(s)

Adrian Andronache adi.andronache@gmail.com
Luca Agnelli luca.agnelli@gmail.com

Examples

library(FBN)
require(stats)
require(graphics)
# Simulate two populations of values centered at 1 and 2
x <- c(rnorm(1000, 1, 0.2), rnorm(1000, 2, 0.2))
# Cluster the values; the histogram maxima initialize the k-means centers
y <- FBN.kmeans(x, minSpan = 0.001)
# Draw the histogram, then mark the cluster centers in red along the x axis
h <- hist(x)
points(y$centers, rep(0, length(y$centers)), col = "red")


FBN documentation built on July 9, 2023, 5:18 p.m.