scanK: Run k-medoid algorithm with varying k on similarity matrix

View source: R/scanK.R

Description

Run k-medoid algorithm with varying k on similarity matrix

Usage

scanK(SimiMatIn, quan = 0.95, cut = NULL, maxK = NULL, minSize = 0, maxSize = 200, fixK = NULL, rawscale = FALSE)

Arguments

SimiMatIn

gene-by-gene similarity matrix

quan

Only gene pairs with a similarity score >= the quan-th quantile of the similarity scores are considered in the cluster analysis. Default is 0.95.

cut

Pre-defined cutoff. Gene pairs with a similarity score >= cut are considered in the cluster analysis. If cut is defined, quan is ignored.
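The interaction between quan and cut can be sketched in base R. This is only an illustration, not the package's code: pick_threshold is a hypothetical helper, the toy matrix is made up, and taking the quantile over the upper triangle (each gene pair counted once) is an assumption.

```r
# Toy symmetric similarity matrix for four hypothetical genes.
genes <- paste0("g", 1:4)
S <- matrix(c( NA, 0.9, 0.1, 0.2,
              0.9,  NA, 0.3, 0.4,
              0.1, 0.3,  NA, 0.8,
              0.2, 0.4, 0.8,  NA),
            nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

# Hypothetical helper: choose the similarity threshold.
# Assumption: the quantile is taken over each gene pair once (upper triangle).
pick_threshold <- function(S, quan = 0.95, cut = NULL) {
  if (!is.null(cut)) return(cut)       # an explicit cut overrides quan
  pair_scores <- S[upper.tri(S)]       # one score per gene pair
  unname(quantile(pair_scores, quan))
}

thr_q   <- pick_threshold(S, quan = 0.5)              # quantile-based threshold
thr_cut <- pick_threshold(S, quan = 0.5, cut = 0.85)  # quan is ignored here
```

With this toy matrix the 0.5 quantile of the six pair scores is 0.35, while supplying cut = 0.85 returns 0.85 regardless of quan.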

maxK

Maximum number of clusters to consider (scan). If maxK=NULL, it is calculated as [number of genes considered]/10.

minSize,maxSize

Only clusters with minSize<= cluster size <= maxSize are reported in output.
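The size filter can be illustrated with a small base R sketch. The cluster assignment and the helper filter_by_size are hypothetical and are not taken from the package source.

```r
# Hypothetical cluster assignment: gene name -> cluster id.
clustering <- c(g1 = 1, g2 = 1, g3 = 1, g4 = 2, g5 = 3, g6 = 3)

# Group gene names by cluster id; the list is named "1", "2", "3".
members <- split(names(clustering), clustering)

# Hypothetical helper: keep clusters with minSize <= size <= maxSize.
filter_by_size <- function(members, minSize = 0, maxSize = 200) {
  sizes <- lengths(members)
  members[sizes >= minSize & sizes <= maxSize]
}

# Cluster 1 has 3 genes, cluster 2 has 1, cluster 3 has 2.
kept <- filter_by_size(members, minSize = 2, maxSize = 2)
```

Here only cluster "3" (g5 and g6) passes the filter; with the defaults minSize=0 and maxSize=200 all three clusters would be kept.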

fixK

If fixK is specified, the k-medoids algorithm is applied with exactly fixK clusters.

rawscale

Recall that the input is the similarity matrix, i.e. -log10(distance from the sine model). By default, k-medoids clustering is applied using (-Input) as the distance. If rawscale is TRUE, k-medoids clustering is applied using -10^Input as the distance.
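A short arithmetic check may help here. If the similarity is s = -log10(d) for a sine-model distance d, then -s = log10(d) and -10^s = -1/d, so both transforms increase with d and rank gene pairs in the same order. The distances below are made-up values used only for illustration.

```r
# Hypothetical distances from the sine model for four gene pairs.
d <- c(0.01, 0.1, 0.5, 2)

# Similarity as described above: -log10(distance).
s <- -log10(d)

# Default scale: (-Input), which equals log10(d).
dist_default <- -s

# rawscale = TRUE: -10^Input. In R, ^ binds tighter than unary minus,
# so this is -(10^s), which equals -1/d.
dist_raw <- -10^s

# Both transforms order the pairs the same way as the original distances.
same_order <- identical(order(dist_default), order(dist_raw)) &&
  identical(order(dist_raw), order(d))
```

The two scales differ in spacing rather than in ordering: the raw scale stretches differences among very similar pairs (small d) far more than the log scale does.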

Value

The scanK() function runs k-medoids clustering with a varying number of clusters (k), where k is varied from 2 to maxK. The input of scanK() should be a similarity matrix. scanK() clusters the genes that appear in gene pairs with a high similarity score (the threshold can be set using the parameter quan or cut). To select these top genes, the function first calculates the maximum similarity score for each gene, then selects the genes whose maximum score is at or above the threshold.
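The selection and scanning steps described above can be sketched with pam() from the cluster package. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data are invented, and choosing k by average silhouette width is an assumption, since this page does not say which criterion scanK() uses to pick the optimal k.

```r
library(cluster)  # provides pam()

# Toy data: two tight groups of positions plus one isolated point (g11).
x <- c(0, 0.1, 0.2, 0.3, 0.4, 10, 10.1, 10.2, 10.3, 10.4, 50)
names(x) <- paste0("g", seq_along(x))

# Toy similarity: negative absolute difference, so -S is a valid distance.
S <- -as.matrix(dist(x))
diag(S) <- NA

# Step 1: keep genes whose maximum similarity reaches the threshold.
thr <- -1                                   # plays the role of cut
max_sim <- apply(S, 1, max, na.rm = TRUE)
top <- names(max_sim)[max_sim >= thr]       # g11 is dropped

# Step 2: use (-similarity) as the distance among the top genes.
D <- as.dist(-S[top, top])

# Step 3: scan k = 2..max_k and score each fit.
# Assumption: average silhouette width is the selection criterion.
max_k <- 5
avg_width <- sapply(2:max_k, function(k) pam(D, k)$silinfo$avg.width)
best_k <- (2:max_k)[which.max(avg_width)]

# Final fit; fit$clustering holds one cluster id per top gene.
fit <- pam(D, best_k)
```

On this toy input the isolated point is excluded in step 1 and the scan settles on two clusters, matching the two groups built into the data.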

The output object is a list with the following elements:

membOut: members in each cluster. Clusters are sorted by median similarity score within cluster;

MedCor: median similarity score for each cluster;

Mat: input similarity matrix;

filteredMat: similarity matrix, only showing the top genes used in clustering;

Kcluster: cluster indicator of each top gene.
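How membOut and MedCor relate can be sketched as follows. The similarity values, the cluster membership, and the helper median_within are hypothetical, and sorting in decreasing order is an assumption, since the sort direction is not stated above.

```r
# Toy similarity matrix for five hypothetical genes.
genes <- paste0("g", 1:5)
S <- matrix(0.1, 5, 5, dimnames = list(genes, genes))
S["g1", "g2"] <- S["g2", "g1"] <- 0.9
S["g1", "g3"] <- S["g3", "g1"] <- 0.7
S["g2", "g3"] <- S["g3", "g2"] <- 0.8
S["g4", "g5"] <- S["g5", "g4"] <- 0.95
diag(S) <- NA

# Hypothetical cluster membership.
members <- list(c1 = c("g1", "g2", "g3"), c2 = c("g4", "g5"))

# Hypothetical helper: median pairwise similarity within one cluster.
median_within <- function(cluster_genes, S) {
  sub <- S[cluster_genes, cluster_genes, drop = FALSE]
  median(sub[upper.tri(sub)])      # each gene pair counted once
}

med <- sapply(members, median_within, S = S)

# Assumption: clusters are reported from highest to lowest median.
ord <- order(med, decreasing = TRUE)
membOut <- members[ord]
MedCor  <- med[ord]
```

In this example c2 (median 0.95) is listed before c1 (median 0.8), and MedCor carries the medians in the same order as membOut.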

Author(s)

Ning Leng

Examples

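library(Oscope)
# Three sine-shaped genes with shifted phases
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
# Thirty noise genes (random, so results vary between runs)
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
# Gene-by-gene similarity matrix from the sine model
res1 <- OscopeSine(Dat)
# Cluster gene pairs above the 0.8 quantile, scanning k from 2 to 5
res2 <- scanK(res1$SimiMat, quan=.8, maxK=5)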

Example output

Loading required package: EBSeq
Loading required package: blockmodeling
To cite package 'blockmodeling' in publications please use package
citation and (at least) one of the articles:

  Žiberna, Aleš (2007). Generalized blockmodeling of valued networks.
  Social Networks 29(1), 105-126.

  Žiberna, Aleš (2008). Direct and indirect approaches to blockmodeling
  of valued networks in terms of regular equivalence. Journal of
  Mathematical Sociology 32(1), 57–84.

  Žiberna, Aleš (2018).  Generalized and Classical Blockmodeling of
  Valued Networks, R package version 0.3.4.

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: testthat
Loading required package: cluster
Loading required package: BiocParallel
gene pairs above this threshold are considered:
-0.146166973889729
max number of clusters considered:5
optimal number of clusters:2

Oscope documentation built on Nov. 8, 2020, 7:12 p.m.