SpecClustPack

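A collection of R functions for spectral clustering of networks: simulation from the stochastic blockmodel, regularized and covariate-assisted spectral clustering, and a partial (Nyström-based) variant.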

Installation

The SpecClustPack package can be installed in R directly from GitHub by using devtools.

library(devtools)
install_github("norbertbin/SpecClustPack")

Simulate from a Stochastic Blockmodel

blockPMat = matrix(c(.6,.2,.2,.6), nrow=2)
nMembers = c(5,5)

adjMat = simSBM(blockPMat, nMembers)
adjMat
##  10 x 10 sparse Matrix of class "dsCMatrix"
##                         
##  [1,] . . 1 1 1 . . . . 1
##  [2,] . . 1 1 1 . . 1 . .
##  [3,] 1 1 . . . . . . . .
##  [4,] 1 1 . . 1 . . . 1 .
##  [5,] 1 1 . 1 . . . . . .
##  [6,] . . . . . . 1 1 . 1
##  [7,] . . . . . 1 . 1 . 1
##  [8,] . 1 . . . 1 1 . . 1
##  [9,] . . . 1 . . . . . .
## [10,] 1 . . . . 1 1 1 . .
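
To make the model concrete, here is a minimal base-R sketch of how an undirected stochastic blockmodel can be sampled. The helper simSBMSketch is hypothetical and is not the package's simSBM; it assumes that nodes are ordered by block, that there are no self-loops, and it returns a dense matrix rather than a sparse one.

```r
# Hypothetical helper, not part of SpecClustPack.
simSBMSketch <- function(blockPMat, nMembers) {
  # Block label of each node: the first nMembers[1] nodes are in block 1, etc.
  labels <- rep(seq_along(nMembers), times = nMembers)
  n <- length(labels)
  # Edge probability for every pair: P[i, j] = blockPMat[label_i, label_j].
  pMat <- blockPMat[labels, labels, drop = FALSE]
  adj <- matrix(0L, n, n)
  upper <- upper.tri(adj)
  # One Bernoulli draw per unordered pair i < j.
  adj[upper] <- rbinom(sum(upper), size = 1, prob = pMat[upper])
  # Mirror the upper triangle so the graph is undirected with an empty diagonal.
  adj + t(adj)
}

set.seed(1)
blockPMat <- matrix(c(.6, .2, .2, .6), nrow = 2)
adjSketch <- simSBMSketch(blockPMat, c(5, 5))
isSymmetric(adjSketch)
```

Drawing only the upper triangle and mirroring it guarantees a symmetric 0/1 matrix, which is what the printed adjMat above looks like.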

Plot the SBM Probability Matrix

plotSBM(blockPMat, nMembers)

Plot the Simulated Adjacency Matrix

plotAdj(adjMat)

Run Spectral Clustering

By default, the specClust function uses regularized spectral clustering (Qin and Rohe, 2013) with row normalization. This behavior can be changed through the method and rowNorm parameters.

(clusters = specClust(adjMat, nBlocks = 2))
## [1] 1 1 1 1 1 2 2 2 1 2
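
For readers who want to see the steps behind the default, the following is a rough base-R sketch of regularized spectral clustering with row normalization. It is not the package's specClust: the helper name regSpecClustSketch is hypothetical, the choice of the average degree as the regularizer tau is an assumption, and the toy graph (two 5-node cliques joined by one edge) is made up for illustration.

```r
# Hypothetical helper, not part of SpecClustPack.
regSpecClustSketch <- function(adj, nBlocks, tau = mean(rowSums(adj))) {
  adj <- as.matrix(adj)
  deg <- rowSums(adj)
  # Regularized Laplacian D_tau^{-1/2} A D_tau^{-1/2}, with D_tau = D + tau * I.
  invSqrtDeg <- 1 / sqrt(deg + tau)
  lap <- adj * outer(invSqrtDeg, invSqrtDeg)
  # Keep the nBlocks eigenvectors with the largest absolute eigenvalues.
  eig <- eigen(lap, symmetric = TRUE)
  top <- order(abs(eig$values), decreasing = TRUE)[seq_len(nBlocks)]
  vecs <- eig$vectors[, top, drop = FALSE]
  # Row normalization: rescale each node's row to unit length.
  norms <- sqrt(rowSums(vecs^2))
  vecs <- vecs / ifelse(norms > 0, norms, 1)
  # Cluster the rows with k-means.
  kmeans(vecs, centers = nBlocks, nstart = 20)$cluster
}

# Toy graph (assumed for illustration): two cliques of 5 nodes plus one bridge.
adjToy <- matrix(0, 10, 10)
adjToy[1:5, 1:5] <- 1
adjToy[6:10, 6:10] <- 1
diag(adjToy) <- 0
adjToy[5, 6] <- adjToy[6, 5] <- 1

set.seed(1)
clustersToy <- regSpecClustSketch(adjToy, nBlocks = 2)
clustersToy
```

On this toy graph the two cliques should land in different clusters. Which clique receives label 1 depends on the k-means initialization.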

Compute the Mis-clustering Rate

The function misClustRate computes the proportion of mis-clustered nodes given the true block sizes. Because cluster labels are only identifiable up to a permutation, the rate is taken up to relabeling of the clusters.

misClustRate(clusters, nMembers)
## [1] 0.1
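
One way to read "up to identifiability" is that the error is minimized over all relabelings of the clusters. The sketch below does this by brute force; misClustRateSketch is a hypothetical helper (not the package's misClustRate), it assumes nodes are ordered by true block, and it enumerates all permutations, so it is only practical for a small number of blocks.

```r
# Hypothetical helper, not part of SpecClustPack.
misClustRateSketch <- function(clusters, nMembers) {
  k <- length(nMembers)
  # True labels, assuming the nodes are ordered by block.
  truth <- rep(seq_len(k), times = nMembers)
  # Enumerate all permutations of a vector recursively (small k only).
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }
  # Error rate under each relabeling; p[clusters] maps label j to p[j].
  rates <- vapply(perms(seq_len(k)),
                  function(p) mean(p[clusters] != truth),
                  numeric(1))
  min(rates)
}

misClustRateSketch(c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2), c(5, 5))
## [1] 0.1
```

With the labels shown above, only node 9 disagrees with the true blocks under the best relabeling, which gives 1/10.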

Estimate SBM Probabilities

The function estSBM estimates the block probability matrix given the adjacency matrix and the cluster assignments.

estSBM(adjMat, clusters)
##            [,1]       [,2]
## [1,] 1.00000000 0.08333333
## [2,] 0.08333333 0.53333333
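
A natural estimator here is the edge density of each pair of blocks: observed edges divided by possible node pairs. The sketch below implements that idea; estSBMSketch is a hypothetical helper and the package's estSBM may differ in detail. The edge list re-enters the adjacency matrix printed earlier so that the block is self-contained.

```r
# Hypothetical helper, not part of SpecClustPack.
estSBMSketch <- function(adj, clusters) {
  adj <- as.matrix(adj)
  k <- max(clusters)
  est <- matrix(0, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      ia <- which(clusters == a)
      ib <- which(clusters == b)
      if (a == b) {
        # Within a block: each unordered pair counted once, no self-loops.
        nPairs <- length(ia) * (length(ia) - 1) / 2
        nEdges <- sum(adj[ia, ia, drop = FALSE]) / 2
      } else {
        nPairs <- length(ia) * length(ib)
        nEdges <- sum(adj[ia, ib, drop = FALSE])
      }
      est[a, b] <- nEdges / nPairs  # NaN if a block has a single node
    }
  }
  est
}

# The 16 edges of the adjacency matrix printed in the simulation section.
edges <- rbind(c(1, 3), c(1, 4), c(1, 5), c(1, 10), c(2, 3), c(2, 4),
               c(2, 5), c(2, 8), c(4, 5), c(4, 9), c(6, 7), c(6, 8),
               c(6, 10), c(7, 8), c(7, 10), c(8, 10))
adjPrinted <- matrix(0, 10, 10)
adjPrinted[edges] <- 1
adjPrinted[edges[, 2:1]] <- 1

estSketch <- estSBMSketch(adjPrinted, c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2))
estSketch
```

With these inputs the sketch gives 8/15 (about 0.533) within cluster 1, 1 within cluster 2, and 1/12 (about 0.083) between them. These are the same three values as in the output above, with the two block labels swapped.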

Simulate Node Covariates

covProbMat = matrix(c(.8,.2,.2,.8), nrow=2)
nMembers = c(5,5)

covMat = simBernCovar(covProbMat, nMembers)
covMat
##  [1,] 1 .
##  [2,] 1 1
##  [3,] 1 .
##  [4,] . .
##  [5,] 1 1
##  [6,] . .
##  [7,] . 1
##  [8,] . 1
##  [9,] 1 1
## [10,] . 1
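
The following sketch shows one plausible reading of this simulation. It assumes that entry [k, j] of covProbMat is the probability that covariate j equals 1 for a node in block k; that interpretation is an assumption, and simBernCovarSketch is a hypothetical helper rather than the package's simBernCovar.

```r
# Hypothetical helper, not part of SpecClustPack.
# Assumes covProbMat[k, j] = P(covariate j is 1 | node is in block k).
simBernCovarSketch <- function(covProbMat, nMembers) {
  # Block label of each node, with nodes ordered by block.
  labels <- rep(seq_along(nMembers), times = nMembers)
  # One row of success probabilities per node.
  pMat <- covProbMat[labels, , drop = FALSE]
  # Independent Bernoulli draws, filled column by column to match pMat.
  draws <- rbinom(length(pMat), size = 1, prob = pMat)
  matrix(draws, nrow = nrow(pMat), ncol = ncol(pMat))
}

set.seed(1)
covProbMat <- matrix(c(.8, .2, .2, .8), nrow = 2)
covSketch <- simBernCovarSketch(covProbMat, c(5, 5))
dim(covSketch)
```

Under this reading, nodes in the first block tend to have the first covariate switched on and nodes in the second block the second one, which is roughly the pattern in the printed covMat.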

Covariate-Assisted Spectral Clustering

The casc function requires an adjacency matrix (adjMat), a node covariate matrix (covMat), and the number of blocks to recover (nBlocks). For more details, see the documentation.

casc(adjMat, covMat, nBlocks=2)
## $cluster
## [1] 1 1 1 1 1 2 2 2 2 2
##
## $h
## [1] 0.08101691
##
## $wcss
## [1] 0.1789759
##
## $eigenGap
## [1] 0.06532486
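
As a rough illustration of the idea, the sketch below clusters on the leading eigenvectors of a similarity matrix that adds a weighted covariate term to the squared regularized Laplacian. This is a common formulation of covariate-assisted spectral clustering and not necessarily what casc does internally: cascSketch is a hypothetical helper, the weight h is passed in by hand rather than tuned, and the toy graph and covariates are made up.

```r
# Hypothetical helper, not part of SpecClustPack.
cascSketch <- function(adj, covMat, nBlocks, h) {
  adj <- as.matrix(adj)
  covMat <- as.matrix(covMat)
  deg <- rowSums(adj)
  # Regularized Laplacian with the average degree as regularizer (assumption).
  invSqrtDeg <- 1 / sqrt(deg + mean(deg))
  lap <- adj * outer(invSqrtDeg, invSqrtDeg)
  # Combined similarity: graph term plus h times the covariate term.
  sim <- lap %*% lap + h * (covMat %*% t(covMat))
  # sim is positive semidefinite, so the first columns are the leading ones.
  vecs <- eigen(sim, symmetric = TRUE)$vectors[, seq_len(nBlocks), drop = FALSE]
  kmeans(vecs, centers = nBlocks, nstart = 20)$cluster
}

# Toy inputs (assumed): two 5-node cliques with one bridge, and one
# indicator covariate per block.
adjToy <- matrix(0, 10, 10)
adjToy[1:5, 1:5] <- 1
adjToy[6:10, 6:10] <- 1
diag(adjToy) <- 0
adjToy[5, 6] <- adjToy[6, 5] <- 1
covToy <- cbind(rep(c(1, 0), each = 5), rep(c(0, 1), each = 5))

set.seed(1)
cascToy <- cascSketch(adjToy, covToy, nBlocks = 2, h = 0.1)
cascToy
```

Setting h to 0 reduces the sketch to clustering on the graph alone, while a large h lets the covariates dominate.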

Partial Spectral Clustering

The partSpecClust function runs an eigendecomposition only on the adjacency matrix restricted to the highest-degree nodes in the network, then uses the Nyström extension to approximate the eigenvectors of the full matrix (Belabbas and Wolfe, 2009). The approximate eigenvectors are then used for spectral clustering. The parameter subSampleSize specifies how many of the highest-degree nodes are used.

(clusters = partSpecClust(adjMat, nBlocks = 2, subSampleSize = 8))
## [1] 1 1 1 1 1 2 2 2 1 2
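
The Nyström step can be sketched in a few lines of base R. The helper nystromVecsSketch is hypothetical and only follows the description above (eigendecomposition of the top-degree submatrix, then extension to all nodes); the package may regularize or normalize differently. The toy graph is made up for illustration.

```r
# Hypothetical helper, not part of SpecClustPack.
nystromVecsSketch <- function(adj, nVecs, subSampleSize) {
  adj <- as.matrix(adj)
  # Indices of the subSampleSize highest-degree nodes.
  top <- order(rowSums(adj), decreasing = TRUE)[seq_len(subSampleSize)]
  # Eigendecomposition of the small submatrix only.
  eig <- eigen(adj[top, top, drop = FALSE], symmetric = TRUE)
  keep <- order(abs(eig$values), decreasing = TRUE)[seq_len(nVecs)]
  u <- eig$vectors[, keep, drop = FALSE]
  lambda <- eig$values[keep]
  # Nystrom extension: row i of the result is adj[i, top] %*% u / lambda.
  # For sampled nodes this reproduces the corresponding rows of u.
  sweep(adj[, top, drop = FALSE] %*% u, 2, lambda, "/")
}

# Toy graph (assumed): two cliques of 5 nodes plus one bridging edge.
adjToy <- matrix(0, 10, 10)
adjToy[1:5, 1:5] <- 1
adjToy[6:10, 6:10] <- 1
diag(adjToy) <- 0
adjToy[5, 6] <- adjToy[6, 5] <- 1

approxVecs <- nystromVecsSketch(adjToy, nVecs = 2, subSampleSize = 8)
set.seed(1)
partToy <- kmeans(approxVecs, centers = 2, nstart = 20)$cluster
```

Only an 8 x 8 eigenproblem is solved here, and the two left-out nodes get their coordinates from their edges to the sampled nodes. The division by lambda assumes the retained eigenvalues are not close to zero.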


norbertbin/SpecClustPack documentation built on May 23, 2019, 9:32 p.m.