This function performs propensity clustering, which assigns objects (or nodes) in a network to clusters such that the resulting Cluster and Propensity-Based Approximation (CPBA) of the input adjacency matrix optimizes a specific criterion. Large data sets on which standard propensity clustering may take too long can optionally first be split into smaller blocks. Propensity clustering is then applied to each block, and the resulting clustering is used for the final CPBA decomposition.
propensityClustering(
    adjacency,
    decompositionType = c("CPBA", "Pure Propensity"),
    objectiveFunction = c("Poisson", "L2norm"),
    fastUpdates = TRUE,
    blocks = NULL,
    initialClusters = NULL,
    nClusters = NULL,
    maxBlockSize = if (fastUpdates) 5000 else 1000,
    clustMethod = "average",
    cutreeDynamicArgs = list(deepSplit = 2, minClusterSize = 20,
                             verbose = 0),
    dropUnassigned = TRUE,
    unassignedLabel = 0,
    verbose = 2,
    indent = 0)

adjacency 
Adjacency matrix of the network: a square, symmetric, nonnegative matrix giving the connection strengths between pairs of nodes. Missing data are not allowed. 
decompositionType 
Decomposition type. Either the full CPBA (Cluster and Propensity-Based Approximation) or pure propensity, which is the special case of CPBA in which all nodes are in a single cluster. 
objectiveFunction 
Objective function. Available choices are "Poisson" and "L2norm". 
fastUpdates 
Logical: should a fast, "approximate", propensity clustering method be used? This option is recommended unless the number of nodes to be clustered is small (less than 500). The fast updates may lead to slightly inferior results but are orders of magnitude faster for larger data sets (above say 500 nodes). 
blocks 
Optional specification of blocks. If given, must be a vector with length equal to the number of columns in adjacency.

initialClusters 
Optional specification of initial clusters. If given, must be a vector with length equal to the number of columns in adjacency.

nClusters 
Optional specification of the number of clusters. Note that specifying 
maxBlockSize 
Maximum block size. 
clustMethod 
Hierarchical clustering method. Recognized options are "average", "complete", and "single". 
cutreeDynamicArgs 
Arguments (options) for the cutreeDynamic function. 
dropUnassigned 
Logical: should unassigned nodes be excluded from the clustering? Unassigned nodes
can be present in initial clustering or blocks (if given), and internal prepartitioning and initial
clustering can also lead to unassigned nodes. If 
unassignedLabel 
Label in input 
verbose 
Level of verbosity of printed diagnostic messages. 0 means silent (except for progress reports from the underlying propensity clustering function), higher values will lead to more detailed progress messages. 
indent 
Indentation of the printed diagnostic messages. 0 means no indentation, each unit adds two spaces. 
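The requirements on adjacency (square, symmetric, nonnegative, no missing data) can be checked before calling the function. The following is a minimal sketch in base R; checkAdjacency and its tol argument are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): checks the documented
# requirements on the adjacency argument and stops with a message otherwise.
checkAdjacency <- function(adj, tol = 1e-8) {
  stopifnot(is.matrix(adj), is.numeric(adj))
  if (nrow(adj) != ncol(adj)) stop("adjacency must be square")
  if (anyNA(adj)) stop("missing data are not allowed")
  if (any(adj < 0)) stop("adjacency must be nonnegative")
  if (max(abs(adj - t(adj))) > tol) stop("adjacency must be symmetric")
  invisible(TRUE)
}

set.seed(1)
A <- matrix(runif(25), ncol = 5)
A <- (A + t(A)) / 2   # symmetrize
diag(A) <- 0
adjOK <- checkAdjacency(A)
```

The tolerance on the symmetry check is a design choice of this sketch: it lets matrices pass that are symmetric up to floating-point rounding.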
If initialClusters are not given, they are determined from the adjacency in one of the following two ways. If nClusters is not specified, the initialization uses hierarchical clustering followed by the Dynamic Tree Cut (see cutreeDynamic). Arguments and options for cutreeDynamic can be specified using the argument cutreeDynamicArgs. Some nodes may be left unassigned; their handling is described below. If nClusters is specified, an internal initialization algorithm based on connectivities is used. This second algorithm assigns all nodes to a cluster.
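The hierarchical-clustering route can be illustrated with base R alone. The sketch below is not the package's internal code: it assumes that the dissimilarity is taken as 1 - adjacency, and it uses cutree() with a fixed number of clusters as a simple stand-in for the Dynamic Tree Cut.

```r
set.seed(2)
nNodes <- 30
A <- matrix(runif(nNodes * nNodes), ncol = nNodes)
A <- (A + t(A)) / 2
diag(A) <- 0

# Assumption: dissimilarity is 1 - adjacency (adjacency values lie in [0, 1]).
tree <- hclust(as.dist(1 - A), method = "average")

# Stand-in for the Dynamic Tree Cut: a fixed-number cut with base R's cutree().
# Unlike the Dynamic Tree Cut, this assigns every node to a cluster.
initial <- cutree(tree, k = 3)
table(initial)
```

A vector such as initial has one entry per node and could in principle be passed as initialClusters.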
If dropUnassigned is TRUE, nodes left unassigned by the clustering procedure are excluded from the following calculations. If dropUnassigned is FALSE, nodes left unassigned by the clustering procedure are assigned to their nearest cluster, using the clustering dissimilarity measure specified in clustMethod.
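As an illustration of nearest-cluster assignment, the sketch below handles the "average" case only and assumes the dissimilarity is 1 - adjacency. The helper assignToNearest is hypothetical and may differ from what the package does internally.

```r
# Hypothetical helper: give each node labeled `unassignedLabel` the label of the
# cluster with the smallest average dissimilarity to it (average linkage).
assignToNearest <- function(adj, labels, unassignedLabel = 0) {
  dissim <- 1 - adj                     # assumption: dissimilarity = 1 - adjacency
  clusters <- setdiff(unique(labels), unassignedLabel)
  out <- labels
  for (i in which(labels == unassignedLabel)) {
    # Distances use the original labels, so the result does not depend on order
    avgDist <- sapply(clusters, function(cl) mean(dissim[i, labels == cl]))
    out[i] <- clusters[which.min(avgDist)]
  }
  out
}

# Toy network: nodes 1-2 and 3-4 form two clusters; node 5 is unassigned
# but strongly connected to nodes 3 and 4.
A <- matrix(0.1, 5, 5)
A[1:2, 1:2] <- 0.9
A[3:4, 3:4] <- 0.9
A[5, 3:4] <- A[3:4, 5] <- 0.8
diag(A) <- 0
labels <- c(1, 1, 2, 2, 0)
newLabels <- assignToNearest(A, labels)
```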
In the next step, if the total number of nodes exceeds the maximum block size, the initial clusters (either given or automatically determined by hierarchical clustering) are split into blocks. Clusters bigger than the maximum block size maxBlockSize are put into separate blocks (one cluster per block). Clusters smaller than the maximum block size are placed into blocks such that the block size does not exceed maxBlockSize and such that clusters with high between-cluster adjacency are placed in the same block, if possible. The between-cluster adjacency is consistent with clustMethod.
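One way to picture the block-forming step is a greedy merge, sketched below. This is an assumption-laden illustration, not the package's algorithm: makeBlocks is a hypothetical helper, between-cluster adjacency is taken as the average adjacency (as for clustMethod = "average"), and blocks are merged pairwise in order of decreasing adjacency while the size limit allows.

```r
# Hypothetical sketch: pack clusters into blocks of at most maxBlockSize nodes.
makeBlocks <- function(adj, labels, maxBlockSize) {
  clusters <- sort(unique(labels))
  sizes <- sapply(clusters, function(cl) sum(labels == cl))
  # Average between-cluster adjacency for every pair of clusters
  between <- sapply(clusters, function(a) sapply(clusters, function(b)
    mean(adj[labels == a, labels == b])))
  blockOf <- seq_along(clusters)        # start with one cluster per block
  repeat {
    best <- NULL
    bestAdj <- -Inf
    blocks <- unique(blockOf)
    for (x in blocks) for (y in blocks) if (x < y) {
      inX <- blockOf == x
      inY <- blockOf == y
      # Oversized clusters can never merge, so they keep their own block
      if (sum(sizes[inX]) + sum(sizes[inY]) > maxBlockSize) next
      a <- mean(between[inX, inY])
      if (a > bestAdj) { bestAdj <- a; best <- c(x, y) }
    }
    if (is.null(best)) break            # no admissible merge is left
    blockOf[blockOf == best[2]] <- best[1]
  }
  # Map cluster-level block labels to nodes and relabel them 1, 2, 3, ...
  nodeBlock <- blockOf[match(labels, clusters)]
  match(nodeBlock, unique(nodeBlock))
}

# Toy example: cluster 1 (5 nodes) exceeds maxBlockSize = 4;
# clusters 2 and 3 are strongly connected to each other.
labels <- rep(1:4, times = c(5, 2, 2, 2))
A <- matrix(0.1, 11, 11)
A[6:7, 8:9] <- A[8:9, 6:7] <- 0.8
diag(A) <- 0
blocks <- makeBlocks(A, labels, maxBlockSize = 4)
```

In the toy example the oversized cluster forms a block by itself, the two strongly connected small clusters share a block, and the remaining cluster is left in its own block because adding it would exceed the size limit.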
Note that for the purposes of splitting data into blocks, hierarchical clustering is always used. If the internal initialization of clusters is used, it is applied within each block, independently of all other blocks.
Next, propensity clustering is applied to each block. More precisely, propensity clustering is applied to the subset of nodes in each block that is assigned to an initial cluster. Some nodes may not be assigned to initial clusters and these nodes are excluded from propensity clustering.
Once propensity clustering on all blocks is finished, propensity decomposition is calculated on the entire network (excluding unassigned nodes).
List with the following components:
Clustering 
The final clustering. A vector of length equal to the number of nodes (columns in adjacency).

Propensity 
Propensities (or conformities) of each node. 
NodeWasConsidered 
Logical vector with one entry per node. 
IntermodularAdjacency 
Intermodular adjacencies or the conformities between clusters. 
Factorizability 
Factorizability of the data. 
L2Norm or Loglik 
The L2 norm or the log-likelihood, depending on objectiveFunction. 
MeanValues 
A distance structure representing the lower triangle of the symmetric matrix of estimated values of the adjacency matrix, calculated from Propensity and IntermodularAdjacency. If the Poisson updates are used, the returned values are the estimated means of the distribution. 
TailPvalues 
A distance structure representing the lower triangle of the symmetric matrix of the tail probabilities under the Poisson distribution. 
Blocks 
Blocks. A vector with one component for each node giving the block label for each node. The blocks are labeled 1,2,3,... 
InitialClusters 
The initial clusters. A copy of the input if given, otherwise the automatically determined initial clustering. 
InitialTree 
The hierarchical clustering dendrogram (tree) used to determine initial clusters. Only present if the initial clusters were not supplied by the user. 
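To make the relationship between Propensity, IntermodularAdjacency and MeanValues concrete, the sketch below assumes the CPBA form a_ij ≈ p_i p_j r[c_i, c_j], where p is the propensity, c the cluster label and r the intermodular adjacency. The toy values and variable names are hypothetical, and the exact values the function returns may be computed differently.

```r
# Toy inputs standing in for components of the returned list (hypothetical values)
clustering <- c(1, 1, 2, 2)
propensity <- c(0.9, 0.8, 0.7, 0.6)
intermodular <- matrix(c(1.0, 0.2,
                         0.2, 1.0), ncol = 2)

# Assumed CPBA form: a_ij ~ p_i * p_j * r[c_i, c_j]
approx <- outer(propensity, propensity) * intermodular[clustering, clustering]
diag(approx) <- 0

# Lower triangle as a distance structure, the format described for MeanValues
lowerTri <- as.dist(approx)
```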
John Michael Ranola, Peter Langfelder, Kenneth Lange, Steve Horvath
Ranola et al. (2010) A Poisson Model for Random Multigraphs. Bioinformatics 26(16):2004-2011.
Ranola JM, Langfelder P, Lange K, Horvath S (2013) Cluster and propensity based approximation of a network. BMC Syst Biol. 2013 Mar 14;7:21. doi: 10.1186/1752-0509-7-21.
CPBADecomposition for propensity decomposition; hclust for the hierarchical clustering function; cutreeDynamic for the dynamic tree cut to identify clusters in a dendrogram.
library(PropClust)  # package providing propensityClustering()

# Simulate 50 nodes in 5 clusters
nNodes <- 50
nClusters <- 5
# We would like to use L2Norm instead of Loglikelihood
objective <- "L2norm"
# Random symmetric adjacency matrix with zero diagonal
ADJ <- matrix(runif(nNodes * nNodes), ncol = nNodes)
ADJ <- (ADJ + t(ADJ)) / 2
diag(ADJ) <- 0
results <- propensityClustering(
    adjacency = ADJ,
    objectiveFunction = objective,
    initialClusters = NULL,
    nClusters = nClusters,
    fastUpdates = FALSE)
# Cluster sizes in the final clustering
table(results$Clustering)
