Description Community Detection Dissimilarity Shortest Paths Heirarchal Clustering Sample Networks Author(s) References See Also
redwalk is an R package for community detection that uses the
igraph package graphs for input and hclust
objects
from the stats package for outputting the results of the
heirarchal clustering. For the main algorithm, see
cluster_redwalk
.
cluster_redwalk
is the main function of the package. It
takes an igraph graph and optionally a shortest paths matrix as input and
returns an hclust object. Unlike most heirarchal clustering methods on
graphs, the dendrograms produced by this method are extremely useful in
determining community structure and the number of communities in a network.
Time complexity is O(n^2 log n) and real world performance is comparable to
that of cluster_walktrap
with 4 steps. Notably, we do not
attempt to maximize the popular "modularity" measure, although our
partitions result in maximum modularity values that are on average among
the highest of most commonly used algorithms for the widest variety of
network structures.
The heart of the algorithm involves computing the average shortest paths of a vertex's neighbors to all other vertices and then considering the dissimilarity between two vertices v_i and v_j to be the minimum of these two distances.
The above functions can be called with or without the shortest paths
provided. If shortest paths are not provided, they will be calculated on
each function call. For faster performance, precomputing all pairs shortest
paths for a network will provide much faster results, as a breadth first
search for every vertex has time complexity O(n^2 + nm). Although
igraph provides functions for calculating these values, this package
includes an equivalent function, shortest_path_lengths
which is optimized for the APSP problem on undirected, unweighted graphs.
redwalk uses hclust
objects to return information about
community structure rather than igraph communities
object due
to the fact that the igraph communities make heavy use of modularity which
the algorithm do not make use of in attempting to find community structure.
The clustering algorithm simply performs a heirarcal clustering on the
dissimilarity matrix described above using the average agglomeration method
. The resulting clustering can be viewed as a dendrogram using the
plot.hclust
method. The resulting dendrograms provide great
insight into the community structure of the network and generally give a
reasonable cut location given a community structure exists in the network.
Several sample networks are included in this package, all of which have
known ground truths. The actual membership for each vertex in a sample
network is stored for each vertex as an igraph vertex attribute named
"membership" (see link{vertex_attr}
for more information). The full
ground truth can be retrieved using V(graph)$membership
. All of the
provided sample networks are cited and contain sources.
karate
, dolphins
, football
,
polblogs
, polbooks
, davisclub
Dr. Kenneth Berenhaut berenhks@wfu.edu, Peter S. Barr barrps12@wfu.edu, Alyssa M. Kogel kogeam11@wfu.edu
Need some references here, gotta be all official and what not. Probably will cite ourselves when published as well.
Wake Forest University, Department of Mathematics and Statistics: http://college.wfu.edu/math
Wake Forest University, Department of Computer Science: http://college.wfu.edu/cs
cluster_redwalk
;
shortest_path_lengths
;
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.