redwalk-package: The redwalk package

Description Community Detection Dissimilarity Shortest Paths Heirarchal Clustering Sample Networks Author(s) References See Also

Description

redwalk is an R package for community detection that uses the igraph package graphs for input and hclust objects from the stats package for outputting the results of the heirarchal clustering. For the main algorithm, see cluster_redwalk.

Community Detection

cluster_redwalk is the main function of the package. It takes an igraph graph and optionally a shortest paths matrix as input and returns an hclust object. Unlike most heirarchal clustering methods on graphs, the dendrograms produced by this method are extremely useful in determining community structure and the number of communities in a network. Time complexity is O(n^2 log n) and real world performance is comparable to that of cluster_walktrap with 4 steps. Notably, we do not attempt to maximize the popular "modularity" measure, although our partitions result in maximum modularity values that are on average among the highest of most commonly used algorithms for the widest variety of network structures.

Dissimilarity

The heart of the algorithm involves computing the average shortest paths of a vertex's neighbors to all other vertices and then considering the dissimilarity between two vertices v_i and v_j to be the minimum of these two distances.

Shortest Paths

The above functions can be called with or without the shortest paths provided. If shortest paths are not provided, they will be calculated on each function call. For faster performance, precomputing all pairs shortest paths for a network will provide much faster results, as a breadth first search for every vertex has time complexity O(n^2 + nm). Although igraph provides functions for calculating these values, this package includes an equivalent function, shortest_path_lengths which is optimized for the APSP problem on undirected, unweighted graphs.

Heirarchal Clustering

redwalk uses hclust objects to return information about community structure rather than igraph communities object due to the fact that the igraph communities make heavy use of modularity which the algorithm do not make use of in attempting to find community structure. The clustering algorithm simply performs a heirarcal clustering on the dissimilarity matrix described above using the average agglomeration method . The resulting clustering can be viewed as a dendrogram using the plot.hclust method. The resulting dendrograms provide great insight into the community structure of the network and generally give a reasonable cut location given a community structure exists in the network.

Sample Networks

Several sample networks are included in this package, all of which have known ground truths. The actual membership for each vertex in a sample network is stored for each vertex as an igraph vertex attribute named "membership" (see link{vertex_attr} for more information). The full ground truth can be retrieved using V(graph)$membership. All of the provided sample networks are cited and contain sources.

karate, dolphins, football, polblogs, polbooks, davisclub

Author(s)

Dr. Kenneth Berenhaut berenhks@wfu.edu, Peter S. Barr barrps12@wfu.edu, Alyssa M. Kogel kogeam11@wfu.edu

References

Need some references here, gotta be all official and what not. Probably will cite ourselves when published as well.

Wake Forest University, Department of Mathematics and Statistics: http://college.wfu.edu/math

Wake Forest University, Department of Computer Science: http://college.wfu.edu/cs

See Also

cluster_redwalk; shortest_path_lengths;


barrpet/gclust documentation built on May 11, 2019, 6:23 p.m.