computeGraphClusters: Perform graph clustering of various types.

Description

Graph clustering (or decomposition) divides a graph into a set of subgraphs that together span the whole graph. Depending on the type argument, the subgraphs can be either non-intersecting or overlapping. Available types of decomposition include finding connected components and modularity clustering.

Usage

computeGraphClusters(channel, graph, type = "connected",
  createMembership = FALSE, includeMembership = FALSE, weight = FALSE,
  vertexWhere = graph$vertexWhere, edgeWhere = graph$edgeWhere,
  distanceTableName = NULL, membershipTableName = NULL, schema = NULL,
  allTables = NULL, test = FALSE, ...)

Arguments

channel

connection object as returned by odbcConnect.

graph

an object of class 'toagraph' referencing graph tables in an Aster database.

type

specifies the type of clustering or community detection to perform. Defaults to "connected" (connected components).

createMembership

logical: indicates whether the vertex cluster membership table should be created (see membershipTableName). Currently, you must set it to TRUE if cluster membership data (see includeMembership) is expected in the result. It is also required if operations that create graphs corresponding to some of the clusters will be performed later.

includeMembership

logical: indicates whether the result should contain vertex cluster membership information. Currently, this is only supported when createMembership is TRUE. WARNING: including cluster membership may result in a very large data set being returned from Aster into memory.

weight

logical or character: if logical, TRUE indicates using the 'weight' edge attribute; otherwise no weight is used. If character, it is used as the name of the edge weight attribute. The edge weight may apply with types 'clustering', 'shortestpath' and centrality measures.

vertexWhere

optionally, a SQL WHERE clause to subset the vertex table. When not NULL, it overrides the vertexWhere condition from the graph.

edgeWhere

optionally, a SQL WHERE clause to subset the edge table. When not NULL, it overrides the edgeWhere condition from the graph.

distanceTableName

name of the table that will contain distances between vertices (or other corresponding metrics associated with the community detection algorithm chosen). By default, a random table name that begins with toa_temp_graphcluster_distance is generated.

membershipTableName

name of the table that will contain vertex cluster membership information when createMembership is TRUE. By default, a random table name that begins with toa_temp_graphcluster_membership is generated. This argument is ignored when createMembership is FALSE.

schema

name of the Aster schema for the table name arguments distanceTableName and membershipTableName. There are two distinct approaches to providing table names: one passes the schema name explicitly using this argument; the other uses table names that already contain the schema, followed by a dot and the table name. The latter method is not applicable when generating a random table name with schema.

allTables

pre-built information about existing tables.

test

logical: if TRUE, only show what would be done (similar to the parameter test in the RODBC functions sqlQuery and sqlSave).

...

other arguments passed on to Aster graph functions, except for the EDGEWEIGHT argument: use the argument weight instead. Aster function arguments are not case-sensitive.
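
The two approaches to naming tables described under schema can be sketched as follows. This is an illustration only: it assumes that conn is an open connection and that policeGraphUn is the toaGraph object from the Examples section, and it reuses the table names from those Examples. Per the description of schema, both calls are meant to address the same tables, so run one or the other.

```r
# Table names taken from the Examples on this page; the schema is "public".
schema_name <- "public"
distance_table <- "shortestpathdistances"
membership_table <- "clustermembership"

# Schema-qualified names: schema, then a dot, then the table name.
distance_qualified <- paste(schema_name, distance_table, sep = ".")
membership_qualified <- paste(schema_name, membership_table, sep = ".")

if (interactive()) {
  # Assumption: `conn` and `policeGraphUn` exist as in the Examples section.

  # Approach 1: bare table names plus the schema argument.
  communities_a <- computeGraphClusters(conn, policeGraphUn, type = "connected",
                                        createMembership = TRUE,
                                        distanceTableName = distance_table,
                                        membershipTableName = membership_table,
                                        schema = schema_name)

  # Approach 2: schema-qualified table names, schema left at its default (NULL).
  communities_b <- computeGraphClusters(conn, policeGraphUn, type = "connected",
                                        createMembership = TRUE,
                                        distanceTableName = distance_qualified,
                                        membershipTableName = membership_qualified)
}
```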

Value

computeGraphClusters returns an object of class "toacommunities" (compatible with both class "communities" and the value returned by clusters - all from the package igraph). It is a list with the following components:

membership

numeric vector giving the cluster (component or community) id to which each vertex belongs.

csize and sizes

numeric vector giving the sizes of the clusters.

no and length

numeric constant, the number of clusters.

algorithm

gives the name of the algorithm that was used to calculate the community structure.

id

integer vector of cluster ids from 1 to no.

componentid

character vector of cluster names (or component ids) where names are derived from the cluster elements and naming convention differs for each community type.

distance

numeric vector of average distances within clusters.

diameter

numeric vector of the maximum distances within clusters.

graph

original graph object that identifies the graph for which clusters are created.

weight

see argument weight above.

vertexWhere

see argument vertexWhere above.

edgeWhere

see argument edgeWhere above.

distanceTableName

Aster table name containing graph distances (applies to connected components only).

membershipTableName

(optional) Aster table name containing graph vertex to cluster memberships.

time

An object of class proc_time with user, system, and total elapsed times for the computeGraphClusters function call.
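
As a rough sketch of how some of the components listed above relate to each other, the snippet below works on a hand-built stand-in list rather than on real output of computeGraphClusters. The values are made up, and the snippet assumes that csize and componentid are aligned element by element with id.

```r
# Hypothetical stand-in for a "toacommunities" result: a plain list carrying
# a few of the documented components. All values are invented for illustration.
communities_mock <- list(
  membership = c(1, 1, 1, 2, 2, 3),  # cluster id of each vertex
  csize = c(3, 2, 1),                # size of each cluster
  no = 3,                            # number of clusters
  id = 1:3,                          # cluster ids from 1 to no
  componentid = c("a", "d", "f")     # made-up cluster names
)

# Cluster sizes can be recomputed from membership and compared with csize.
sizes_from_membership <- as.vector(table(communities_mock$membership))

# Ids of the two largest clusters, largest first (assumes csize aligns with id).
by_size <- order(communities_mock$csize, decreasing = TRUE)
largest_ids <- communities_mock$id[by_size][1:2]
```

Ids selected this way could then be passed as the ids argument of computeGraphClustersAsGraphs, as in the Examples section.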

Examples

if(interactive()) {

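# conn: an open connection as returned by odbcConnect (see argument channel)
# define an undirected graph over the officer vertex and edge tables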
policeGraphUn = toaGraph("dallaspolice_officer_vertices", "dallaspolice_officer_edges_un", 
     directed = FALSE, key = "officer", source = "officer1", target = "officer2", 
     vertexAttrnames = c("offense_count"), edgeAttrnames = c("weight"))
     
communities = computeGraphClusters(conn, policeGraphUn, type="connected", 
                                   createMembership = TRUE, includeMembership = TRUE,
                                   distanceTableName = "public.shortestpathdistances",
                                   membershipTableName = "public.clustermembership")
                                   
# get first 5 largest connected components as graphs
cluster_graphs = computeGraphClustersAsGraphs(conn, communities = communities, ids = 1:5)

# visualize component 2
library(GGally)
ggnet2(cluster_graphs[[2]], node.label="vertex.names", node.size="offense_count", 
       node.color="color", legend.position="none")

# compute connected components for a certain type of subgraph that 
# includes only vertices that start with letters
communities2 = computeGraphClusters(conn, policeGraphUn, type="connected", createMembership = TRUE,
                                    distanceTableName = "public.shortestpathdistances",
                                    vertexWhere = "officer ~ '[A-Z].*'", 
                                    edgeWhere = "weight > 0.36")
}

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.