clustR: Identify common geographies
In aslez/aiR: Areal interpolation and geographic aggregation


clustR identifies common geographies based on patterns of group overlap. Input can take one of two forms: partitions or intersections. A set of partitions is represented by a list of two or more SpatialPolygonsDataFrame objects, each of which is composed of a set of areal units (e.g., counties, census tracts). To identify common geographies, clustR calculates the intersection of these partitions. This can be very slow, depending on the number of observations and the resolution of the underlying boundary files.

Alternatively, intersections can be calculated using a dedicated GIS (e.g., ArcGIS, QGIS) and then passed to clustR (recommended). Intersections are represented as a single object, which can be either a SpatialPolygonsDataFrame or a data.frame. mp_shp and mp_int are internal helper functions used to construct properly formatted membership profiles.

clustR(x, nid = NULL, area = NULL, thresh = 0.05)

`x`	Either a list of two or more `SpatialPolygonsDataFrame` objects or a single object depicting the intersection between partitions. Intersections can be represented using either a `SpatialPolygonsDataFrame` or a `data.frame`. When intersections are represented as a `SpatialPolygonsDataFrame`, the area of each intersection is calculated on the fly. When intersections are represented as a `data.frame`, the area must be included as a column of that `data.frame`. WARNING: PROCEED WITH EXTREME CAUTION WHEN WORKING WITH ORIGINAL PARTITIONS. CHANGES IN THE RGEOS INTERSECTION ROUTINE ARE CAUSING POLYGONS TO BE DROPPED, LEADING TO INCORRECT CLUSTERS.
`nid`	A character vector containing the column names used to identify groups within each partition. This is only required when starting with intersections as opposed to partitions. When starting with partitions, `clustR` will default to polygon IDs unless otherwise indicated through the `nid` argument. Groups should be uniquely identified within partitions.
`area`	A string containing the name of the column containing data on the area of overlap between groups. This is only required when using a single `data.frame` object.
`thresh`	A number between 0 and 1 used to drop ties resulting from spurious polygons. This value represents the area of group overlap expressed as a proportion of the area of the smallest overlapping unit in question.
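
To make the `thresh` rule concrete, the sketch below applies the ratio described above to a toy intersection table. It is an illustration only, not the package's internal code: the column names `A`, `B`, and `AREA` are made up, unit areas are assumed to equal the sum of their intersection areas, and whether `clustR` uses a strict or non-strict comparison is an assumption.

```r
# Toy intersections: one row per overlap between a group in A and a group in B
# (hypothetical data; column names are illustrative)
ints <- data.frame(
  A = c("a1", "a1", "a2"),
  B = c("b1", "b2", "b2"),
  AREA = c(100, 0.5, 80),
  stringsAsFactors = FALSE
)

# Assumption: each unit's total area is the sum of its intersection areas
area_A <- ave(ints$AREA, ints$A, FUN = sum)
area_B <- ave(ints$AREA, ints$B, FUN = sum)

# Overlap as a proportion of the smaller of the two overlapping units
ints$share <- ints$AREA / pmin(area_A, area_B)

# Drop overlaps below the threshold (the >= comparison is an assumption)
thresh <- 0.05
kept <- ints[ints$share >= thresh, ]
kept
```

Here the sliver between `a1` and `b2` covers well under 5 percent of the smaller unit, so it is treated as spurious and no longer links the two groups.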

clustR assigns areal units to common geographies using the method outlined by Slez, O'Connell, and Curtis (2017), who show that identifying the common geographies associated with a set of k partitions is identical to identifying the components of a k-uniform k-partite hypergraph. Each edge in the hypergraph represents a membership profile depicting the intersection between areal units.
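
The component search can be sketched in base R as follows. This is a minimal illustration of the idea, not the implementation used by `clustR`: `find_components` is a hypothetical helper, the union-find approach is a stand-in technique, and the input is assumed to be a `data.frame` with one row per membership profile and one identifier column per partition.

```r
# Hypothetical helper (not part of aiR): label the connected components of a
# set of membership profiles using a simple union-find over group labels.
find_components <- function(df, nid) {
  # Prefix each group with its partition name so labels are unique across partitions
  nodes <- lapply(nid, function(col) paste(col, df[[col]], sep = ":"))
  labels <- unique(unlist(nodes))
  parent <- seq_along(labels)

  # Follow parent pointers until a root is reached
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # Each row (hyperedge) merges the groups it contains into one set
  for (r in seq_len(nrow(df))) {
    roots <- vapply(nodes, function(v) find_root(match(v[r], labels)), integer(1))
    parent[roots] <- min(roots)
  }

  # A hyperedge belongs to the component of any one of its groups
  root_of_row <- vapply(match(nodes[[1]], labels), find_root, integer(1))
  df$component <- match(root_of_row, unique(root_of_row))
  df
}

# Toy profiles: a1 overlaps b1 and b2, a2 overlaps b2, a3 overlaps only b3
profiles <- data.frame(
  A = c("a1", "a1", "a2", "a3"),
  B = c("b1", "b2", "b2", "b3"),
  stringsAsFactors = FALSE
)
result <- find_components(profiles, nid = c("A", "B"))
result
```

In this toy case the first three profiles share groups and collapse into one common geography, while the profile linking `a3` and `b3` forms its own.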

A data.frame depicting the relationship between hyperedges and components. Each hyperedge consists of a membership profile containing the name of one group from each partition. Each component refers to a common geography.

Slez, Adam, Heather A. O'Connell, and Katherine J. Curtis. 2017. "A Note on the Identification of Common Geographies." Sociological Methods and Research 46(2): 288–299.

#load list containing partitions
data(nd_list)
clustR(nd_list)

#load data frame containing intersections
data(example)

#add placeholder for area (real areas not needed)
example$AREA <- 1
clustR(example, nid = c("A", "B"), area = "AREA")

#load data frame containing intersections
data(south_df)
clustR(south_df, nid = c("ID1860", "ID2000"), area = "AREA")