decompose_divergence: Additively decompose divergence scores by and within groups


View source: R/decompose.R

Description

The Divergence Index is additively decomposable. This function allows for splitting a population into groups of observations and calculating the divergence score within those groups and between those groups.
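The additive property can be checked by hand. The following is a minimal base R sketch, not the package's implementation: it assumes the divergence of a unit is the Kullback-Leibler style sum of p * log(p / q) against a population-weighted reference composition, with natural logarithms. The toy data and the helper names kl and mean_comp are hypothetical.

```r
# Hypothetical toy data: four tracts in two regions; group shares in each row sum to 1.
tracts <- data.frame(
  region = c("north", "north", "south", "south"),
  pop    = c(100, 300, 200, 400),
  white  = c(0.8, 0.6, 0.3, 0.5),
  black  = c(0.2, 0.4, 0.7, 0.5)
)
shares <- as.matrix(tracts[, c("white", "black")])

# Divergence of one composition p from a reference composition q
# (assumed form: sum of p * log(p / q), with 0 * log(0) treated as 0).
kl <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))

# Population-weighted mean composition of a set of rows.
mean_comp <- function(rows) {
  colSums(shares[rows, , drop = FALSE] * tracts$pop[rows]) / sum(tracts$pop[rows])
}

overall <- mean_comp(seq_len(nrow(tracts)))

# Total divergence: every tract compared directly to the overall composition.
total <- sum(tracts$pop / sum(tracts$pop) * apply(shares, 1, kl, q = overall))

# Within and between components for each region.
by_region <- split(seq_len(nrow(tracts)), tracts$region)
parts <- t(sapply(by_region, function(rows) {
  regional <- mean_comp(rows)
  w <- tracts$pop[rows] / sum(tracts$pop[rows])
  c(
    within  = sum(w * apply(shares[rows, , drop = FALSE], 1, kl, q = regional)),
    between = kl(regional, overall),
    weight  = sum(tracts$pop[rows])
  )
}))

# Weighting each region's within + between score by its population share
# recovers the total; the two values agree up to floating-point error.
region_share <- parts[, "weight"] / sum(parts[, "weight"])
decomposed_total <- sum(region_share * (parts[, "within"] + parts[, "between"]))
isTRUE(all.equal(total, decomposed_total))
```

The columns of parts are named after the three output columns described under Value, but the numbers come from the sketch above and may differ from the package's output if its defaults (for example the logarithm base) differ.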

Usage

decompose_divergence(
  dataframe,
  groupCol = NULL,
  popCol = NA,
  weightCol = NA,
  output = "scores",
  ...
)

Arguments

dataframe

A dataframe of numeric/integer columns, each giving the percentage of a population group. All columns are used in the divergence calculation except those specified in groupCol and (optionally) popCol; no other columns should be included.

groupCol

Name of the column(s) in the dataframe used for grouping. If a grouped_df is passed to dataframe, this parameter is ignored. If multiple grouping columns are given, divergence is aggregated by every unique combination of their values and compared to the total dataframe.

popCol

Either NA (default), which sets the population of each row to 1, or a character string of the column name in dataframe.

weightCol

Alias for popCol.

output

Any of:

"scores"

Default. The individual within and between divergence scores for each row or group, plus the total score.

"percentage"

One row for each entry (or group), as in "scores", but scaled so that each observation reports its percentage of the total score that would be reported with "summed".

"all"

The output from summed, weighted, and percentage.

...

Options passed through to divergence.

Details

The sum of the scores reported by decompose_divergence when setting summed = TRUE should always equal the result of divergence with summed = TRUE.

Decomposing the divergence index allows users to simultaneously examine the segregation within and between groups of a large geography. Furthermore, users can assess the percentage of segregation coming from each group.
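As a rough illustration of that percentage, the base R sketch below starts from hypothetical per-group scores shaped like the columns described under Value. It assumes each group's contribution is its weight share times the sum of its within and between scores; the package's own "percentage" output may be computed differently.

```r
# Hypothetical per-group scores (made-up numbers, not package output).
scores <- data.frame(
  group              = c("north", "south"),
  within_divergence  = c(0.020, 0.015),
  between_divergence = c(0.030, 0.010),
  weight             = c(400, 600)
)

# Assumed contribution of each group: weight share times (within + between).
weight_share <- scores$weight / sum(scores$weight)
contribution <- weight_share *
  (scores$within_divergence + scores$between_divergence)

# Express each contribution as a percentage of the total score.
scores$pct_of_total <- 100 * contribution / sum(contribution)
scores
```

By construction the pct_of_total column sums to 100, so the groups can be compared directly regardless of the size of the overall score.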

The output parameter "scaled" transforms the divergence index from an absolute to a relative measure of inequality and segregation, and negates several of its desirable properties, including aggregation equivalence and independence (see Roberto, 2016).

Value

A dataframe as specified by the output parameter.

The dataframe will have three columns: 'within_divergence', equivalent to divergence() for each dataframe or group in dataframe; 'between_divergence', the divergence score of each group's demographics compared to the full population; and weightCol, the sum of the weights for each group. The sum of decompose_divergence(..., summed = T) should equal the result of divergence(..., summed = T).

Note

The divergence parameters for each group are set to their defaults unless explicitly noted above.

decompose_divergence treats the entire dataset it is given as the total population, which may not be desirable in some contexts, for example when trying to return divergence scores across years. In that context, it is helpful to split the dataframe into a list of dataframes and call decompose_divergence on each one inside an sapply call.
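A minimal sketch of that per-year pattern follows. The toy data and its column names (year, region, pop, white, black) are hypothetical, and only arguments listed under Usage are passed. It uses lapply rather than sapply so that each year's result stays a separate dataframe in a named list. The call is guarded so the snippet also runs where rsegregation is not installed.

```r
# Hypothetical toy data: the same four tracts observed in two years.
tracts <- data.frame(
  year   = c(2010, 2010, 2010, 2010, 2020, 2020, 2020, 2020),
  region = rep(c("north", "north", "south", "south"), 2),
  pop    = c(100, 300, 200, 400, 120, 310, 220, 390),
  white  = c(0.8, 0.6, 0.3, 0.5, 0.7, 0.6, 0.4, 0.5),
  black  = c(0.2, 0.4, 0.7, 0.5, 0.3, 0.4, 0.6, 0.5)
)

# One dataframe per year, so each year is treated as its own total population.
# The year column is dropped because only group-share columns, groupCol and
# popCol should be present in the dataframe passed to the function.
by_year <- split(tracts[, names(tracts) != "year"], tracts$year)

# Run the decomposition separately for each year (skipped if the package
# is not installed).
if (requireNamespace("rsegregation", quietly = TRUE)) {
  results <- lapply(by_year, function(df) {
    rsegregation::decompose_divergence(df, groupCol = "region", popCol = "pop")
  })
}
```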

Source

Roberto, 2016. "A Decomposable Measure of Segregation and Inequality."


arthurgailes/rsegregation documentation built on May 23, 2021, 6:33 a.m.