id: (Multilevel) index of dissimilarity


View source: R/id.R

Description

Returns either the standard index of dissimilarity (ID) or its multilevel equivalent.

Usage

id(data, vars, levels = NA, expected = FALSE, nsims = 100, omit = NULL)

Arguments

data

a data frame with ncol(data) >= 2. Each row of the data represents a neighbourhood or some other areal unit for which counts of population have been made.

vars

a character or numeric vector of length 2 or 3 giving either the names or the column positions of the variables in data, in the following order:

  1. the number of population group Y in each neighbourhood

  2. the number of population group X in each neighbourhood

  3. (optional) the total population in each neighbourhood

levels

a character or numeric vector of minimum length 1 identifying either the names or the column positions of the variables in data that record the higher-level grouping to which each lower-level unit belongs. If levels = NA, the default, then only the standard index of dissimilarity is calculated.

expected

a logical scalar. Should the expected value of the ID under randomisation be calculated? Requires a measure of the total population in each neighbourhood. If omitted from vars, that total will be calculated as sum(X + Y).

nsims

a vector, the number of random draws to be used for calculating the expected value. Default is 100.

omit

(optional) a character vector containing the names of places to search for in the data and to omit from the calculations

Details

If Y is the number of population group Y living in each neighbourhood and X is the number of population group X, then id measures how unevenly the two groups are distributed relative to one another and is a measure of segregation. In addition, for geographically hierarchical data, scale effects may be explored to examine the scale of geographical clustering.

The method treats the calculation of the ID as a regression problem. If Y is recalculated as each neighbourhood's share of the total count of population group Y (i.e. Y <- Y / sum(Y)), and X is recalculated in the same way for X, then fitting ols <- lm(Y ~ 0, offset = X) generates a set of residuals, e <- residuals(ols), where each residual is the difference between the share of Y and the share of X in a neighbourhood. Half the sum of the absolute values of those residuals gives the ID: id <- 0.5 * sum(abs(e)).
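The equivalence described above can be checked on a small data set. The sketch below uses base R only; the counts and the object names (Y_count, X_count, id_direct, id_lm) are invented for illustration and are not part of the MLID package.

```r
# Hypothetical counts for five neighbourhoods (invented for illustration)
Y_count <- c(10, 40, 25, 5, 20)
X_count <- c(30, 10, 25, 25, 10)

# Convert counts to each neighbourhood's share of the group total
Y <- Y_count / sum(Y_count)
X <- X_count / sum(X_count)

# Direct calculation: half the sum of absolute differences in shares
id_direct <- 0.5 * sum(abs(Y - X))

# Regression formulation: no predictors and X entered as an offset,
# so each residual is Y - X for that neighbourhood
ols <- lm(Y ~ 0, offset = X)
e <- residuals(ols)
id_lm <- 0.5 * sum(abs(e))

# The two routes should agree up to floating-point tolerance
all.equal(id_direct, id_lm)
```

Because the model has no coefficients to estimate, the fit does nothing beyond subtracting the offset; the value of the formulation is that extra terms can later be added to the model.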

The advantage of calculating the ID in this way is that it can be extended to consider geographic hierarchies, where neighbourhoods at the base level can be grouped into larger regions at the next level, and so forth. Then, for the multilevel index, the residuals are estimated at and partitioned between each level of the model net of the other levels, allowing scale effects to be explored.
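To give a feel for how residuals can be split between levels, the base R sketch below separates each neighbourhood residual into a region-level part and a neighbourhood-level remainder. It is a deliberately simplified stand-in, not the package's method: it uses plain group means, whereas a fitted multilevel model (see lmer) estimates the level effects jointly, so the numbers would differ from those reported by id. The data and all object names are invented.

```r
# Hypothetical shares for six neighbourhoods nested in two regions (invented)
Y <- c(0.30, 0.25, 0.20, 0.10, 0.10, 0.05)
X <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.25)
region <- c("A", "A", "A", "B", "B", "B")

# Neighbourhood residuals: the difference in shares (these sum to zero)
e <- Y - X

# ASSUMPTION: region-level part taken as the mean residual within each region
e_region <- ave(e, region)

# Neighbourhood-level part: what remains net of the region
e_nbhd <- e - e_region

# Percentage of the total sum of squares attributable to each level
ss <- c(region = sum(e_region^2), neighbourhood = sum(e_nbhd^2))
round(100 * ss / sum(ss), 1)
```

With group means the two parts are orthogonal, so their sums of squares add up to the total sum of squared residuals, which is what makes a percentage share per level meaningful in this simplified setting.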

print(index) displays the ID value, the expected value of the ID under randomisation (NA if not calculated) and, for a multilevel model, the percentage share of the total variance due to each level (a measure of the geographical scale of segregation; see the examples given by checkerboard) and the holdback scores (see holdback).
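The idea of an expected value under randomisation can be illustrated with a small simulation. This page does not spell out the randomisation scheme used by id, so the sketch below makes its own assumption, flagged in the code, and should be read as an illustration of the concept, not as the package's procedure. The counts, the helper id_value and the other object names are invented.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical counts for five neighbourhoods (invented)
Y_count <- c(10, 40, 25, 5, 20)
X_count <- c(30, 10, 25, 25, 10)
total <- Y_count + X_count

# Hypothetical helper: ID computed from two vectors of counts
id_value <- function(y, x) 0.5 * sum(abs(y / sum(y) - x / sum(x)))

# ASSUMPTION: under randomisation, each member of a group is placed in a
# neighbourhood with probability proportional to its total population
nsims <- 100
sims <- replicate(nsims, {
  y_sim <- as.vector(rmultinom(1, sum(Y_count), prob = total))
  x_sim <- as.vector(rmultinom(1, sum(X_count), prob = total))
  id_value(y_sim, x_sim)
})

# Compare the observed ID with the mean of the simulated values
observed <- id_value(Y_count, X_count)
expected_id <- mean(sims)
c(observed = observed, expected = expected_id)
```

Even with no systematic segregation the simulated values are generally above zero, because small random counts rarely split exactly in proportion; comparing the observed ID with such a baseline indicates how much of it could arise by chance.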

Value

an object of class index. This is a value between zero and one, where 0 implies no segregation and 1 means 'complete segregation': wherever group Y is located, X is not (and vice versa). If expected = TRUE, the expected value under randomisation is also given. In addition, the object contains the following attributes:

and also, for a multilevel model,

See Also

checkerboard, print.index, holdback, residuals.index, lmer, sumup

Harris R (2017) Fitting a multilevel index of segregation in R: using the MLID package http://rpubs.com/profrichharris/MLID

Harris R (2017) Measuring the scales of segregation: Looking at the residential separation of White British and other school children in England using a multilevel index of dissimilarity http://bit.ly/2lQ4r0n

Examples

data(ethnicities)
head(ethnicities)
# Calculate the standard index value
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"))

## Not run: 
# Also calculate the expected value under randomisation.
# This call will generate a warning because the total population per
# neighbourhood has not been specified
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"), expected = TRUE)
# Specify the total population per neighbourhood as the third variable
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit", "Persons"),
   expected = TRUE)
# The expected value is a high percentage of the actual value so
# aggregate it into a higher level geography...
aggdata <- sumup(ethnicities, sumby = "LSOA", drop = "OA")
head(aggdata)

# Multilevel models
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA", "LAD", "RGN"))
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA", "LAD", "RGN"), omit = c("Tower Hamlets", "Newham"))

## End(Not run)

MLID documentation built on May 30, 2017, 2:23 a.m.