Returns either the standard index of dissimilarity (ID) or its multilevel equivalent
data | a data frame with
vars | a character or numeric vector of length 2 or 3 giving either the names or column positions of the variables in data
levels | a character or numeric vector of minimum length 1 identifying either the names or column positions of the variables in data
expected | a logical scalar. Should the expected value of the ID under randomisation be calculated? Requires a measure of the total population in each neighbourhood. If omitted from
nsims | a vector, the number of random draws to be used for calculating the expected value. Default is 100.
omit | (optional) a character vector containing the names of places to search for in the data and to omit from the calculations
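The idea behind the expected argument can be sketched in base R. The text here does not say how the package performs its randomisation, so the sketch below makes an assumption: each group's members are scattered across neighbourhoods by a multinomial draw with probabilities proportional to total population. The counts, the vector N and the helper dissimilarity() are hypothetical and not part of the package.

```r
set.seed(1)  # only so that this illustration is reproducible

# Hypothetical counts: groups Y and X, and total population N per neighbourhood
Y <- c(10, 20, 30, 25, 15)
X <- c(40, 10, 20, 5, 25)
N <- c(120, 80, 100, 60, 90)

# Hypothetical helper: the standard ID computed from two count vectors
dissimilarity <- function(y, x) 0.5 * sum(abs(y / sum(y) - x / sum(x)))

nsims <- 100
sims <- replicate(nsims, {
  # Assumption: allocate each group at random in proportion to N
  y_rand <- as.vector(rmultinom(1, size = sum(Y), prob = N / sum(N)))
  x_rand <- as.vector(rmultinom(1, size = sum(X), prob = N / sum(N)))
  dissimilarity(y_rand, x_rand)
})

observed <- dissimilarity(Y, X)  # ID for the actual counts
expected <- mean(sims)           # average ID across the random draws
```

Comparing observed with expected indicates how much of the index could arise from random allocation alone, which tends to matter most when the groups or neighbourhoods are small.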
If Y is the number of population group Y living in each neighbourhood and X is the number of population group X, then id measures how unevenly the two groups are distributed relative to one another and is a measure of segregation. In addition, for geographically hierarchical data, scale effects may be explored to examine the scale of geographical clustering.
The method works by treating the calculation of the ID as a regression problem: if Y is recalculated as the share per neighbourhood of the total count of population group Y (i.e. Y <- Y / sum(Y)) and X is recalculated in the same way for X, then fitting ols <- lm(Y ~ 0, offset = X) generates a set of residuals, e <- residuals(ols), where each residual is the difference between the share of Y and the share of X per neighbourhood. The sum of the absolute values of those residuals can be used to obtain the ID: id <- 0.5 * sum(abs(e)).
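The regression formulation can be checked with a few lines of base R. The counts below are made up for illustration; the sketch only confirms that the residual-based calculation matches the usual direct formula for the ID.

```r
# Hypothetical counts for five neighbourhoods (illustrative values only)
Y <- c(10, 20, 30, 25, 15)
X <- c(40, 10, 20, 5, 25)

# Convert counts to each neighbourhood's share of the group total
Y <- Y / sum(Y)
X <- X / sum(X)

# A model with no free parameters: the offset fixes the fitted values at X,
# so the residuals are simply Y - X
ols <- lm(Y ~ 0, offset = X)
e <- residuals(ols)

id_regression <- 0.5 * sum(abs(e))      # ID via the regression residuals
id_direct <- 0.5 * sum(abs(Y - X))      # ID via the direct formula

stopifnot(isTRUE(all.equal(id_regression, id_direct)))
```

Because the model has no coefficients to estimate, nothing is fitted in the usual sense; the value of the regression framing is that it carries over to models with additional levels.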
The advantage of calculating the ID in this way is that it can be extended to consider geographic hierarchies, where neighbourhoods at the base level can be grouped into larger regions at the next level, and so forth. Then, for the multilevel index, the residuals are estimated at and partitioned between each level of the model net of the other levels, allowing scale effects to be explored.
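The partitioning of residuals between levels can be illustrated with a deliberately crude base R sketch. This is not the estimation the package performs (the multilevel model is fitted with lmer): it simply splits each residual into a region mean and a neighbourhood deviation from that mean. The data and the names region, region_part and local_part are hypothetical.

```r
# Hypothetical toy data: six neighbourhoods nested in two regions
region <- factor(c("A", "A", "A", "B", "B", "B"))
Y <- c(30, 25, 20, 10, 10, 5)
X <- c(10, 10, 15, 20, 25, 20)

# Residuals: difference in shares, as in the single-level case
e <- Y / sum(Y) - X / sum(X)

# Crude two-level split (an assumption, not the package's method)
region_part <- ave(e, region)   # mean residual within each region
local_part <- e - region_part   # neighbourhood deviation from the region mean

# The two parts add back up to the original residuals
stopifnot(isTRUE(all.equal(region_part + local_part, e)))

# Percentage of the total sum of squares attributable to each level
ss <- c(region = sum(region_part^2), neighbourhood = sum(local_part^2))
variance_share <- round(100 * ss / sum(ss), 1)
variance_share
```

In this toy case most of the variation sits at the region level, which is the kind of pattern the multilevel index is designed to reveal.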
print(index) displays the ID value, the expected value of the ID under randomisation (NA if not calculated), and, for a multilevel model, the percentage share of the total variance due to each level (a measure of the geographical scale of segregation: see the examples given by checkerboard) and the holdback scores (see holdback).
An object of class index. This is a value between zero and one, where 0 implies no segregation and 1 means 'complete segregation': wherever group Y is located, X is not (and vice versa). If expected = TRUE, the expected value under randomisation is also given. In addition, the object contains the following attributes:
attr(x, "ols")
an object of class lm. The OLS regression used to calculate the ID. Useful for identifying significant residuals (see Example below)
attr(x, "vars")
the names of Y and X in data
attr(x, "data")
a data frame with the population counts for Y and X
and also, for a multilevel model,
attr(index, "mlm")
an object of class lmerMod. Fitted using lmer
attr(index, "variance")
the percentage of the total variance due to each level of the model. This indicates the scale at which the segregation is most prominent
attr(index, "holdback")
records the percentage change in the ID that occurs if, at each level, its contribution to the ID net of other levels is held back (set to zero)
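The holdback idea can be sketched in base R under a simplifying assumption: each level's contribution is approximated by a region mean and a neighbourhood deviation, rather than by the lmer-based estimates the package stores. The data and all object names below are hypothetical.

```r
# Hypothetical toy data: six neighbourhoods nested in two regions
region <- factor(c("A", "A", "A", "B", "B", "B"))
Y <- c(30, 25, 20, 10, 10, 5)
X <- c(10, 10, 15, 20, 25, 20)

# Residuals and a crude split into two levels (an assumption for illustration)
e <- Y / sum(Y) - X / sum(X)
region_part <- ave(e, region)   # region-level contribution
local_part <- e - region_part   # neighbourhood-level contribution

id_full <- 0.5 * sum(abs(e))

# Hold back each level in turn by setting its contribution to zero
id_without_region <- 0.5 * sum(abs(e - region_part))
id_without_local <- 0.5 * sum(abs(e - local_part))

# Percentage change in the ID when each level is held back
holdback <- 100 * (c(region = id_without_region,
                     neighbourhood = id_without_local) - id_full) / id_full
round(holdback, 1)
```

A large negative value for a level suggests that much of the measured segregation depends on differences at that scale.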
checkerboard
print.index
holdback
residuals.index
lmer
sumup
Harris R (2017) Fitting a multilevel index of segregation in R: using the MLID package http://rpubs.com/profrichharris/MLID
Harris R (2017) Measuring the scales of segregation: Looking at the residential separation of White British and other school children in England using a multilevel index of dissimilarity http://bit.ly/2lQ4r0n
data(ethnicities)
head(ethnicities)
# Calculate the standard index value
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"))
## Not run:
# Calculate also the expected value under randomisation
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit"), expected = TRUE)
# will generate a warning because the total population per neighbourhood
# has not been specified
id(ethnicities, vars = c("Bangladeshi", "WhiteBrit", "Persons"),
   expected = TRUE)
# The expected value is a high percentage of the actual value so
# aggregate it into a higher level geography...
aggdata <- sumup(ethnicities, sumby = "LSOA", drop = "OA")
head(aggdata)
# Multilevel models
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA","LAD","RGN"))
id(aggdata, vars = c("Bangladeshi", "WhiteBrit"),
   levels = c("MSOA","LAD","RGN"), omit = c("Tower Hamlets", "Newham"))
## End(Not run)