View source: R/swc_get_mapping.R
swc_get_mapping | R Documentation |
For two lists of Swiss municipality IDs at any two points in time, this function creates a data frame with two columns where each row represents a match between municipality IDs. This can be used as an intermediate table for merging two data sets with municipality identifiers taken at different, possibly unknown, points in time.
swc_get_mapping(ids_from, ids_to)
ids_from | A list of "source" municipality IDs, preferably a factor |
ids_to | A list of "target" municipality IDs, preferably a factor |
It is advisable to pass the municipality IDs as factors. That way, comparisons and merges of municipality IDs are automatically checked for compatibility.
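A minimal base R sketch of what such a compatibility check can look like. The IDs below are made up for illustration and are not taken from the package data. Comparing two factors whose level sets differ fails with an error, whereas the same values stored as character vectors compare silently.

```r
# Hypothetical municipality IDs from two different points in time
ids_a <- factor(c("1", "2", "261"))
ids_b <- factor(c("1", "2", "262"))

# Comparing factors with different level sets raises an error,
# which is captured here with try()
res <- try(ids_a == ids_b, silent = TRUE)
is_error <- inherits(res, "try-error")

# Plain character vectors are compared element by element without any check
plain <- as.character(ids_a) == as.character(ids_b)
```

Here `is_error` signals that the factor comparison was rejected, while `plain` quietly marks the last pair as unequal.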
Note that the "from" list must be from an earlier time than the "to" list. Trying to compute the mapping the other way round results in an error. This is intentional: because municipalities are usually merged over time, it makes sense to use the most recent data set as the target for the mapping. The target can also be a file with suitable geometries to allow for visualization.
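To illustrate why the more recent data set is the natural target, here is a small sketch with invented data (the IDs, the toy mapping and the household counts are all assumptions, not output of swc_get_mapping). Two earlier municipalities are merged into one later municipality, so values from the earlier data set can simply be summed per target ID.

```r
# Toy mapping: municipalities 101 and 102 were merged into 201,
# municipality 103 is unchanged (hypothetical IDs)
toy_mapping <- data.frame(
  mId.from = c(101, 102, 103),
  mId.to   = c(201, 201, 103)
)

# Toy "from" data set with one value per earlier municipality
toy_pop <- data.frame(
  MunicipalityID = c(101, 102, 103),
  Households     = c(10, 20, 5)
)

# Attach the target ID, then aggregate by target ID
merged <- merge(toy_pop, toy_mapping,
  by.x = "MunicipalityID", by.y = "mId.from"
)
agg <- aggregate(Households ~ mId.to, data = merged, FUN = sum)
```

Going in the opposite direction would require splitting the value of 201 between 101 and 102, which the mapping alone cannot determine.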
For two lists of municipalities, we construct a mapping from the first list to the second. First, the most probable mutation number in the "municipality mutations" data set is computed.
A data frame with columns prefixed by "from." and "to." that represents the computed match. The municipality IDs are stored in the columns from.mId and to.mId. The columns from.MergeType and to.MergeType contain "valid" if the municipality is contained in both the input and the mapping table, "missing" if the municipality is missing from the input, and "extra" if the municipality is in the input but not in the mapping table; most columns are NA for such rows. In addition, the column MergeType offers a summary of the "from" and "to" status: Rows with values other than "valid" or "missing" should be examined.
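The three status values can be pictured with a toy example in base R. This is only a sketch of the classification described above, using invented IDs and hypothetical object names; it is not how the package computes the status internally.

```r
# Hypothetical reference table of known municipality IDs
toy_table <- data.frame(id = c(1, 2, 3))

# Hypothetical input IDs supplied by the user
input_ids <- c(2, 3, 4)

# Classify every ID that occurs in either place
all_ids  <- sort(union(toy_table$id, input_ids))
in_table <- all_ids %in% toy_table$id
in_input <- all_ids %in% input_ids

status <- ifelse(in_table & in_input, "valid",
  ifelse(in_table, "missing", "extra")
)

toy_status <- data.frame(id = all_ids, status = status)
```

In this toy case ID 1 is "missing" (known but not in the input), IDs 2 and 3 are "valid", and ID 4 is "extra" (in the input but unknown to the table).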
library(dplyr)
data(SwissPop)
data(SwissBirths)
# Show mismatch of municipality IDs:
ids_from <- with(SwissPop, MunicipalityID)
ids_to <- with(SwissBirths, MunicipalityID)
setdiff(ids_from, ids_to)
setdiff(ids_to, ids_from)
# Compute mapping and count non-matching municipality IDs:
mapping <- swc_get_mapping(ids_from = ids_from, ids_to = ids_to)
with(mapping, sum(mIdAsNumber.from != mIdAsNumber.to))
# Communes that are "missing" are mostly lakes and other special communes:
subset(mapping, MatchType == "missing")[, c("mIdAsNumber.from", "mShortName.from")]
# These should be looked at in some detail, and fixed manually:
subset(mapping, !(MatchType %in% c("valid", "missing")))
# Test for injectivity. The result shows that the mapping is almost injective,
# only one "from" commune is mapped to more than one other "to" commune.
# This situation requires further examination.
mapping.dupes <- subset(mapping, duplicated(mIdAsNumber.from))
(noninjective.mapping <- subset(
mapping, mIdAsNumber.from %in% mapping.dupes$mIdAsNumber.from
))
# Simple treatment (just for this example): Remove duplicates, and use only
# valid matches:
cleaned.mapping <- subset(
mapping,
!duplicated(mIdAsNumber.from) & MatchType == "valid"
)
# Now merge the two datasets based on the mapping table:
SwissPop.1970 <- subset(SwissPop, Year == "1970")
SwissPopMapping.1970 <- merge(SwissPop.1970,
cleaned.mapping[, c("mId.from", "mId.to")],
by.x = "MunicipalityID", by.y = "mId.from"
)
# Datasets from the "from" table must be suitably aggregated. For the given
# case of population totals we use the sum.
SwissPopMapping.1970.agg <- group_by(
SwissPopMapping.1970,
mId.to,
HouseholdSize
) %>%
summarize(Households = sum(Households))
with(SwissPopMapping.1970.agg, stopifnot(
length(unique(mId.to)) * length(levels(HouseholdSize)) ==
length(mId.to)
))
# The aggregated "from" dataset now can be merged with the "to" dataset:
SwissBirths.1970 <- subset(SwissBirths, Year == "1970")
SwissPopBirths.1970 <- merge(SwissPopMapping.1970.agg, SwissBirths.1970,
by.x = "mId.to", by.y = "MunicipalityID"
)
# Some more communes are still missing from the 1970 statistics, although
# the matches are valid:
subset(mapping, mIdAsNumber.to %in% setdiff(
SwissPopMapping.1970.agg$mId.to, SwissBirths.1970$MunicipalityID
))[
,
c("mId.from", "mShortName.from", "MatchType")
]
# The "from" list must be from an earlier time than the "to" list.
try(swc_get_mapping(ids_from = ids_to, ids_to = ids_from))