knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Let's have a look at first-instance decisions on asylum applications from Afghan citizens in Europe.
This means that we use the dataset migr_asydcfsta: "First instance decisions on applications by citizenship, age and sex Annual aggregated data (rounded)". (Remark: For Germany, these are the BAMF decisions. Don't confuse this with the "first instance" in the German legal system, which would be the Verwaltungsgericht.) (Second remark: There is also quarterly data; we can work with this later.)
This data covers the years from 2008 to 2017 (as of August 2018). It's quite big: 85 MB and 13'311'070 values. Loading it takes a few minutes, even on a fairly powerful laptop. We start from a local copy that was pulled and saved with the following code:
library(eurostat)
migr_asydcfsta <- get_eurostat("migr_asydcfsta")
saveRDS(migr_asydcfsta, file = "/tmp/migr_asydcfsta_20180812.rds")
The data covers 32 European countries (geo). Some don't have enough data for a useful analysis. Let's ignore them.
In the code below, we reduce our data to:

* only totals in age and sex
* only Afghanistan
* only 2017 (for now)
* only European countries with more than 1000 decisions in 2017
* only columns of interest
library(knitr)
library(dplyr)
migr_asydcfsta <- readRDS(file = "/tmp/migr_asydcfsta_20180812.rds")
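Because the download takes minutes, it can help to wrap the pull-and-save step so that it only runs when no local copy exists yet. The following is a minimal sketch: `cached_rds` is a hypothetical helper name (not part of the eurostat package), and `fetch` stands for any function that returns the data.

```r
# Hypothetical helper: read the cached .rds file if it exists,
# otherwise call fetch(), save the result to path, and return it.
cached_rds <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  data <- fetch()
  saveRDS(data, file = path)
  data
}

# Possible usage (not run here, it needs the eurostat package and a network connection):
# migr_asydcfsta <- cached_rds("/tmp/migr_asydcfsta_20180812.rds",
#                              function() eurostat::get_eurostat("migr_asydcfsta"))
```

Passing the download as a function means it is only evaluated when the cache file is missing.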
Let's have a quick look at what we have.
# A compact overview of the columns and their first values:
glimpse(migr_asydcfsta)

# A look at the structure of our data, also showing the categories (sex, age, decisions):
str(migr_asydcfsta)
The important stuff is:

* geo: the reporting European country
* citizen: the applicant's citizenship, in our case Afghanistan
* decision: the first decision on the application
* time: in our case, 2017
This is too much information. Let's reduce it:
cutoff <- 1000
major_geo_total <- filter(migr_asydcfsta,
                          values > cutoff, time == "2017-01-01",
                          decision == "TOTAL", sex == "T", age == "TOTAL",
                          citizen == "AF", geo != "EU28") %>%
  select(geo, values) %>%
  arrange(desc(values))
cutoff
major_geo_total

# we kept TOTAL only to include it in this view
major_geo_total <- filter(major_geo_total, geo != "TOTAL")
OK, this is what we'll be working with.
The first question is: How much information did we lose by ignoring the countries below the cutoff of 1000 decisions?
Looking at the number of decisions per country brought the first surprise for me. In 2017, there were 15 countries with more than 1000 decisions on asylum applications from Afghan citizens. The other 17 countries together account for only 580 decisions.
In total there were 184'265 decisions on Afghanistan in 2017.
other_geo_total <- filter(migr_asydcfsta,
                          values <= cutoff, time == "2017-01-01",
                          decision == "TOTAL", sex == "T", age == "TOTAL",
                          citizen == "AF", geo != "EU28", geo != "TOTAL") %>%
  select(values)
dec_others_total <- sum(other_geo_total$values)
dec_others_total
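To put this in proportion, the share of the ignored countries can be worked out from the two figures quoted in the text (580 decisions out of 184'265). This is a small sketch with the numbers hard-coded from the text; in the analysis itself one would use `dec_others_total` and the TOTAL row instead. The variable names are chosen only for this example.

```r
# Figures taken from the text (2017, Afghan citizens)
dec_others <- 580     # decisions in the countries below the cutoff
dec_all    <- 184265  # all decisions

# Share of the ignored countries, in percent
share_others <- 100 * dec_others / dec_all
round(share_others, 2)
```

The result is roughly 0.3 percent, so dropping the small countries loses very little.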
If you don't believe this, great. Lack of trust in data is a good thing. You can look it up at Eurostat, in the dataset migr_asydcfsta.
Download geo.dic from https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=dic%2Fde%2Fgeo.dic
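# header = FALSE: assuming the .dic file has no header row,
# otherwise the first entry would be read as column names and lost
geodic <- read.delim(file = "/tmp/geo.dic", header = FALSE,
                     col.names = c("code", "label"))
geo_migr_asydcfsta <- levels(migr_asydcfsta$geo)
geodic %>% filter(code %in% geo_migr_asydcfsta)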
library(ggplot2)

# some technical workarounds for adding the row with the others (XX),
# and for sorting
major_geo_total$geo <- as.character(major_geo_total$geo)
all_geo_total <- rbind(major_geo_total, c("XX", dec_others_total))
all_geo_total$values <- as.numeric(all_geo_total$values)
all_geo_total$geo <- factor(all_geo_total$geo,
                            levels = arrange(all_geo_total, values)$geo)

# Labels only for the biggest pie pieces
big <- 8
blables <- c(rev(levels(all_geo_total$geo))[c(1:big)],
             rep("", dim(all_geo_total)[1] - big))
y.breaks <- cumsum(all_geo_total$values) - all_geo_total$values / 2

geo_dec_pie <- ggplot(all_geo_total, aes(x = "", y = values, fill = geo)) +
  geom_col(colour = "black") +
  coord_polar("y", start = 0) +
  scale_fill_grey(start = 0.4, end = 0.9) +
  ggtitle("Decisions on Afghan asylum Cases 2017",
          subtitle = "EU countries with a high enough number of decisions; XX is the rest")

geo_dec_pie +
  theme_bw() +
  theme(axis.title = element_blank()) +
  scale_y_continuous(labels = blables, breaks = y.breaks)
major_geo_total
Germany had by far the most decisions on Afghanistan in 2017. It is also the most populous country. How does its share of decisions compare to its share of the population?
For this, we need the Eurostat data set demo_pjan:
# the dataset id must be passed as a string
demo_pjan <- get_eurostat("demo_pjan")
saveRDS(demo_pjan, file = "/tmp/demo_pjan_20180825.rds")
demo_pjan <- readRDS(file = "/tmp/demo_pjan_20180825.rds")

compare_dec <- filter(migr_asydcfsta,
                      time == "2017-01-01", decision == "TOTAL", sex == "T",
                      age == "TOTAL", citizen == "AF",
                      geo != "EU28", geo != "TOTAL") %>%
  select(geo, values) %>%
  mutate(stat = "dec")
compare_dec <- droplevels(compare_dec)

compare_pop <- demo_pjan %>%
  filter(age == "TOTAL" & sex == "T" & time == "2017-01-01" &
           geo %in% levels(compare_dec$geo)) %>%
  select(geo, values) %>%
  mutate(stat = "pop")
compare_pop <- droplevels(compare_pop)

# divide population by 1000, otherwise decisions won't be visible
compare_pop <- mutate(compare_pop, values = values / 1000)

compare_dec_pop <- rbind(compare_dec, compare_pop)
Now the bar chart, only for countries with a population over 8 million:
drop_countries_comp <- filter(compare_pop, values < 8000 & stat == "pop") %>%
  select(geo)

# population first, decisions second
compare_dec_pop$stat <- factor(compare_dec_pop$stat, levels = c("pop", "dec"))

# sort by pop
compare_dec_pop$geo <- factor(
  compare_dec_pop$geo,
  levels = arrange(filter(compare_dec_pop, stat == "pop"), desc(values))$geo
)

bar_compare_dec_pop <- ggplot(
  filter(compare_dec_pop, !geo %in% unlist(drop_countries_comp)),
  aes(x = geo, y = values, fill = stat)
) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw() +
  theme(legend.position = "bottom") +
  scale_fill_grey(start = 0.4, end = 0.7)
bar_compare_dec_pop
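The bar chart puts two absolute numbers side by side. A single ratio per country, such as decisions per 100,000 inhabitants, would make the comparison more direct. The sketch below uses base R on made-up numbers in the same long layout as `compare_dec_pop`. The country codes "AA" and "BB", the values, and the column name `dec_per_100k` are all invented for illustration and are not Eurostat figures. The toy population is also not divided by 1000.

```r
# Toy data in long layout: one row per country and statistic
# (values are made up for illustration, not Eurostat figures)
toy <- data.frame(
  geo    = c("AA", "BB", "AA", "BB"),
  stat   = c("dec", "dec", "pop", "pop"),
  values = c(5000, 1000, 50000000, 5000000)
)

# One row per country, with decisions and population side by side
dec  <- toy[toy$stat == "dec", c("geo", "values")]
pop  <- toy[toy$stat == "pop", c("geo", "values")]
wide <- merge(dec, pop, by = "geo", suffixes = c("_dec", "_pop"))

# Decisions per 100,000 inhabitants
wide$dec_per_100k <- 1e5 * wide$values_dec / wide$values_pop
wide
```

In this toy example the smaller country "BB" has fewer decisions in absolute terms but twice as many per inhabitant.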
To do: repeat this comparison for 2015 and 2016?
Now, how did our 15 countries decide in 2017? We reduced too much in the previous step: we now need all the decision types, not only the TOTAL.
# time == (not >=): only 2017, even if the dataset later gains more years
major_geo <- filter(migr_asydcfsta,
                    geo %in% major_geo_total$geo, time == "2017-01-01",
                    sex == "T", age == "TOTAL", citizen == "AF",
                    geo != "EU28", geo != "TOTAL") %>%
  select(geo, decision, values)

# sort countries by their total number of decisions
major_geo$geo <- factor(
  major_geo$geo,
  levels = arrange(filter(major_geo, decision == "TOTAL"), desc(values))$geo
)

# bring decisions into correct order
major_geo$decision <- factor(
  major_geo$decision,
  levels = c("REJECTED", "TEMP_PROT", "HUMSTAT", "SUB_PROT", "GENCONV",
             "TOTAL_POS", "TOTAL")
)

# define grey palette
dec_palette_grey <- c("#000000", "#C0C0C0", "#D3D3D3", "#E4E4E4", "#E8E8E8",
                      "#C8C8C8", "#808080")

dec_bar <- ggplot(filter(major_geo, decision != "TOTAL" & decision != "TOTAL_POS")) +
  geom_col(aes(x = geo, y = values, fill = decision))
dec_bar +
  scale_fill_manual(values = dec_palette_grey) +
  theme_bw() +
  theme(legend.position = "bottom")
Looking at the percentages:
dec_bar_fill <- ggplot(filter(major_geo, decision != "TOTAL" & decision != "TOTAL_POS")) +
  geom_col(aes(x = geo, y = values, fill = decision), position = "fill") +
  scale_fill_grey() +
  theme_bw() +
  theme(legend.position = "bottom")
dec_bar_fill
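The filled bar chart shows the shares visually. The same information can be reduced to one number per country, the percentage of positive decisions, by dividing TOTAL_POS by TOTAL. The sketch below uses base R on made-up numbers. The country codes "AA" and "BB", the values, and the column name `pct_positive` are invented for illustration. Only the decision codes come from the dataset.

```r
# Toy data in the same layout as major_geo
# (values are made up for illustration, not Eurostat figures)
toy_dec <- data.frame(
  geo      = c("AA", "AA", "AA", "BB", "BB", "BB"),
  decision = c("TOTAL", "TOTAL_POS", "REJECTED", "TOTAL", "TOTAL_POS", "REJECTED"),
  values   = c(200, 150, 50, 100, 20, 80)
)

# Put total and positive decisions side by side, one row per country
total <- toy_dec[toy_dec$decision == "TOTAL", c("geo", "values")]
pos   <- toy_dec[toy_dec$decision == "TOTAL_POS", c("geo", "values")]
rates <- merge(total, pos, by = "geo", suffixes = c("_total", "_pos"))

# Percentage of positive first-instance decisions
rates$pct_positive <- 100 * rates$values_pos / rates$values_total
rates
```

Because the Eurostat values are rounded, percentages computed this way on the real data would be approximate, especially for countries with few decisions.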
To decipher the decision codes, download the decision dictionary from https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=dic%2Fde%2Fdecision.dic
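# header = FALSE: assuming the .dic file has no header row,
# otherwise the first entry would be read as column names and lost
decision.dic <- read.delim(file = "/tmp/decision.dic", header = FALSE,
                           col.names = c("code", "label"))
filter(decision.dic, code %in% levels(major_geo$decision))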
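One detail matters when reading such dictionary files. If the file has no header line (an assumption about the .dic layout, worth checking by opening the file), then `read.delim` with its default `header = TRUE` treats the first entry as column names and drops it, even when `col.names` is supplied. The sketch below shows the difference on a two-line sample. The sample content and the names `sample_dic`, `with_header` and `without_header` are made up for illustration.

```r
# Two sample lines in a tab-separated "code<TAB>label" layout
# (sample content for illustration; the real file is longer)
sample_dic <- "AA\tLabel A\nBB\tLabel B\n"

# Default header = TRUE: the first line is consumed as a header
with_header <- read.delim(text = sample_dic,
                          col.names = c("code", "label"))

# header = FALSE: every line is kept as data
without_header <- read.delim(text = sample_dic, header = FALSE,
                             col.names = c("code", "label"))

nrow(with_header)
nrow(without_header)
```

Comparing the two row counts against the number of lines in the file is a quick way to see whether an entry went missing.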
Analysis:
Next questions: Is the picture (number and result of decisions) similar for SY? Eritrea? TOTAL? Deportations? We barely have numbers from DE ...