knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Focus Afghanistan, with Eurostat Data

Let's have a look at first decisions on asylum requests from Afghan citizens, in Europe.

This means that we use the dataset migr_asydcfsta: "First instance decisions on applications by citizenship, age and sex Annual aggregated data (rounded)". (Remark: For Germany, these are the the BAMF decisions. Don't confuse this with the "first instance" in the legal system of Germany, that would be the Verwaltungsgericht.) (Second Remark: There's also quarterly data, we can work with this later.)

This data covers the years from 2008 to 2017 (as of August 2018). It's quite big - 85 MB, and 13'311'070 values. Loading this takes some minutes on my pretty strong laptop. We start with a reduced data set that was pulled and saved with the following code:

library(eurostat)
migr_asydcfsta=get_eurostat("migr_asydcfsta")
saveRDS(migr_asydcfsta, file="/tmp/migr_asydcfsta_20180812.rds")

The data covers 32 European countries (geo). Some don't have enough data to do useful analysis. Let's ignore them.

In the code below, we reduce our data to: * Only totals in age and sex * only Afghanistan * only 2017 (for now) * only European countries with at least 500 decisions in 2017 * only columns of interest

library(knitr)
library(dplyr)
migr_asydcfsta=readRDS(file="/tmp/migr_asydcfsta_20180812.rds")

Let's have a quick look at what we have.

# The first few rows:
glimpse(migr_asydcfsta)
# A look at the structure of our data, also showing the categories (sex, age, decisions):
str(migr_asydcfsta)

The important stuff is: geo: The EU country citizen: The applicant's citizenship, in our case, Afghanistan decision: The first decision on the application time: in our case, 2017

This is too much information. Let's reduce it:

cutoff=1000
major_geo_total=filter(migr_asydcfsta, values > cutoff, time == "2017-01-01", decision == "TOTAL", sex == "T", age == "TOTAL", citizen == "AF", geo != "EU28") %>% 
  select(geo,values) %>%
  arrange(desc(values))
cutoff

A first look into the data

major_geo_total
# we kept TOTAL only to include it in this view
major_geo_total <- filter(major_geo_total, geo != "TOTAL")

Reducing to the relevant deciding countries

OK, this is what we'll be working with.

The first question is: How much information did we lose when ignoring countries with less than 500 decisions?

Looking at the number of decisions per country brought the first surprise for me. In 2017, there were 15 countries with more than 1000 decisions on asylum application from Afghan citizens. All other 17 countries account for only 580 decisions.

In total there were 184'265 decisions on Afghanistan in 2017.

other_geo_total=filter(migr_asydcfsta, values <= cutoff, time == "2017-01-01", decision == "TOTAL", sex == "T", age == "TOTAL", citizen == "AF", geo != "EU28", geo != "TOTAL") %>% select(values)
dec_others_total <- sum(other_geo_total$values)
dec_others_total

If you don't believe this, great. Lack of trust in data is a good thing. You can look it up at eurostat migr_asydcfsta

Lookup Table for Country Codes

Download geo.dic from https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=dic%2Fde%2Fgeo.dic

geodic = read.delim(file = "/tmp/geo.dic", col.names = c("code","label"))
geo_migr_asydcfsta = levels(migr_asydcfsta$geo)
geodic %>% filter(code %in% geo_migr_asydcfsta)

Pie of European Countries, decisions in 2017 on asylum cases Afghanistan

library(ggplot2)
# some technical workarounds for adding the row with the others (XX),
# and for sorting
major_geo_total$geo <- as.character(major_geo_total$geo)
all_geo_total <- rbind(major_geo_total, c("XX",dec_others_total))
all_geo_total$values <- as.numeric(all_geo_total$values) 
all_geo_total$geo <- factor(all_geo_total$geo, levels = arrange(all_geo_total, values)$geo)
# Labels only for the biggest pie pieces
big = 8
blables <- c(rev(levels(all_geo_total$geo))[c(1:big)],rep("",dim(all_geo_total)[1]-big))
y.breaks <- cumsum(all_geo_total$values) - all_geo_total$values/2

geo_dec_pie <- ggplot(all_geo_total, aes(x="", y=values, fill=geo)) + geom_col(colour = "black") + coord_polar("y", start=0) + scale_fill_grey(start = 0.4, end = 0.9) + ggtitle ("Decisions on Afghan asylum Cases 2017", subtitle = "EU countries with a high enough number of decisions; XX is the rest") 
geo_dec_pie + theme_bw() + theme(axis.title=element_blank()) + scale_y_continuous(labels=blables, breaks = y.breaks)
major_geo_total

Germany had by far the most decisions on Afghanistan in 2017. It is also the most populous country. How does it's share of decisions compare to it's share of the population?

For this, we need the eurostat data set demo_pjan:

 demo_pjan <- get_eurostat(demo_pjan)
 saveRDS(demo_pjan, file = "/tmp/demo_pjan_20180825.rds")
demo_pjan <- readRDS(file="/tmp/demo_pjan_20180825.rds")
compare_dec=filter(migr_asydcfsta, time == "2017-01-01", decision == "TOTAL", sex == "T", age == "TOTAL", citizen == "AF", geo != "EU28", geo != "TOTAL") %>% select(geo, values) %>% mutate( stat = "dec")
compare_dec <- droplevels(compare_dec)
compare_pop <- demo_pjan %>% filter(age == "TOTAL" & sex == "T" & time == "2017-01-01" & geo %in% levels(compare_dec$geo)) %>% select(geo, values) %>% mutate( stat = "pop")
compare_pop <- droplevels(compare_pop)
# divide population by 1000, otherwise decisions won't be visible
compare_pop <- mutate(compare_pop, values = values / 1000)
compare_dec_pop <- rbind(compare_dec, compare_pop)

Now, the histogram - only for EU countries with a population over 8 mio :

drop_countries_comp <- filter(compare_pop, values < 8000 & stat == "pop") %>% select(geo)
# population first, decisions second
compare_dec_pop$stat <- factor(compare_dec_pop$stat, levels=c("pop","dec"))
# sort by pop 
compare_dec_pop$geo <- factor(compare_dec_pop$geo, levels = arrange(filter(compare_dec_pop, stat == "pop"), desc(values))$geo)

bar_compare_dec_pop <- ggplot(filter(compare_dec_pop, !geo %in% unlist(drop_countries_comp)),  aes(x=geo, y=values, fill=stat)) + geom_bar(stat="identity", position="dodge") + theme_bw() + theme(legend.position="bottom") + scale_fill_grey(start = 0.4, end = 0.7)

bar_compare_dec_pop 

xxx 2015, 2016? xxx

The Decisions: GENCONV + HUMSTAT + SUB_PROT + TEMP_PROT + REJECTED = TOTAL_POS + REJECTED = TOTAL

Now, how did our 15 countries decide in 2017? We reduced too much in the previous step, we need all the decisions now, not only the TOTAL.

major_geo <-filter(migr_asydcfsta, geo %in% major_geo_total$geo, time >= "2017-01-01", sex == "T", age == "TOTAL", citizen == "AF", geo != "EU28", geo != "TOTAL") %>% 
  select(geo,decision,values) 
major_geo$geo <- factor(major_geo$geo, levels = arrange(filter(major_geo, decision == "TOTAL"), desc(values))$geo)
# bring decisions into correct order
major_geo$decision <- factor(major_geo$decision, levels=c("REJECTED","TEMP_PROT","HUMSTAT","SUB_PROT","GENCONV","TOTAL_POS","TOTAL"))
# define grey palette
dec_palette_grey=c("#000000","#C0C0C0","#D3D3D3","#E4E4E4","#E8E8E8","#C8C8C8","#808080")

dec_bar <- ggplot(filter(major_geo, decision != "TOTAL" & decision != "TOTAL_POS")) + geom_col(aes(x=geo, y=values, fill=decision)) 
dec_bar +  scale_fill_manual(values = dec_palette_grey) + theme_bw() + theme(legend.position="bottom")

looking at the percentages:

dec_bar_fill <- ggplot(filter(major_geo, decision != "TOTAL" & decision != "TOTAL_POS")) + geom_col(aes(x=geo, y=values, fill=decision), position="fill") + scale_fill_grey() + theme_bw() + theme(legend.position="bottom")
dec_bar_fill

To decypher the decision codes, download the decision dictionary from https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=dic%2Fde%2Fdecision.dic

decision.dic <- read.delim(file = "/tmp/decision.dic", col.names = c("code","label"))
filter(decision.dic, code %in% levels(major_geo$decision))

Analysis:

Next questions: Is the picture (number and result of decisions) similar for SY? Eritrea? TOTAL? Deportations? We barely have numbers from DE ...



muc-fluechtlingsrat/r-eurostat-refugees documentation built on May 23, 2019, 8:20 a.m.