Patterns of convergence and divergence within the convergEU package
In convergEU: Monitoring Convergence of EU Countries

library(ggplot2)
library(dplyr)
library(tidyverse)
library(eurostat)
library(purrr)
library(tibble)
library(tidyr)
library(formattable) 
library(kableExtra)
library(caTools)
library(readxl)

library(convergEU)

knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.width = 4,
  fig.height = 5,
  dpi=200
)

Finding patterns of convergence and divergence within the convergEU package

The convergEU package makes it possible to obtain patterns of change over time for indicators in the European Union (EU) by invoking the ms_pattern_ori function:

help(ms_pattern_ori)

The ms_pattern_ori function identifies patterns for both lowBest and highBest indicators. More specifically, the function defines the following patterns through numerical labels and corresponding string labels:

1: Catching up
2: Flattening
3: Inversion
4: Outperforming
5: Slower pace
6: Diving
7: Defending better
8: Escaping
9: Falling away
10: Underperforming
11: Recovering
12: Reacting better
13: Paralleling better over
14: Paralleling equal over
15: Paralleling worse over
16: Paralleling worse under
17: Paralleling equal under
18: Paralleling better under
19: Crossing
20: Crossing reversed
21: Other (Inspection)

It is important to note that, when finding patterns for indicators of type "low is better", we assume that the higher the indicator value, the worse the considered socio-economic feature in a given member state (MS). Instead of creating new labels to tag patterns for this class of indicators, we transform the original indicator, noting that the absolute position of the values is not relevant when judging whether a given pattern is present. Thus, indicators of type "low is better" are transformed by calculating, for each original observation, its distance from the maximum value. If the original indicator decreases, the transformed value increases, and the pattern recognition scheme applies in the same way as for indicators of type "high is better".
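The transformation described above can be sketched in base R. This is a minimal illustration on made-up values, assuming that the maximum is taken over all observations in the dataset; the member state codes AA and BB are hypothetical, and the code is not the package's internal implementation.

```r
# Hypothetical "low is better" indicator for two made-up member states
toy <- data.frame(time = 2000:2002,
                  AA = c(9, 8, 6),
                  BB = c(5, 5, 4))
vals <- as.matrix(toy[, c("AA", "BB")])

# Assumption: distance of every observation from the overall maximum
transformed <- max(vals) - vals

# AA falls from 9 to 6, so its transformed values rise from 0 to 3
transformed[, "AA"]
```

After the transformation, an improving (decreasing) indicator shows up as an increasing series, which is why the same pattern labels can be reused.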

The graphical plots of the defined patterns for each type of indicator (lowBest or highBest) are available by invoking the patt_legend function:

help(patt_legend)

When considering indicators of type highBest, the graphical representation of the patterns is as follows:

highind<-patt_legend(indiType="highBest")
highind

while for the lowBest type of indicators the plot of the patterns is:

lowhind<-patt_legend(indiType="lowBest")
lowhind

For further details on the defined patterns, we refer to the Eurofound report "Upward convergence in the EU: Concepts, measurements and indicators" (Monitoring convergence in the European Union, 2018, pp. 25-26).

To illustrate the points discussed above in practice, let's consider a first example based on the emp_20_64_MS dataset, whose indicator is of type highBest. To obtain the patterns for this type of indicator, we invoke the ms_pattern_ori function as follows:

myemp <-ms_pattern_ori(emp_20_64_MS, "time",type="highBest")

The output of the ms_pattern_ori function consists of the usual three list components: "$res", which contains the results; "$msg", which possibly carries messages for the user; and "$err", which is a string containing an error message if an error occurs:

names(myemp)
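Given this three-component convention, a defensive habit is to inspect the error component before using the results. The sketch below mimics the convention with a hypothetical stand-in function, toy_result, which is not part of convergEU; it also assumes that the error component is NULL when nothing went wrong.

```r
# Hypothetical stand-in that mimics the res/msg/err convention
toy_result <- function(x) {
  out <- list(res = NULL, msg = NULL, err = NULL)
  if (!is.numeric(x)) {
    out$err <- "Error: input is not numeric."
    return(out)
  }
  out$res <- mean(x)
  out
}

ok <- toy_result(c(1, 2, 3))
bad <- toy_result("a")

# Use the results only when no error has been reported
if (is.null(ok$err)) print(ok$res)
if (!is.null(bad$err)) message(bad$err)
```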

The "$res" component of the output contains the numerical labels for the patterns as well as their string labels:

mypattemp<-myemp$res$mat_label_tags
mypattempn<-myemp$res$mat_without_summaries
mypattempn
mypattemp

Let's look at one of the obtained patterns in more detail. To this end, we consider the time period 2006-2007 and the country France ("FR"), for which the obtained pattern is "Slower pace":

mypattemp[["2006/2007"]][12]

The calculated pattern is plotted below, where the dashed blue line refers to France and the solid black line refers to the EU average:

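# Separate the indicator values from the time column
matRaw1 <- dplyr::select(emp_20_64_MS, -time)
matRawT <- dplyr::select(emp_20_64_MS, time)
# Unweighted average across member states for each year
EUavemp <- dplyr::bind_cols(matRawT, EUavempp = apply(matRaw1, 1, mean))
# Rows 5 and 6 hold the two years of the period 2006-2007
EUavemph <- EUavemp[5:6, ]
# Values for France over the same two years
avehu <- emp_20_64_MS[5:6, "FR"]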
gFR <- ggplot() + geom_point(aes(x=EUavemph$time,y=EUavemph$EUavempp),color='black') +
  geom_point(aes(x=EUavemph$time,y=avehu$FR),color='blue') +
  geom_line(aes(x=EUavemph$time,y=EUavemph$EUavempp),color='black') +
  geom_line(aes(x=EUavemph$time,y=avehu$FR),color='blue',linetype = 2) + 
  ggtitle("Employment rate indicator: 2006-2007")+
  labs(y="France", x="Time")+
  theme(axis.text.x=element_blank())
gFR
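The black line in the plot above is an unweighted average across member states, computed row by row. A minimal base R sketch of that step on made-up values (the codes AA, BB and CC are hypothetical) shows that rowMeans gives the same result as apply with mean:

```r
# Hypothetical values for three made-up member states over two years
toy <- data.frame(time = 2006:2007,
                  AA = c(60, 62),
                  BB = c(70, 71),
                  CC = c(65, 65))
vals <- toy[, setdiff(names(toy), "time")]

# Unweighted average across member states, one value per year
eu_avg <- rowMeans(vals)

# Gap between one member state and the average in each year
gap <- toy$AA - eu_avg
```

Here the gap for AA shrinks from -5 to -4, which is the kind of relative movement the patterns are meant to summarise.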

The interpretation of the "Slower pace" pattern is straightforward, as illustrated in the Eurofound report "Upward convergence in the EU: Concepts, measurements and indicators" (Monitoring convergence in the European Union, 2018, p. 25).

A second example relates to an indicator of type "low is better". To this end, let's consider the indicator "Unemployment rate by sex, age and educational attainment - annual averages", for which the data are stored in the une_educ_a.xls file (Subsection 4.2, Tutorial for analyzing convergence with the convergEU package). First, we import the data from the xls file as explained in detail in the Tutorial (Subsection 4.2):

# library(readxl)
file_name <- system.file("vign/une_educ_a.xls", package = "convergEU")
myxls2<-read_excel(file_name,
                   sheet="Data",range = "A12:AP22", na=":")
myxls2 <- dplyr::mutate(myxls2, `TIME/GEO` = as.numeric(`TIME/GEO`))

where file_name holds the path (possibly including the drive and folders) to the une_educ_a.xls file. Then the EU27_2020 cluster of MS is selected, the data are checked for unsuitable features, and missing values are imputed as follows:

EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
myxls<- dplyr::select(myxls2,`TIME/GEO`,all_of(EU27estr))
check_data(myxls)
myxls3<- dplyr::rename(myxls,time=`TIME/GEO`)
myxlsf <- impute_dataset(myxls3, timeName ="time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2],
                         tailMiss = c("cut", "constant")[2])$res
check_data(myxlsf)

in order to obtain the final dataset myxlsf for calculating patterns.
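To give an intuition for the "constant" option chosen above for leading and trailing missing values, the sketch below copies the nearest observed value into the gaps at both ends of a series. It assumes that this is what "constant" means; fill_ends_constant is a hypothetical helper written for illustration, not the package's implementation, and it leaves interior gaps untouched.

```r
# Hypothetical helper: fill leading and trailing NAs with the nearest
# observed value (an assumed reading of the "constant" option)
fill_ends_constant <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)  # nothing observed, nothing to copy
  first <- min(obs)
  last <- max(obs)
  if (first > 1) x[seq_len(first - 1)] <- x[first]
  if (last < length(x)) x[(last + 1):length(x)] <- x[last]
  x
}

filled <- fill_ends_constant(c(NA, NA, 7.1, 6.8, NA))
filled
```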

The indicator une_educ_a is of type "low is better"; thus, the syntax to find the patterns is as follows:

myres <-  ms_pattern_ori(myxlsf, "time",type="lowBest")

where the "$res" component of the output contains the numerical labels for the patterns as well as their string labels:

mypattl<-myres$res$mat_label_tags
mypattn<-myres$res$mat_without_summaries
mypattn
mypattl

For this indicator, let's take the time period 2015-2016 and the MS Finland ("FI") for which the obtained pattern is again "Slower pace":

mypattl$`2015/2016`[14]

In this case, given that the indicator is of type "low is better", the plot for the "Slower pace" pattern is:

matRaw <- select(myxlsf,-time)
EUave <- cbind( select(myxlsf,time) ,EUave=apply(matRaw,1,mean))
EUave1<-EUave[7:8,]
avef<-matRaw[7:8, "FI"]

gfr<-ggplot() + geom_point(aes(x=EUave1$time,y=EUave1$EUave),color='black') + 
  geom_point(aes(x=EUave1$time,y=avef$FI),color='blue') +
  geom_line(aes(x=EUave1$time,y=EUave1$EUave),color='black') + 
  geom_line(aes(x=EUave1$time,y=avef$FI),color='blue',linetype = 2) + 
  ggtitle("Unemployment rate indicator: 2015-2016")+
  labs(y="Finland", x="Time")+
  theme(axis.text.x=element_blank())
gfr

where the dashed blue line refers to Finland and the solid black line refers to the EU average. Recall that, unlike the previous indicator of type "high is better", for this type of indicator the "Slower pace" pattern should be interpreted under the assumption that the higher the indicator value, the worse the considered socio-economic feature in a member state.

To illustrate other possible patterns, consider again the first example based on the emp_20_64_MS dataset (an indicator of type "high is better"). For example, let's take the time period 2011-2012 and the MS Portugal ("PT"), for which the pattern is "Crossing":

mypattemp$`2011/2012`[23]

where the corresponding plot for this pattern is:

EUavemph1<-EUavemp[10:11,]
avept<-emp_20_64_MS[10:11,c("PT")]
gpt<-ggplot() + geom_point(aes(x=EUavemph1$time,y=EUavemph1$EUavempp),color='black') +
  geom_point(aes(x=EUavemph1$time,y=avept$PT),color='blue') +
  geom_line(aes(x=EUavemph1$time,y=EUavemph1$EUavempp),color='black') +
  geom_line(aes(x=EUavemph1$time,y=avept$PT),color='blue',linetype = 2) + 
  ggtitle("Employment rate indicator: 2011-2012")+
  labs(y="Portugal", x="Time")+
  theme(axis.text.x=element_blank())
gpt

Similarly, for the MS Lithuania ("LT") in the same time period, the obtained pattern is "Crossing reversed", and the plot for this pattern is:

avelt<-emp_20_64_MS[10:11,c("LT")]
glt<-ggplot() + geom_point(aes(x=EUavemph1$time,y=EUavemph1$EUavempp),color='black') +
  geom_point(aes(x=EUavemph1$time,y=avelt$LT),color='blue') +
  geom_line(aes(x=EUavemph1$time,y=EUavemph1$EUavempp),color='black') +
  geom_line(aes(x=EUavemph1$time,y=avelt$LT),color='blue',linetype = 2) + 
  ggtitle("Employment rate indicator: 2011-2012")+
  labs(y="Lithuania", x="Time")+
  theme(axis.text.x=element_blank())
glt

Types of convergence/divergence within the convergEU package

Convergence and divergence may be strict or weak, upward or downward. In the convergEU package, the function upDo_CoDi assesses the type of convergence/divergence occurring for a given indicator, a collection of member states and a period of time:

help(upDo_CoDi)

The interpretation depends on the type of indicator, that is, "highBest" or "lowBest". Let's consider a first example based on the emp_20_64_MS dataset, in which the indicator "Employment rate" is of type highBest. Suppose that we wish to determine the type of convergence/divergence using the year 2008 as reference time (time_0), the year 2010 as target time (time_t), and the variance to summarize dispersion (argument heter_fun):

Empconv<-upDo_CoDi(emp_20_64_MS,
              timeName = "time",
              indiType = "highBest",
              time_0 = 2008,
              time_t = 2010,
              heter_fun = "var")

The output of the upDo_CoDi function consists of the usual three list components: "$res", which contains the results; "$msg", which possibly carries messages for the user; and "$err", which is a string containing an error message if an error occurs:

names(Empconv)
Empconv$msg
Empconv$err

Looking at the "$res" component in more detail, it contains, for example:

a statement of whether convergence or divergence has occurred:

Empconv$res$declaration_type

a statement of the type of convergence/divergence, i.e. strict or weak, downward or upward:

Empconv$res$declaration_strict
Empconv$res$declaration_weak

a list of the labels of the member states for which the difference in a given indicator between the target time and the reference time is greater than zero:

Empconv$res$declaration_split$names_incre

a list of the labels of the member states for which the difference in a given indicator between the target time and the reference time is smaller than zero:

Empconv$res$declaration_split$names_decre

a list of the differences in a given indicator between the target time and the reference time for each member state:

Empconv$res$diffe_MS

the average of such differences:

Empconv$res$diffe_averages

the dispersion at the reference time and at the target time, respectively, computed according to the type of dispersion specified in the argument heter_fun (the variance in this example):

Empconv$res$dispersions
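The components listed above can be tied together with a simplified base R sketch on made-up values. It assumes the usual reading of the terms: convergence when dispersion decreases, upward when the average difference is positive, and strict when all member states move in the same direction. The codes AA to DD are hypothetical, and the logic is an illustration of the idea rather than the exact rules implemented in upDo_CoDi.

```r
# Hypothetical values for four made-up member states
x0 <- c(AA = 60, BB = 65, CC = 70, DD = 75)  # reference time
xt <- c(AA = 64, BB = 67, CC = 71, DD = 74)  # target time

# Differences between target and reference time, per member state
diffe <- xt - x0

# Dispersion at both times, here summarised by the variance
disp <- c(time_0 = var(x0), time_t = var(xt))

# Simplified declarations (assumed reading of the terms)
type <- if (disp[["time_t"]] < disp[["time_0"]]) "Convergence" else "Divergence"
direction <- if (mean(diffe) > 0) "upward" else "downward"
strictness <- if (all(diffe > 0) || all(diffe < 0)) "strict" else "weak"

paste(type, "of type", strictness, direction)
```

In this toy case dispersion falls and the average rises, but DD decreases, so the sketch reports weak upward convergence.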

Note that if the argument heter_fun is set to "var" (as in this example) or "sd" (i.e. the standard deviation), then these statistics are calculated using $n-1$ as the denominator, i.e. the number of observations decreased by 1. Users who prefer $n$ as the denominator may use the function pop_var as follows:

Empconvpop<-upDo_CoDi(emp_20_64_MS,
                   timeName = "time",
                   indiType = "highBest",
                   time_0 = 2008,
                   time_t = 2010,
                   heter_fun = "pop_var")
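The difference between the two denominators can be checked on a small made-up vector. The helper pop_var_sketch below is a hypothetical base R illustration of a variance with denominator $n$; it is named differently on purpose so that it is not confused with the package's own pop_var.

```r
# Hypothetical indicator values for four member states
x <- c(60, 65, 70, 75)
n <- length(x)

# Sample variance: base R divides by n - 1
sample_var <- var(x)

# Illustrative population variance: divides by n
pop_var_sketch <- function(v) mean((v - mean(v))^2)
population_var <- pop_var_sketch(x)

# The two differ only by the factor (n - 1) / n
c(sample = sample_var, population = population_var)
```

With few member states the factor $(n-1)/n$ is noticeably below 1, so the choice of denominator changes the reported dispersion, although both times are scaled by the same factor when $n$ is unchanged.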

User-developed functions are also allowed in the argument heter_fun, as illustrated in the following example for an indicator of type lowBest. To this end, we consider the dataset myxlsf introduced in the previous section for the indicator "Unemployment rate by sex, age and educational attainment". We choose the year 2009 as reference time and the year 2011 as target time. Moreover, in this case we consider the following user-developed function for summarizing dispersion:

# Interquartile range divided by the mean
diffQQmu <- function(vettore) {
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}
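To see what this dispersion measure returns, it can be applied to a small made-up vector before it is passed to the package. The values in toy_values are hypothetical; the function is repeated so that the sketch runs on its own.

```r
# Same function as above, repeated so the sketch is self-contained
diffQQmu <- function(vettore) {
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}

toy_values <- c(2, 4, 6, 8, 10)

# Quartiles are 4 and 8 and the mean is 6, so the result is 4 / 6.
# quantile() returns a named value, hence unname() for a clean number.
toy_disp <- unname(diffQQmu(toy_values))
toy_disp
```

Because the measure is scaled by the mean, it is unit-free, which makes it comparable across indicators measured on different scales.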

This user-developed function diffQQmu is specified in the argument heter_fun of the function upDo_CoDi:

unempconvvar<-upDo_CoDi(myxlsf,
                      timeName = "time",
                      indiType = "lowBest",
                      time_0 = 2009,
                      time_t = 2011,
                      heter_fun = "diffQQmu")
unempconvvar

According to the obtained results, for this indicator there is evidence of divergence of type "weak upward" in the period 2009-2011.

References

The following references may be consulted for further details on convergence:

Eurofound (2018), Upward convergence in the EU: Concepts, measurements and indicators, Publications Office of the European Union, Luxembourg; by: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe.
Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini (2020) Tutorial: analysis of convergence with the convergEU package. Package vignette URL https://www.eurofound.europa.eu/system/files/2022-04/introduction-to-the-convergeu-package-0.6.4-tutorial-v2-apr2022.pdf