Tutorial: analysis of convergence with the convergEU package

Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini

2020-03-02


The convergEU R package is a set of S3 functions and data objects suited for the analysis of economic and social convergence of Member States (MS) within the European Union (EU).

This vignette is intended to be a “learn-by-doing” tutorial with the convergEU suite of functions. More precisely, the tutorial illustrates in detail how to analyze convergence of MS with data coming from different sources, including locally accessible Eurofound data (Section 1) and data downloaded from the Eurostat repository (Section 2).

To better illustrate these points, examples with several EU indicators are described in detail.

IMPORTANT: the analysis of convergence is performed on clean and imputed data, i.e. a tidy dataset in the format years by countries. This means that the dataset must always include a time variable and must not contain qualitative variables, vectors of strings or missing values.

In the convergEU package, the check_data() function may be called to check for the presence of unsuitable features that must be resolved before starting the analysis. The returned object states whether the dataset is ready for calculations; if it is not, the error component states why the check failed.
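As a rough illustration of the kind of conditions involved, the following base R sketch tests a small dataset for three requirements: a time variable is present, all columns are numeric, and there are no missing values. The helper name precheck_tidy and the toy data are hypothetical; this is not how check_data() is implemented, and it does not replace it.

```r
# Hypothetical helper (not part of convergEU): a rough pre-check of a
# years-by-countries dataset. It returns a character vector of problems,
# which is empty when none of the three checks fails.
precheck_tidy <- function(df, timeName = "time") {
  problems <- character(0)
  if (!timeName %in% names(df)) {
    problems <- c(problems, "time variable not found")
  }
  if (!all(vapply(df, is.numeric, logical(1)))) {
    problems <- c(problems, "non-numeric columns present")
  }
  if (any(is.na(df))) {
    problems <- c(problems, "missing values present")
  }
  problems
}

# Toy data, invented for illustration only
toy <- data.frame(time = c(2003, 2007),
                  sex  = c("Total", "Total"),
                  AT   = c(7.85, NA),
                  BE   = c(7.52, 7.54))

precheck_tidy(toy)                             # two problems reported
precheck_tidy(toy[1, c("time", "AT", "BE")])   # no problems reported
```

A dataset that passes such a pre-check should still be submitted to check_data(), which remains the reference check within the package.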

Examples on how to transform data to achieve the required format are described in the tutorial.

1 Locally accessible datasets: Eurofound data

Locally accessible Eurofound datasets are available from the convergEU package without any active Internet connection. In this Section, we describe in detail the analysis of convergence within the convergEU package by considering locally accessible Eurofound data, taking two EU indicators as examples, the first of which is the Mean Life Satisfaction indicator (Section 1.1).

Before describing in detail how to analyze convergence with locally accessible datasets, we briefly illustrate the datasets available in the convergEU package, which can be listed as follows:

data(package = "convergEU")

The object dbEUF2018meta contains a description (metadata) of the locally available data within the convergEU package, without any downloading from the Eurofound website. These data refer to two dimensions: quality of life and working conditions. For each dimension, metainformation is provided for several indicators, for example:

names(dbEUF2018meta)
#>  [1] "DIMENSION"           "SUBDIMENSION"        "INDICATOR"          
#>  [4] "Code_in_database"    "Official_code"       "Unit"               
#>  [7] "Source_organisation" "Source_reference"    "Disaggregation"     
#> [10] "Bookmark_URL"
print(dbEUF2018meta, n=200,width=200)   
#> # A tibble: 13 x 10
#>    DIMENSION          SUBDIMENSION      
#>    <chr>              <chr>             
#>  1 Quality of life    Life satisfaction 
#>  2 Quality of life    Health            
#>  3 Quality of life    Health            
#>  4 Quality of life    Quality of society
#>  5 Quality of life    Quality of society
#>  6 Quality of life    Quality of society
#>  7 Quality of life    Quality of society
#>  8 Quality of life    Quality of society
#>  9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#>    INDICATOR        Code_in_database Official_code        Unit  Source_organisa…
#>    <chr>            <chr>            <chr>                <chr> <chr>           
#>  1 Mean life satis… lifesatisf       y16_q4               --    Eurofound       
#>  2 Mean health sta… health           y16_q48              --    Eurofound       
#>  3 Percentage of p… goodhealth_p     y16_q48              %     Eurofound       
#>  4 Mean level of t… trustlocal       y16_q35f             --    Eurofound       
#>  5 Level of involv… volunt           y16_q29a             --    Eurofound       
#>  6 Percentage of p… volunt_p         y16_q29a             %     Eurofound       
#>  7 Hours per week … caring_h         y16_q43a             hours Eurofound       
#>  8 Social Exclusio… socialexc_i      y16_socexindex       index Eurofound       
#>  9 JQI_Skills and … JQIskill_i       wq_slim -  JQI SLIM… index Eurofound       
#> 10 JQI_Physical en… JQIenviron_i     envsec_slim - JQI S… index Eurofound       
#> 11 JQI_Intensity i… JQIintensity_i   intens_slim - JQI S… index Eurofound       
#> 12 JQI_Working tim… JQItime_i        wlb_slim - JQI SLIM… index Eurofound       
#> 13 Exposition to d… exposdiscr_p     disc_d -  Having be… %     Eurofound       
#>    Source_reference Disaggregation Bookmark_URL                                 
#>    <chr>            <chr>          <chr>                                        
#>  1 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  2 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  3 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  4 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  5 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  6 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  7 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  8 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  9 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…

where the “Code_in_database” information is of particular importance, since it must be provided by the user when extracting the local data for a given indicator. The metadata can also be browsed interactively in the following way:

View(dbEUF2018meta)
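Looking up the codes to pass to the extraction step can also be done programmatically. The sketch below uses a small stand-in data frame (the object name meta is illustrative) with three rows copied from the metadata printed above, so that it runs without the package.

```r
# Stand-in for dbEUF2018meta: three rows copied from the printed metadata,
# restricted to the columns needed here.
meta <- data.frame(
  DIMENSION        = c("Quality of life", "Quality of life", "Working conditions"),
  SUBDIMENSION     = c("Life satisfaction", "Health", "Working conditions"),
  Code_in_database = c("lifesatisf", "goodhealth_p", "exposdiscr_p"),
  Unit             = c("--", "%", "%")
)

# Codes of all indicators belonging to a given dimension
codes_qol <- meta$Code_in_database[meta$DIMENSION == "Quality of life"]
codes_qol

# Codes of all indicators expressed as percentages
codes_pct <- meta$Code_in_database[meta$Unit == "%"]
codes_pct
```

The same subsetting expressions should apply to dbEUF2018meta itself, since it has columns with these names.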

The second locally accessible object in the convergEU package is dbEurofound, a raw local Eurofound database that is accessed as follows:

data(dbEurofound)
head(dbEurofound)
#> # A tibble: 6 x 17
#>    time geo   geo_label sex   lifesatisf health goodhealth_p trustlocal volunt
#>   <dbl> <chr> <chr>     <chr>      <dbl>  <dbl>        <dbl>      <dbl>  <dbl>
#> 1  1960 AD    Andorra   Fema…         NA     NA           NA         NA     NA
#> 2  1960 AD    Andorra   Males         NA     NA           NA         NA     NA
#> 3  1960 AD    Andorra   Total         NA     NA           NA         NA     NA
#> 4  1960 AL    Albania   Fema…         NA     NA           NA         NA     NA
#> 5  1960 AL    Albania   Males         NA     NA           NA         NA     NA
#> 6  1960 AL    Albania   Total         NA     NA           NA         NA     NA
#> # … with 8 more variables: volunt_p <dbl>, caring_h <dbl>, socialexc_i <dbl>,
#> #   JQIskill_i <dbl>, JQIenviron_i <dbl>, JQIintensity_i <dbl>,
#> #   JQItime_i <dbl>, exposdiscr_p <dbl>

The names of the variables in dbEurofound are:

names(dbEurofound)
#>  [1] "time"           "geo"            "geo_label"      "sex"           
#>  [5] "lifesatisf"     "health"         "goodhealth_p"   "trustlocal"    
#>  [9] "volunt"         "volunt_p"       "caring_h"       "socialexc_i"   
#> [13] "JQIskill_i"     "JQIenviron_i"   "JQIintensity_i" "JQItime_i"     
#> [17] "exposdiscr_p"

while the time variable spans the interval:

c(min(dbEurofound$time), max(dbEurofound$time))
#> [1] 1960 2017

The database is not complete over this time range for all the countries considered.
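To see where observations are actually available, one can count the non-missing values of an indicator by year. The following sketch uses an invented long-format data frame (toy_db) that mimics the layout of dbEurofound; the 1960 rows are set to missing purely for illustration.

```r
# Toy stand-in for dbEurofound (long format: one row per time and country).
# The 2003 and 2007 values are rounded figures shown later in this tutorial;
# the NA entries for 1960 are an assumption made for the example.
toy_db <- data.frame(
  time       = c(1960, 2003, 2007, 1960, 2003, 2007),
  geo        = c("AT", "AT", "AT", "BE", "BE", "BE"),
  lifesatisf = c(NA, 7.85, 6.95, NA, 7.52, 7.54)
)

has_data <- !is.na(toy_db$lifesatisf)

# Years with at least one non-missing observation
years_available <- sort(unique(toy_db$time[has_data]))
years_available

# Number of non-missing observations per year
obs_per_year <- tapply(has_data, toy_db$time, sum)
obs_per_year
```

Applied to dbEurofound, the same idea helps to choose a time interval for which the indicator of interest is observed.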

The third object in the convergEU package is dbMetaEUStat which provides metainformation on downloadable data from the Eurostat repository. For further details on this object, please see Section 2 related to data downloaded from the Eurostat repository.

The object emp_20_64_MS in the convergEU package is a Eurofound dataset used during the development of this package. It refers to the Employment rate indicator from 2002 to 2018 for the age class 20 to 64. The dataset emp_20_64_MS contains source data downloaded from the Eurostat repository and reshaped into the required format time by countries:

data("emp_20_64_MS")
head(emp_20_64_MS)
#> # A tibble: 6 x 29
#>    time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2002  70.9  64.7  56.5  75.1  71.7  68.8  78.3  68.3  62.8  63.2  73.2  68.6
#> 2  2003  71.3  64.5  58.7  75.4  71    68.4  77.4  69.1  63.8  64.3  72.9  69.7
#> 3  2004  68.4  65.8  61.2  75.7  70.1  67.9  78.1  70.2  64.3  65.2  72.5  69.2
#> 4  2005  70.4  66.5  61.9  74.4  70.7  69.4  78    72    64.4  67.5  73    69.4
#> 5  2006  71.6  66.5  65.1  75.8  71.2  71.1  79.4  75.9  65.6  69    73.9  69.4
#> 6  2007  72.8  67.7  68.4  76.8  72    72.9  79    76.9  65.8  69.7  74.8  69.9
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
emp_20_64_MS
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002  70.9  64.7  56.5  75.1  71.7  68.8  78.3  68.3  62.8  63.2  73.2  68.6
#>  2  2003  71.3  64.5  58.7  75.4  71    68.4  77.4  69.1  63.8  64.3  72.9  69.7
#>  3  2004  68.4  65.8  61.2  75.7  70.1  67.9  78.1  70.2  64.3  65.2  72.5  69.2
#>  4  2005  70.4  66.5  61.9  74.4  70.7  69.4  78    72    64.4  67.5  73    69.4
#>  5  2006  71.6  66.5  65.1  75.8  71.2  71.1  79.4  75.9  65.6  69    73.9  69.4
#>  6  2007  72.8  67.7  68.4  76.8  72    72.9  79    76.9  65.8  69.7  74.8  69.9
#>  7  2008  73.8  68    70.7  76.5  72.4  74    78.7  77.1  66.3  68.5  75.8  70.5
#>  8  2009  73.4  67.1  68.8  75.3  70.9  74.2  76.1  70    65.6  64    73.5  69.5
#>  9  2010  73.9  67.6  64.7  75    70.4  75    74.9  66.8  63.8  62.8  73    69.3
#> 10  2011  74.2  67.3  62.9  73.4  70.9  76.5  74.8  70.6  59.6  62    73.8  69.2
#> 11  2012  74.4  67.2  63    70.2  71.5  76.9  74.3  72.2  55    59.6  74    69.4
#> 12  2013  74.6  67.2  63.5  67.2  72.5  77.3  74.3  73.3  52.9  58.6  73.3  69.5
#> 13  2014  74.2  67.3  65.1  67.6  73.5  77.7  74.7  74.3  53.3  59.9  73.1  69.2
#> 14  2015  74.3  67.2  67.1  67.9  74.8  78    75.4  76.5  54.9  62    72.9  69.5
#> 15  2016  74.8  67.7  67.7  68.7  76.7  78.6  76    76.6  56.2  63.9  73.4  70  
#> 16  2017  75.4  68.5  71.3  70.8  78.5  79.2  76.6  78.7  57.8  65.5  74.2  70.6
#> 17  2018  76.2  69.7  72.4  73.9  79.9  79.9  77.5  79.5  59.5  67    76.3  71.3
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>

This dataset is ready for the analysis without further processing. Details on how to download datasets similar to emp_20_64_MS are provided in Section 2. In what follows, we illustrate how to analyse convergence with locally accessible data, starting from the first example, the Mean Life Satisfaction indicator.

NOTE: within the convergEU package, Eurofound data are statically stored. Please update the package to obtain the most recent version of Eurofound data.

1.1 The Mean Life Satisfaction indicator

1.1.1 Data preparation step

All the indicators are extracted from the locally accessible Eurofound database in the first step of the analysis through the extract_indicator_EUF function. This amounts to choosing a time interval, an indicator and a set of countries (MS). As already stated above, information on the locally accessible indicators is provided in the object dbEUF2018meta as follows:

names(dbEUF2018meta)
head(dbEUF2018meta)

where “Code_in_database” is the information that should be provided to extract a given indicator. Information on sets of countries is available through the function convergEU_glb(), which contains lists of constants and setups for the convergEU package. This function generates global static objects for clusters of countries with their corresponding labels, as well as metainformation on the stored indicators:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"

The first object, EUcodes, in the function convergEU_glb() relates to the labels for all the MS:

convergEU_glb()$EUcodes
#> # A tibble: 28 x 4
#>    pae   paeF   paeN paeS 
#>    <chr> <chr> <dbl> <chr>
#>  1 1-AT  1-AT      1 AT   
#>  2 10-ES 10-ES    10 ES   
#>  3 11-FI 11-FI    11 FI   
#>  4 12-FR 12-FR    12 FR   
#>  5 13-HR 13-HR    13 HR   
#>  6 14-HU 14-HU    14 HU   
#>  7 15-IE 15-IE    15 IE   
#>  8 16-IT 16-IT    16 IT   
#>  9 17-LT 17-LT    17 LT   
#> 10 18-LU 18-LU    18 LU   
#> # … with 18 more rows

The objects containing clusters of MS with their corresponding labels are: Eurozone, EA, EU12, EU15, EU19, EU25, EU27, EU27_2007, EU27_2019, EU27_2020 and EU28. It must be noted that EU27 refers to Member States after 1 February 2020, while EU28 is a valid tag up to 3 March 2020. The clusters EU27_2020 and EU27_2007 as defined by Eurofound are also available; for further details see https://ec.europa.eu/eurostat/help/faq/brexit. For example, the clusters EU12 and EU27_2020 of MS consist of the following countries with their corresponding labels:

convergEU_glb()$EU12
#> $dates
#> [1] "01-11-1993" "31/12/1994"
#> 
#> $memberStates
#> # A tibble: 12 x 2
#>    MS             codeMS
#>    <chr>          <chr> 
#>  1 Belgium        BE    
#>  2 Denmark        DK    
#>  3 France         FR    
#>  4 Germany        DE    
#>  5 Greece         EL    
#>  6 Ireland        IE    
#>  7 Italy          IT    
#>  8 Luxembourg     LU    
#>  9 Netherlands    NL    
#> 10 Portugal       PT    
#> 11 Spain          ES    
#> 12 United-Kingdom UK
convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#> 
#> $memberStates
#> # A tibble: 27 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Belgium     BE    
#>  2 Denmark     DK    
#>  3 France      FR    
#>  4 Germany     DE    
#>  5 Greece      EL    
#>  6 Ireland     IE    
#>  7 Italy       IT    
#>  8 Luxembourg  LU    
#>  9 Netherlands NL    
#> 10 Portugal    PT    
#> # … with 17 more rows

Please note that EU19, Eurozone and EA are synonyms:

convergEU_glb()$EU19
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES
convergEU_glb()$Eurozone
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES
convergEU_glb()$EA
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES

Thus, it is important to note that if the user selects the cluster Eurozone, then the calculations are performed NOT for all the MS in the EU, but only for those in the clusters EA/EU19.

The object geoRefEUF contains MS as well as other countries (neighbouring countries of the EU):

convergEU_glb()$geoRefEUF
#> # A tibble: 77 x 2
#>    geo   geo_label  
#>    <chr> <chr>      
#>  1 AD    Andorra    
#>  2 AL    Albania    
#>  3 AM    Armenia    
#>  4 AT    Austria    
#>  5 AZ    Azerbaijan 
#>  6 BE    Belgium    
#>  7 BG    Bulgaria   
#>  8 BY    Belarus    
#>  9 CH    Switzerland
#> 10 CY    Cyprus     
#> # … with 67 more rows

The convergEU_glb() function also provides metainformation on different indicators, for example:

convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#>    Official_code Code_in_database indicator Official_code_p… subSelection
#>    <chr>         <chr>            <chr>     <chr>            <chr>       
#>  1 lfsa_argaed   act_15-64_p      Activity… lfsa_argaed      <NA>        
#>  2 lfsa_ergaed   emp_20_64_p      Employme… lfsa_ergaed      <NA>        
#>  3 lfsa_ergacob  emp_nonat_p      Employme… lfsa_ergacob     <NA>        
#>  4 lfsa_urgan    unem_15_74_p     Unemploy… lfsa_urgan       <NA>        
#>  5 une_educ_a    unem_pri_p       Unemploy… une_educ_a       <NA>        
#>  6 lfsa_upgan    ltunemp_p        Long-ter… lfsa_upgan       <NA>        
#>  7 edat_lfse_20  neet_p           NEET's r… edat_lfse_20     <NA>        
#>  8 une_rt_a      youthunemp_p     Youth un… une_rt_a         <NA>        
#>  9 ilc_lvhl36    temptoperm_p     Transiti… ilc_lvhl36       <NA>        
#> 10 ilc_lvhl30    parttofull_p     Transiti… ilc_lvhl30       <NA>        
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>

The general procedure to extract locally accessible data for a given indicator is illustrated below for the Mean Life Satisfaction indicator, labelled lifesatisf in the column “Code_in_database” within the object dbEUF2018meta containing the metainformation:

names(dbEUF2018meta)
#>  [1] "DIMENSION"           "SUBDIMENSION"        "INDICATOR"          
#>  [4] "Code_in_database"    "Official_code"       "Unit"               
#>  [7] "Source_organisation" "Source_reference"    "Disaggregation"     
#> [10] "Bookmark_URL"

We proceed to extract the data for the lifesatisf indicator by invoking the extract_indicator_EUF function, and by considering the time interval from 2003 to 2016 and the cluster EU19 of MS:

myTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[1],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
myTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex      AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Total  7.85  7.52  7.22  7.36  5.94  6.69  7.49  8.14  6.96  7.70  7.22
#> 2  2007 Total  6.95  7.54  7.05  7.16  6.72  6.58  7.25  8.23  7.32  7.59  6.59
#> 3  2011 Total  7.66  7.38  7.16  7.20  6.28  6.16  7.47  8.08  7.23  7.39  6.88
#> 4  2016 Total  7.92  7.31  6.54  7.31  6.73  5.26  6.95  8.07  7.17  7.68  6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The result is a dataset in the format time by countries, stored in the list component named “$res”; the other components are “$msg”, which possibly carries messages for the user, and “$err”, a string containing an error message if an error occurs. Please note that the extraction is conditional on gender.
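Since functions in the package return this three-component list, it can be convenient to inspect “$err” before using “$res”. The sketch below shows one possible pattern on two hand-built lists with the same layout; the helper get_res is hypothetical and not part of convergEU.

```r
# Hypothetical helper (not part of convergEU): stop if an error is reported,
# show any message, and otherwise return the $res component.
get_res <- function(out) {
  if (!is.null(out$err)) stop(out$err)
  if (!is.null(out$msg)) message(out$msg)
  out$res
}

# Hand-built lists mimicking the layout of the package output
ok  <- list(res = data.frame(time = 2003, AT = 7.85), msg = NULL, err = NULL)
bad <- list(res = NULL, msg = NULL,
            err = "Error: string  variables in the dataframe.")

get_res(ok)

# The error is turned into an R condition that can be caught as usual
caught <- tryCatch(get_res(bad), error = function(e) conditionMessage(e))
caught
```

With this pattern a failed extraction stops a script immediately instead of propagating a NULL result to later steps.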

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

In this example for the lifesatisf indicator, the data are extracted for males and females together (gender "Total"). Alternatively, data for females only or for males only can be extracted as follows:

# Extract data only for females:
feTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[2],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
feTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex       AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Femal…  7.93  7.38  7.13  7.48  5.91  6.75  7.43  8.27  6.97  7.89  7.12
#> 2  2007 Femal…  7.20  7.49  7.06  7.17  6.77  6.55  7.14  8.30  7.34  7.66  6.54
#> 3  2011 Femal…  7.63  7.47  7.07  7.27  6.26  6.13  7.51  8.27  7.17  7.41  6.86
#> 4  2016 Femal…  7.87  7.27  6.50  7.28  6.81  5.30  6.97  8.10  7.24  7.66  6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
# Extract data only for males:
maTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[3],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
maTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex      AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Males  7.75  7.66  7.32  7.23  5.98  6.63  7.56  7.99  6.95  7.51  7.32
#> 2  2007 Males  6.67  7.60  7.03  7.16  6.67  6.62  7.37  8.16  7.31  7.51  6.64
#> 3  2011 Males  7.69  7.28  7.26  7.13  6.30  6.19  7.41  7.87  7.29  7.37  6.91
#> 4  2016 Males  7.98  7.35  6.58  7.34  6.64  5.23  6.93  8.04  7.09  7.71  6.55
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

First, it is convenient to assign a meaningful name to the locally extracted data for both males and females of the lifesatisf indicator:

TBlf<-myTBlf$res
dim(TBlf)
#> [1]  4 21

that is, 4 rows and 21 columns, with variable names:

names(TBlf)
#>  [1] "time" "sex"  "AT"   "BE"   "CY"   "DE"   "EE"   "EL"   "ES"   "FI"  
#> [11] "FR"   "IE"   "IT"   "LT"   "LU"   "LV"   "MT"   "NL"   "PT"   "SI"  
#> [21] "SK"

It is worth recalling that the analysis of convergence is performed on a clean and imputed tidy dataset in the format years by countries. Thus, to compute convergence measures the dataset cannot contain qualitative variables, vectors of strings or missing values, and a time variable must be present. The check_data() function is called to check for the presence of unsuitable features that must be resolved before starting the analysis.

For example, for the TBlf data we have:

check_data(TBlf)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: string  variables in the dataframe."

where the offending string variable is sex, which should be removed as follows:

lifesatFin <- dplyr::select(TBlf,-sex)

Once the string variable sex is deleted, the check_data() function is called again:

check_data(lifesatFin)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Thus, lifesatFin is the final object to start the analysis of convergence.
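When several non-numeric columns have to be dropped, they can be selected by type rather than by name. The following base R sketch does this on a toy data frame with rounded values taken from the extraction above; the object names are illustrative.

```r
# Toy dataset with one string column, as returned by the extraction step
toy <- data.frame(time = c(2003, 2007),
                  sex  = c("Total", "Total"),
                  AT   = c(7.85, 6.95),
                  BE   = c(7.52, 7.54))

# Keep only the numeric columns (time and country indicators)
is_num  <- vapply(toy, is.numeric, logical(1))
toy_num <- toy[, is_num, drop = FALSE]
names(toy_num)
```

The resulting object contains the time variable and the country columns only, which is the layout expected by check_data().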

1.1.2 Beta convergence for the lifesatFin dataset

Several measures of convergence have recently been proposed by Eurofound (Eurofound, 2018). Four measures of convergence are implemented in the convergEU package: Beta, Sigma, Gamma and Delta convergence. In what follows, each measure is illustrated in detail for the lifesatFin dataset.

Beta convergence is calculated in the convergEU package through the beta_conv() function. The dataset should be in the format years by countries, and no variables other than time and the country indicators may be present. If the time variable is not ordered, the dataset must be sorted by time first. For further details on how to sort unordered time variables, see the first example in the help documentation:

help(beta_conv)
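Independently of the example in the help page, sorting a dataset by its time variable can be sketched in base R as follows. The data frame below is a toy example built from rounded values shown earlier for AT and BE, with rows deliberately shuffled.

```r
# Toy dataset with rows in non-chronological order
unsorted <- data.frame(time = c(2016, 2003, 2011, 2007),
                       AT   = c(7.92, 7.85, 7.66, 6.95),
                       BE   = c(7.31, 7.52, 7.38, 7.54))

# Reorder the rows by increasing time and reset the row names
sorted <- unsorted[order(unsorted$time), ]
rownames(sorted) <- NULL
sorted

# Quick check that the time variable is now in increasing order
!is.unsorted(sorted$time)
```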

For the lifesatFin dataset, the Beta convergence is calculated as follows:

require(ggplot2)
require(dplyr)
require(tibble)

empBC <- beta_conv(lifesatFin, 
                   time_0 = 2003, 
                   time_t = 2016, 
                   all_within = FALSE, 
                   timeName = "time")

where the reference time (time_0) is 2003 while the target time (time_t) is 2016. Note that all_within = FALSE is the default, so that only the first and the last year are considered, i.e. 2003 and 2016 in this example. If all_within = TRUE is specified instead, all the years in the specified time interval are considered.

It must also be noted that in the implementation of the beta_conv() function the same reference time is maintained across different years, and the left-hand side is divided by the amount of time elapsed (i.e. time_t - time_0), since the argument useTau = TRUE by default. The division by the number of years elapsed is skipped when the argument useTau = FALSE is specified:

beta_conv(lifesatFin, 
          time_0 = 2003, 
          time_t = 2016, 
          all_within = FALSE, 
          timeName = "time",
          useTau = FALSE)

The output of the beta_conv() function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

empBC
#> $res
#> $res$workTB
#> # A tibble: 19 x 3
#>    deltaIndic indic countries
#>         <dbl> <dbl> <chr>    
#>  1   0.000743  2.06 AT       
#>  2  -0.00216   2.02 BE       
#>  3  -0.00768   1.98 CY       
#>  4  -0.000537  2.00 DE       
#>  5   0.00962   1.78 EE       
#>  6  -0.0185    1.90 EL       
#>  7  -0.00576   2.01 ES       
#>  8  -0.000606  2.10 FI       
#>  9   0.00222   1.94 FR       
#> 10  -0.000174  2.04 IE       
#> 11  -0.00736   1.98 IT       
#> 12   0.0134    1.69 LT       
#> 13   0.00211   2.04 LU       
#> 14   0.00970   1.72 LV       
#> 15   0.00254   1.99 MT       
#> 16   0.00193   2.02 NL       
#> 17   0.0105    1.79 PT       
#> 18  -0.00209   1.95 SI       
#> 19   0.00954   1.73 SK       
#> 
#> $res$model
#> 
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#> 
#> Coefficients:
#> (Intercept)        indic  
#>     0.07130     -0.03639  
#> 
#> 
#> $res$summary
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.0713    0.0231      3.09 0.00669
#> 2 indic        -0.0364    0.0119     -3.05 0.00720
#> 
#> $res$beta1
#> [1] -0.0363931
#> 
#> $res$adj.r.squared
#> [1] 0.3160892
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

In more detail, the component “$res” contains, for example:

  • a table (workTB) with the transformed data for each country, on which the model is estimated:
empBC$res$workTB
#> # A tibble: 19 x 3
#>    deltaIndic indic countries
#>         <dbl> <dbl> <chr>    
#>  1   0.000743  2.06 AT       
#>  2  -0.00216   2.02 BE       
#>  3  -0.00768   1.98 CY       
#>  4  -0.000537  2.00 DE       
#>  5   0.00962   1.78 EE       
#>  6  -0.0185    1.90 EL       
#>  7  -0.00576   2.01 ES       
#>  8  -0.000606  2.10 FI       
#>  9   0.00222   1.94 FR       
#> 10  -0.000174  2.04 IE       
#> 11  -0.00736   1.98 IT       
#> 12   0.0134    1.69 LT       
#> 13   0.00211   2.04 LU       
#> 14   0.00970   1.72 LV       
#> 15   0.00254   1.99 MT       
#> 16   0.00193   2.02 NL       
#> 17   0.0105    1.79 PT       
#> 18  -0.00209   1.95 SI       
#> 19   0.00954   1.73 SK
  • summary of the model results:
empBC$res$summary
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.0713    0.0231      3.09 0.00669
#> 2 indic        -0.0364    0.0119     -3.05 0.00720
  • the \(\beta_{1}\) estimated coefficient:
empBC$res$beta1
#> [1] -0.0363931

and its standard error:

empBC$res$summary$std.error[2]
#> [1] 0.01192145
  • the Adjusted R-Squared:
empBC$res$adj.r.squared
#> [1] 0.3160892

as well as several components of the statistical model estimated by OLS (Ordinary Least Squares), e.g. estimated coefficients, residuals, fitted values and the residual degrees of freedom. Note that a standard two-tailed test is reported (p-value), together with the adjusted R-squared.
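To make the regression behind these results more concrete, the sketch below rebuilds the two transformed variables by hand for five countries. It assumes that deltaIndic is the difference of the logarithms at time_t and time_0 divided by the elapsed time, and that indic is the logarithm at time_0; this reading appears consistent with the workTB values shown above, but it is an assumption rather than the documented implementation. Since rounded inputs and only five countries are used, the estimates differ from the full 19-country fit.

```r
# Rounded values copied from the extraction output (first five countries)
y0 <- c(AT = 7.85, BE = 7.52, CY = 7.22, DE = 7.36, EE = 5.94)  # 2003
yt <- c(AT = 7.92, BE = 7.31, CY = 6.54, DE = 7.31, EE = 6.73)  # 2016
tau <- 2016 - 2003                                              # elapsed years

# Assumed transformation: log-difference per year against the initial log level
workTB <- data.frame(
  deltaIndic = (log(yt) - log(y0)) / tau,
  indic      = log(y0),
  countries  = names(y0)
)
workTB

# OLS fit of the growth term on the initial level
fit   <- stats::lm(deltaIndic ~ indic, data = workTB)
beta1 <- unname(stats::coef(fit)["indic"])
beta1
```

A negative slope in such a regression indicates that countries starting from lower levels tended to grow faster over the period.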

A plot of the transformed data and of the fitted straight line may also be useful to inspect the Beta convergence results. The function beta_conv_graph is implemented specifically to obtain this graphical representation, as follows:

mybetaplot<-beta_conv_graph(betaRes=empBC,
 indiName = 'Mean Life Satisfaction',
 time_0 = 2003,
 time_t = 2016)

mybetaplot

where in the argument betaRes the user should specify the output of the beta_conv function, while time_0 and time_t refer to the starting and ending time respectively. It must be noted that the name of the indicator is specified in the argument indiName as a string.


1.1.3 Sigma convergence for the lifesatFin dataset

Next, Sigma convergence is calculated by invoking the sigma_conv function on the lifesatFin dataset, which is already in the required format years by countries. Further details are available in the help documentation:

help(sigma_conv)

Note that if the arguments time_0 and time_t are not specified, then all years are considered:

mySTB <- sigma_conv(lifesatFin,timeName="time")
mySTB
#> $res
#> # A tibble: 4 x 5
#>    time stdDev     CV  mean devianceT
#>   <dbl>  <dbl>  <dbl> <dbl>     <dbl>
#> 1  2003  0.814 0.117   6.97     12.6 
#> 2  2007  0.593 0.0836  7.09      6.69
#> 3  2011  0.542 0.0765  7.09      5.58
#> 4  2016  0.687 0.0977  7.03      8.98
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Otherwise, it is possible to select a time window for which the Sigma convergence is calculated as follows:

sigma_conv(lifesatFin,time_0 = 2007,time_t = 2011)

The output of the sigma_conv function is the usual list:

  • “$res” which contains the values of Sigma convergence (called stdDev or CV) along time. This component also contains the mean and deviance values along time;
  • “$msg” which possibly carries messages for the user;
  • “$err” containing an error message, if an error has occurred. If no error has occurred, the message is ‘NULL’.
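The quantities in “$res” can be reproduced by hand for a single year. The sketch below does so for five countries in 2003, assuming that the standard deviation is computed with divisor n (the number of countries); this assumption is suggested by the relation between stdDev and devianceT in the output above and is not taken from the package documentation.

```r
# Rounded 2003 values for five countries, copied from the extraction output
x <- c(AT = 7.85, BE = 7.52, CY = 7.22, DE = 7.36, EE = 5.94)

n         <- length(x)
m         <- mean(x)               # mean across countries
devianceT <- sum((x - m)^2)        # sum of squared deviations from the mean
stdDev    <- sqrt(devianceT / n)   # assumed divisor: n, not n - 1
CV        <- stdDev / m            # coefficient of variation

c(stdDev = stdDev, CV = CV, mean = m, devianceT = devianceT)
```

A decrease of stdDev or CV over time points towards Sigma convergence, while an increase points towards divergence.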

It is also possible to obtain a graphical representation of the standard deviation and the coefficient of variation obtained for the Sigma convergence by invoking the sigma_conv_graph function as follows:

mysigmaplot<-sigma_conv_graph(sigmaconvOut=mySTB,
                              time_0 = 2007,
                              time_t = 2011, 
                              aggregation='EU19')
mysigmaplot

where in the argument sigmaconvOut the user should specify the output of the sigma_conv function, while time_0 and time_t refer to the starting and ending time respectively. Note that the considered cluster of MS is specified in the argument aggregation as a string.

1.1.4 Gamma convergence for the lifesatFin dataset

The third measure of convergence implemented in the convergEU package is the Gamma convergence that is computed by invoking the gamma_conv function. As usual, the dataset should be in the format years by countries. For the lifesatFin dataset, the Gamma convergence is calculated as follows:

gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time")
#> $res
#> [1] 0.5762807
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

with 2003 as reference time (argument ref) and 2016 the last time (argument last) to be considered. The output is the usual list of the three components “$res”, “$msg” and “$err”. It is also possible to print the ranks of countries for each time by specifying the option printRanks=T:

gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time", printRanks = T)
#> Ranks:
#>      AT BE CY DE EE EL ES FI FR IE IT LT LU LV MT NL PT SI SK
#> [1,] 18 14 10 12  4  6 13 19  7 17  9  1 16  2 11 15  5  8  3
#> [2,]  8 14  9 10  7  4 12 19 13 16  5  3 18  1 15 17  2 11  6
#> [3,] 16 13  9 10  3  1 15 19 11 14  7  5 18  2 12 17  6  8  4
#> [4,] 18 13  5 12  7  1 10 19 11 15  6  4 17  2 14 16  9  8  3
#> $res
#> [1] 0.5762807
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that the Gamma convergence can also be calculated over several time steps by invoking the gamma_conv_msteps function. For example, for the lifesatFin dataset:

GCms <- gamma_conv_msteps(lifesatFin,startTime=2003,endTime=2016, timeName = "time")
GCms
#> $res
#> # A tibble: 4 x 2
#>    time gammaConv
#>   <dbl>     <dbl>
#> 1  2003    NA    
#> 2  2007     0.4  
#> 3  2011     0.510
#> 4  2016     0.576
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

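The gamma_conv_msteps function considers the chosen time window (2003 to 2016 in the example above) and reports one Gamma value for each time after the first (2007, 2011 and 2016). Note that the value reported for 2016 coincides with the result of gamma_conv computed with ref=2003 and last=2016, which suggests that each value covers the window from the starting time up to that time rather than a single pair of subsequent years.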
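
The printed Gamma values can be reproduced in base R from the ranks shown by printRanks. The sketch below is an assumption inferred from those numbers, not the package source: gamma_sketch (a hypothetical helper) divides the variance of the ranks summed over a window by the variance of the reference-year ranks multiplied by the number of time points plus one.

```r
# Ranks copied from the printRanks output (rows: 2003, 2007, 2011, 2016).
ranks <- rbind(
  c(18, 14, 10, 12, 4, 6, 13, 19, 7, 17, 9, 1, 16, 2, 11, 15, 5, 8, 3),
  c(8, 14, 9, 10, 7, 4, 12, 19, 13, 16, 5, 3, 18, 1, 15, 17, 2, 11, 6),
  c(16, 13, 9, 10, 3, 1, 15, 19, 11, 14, 7, 5, 18, 2, 12, 17, 6, 8, 4),
  c(18, 13, 5, 12, 7, 1, 10, 19, 11, 15, 6, 4, 17, 2, 14, 16, 9, 8, 3)
)
rownames(ranks) <- c("2003", "2007", "2011", "2016")

# Hypothetical helper: ratio of variances over the first nTimes rows.
gamma_sketch <- function(rk, nTimes) {
  window <- rk[seq_len(nTimes), , drop = FALSE]
  var(colSums(window)) / var((nTimes + 1) * window[1, ])
}

# One value per window starting in 2003 and ending in 2007, 2011 and 2016.
gammaSteps <- sapply(2:4, function(k) gamma_sketch(ranks, k))
names(gammaSteps) <- rownames(ranks)[2:4]
round(gammaSteps, 3)
```

Under this assumption the three values round to 0.4, 0.510 and 0.576, as in the gamma_conv_msteps output, and the last one equals the gamma_conv result 0.5762807.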

1.1.5 Delta convergence for the lifesatFin dataset

The last measure of convergence implemented in the convergEU package is the Delta convergence computed by invoking the delta_conv function. Given a dataset of quantitative indicators along time, the Delta convergence is a statistic describing departures from the best performer. To calculate Delta convergence, it is sufficient to specify the dataset and the time variable name. For example, for the lifesatisf indicator, the results will be:

delta_conv(lifesatFin,"time")
#> $res
#> # A tibble: 4 x 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  22.3
#> 2  2007  21.7
#> 3  2011  18.8
#> 4  2016  19.8
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

with an output given by the components “$res” (results), “$msg” (message) and “$err” (error). Note that the delta_conv function can also return the declaration of convergence. To this end, the argument extended should be set to TRUE. For example, for the lifesatisf indicator the syntax is as follows:

delta_conv(lifesatFin,"time", extended=TRUE)
#> $res
#> $res$delta_conv
#> # A tibble: 4 x 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  22.3
#> 2  2007  21.7
#> 3  2011  18.8
#> 4  2016  19.8
#> 
#> $res$differences
#>             AT        BE        CY        DE       EE       EL        ES FI
#> [1,] 0.2900786 0.6212335 0.9162025 0.7786398 2.200235 1.444585 0.6465211  0
#> [2,] 1.2864127 0.6902347 1.1851578 1.0695601 1.511724 1.651325 0.9796877  0
#> [3,] 0.4182611 0.6992726 0.9145322 0.8737502 1.800689 1.916749 0.6120100  0
#> [4,] 0.1501288 0.7651687 1.5381799 0.7660656 1.345254 2.811414 1.1226888  0
#>             FR        IE        IT       LT        LU       LV        MT
#> [1,] 1.1742063 0.4352756 0.9211011 2.694273 0.4490004 2.563162 0.8151641
#> [2,] 0.9090123 0.6468849 1.6447601 1.911084 0.3330436 2.193221 0.6741619
#> [3,] 0.8511925 0.6848898 1.1936469 1.376650 0.2882094 1.835249 0.8437681
#> [4,] 0.9063277 0.3888202 1.5160985 1.590870 0.1709514 1.750492 0.5050535
#>             NL       PT       SI       SK
#> [1,] 0.5916190 2.146662 1.094598 2.480780
#> [2,] 0.3640208 2.043251 1.005834 1.555401
#> [3,] 0.3846836 1.310083 1.125367 1.691687
#> [4,] 0.3358555 1.202321 1.219944 1.670450
#> 
#> $res$difference_last_first
#>          AT          BE          CY          DE          EE          EL 
#> -0.13994980  0.14393520  0.62197733 -0.01257420 -0.85498142  1.36682844 
#>          ES          FI          FR          IE          IT          LT 
#>  0.47616768  0.00000000 -0.26787853 -0.04645538  0.59499741 -1.10340261 
#>          LU          LV          MT          NL          PT          SI 
#> -0.27804899 -0.81267023 -0.31011057 -0.25576353 -0.94434166  0.12534523 
#>          SK 
#> -0.81032991 
#> 
#> $res$strict_conv_ini_last
#> [1] FALSE
#> 
#> $res$label_strict
#> [1] " "
#> 
#> $res$converg_ini_last
#> [1] TRUE
#> 
#> $res$label_conver
#> [1] "convergence"
#> 
#> $res$diffe_delta
#> [1] -2.507256
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
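
The components of the extended output can be related to each other with a few lines of base R. The sketch below uses made-up data and assumes an indicator for which high values are best. The reading of the two logical flags is an assumption that is consistent with the printed output, not a statement about the package internals.

```r
# Toy data in the "years by countries" layout (values are made up).
toyDF <- data.frame(
  time = c(2003, 2016),
  AA = c(8, 8),
  BB = c(6, 7.5),
  CC = c(5, 4.5)
)

vals <- as.matrix(toyDF[, c("AA", "BB", "CC")])
best <- apply(vals, 1, max)       # best performer within each year
differences <- best - vals        # gap of each country from the best performer
delta <- rowSums(differences)     # one Delta value per year

# Change of each country's gap and of the total gap, last time minus first time.
difference_last_first <- differences[2, ] - differences[1, ]
diffe_delta <- delta[2] - delta[1]

# Assumed reading of the flags: overall convergence if the total gap shrinks,
# strict convergence only if no country increases its gap.
converg_ini_last <- unname(diffe_delta < 0)
strict_conv_ini_last <- all(difference_last_first <= 0)
```

In this toy example Delta goes from 5 to 4, so the total gap shrinks, while country CC moves further away from the best performer; the same pattern (overall but not strict convergence) appears in the lifesatFin output, where diffe_delta is negative and some entries of difference_last_first are positive.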

In this context, it is also useful to evaluate how much a collection of MS deviates from the EU mean for a given indicator and period of time. To obtain this further information, the demea_change function has been implemented in the convergEU package:

help(demea_change)

This function calculates the differences from the EU mean at subsequent times within each MS; afterwards, the negative and positive differences are summed over the years within each MS. To illustrate the demea_change function in detail, let’s consider the data for the lifesatisf indicator. Suppose that we choose to calculate deviations from the EU mean for Italy, France, Germany and Spain (argument sele_countries), taking 2003 as the starting time (argument time_0) and 2016 as the ending time (argument time_t):

res1<-demea_change(lifesatFin,
                   timeName="time",
                   time_0 = 2003,
                   time_t = 2016,
                   sele_countries= c('IT','FR','DE','ES'),
                   doplot=TRUE)

where the argument doplot is equal to TRUE for also obtaining the plot of the calculated differences. Note that if the argument sele_countries is equal to NA, then all countries in the dataset are considered. The output of the demea_change function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

names(res1)
#> [1] "res" "msg" "err"

Looking at the component “$res” in more detail, it contains for example:

  • a tibble with the differences from the EU mean at each time for each MS:
res1$res$resDiffe
#> # A tibble: 4 x 20
#>    time     AT    BE      CY     DE     EE     EL      ES    FI       FR    IE
#>   <dbl>  <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>    <dbl> <dbl>
#> 1  2003  0.882 0.551  0.256  0.393  -1.03  -0.273  0.525  1.17  -0.00245 0.736
#> 2  2007 -0.147 0.449 -0.0454 0.0702 -0.372 -0.512  0.160  1.14   0.231   0.493
#> 3  2011  0.572 0.291  0.0760 0.117  -0.810 -0.926  0.379  0.991  0.139   0.306
#> 4  2016  0.890 0.275 -0.498  0.274  -0.305 -1.77  -0.0829 1.04   0.133   0.651
#> # … with 9 more variables: IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>,
#> #   NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
  • a tibble with the changes, between subsequent times, of the differences in absolute value for each MS:
res1$res$diffe_abs_diff
#> # A tibble: 3 x 20
#>    time     AT      BE      CY      DE     EE    EL     ES      FI       FR
#>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>    <dbl>
#> 1  2007 -0.735 -0.101  -0.210  -0.323  -0.656 0.239 -0.365 -0.0320  0.228  
#> 2  2011  0.426 -0.158   0.0306  0.0466  0.438 0.415  0.219 -0.149  -0.0913 
#> 3  2016  0.317 -0.0167  0.422   0.157  -0.505 0.845 -0.296  0.0492 -0.00590
#> # … with 10 more variables: IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> #   MT <dbl>, NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
  • a tibble containing negative and positive differences for the selected MSs:
res1$res$stats
#> # A tibble: 4 x 4
#>   MS    negaSum posiSum  posi
#>   <chr>   <dbl>   <dbl> <int>
#> 1 DE    -0.323    0.204     4
#> 2 ES    -0.661    0.219     7
#> 3 FR    -0.0972   0.228     9
#> 4 IT    -0.302    0.528    11

and minimum and maximum of the calculated differences:

res1$res$miniX
#> [1] -1.161148
res1$res$maxiX
#> [1] 1.498789

To plot the calculated differences, the user should invoke the plot function as follows:

plot(res1$res$res_graph)
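
The relation among the three components shown above can be checked by hand. The sketch below copies a few printed (rounded) values from the output of demea_change and recomputes the next step in base R; the object names are hypothetical and the results agree with the printed ones only up to rounding.

```r
# Deviations of Austria from the EU mean, copied from res1$res$resDiffe.
devAT <- c(0.882, -0.147, 0.572, 0.890)
# Change of the absolute deviation between subsequent times (2007, 2011, 2016).
changeAT <- diff(abs(devAT))

# Columns copied from res1$res$diffe_abs_diff (rounded as printed).
absChange <- data.frame(
  DE = c(-0.323, 0.0466, 0.157),
  ES = c(-0.365, 0.219, -0.296),
  FR = c(0.228, -0.0913, -0.00590)
)

# Sum negative and positive changes separately within each country.
statsSketch <- data.frame(
  MS = names(absChange),
  negaSum = sapply(absChange, function(x) sum(x[x < 0])),
  posiSum = sapply(absChange, function(x) sum(x[x > 0])),
  row.names = NULL
)
statsSketch
```

The values in changeAT are close to the AT column of res1$res$diffe_abs_diff, and the negaSum and posiSum columns match the rows for DE, ES and FR in res1$res$stats to the printed precision.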

1.1.6 Country fiches

The convergEU package provides the function go_ms_fi to prepare country fiches. This function prepares an html file (which can then be saved as pdf) that shows several results for a given reference country, compared to one or more other countries of interest. More precisely, the results relate to several comparisons through the average (i.e. departures from the mean of the selected countries, total increase/decrease in the gap with the EU mean, total gap with respect to the best country performer within each year) as well as patterns of convergence and divergence and scoreboards for the selected countries. By invoking go_ms_fi, the html file can be created for different indicators, timeframes and countries. Note that the go_ms_fi function automatically prepares the country fiches, but it is also able to create a directory along an existing path and to copy the rmarkdown file representing the template into it. The Rmarkdown file is parameterized, so that by passing different parameters the compilation to HTML format takes place with different data, say different indicators and countries.

It is very important to prepare complete data in the required tidy format time by countries; that is, the dataset is made by a time variable and as many other variables as countries that enter into the calculation of the time average (see Section 1.1 for further information). For example, the lifesatFin dataset is in the required format time by countries:

lifesatFin
#> # A tibble: 4 x 20
#>    time    AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT    LT
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003  7.85  7.52  7.22  7.36  5.94  6.69  7.49  8.14  6.96  7.70  7.22  5.44
#> 2  2007  6.95  7.54  7.05  7.16  6.72  6.58  7.25  8.23  7.32  7.59  6.59  6.32
#> 3  2011  7.66  7.38  7.16  7.20  6.28  6.16  7.47  8.08  7.23  7.39  6.88  6.70
#> 4  2016  7.92  7.31  6.54  7.31  6.73  5.26  6.95  8.07  7.17  7.68  6.56  6.48
#> # … with 7 more variables: LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PT <dbl>,
#> #   SI <dbl>, SK <dbl>

Below, a call to the function go_ms_fi() illustrates the syntax for the lifesatFin dataset:

go_ms_fi(
  workDF ='lifesatFin',
  countryRef ='DE',
  otherCountries = "c('IT','UK','FR')",
  time_0 = 2003,
  time_t = 2016,
  tName = 'time',
  indiType = 'highBest',
  aggregation= 'EU12',
  x_angle=  45,
  dataNow=  Sys.time(),
  author = 'A.Student',
  outFile = 'Germany-up2-2016-LIFESATFIN', 
  outDir = 'F:/analysis/IT2018',
  indiName= 'lifesatisf'
)

  

It is very important to emphasize some constraints and quite unusual ways to pass parameters to this function. Note that the first argument is the working dataset, which is passed not as an R object but as a string: the name of a dataset that must be available in the R workspace before invoking go_ms_fi. Likewise, in the example above otherCountries is a string containing the R expression that builds the vector of country codes.
The second argument countryRef is a string with the short name of the member country that will be shown in one-country plots. One key country must be specified, together with some other countries of interest to better decorate graphs and compare performances. Less obviously, the argument indiType specifies whether the indicator is built so that a low value is good for a country (indiType = “lowBest”) or a high value is good (indiType = “highBest”). The value lowBest should be used for indicators where the best possible situation is a low value, for instance the poverty rate. The value highBest should be used when the best situation corresponds to a high value, for instance the employment rate or, as in the example above, mean life satisfaction.
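
To make the string-passing convention concrete, the base R sketch below shows one way in which such strings can be turned back into objects. It only illustrates the idea: toyDF is made-up data, and the use of get() and parse() is an assumption made for illustration, not a description of how go_ms_fi is implemented.

```r
# Made-up dataset in the "time by countries" layout.
toyDF <- data.frame(time = c(2003, 2016), DE = c(7.4, 7.3), IT = c(7.2, 6.6))

workDF <- "toyDF"              # name of the dataset, passed as a string
otherCountries <- "c('IT')"    # R expression, also passed as a string

# The dataset must already exist in the workspace, otherwise get() fails.
stopifnot(exists(workDF))
resolvedDF <- get(workDF)
resolvedCountries <- eval(parse(text = otherCountries))

resolvedDF[, c("time", "DE", resolvedCountries)]
```

This is why a misspelled dataset name, or a dataset that has not been created yet, can only be detected when the function is run.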

Of particular importance is the argument outFile, a string indicating the name of the output file. Similarly, outDir is the path (unit and folders) in which the final compiled html will be stored. The syntax of the path depends on the operating system; for example outDir=‘F:/analysis/IT2018’ indicates that on the usb disk called ‘F’, within the folder ‘analysis’, is located the folder ‘IT2018’ where R will write the country fiche. Note that a disk called ‘F’ must exist and the folder ‘analysis’ must also exist in that unit, while the folder ‘IT2018’ is created by the function if it does not already exist.

Within the above mentioned output directory, besides the compiled html, a further file is stored whose name is the one specified by the argument outFile followed by the string ‘-workspace.RData’. It contains data and plots produced during the compilation of the country fiche, for subsequent use in other technical reports.
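
A possible way to reuse that file is sketched below in base R. The file name is assembled as described in the text; the temporary directory and toyObject are stand-ins so that the sketch runs on its own, and the names of the objects actually stored by go_ms_fi are not assumed here.

```r
outDir <- file.path(tempdir(), "IT2018")    # stand-in for the real outDir
outFile <- "Germany-up2-2016-LIFESATFIN"
dir.create(outDir, showWarnings = FALSE)

# File name as described in the text: outFile followed by '-workspace.RData'.
wsFile <- file.path(outDir, paste0(outFile, "-workspace.RData"))

# Stand-in for the file written during compilation of the fiche.
toyObject <- 1:3
save(toyObject, file = wsFile)

# Load into a separate environment to avoid overwriting existing objects.
fiEnv <- new.env()
load(wsFile, envir = fiEnv)
ls(fiEnv)
```

Loading into a dedicated environment makes it easy to inspect what the file contains before copying anything into the global workspace.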

NOTE: an Internet connection should be available when invoking the go_ms_fi function, so that the results are properly rendered in the html file.


1.1.7 Indicator fiches

An auxiliary function go_indica_fi() is provided in the R package convergEU to produce an indicator fiche, where the output is an html file (which can then be saved as pdf). This means that the user chooses a given indicator, a time window and a set of countries, and the corresponding fiche is automatically produced by invoking the go_indica_fi() function. The html file produced in output reports measures of convergence as well as time series plots, box-plots, and patterns of convergence and divergence. Unlike the country fiches, which require a reference country and a set of other countries for comparison, the indicator fiches contain the results of the analysis of convergence for a given indicator and a given cluster of countries.

When invoking the go_indica_fi() function, an output directory must also be specified. Note that some arguments are passed as strings instead of objects, as described in the previous section for the country fiches.

An example of syntax to invoke the procedure for the lifesatFin dataset is:

go_indica_fi(
  time_0 = 2003,
  time_t = 2016,
  timeName = 'time',
  workDF = 'lifesatFin' ,
  indicaT = 'lifesatisf',
  indiType = c('highBest','lowBest')[1],
  seleMeasure = 'all',
  seleAggre = 'EU19',
  x_angle =  45,
  data_res_download =  FALSE,
  auth = 'A.Student',
  dataNow =  Sys.time(),
  outFile = 'test_IT-lifesatFin',
  outDir = 'F:/analysis/EU19'
)

Note that the argument seleMeasure refers to the set of measures of convergence for which the analysis should be performed; if seleMeasure = ‘all’ as in the syntax above, then the results are produced for all four measures of convergence (Beta, Sigma, Delta and Gamma). The set of countries for which the analysis of convergence should be performed is specified in the argument seleAggre as a string; the default is the cluster of Member States EU27.

For advanced users: if the selection of countries is different from a standard aggregate (EU12, EU19 etc.), the argument “custom” can be used.

The option outDir specifies the path, i.e. an existing unit and a number of existing folders, in which the compiled fiche will be saved. Only the last folder of the path is created at compilation time if it does not exist.

NOTE: an Internet connection should be available when invoking the go_indica_fi function, so that the results are properly rendered in the html file. The fiches have been tested with the browsers Mozilla Firefox and Google Chrome.



1.2 The JQI_Intensity index indicator

A second example for locally accessible Eurofound data is reported below by considering the JQI_Intensity index indicator related to the working conditions dimension. Similarly to what was described for the Mean Life Satisfaction indicator, the first step is data preparation, in order to subsequently perform the analysis of convergence.

1.2.1 Data preparation step: indicators’ extraction from the Eurofound dataset

To extract the data for the JQI_Intensity index indicator from the locally accessible Eurofound database, its label indication is necessary, and it is provided in the object dbEUF2018meta from the column “Code_in_database”:

print(dbEUF2018meta, n=200,width=200) 
#> # A tibble: 13 x 10
#>    DIMENSION          SUBDIMENSION      
#>    <chr>              <chr>             
#>  1 Quality of life    Life satisfaction 
#>  2 Quality of life    Health            
#>  3 Quality of life    Health            
#>  4 Quality of life    Quality of society
#>  5 Quality of life    Quality of society
#>  6 Quality of life    Quality of society
#>  7 Quality of life    Quality of society
#>  8 Quality of life    Quality of society
#>  9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#>    INDICATOR        Code_in_database Official_code        Unit  Source_organisa…
#>    <chr>            <chr>            <chr>                <chr> <chr>           
#>  1 Mean life satis… lifesatisf       y16_q4               --    Eurofound       
#>  2 Mean health sta… health           y16_q48              --    Eurofound       
#>  3 Percentage of p… goodhealth_p     y16_q48              %     Eurofound       
#>  4 Mean level of t… trustlocal       y16_q35f             --    Eurofound       
#>  5 Level of involv… volunt           y16_q29a             --    Eurofound       
#>  6 Percentage of p… volunt_p         y16_q29a             %     Eurofound       
#>  7 Hours per week … caring_h         y16_q43a             hours Eurofound       
#>  8 Social Exclusio… socialexc_i      y16_socexindex       index Eurofound       
#>  9 JQI_Skills and … JQIskill_i       wq_slim -  JQI SLIM… index Eurofound       
#> 10 JQI_Physical en… JQIenviron_i     envsec_slim - JQI S… index Eurofound       
#> 11 JQI_Intensity i… JQIintensity_i   intens_slim - JQI S… index Eurofound       
#> 12 JQI_Working tim… JQItime_i        wlb_slim - JQI SLIM… index Eurofound       
#> 13 Exposition to d… exposdiscr_p     disc_d -  Having be… %     Eurofound       
#>    Source_reference Disaggregation Bookmark_URL                                 
#>    <chr>            <chr>          <chr>                                        
#>  1 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  2 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  3 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  4 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  5 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  6 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  7 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  8 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  9 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…

The set of MS is chosen from those available by invoking the convergEU_glb() function:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"
convergEU_glb()$EU15$memberStates$MS
#>  [1] "Belgium"        "Denmark"        "France"         "Germany"       
#>  [5] "Greece"         "Ireland"        "Italy"          "Luxembourg"    
#>  [9] "Netherlands"    "Portugal"       "Spain"          "United-Kingdom"
#> [13] "Austria"        "Finland"        "Sweden"
convergEU_glb()$EU15$memberStates$codeMS
#>  [1] "BE" "DK" "FR" "DE" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "UK" "AT" "FI" "SE"

where the EU15 cluster of MS is selected for this type of indicator. Lastly, a time interval is chosen, say from 1995 to 2015. Therefore, we proceed to extract the “JQIintensity_i” indicator for both males and females:

myTBjqt <- extract_indicator_EUF(
  indicator_code = "JQIintensity_i", #Code_in_database
  fromTime=1995,
  toTime=2015,
  gender= c("Total","Females","Males")[1],
  countries= convergEU_glb()$EU15$memberStates$codeMS
)
myTBjqt
#> $res
#> # A tibble: 5 x 17
#>    time sex      AT    BE    DE    DK    EL    ES    FI    FR    IE    IT    LU
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total  48.8  33.2  40.8  39.0  40.9  34.2  47.1  38.4  39.0  34.1  31.4
#> 2  2000 Total  42.9  37.3  40.9  37.6  43.5  36.2  46.7  39.5  42.2  39.7  37.6
#> 3  2005 Total  47.6  42.8  46.9  47.9  50.5  41.2  49.6  40.5  36.9  41.9  40.6
#> 4  2010 Total  42.1  40.2  44.9  39.1  48.6  38.0  45.9  43.0  47.0  40.8  40.8
#> 5  2015 Total  42.4  41.5  40.2  45.0  49.3  46.5  41.1  42.7  42.8  38.1  42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

which returns a list with the components “$res”, “$msg” and “$err”.
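
Since the functions of the package return this kind of list, a common pattern is to check “$err” before using “$res”. The sketch below uses a small hand-made list (toyOut, hypothetical, with a few values copied from the output above) in place of the real result, and drops the sex column in base R so that only the time variable and the countries remain.

```r
# Hand-made stand-in for the list returned by extract_indicator_EUF.
toyOut <- list(
  res = data.frame(
    time = c(1995, 2000),
    sex = c("Total", "Total"),
    AT = c(48.8, 42.9),
    BE = c(33.2, 37.3)
  ),
  msg = NULL,
  err = NULL
)

if (is.null(toyOut$err)) {
  # Keep the time variable and the countries only.
  toyTB <- toyOut$res[, names(toyOut$res) != "sex"]
} else {
  stop(toyOut$err)
}
names(toyTB)
```

The resulting data frame is in the format years by countries, which is the layout expected by the convergence functions described in this tutorial.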

1.2.2 The issue of missing all the data for a given country

To better explain other data situations that the user could run into, let’s suppose that the values for Austria (“AT”) are completely missing, as in the following dataset myTBjqmiss:

myTBjqmiss<- mutate(myTBjqt$res, AT = as.numeric(NA))
myTBjqmiss
#> # A tibble: 5 x 17
#>    time sex      AT    BE    DE    DK    EL    ES    FI    FR    IE    IT    LU
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total    NA  33.2  40.8  39.0  40.9  34.2  47.1  38.4  39.0  34.1  31.4
#> 2  2000 Total    NA  37.3  40.9  37.6  43.5  36.2  46.7  39.5  42.2  39.7  37.6
#> 3  2005 Total    NA  42.8  46.9  47.9  50.5  41.2  49.6  40.5  36.9  41.9  40.6
#> 4  2010 Total    NA  40.2  44.9  39.1  48.6  38.0  45.9  43.0  47.0  40.8  40.8
#> 5  2015 Total    NA  41.5  40.2  45.0  49.3  46.5  41.1  42.7  42.8  38.1  42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>

Thus, if we proceed to check the data for the presence of unsuited features (missing values, qualitative variables, vectors of strings):

check_data(myTBjqmiss, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

it returns an error message due to the presence of missing values for Austria. Note that in the convergEU package, the function impute_dataset is specifically implemented for the imputation of missing values (see Section 2 and later). Nevertheless, this function requires at least two non-missing values for a given country. Thus, if all the values for a given country are missing, the imputation is not performed. Consequently, if the user proceeds to calculate the four measures of convergence, the following error message is returned for each of them:

#> [1] "Error: one or more missing values in the dataframe."

That is, the error message states that missing values are present in the dataset since the values of Austria are completely missing. There are some possible solutions even though in the literature it is still an open issue.
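
One simple option, shown here only as a sketch, is to detect the countries whose values are all missing and to leave them out before the analysis. This changes the set of countries entering the aggregate, so whether it is acceptable depends on the study. The data below are a small excerpt in the layout of myTBjqmiss, and the object names are hypothetical.

```r
# Small excerpt: Austria is completely missing.
toyMiss <- data.frame(
  time = c(1995, 2000, 2005),
  AT = as.numeric(c(NA, NA, NA)),
  BE = c(33.2, 37.3, 42.8),
  DE = c(40.8, 40.9, 46.9)
)

countryCols <- setdiff(names(toyMiss), "time")
# TRUE for countries without any observed value.
allMissing <- sapply(toyMiss[countryCols], function(x) all(is.na(x)))
names(allMissing)[allMissing]

# Drop those countries and keep everything else.
toyClean <- toyMiss[, c("time", countryCols[!allMissing])]
anyNA(toyClean)
```

Countries with only some missing values are left untouched by this step, so that they can still be handled by imputation.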

Lastly, it becomes clear that when the values for a given country are completely missing in the dataset, neither the country fiches nor the indicator fiches are computed.

2 Downloadable data from the Eurostat repository

2.1 Data preparation step: indicator’s downloading from the Eurostat repository

Eurostat data are downloaded on the fly through an active Internet connection.
The heterogeneity in the structure of the different indicators stored in the database requires some attention. A list of covariates for each indicator is sometimes present besides age and gender, thus their values must be set to produce a tidy dataset in the format time by countries. The indicator “Employment rate” is chosen to illustrate how to prepare downloadable data from the Eurostat repository for the analysis of convergence. The data for a specific indicator are downloaded from the Eurostat repository by invoking the function download_indicator_EUS:

help(download_indicator_EUS)

It amounts to choosing an indicator, a time interval, and a set of countries (MS). The function also allows conditioning on gender, age and other variables, according to the chosen indicator. Note that when the argument rawDump is TRUE, raw data are downloaded, which do not form a tidy dataset; thus, to obtain a tidy dataset in the format years by countries, rawDump=F should be specified.

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

In order to download the data for the “Employment rate” indicator, its indicator code is necessary. The metainformation on the indicator’s labelling is provided by invoking the convergEU_glb() function, as follows:

convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#>    Official_code Code_in_database indicator Official_code_p… subSelection
#>    <chr>         <chr>            <chr>     <chr>            <chr>       
#>  1 lfsa_argaed   act_15-64_p      Activity… lfsa_argaed      <NA>        
#>  2 lfsa_ergaed   emp_20_64_p      Employme… lfsa_ergaed      <NA>        
#>  3 lfsa_ergacob  emp_nonat_p      Employme… lfsa_ergacob     <NA>        
#>  4 lfsa_urgan    unem_15_74_p     Unemploy… lfsa_urgan       <NA>        
#>  5 une_educ_a    unem_pri_p       Unemploy… une_educ_a       <NA>        
#>  6 lfsa_upgan    ltunemp_p        Long-ter… lfsa_upgan       <NA>        
#>  7 edat_lfse_20  neet_p           NEET's r… edat_lfse_20     <NA>        
#>  8 une_rt_a      youthunemp_p     Youth un… une_rt_a         <NA>        
#>  9 ilc_lvhl36    temptoperm_p     Transiti… ilc_lvhl36       <NA>        
#> 10 ilc_lvhl30    parttofull_p     Transiti… ilc_lvhl30       <NA>        
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>

where both the columns selectorUser and Official_code_purified provide this information:

convergEU_glb()$metaEUStat$selectorUser
#>  [1] "lfsa_argaed"    "lfsa_ergaed"    "lfsa_ergacob"   "lfsa_urgan"    
#>  [5] "une_educ_a"     "lfsa_upgan"     "edat_lfse_20"   "une_rt_a"      
#>  [9] "ilc_lvhl36"     "ilc_lvhl30"     "lfsa_egad"      "lfsa_eppgai"   
#> [13] "lfsi_emp_a"     "lfsi_pt_a"      "lfsa_egised"    "earn_gr_gpgr2" 
#> [17] "selfemp_p"      "selfempwe_p"    "lfsa_etgar"     "prc_ppp_ind"   
#> [21] "gov_10dd_edpt1" "nasa_10_ki"     "gov_10a_main"   "labourcost_i"  
#> [25] "labourprod_i"   "compemp_pps"    "earn_nt_net"    "spr_exp_sum"   
#> [29] "expspr_p"       "ilc_pnp3"       "spr_exp_pens"   "spr_pns_ben"   
#> [33] "edat_lfse_14"   "trng_lfse_01"   "edat_lfse_03"   "expedu_p"      
#> [37] "hlth_silc_08"   "exphlth_p"      "isoc_sk_dskl_i" "isoc_bde15b_h" 
#> [41] "isoc_ci_im_i"   "isoc_ec_ibuy"   "isoc_ci_it_en2" "rd_p_perslf"   
#> [45] "rd_e_gerdtot"   "isoc_ciegi_ac"  "exppubserv_p"   "lfsa_eegan2"   
#> [49] "prc_hicp_aind"  "ilc_di12"       "ilc_di11"       "lfst_r_lmder"  
#> [53] "demo_mlexpec"   "hlth_hlye"      "y16_deprindex"  "hlth_dm060"
convergEU_glb()$metaEUStat$Official_code_purified
#>  [1] "lfsa_argaed"    "lfsa_ergaed"    "lfsa_ergacob"   "lfsa_urgan"    
#>  [5] "une_educ_a"     "lfsa_upgan"     "edat_lfse_20"   "une_rt_a"      
#>  [9] "ilc_lvhl36"     "ilc_lvhl30"     "lfsa_egad"      "lfsa_eppgai"   
#> [13] "lfsi_emp_a"     "lfsi_pt_a"      "lfsa_egised"    "earn_gr_gpgr2" 
#> [17] "lfsa_esgan"     "lfsa_esgan"     "lfsa_etgar"     "prc_ppp_ind"   
#> [21] "gov_10dd_edpt1" "nasa_10_ki"     "gov_10a_main"   "nama_10_lp_ulc"
#> [25] "nama_10_lp_ulc" "nama_10_lp_ulc" "earn_nt_net"    "spr_exp_sum"   
#> [29] "gov_10a_exp"    "ilc_pnp3"       "spr_exp_pens"   "spr_pns_ben"   
#> [33] "edat_lfse_14"   "trng_lfse_01"   "edat_lfse_03"   "gov_10a_exp"   
#> [37] "hlth_silc_08"   "gov_10a_exp"    "isoc_sk_dskl_i" "isoc_bde15b_h" 
#> [41] "isoc_ci_im_i"   "isoc_ec_ibuy"   "isoc_ci_it_en2" "rd_p_perslf"   
#> [45] "rd_e_gerdtot"   "isoc_ciegi_ac"  "gov_10a_exp"    "lfsa_eegan2"   
#> [49] "prc_hicp_aind"  "ilc_di12"       "ilc_di11"       "lfst_r_lmder"  
#> [53] "demo_mlexpec"   "hlth_hlye"      "y16_deprindex"  "hlth_dm060"
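
The lookup of an indicator code can also be done programmatically. The sketch below rebuilds by hand the first three rows of the metadata printed above (toyMeta is a hypothetical stand-in for convergEU_glb()$metaEUStat, limited to three columns) and selects the code to be used for the Employment rate indicator.

```r
# First rows of the metadata, copied by hand from the printed output.
toyMeta <- data.frame(
  Official_code = c("lfsa_argaed", "lfsa_ergaed", "lfsa_ergacob"),
  Code_in_database = c("act_15-64_p", "emp_20_64_p", "emp_nonat_p"),
  selectorUser = c("lfsa_argaed", "lfsa_ergaed", "lfsa_ergacob"),
  stringsAsFactors = FALSE
)

# Code to pass as indicator_code for the Employment rate indicator.
emplCode <- toyMeta$selectorUser[toyMeta$Code_in_database == "emp_20_64_p"]
emplCode
```

With the package loaded, the same subsetting could presumably be applied to the full metadata table instead of the hand-made copy.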

Similarly to the locally accessible Eurofound datasets (Section 1), for the data considered in this Section, the sets of countries (MS) are available in the convergEU_glb() function:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"

where, for example, the cluster EU27_2020 of MS is:

convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#> 
#> $memberStates
#> # A tibble: 27 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Belgium     BE    
#>  2 Denmark     DK    
#>  3 France      FR    
#>  4 Germany     DE    
#>  5 Greece      EL    
#>  6 Ireland     IE    
#>  7 Italy       IT    
#>  8 Luxembourg  LU    
#>  9 Netherlands NL    
#> 10 Portugal    PT    
#> # … with 17 more rows

The set of countries for which a given indicator is downloaded should be specified in the argument countries of the function download_indicator_EUS. It is up to the user to take one of the sets of countries available in the convergEU_glb() function or, alternatively, to build their own vector of country codes.
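
When a custom vector of codes is built by hand, it may be worth checking it against a reference list before downloading. The sketch below uses the EU15 codes printed earlier in this tutorial as the reference; myCountries is a hypothetical user selection that deliberately contains a code not used in those lists.

```r
# EU15 codes as printed by convergEU_glb()$EU15$memberStates$codeMS.
eu15codes <- c("BE", "DK", "FR", "DE", "EL", "IE", "IT", "LU",
               "NL", "PT", "ES", "UK", "AT", "FI", "SE")

# Hypothetical user selection; Greece is coded "EL" in the lists above.
myCountries <- c("IT", "FR", "DE", "ES", "GR")

# Codes that are not found in the reference list.
unknownCodes <- setdiff(myCountries, eu15codes)
unknownCodes
```

An empty result means that all the selected codes belong to the reference cluster.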

Once we have obtained the code for the Employment rate indicator (lfsa_ergaed) and chosen the time interval (say from 2005 to 2015) and the set of countries (say the cluster EU27_2020 of MS), the data are downloaded from the Eurostat repository by invoking the download_indicator_EUS function as follows:

emTB <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=T)
emTB

Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values, rawDump=F should be specified:

emprTB <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F)
emprTB

which returns the data in the format years by countries.

The result is a list with the following components:

It is therefore possible to call the same function several times and specify the argument uniqueIdentif as one of the integers in the first (leftmost) column of $msg$Further_Conditioning$available_seleTagLs to obtain the same indicator under different scales and contexts. That is, the user could download the data and visualize the list of values accepted by the uniqueIdentif argument by invoking:

emprTB$msg$Further_Conditioning$available_seleTagLs

Then, the user could choose the scales and contexts of interest.
For example, uniqueIdentif=3:

emprTB1 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 3)

or uniqueIdentif=5:

emprTB2 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 5)

It is also possible to download data only for females or only for males by explicitly specifying it in the argument gender, for example:

emprTB3 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  gender='F',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 1)

emprTB4 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  gender="M",
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 1)

The data can also be downloaded for a specific age interval; to this end, it is useful to list the age intervals available for the emprTB data:

unique(emprTB$msg$Conditioning$ageInterv)
#> [1] Y15-19
#> 32 Levels: Y15-19 Y15-24 Y15-39 Y15-59 Y15-64 Y15-74 Y20-24 Y20-29 ... Y70-74

For example, to download data by considering the age interval Y20-29 and time interval from 1995 to 2015:

emprTB5 <- download_indicator_EUS(
  indicator_code= 'lfsa_ergaed',
  fromTime = 1995,
  toTime = 2015,
  gender='T',
  ageInterv = 'Y20-29',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 1)

or similarly for the age interval Y15-64:

emprTB6 <- download_indicator_EUS(
  indicator_code= 'lfsa_ergaed',
  fromTime = 2005,
  toTime = 2015,
  gender='T',
  ageInterv = 'Y15-64',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump = FALSE, uniqueIdentif = 1)
emprTB6

Note that the ageInterv argument is specified as a string.
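
Since ageInterv is a plain string, a mistyped label is easy to pass by accident. The following is a minimal sketch, in base R, of checking a requested label against the available levels before downloading. The vector available_ages is a hypothetical subset of the levels printed above, and is_valid_age is an illustrative helper, not a convergEU function.

```r
# Hypothetical subset of the age-interval levels printed above.
available_ages <- factor(c("Y15-19", "Y15-24", "Y15-64", "Y20-24", "Y20-29"))

# Illustrative helper (not part of convergEU):
# TRUE only for a single character label found among the available levels.
is_valid_age <- function(label, available) {
  is.character(label) && length(label) == 1 && label %in% levels(available)
}

is_valid_age("Y20-29", available_ages)  # label spelled as in the levels
is_valid_age("20-29", available_ages)   # leading "Y" is missing
```

In real use, the levels would be taken from the $msg$Conditioning$ageInterv component of a previous download rather than typed by hand.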

Next, it is convenient to assign a meaningful name to the downloaded data for the Employment rate indicator. Among the datasets obtained above, we choose emprTB5, which refers to both males and females in the age interval “Y20-29”:

empTBFin <- emprTB5$res
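
Before extracting $res, it may be worth checking the $err component of the returned list. The following is a minimal sketch of that pattern in base R; the helper get_res_or_stop and the two toy lists are hypothetical and only mimic the res/msg/err structure used throughout this tutorial.

```r
# Hypothetical helper: return $res only when $err is NULL, otherwise stop.
get_res_or_stop <- function(out) {
  if (!is.null(out$err)) {
    stop("Step failed: ", paste(out$err, collapse = "; "))
  }
  out$res
}

# Toy objects mimicking the list(res, msg, err) structure (not real downloads).
ok_out  <- list(res = data.frame(time = 2005:2006, DE = c(52.1, 54.2)),
                msg = NULL, err = NULL)
bad_out <- list(res = NULL, msg = NULL,
                err = "Error: one or more missing values in the dataframe.")

toy_res <- get_res_or_stop(ok_out)   # returns the data frame
failed  <- inherits(try(get_res_or_stop(bad_out), silent = TRUE), "try-error")
```

With a real download, the same helper would be applied to the object returned by download_indicator_EUS, for instance emprTB5.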

2.2 Imputation of missing values

As already stated above, the analysis of convergence is performed on clean and imputed data without missing values: a tidy dataset in the format years by countries (see Section 1.1). In this subsection, we describe in detail how to perform imputation of missing values within the convergEU package. It must be emphasized that typical EU indicators are measured over a few years, thus \(10\%\) of missing values within a country may already represent a context that requires substantive reasoning before interpreting results after imputation. The user may or may not go ahead with the analysis depending on the considered context.

Note that, as explained in the previous Section, the imputation of missing values in the convergEU package requires at least two non-missing values for a given country. If the values for a given country are completely missing, imputation is not supported.
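
To judge whether imputation is reasonable, it can help to tabulate, for each country, the share of missing values and whether at least two values are observed. The following is a minimal sketch in base R on a toy excerpt; the NL and DE values are copied from the empTBFin output shown in this section (1995 to 1999), HR is missing over the whole toy window, and the object names are illustrative.

```r
# Toy years-by-countries excerpt (1995-1999); NA where the tutorial shows NA.
toy <- data.frame(
  time = 1995:1999,
  NL = c(NA, 65.7, 69.4, 71.5, 71.6),
  DE = c(56.5, 54.9, 54.9, NA, 61.7),
  HR = c(NA, NA, NA, NA, NA)
)

countries <- setdiff(names(toy), "time")
share_missing <- colMeans(is.na(toy[countries]))  # fraction of NA per country
n_observed    <- colSums(!is.na(toy[countries]))  # observed values per country

data.frame(country = countries,
           share_missing = share_missing,
           at_least_two_observed = n_observed >= 2)
```

In this toy window HR has no observed value at all, so it would not qualify for imputation; on the full dataset HR is observed from 2002 onwards.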

The final downloaded dataset for the Employment rate indicator is empTBFin:

empTBFin
#> # A tibble: 21 x 30
#>    age   sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-… T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  NA    73.5
#>  2 Y20-… T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#>  3 Y20-… T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#>  4 Y20-… T      1998  53.8  70.8  51.4  NA    63.7  NA    53.9  NA    71.5  79.3
#>  5 Y20-… T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#>  6 Y20-… T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#>  7 Y20-… T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>  8 Y20-… T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2  79.6
#>  9 Y20-… T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2  76.8
#> 10 Y20-… T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6  76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> #   SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> #   MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>

Inspection of the printed output reveals that missing values are present. Alternatively, the check_data() function may be called to check for the presence of unsuitable features that must be resolved before starting the analysis (e.g. missing values, string variables):

check_data(empTBFin, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

which reveals the presence of missing values. Thus, imputation of the missing values should be performed prior to the analysis of convergence through the impute_dataset function:

help(impute_dataset)

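The output of impute_dataset is a list with the usual three components: “$res”, “$msg” and “$err”. It must be noted that, for starting or final missing values, the impute_dataset function provides two options for imputation: “cut”, which deletes the years affected by starting or final missing values, and “constant”, which sets such missing values equal to the nearest observed value of the same country.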

For all other missing values within the dataset, deterministic linear imputation is applied in order to obtain an imputed dataset. The examples below help to better explain the missing values imputation procedure with the options “cut” and “constant”. First, let’s impute the starting missing values for the empTBFin dataset with the option “constant”:

EmpImp <- impute_dataset(empTBFin, timeName = "time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         tailMiss = c("cut", "constant")[2],
                         headMiss = c("cut", "constant")[2])$res 

EmpImp
#> # A tibble: 21 x 30
#>    age   sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-… T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  65.7  73.5
#>  2 Y20-… T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#>  3 Y20-… T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#>  4 Y20-… T      1998  53.8  70.8  51.4  58.3  63.7  62.2  53.9  71.4  71.5  79.3
#>  5 Y20-… T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#>  6 Y20-… T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#>  7 Y20-… T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>  8 Y20-… T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2  79.6
#>  9 Y20-… T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2  76.8
#> 10 Y20-… T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6  76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> #   SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> #   MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>

As an example consider the MS Netherlands (“NL”) which has a starting missing value for the year 1995:

View(empTBFin)
print(EmpImp,n=7,width=250)
#> # A tibble: 21 x 30
#>   age    sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>   <fct>  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-29 T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  65.7  73.5
#> 2 Y20-29 T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#> 3 Y20-29 T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#> 4 Y20-29 T      1998  53.8  70.8  51.4  58.3  63.7  62.2  53.9  71.4  71.5  79.3
#> 5 Y20-29 T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#> 6 Y20-29 T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#> 7 Y20-29 T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>      ES    AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL    SK
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  53.4  68.6  44.6  53.1  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 2  53.4  67    47.2  49.2  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 3  57.3  62.3  44.3  49.8  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 4  59.6  65.1  54.6  46.9  74.6  45.2  52.2  42.7  53.3  53.9  72.2  40.4  24.5
#> 5  65.2  65.1  57.9  50.8  74.6  40.1  51.7  43.6  54.5  50.9  72.2  32.7  22.3
#> 6  67.3  63.2  58.4  54.1  74.1  38.9  46.7  44    50    46    72.2  36    19  
#> 7  68.4  61.5  56.3  60.2  77.5  41.6  53.7  45.1  55.5  44.3  73.6  35.6  19  
#>      SI    BG    RO    HR
#>   <dbl> <dbl> <dbl> <dbl>
#> 1  70    32.1  60.9  52.8
#> 2  70    32.1  60.9  52.8
#> 3  67.5  32.1  60.9  52.8
#> 4  62.3  32.1  61.3  52.8
#> 5  55.6  32.1  64.3  52.8
#> 6  54    32.1  65.1  52.8
#> 7  56.4  31.9  65.7  52.8
#> # … with 14 more rows

Note that in the printed output above, which shows the first seven years, the starting missing value for the Netherlands in 1995 is imputed as equal to the value of its first observed year (i.e. 1996). Similarly, the first two missing values for Romania (years 1995 and 1996) are imputed as equal to the value for 1997 (its first observed year). The same holds for the other MS with starting missing values (“CY”, “CZ”, “BG” etc.).
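
The behaviour described above (starting missing values set to the first observed value, interior gaps filled linearly) can be mimicked in base R with stats::approx(). This is only a sketch under that assumption, not the code used inside impute_dataset; the helper name impute_constant_linear is hypothetical, and the NL and DE values are copied from the empTBFin output for 1995 to 1999.

```r
# Hypothetical helper: linear interpolation inside the observed range,
# nearest observed value outside it (rule = 2).
impute_constant_linear <- function(time, y) {
  ok <- !is.na(y)
  # approx() needs at least two observed points, in line with the
  # requirement of two non-missing values per country.
  stats::approx(x = time[ok], y = y[ok], xout = time,
                method = "linear", rule = 2)$y
}

time <- 1995:1999
NL <- c(NA, 65.7, 69.4, 71.5, 71.6)   # starting missing value in 1995
DE <- c(56.5, 54.9, 54.9, NA, 61.7)   # interior missing value in 1998

NL_imp <- impute_constant_linear(time, NL)
DE_imp <- impute_constant_linear(time, DE)
rbind(NL_imp, DE_imp)
```

On these toy series the sketch gives 65.7 for NL in 1995 and 58.3 for DE in 1998, the same values that appear in the EmpImp output.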

Now, we are going to impute the missing values with the option “cut”:


EmpImp1 <- impute_dataset(empTBFin, timeName = "time",
                             countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                             tailMiss = c("cut", "constant")[1],
                             headMiss = c("cut", "constant")[1])$res 
print(EmpImp1,n=35,width=250)
#> # A tibble: 14 x 30
#>    age    sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL
#>    <fct>  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-29 T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2
#>  2 Y20-29 T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2
#>  3 Y20-29 T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6
#>  4 Y20-29 T      2005  51.1  68    52.1  52.1  66.4  60    58.3  59.4  67.2
#>  5 Y20-29 T      2006  51    70    52.9  54.2  68.2  62.4  58.2  57.3  68.6
#>  6 Y20-29 T      2007  51.2  70.6  53.7  55.2  69.2  62.8  57.3  58.4  70.2
#>  7 Y20-29 T      2008  49.8  70.1  53.6  57.5  68.3  56.8  56.1  56.8  70.7
#>  8 Y20-29 T      2009  47.4  67.2  50.8  57    66.4  45.7  52.8  54.4  69  
#>  9 Y20-29 T      2010  47.2  59.5  49.5  57.4  60.3  39.8  49    49.8  66.3
#> 10 Y20-29 T      2011  46.5  56.9  49.7  58.6  52.3  34.5  48.1  50    67.7
#> 11 Y20-29 T      2012  46.9  51.6  48.7  57    45.6  34.1  46.3  51.5  66.5
#> 12 Y20-29 T      2013  42.9  51.2  47.5  57.4  39.3  38.6  41.2  50    62.5
#> 13 Y20-29 T      2014  43.2  51.4  45.8  56.3  42.3  31.9  39.8  44    59.5
#> 14 Y20-29 T      2015  40.3  51.6  43    56.3  42.9  37.1  39.8  56.6  61.3
#>       PT    ES    AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  79.6  68.4  59.8  56.6  62.9  79.3  39.2  48.9  44.9  51.8  52.9  74.9  30.8
#>  2  76.8  69.1  57.1  53.7  57.6  75.4  35.2  38    43.5  52.1  54.2  69.6  30.6
#>  3  76.1  69.6  56.1  52.1  53.4  78    33    55.3  40.9  57.1  53.3  70.1  30.3
#>  4  75.4  70.6  56.1  55.5  53.8  77.1  32.1  50.9  40.2  55.7  57.5  66.9  32.6
#>  5  75.9  72.3  57    54.7  55.2  79.9  34.8  61.2  39.7  61.3  53.9  70.6  34.9
#>  6  75.1  71.9  62.4  58.5  58.7  78.4  38.5  60.1  37.8  61.7  53.9  71.7  39.6
#>  7  75.6  65.7  59.7  60.5  56.9  74.9  38    65.2  37.1  58.4  49.4  72.9  42.9
#>  8  71.1  54.6  55.9  52.4  49.8  74.3  34.5  47.9  32.1  40.9  38.8  68.3  42.1
#>  9  66.4  52.3  55.3  49.4  48    77.1  34.5  43.4  31.2  43.9  24.5  68.7  37  
#> 10  64    49.5  59.3  46.9  49.1  73.1  31.6  53.8  30.8  45.3  28.3  69.9  35.9
#> 11  59.7  43.4  58    47.1  47.2  62    29.9  47.5  29.5  49.9  36.1  70.5  36.4
#> 12  55.1  41.4  54.5  43.9  44.1  58.7  36.4  55.5  31.7  49    36.1  72.1  34  
#> 13  57.3  42.6  54.6  42.2  44.8  63.7  37    57.5  40.7  48.9  42    70.4  34.1
#> 14  60.6  44.2  56.4  38    46.3  56.9  31.6  54.5  43.3  55.7  41.1  67.8  36.2
#>       SK    SI    BG    RO    HR
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  18.7  51.6  32.1  58.8  52.8
#>  2  19.9  51.1  32.2  57.9  48.1
#>  3  19.9  49.6  36.6  54.9  53.3
#>  4  15.1  48.5  33.1  53.4  51.8
#>  5  17.6  46.3  34.6  52.2  48.2
#>  6  19.5  54.8  39.1  51.8  47.9
#>  7  22.5  56.7  40.8  51.6  50.6
#>  8  19    50.7  36.1  49.2  42.7
#>  9  20    47.7  30.8  54.6  41.4
#> 10  23.2  44.3  25.8  50.6  44.7
#> 11  21.9  40.3  26    53    31.4
#> 12  24.2  38.6  26.4  53.4  26.6
#> 13  27.1  38.2  28.9  53.5  23.3
#> 14  30.7  43.6  29.7  54.7  28.7

The printed output shows how the option “cut” actually handles starting missing values. More precisely, note that the years from 1995 to 2001 are now deleted for all countries. This is due to the fact that Croatia has starting missing values from 1995 to 2001.
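
The effect of “cut” seen in the output above can be sketched in base R as keeping only the time window in which every country has already started, and not yet stopped, being observed. The helper cut_window is hypothetical and is not the implementation used by impute_dataset; the BE and HR values are copied from the outputs of this section for 1999 to 2004.

```r
# Hypothetical helper: drop leading/trailing years in which at least one
# country is still (or already) missing. Assumes every country has at
# least one observed value.
cut_window <- function(df, timeName, countries) {
  tt <- df[[timeName]]
  first_obs <- sapply(countries, function(cc) min(tt[!is.na(df[[cc]])]))
  last_obs  <- sapply(countries, function(cc) max(tt[!is.na(df[[cc]])]))
  keep <- tt >= max(first_obs) & tt <= min(last_obs)
  df[keep, , drop = FALSE]
}

toy <- data.frame(
  time = 1999:2004,
  BE = c(56, 57.6, 50.1, 51.9, 50.4, 49.2),
  HR = c(NA, NA, NA, 52.8, 48.1, 53.3)   # starting missing values up to 2001
)

toy_cut <- cut_window(toy, "time", c("BE", "HR"))
toy_cut
```

Because HR is first observed in 2002, the rows for 1999 to 2001 are dropped for BE as well, which is the pattern visible in EmpImp1.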

For all other missing values within the dataset, deterministic linear imputation is applied. Note also that if a country is processed but has no missing values, then none of its numerical values change. If more information is needed on the years in which missing values are located for a given country, say HR, the following simple code helps:

select(filter(empTBFin, is.na(HR)),time,HR) 
#> # A tibble: 7 x 2
#>    time    HR
#>   <dbl> <dbl>
#> 1  1995    NA
#> 2  1996    NA
#> 3  1997    NA
#> 4  1998    NA
#> 5  1999    NA
#> 6  2000    NA
#> 7  2001    NA

or by using pipes:

empTBFin %>% 
  filter(is.na(HR)) %>%
  select(time,HR)
#> # A tibble: 7 x 2
#>    time    HR
#>   <dbl> <dbl>
#> 1  1995    NA
#> 2  1996    NA
#> 3  1997    NA
#> 4  1998    NA
#> 5  1999    NA
#> 6  2000    NA
#> 7  2001    NA
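
The same question can be asked for all countries at once. The following is a minimal base R sketch that lists, for each country with at least one missing value, the years in which it is missing; the toy values for BE, DE and IE are copied from the empTBFin output for 1997 to 1999, and the object names are illustrative.

```r
toy <- data.frame(
  time = 1997:1999,
  BE = c(51.3, 53.8, 56),
  DE = c(54.9, NA, 61.7),
  IE = c(58.5, NA, 65.9)
)

countries <- setdiff(names(toy), "time")

# For each country, the years with a missing value.
missing_years <- lapply(toy[countries], function(v) toy$time[is.na(v)])

# Keep only the countries that actually have missing values.
missing_years <- missing_years[lengths(missing_years) > 0]
missing_years
```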

Once the missing values are imputed, the check_data function is called again on the imputed data:

check_data(EmpImp)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: qualitative variables  in the dataframe."

where the offending qualitative variables are age and sex (stored as factors); therefore, these two variables are removed as follows:

EmpImpF <- dplyr::select(EmpImp,-sex,-age)
check_data(EmpImpF)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Now, the check_data function confirms that EmpImpF is the final object on which the analysis of convergence should be performed.
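
When the qualitative columns are not known in advance, they can be located by type instead of by name. The following is a minimal sketch in base R on a toy data frame shaped like EmpImp; the DE and FR values are copied from the outputs above for 2005 and 2006, and the object names are illustrative.

```r
toy <- data.frame(
  age  = factor(c("Y20-29", "Y20-29")),
  sex  = factor(c("T", "T")),
  time = c(2005, 2006),
  DE   = c(52.1, 54.2),
  FR   = c(52.1, 52.9)
)

# TRUE for numeric columns, FALSE for factors and character columns.
is_num <- vapply(toy, is.numeric, logical(1))

dropped <- names(toy)[!is_num]         # columns that would be removed
toy_num <- toy[, is_num, drop = FALSE] # numeric columns only
names(toy_num)
```

The resulting object keeps the time variable and the country columns, which is the shape expected by check_data.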

2.3 Analysis of convergence for the EmpImpF dataset

Once the dataset is clean and imputed, the analysis of convergence is performed through the functions discussed in the previous Section. Therefore, the four measures of convergence are calculated for the EmpImpF dataset as follows (details in the Section above):

library(ggplot2)
library(dplyr)
library(tibble)

EmpBC <- beta_conv(EmpImpF, 
                   time_0 = 2005, 
                   time_t = 2015, 
                   all_within = FALSE, 
                   timeName = "time")
empSC <- sigma_conv(EmpImpF,timeName="time")
gamma_conv(EmpImpF,ref=2005,last=2015,timeName="time")
gamma_conv_msteps(EmpImpF,startTime=2005,endTime=2015, timeName = "time")

and

delta_conv(EmpImpF,"time")

2.4 Country fiches

Next, we report the syntax to obtain country fiches for the EmpImpF dataset:

go_ms_fi(
  workDF ='EmpImpF',
  countryRef ='DE',
  otherCountries = "c('IT','ES','FR')",
  time_0 = 2005,
  time_t = 2015,
  tName = 'time',
  indiType = 'highBest',
  aggregation= 'EU27_2020',
  x_angle=  45,
  dataNow=  Sys.time(),
  author = 'A.Student',
  outFile = 'Germany-up2-2016-EMP', 
  outDir = 'F:/analysis/EU27_2020',
  indiName= 'EmpImpF'
)
  

As noted previously, the dataset is passed by name as a string, e.g. workDF = ‘EmpImpF’. The country of reference is Germany (“DE”), which is compared with the other countries, “IT”, “ES” and “FR”, according to the highBest type of indicator.

For advanced users: if the selection of countries is different from a standard aggregate (EU12, EU19 etc.) the argument “custom” can be used.
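
The outDir value used above is a machine-specific path. The following is a minimal base R sketch of preparing a portable output folder before producing a fiche; it assumes, without having verified it, that the fiche functions expect the folder to exist already, and the folder names are hypothetical.

```r
# Hypothetical output folder under the session's temporary directory.
out_dir <- file.path(tempdir(), "convergEU-fiches", "EU27_2020")

# Create the folder (and its parents) only if it is not there yet.
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
}

# Forward slashes also work on Windows and avoid escaping backslashes.
out_dir <- normalizePath(out_dir, winslash = "/")
out_dir
```

The resulting string could then be passed as outDir = out_dir in place of a hard-coded path.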

2.5 Indicator fiches

Indicator fiches for the EmpImpF dataset are obtained through the following syntax:

go_indica_fi(
  time_0 = 2005,
  time_t = 2015,
  timeName = 'time',
  workDF = 'EmpImpF' ,
  indicaT = 'lfsa_ergaed',
  indiType = c('highBest','lowBest')[1],
  seleMeasure = 'all',
  seleAggre = 'EU27_2020',
  x_angle =  45,
  data_res_download =  FALSE,
  auth = 'A.Student',
  dataNow =  Sys.time(),
  outFile = 'EmpImp',
  outDir = 'C:/Users/UTENTE/Desktop/PROJECT EUROFOUND/FI'
)

where the indicator “lfsa_ergaed” is of type highBest, and all the measures are considered for the analysis of convergence.



3 Downloadable data from the Eurostat web site

In this Section, we describe in detail the procedure to download data from the Eurostat web site (https://ec.europa.eu/eurostat/data/database). The function down_lo_EUS is implemented in the convergEU package specifically for this purpose. In what follows, the procedure is described taking as an example the indicator “hlth_silc_18”, Self-perceived health by sex, age and degree of urbanisation. For further details on this indicator see: https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=hlth_silc_18&lang=en.

3.1 Data preparation step: indicator’s downloading from the Eurostat web site

The data are downloaded from the Eurostat website (https://ec.europa.eu/eurostat/data/database) on the fly, which requires an active Internet connection.
Besides age and gender, a list of further covariates is sometimes present for an indicator; these must be removed to produce a tidy dataset in the format time by countries. A given indicator is downloaded from the Eurostat web site through the function down_lo_EUS:

help(down_lo_EUS)

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

To download the data, a time interval should also be chosen. The function also allows selecting gender, age and other variables, depending on the specific indicator. It must be noted that when the argument rawDump is TRUE, raw data are downloaded, and these do not form a tidy dataset; to obtain a tidy dataset in the format years by countries, the argument rawDump = FALSE should be specified. Note also that all indicators are supported and that, after downloading, the data are not filtered by member country (geo) and/or EU cluster. Therefore, it is up to the user to proceed with further filtering/selection so that the desired collection of countries is obtained.
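
Since the downloaded table contains every geo label returned by Eurostat, a selection step is needed afterwards. The following is a minimal base R sketch of keeping only the time variable and a chosen set of countries; the toy values for AT, BE, CH and EA are copied from the hlth_silc_18 output for 2008 to 2010 shown in this tutorial, while the vector wanted is a hypothetical selection (in practice it could be, for instance, convergEU_glb()$EU27_2020$memberStates$codeMS).

```r
# Toy excerpt of a $res component: countries and aggregates side by side.
toy_res <- data.frame(
  time = 2008:2010,
  AT = c(2.5, 1.6, 2.4),
  BE = c(1.7, 1.1, 1.3),
  CH = c(1.1, 0.3, 0.3),
  EA = c(1.1, 0.9, 1.2)
)

wanted  <- c("AT", "BE", "HR")                 # hypothetical selection
present <- intersect(wanted, names(toy_res))   # requested and available
absent  <- setdiff(wanted, names(toy_res))     # requested but not downloaded

filtered <- toy_res[, c("time", present), drop = FALSE]
filtered
```

Reporting the absent labels explicitly helps to notice when a requested country is not part of the download, instead of silently analysing a smaller set.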

In order to download the data for a given indicator, its indicator code is required, and it is available on the Eurostat web site next to each indicator’s name within brackets. For example, the Self-perceived health indicator code is hlth_silc_18, as illustrated in the following Figure by the red arrow.

Figure 1: Indicators code on the Eurostat web site

We choose to download the data for this indicator from 2003 to 2018 by invoking the down_lo_EUS function:

myTBhea <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump = TRUE,
  uniqueIdentif = 1)
myTBhea
#> # A tibble: 720,216 x 8
#>    deg_urb levels sex   age    unit  geo    time values
#>    <fct>   <fct>  <fct> <fct>  <fct> <fct> <dbl>  <dbl>
#>  1 DEG1    BAD    F     Y16-24 PC    AT     2018    2.9
#>  2 DEG1    BAD    F     Y16-24 PC    BE     2018    1.1
#>  3 DEG1    BAD    F     Y16-24 PC    BG     2018    0  
#>  4 DEG1    BAD    F     Y16-24 PC    CH     2018    1.1
#>  5 DEG1    BAD    F     Y16-24 PC    CY     2018    0.2
#>  6 DEG1    BAD    F     Y16-24 PC    CZ     2018    0  
#>  7 DEG1    BAD    F     Y16-24 PC    DE     2018    3.4
#>  8 DEG1    BAD    F     Y16-24 PC    DK     2018    1.1
#>  9 DEG1    BAD    F     Y16-24 PC    EA     2018    1.3
#> 10 DEG1    BAD    F     Y16-24 PC    EA18   2018    1.3
#> # … with 720,206 more rows

Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values, rawDump = FALSE should be specified:

myTBheal <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump = FALSE,
  uniqueIdentif = 1)
myTBheal
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y16-… T      2003   1.7   1.6  NA    NA    NA    NA    NA     0    NA    NA  
#>  2 Y16-… T      2004   1.3   1.2  NA    NA    NA    NA    NA     1.6  NA    NA  
#>  3 Y16-… T      2005   1.1   1.7   1    NA     0.4   0.1   0.9   2.8  NA     1.1
#>  4 Y16-… T      2006   0.9   1.6   1    NA     0.4   0.3   1     1.3  NA     1  
#>  5 Y16-… T      2007   0.7   1.8   1.1   1.6   0.4   0.9   1.2   1.6  NA     1.1
#>  6 Y16-… T      2008   2.5   1.7   0.8   1.1   0.2   1.1   0.8   1.9   1.1   1.1
#>  7 Y16-… T      2009   1.6   1.1   0.6   0.3   0     1.5   0.9   2.6   0.9   0.9
#>  8 Y16-… T      2010   2.4   1.3   0.8   0.3   0.2   0.4   1.4   2.9   1.2   1.2
#>  9 Y16-… T      2011   2.4   1.5   0.7   0.6   0.8   1.6   1     1.7   1.1   1.1
#> 10 Y16-… T      2012   2.3   1.8   0     0.5   1.2   0     1     1.8   1.1   1.1
#> 11 Y16-… T      2013   1.7   1.1   0.5   0.5   1     0     0.8   0.3   1.3   1.3
#> 12 Y16-… T      2014   2.7   2.4   0.9   1.3   0.7   1     0.7   1     1.1   1.1
#> 13 Y16-… T      2015   0.6   1.6   0.4   0.5   0.1   2.1   1.9   1.5   1.4   1.4
#> 14 Y16-… T      2016   0.9   2     0.7   1.6   0.6   1.3   1     0.8   1.3   1.3
#> 15 Y16-… T      2017   2.6   1.5   0.7   1.6   0.2   2.4   1     2.6   0.7   0.7
#> 16 Y16-… T      2018   1.7   1.3   0.1   0.5   0.2   0.5   2.6   0.7   1.3   1.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Age
#> [1] "Age automatically set."
#> 
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] Y16-24
#> 17 Levels: Y16-24 Y16-29 Y16-44 Y16-64 Y25-29 Y25-34 Y35-44 Y45-54 ... TOTAL
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

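The result is a list with three components: $res (the data in the format years by countries), $msg (the conditioning applied and the available uniqueIdentif tags) and $err (NULL when no error occurred).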

Similarly to the download_indicator_EUS function, it is therefore possible to call the same function several times, specifying the argument uniqueIdentif as one of the integers in the leftmost column of $msg$Further_Conditioning$available_seleTagLs, to obtain the same indicator under different scales and contexts. For example, for degree of urbanisation 1 with level good (uniqueIdentif = 4) and for degree of urbanisation 3 with level fair (uniqueIdentif = 17), respectively:

myTBheal1 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump = FALSE,
  uniqueIdentif = 4)

myTBheal2 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump = FALSE,
  uniqueIdentif = 17)
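
Rather than reading the identifier off the printed table, it can be looked up from the tag text. The following is a minimal base R sketch; it rebuilds a table with the same 28 tags printed above for hlth_silc_18 and assumes that available_seleTagLs is a data frame with columns uniqueIdentif and tags, as the printed output suggests. The helper find_identif is hypothetical.

```r
# Rebuild the tag table shown above (levels vary fastest within deg_urb).
deg <- c("DEG1", "DEG2", "DEG3", "TOTAL")
lev <- c("BAD", "B_VB", "FAIR", "GOOD", "VBAD", "VGOOD", "VG_G")
grid <- expand.grid(levels = lev, deg_urb = deg, stringsAsFactors = FALSE)
tag_table <- data.frame(
  uniqueIdentif = seq_len(nrow(grid)),
  tags = paste0("deg_urb: ", grid$deg_urb, "; levels: ", grid$levels,
                "; unit: PC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: identifier(s) whose tag matches both conditions.
find_identif <- function(tag_table, deg_urb, level) {
  hit <- grepl(paste0("deg_urb: ", deg_urb, ";"), tag_table$tags, fixed = TRUE) &
    grepl(paste0("levels: ", level, ";"), tag_table$tags, fixed = TRUE)
  tag_table$uniqueIdentif[hit]
}

id_good_deg1 <- find_identif(tag_table, "DEG1", "GOOD")
id_fair_deg3 <- find_identif(tag_table, "DEG3", "FAIR")
c(id_good_deg1, id_fair_deg3)
```

With a real download, tag_table would be replaced by myTBheal$msg$Further_Conditioning$available_seleTagLs, and the returned integer passed as uniqueIdentif.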

It is also possible to condition on gender in order to download data only for females or only for males, for example:

myTBheal3 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "F",
  ageInterv = NA,
  rawDump = FALSE,
  uniqueIdentif = 1)


myTBheal4 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "M",
  ageInterv = NA,
  rawDump = FALSE,
  uniqueIdentif = 1)

The user can also download data for a specific age interval; to list the age intervals available for the myTBheal data:

unique(myTBheal$msg$Conditioning$ageInterv)

Thus, to download data, say in the age interval Y35-44 and time interval 2003-2018:

myTBheal5 <- down_lo_EUS(
  indicator_code = 'hlth_silc_18',
  fromTime = 2003,
  toTime = 2018,
  gender=  'T',
  ageInterv = 'Y35-44',
  rawDump = FALSE,
  uniqueIdentif = 1)
myTBheal5
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y35-… T      2003   3.1   4.5  NA    NA    NA    NA    NA     4    NA    NA  
#>  2 Y35-… T      2004   5.5   4.4  NA    NA    NA    NA    NA     5.3  NA    NA  
#>  3 Y35-… T      2005   4.2   4.9   4.7  NA     2     5.6   4.9   5.4  NA     4  
#>  4 Y35-… T      2006   2.6   5.3   4.7  NA     2.7   4.4   4.4   3.9  NA     4.1
#>  5 Y35-… T      2007   4     5.2   3.1   3.1   2.8   4.6   3.8   6.9  NA     3.5
#>  6 Y35-… T      2008   4.6   5.5   4.1   2.9   2.3   3.4   2.8   6.2   3     3  
#>  7 Y35-… T      2009   4.6   6.6   1.9   2.1   2.1   4     3.4   6.8   3.4   3.4
#>  8 Y35-… T      2010   3.4   5.8   1.6   2.4   1.6   2.7   3.6   2.8   3.2   3.2
#>  9 Y35-… T      2011   5.6   6.5   1.2   1.8   1.3   3.1   3.4   5.5   3.4   3.4
#> 10 Y35-… T      2012   7.3   6.8   0.9   1.6   1.8   3     4.5   4.3   3.7   3.7
#> 11 Y35-… T      2013   5.4   6.9   1     2.4   0.9   3.6   4.5   3.3   3.4   3.4
#> 12 Y35-… T      2014   5.7   6.9   1.6   2.3   2.5   2     5.7   5.5   3.5   3.5
#> 13 Y35-… T      2015   5.6   8.9   1.3   0.9   2.4   2     4.9   1.9   3.2   3.2
#> 14 Y35-… T      2016   2.9   7.9   1.5   6.3   1.3   2.7   4.3   3.6   3.1   3.1
#> 15 Y35-… T      2017   4.4   6.2   1.8   2.3   1.5   1.8   2.9   5.6   2.5   2.5
#> 16 Y35-… T      2018   4     5.4   2     4.2   2.1   1.9   3.5   2.5   2.9   2.9
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] "Y35-44"
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

or similarly for the age interval Y55-64:

myTBheal6 <- down_lo_EUS(
  indicator_code = 'hlth_silc_18',
  fromTime = 2003,
  toTime = 2018,
  gender=  'T',
  ageInterv = 'Y55-64',
  rawDump = FALSE,
  uniqueIdentif = 1)
myTBheal6
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y55-… T      2003   4.8  11.1  NA    NA    NA    NA    NA     9.9  NA    NA  
#>  2 Y55-… T      2004   9.2   9.2  NA    NA    NA    NA    NA    10    NA    NA  
#>  3 Y55-… T      2005   8.5   9.6  18.8  NA     8.7  13.8  13    10.9  NA    11.7
#>  4 Y55-… T      2006   7.8  10.3  18.8  NA    11    13.4  12.2  10.2  NA    11.3
#>  5 Y55-… T      2007  12.6  10.3  16     4.5  10.4  14.7  11     8.8  NA    11.2
#>  6 Y55-… T      2008  10     8.5  16.4   4.1  10.6  12.8  10.6   9.2   9.9  10  
#>  7 Y55-… T      2009  10.1  10.1  13.2   4.4  12.5  11.8  10.9   4.6   9.6   9.6
#>  8 Y55-… T      2010  13.1   9.3  11.4   5.1  10.1  13.5  10     6.7   8.9   9  
#>  9 Y55-… T      2011  12.4   9.6  10.2   4.2   8.5  11.9   9.3   7.9   9.3   9.3
#> 10 Y55-… T      2012  12.5  11.4  10.7   6.5   7.8  11.6  10.6  11.5   9.6   9.6
#> 11 Y55-… T      2013  10.7  11     9.2   6.1  10.7  12     9.8  11.4   9.5   9.6
#> 12 Y55-… T      2014  10    14.5   7.5   3.8   7.8  10.8  11.8   9.8  10     9.9
#> 13 Y55-… T      2015  14.4  12.2   8.1   7.3   6.2   9.9  12.6  10.9  10.3  10.2
#> 14 Y55-… T      2016  10.4  12     7.4   5.7   4.7  12.8  12.2   7.6   8.8   8.8
#> 15 Y55-… T      2017  11.1  10.2   8     5.7   4.6  13.5   9.6   9.9   7.5   7.5
#> 16 Y55-… T      2018   9.8   7.8   7.9   7.9   3.7   9.6  12     9.6   8.3   8.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] "Y55-64"
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

Note that the age interval is specified in the ageInterv argument as a string.

Once the data are downloaded, we assign them a meaningful name. Among the datasets above, we choose myTBheal5, which relates to both males and females aged 35 to 44:

TBheal<-myTBheal5$res
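Since the objects shown above are lists with the components res, msg and err, it can be convenient to check err before extracting res. The following is a minimal sketch under that assumption; get_res is a hypothetical helper, not part of the convergEU package, and the two lists only mimic the structure printed above:

```r
# Hypothetical helper: stop with the reported error, otherwise return $res
get_res <- function(out) {
  if (!is.null(out$err)) stop("call failed: ", out$err)
  out$res
}

# Mock objects mimicking the res/msg/err structure shown above
ok  <- list(res = data.frame(time = 2003:2004, IT = c(3.2, 3.2)),
            msg = NULL, err = NULL)
bad <- list(res = NULL, msg = NULL,
            err = "Error: one or more missing values in the dataframe.")

okRes  <- get_res(ok)                       # returns the data frame
badRes <- tryCatch(get_res(bad),            # the error is caught here
                   error = function(e) conditionMessage(e))
```

With a real download, the same helper could be applied as get_res(myTBheal5) in place of myTBheal5$res.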

It is worth remembering that the data are not filtered by member country or EU cluster. Therefore, some further data processing is needed if the user wishes to obtain the data only for a given collection of countries. For example, to select only the EU27_2020 cluster of MS:

EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
TBhealth<- dplyr::select(TBheal,time,EU27estr)
TBhealth
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

or by using pipes:

TBheal %>%
  as_tibble %>%
  select(time,EU27estr)
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

Similarly, the user can select the EU27_2020 cluster of MS together with other available countries, for example Serbia (coded RS):

TBhealthR<- select(TBheal,time,EU27estr,RS)
TBhealthR
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>

and by using pipes:

TBheal %>%
  as_tibble %>%
  select(time,EU27estr,RS)
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>
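Recent versions of tidyselect, on which dplyr::select relies, discourage passing a character vector stored in a variable directly to select, and suggest wrapping it in all_of(). The sketch below shows this variant on a toy table; the data values and the name keepMS (standing in for EU27estr) are illustrative assumptions:

```r
library(dplyr)

# Toy time-by-countries table (values are illustrative only)
toyTB <- data.frame(
  time = 2003:2005,
  IT   = c(3.2, 3.2, 2.6),
  DE   = c(NA, 4.9, 4.9),
  RS   = c(1.0, 2.0, 3.0),
  EU28 = c(3.0, 3.0, 3.0)
)

keepMS <- c("IT", "DE")   # stands in for the EU27estr vector of country codes

# all_of() states explicitly that keepMS is an external character vector
selTB  <- select(toyTB, time, all_of(keepMS))
# the same cluster plus one additional country
selTBR <- select(toyTB, time, all_of(c(keepMS, "RS")))

names(selTB)
names(selTBR)
```

all_of() fails if a requested code is not a column of the dataset, which helps to detect a misspelled country code early.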

3.2 Imputation of missing values

Prior to the analysis of convergence, missing values must be imputed in the data previously downloaded from the Eurostat web site. The final downloaded dataset is TBhealth, in which the countries are the EU27_2020 cluster of MS. Inspection of the print output in the previous Subsection reveals that missing values are present. Alternatively, we can invoke the check_data() function to check for the presence of unsuited features that must be solved before starting the analysis (e.g. missing values, string variables).

check_data(TBhealth, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."
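The error message does not say where the missing values are. A short base R sketch can locate them by country and by year; the toy table below reuses the first four years of three columns printed above, and the object names are illustrative:

```r
# First four years of BE, FR and DE as printed in the TBhealth output
toyTB <- data.frame(
  time = 2003:2006,
  FR   = c(NA, 3.9, 4.4, 4.9),
  DE   = c(NA, NA, 4.9, 4.4),
  BE   = c(4.5, 4.4, 4.9, 5.3)
)

countryCols <- setdiff(names(toyTB), "time")

# Number of missing values per country
nMiss <- colSums(is.na(toyTB[countryCols]))
nMiss[nMiss > 0]

# Years in which at least one country is missing
yearsMiss <- toyTB$time[rowSums(is.na(toyTB[countryCols])) > 0]
yearsMiss
```

The same two lines can be applied to the full TBhealth dataset to decide how tailMiss and headMiss should be set.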

Thus, we proceed to impute the missing values through the impute_dataset function. The print output of the TBhealth dataset reveals both missing values starting at the oldest year and missing values ending at the last year. Therefore, the arguments tailMiss and headMiss should be specified according to how the user decides to treat these missing values. The TBhealth dataset illustrates this point. Suppose that missing values starting at the oldest year should be propagated (i.e. option constant in the tailMiss argument), while those ending at the last year should be deleted (i.e. option cut in the headMiss argument), that is:

TBhealthF <- impute_dataset(TBhealth, timeName = "time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         tailMiss = c("cut", "constant")[2],
                         headMiss = c("cut", "constant")[1])$res 

TBhealthF
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4     3.9   4.9   1.6   2.2   3.2   6.6   3     5.5   3.5   3.1
#>  2  2004   4.4   5.3   3.9   4.9   1.1   1.8   3.2   5.2   3     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

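As the output shows, the missing values at the earliest years (between 2003 and 2009, depending on the country) have been replaced by the first observed value of the corresponding country; for example, FR in 2003 and DE in 2003 and 2004. The printed output still contains all 16 years, 2018 included, so no row has been cut at the end of the series. TBhealthF is thus the final imputed dataset, on which the check_data function is called again: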

check_data(TBhealthF)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

and TBhealthF is the final object on which the analysis should be performed.
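The two options used above can be illustrated on a toy series. The code below is only a base R sketch of the idea behind constant and cut as described in this Subsection, not the implementation of impute_dataset; the column name XX and the values are made up:

```r
# Toy series: missing at the two oldest years and at the most recent year
toy <- data.frame(time = 2003:2008,
                  XX   = c(NA, NA, 3.9, 4.4, 4.9, NA))

obs       <- which(!is.na(toy$XX))   # positions of the observed values
first_obs <- min(obs)
last_obs  <- max(obs)

# "constant" at the oldest years: repeat the first observed value backwards
filled <- toy
filled$XX[seq_len(first_obs - 1)] <- toy$XX[first_obs]

# "cut" at the most recent years: drop the rows after the last observed value
trimmed <- filled[seq_len(last_obs), ]
trimmed
```

In this toy example the years 2003 and 2004 receive the value observed in 2005, while the row for 2008 is removed.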

3.3 Weighted average smoothing of a complete dataset

It may be of interest to assume that part of the variability observed in a country on a given index is not structural, i.e. not due to causal determinants but to transient fluctuations. Furthermore, the interest here is not in prediction but in smoothing the values observed over the whole time interval considered. In such a case, a smoothing procedure removes sudden large changes and yields a less variable time series than the original. Given that short time series (panel data) are considered here, a three-point weighted average is proposed in the convergEU package and implemented in the smoo_dataset function:

help(smoo_dataset)

It must be noted that the smoothing is performed on a dataset without missing values, e.g. on the imputed dataset. Note also that the time variable should be deleted when performing the smoothing with the smoo_dataset function. To better illustrate this point, let’s suppose that we want to perform smoothing for the TBhealthF dataset in which the missing values are already imputed. Furthermore, suppose that we choose the following three MS on which the smoothing is to be performed: Italy, France and Germany, coded “IT”, “FR” and “DE” respectively. Thus, we select the data with just these three countries and the time variable:

workTB <- dplyr::select(TBhealthF, time, IT,DE,FR)
check_data(workTB)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The check_data function confirms that the dataset is ready for performing smoothing. Thus, once checking is passed, we go with the smoothing step after deleting the time variable:

resSM <- smoo_dataset(select(workTB,-time), leadW = 0.15, timeTB= select(workTB,time))
resSM
#> # A tibble: 16 x 4
#>     time    IT    DE    FR
#>    <dbl> <dbl> <dbl> <dbl>
#>  1  2003  3.2   4.9   3.9 
#>  2  2004  2.94  4.9   4.11
#>  3  2005  3.07  4.69  4.4 
#>  4  2006  2.84  4.36  3.92
#>  5  2007  2.87  3.63  3.99
#>  6  2008  2.81  3.48  3.27
#>  7  2009  2.44  3.23  3.57
#>  8  2010  2.92  3.43  3.84
#>  9  2011  2.76  3.95  4.34
#> 10  2012  3.27  4.03  4.30
#> 11  2013  3.20  5.01  4.27
#> 12  2014  3.22  4.85  3.53
#> 13  2015  2.65  4.98  4.13
#> 14  2016  2.06  3.96  3.51
#> 15  2017  1.42  3.75  4.49
#> 16  2018  1.03  2.99  3.70

Note that the first argument is the complete dataset workTB restricted to the country columns, while the third argument timeTB supplies the time variable, which is added back to the output after smoothing. The argument leadW is the leading positive weight \(w\), with \(0< w \leq 1\). The special case \(w=1\) corresponds to no smoothing. If the data contain missing values, or if the weight lies outside the interval \((0,1]\), an NA is returned. In our example, the smoothing is first performed with a weight equal to 0.15. To compare the original indicator values with the smoothed ones, the following dataset is built:

compaTB <- select(bind_cols(workTB, select(resSM,-time)), time,IT,IT1,DE,DE1, FR, FR1)
compaTB
#> # A tibble: 16 x 7
#>     time    IT   IT1    DE   DE1    FR   FR1
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   3.2  3.2    4.9  4.9    3.9  3.9 
#>  2  2004   3.2  2.94   4.9  4.9    3.9  4.11
#>  3  2005   2.6  3.07   4.9  4.69   4.4  4.4 
#>  4  2006   3.1  2.84   4.4  4.36   4.9  3.92
#>  5  2007   3    2.87   3.8  3.63   3.1  3.99
#>  6  2008   2.6  2.81   2.8  3.48   3.4  3.27
#>  7  2009   2.7  2.44   3.4  3.23   3.4  3.57
#>  8  2010   2.2  2.92   3.6  3.43   3.8  3.84
#>  9  2011   3.4  2.76   3.4  3.95   4.3  4.34
#> 10  2012   3.1  3.27   4.5  4.03   4.9  4.30
#> 11  2013   3.2  3.20   4.5  5.01   4.1  4.27
#> 12  2014   3.3  3.22   5.7  4.85   3.7  3.53
#> 13  2015   3.2  2.65   4.9  4.98   2.9  4.13
#> 14  2016   1.8  2.06   4.3  3.96   5    3.51
#> 15  2017   1    1.42   2.9  3.75   3.6  4.49
#> 16  2018   1.2  1.03   3.5  2.99   4.3  3.70
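The smoothed columns can be checked by hand. The sketch below assumes a three-point weighted average in which the current value receives weight \(w\) and the two neighbours \((1-w)/2\) each, while the first and last values are combined with their single neighbour using weights \(w\) and \(1-w\). This rule is inferred from the values printed above and is not taken from the source code of smoo_dataset; smooth3 is a hypothetical name:

```r
# Hypothetical three-point weighted average consistent with the printed values
smooth3 <- function(x, w) {
  # mirror the behaviour described in the text: NA for invalid input
  if (w <= 0 || w > 1 || anyNA(x) || length(x) < 2) return(NA)
  n   <- length(x)
  out <- numeric(n)
  out[1] <- w * x[1] + (1 - w) * x[2]        # first point: one neighbour
  out[n] <- w * x[n] + (1 - w) * x[n - 1]    # last point: one neighbour
  if (n > 2) {
    idx <- 2:(n - 1)                         # interior points: two neighbours
    out[idx] <- w * x[idx] + (1 - w) / 2 * (x[idx - 1] + x[idx + 1])
  }
  out
}

# First four imputed values of IT (2003 to 2006)
IT  <- c(3.2, 3.2, 2.6, 3.1)
sm  <- smooth3(IT, w = 0.15)
sm1 <- smooth3(IT, w = 1)      # w = 1 returns the input unchanged
```

For 2004 the sketch gives 0.15 × 3.2 + 0.425 × (3.2 + 2.6) = 2.945 and for 2005 it gives 3.0675, in line with the values 2.94 and 3.07 printed in column IT1. The last element of sm is treated as an end point here and therefore differs from the 2006 value in the full series.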

The following graphical output shows the changes for Italy (“IT”), Germany (“DE”) and France (“FR”), with the original index in blue and the smoothed index in red:

require(gridExtra)
ITq<-qplot(time,IT,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=IT1),colour="red") +
  geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq<-qplot(time,DE,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=DE1),colour="red") +
  geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq<-qplot(time,FR,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=FR1),colour="red") +
  geom_point(aes(x=time,y=FR1),colour="red",shape=8)

grid.arrange(ITq, DEq, FRq, nrow=3)
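An alternative way to obtain a comparable plot is to reshape the data to long format and map the series to colour with ggplot(). The sketch below is an illustration on a toy table with the first six IT and IT1 values printed above; the names toyTB and longTB are assumptions introduced here:

```r
library(ggplot2)
library(tidyr)

# Original (IT) and smoothed (IT1) values for 2003 to 2008, as printed above
toyTB <- data.frame(
  time = 2003:2008,
  IT   = c(3.2, 3.2, 2.6, 3.1, 3.0, 2.6),
  IT1  = c(3.2, 2.94, 3.07, 2.84, 2.87, 2.81)
)

# One row per year and series
longTB <- pivot_longer(toyTB, cols = -time,
                       names_to = "series", values_to = "value")

# Same colours as in the plots above: original in blue, smoothed in red
p <- ggplot(longTB, aes(x = time, y = value, colour = series)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c(IT = "navyblue", IT1 = "red"))
p
```

With several countries in long format, facet_wrap() could replace the call to grid.arrange.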

Note that the higher the weight, the less the indicator’s values are smoothed. For example, with \(w=0.5\):

require(gridExtra)
resSM2 <- smoo_dataset(select(workTB,-time), leadW = 0.5, timeTB= select(workTB,time))
compaTB2 <- select(bind_cols(workTB, select(resSM2,-time)), time,IT,IT1,DE,DE1, FR, FR1)
ITq2<-qplot(time,IT,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=IT1),colour="red") +
  geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq2<-qplot(time,DE,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=DE1),colour="red") +
  geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq2<-qplot(time,FR,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=FR1),colour="red") +
  geom_point(aes(x=time,y=FR1),colour="red",shape=8)
grid.arrange(ITq2, DEq2, FRq2, nrow=3)

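Lastly, note that a weight equal to 1 leaves the data unchanged; in the plots below a small vertical jitter is applied so that the overlapping original and smoothed series remain distinguishable: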

resSM3 <- smoo_dataset(select(workTB,-time),leadW=1,
                       timeTB= select(workTB,time))
compaTB3 <- select(bind_cols(workTB,
                             select(resSM3,-time)),
                   time,IT,IT1,DE,DE1, FR, FR1)
ITq3<-qplot(time,IT,data=compaTB3) +
  geom_line(colour="navyblue",
            position=position_jitter(width=0,height=0.07)) +
  geom_line(aes(x=time,y=IT1),colour="red",
            position=position_jitter(width=0, 
                                     height=0.07)) +
  geom_point(aes(x=time,y=IT1),colour="red",
             shape=8,position=position_jitter(width=0,
                                              height=0.07))
DEq3<-qplot(time,DE,data=compaTB3) +
  geom_line(colour="navyblue",
            position=position_jitter(width=0, 
                                     height=0.07)) +
  geom_line(aes(x=time,y=DE1),colour="red",
            position=position_jitter(width=0, 
                                     height=0.07)) +
  geom_point(aes(x=time,y=DE1),colour="red",shape=8,
             position=position_jitter(width=0, 
                                      height=0.07))
FRq3<-qplot(time,FR,data=compaTB3) +
  geom_line(colour="navyblue",
            position=position_jitter(width=0, 
                                     height=0.07)) +
  geom_line(aes(x=time,y=FR1),colour="red", 
            position=position_jitter(width=0, 
                                     height=0.07)) +
  geom_point(aes(x=time,y=FR1),colour="red",shape=8,
             position=position_jitter(width=0, 
                                      height=0.07))
grid.arrange(ITq3, DEq3, FRq3, nrow=3)

A time window larger than 3 could be considered, but careful thought is recommended about the economic and social changes that may happen in the EU during 5 consecutive years.

3.4 Moving Average smoother

Several alternative smoothing algorithms are available in R. Classical moving average smoothers are implemented in the caTools package. See for example:

require(caTools)
help("runmean")

In the convergEU package, a smoother based on the moving average is implemented in the function ma_dataset:

help(ma_dataset)

where the time window of the moving average is set through the argument kappa. In what follows, we compute the moving average smoother for the TBhealthF dataset, considering Italy (“IT”), Germany (“DE”) and France (“FR”). The time by countries dataset is:

cuTB1 <-  TBhealthF[,c("time","IT","DE","FR")]

We proceed by considering the following time windows of the moving average:

ma3<-ma_dataset(cuTB1, kappa=3, timeName= "time")
ma4<-ma_dataset(cuTB1, kappa=4, timeName= "time")
ma5<-ma_dataset(cuTB1, kappa=5, timeName= "time")

The output is a standard tidy dataset of smoothed values; for example the ma4 dataset with \(k=4\):

ma4
#> $res
#> # A tibble: 16 x 4
#>     time    IT    DE    FR
#>    <dbl> <dbl> <dbl> <dbl>
#>  1  2003  3.2   4.9   3.9 
#>  2  2004  3.03  4.78  4.28
#>  3  2005  2.98  4.5   4.08
#>  4  2006  2.82  3.98  3.95
#>  5  2007  2.85  3.6   3.7 
#>  6  2008  2.62  3.4   3.42
#>  7  2009  2.72  3.3   3.72
#>  8  2010  2.85  3.72  4.1 
#>  9  2011  2.98  4     4.28
#> 10  2012  3.25  4.53  4.25
#> 11  2013  3.2   4.9   3.9 
#> 12  2014  2.88  4.85  3.92
#> 13  2015  2.33  4.45  3.8 
#> 14  2016  1.8   3.9   3.95
#> 15  2017  1     2.9   3.6 
#> 16  2018  1.2   3.5   4.3 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
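As a generic illustration of a centred moving average, the base R function stats::filter can be used. This is a sketch of the general technique and not the code of ma_dataset, whose treatment of the first and last years may differ (stats::filter returns NA where the window is incomplete):

```r
# First six imputed values of IT (2003 to 2008)
IT <- c(3.2, 3.2, 2.6, 3.1, 3.0, 2.6)

k  <- 3                                   # window length
# Equal weights 1/k over a window centred on each year
ma <- as.numeric(stats::filter(IT, rep(1 / k, k), sides = 2))
ma
```

The second element is the mean of the first three values, (3.2 + 3.2 + 2.6) / 3 = 3, while the first and last elements are NA because their windows are incomplete.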

The Figure below shows results for Italy, Germany and France for different degrees of smoothing: original (black), \(k=3\) (red), \(k=4\) (blue), \(k=5\) (green):

myGit <- ggplot(cuTB1,aes(x=time,y=cuTB1$IT))+geom_line()+geom_point()+
  geom_line(aes(x=time,y=ma3$res$IT),colour="red")+
  geom_point(aes(x=time,y=ma3$res$IT),colour="red")+
  labs(y="IT")+
  #
  geom_line(aes(x=time,y=ma4$res$IT),colour="blue")+
  geom_point(aes(x=time,y=ma4$res$IT),colour="blue")+
  #
  geom_line(aes(x=time,y=ma5$res$IT),colour="green")+
  geom_point(aes(x=time,y=ma5$res$IT),colour="green")

myGde <- ggplot(cuTB1,aes(x=time,y=cuTB1$DE))+geom_line()+geom_point()+
  geom_line(aes(x=time,y=ma3$res$DE),colour="red")+
  geom_point(aes(x=time,y=ma3$res$DE),colour="red")+
  labs(y="DE")+
  #
  geom_line(aes(x=time,y=ma4$res$DE),colour="blue")+
  geom_point(aes(x=time,y=ma4$res$DE),colour="blue")+
  #
  geom_line(aes(x=time,y=ma5$res$DE),colour="green")+
  geom_point(aes(x=time,y=ma5$res$DE),colour="green")

myGfr <- ggplot(cuTB1,aes(x=time,y=cuTB1$FR))+geom_line()+geom_point()+
  geom_line(aes(x=time,y=ma3$res$FR),colour="red")+
  geom_point(aes(x=time,y=ma3$res$FR),colour="red")+
  labs(y="FR")+
  #
  geom_line(aes(x=time,y=ma4$res$FR),colour="blue")+
  geom_point(aes(x=time,y=ma4$res$FR),colour="blue")+
  #
  geom_line(aes(x=time,y=ma5$res$FR),colour="green")+
  geom_point(aes(x=time,y=ma5$res$FR),colour="green")

grid.arrange(myGit, myGde, myGfr, nrow=3)

Note that the time series is typically so short (for example, 15 observations) that with \(k=7\) many values are smoothed over windows containing different numbers of observations (shorter windows at the start and at the end of the series). Therefore, in this context smoothing is typically performed over at most 5 consecutive years.
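The effect of the window length can be quantified with simple arithmetic. Assuming a centred window of odd length \(k\) on a series of \(n\) observations, \(n-(k-1)\) positions have a complete window and \(k-1\) positions (half at each end) have a shortened one:

```r
n <- 15                      # length of the time series
k <- c(3, 5, 7)              # candidate window lengths

tab <- data.frame(
  k           = k,
  full_window = n - (k - 1), # positions with a complete window
  shortened   = k - 1        # positions near the two ends
)
tab
```

With 15 observations and \(k=7\), only 9 positions have a complete window, whereas with \(k=5\) there are 11.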

4 Data stored and/or downloaded in external Excel files

Users often have data stored in external source files, typically Excel files. In this Section we illustrate in detail how to import data stored in several types of Excel files and prepare them for the analysis of convergence. More precisely, in the following Subsection, we describe how to import and process data stored in two types of Excel file:

In Subsection 4.2 instead we describe how to import and process Excel files downloaded directly from the Eurostat web site.

4.1 Data import and processing of external Excel files

4.1.1 Import data for .xls and .xlsx files

First, suppose that we have data for a given indicator, for example demo_mlexpec (Life expectancy by age and sex), stored locally in an .xlsx or .xls Excel file. The function read_excel in the R package readxl imports data stored in both .xls and .xlsx Excel files:

require(readxl)
help("read_excel")

To import the data from the .xlsx file for the indicator demo_mlexpec, we proceed as follows:

require(readxl)
myExc1<-read_excel(file.path("vign","excel1.xlsx"), sheet = "Data")
print(myExc1,n=35,width=250)
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    UK
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2001  77.5  76.4  78.6  77.9  78.5  76.6  79.6  77.5  77.8  76.6  79.1  77.7
#>  2  2002  77.5  76.4  78.7  77.9  78.6  77.1  79.7  77.5  77.8  76.8  79.2  77.7
#>  3  2003  77.6  76.8  78.6  78    78.6  77.6  79.5  77.2  78.1  76.8  79    77.8
#>  4  2004  78.3  77.2  79.7  78.6  78.7  78    80.2  78.6  78.7  77.7  79.7  78.4
#>  5  2005  78.4  77.6  79.7  78.7  78.9  78.3  80.2  78.8  79    77.5  79.6  78.6
#>  6  2006  78.8  77.7  80.3  79.2  79.2  78.6  80.7  78.6  79.3  78.3  80.4  78.9
#>  7  2007  79.2  77.7  80.6  79.4  79    79    80.8  78.7  79.7  78.5  80.4  79.1
#>  8  2008  79.1  78.1  80.7  79.5  79.5  79.5  80.9  79.8  79.8  78.8  80.8  79.2
#>  9  2009  79.4  78.3  80.9  79.6  79.6  79.5  81.1  80    80.2  79    81.2  79.8
#> 10  2010  79.6  78.6  81.1  79.8  79.9  80.1  81.5  80.1  80.3  79.3  81.6  80  
#> 11  2011  79.9  79.1  81.6  79.9  80    80.1  81.6  80.4  80.6  79.9  81.8  80.4
#> 12  2012  79.8  79.4  81.4  80    79.9  80.2  81.6  80.7  80.5  79.8  81.8  80.3
#> 13  2013  80    79.6  81.7  79.9  80.7  80.3  82.1  81.2  80.7  80.1  82.4  80.4
#> 14  2014  80.6  80.1  82.2  80.4  80.8  80.7  82.5  81.5  81.1  80.6  82.5  80.7
#> 15  2015  80.4  80.1  81.8  80    80.4  80.8  81.9  81.6  80.9  80.5  82.2  80.3
#> 16  2016  80.8  80.2  82    80.3  80.8  81    82.6  82    81    80.6  82.7  80.6
#>       AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL    SK    SI
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  78.2  77.5  79.2  78.4  74.6  70.5  72.1  NA    71.2  78.2  73.8  73.1  75.7
#>  2  78.2  77.6  79.3  78    74.7  70.8  72.1  69.9  71.3  78.3  74.1  73.3  75.9
#>  3  78.1  77.8  79.5  78.3  74.6  71.4  72.1  70.3  71.5  78.1  74.2  73.4  75.7
#>  4  78.7  78.2  79.9  78.4  75.2  71.9  72.5  70.6  71.5  78.8  74.4  73.7  76.5
#>  5  78.9  78.4  79.9  78    75.4  72.4  72.4  70.2  70.7  78.9  74.5  73.7  76.8
#>  6  79.4  78.8  80.2  79.4  76    72.5  72.9  70.1  70.5  78.8  74.8  73.9  77.5
#>  7  79.6  78.8  80.3  79.1  76.3  72.6  73    70.4  70.2  79.5  74.8  74    77.6
#>  8  79.9  79.1  80.5  79.9  76.5  73.7  73.6  71.6  71.1  79.4  75.1  74.4  78.3
#>  9  79.8  79.3  80.7  80.2  76.6  74.6  73.8  72.3  72.4  79.8  75.3  74.8  78.5
#> 10  80.1  79.4  80.9  80.8  76.9  75.3  74.1  72.5  72.6  80.9  75.8  75    79  
#> 11  80.4  79.8  81    80.5  77.2  75.8  74.5  73.4  73.1  80.4  76.2  75.5  79.4
#> 12  80.3  79.9  81    80.4  77.4  76    74.6  73.6  73.4  80.3  76.2  75.7  79.4
#> 13  80.5  80.3  81.2  81.7  77.5  76.7  75.2  73.7  73.4  81.4  76.5  76    79.7
#> 14  80.9  80.5  81.5  81.5  78.1  76.6  75.3  73.7  74    81.5  77.1  76.4  80.4
#> 15  80.6  80.7  81.4  81    77.9  77.2  75    74.1  73.9  81.5  76.9  76.1  80  
#> 16  81    80.6  81.6  81.9  78.4  77.2  75.5  74.1  74.3  82.2  77.3  76.7  80.4
#>       BG    RO    HR
#>    <dbl> <dbl> <dbl>
#>  1  71.9  71.4  74.1
#>  2  72.1  71.1  74.2
#>  3  72.2  71.2  74  
#>  4  72.4  71.6  74.9
#>  5  72.3  72    74.7
#>  6  72.5  72.5  75.3
#>  7  72.7  73    75.2
#>  8  73    73.4  75.3
#>  9  73.4  73.5  75.7
#> 10  73.5  73.5  76.1
#> 11  73.9  74.2  76.5
#> 12  74    74.2  76.6
#> 13  74.5  74.8  77.1
#> 14  74.1  74.6  77.3
#> 15  74.1  74.5  76.8
#> 16  74.4  74.9  77.5

where file.path("vign","excel1.xlsx") is the path (drive and folders) of the Excel file. The sheet argument specifies the name of the sheet in which the data are stored, “Data” in our example; if the sheet is not specified, the first sheet is read by default.
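When the sheet names of a workbook are not known in advance, they can be listed with excel_sheets before importing. The sketch below uses the example workbook datasets.xlsx shipped with readxl, not the vignette file excel1.xlsx, so that it can be run without any local data:

```r
library(readxl)

# Example workbook distributed with the readxl package
path <- readxl_example("datasets.xlsx")

# Names of the sheets available in the workbook
sheets <- excel_sheets(path)
sheets

# Without the sheet argument the first sheet is read
firstTB <- read_excel(path)

# Reading the same sheet by name
namedTB <- read_excel(path, sheet = sheets[1])
```

The same calls apply to a local file by replacing path with the location of that file.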

Note that by default the read_excel function treats blank cells as missing values. If missing values are marked in any other way, the argument na should be specified accordingly. For example, suppose that the “excel1.xlsx” file (sheet “Datamiss”) contains missing values marked by “.” instead of blank cells. To import the data correctly, we set the na argument as follows:

myExc1<-read_excel(file.path("vign","excel1.xlsx"), sheet = "Datamiss", na=".")
print(myExc1,n=35,width=250)
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    UK
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2001  77.5  76.4  78.6  77.9  78.5  76.6  79.6  77.5  77.8  76.6  79.1  77.7
#>  2  2002  77.5  76.4  78.7  77.9  78.6  77.1  79.7  77.5  77.8  76.8  79.2  77.7
#>  3  2003  77.6  76.8  78.6  78    78.6  77.6  79.5  77.2  78.1  76.8  79    77.8
#>  4  2004  78.3  77.2  79.7  78.6  78.7  78    80.2  78.6  78.7  77.7  79.7  78.4
#>  5  2005  78.4  77.6  79.7  78.7  78.9  78.3  80.2  78.8  79    77.5  79.6  78.6
#>  6  2006  78.8  77.7  80.3  79.2  79.2  78.6  80.7  78.6  79.3  78.3  80.4  78.9
#>  7  2007  79.2  77.7  80.6  79.4  79    79    80.8  78.7  79.7  78.5  80.4  79.1
#>  8  2008  79.1  78.1  80.7  79.5  79.5  79.5  80.9  79.8  79.8  78.8  80.8  79.2
#>  9  2009  79.4  78.3  80.9  79.6  79.6  79.5  81.1  80    80.2  79    81.2  79.8
#> 10  2010  79.6  78.6  81.1  79.8  79.9  80.1  81.5  80.1  80.3  79.3  81.6  80  
#> 11  2011  79.9  79.1  81.6  79.9  80    80.1  81.6  80.4  80.6  79.9  81.8  80.4
#> 12  2012  79.8  79.4  81.4  80    79.9  80.2  81.6  80.7  80.5  79.8  81.8  80.3
#> 13  2013  80    79.6  81.7  79.9  80.7  80.3  82.1  81.2  80.7  80.1  82.4  80.4
#> 14  2014  80.6  80.1  82.2  80.4  80.8  80.7  82.5  81.5  81.1  80.6  82.5  80.7
#> 15  2015  80.4  80.1  81.8  80    80.4  80.8  81.9  81.6  80.9  80.5  82.2  80.3
#> 16  2016  80.8  80.2  82    80.3  80.8  81    82.6  82    81    80.6  82.7  80.6
#>       AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL    SK    SI
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  78.2  77.5  79.2  78.4  74.6  70.5  72.1  NA    71.2  78.2  73.8  73.1  75.7
#>  2  78.2  77.6  79.3  78    74.7  70.8  72.1  69.9  71.3  78.3  74.1  73.3  75.9
#>  3  78.1  77.8  79.5  78.3  74.6  71.4  72.1  70.3  71.5  78.1  74.2  73.4  75.7
#>  4  78.7  78.2  79.9  78.4  75.2  71.9  72.5  70.6  71.5  78.8  74.4  73.7  76.5
#>  5  78.9  78.4  79.9  78    75.4  72.4  72.4  70.2  70.7  78.9  74.5  73.7  76.8
#>  6  79.4  78.8  80.2  79.4  76    72.5  72.9  70.1  70.5  78.8  74.8  73.9  77.5
#>  7  79.6  78.8  80.3  79.1  76.3  72.6  73    70.4  70.2  79.5  74.8  74    77.6
#>  8  79.9  79.1  80.5  79.9  76.5  73.7  73.6  71.6  71.1  79.4  75.1  74.4  78.3
#>  9  79.8  79.3  80.7  80.2  76.6  74.6  73.8  72.3  72.4  79.8  75.3  74.8  78.5
#> 10  80.1  79.4  80.9  80.8  76.9  75.3  74.1  72.5  72.6  80.9  75.8  75    79  
#> 11  80.4  79.8  81    80.5  77.2  75.8  74.5  73.4  73.1  80.4  76.2  75.5  79.4
#> 12  80.3  79.9  81    80.4  77.4  76    74.6  73.6  73.4  80.3  76.2  75.7  79.4
#> 13  80.5  80.3  81.2  81.7  77.5  76.7  75.2  73.7  73.4  81.4  76.5  76    79.7
#> 14  80.9  80.5  81.5  81.5  78.1  76.6  75.3  73.7  74    81.5  77.1  76.4  80.4
#> 15  80.6  80.7  81.4  81    77.9  77.2  75    74.1  73.9  81.5  76.9  76.1  80  
#> 16  81    80.6  81.6  81.9  78.4  77.2  75.5  74.1  74.3  82.2  77.3  76.7  80.4
#>       BG    RO    HR
#>    <dbl> <dbl> <dbl>
#>  1  71.9  71.4  74.1
#>  2  72.1  71.1  74.2
#>  3  72.2  71.2  74  
#>  4  72.4  71.6  74.9
#>  5  72.3  72    74.7
#>  6  72.5  72.5  75.3
#>  7  72.7  73    75.2
#>  8  73    73.4  75.3
#>  9  73.4  73.5  75.7
#> 10  73.5  73.5  76.1
#> 11  73.9  74.2  76.5
#> 12  74    74.2  76.6
#> 13  74.5  74.8  77.1
#> 14  74.1  74.6  77.3
#> 15  74.1  74.5  76.8
#> 16  74.4  74.9  77.5

The print output reveals that the dataset is in the required format, time by countries. Next, the data are checked for missing values and other unsuitable features:

check_data(myExc1)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

which reveals that missing values are present. The missing values are therefore imputed by invoking the impute_dataset function:

TBex <- impute_dataset(myExc1, timeName = "time",
                        countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                        headMiss = c("cut", "constant")[2])$res 
check_data(TBex)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The check_data function, called again on the imputed dataset, confirms that TBex is the final dataset on which to perform the analysis of convergence.
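The two steps above (finding the missing cells, then filling a missing value at the head of a series) can be illustrated on a toy data frame with base R. This is only a sketch: the data are made up, fill_head_constant is a hypothetical helper rather than a convergEU function, and it merely mimics what the "constant" option appears to do, judging from the imputed values printed in this tutorial, where the missing 2001 value of LV becomes 69.9, the value observed in 2002. Note also that c("cut", "constant")[2] simply evaluates to "constant".

```r
# Toy years-by-countries data frame (illustrative values only)
toy <- data.frame(
  time = 2001:2004,
  BE = c(77.5, 77.5, 77.6, 78.3),
  LV = c(NA, 69.9, 70.3, 70.6)
)

# Locate every missing cell by year and country
na_cells <- which(is.na(toy), arr.ind = TRUE)
na_position <- data.frame(
  time = toy$time[na_cells[, "row"]],
  country = colnames(toy)[na_cells[, "col"]]
)
print(na_position)

# Hypothetical helper (not part of convergEU): replace leading NAs
# with the first observed value of the series
fill_head_constant <- function(x) {
  first_obs <- which(!is.na(x))[1]
  if (is.na(first_obs) || first_obs == 1) return(x)
  x[seq_len(first_obs - 1)] <- x[first_obs]
  x
}

toy_filled <- toy
toy_filled$LV <- fill_head_constant(toy$LV)
print(toy_filled)
```

Knowing which year and country are affected helps in deciding whether a constant fill is acceptable for the indicator at hand.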

Note that when importing data from Excel files, particular attention should also be paid to the names assigned to the variables in the dataset. As an example, suppose that the data imported for the demo_mlexpec indicator are as follows:

myExc3<-read_excel(file.path("vign","excel1.xlsx"), sheet = "Datalab")
print(myExc3,n=35,width=250)
#> # A tibble: 16 x 29
#>     time    BE    DK France    DE    EL    IE    IT    LU    NL    PT    ES
#>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2001  77.5  76.4   78.6  77.9  78.5  76.6  79.6  77.5  77.8  76.6  79.1
#>  2  2002  77.5  76.4   78.7  77.9  78.6  77.1  79.7  77.5  77.8  76.8  79.2
#>  3  2003  77.6  76.8   78.6  78    78.6  77.6  79.5  77.2  78.1  76.8  79  
#>  4  2004  78.3  77.2   79.7  78.6  78.7  78    80.2  78.6  78.7  77.7  79.7
#>  5  2005  78.4  77.6   79.7  78.7  78.9  78.3  80.2  78.8  79    77.5  79.6
#>  6  2006  78.8  77.7   80.3  79.2  79.2  78.6  80.7  78.6  79.3  78.3  80.4
#>  7  2007  79.2  77.7   80.6  79.4  79    79    80.8  78.7  79.7  78.5  80.4
#>  8  2008  79.1  78.1   80.7  79.5  79.5  79.5  80.9  79.8  79.8  78.8  80.8
#>  9  2009  79.4  78.3   80.9  79.6  79.6  79.5  81.1  80    80.2  79    81.2
#> 10  2010  79.6  78.6   81.1  79.8  79.9  80.1  81.5  80.1  80.3  79.3  81.6
#> 11  2011  79.9  79.1   81.6  79.9  80    80.1  81.6  80.4  80.6  79.9  81.8
#> 12  2012  79.8  79.4   81.4  80    79.9  80.2  81.6  80.7  80.5  79.8  81.8
#> 13  2013  80    79.6   81.7  79.9  80.7  80.3  82.1  81.2  80.7  80.1  82.4
#> 14  2014  80.6  80.1   82.2  80.4  80.8  80.7  82.5  81.5  81.1  80.6  82.5
#> 15  2015  80.4  80.1   81.8  80    80.4  80.8  81.9  81.6  80.9  80.5  82.2
#> 16  2016  80.8  80.2   82    80.3  80.8  81    82.6  82    81    80.6  82.7
#>       UK    AT Finland    SE    CY    CZ    EE    HU    LV    LT    MT    PL
#>    <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  77.7  78.2    77.5  79.2  78.4  74.6  70.5  72.1  NA    71.2  78.2  73.8
#>  2  77.7  78.2    77.6  79.3  78    74.7  70.8  72.1  69.9  71.3  78.3  74.1
#>  3  77.8  78.1    77.8  79.5  78.3  74.6  71.4  72.1  70.3  71.5  78.1  74.2
#>  4  78.4  78.7    78.2  79.9  78.4  75.2  71.9  72.5  70.6  71.5  78.8  74.4
#>  5  78.6  78.9    78.4  79.9  78    75.4  72.4  72.4  70.2  70.7  78.9  74.5
#>  6  78.9  79.4    78.8  80.2  79.4  76    72.5  72.9  70.1  70.5  78.8  74.8
#>  7  79.1  79.6    78.8  80.3  79.1  76.3  72.6  73    70.4  70.2  79.5  74.8
#>  8  79.2  79.9    79.1  80.5  79.9  76.5  73.7  73.6  71.6  71.1  79.4  75.1
#>  9  79.8  79.8    79.3  80.7  80.2  76.6  74.6  73.8  72.3  72.4  79.8  75.3
#> 10  80    80.1    79.4  80.9  80.8  76.9  75.3  74.1  72.5  72.6  80.9  75.8
#> 11  80.4  80.4    79.8  81    80.5  77.2  75.8  74.5  73.4  73.1  80.4  76.2
#> 12  80.3  80.3    79.9  81    80.4  77.4  76    74.6  73.6  73.4  80.3  76.2
#> 13  80.4  80.5    80.3  81.2  81.7  77.5  76.7  75.2  73.7  73.4  81.4  76.5
#> 14  80.7  80.9    80.5  81.5  81.5  78.1  76.6  75.3  73.7  74    81.5  77.1
#> 15  80.3  80.6    80.7  81.4  81    77.9  77.2  75    74.1  73.9  81.5  76.9
#> 16  80.6  81      80.6  81.6  81.9  78.4  77.2  75.5  74.1  74.3  82.2  77.3
#>       SK    SI Bulgaria    RO    HR
#>    <dbl> <dbl>    <dbl> <dbl> <dbl>
#>  1  73.1  75.7     71.9  71.4  74.1
#>  2  73.3  75.9     72.1  71.1  74.2
#>  3  73.4  75.7     72.2  71.2  74  
#>  4  73.7  76.5     72.4  71.6  74.9
#>  5  73.7  76.8     72.3  72    74.7
#>  6  73.9  77.5     72.5  72.5  75.3
#>  7  74    77.6     72.7  73    75.2
#>  8  74.4  78.3     73    73.4  75.3
#>  9  74.8  78.5     73.4  73.5  75.7
#> 10  75    79       73.5  73.5  76.1
#> 11  75.5  79.4     73.9  74.2  76.5
#> 12  75.7  79.4     74    74.2  76.6
#> 13  76    79.7     74.5  74.8  77.1
#> 14  76.4  80.4     74.1  74.6  77.3
#> 15  76.1  80       74.1  74.5  76.8
#> 16  76.7  80.4     74.4  74.9  77.5

where the names of the columns related to the countries are a mixture of MS labels (for example “BE” and “IT”) and MS names (for example “France” and “Finland”). Thus, when invoking the impute_dataset function, the countries argument can no longer be specified with convergEU_glb()$EU27_2020$memberStates$codeMS as previously done, since the res component returned is then NULL:

colnames(myExc3)
#>  [1] "time"     "BE"       "DK"       "France"   "DE"       "EL"      
#>  [7] "IE"       "IT"       "LU"       "NL"       "PT"       "ES"      
#> [13] "UK"       "AT"       "Finland"  "SE"       "CY"       "CZ"      
#> [19] "EE"       "HU"       "LV"       "LT"       "MT"       "PL"      
#> [25] "SK"       "SI"       "Bulgaria" "RO"       "HR"
TBex1 <- impute_dataset(myExc3, timeName = "time",
                        countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                        headMiss = c("cut", "constant")[2])$res 
TBex1
#> NULL

A first strategy to overcome this issue is to extract the country names as they appear in the Excel file, and then to pass these names to the countries argument as follows:

ms<-names(myExc3)[2:29]
TBexF4 <- impute_dataset(myExc3, timeName = "time",
                        countries=ms,
                        headMiss = c("cut", "constant")[2])$res 
print(TBexF4,n=35,width=250)
#> # A tibble: 16 x 29
#>     time    BE    DK France    DE    EL    IE    IT    LU    NL    PT    ES
#>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2001  77.5  76.4   78.6  77.9  78.5  76.6  79.6  77.5  77.8  76.6  79.1
#>  2  2002  77.5  76.4   78.7  77.9  78.6  77.1  79.7  77.5  77.8  76.8  79.2
#>  3  2003  77.6  76.8   78.6  78    78.6  77.6  79.5  77.2  78.1  76.8  79  
#>  4  2004  78.3  77.2   79.7  78.6  78.7  78    80.2  78.6  78.7  77.7  79.7
#>  5  2005  78.4  77.6   79.7  78.7  78.9  78.3  80.2  78.8  79    77.5  79.6
#>  6  2006  78.8  77.7   80.3  79.2  79.2  78.6  80.7  78.6  79.3  78.3  80.4
#>  7  2007  79.2  77.7   80.6  79.4  79    79    80.8  78.7  79.7  78.5  80.4
#>  8  2008  79.1  78.1   80.7  79.5  79.5  79.5  80.9  79.8  79.8  78.8  80.8
#>  9  2009  79.4  78.3   80.9  79.6  79.6  79.5  81.1  80    80.2  79    81.2
#> 10  2010  79.6  78.6   81.1  79.8  79.9  80.1  81.5  80.1  80.3  79.3  81.6
#> 11  2011  79.9  79.1   81.6  79.9  80    80.1  81.6  80.4  80.6  79.9  81.8
#> 12  2012  79.8  79.4   81.4  80    79.9  80.2  81.6  80.7  80.5  79.8  81.8
#> 13  2013  80    79.6   81.7  79.9  80.7  80.3  82.1  81.2  80.7  80.1  82.4
#> 14  2014  80.6  80.1   82.2  80.4  80.8  80.7  82.5  81.5  81.1  80.6  82.5
#> 15  2015  80.4  80.1   81.8  80    80.4  80.8  81.9  81.6  80.9  80.5  82.2
#> 16  2016  80.8  80.2   82    80.3  80.8  81    82.6  82    81    80.6  82.7
#>       UK    AT Finland    SE    CY    CZ    EE    HU    LV    LT    MT    PL
#>    <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  77.7  78.2    77.5  79.2  78.4  74.6  70.5  72.1  69.9  71.2  78.2  73.8
#>  2  77.7  78.2    77.6  79.3  78    74.7  70.8  72.1  69.9  71.3  78.3  74.1
#>  3  77.8  78.1    77.8  79.5  78.3  74.6  71.4  72.1  70.3  71.5  78.1  74.2
#>  4  78.4  78.7    78.2  79.9  78.4  75.2  71.9  72.5  70.6  71.5  78.8  74.4
#>  5  78.6  78.9    78.4  79.9  78    75.4  72.4  72.4  70.2  70.7  78.9  74.5
#>  6  78.9  79.4    78.8  80.2  79.4  76    72.5  72.9  70.1  70.5  78.8  74.8
#>  7  79.1  79.6    78.8  80.3  79.1  76.3  72.6  73    70.4  70.2  79.5  74.8
#>  8  79.2  79.9    79.1  80.5  79.9  76.5  73.7  73.6  71.6  71.1  79.4  75.1
#>  9  79.8  79.8    79.3  80.7  80.2  76.6  74.6  73.8  72.3  72.4  79.8  75.3
#> 10  80    80.1    79.4  80.9  80.8  76.9  75.3  74.1  72.5  72.6  80.9  75.8
#> 11  80.4  80.4    79.8  81    80.5  77.2  75.8  74.5  73.4  73.1  80.4  76.2
#> 12  80.3  80.3    79.9  81    80.4  77.4  76    74.6  73.6  73.4  80.3  76.2
#> 13  80.4  80.5    80.3  81.2  81.7  77.5  76.7  75.2  73.7  73.4  81.4  76.5
#> 14  80.7  80.9    80.5  81.5  81.5  78.1  76.6  75.3  73.7  74    81.5  77.1
#> 15  80.3  80.6    80.7  81.4  81    77.9  77.2  75    74.1  73.9  81.5  76.9
#> 16  80.6  81      80.6  81.6  81.9  78.4  77.2  75.5  74.1  74.3  82.2  77.3
#>       SK    SI Bulgaria    RO    HR
#>    <dbl> <dbl>    <dbl> <dbl> <dbl>
#>  1  73.1  75.7     71.9  71.4  74.1
#>  2  73.3  75.9     72.1  71.1  74.2
#>  3  73.4  75.7     72.2  71.2  74  
#>  4  73.7  76.5     72.4  71.6  74.9
#>  5  73.7  76.8     72.3  72    74.7
#>  6  73.9  77.5     72.5  72.5  75.3
#>  7  74    77.6     72.7  73    75.2
#>  8  74.4  78.3     73    73.4  75.3
#>  9  74.8  78.5     73.4  73.5  75.7
#> 10  75    79       73.5  73.5  76.1
#> 11  75.5  79.4     73.9  74.2  76.5
#> 12  75.7  79.4     74    74.2  76.6
#> 13  76    79.7     74.5  74.8  77.1
#> 14  76.4  80.4     74.1  74.6  77.3
#> 15  76.1  80       74.1  74.5  76.8
#> 16  76.7  80.4     74.4  74.9  77.5

where ms contains the country names extracted from the dataset myExc3. Thus, the impute_dataset function works with any country name or label as long as it is correctly specified in the countries argument.
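Extracting the names by position, as in names(myExc3)[2:29], relies on the time variable being the first column and on the exact number of countries. A slightly more robust variant, sketched here on a made-up data frame, removes the time column by name instead:

```r
# Toy dataset mixing MS labels and MS names (illustrative values only)
toy <- data.frame(
  time = 2001:2002,
  BE = c(77.5, 77.5),
  France = c(78.6, 78.7),
  Bulgaria = c(71.9, 72.1)
)

# All column names except the time variable, in their original order
ms_by_name <- setdiff(names(toy), "time")
print(ms_by_name)
```

The resulting vector can then be passed to the countries argument in the same way as ms.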

A second strategy is to replace the MS names with the corresponding labels, so that they comply with those reported in convergEU_glb()$EU27_2020$memberStates$codeMS, as follows:

myExc4<-myExc3 %>% dplyr::rename(FR=France,FI=Finland,BG=Bulgaria)

Note that in the rename function the new name is specified first, followed by the original name. For example, FR=France indicates that France should be renamed to FR. Whether the country labels in the working dataset comply with those reported in convergEU_glb() can be further checked with the function not_in implemented in the convergEU package:

help(not_in)
cnmyExc<-colnames(myExc4[2:29])
cnConv<-convergEU_glb()$EU27_2020$memberStates$codeMS
not_in(cnmyExc,cnConv)
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE FALSE FALSE

where cnmyExc contains the country labels in the working dataset and cnConv the country labels in the object convergEU_glb()$EU27_2020$memberStates$codeMS. Each FALSE in the output indicates that the corresponding label of the myExc4 dataset is contained in cnConv; the only TRUE, in position 12, corresponds to UK, which does not belong to the EU27_2020 cluster. Note also that deterministic linear imputation is not the only method available: further methods for imputing missing values are provided by other R packages, for example mice, Amelia, missForest, Hmisc and mi.
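A logical vector of this kind can be turned into the names of the offending labels by using it as an index. The sketch below uses base R only and assumes, as the output above suggests, that not_in behaves like a negated %in%; both label vectors are shortened, made-up examples rather than the full lists.

```r
# Shortened, illustrative label vectors (not the full lists)
labels_in_data <- c("BE", "DK", "FR", "UK")
reference_labels <- c("BE", "DK", "FR", "DE")

# TRUE where a label of the dataset is absent from the reference
is_absent <- !(labels_in_data %in% reference_labels)

# Names of the labels that would need renaming or removal
absent_labels <- labels_in_data[is_absent]
print(absent_labels)
```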

Once the country names comply with those reported in the object convergEU_glb()$EU27_2020$memberStates$codeMS, the countries argument in the impute_dataset function can be specified as:

TBexF5 <- impute_dataset(myExc4, timeName = "time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2])$res 
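When many columns carry full names, the renaming step of the second strategy can be driven by a lookup table instead of listing each pair by hand. The following base R sketch is one possible way to do it; the lookup vector and the toy data are assumptions made for illustration only.

```r
# Toy dataset with two full country names (illustrative values only)
toy <- data.frame(
  time = 2001:2002,
  BE = c(77.5, 77.5),
  France = c(78.6, 78.7),
  Finland = c(77.5, 77.6)
)

# Hypothetical lookup table: names are the full names, values the labels
lookup <- c(France = "FR", Finland = "FI", Bulgaria = "BG")

# Rename only the columns that appear in the lookup table;
# entries without a matching column (here Bulgaria) are ignored
hit <- names(toy) %in% names(lookup)
names(toy)[hit] <- unname(lookup[names(toy)[hit]])
print(names(toy))
```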

4.1.2 Import .csv files

Files in .csv format are imported by invoking the read_csv function of the readr package:

require(readr)
help("read_csv")

Let’s suppose that the data for the demo_mlexpec indicator described in the previous Subsection are stored in a .csv file, and missing data are reported with “.” instead of blank cells. Thus, the data are imported as follows:

myCsv1<-read_csv(file.path("vign","excel2.csv"), na=".")

where “inst/vign/excel2.csv” is the path (possibly including disk unit and folders) in which the .csv file is stored, while the argument na indicates how the missing values are reported in the .csv file.

print(myCsv1, n=35, width = 250)
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    UK
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2001  77.5  76.4  78.6  77.9  78.5  76.6  79.6  77.5  77.8  76.6  79.1  77.7
#>  2  2002  77.5  76.4  78.7  77.9  78.6  77.1  79.7  77.5  77.8  76.8  79.2  77.7
#>  3  2003  77.6  76.8  78.6  78    78.6  77.6  79.5  77.2  78.1  76.8  79    77.8
#>  4  2004  78.3  77.2  79.7  78.6  78.7  78    80.2  78.6  78.7  77.7  79.7  78.4
#>  5  2005  78.4  77.6  79.7  78.7  78.9  78.3  80.2  78.8  79    77.5  79.6  78.6
#>  6  2006  78.8  77.7  80.3  79.2  79.2  78.6  80.7  78.6  79.3  78.3  80.4  78.9
#>  7  2007  79.2  77.7  80.6  79.4  79    79    80.8  78.7  79.7  78.5  80.4  79.1
#>  8  2008  79.1  78.1  80.7  79.5  79.5  79.5  80.9  79.8  79.8  78.8  80.8  79.2
#>  9  2009  79.4  78.3  80.9  79.6  79.6  79.5  81.1  80    80.2  79    81.2  79.8
#> 10  2010  79.6  78.6  81.1  79.8  79.9  80.1  81.5  80.1  80.3  79.3  81.6  80  
#> 11  2011  79.9  79.1  81.6  79.9  80    80.1  81.6  80.4  80.6  79.9  81.8  80.4
#> 12  2012  79.8  79.4  81.4  80    79.9  80.2  81.6  80.7  80.5  79.8  81.8  80.3
#> 13  2013  80    79.6  81.7  79.9  80.7  80.3  82.1  81.2  80.7  80.1  82.4  80.4
#> 14  2014  80.6  80.1  82.2  80.4  80.8  80.7  82.5  81.5  81.1  80.6  82.5  80.7
#> 15  2015  80.4  80.1  81.8  80    80.4  80.8  81.9  81.6  80.9  80.5  82.2  80.3
#> 16  2016  80.8  80.2  82    80.3  80.8  81    82.6  82    81    80.6  82.7  80.6
#>       AT    FI    SE    CY    CZ    EE    HU LV       LT    MT    PL    SK    SI
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  78.2  77.5  79.2  78.4  74.6  70.5  72.1 NA     71.2  78.2  73.8  73.1  75.7
#>  2  78.2  77.6  79.3  78    74.7  70.8  72.1 69.9   71.3  78.3  74.1  73.3  75.9
#>  3  78.1  77.8  79.5  78.3  74.6  71.4  72.1 70.3   71.5  78.1  74.2  73.4  75.7
#>  4  78.7  78.2  79.9  78.4  75.2  71.9  72.5 70.6   71.5  78.8  74.4  73.7  76.5
#>  5  78.9  78.4  79.9  78    75.4  72.4  72.4 70.2   70.7  78.9  74.5  73.7  76.8
#>  6  79.4  78.8  80.2  79.4  76    72.5  72.9 70.1   70.5  78.8  74.8  73.9  77.5
#>  7  79.6  78.8  80.3  79.1  76.3  72.6  73   70.4   70.2  79.5  74.8  74    77.6
#>  8  79.9  79.1  80.5  79.9  76.5  73.7  73.6 71.6   71.1  79.4  75.1  74.4  78.3
#>  9  79.8  79.3  80.7  80.2  76.6  74.6  73.8 72.3   72.4  79.8  75.3  74.8  78.5
#> 10  80.1  79.4  80.9  80.8  76.9  75.3  74.1 72.5   72.6  80.9  75.8  75    79  
#> 11  80.4  79.8  81    80.5  77.2  75.8  74.5 73.4   73.1  80.4  76.2  75.5  79.4
#> 12  80.3  79.9  81    80.4  77.4  76    74.6 73.6   73.4  80.3  76.2  75.7  79.4
#> 13  80.5  80.3  81.2  81.7  77.5  76.7  75.2 73.7   73.4  81.4  76.5  76    79.7
#> 14  80.9  80.5  81.5  81.5  78.1  76.6  75.3 73.7   74    81.5  77.1  76.4  80.4
#> 15  80.6  80.7  81.4  81    77.9  77.2  75   74.1   73.9  81.5  76.9  76.1  80  
#> 16  81    80.6  81.6  81.9  78.4  77.2  75.5 74.1   74.3  82.2  77.3  76.7  80.4
#>       BG    RO    HR
#>    <dbl> <dbl> <dbl>
#>  1  71.9  71.4  74.1
#>  2  72.1  71.1  74.2
#>  3  72.2  71.2  74  
#>  4  72.4  71.6  74.9
#>  5  72.3  72    74.7
#>  6  72.5  72.5  75.3
#>  7  72.7  73    75.2
#>  8  73    73.4  75.3
#>  9  73.4  73.5  75.7
#> 10  73.5  73.5  76.1
#> 11  73.9  74.2  76.5
#> 12  74    74.2  76.6
#> 13  74.5  74.8  77.1
#> 14  74.1  74.6  77.3
#> 15  74.1  74.5  76.8
#> 16  74.4  74.9  77.5

Once the data from the .csv are imported, the common operations of data checking and imputation of missing values are performed as described previously for the excel1.xlsx file.
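In the output printed above, the LV column is reported as <chr> rather than <dbl>, whereas the analysis requires numeric columns. One possible remedy, sketched here on a small temporary file created only for illustration, is to declare the column types at import time with the col_types argument of read_csv. Whether this is sufficient for the actual excel2.csv depends on why LV was read as character, which the output does not show.

```r
library(readr)

# Write a tiny illustrative .csv file with "." marking a missing value
csv_path <- tempfile(fileext = ".csv")
writeLines(c("time,BE,LV", "2001,77.5,.", "2002,77.5,69.9"), csv_path)

# Force every column to be parsed as double; "." becomes NA
toy_csv <- read_csv(csv_path, na = ".",
                    col_types = cols(.default = col_double()))

# Check the resulting column classes
print(sapply(toy_csv, class))
```

A column that has already been imported as character can instead be converted afterwards with as.numeric.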

4.2 Data import and processing for Excel files downloaded from the Eurostat web site

For a given indicator, data stored in Excel files can be downloaded directly from the Eurostat website: https://ec.europa.eu/eurostat/data/database. The Eurostat website allows data to be downloaded in both .xls and .csv formats. Once a given indicator is chosen, the user should click on the Data explorer button, and the data tables are displayed. Note that the data tables can be customized. It is convenient to display the data by choosing “Codes” for “Labeling”. Moreover, the data tables can also be rearranged into the required years by countries format. To better illustrate these points, we take as an example the indicator une_educ_a, the Unemployment rate by sex, age and educational attainment - annual averages. On the Eurostat website, we click on the Data explorer button (highlighted by the red arrow in Figure 2):
Figure 2: Data explorer button on the Eurostat web site

which automatically opens the page http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=une_educ_a&lang=en containing the data tables for the une_educ_a indicator. Before downloading the data, we customize the data table by clicking on Codes for Labeling (see the green arrow in Figure 3). Note that by default the data are arranged in the format countries by time. Therefore, the user should click on the button highlighted by the red arrow in Figure 3 in order to arrange the data in the required format time by countries:
Figure 3: Data tables customization on the Eurostat web site

Lastly, to download the data we click on the Download button (blue arrow, Figure 3).

4.2.1 Data import for .xls downloaded Excel file

As outlined above, for the indicator une_educ_a, we download the .xls file from the web site http://appsso.eurostat.ec.europa.eu/nui/submitViewTableAction.do (blue arrow, Figure 3) with the following settings:

  • Codes for Labeling (green arrow, Figure 3);

  • Data tables arranged in the format years by countries (red arrow, Figure 3).

More precisely, to download the .xls file, the option Full extraction on separate sheets should be specified, as highlighted by the blue arrow in Figure 4:
Figure 4: Data extraction Eurostat web site (.xls file)

Once the .xls file is downloaded and saved in a local folder, we import the data by considering the first data table, stored in the sheet “Data”. Note that each Excel sheet contains different data tables, together with further heading information on the extracted data, e.g. the indicator’s name and label, dates of update and extraction, further conditioning variables (sex, age, education). Therefore, to select only the data on the indicator itself, a range of cells should be specified by the user. For example, for the une_educ_a indicator:

myxls1<-read_excel(file.path("vign","une_educ_a.xls"),
                   sheet="Data",range = "A12:AP22")

where “inst/vign/une_educ_a.xls” specifies the path (possibly including disk unit or folders) in which the file is stored; the argument sheet indicates the Excel sheet from which the data should be imported; the argument range specifies the cells in which the data are arranged.

print(myxls1, n=100, width = 250)
#> # A tibble: 10 x 42
#>    `TIME/GEO`  EU28  EU15  EA19  EA18  EA17    BE    BG    CZ    DK    DE    EE
#>    <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 2009        14.4  14.5  15    14.9  14.9  13.6  15.5  24.1   9.1  15.7  28.1
#>  2 2010        15.7  15.7  16.2  16.2  16.1  15.3  22.7  25    11.1  14.7  30.5
#>  3 2011        16.2  16.2  16.7  16.7  16.7  13.9  26.4  24.3  11.3  13    25.7
#>  4 2012        18.2  18.3  19.2  19.1  19.1  14    28    28.5  11.7  12.2  23.1
#>  5 2013        19.2  19.5  20.5  20.5  20.5  15.8  29.9  25.6  11.1  11.9  15  
#>  6 2014        18.5  18.8  20.1  20.1  20.1  16.2  28.3  22.1  10.3  11.8  13  
#>  7 2015        17.3  17.6  18.9  18.9  18.9  16.8  25.1  22.7   9.7  11.2  12.2
#>  8 2016        16.1  16.4  17.8  17.7  17.7  15.9  22.2  20.5   9.1  10.1  12.7
#>  9 2017        14.7  15.1  16.4  16.3  16.3  14.6  18.1  13.1   9     9.5  10.9
#> 10 2018        13.2  13.6  14.8  14.8  14.8  13.2  15.5  10.7   8     8.8  10.3
#>       IE    EL    ES FR       FX    HR    IT    CY    LV    LT    LU    HU    MT
#>    <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  18.1   9.5  24.5 :      13.6   9.9   9.5   6.1  31.1  29.7   8.1  23.2   9  
#>  2  22.1  12.7  27.1 :      14.5  13    10.3   7.1  32.8  39.4   6.1  25.1   9.6
#>  3  24    18.1  28.9 :      14.4  17.4  10.6   7.5  29.2  38.4   8.3  25     9  
#>  4  25.8  26    33.7 :      15.2  18.6  13.6  13.4  26.6  34.7   8.4  24.8   9.5
#>  5  22.2  29.8  35.3 :      16.2  21.5  15.9  19.3  25.1  32.8  10.2  23.7   9.5
#>  6  19.8  28.2  33.8 17.1…  16.3  25.7  16.6  19.4  24    29.8  10.2  18.5   9  
#>  7  17.1  26.7  31   17.6…  16.9  21.5  15.6  18.5  21.9  26.3  10.6  17.4   8.7
#>  8  14.5  26.4  28   18     17.3  17.4  15.7  15.6  20.6  25.1   9.9  13.2   7.4
#>  9  11.6  24.3  25   17     16.3  19.8  15.5  14.1  18.8  20.9   8.9  11.1   6.1
#> 10   9.9  22.3  22.1 16.1…  15.5  11.6  14.6   9.8  16.5  17.9   8.4  10.3   5.6
#>       NL    AT    PL    PT    RO    SI    SK    FI    SE    UK    IS    NO CH   
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#>  1   6.8  10.7  14.7  10.3   7.5   8.8  41.5  14.8  15.8  12.9  10.2   6.5 :    
#>  2   8.1   9.2  17.5  11.8   5.9  11.7  44.2  16.1  17    13.8  10.5   7.3 8    
#>  3   7.8   9.1  18.3  13.9   7.2  13.4  42.5  16.1  16.5  14.1  10.3   6.9 8    
#>  4   9.3   9.8  19.5  16.5   6.7  14.8  44.6  16    17.6  13.9  10.3   6.2 7.79…
#>  5  11.3  10.4  20.5  17.4   6.7  17.8  42.4  17.1  18.8  13.8   8.4   7.7 8.30…
#>  6  12.1  11.4  19    15.3   6.7  15.4  41.1  17.2  19.1  11.3   7     7.5 8.69…
#>  7  11.2  11.2  16.9  13.6   8.1  13.9  37.6  17.6  18.9   9.6   6.1   9.3 8.80…
#>  8   9.8  12.7  14.5  12.1   7.6  14.5  31.6  16.7  18.8   8.3   4.5   9.9 9.40…
#>  9   8.3  13    12.2   9.8   6.8  11.1  29.8  17.9  18.5   7.3   4.7   9.4 8.30…
#> 10   6.6  11.4   9.9   7.4   5.8   8.6  29.8  15.9  19     6.4   4.5   8.4 8.30…
#>    ME                    MK RS                    TR
#>    <chr>              <dbl> <chr>              <dbl>
#>  1 :                   38.6 :                   12.2
#>  2 :                   39   :                   10.2
#>  3 30.300000000000001  37.6 :                    8.1
#>  4 34.899999999999999  37.8 :                    7.5
#>  5 41.399999999999999  34.3 :                    8.1
#>  6 32.200000000000003  32.2 18.199999999999999   9.3
#>  7 28.300000000000001  29.9 15.6                 9.7
#>  8 24.600000000000001  29.2 13.1                 9.9
#>  9 22.199999999999999  26.5 11.6                 9.6
#> 10 20.100000000000001  23.8 12.6                 9.9

Note that the print output shows the character “:”, which marks missing values. Thus, the na argument should be specified so that such cells are imported as missing values. Moreover, all the variables should be numeric, including the time variable, which is currently of type “chr”; therefore, the argument col_types, which sets the type of the columns, should also be set to “numeric” when importing the data from the .xls file:

require(readxl)
myxls2<-read_excel(file.path("vign","une_educ_a.xls"),
                   sheet="Data",range = "A12:AP22", na=":", col_types = "numeric")
print(myxls2, n=100, width = 250)
#> # A tibble: 10 x 42
#>    `TIME/GEO`  EU28  EU15  EA19  EA18  EA17    BE    BG    CZ    DK    DE    EE
#>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1       2009  14.4  14.5  15    14.9  14.9  13.6  15.5  24.1   9.1  15.7  28.1
#>  2       2010  15.7  15.7  16.2  16.2  16.1  15.3  22.7  25    11.1  14.7  30.5
#>  3       2011  16.2  16.2  16.7  16.7  16.7  13.9  26.4  24.3  11.3  13    25.7
#>  4       2012  18.2  18.3  19.2  19.1  19.1  14    28    28.5  11.7  12.2  23.1
#>  5       2013  19.2  19.5  20.5  20.5  20.5  15.8  29.9  25.6  11.1  11.9  15  
#>  6       2014  18.5  18.8  20.1  20.1  20.1  16.2  28.3  22.1  10.3  11.8  13  
#>  7       2015  17.3  17.6  18.9  18.9  18.9  16.8  25.1  22.7   9.7  11.2  12.2
#>  8       2016  16.1  16.4  17.8  17.7  17.7  15.9  22.2  20.5   9.1  10.1  12.7
#>  9       2017  14.7  15.1  16.4  16.3  16.3  14.6  18.1  13.1   9     9.5  10.9
#> 10       2018  13.2  13.6  14.8  14.8  14.8  13.2  15.5  10.7   8     8.8  10.3
#>       IE    EL    ES    FR    FX    HR    IT    CY    LV    LT    LU    HU    MT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  18.1   9.5  24.5  NA    13.6   9.9   9.5   6.1  31.1  29.7   8.1  23.2   9  
#>  2  22.1  12.7  27.1  NA    14.5  13    10.3   7.1  32.8  39.4   6.1  25.1   9.6
#>  3  24    18.1  28.9  NA    14.4  17.4  10.6   7.5  29.2  38.4   8.3  25     9  
#>  4  25.8  26    33.7  NA    15.2  18.6  13.6  13.4  26.6  34.7   8.4  24.8   9.5
#>  5  22.2  29.8  35.3  NA    16.2  21.5  15.9  19.3  25.1  32.8  10.2  23.7   9.5
#>  6  19.8  28.2  33.8  17.1  16.3  25.7  16.6  19.4  24    29.8  10.2  18.5   9  
#>  7  17.1  26.7  31    17.6  16.9  21.5  15.6  18.5  21.9  26.3  10.6  17.4   8.7
#>  8  14.5  26.4  28    18    17.3  17.4  15.7  15.6  20.6  25.1   9.9  13.2   7.4
#>  9  11.6  24.3  25    17    16.3  19.8  15.5  14.1  18.8  20.9   8.9  11.1   6.1
#> 10   9.9  22.3  22.1  16.2  15.5  11.6  14.6   9.8  16.5  17.9   8.4  10.3   5.6
#>       NL    AT    PL    PT    RO    SI    SK    FI    SE    UK    IS    NO    CH
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1   6.8  10.7  14.7  10.3   7.5   8.8  41.5  14.8  15.8  12.9  10.2   6.5  NA  
#>  2   8.1   9.2  17.5  11.8   5.9  11.7  44.2  16.1  17    13.8  10.5   7.3   8  
#>  3   7.8   9.1  18.3  13.9   7.2  13.4  42.5  16.1  16.5  14.1  10.3   6.9   8  
#>  4   9.3   9.8  19.5  16.5   6.7  14.8  44.6  16    17.6  13.9  10.3   6.2   7.8
#>  5  11.3  10.4  20.5  17.4   6.7  17.8  42.4  17.1  18.8  13.8   8.4   7.7   8.3
#>  6  12.1  11.4  19    15.3   6.7  15.4  41.1  17.2  19.1  11.3   7     7.5   8.7
#>  7  11.2  11.2  16.9  13.6   8.1  13.9  37.6  17.6  18.9   9.6   6.1   9.3   8.8
#>  8   9.8  12.7  14.5  12.1   7.6  14.5  31.6  16.7  18.8   8.3   4.5   9.9   9.4
#>  9   8.3  13    12.2   9.8   6.8  11.1  29.8  17.9  18.5   7.3   4.7   9.4   8.3
#> 10   6.6  11.4   9.9   7.4   5.8   8.6  29.8  15.9  19     6.4   4.5   8.4   8.3
#>       ME    MK    RS    TR
#>    <dbl> <dbl> <dbl> <dbl>
#>  1  NA    38.6  NA    12.2
#>  2  NA    39    NA    10.2
#>  3  30.3  37.6  NA     8.1
#>  4  34.9  37.8  NA     7.5
#>  5  41.4  34.3  NA     8.1
#>  6  32.2  32.2  18.2   9.3
#>  7  28.3  29.9  15.6   9.7
#>  8  24.6  29.2  13.1   9.9
#>  9  22.2  26.5  11.6   9.6
#> 10  20.1  23.8  12.6   9.9

Now the missing values are correctly imported into the time-by-countries dataset myxls2, and all the variables are of type numeric. Once the data are correctly imported, data processing can proceed as described in the previous sections. For instance, we can select only a set of countries, say the cluster EU27_2020:

names(myxls2)
#>  [1] "TIME/GEO" "EU28"     "EU15"     "EA19"     "EA18"     "EA17"    
#>  [7] "BE"       "BG"       "CZ"       "DK"       "DE"       "EE"      
#> [13] "IE"       "EL"       "ES"       "FR"       "FX"       "HR"      
#> [19] "IT"       "CY"       "LV"       "LT"       "LU"       "HU"      
#> [25] "MT"       "NL"       "AT"       "PL"       "PT"       "RO"      
#> [31] "SI"       "SK"       "FI"       "SE"       "UK"       "IS"      
#> [37] "NO"       "CH"       "ME"       "MK"       "RS"       "TR"
EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
myxls<- select(myxls2,`TIME/GEO`,EU27estr)

where the function select interprets the character vector EU27estr as a set of column names. For the list of all available clusters of countries, see the convergEU_glb() function described in detail in Section 1.
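
Recent versions of dplyr warn when an external character vector is passed directly to select, because the name could also refer to a column. A minimal sketch of the more explicit form, assuming dplyr 1.0.0 or later is installed; the toy data frame and the vector keep are hypothetical and stand in for myxls2 and EU27estr:

```r
library(dplyr)

# Toy time-by-countries table (values copied from the printout above,
# restricted to three columns for illustration).
toy <- data.frame(
  `TIME/GEO` = 2009:2011,
  BE   = c(13.6, 15.3, 13.9),
  BG   = c(15.5, 22.7, 26.4),
  EU28 = c(14.4, 15.7, 16.2),
  check.names = FALSE
)

# Country codes held in an external character vector (hypothetical name).
keep <- c("BE", "BG")

# all_of() states explicitly that `keep` holds column names,
# so there is no ambiguity with a column called "keep".
toy_sel <- select(toy, `TIME/GEO`, all_of(keep))
names(toy_sel)
```

With all_of(), a code that is missing from the data raises an error instead of being silently skipped, which is usually the safer behaviour when selecting a cluster of countries.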

Subsequently, we can proceed with data checking and missing values imputation in order to obtain the final dataset on which to perform the analysis of convergence:

EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
myxls<- select(myxls2,`TIME/GEO`,EU27estr)
check_data(myxls)
myxls3<-myxls %>% rename(TIME=`TIME/GEO`)
myxlsf <- impute_dataset(myxls3, timeName ="TIME",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2],
                         tailMiss = c("cut", "constant")[2])$res

For convenience, we also rename the time variable from “TIME/GEO” to “TIME”.

4.2.2 Data import for a downloaded .csv file

Now, we describe how to import the same data for the indicator une_educ_a, this time downloaded as a .csv file. To this end, when downloading the .csv file from the web site http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do, we specify the full extraction of all data tables in multiple files, as highlighted by the blue arrow in Figure 5:

Figure 5: Data extraction Eurostat web site (.csv file)

To import the .csv file, we use the function read_csv from the package readr, as follows:

require(readr)
mycsv<-read_csv(file.path("vign","une_educ_a_1_Data.csv"), na=":")

where “vign/une_educ_a_1_Data.csv” is the path (possibly including the drive and folders) in which the .csv file is located, while the argument na specifies how the missing values are coded in the .csv file.

mycsv
#> # A tibble: 410 x 7
#>    GEO    TIME SEX   ISCED11 AGE    UNIT   Value
#>    <chr> <dbl> <lgl> <chr>   <chr>  <chr>  <dbl>
#>  1 EU28   2009 TRUE  ED0-2   Y15-74 PC_ACT  14.4
#>  2 EU15   2009 TRUE  ED0-2   Y15-74 PC_ACT  14.5
#>  3 EA19   2009 TRUE  ED0-2   Y15-74 PC_ACT  15  
#>  4 EA18   2009 TRUE  ED0-2   Y15-74 PC_ACT  14.9
#>  5 EA17   2009 TRUE  ED0-2   Y15-74 PC_ACT  14.9
#>  6 BE     2009 TRUE  ED0-2   Y15-74 PC_ACT  13.6
#>  7 BG     2009 TRUE  ED0-2   Y15-74 PC_ACT  15.5
#>  8 CZ     2009 TRUE  ED0-2   Y15-74 PC_ACT  24.1
#>  9 DK     2009 TRUE  ED0-2   Y15-74 PC_ACT   9.1
#> 10 DE     2009 TRUE  ED0-2   Y15-74 PC_ACT  15.7
#> # … with 400 more rows

Note that the data imported from the .csv file are in long format. Therefore, further processing is needed in order to convert the data into the required years-by-countries format. To this end, the function spread in the package tidyr is used as follows:

mycsv1<- select(mycsv,GEO,TIME, Value)
mycsvf <- spread(mycsv1, GEO, Value)
mycsvf
#> # A tibble: 10 x 42
#>     TIME    AT    BE    BG    CH    CY    CZ    DE    DK  EA17  EA18  EA19    EE
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2009  10.7  13.6  15.5  NA     6.1  24.1  15.7   9.1  14.9  14.9  15    28.1
#>  2  2010   9.2  15.3  22.7   8     7.1  25    14.7  11.1  16.1  16.2  16.2  30.5
#>  3  2011   9.1  13.9  26.4   8     7.5  24.3  13    11.3  16.7  16.7  16.7  25.7
#>  4  2012   9.8  14    28     7.8  13.4  28.5  12.2  11.7  19.1  19.1  19.2  23.1
#>  5  2013  10.4  15.8  29.9   8.3  19.3  25.6  11.9  11.1  20.5  20.5  20.5  15  
#>  6  2014  11.4  16.2  28.3   8.7  19.4  22.1  11.8  10.3  20.1  20.1  20.1  13  
#>  7  2015  11.2  16.8  25.1   8.8  18.5  22.7  11.2   9.7  18.9  18.9  18.9  12.2
#>  8  2016  12.7  15.9  22.2   9.4  15.6  20.5  10.1   9.1  17.7  17.7  17.8  12.7
#>  9  2017  13    14.6  18.1   8.3  14.1  13.1   9.5   9    16.3  16.3  16.4  10.9
#> 10  2018  11.4  13.2  15.5   8.3   9.8  10.7   8.8   8    14.8  14.8  14.8  10.3
#> # … with 29 more variables: EL <dbl>, ES <dbl>, EU15 <dbl>, EU28 <dbl>,
#> #   FI <dbl>, FR <dbl>, FX <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IS <dbl>,
#> #   IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, ME <dbl>, MK <dbl>, MT <dbl>,
#> #   NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>, SE <dbl>,
#> #   SI <dbl>, SK <dbl>, TR <dbl>, UK <dbl>
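
In tidyr 1.0.0 and later, spread is superseded by pivot_wider. The following is a minimal sketch of the same long-to-wide step on a toy table, assuming a recent tidyr is installed; the object names long and wide are hypothetical, and the four values are copied from the printout above:

```r
library(tidyr)

# Long format: one row per (country, year) pair.
long <- data.frame(
  GEO   = c("BE", "BG", "BE", "BG"),
  TIME  = c(2009, 2009, 2010, 2010),
  Value = c(13.6, 15.5, 15.3, 22.7)
)

# Wide format: one row per year, one column per country.
wide <- pivot_wider(long, names_from = GEO, values_from = Value)
wide
```

Unlike spread, pivot_wider keeps the new columns in order of first appearance rather than sorting them alphabetically, so the column order may differ from the one shown above.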

The dataset mycsvf is now in the format time by countries, and all the variables are of type “numeric”. So, we can proceed with data checking and imputation of missing values:

EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS

mycsvf1<- select(mycsvf,TIME,EU27estr)
check_data(mycsvf1)
mycsvf2 <- impute_dataset(mycsvf1, timeName = "TIME",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2],
                         tailMiss = c("cut", "constant")[2])$res

5 Downloader of social scoreboard indicators

In the convergEU package, the function dow_soc_scor_boa is implemented for downloading social scoreboard indicators. Note that this is an “envelope function” that automatically downloads all the indicators involved in the social scoreboard from Eurostat. The user should be aware that the function is time-consuming, since a large quantity of data is downloaded, and that an active Internet connection is required. For example, all the indicators involved in the social scoreboard from 2007 to 2017 are downloaded as follows:

socInd<-dow_soc_scor_boa(fromTime = 2007 , toTime = 2017)
names(socInd)
#>  [1] "I.01.01.00"   "I.01.02.00"   "I.01.04.00"   "I.02.01.00"   "I.02.02.00"  
#>  [6] "I.02.03.00"   "I.03.01.00"   "I.04.01.00"   "I.04.02.00"   "I.04.03.00"  
#> [11] "I.04.04.00"   "I.04.05.00"   "I.04.06.00"   "I.05.01.00"   "II.06.01.00" 
#> [16] "II.06.02.00"  "II.06.03.00"  "II.06.04.00"  "II.07.01.00"  "II.07.03.00" 
#> [21] "II.07.04.00"  "II.07.05.00"  "II.07.06.00"  "II.07.07.00"  "II.08.01.00" 
#> [26] "II.08.04.00"  "III.09.01.00" "III.09.02.00" "III.09.03.00" "III.09.04.00"
#> [31] "III.09.05.00" "III.10.01.00" "III.11.01.00" "III.11.02.00" "III.11.03.00"
#> [36] "III.11.04.00" "III.12.01.00"

In this example, 37 social scoreboard indicators are downloaded and stored in the user’s local repository. To better illustrate how to use the data, let’s take as an example the first downloaded indicator, i.e. I.01.01.00:

socInd1<-socInd$I.01.01.00
socInd1
#> $res
#> # A tibble: 11 x 41
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK  EA19    EE
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y18-… T      2007  10.8  12.1  14.9   7.6  12.5   5.2  12.5  12.9  16.7  14.4
#>  2 Y18-… T      2008  10.2  12    14.8   7.7  13.7   5.6  11.8  12.5  16.3  14  
#>  3 Y18-… T      2009   8.8  11.1  14.7   9.1  11.7   5.4  11.1  11.3  15.8  13.5
#>  4 Y18-… T      2010   8.3  11.9  12.6   6.7  12.7   4.9  11.8  11    15.4  11  
#>  5 Y18-… T      2011   8.5  12.3  11.8   6.3  11.3   4.9  11.6   9.6  14.6  10.6
#>  6 Y18-… T      2012   7.8  12    12.5   5.5  11.4   5.5  10.5   9.1  13.8  10.3
#>  7 Y18-… T      2013   7.5  11    12.5   5.6   9.1   5.4   9.8   8    12.8   9.7
#>  8 Y18-… T      2014   7     9.8  12.9   5.6   6.8   5.5   9.5   7.8  11.8  12  
#>  9 Y18-… T      2015   7.3  10.1  13.4   5.2   5.2   6.2  10.1   7.8  11.6  12.2
#> 10 Y18-… T      2016   6.9   8.8  13.8   4.9   7.6   6.6  10.3   7.2  11.1  10.9
#> 11 Y18-… T      2017   7.4   8.9  12.7   4.5   8.5   6.7  10.1   8.8  11    10.8
#> # … with 28 more variables: EL <dbl>, ES <dbl>, EU15 <dbl>, EU28 <dbl>,
#> #   FI <dbl>, FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IS <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, ME <dbl>, MK <dbl>, MT <dbl>, NL <dbl>,
#> #   NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>, SE <dbl>, SI <dbl>,
#> #   SK <dbl>, TR <dbl>, UK <dbl>
#> 
#> $msg
#> $msg$gender
#> [1] "Gender automatically set to 'T'."
#> 
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 4 -> wstatus: POP; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>   uniqueIdentif                     tags
#> 1             1   wstatus: EMP; unit: PC
#> 2             2  wstatus: NEMP; unit: PC
#> 3             3 wstatus: NWANT; unit: PC
#> 4             4   wstatus: POP; unit: PC
#> 5             5  wstatus: WANT; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "edat_lfse_14"
#> 
#> $msg$Conditioning$ageInterv
#> [1] "Y18-24"
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

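The result is a list with the components res (the downloaded dataset), msg (messages about the conditioning applied during the download) and err (an error message, or NULL when no error occurred), similar to the functions download_indicator_EUS and down_lo_EUS.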

First, we assign a meaningful name to the dataset:

TBedat<-socInd1$res

Afterwards, we select the set of countries for which the analysis should be performed, for example the cluster EU12 of MS and Serbia (country code RS):

EU12estr<-convergEU_glb()$EU12$memberStates$codeMS
TBed<- select(TBedat,time,EU12estr,RS)

and we proceed to check for unsuited features:

check_data(TBed)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

The check fails because one or more missing values are present:

print(TBed)
#> # A tibble: 11 x 14
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    UK
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2007  12.1  12.9  12.8  12.5  14.3  12    19.5  12.5  11.9  36.5  30.8  16.6
#>  2  2008  12    12.5  11.8  11.8  14.4  11.7  19.6  13.4  11.4  34.9  31.7  16.9
#>  3  2009  11.1  11.3  12.4  11.1  14.2  11.8  19.1   7.7  11.3  30.9  30.9  15.7
#>  4  2010  11.9  11    12.7  11.8  13.5  11.9  18.6   7.1  10.1  28.3  28.2  14.8
#>  5  2011  12.3   9.6  12.3  11.6  12.9  11.1  17.8   6.2   9.2  23    26.3  14.9
#>  6  2012  12     9.1  11.8  10.5  11.3   9.9  17.3   8.1   8.9  20.5  24.7  13.4
#>  7  2013  11     8     9.7   9.8  10.1   8.7  16.8   6.1   9.3  18.9  23.6  12.4
#>  8  2014   9.8   7.8   8.8   9.5   9     6.7  15     6.1   8.7  17.4  21.9  11.8
#>  9  2015  10.1   7.8   9.2  10.1   7.9   6.8  14.7   9.3   8.2  13.7  20    10.8
#> 10  2016   8.8   7.2   8.8  10.3   6.2   6    13.8   5.5   8    14    19    11.2
#> 11  2017   8.9   8.8   8.9  10.1   6     5    14     7.3   7.1  12.6  18.3  10.6
#> # … with 1 more variable: RS <dbl>
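
Because the printout truncates the last column, it can help to locate the missing values programmatically. A minimal base R sketch on a toy table follows; the object toy and the RS values are made up for illustration, while the BE values are copied from the printout above:

```r
# Toy time-by-countries table with missing values at the start of RS.
toy <- data.frame(
  time = 2007:2011,
  BE   = c(12.1, 12.0, 11.1, 11.9, 12.3),
  RS   = c(NA, NA, NA, 8.3, 8.5)   # hypothetical values
)

# Number of missing values per column; keep only columns that have some.
na_count <- colSums(is.na(toy))
na_count[na_count > 0]

# Years in which RS is missing.
toy$time[is.na(toy$RS)]
```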

Missing values are present at the start of the series for Serbia (RS). Thus, we impute them with a constant value equal to the first observed value:

TBedF <- impute_dataset(TBed, timeName = "time",
                            countries="RS",
                            headMiss = c("cut", "constant")[2])$res 
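
To make the idea of constant imputation of initial missing values concrete, here is a minimal base R sketch. The helper fill_head_constant is hypothetical and is not the implementation used by impute_dataset; the series rs contains made-up values:

```r
# Replace leading NAs with the first observed value (hypothetical helper).
fill_head_constant <- function(x) {
  first_obs <- which(!is.na(x))[1]          # index of first non-missing value
  if (!is.na(first_obs) && first_obs > 1) { # nothing to do if none or no head NAs
    x[seq_len(first_obs - 1)] <- x[first_obs]
  }
  x
}

rs <- c(NA, NA, 8.3, 8.5, 8.1)   # made-up series with two leading NAs
fill_head_constant(rs)
```

The sketch only handles missing values at the head of the series; missing values inside or at the end of a series are left untouched.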

Once the imputation of missing values is performed, the data are checked again:

check_data(TBedF)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Thus, TBedF is the final dataset on which the analysis of convergence can be performed.

6 Auxiliary functions implemented in the convergEU package

6.1 Absolute changes across countries

Absolute change as described in the reserved Eurofound Annex is defined as: \[ \Delta y_{m,i,t} = y_{m,i,t} - y_{m,i,t-1} \] for country \(m\), indicator \(i\) at time \(t\).

In the convergEU package, absolute changes across countries are calculated by invoking the abso_change function on a sorted dataframe in the format time by countries and without missing values. To illustrate this function, we take as an example the locally available dataset emp_20_64_MS described in Section 1:

emp_20_64_MS
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002  70.9  64.7  56.5  75.1  71.7  68.8  78.3  68.3  62.8  63.2  73.2  68.6
#>  2  2003  71.3  64.5  58.7  75.4  71    68.4  77.4  69.1  63.8  64.3  72.9  69.7
#>  3  2004  68.4  65.8  61.2  75.7  70.1  67.9  78.1  70.2  64.3  65.2  72.5  69.2
#>  4  2005  70.4  66.5  61.9  74.4  70.7  69.4  78    72    64.4  67.5  73    69.4
#>  5  2006  71.6  66.5  65.1  75.8  71.2  71.1  79.4  75.9  65.6  69    73.9  69.4
#>  6  2007  72.8  67.7  68.4  76.8  72    72.9  79    76.9  65.8  69.7  74.8  69.9
#>  7  2008  73.8  68    70.7  76.5  72.4  74    78.7  77.1  66.3  68.5  75.8  70.5
#>  8  2009  73.4  67.1  68.8  75.3  70.9  74.2  76.1  70    65.6  64    73.5  69.5
#>  9  2010  73.9  67.6  64.7  75    70.4  75    74.9  66.8  63.8  62.8  73    69.3
#> 10  2011  74.2  67.3  62.9  73.4  70.9  76.5  74.8  70.6  59.6  62    73.8  69.2
#> 11  2012  74.4  67.2  63    70.2  71.5  76.9  74.3  72.2  55    59.6  74    69.4
#> 12  2013  74.6  67.2  63.5  67.2  72.5  77.3  74.3  73.3  52.9  58.6  73.3  69.5
#> 13  2014  74.2  67.3  65.1  67.6  73.5  77.7  74.7  74.3  53.3  59.9  73.1  69.2
#> 14  2015  74.3  67.2  67.1  67.9  74.8  78    75.4  76.5  54.9  62    72.9  69.5
#> 15  2016  74.8  67.7  67.7  68.7  76.7  78.6  76    76.6  56.2  63.9  73.4  70  
#> 16  2017  75.4  68.5  71.3  70.8  78.5  79.2  76.6  78.7  57.8  65.5  74.2  70.6
#> 17  2018  76.2  69.7  72.4  73.9  79.9  79.9  77.5  79.5  59.5  67    76.3  71.3
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>

The dataset is ready for the analysis given that missing values are already imputed. Thus, we proceed to calculate the absolute changes across all the countries in the emp_20_64_MS dataset by considering as reference time 2007 (argument time_0) and as focus time 2018 (argument time_t):

myAC <- abso_change(emp_20_64_MS, 
                     time_0 = 2007, 
                     time_t = 2018,
                     all_within=TRUE,
                     timeName = "time")
names(myAC$res)
#> [1] "abso_change"        "sum_abs_change"     "average_abs_change"

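The result contains three components: the year-on-year absolute changes (abso_change), the sum of their absolute values for each country (sum_abs_change), and the corresponding average for each country (average_abs_change):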

myAC$res$abso_change
#>    time   AT   BE   BG   CY   CZ   DE   DK   EE   EL   ES   FI   FR   HR   HU
#> 1  2003  0.4 -0.2  2.2  0.3 -0.7 -0.4 -0.9  0.8  1.0  1.1 -0.3  1.1  0.5  1.0
#> 2  2004 -2.9  1.3  2.5  0.3 -0.9 -0.5  0.7  1.1  0.5  0.9 -0.4 -0.5  1.3 -0.4
#> 3  2005  2.0  0.7  0.7 -1.3  0.6  1.5 -0.1  1.8  0.1  2.3  0.5  0.2  0.3  0.2
#> 4  2006  1.2  0.0  3.2  1.4  0.5  1.7  1.4  3.9  1.2  1.5  0.9  0.0  0.6  0.4
#> 5  2007  1.2  1.2  3.3  1.0  0.8  1.8 -0.4  1.0  0.2  0.7  0.9  0.5  3.3 -0.3
#> 6  2008  1.0  0.3  2.3 -0.3  0.4  1.1 -0.3  0.2  0.5 -1.2  1.0  0.6  1.0 -0.8
#> 7  2009 -0.4 -0.9 -1.9 -1.2 -1.5  0.2 -2.6 -7.1 -0.7 -4.5 -2.3 -1.0 -0.7 -1.4
#> 8  2010  0.5  0.5 -4.1 -0.3 -0.5  0.8 -1.2 -3.2 -1.8 -1.2 -0.5 -0.2 -2.1 -0.2
#> 9  2011  0.3 -0.3 -1.8 -1.6  0.5  1.5 -0.1  3.8 -4.2 -0.8  0.8 -0.1 -2.3  0.5
#> 10 2012  0.2 -0.1  0.1 -3.2  0.6  0.4 -0.5  1.6 -4.6 -2.4  0.2  0.2 -1.7  1.2
#> 11 2013  0.2  0.0  0.5 -3.0  1.0  0.4  0.0  1.1 -2.1 -1.0 -0.7  0.1 -0.9  1.4
#>      IE   IT   LT   LU   LV   MT   NL   PL   PT   RO   SE   SI   SK   UK
#> 1  -0.4  0.9  2.7 -1.2  1.1 -0.4 -0.5 -0.4 -1.1  0.5 -0.3 -1.9  1.8  0.4
#> 2   0.6  1.6 -1.1  0.5  0.0 -0.5 -0.4 -0.3 -0.4 -0.1 -0.7  2.9 -1.5  0.2
#> 3   1.6 -0.2  1.1  1.3  1.7  0.1 -2.2  1.3 -0.4 -1.1  0.3  0.1  1.0  0.3
#> 4   0.8  0.9  0.6  0.1  4.1  0.5  1.0  1.8  0.4  1.2  0.7  0.4  1.5  0.0
#> 5   1.7  0.3  1.4  0.5  2.0  0.7  1.8  2.6 -0.1 -0.4  1.3  0.9  1.2  0.0
#> 6  -1.6  0.2 -0.7 -0.8  0.2  0.6  1.4  2.3  0.6  0.0  0.3  0.6  1.6  0.0
#> 7  -5.5 -1.3 -5.0  1.6 -8.8 -0.2 -0.1 -0.1 -2.0 -0.9 -2.1 -1.1 -2.4 -1.3
#> 8  -2.5 -0.6 -2.7  0.3 -2.3  1.1 -0.6 -0.6 -0.8  1.3 -0.2 -1.6 -1.8 -0.4
#> 9  -0.9  0.0  2.6 -0.6  2.0  1.5  0.2  0.2 -1.5 -1.0  1.3 -1.9  0.4  0.0
#> 10 -0.1 -0.1  1.6  1.3  1.8  2.3  0.2  0.2 -2.5  1.0  0.0 -0.1  0.1  0.6
#> 11  2.0 -1.2  1.4 -0.3  1.6  2.3 -0.7  0.2 -0.9 -0.1  0.4 -1.1 -0.1  0.7
myAC$res$sum_abs_change
#>   AT   BE   BG   CY   CZ   DE   DK   EE   EL   ES   FI   FR   HR   HU   IE   IT 
#> 10.3  5.5 22.6 13.9  8.0 10.3  8.2 25.6 16.9 17.6  8.5  4.5 14.7  7.8 17.7  7.3 
#>   LT   LU   LV   MT   NL   PL   PT   RO   SE   SI   SK   UK 
#> 20.9  8.5 25.6 10.2  9.1 10.0 10.7  7.6  7.6 12.6 13.4  3.9
myAC$res$average_abs_change
#>        AT        BE        BG        CY        CZ        DE        DK        EE 
#> 0.9363636 0.5000000 2.0545455 1.2636364 0.7272727 0.9363636 0.7454545 2.3272727 
#>        EL        ES        FI        FR        HR        HU        IE        IT 
#> 1.5363636 1.6000000 0.7727273 0.4090909 1.3363636 0.7090909 1.6090909 0.6636364 
#>        LT        LU        LV        MT        NL        PL        PT        RO 
#> 1.9000000 0.7727273 2.3272727 0.9272727 0.8272727 0.9090909 0.9727273 0.6909091 
#>        SE        SI        SK        UK 
#> 0.6909091 1.1454545 1.2181818 0.3545455
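
The relation between the three outputs can be checked by hand for a single country. The sketch below uses base R only; the vector at holds the AT values for 2002 to 2013 copied from the emp_20_64_MS printout above, and the object names are hypothetical. It reproduces the AT column of abso_change and the AT entries of the two summaries shown above, under the assumption that the summaries are the sum and the mean of the absolute values of the changes:

```r
# AT values for 2002-2013, copied from the emp_20_64_MS printout.
at <- c(70.9, 71.3, 68.4, 70.4, 71.6, 72.8, 73.8, 73.4, 73.9, 74.2, 74.4, 74.6)
names(at) <- 2002:2013

# Year-on-year changes: y_t - y_{t-1}.
delta <- diff(at)
round(delta, 1)

# Sum and average of the absolute values of the changes.
sum_abs <- sum(abs(delta))
avg_abs <- mean(abs(delta))
round(c(sum = sum_abs, average = avg_abs), 4)
```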

If desired, fewer digits may be displayed in the outputs. The user can select how many decimals are displayed in the results by using the functions round or format as follows:

round(myAC$res$abso_change,4)
round(myAC$res$sum_abs_change,4)
round(myAC$res$average_abs_change,4)

format(myAC$res$abso_change, digits = 4)
format(myAC$res$sum_abs_change, digits = 4)
format(myAC$res$average_abs_change, digits = 4)

6.2 Unweighted average of country clusters

An important summary is obtained as the unweighted average of country values. The cluster of countries to be considered may be specified by its label; the available clusters are stored within the function generating global static objects and tables, i.e. the convergEU_glb() function (see Section 1 for further details). For example, the unweighted average in the emp_20_64_MS dataset for the EU25 cluster is:

average_clust(emp_20_64_MS, 
              timeName = "time",
              cluster = "EU25")$res[,c(1,30)]
#> # A tibble: 17 x 2
#>     time  EU25
#>    <dbl> <dbl>
#>  1  2002  68.5
#>  2  2003  68.6
#>  3  2004  68.6
#>  4  2005  69.2
#>  5  2006  70.3
#>  6  2007  71.2
#>  7  2008  71.5
#>  8  2009  69.4
#>  9  2010  68.6
#> 10  2011  68.8
#> 11  2012  68.7
#> 12  2013  68.8
#> 13  2014  69.7
#> 14  2015  70.6
#> 15  2016  71.7
#> 16  2017  73.1
#> 17  2018  74.4

while for EU12 is:

average_clust(emp_20_64_MS,timeName = "time",cluster = "EU12")$res[,c(1,30)]
#> # A tibble: 17 x 2
#>     time  EU12
#>    <dbl> <dbl>
#>  1  2002  69.1
#>  2  2003  69.1
#>  3  2004  69.4
#>  4  2005  69.9
#>  5  2006  70.6
#>  6  2007  71.3
#>  7  2008  71.4
#>  8  2009  69.9
#>  9  2010  69.2
#> 10  2011  68.6
#> 11  2012  68.0
#> 12  2013  67.8
#> 13  2014  68.4
#> 14  2015  69.2
#> 15  2016  70.1
#> 16  2017  71.2
#> 17  2018  72.3
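
As a plain illustration of what an unweighted average means here, the following base R sketch averages two countries row by row. The toy table and the two-country cluster are hypothetical (the values are copied from the emp_20_64_MS printout for 2016 to 2018), and the code is not the implementation of average_clust:

```r
# Three years and three countries from the emp_20_64_MS printout.
toy <- data.frame(
  time = 2016:2018,
  AT   = c(74.8, 75.4, 76.2),
  BE   = c(67.7, 68.5, 69.7),
  BG   = c(67.7, 71.3, 72.4)
)

# Hypothetical cluster made of two country codes.
cluster <- c("AT", "BE")

# Each country counts equally, whatever its population size.
toy$avg <- rowMeans(toy[, cluster])
toy[, c("time", "avg")]
```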

An unknown label, like “EUspirit”, causes an error:

average_clust(emp_20_64_MS,timeName = "time",cluster = "EUspirit")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: badly specified countries."

and similarly for a wrong time name:

average_clust(emp_20_64_MS,timeName = "TTime",cluster = "EU19")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: Time variable not in the dataframe."

Time series can also be plotted:

wwTB <- average_clust(emp_20_64_MS,timeName = "time",cluster = "EU27")$res[,c(1,30)]
mini_EU <- min(wwTB$EU27)
maxi_EU <- max(wwTB$EU27)

qplot(time, EU27, data=wwTB,
      ylim=c(mini_EU,maxi_EU))+geom_line(colour="navy blue")+
      ylab("emp_20_64")
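
The function qplot is deprecated in recent releases of ggplot2 (3.4.0 and later). A comparable plot built with ggplot is sketched below, assuming ggplot2 is installed; the toy data frame reuses the EU12 averages for 2014 to 2018 printed above, and the object names are hypothetical:

```r
library(ggplot2)

# EU12 unweighted averages for 2014-2018, copied from the printout above.
avgTB <- data.frame(
  time = 2014:2018,
  EU12 = c(68.4, 69.2, 70.1, 71.2, 72.3)
)

# Points plus a connecting line, with the same y-axis label as above.
p <- ggplot(avgTB, aes(x = time, y = EU12)) +
  geom_point() +
  geom_line(colour = "navy") +
  ylim(min(avgTB$EU12), max(avgTB$EU12)) +
  ylab("emp_20_64")
p
```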

6.3 Departures from the average

In the convergEU package, the function departure_mean calculates, for each country, the departure from the average:

help(departure_mean)

The calculations are performed with respect to the results obtained for the Sigma convergence. An example follows for the emp_20_64_MS dataset. First, the Sigma convergence is calculated:

mySTB <- sigma_conv(emp_20_64_MS,timeName="time")
mySTB
#> $res
#> # A tibble: 17 x 5
#>     time stdDev     CV  mean devianceT
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.
#>  2  2003   5.95 0.0878  67.8      991.
#>  3  2004   5.70 0.0839  67.9      909.
#>  4  2005   5.54 0.0809  68.4      858.
#>  5  2006   5.57 0.0801  69.6      869.
#>  6  2007   5.47 0.0775  70.6      838.
#>  7  2008   5.36 0.0755  71.0      804.
#>  8  2009   5.03 0.0730  69.0      710.
#>  9  2010   5.24 0.0769  68.1      768.
#> 10  2011   5.59 0.0821  68.1      875.
#> 11  2012   5.98 0.0880  68       1002.
#> 12  2013   6.28 0.0922  68.0     1103.
#> 13  2014   5.98 0.0867  69.0     1000.
#> 14  2015   5.74 0.0820  70.0      922.
#> 15  2016   5.60 0.0789  71.0      879.
#> 16  2017   5.37 0.0741  72.5      808.
#> 17  2018   5.30 0.0717  73.8      786.
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
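
The columns of this table are linked by simple formulas. Judging from the printed values (devianceT is about 28 times the square of stdDev, with 28 countries in the dataset), the standard deviation appears to divide by the number of countries rather than by that number minus one; this is inferred from the output, not taken from the package source. A minimal base R sketch on four countries, with hypothetical object names and 2018 values copied from the emp_20_64_MS printout:

```r
# 2018 values for four countries, copied from the printout.
x <- c(AT = 76.2, BE = 69.7, BG = 72.4, CY = 73.9)

m        <- mean(x)                     # unweighted mean
deviance <- sum((x - m)^2)              # numerator of the variance
sd_pop   <- sqrt(deviance / length(x))  # divides by N (assumption, see above)
cv       <- sd_pop / m                  # coefficient of variation

round(c(mean = m, stdDev = sd_pop, CV = cv, devianceT = deviance), 4)
```

A decreasing standard deviation or coefficient of variation over time is what sigma convergence looks for.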

Then, the departure from the mean can be characterized:

res <- departure_mean(oriTB = emp_20_64_MS, sigmaTB = mySTB$res)
names(res$res)
#> [1] "departures"      "squaredContrib"  "devianceContrib"
res$res$departures
#> # A tibble: 17 x 33
#>     time stdDev     CV  mean devianceT    AT    BE    BG    CY    CZ    DE    DK
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.     0     0    -1     1     0     0     1
#>  2  2003   5.95 0.0878  67.8      991.     0     0    -1     1     0     0     1
#>  3  2004   5.70 0.0839  67.9      909.     0     0    -1     1     0     0     1
#>  4  2005   5.54 0.0809  68.4      858.     0     0    -1     1     0     0     1
#>  5  2006   5.57 0.0801  69.6      869.     0     0     0     1     0     0     1
#>  6  2007   5.47 0.0775  70.6      838.     0     0     0     1     0     0     1
#>  7  2008   5.36 0.0755  71.0      804.     0     0     0     1     0     0     1
#>  8  2009   5.03 0.0730  69.0      710.     0     0     0     1     0     1     1
#>  9  2010   5.24 0.0769  68.1      768.     1     0     0     1     0     1     1
#> 10  2011   5.59 0.0821  68.1      875.     1     0     0     0     0     1     1
#> 11  2012   5.98 0.0880  68       1002.     1     0     0     0     0     1     1
#> 12  2013   6.28 0.0922  68.0     1103.     1     0     0     0     0     1     0
#> 13  2014   5.98 0.0867  69.0     1000.     0     0     0     0     0     1     0
#> 14  2015   5.74 0.0820  70.0      922.     0     0     0     0     0     1     0
#> 15  2016   5.60 0.0789  71.0      879.     0     0     0     0     1     1     0
#> 16  2017   5.37 0.0741  72.5      808.     0     0     0     0     1     1     0
#> 17  2018   5.30 0.0717  73.8      786.     0     0     0     0     1     1     0
#> # … with 21 more variables: EE <dbl>, EL <dbl>, ES <dbl>, FI <dbl>, FR <dbl>,
#> #   HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> #   MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>,
#> #   SK <dbl>, UK <dbl>

where \(-1\), \(0\) and \(1\) indicate standardized departures (the difference from the mean divided by the standard deviation) respectively below \(-1\), within the interval \((-1,1)\) and above \(+1\). The contribution of each MS to the variance at a given time \(t\) is evaluated by the squared difference from the mean of the countries in the dataset:

res$res$squaredContrib
#> # A tibble: 17 x 28
#>        AT     BE      BG      CY    CZ      DE    DK     EE    EL     ES    FI
#>     <dbl>  <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
#>  1 11.4    7.94  1.21e+2 57.5    17.5  1.64e+0 116.   0.612  22.3 18.6   32.3 
#>  2 12.5   10.7   8.23e+1 58.2    10.4  3.95e-1  92.7  1.77   15.8 12.1   26.3 
#>  3  0.243  4.44  4.50e+1 60.7     4.81 5.10e-5 104.   5.26   13.0  7.33  21.1 
#>  4  3.91   3.69  4.25e+1 35.7     5.19 9.58e-1  91.7 12.8    16.2  0.849 21.0 
#>  5  4.16   9.37  1.99e+1 38.9     2.69 2.37e+0  96.8 40.2    15.7  0.314 18.8 
#>  6  4.84   8.41  4.84e+0 38.4     1.96 5.29e+0  70.6 39.7    23.0  0.810 17.6 
#>  7  7.98   8.85  7.56e-2 30.5     2.03 9.15e+0  59.7 37.5    21.9  6.13  23.3 
#>  8 19.3    3.62  4.14e-2 39.6     3.60 2.70e+1  50.4  0.993  11.6 25.0   20.2 
#>  9 33.5    0.264 1.17e+1 47.4     5.22 4.74e+1  46.0  1.73   18.6 28.2   23.9 
#> 10 37.7    0.579 2.66e+1 28.5     8.06 7.12e+1  45.4  6.45   71.6 36.7   32.9 
#> 11 41.0    0.640 2.50e+1  4.84   12.2  7.92e+1  39.7 17.6   169   70.6   36   
#> 12 43.0    0.710 2.06e+1  0.710  19.9  8.57e+1  39.2 27.6   229.  89.2   27.6 
#> 13 27.3    2.81  1.50e+1  1.89   20.5  7.61e+1  32.8 28.4   246.  82.4   17.0 
#> 14 18.6    7.74  8.31e+0  4.34   23.2  6.43e+1  29.4 42.5   227.  63.7    8.51
#> 15 14.4   11.0   1.10e+1  5.34   32.4  5.76e+1  24.9 31.2   219.  50.6    5.71
#> 16  8.37  16.1   1.46e+0  2.91   35.9  4.48e+1  16.8 38.4   216.  49.1    2.87
#> 17  5.52  17.2   2.10e+0  0.0025 36.6  3.66e+1  13.3 31.9   206.  46.9    6.00
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
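
The labels and the squared contributions can be reproduced by hand for a few countries. The base R sketch below uses the 2002 values of four countries from the emp_20_64_MS printout together with the rounded mean and standard deviation printed by sigma_conv for 2002; the object names are hypothetical and the code is not the implementation of departure_mean:

```r
# 2002 values copied from the emp_20_64_MS printout.
x <- c(AT = 70.9, BG = 56.5, CY = 75.1, DE = 68.8)

# Rounded 2002 mean and stdDev as printed by sigma_conv.
m <- 67.5
s <- 6.34

# Standardized departure and its label: -1, 0 or 1.
z     <- (x - m) / s
label <- ifelse(z < -1, -1, ifelse(z > 1, 1, 0))
label

# Squared difference from the mean (contribution to the deviance).
sq <- (x - m)^2
round(sq, 2)
```

The labels agree with the first row of the departures table above. The squared contributions differ slightly from the printed ones (for example 11.56 against 11.4 for AT) because the mean is used here in rounded form.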

It is also possible to decompose the numerator of the variance, called deviance, at each time in order to appreciate the percentage of contribution provided by each MS to the total deviance:

res$res$devianceContrib
#> # A tibble: 17 x 28
#>        AT     BE      BG      CY    CZ      DE    DK     EE    EL     ES    FI
#>     <dbl>  <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
#>  1 1.02   0.706  1.08e+1 5.11e+0 1.56  1.46e-1 10.3  0.0544  1.98 1.66   2.87 
#>  2 1.26   1.08   8.30e+0 5.87e+0 1.05  3.99e-2  9.35 0.178   1.59 1.22   2.65 
#>  3 0.0267 0.488  4.95e+0 6.68e+0 0.529 5.61e-6 11.4  0.578   1.43 0.806  2.32 
#>  4 0.456  0.430  4.95e+0 4.16e+0 0.605 1.12e-1 10.7  1.49    1.88 0.0989 2.44 
#>  5 0.479  1.08   2.29e+0 4.48e+0 0.309 2.73e-1 11.1  4.63    1.81 0.0362 2.17 
#>  6 0.578  1.00   5.78e-1 4.59e+0 0.234 6.31e-1  8.42 4.74    2.75 0.0967 2.11 
#>  7 0.993  1.10   9.41e-3 3.80e+0 0.253 1.14e+0  7.43 4.67    2.72 0.762  2.90 
#>  8 2.72   0.511  5.84e-3 5.59e+0 0.507 3.81e+0  7.10 0.140   1.63 3.53   2.85 
#>  9 4.36   0.0344 1.52e+0 6.17e+0 0.680 6.17e+0  6.00 0.225   2.42 3.68   3.11 
#> 10 4.31   0.0662 3.04e+0 3.26e+0 0.922 8.14e+0  5.19 0.737   8.18 4.20   3.77 
#> 11 4.09   0.0639 2.50e+0 4.83e-1 1.22  7.91e+0  3.96 1.76   16.9  7.04   3.59 
#> 12 3.90   0.0644 1.87e+0 6.44e-2 1.80  7.77e+0  3.55 2.51   20.8  8.08   2.51 
#> 13 2.73   0.280  1.50e+0 1.89e-1 2.05  7.61e+0  3.28 2.83   24.6  8.23   1.70 
#> 14 2.02   0.839  9.01e-1 4.70e-1 2.52  6.97e+0  3.18 4.61   24.7  6.91   0.923
#> 15 1.63   1.25   1.25e+0 6.08e-1 3.68  6.55e+0  2.83 3.56   25.0  5.75   0.650
#> 16 1.04   1.99   1.80e-1 3.60e-1 4.44  5.54e+0  2.07 4.74   26.8  6.07   0.354
#> 17 0.703  2.19   2.68e-1 3.18e-4 4.66  4.66e+0  1.70 4.06   26.2  5.97   0.764
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>

Thus, each row adds up to 100. A graphical summary of the main features of the country time series can be produced by invoking the function graph_departure, as shown below:

myGD <- graph_departure(res$res$departures,
                timeName = "time",
                indiType = "highBest",
                displace = 0.25,
                displaceh = 0.5,
                dimeFontNum = 2,
                myfont_scale = 1.25,
                x_angle = 45,
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.9
                )
myGD
#> $res

#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that, since the indicator is of type “highBest”, the default colours of the rectangles are red for “-1”, grey for “0” and light sky blue for “1”. If instead an indicator is of type “lowBest”, the default colours change to light sky blue for “-1”, grey for “0” and red for “1”. The user can also change these colours through the argument color_rect. For example, with “-1” coloured in red, “0” in yellow and “1” in green, and selecting only a subset of the available countries:

myGD1 <- graph_departure(res$res$departures[,1:10],
                timeName = "time",
                indiType = "highBest",
                displace = 0.25,
                displaceh = 0.45,
                dimeFontNum = 4,
                myfont_scale = 1.35,
                x_angle = 45,
                color_rect = c("-1"='red', "0"='yellow',"1"='green4'),
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.29
                )

myGD1
#> $res

#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Moreover, the graphical output can be further customized by the user, for example by changing the value of the argument displaceh, or that of x_angle, which sets the angle at which the years are displayed on the x axis.

6.4 Countries ranking

The ranking of countries in each of the considered years is implemented in the country_ranking function. The argument typeInd specifies the type of indicator on which the ranking is based: highBest (“higher is the best”) or lowBest (“lower is the best”). For example, for the emp_20_64_MS data, the ranking is performed for an indicator of type highBest from 2005 to 2010:

data(emp_20_64_MS)
myCR<-country_ranking(emp_20_64_MS,timeName = "time", time_0 = 2005, time_t = 2010, 
                      typeInd= "highBest")
myCR
#> $res
#> # A tibble: 6 x 29
#>    time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2005    13    19    24     4    11    14     2     9    21    18     5    14
#> 2  2006    11    19    22     4    14    15     1     3    21    18     6    16
#> 3  2007    11    20    19     4    15    10     2     3    22    17     9    16
#> 4  2008    10    21    16     5    14     9     2     3    22    20     6    17
#> 5  2009     8    17    15     4    11     5     3    13    21    24     7    14
#> 6  2010     6    14    18     3    10     3     5    15    23    24     8    13
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> $msg
#> NULL
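
To make the ranking rule concrete, the following minimal sketch reproduces the idea in base R on a small made-up dataset. The country codes AA, BB and CC, the helper rank_high_best and the tie handling (tied countries share the lowest rank) are assumptions made for illustration; this is not the internal code of country_ranking.

```r
# Toy indicator values (hypothetical, not Eurofound data)
toy_rk <- data.frame(time = c(2005, 2006),
                     AA = c(70, 72),
                     BB = c(65, 72),
                     CC = c(80, 60))

# Rank countries within each year. For a "highBest" indicator the
# largest value gets rank 1, so the negated values are ranked.
rank_high_best <- function(dat, timeName = "time") {
  vals <- as.matrix(dat[, names(dat) != timeName])
  # apply() returns one column per year, hence the transpose
  ranks <- t(apply(-vals, 1, rank, ties.method = "min"))
  data.frame(dat[timeName], ranks)
}

toy_rank <- rank_high_best(toy_rk)
toy_rank
```

For a “lowBest” indicator the same sketch would rank the values themselves rather than their negation.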

6.5 Departures from the best performing country

The function departure_best in the convergEU package calculates, for each country, the departure from the best performing MS. For example, for the emp_20_64_MS data, treated as a highBest indicator:

mySTB2 <- departure_best(emp_20_64_MS,timeName="time",indiType = "highBest")
mySTB2
#> $res
#> $res$raw_departures
#> # A tibble: 17 x 29
#>     time    AT    BE    BG     CY    CZ     DE     DK     EE    EL    ES    FI
#>    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
#>  1  2002 -7.90 -14.1 -22.3  -3.7  -7.10 -10    -0.5   -10.5  -16   -15.6 -5.60
#>  2  2003 -7.2  -14   -19.8  -3.10 -7.5  -10.1  -1.10   -9.4  -14.7 -14.2 -5.60
#>  3  2004 -9.70 -12.3 -16.9  -2.40 -8    -10.2   0      -7.90 -13.8 -12.9 -5.60
#>  4  2005 -7.70 -11.6 -16.2  -3.70 -7.40  -8.70 -0.100  -6.10 -13.7 -10.6 -5.10
#>  5  2006 -7.8  -12.9 -14.3  -3.6  -8.2   -8.3   0      -3.5  -13.8 -10.4 -5.5 
#>  6  2007 -7.30 -12.4 -11.7  -3.30 -8.10  -7.20 -1.10   -3.20 -14.3 -10.4 -5.30
#>  7  2008 -6.6  -12.4  -9.7  -3.9  -8     -6.4  -1.7    -3.3  -14.1 -11.9 -4.6 
#>  8  2009 -4.90 -11.2  -9.5  -3    -7.40  -4.10 -2.2    -8.30 -12.7 -14.3 -4.80
#>  9  2010 -4.20 -10.5 -13.4  -3.10 -7.70  -3.10 -3.20  -11.3  -14.3 -15.3 -5.10
#> 10  2011 -5.2  -12.1 -16.5  -6    -8.5   -2.9  -4.6    -8.8  -19.8 -17.4 -5.6 
#> 11  2012 -5    -12.2 -16.4  -9.2  -7.9   -2.5  -5.1    -7.2  -24.4 -19.8 -5.4 
#> 12  2013 -5.2  -12.6 -16.3 -12.6  -7.30  -2.5  -5.5    -6.5  -26.9 -21.2 -6.5 
#> 13  2014 -5.80 -12.7 -14.9 -12.4  -6.5   -2.30 -5.30   -5.7  -26.7 -20.1 -6.9 
#> 14  2015 -6.2  -13.3 -13.4 -12.6  -5.7   -2.5  -5.10   -4    -25.6 -18.5 -7.60
#> 15  2016 -6.4  -13.5 -13.5 -12.5  -4.5   -2.6  -5.2    -4.6  -25   -17.3 -7.80
#> 16  2017 -6.40 -13.3 -10.5 -11    -3.30  -2.60 -5.2    -3.10 -24   -16.3 -7.60
#> 17  2018 -6.40 -12.9 -10.2  -8.70 -2.70  -2.70 -5.10   -3.10 -23.1 -15.6 -6.30
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> $res$cumulated_dif
#>     AT     BE     BG     CY     CZ     DE     DK     EE     EL     ES     FI 
#> -109.9 -214.0 -245.5 -114.8 -115.8  -88.7  -51.0 -106.5 -322.9 -261.8 -100.9 
#>     FR     HR     HU     IE     IT     LT     LU     LV     MT     NL     PL 
#> -170.3 -317.7 -260.3 -162.1 -312.5 -148.8 -163.7 -157.5 -280.8  -61.7 -266.4 
#>     PT     RO     SE     SI     SK     UK 
#> -146.4 -245.2   -0.9 -155.6 -223.3  -72.6 
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

where the “$res” component is a list with two elements: raw_departures, the departures from the best performer in each year, and cumulated_dif, the differences cumulated over the years for each country.
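
The two quantities can be illustrated with a short base R sketch on made-up data. It assumes a “highBest” indicator, so that the best performer in a year is the country with the largest value, and it takes the cumulated difference to be the sum of the yearly departures, which is consistent with the printed output above (for example, the AT column sums to -109.9). The dataset and the object names are hypothetical, and the code is not taken from departure_best.

```r
# Toy indicator values (hypothetical, not Eurofound data)
toy_dep <- data.frame(time = c(2005, 2006, 2007),
                      AA = c(70, 72, 75),
                      BB = c(65, 74, 71),
                      CC = c(80, 60, 75))

vals <- as.matrix(toy_dep[, names(toy_dep) != "time"])

# Best performer in each year (row-wise maximum for "highBest")
best <- apply(vals, 1, max)

# Departures from the best performer: zero for the best, negative otherwise
raw_dep <- vals - best

# Departures cumulated over the years, one value per country
cum_dif <- colSums(raw_dep)
cum_dif
```

A country that is the best performer in every year would have a cumulated difference of zero.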

It is also possible to plot for each country the deviations from the best performer as follows:

departure_best_plot(mySTB2$res$cumulated_dif,
                    mainCountry = "IT",
                    countries=c("DE", "FR","SE"),
                    displace = 0.25,
                    axis_name_y = "Countries",
                    val_alpha  = 0.95,
                    debug=FALSE)
#> $res

#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that the argument mainCountry specifies the country of interest (Italy in our example), while the argument countries lists the countries with respect to which the deviations are plotted.

6.6 Scoreboards of countries

Scoreboards of countries show the departure of an indicator level from the average for each year in the dataset. The syntax to calculate such quantities for the dataset emp_20_64_MS is the following:

data(emp_20_64_MS)
resTB <- scoreb_yrs(emp_20_64_MS,timeName = "time")
resTB
#> $res
#> $res$sigma_conv
#> # A tibble: 17 x 9
#>     time stdDev     CV  mean devianceT elle1in elle1su elle2in elle2su
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.    64.3    70.7    61.2    73.9
#>  2  2003   5.95 0.0878  67.8      991.    64.8    70.7    61.8    73.7
#>  3  2004   5.70 0.0839  67.9      909.    65.1    70.8    62.2    73.6
#>  4  2005   5.54 0.0809  68.4      858.    65.7    71.2    62.9    74.0
#>  5  2006   5.57 0.0801  69.6      869.    66.8    72.3    64.0    75.1
#>  6  2007   5.47 0.0775  70.6      838.    67.9    73.3    65.1    76.1
#>  7  2008   5.36 0.0755  71.0      804.    68.3    73.7    65.6    76.3
#>  8  2009   5.03 0.0730  69.0      710.    66.5    71.5    64.0    74.0
#>  9  2010   5.24 0.0769  68.1      768.    65.5    70.7    62.9    73.4
#> 10  2011   5.59 0.0821  68.1      875.    65.3    70.9    62.5    73.7
#> 11  2012   5.98 0.0880  68       1002.    65.0    71.0    62.0    74.0
#> 12  2013   6.28 0.0922  68.0     1103.    64.9    71.2    61.8    74.3
#> 13  2014   5.98 0.0867  69.0     1000.    66.0    72.0    63.0    75.0
#> 14  2015   5.74 0.0820  70.0      922.    67.1    72.9    64.2    75.7
#> 15  2016   5.60 0.0789  71.0      879.    68.2    73.8    65.4    76.6
#> 16  2017   5.37 0.0741  72.5      808.    69.8    75.2    67.1    77.9
#> 17  2018   5.30 0.0717  73.8      786.    71.2    76.5    68.6    79.1
#> 
#> $res$sco_level
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1  2002     4     3     1     5     4     3     5     3     2     2     4     3
#>  2  2003     4     2     1     5     4     3     5     3     2     2     4     3
#>  3  2004     3     3     1     5     3     3     5     3     2     3     4     3
#>  4  2005     3     3     1     5     3     3     5     4     2     3     4     3
#>  5  2006     3     2     2     5     3     3     5     5     2     3     4     3
#>  6  2007     3     2     3     5     3     3     5     5     2     3     4     3
#>  7  2008     4     2     3     5     3     4     5     5     2     3     4     3
#>  8  2009     4     3     3     5     3     5     5     3     2     2     4     3
#>  9  2010     5     3     2     5     3     5     5     3     2     1     4     3
#> 10  2011     5     3     2     4     4     5     5     3     1     1     5     3
#> 11  2012     5     3     2     3     4     5     5     4     1     1     5     3
#> 12  2013     5     3     2     3     4     5     4     4     1     1     4     3
#> 13  2014     4     3     2     3     4     5     4     4     1     1     4     3
#> 14  2015     4     3     2     3     4     5     4     5     1     1     4     3
#> 15  2016     4     2     2     3     5     5     4     4     1     1     3     3
#> 16  2017     4     2     3     3     5     5     4     5     1     1     3     3
#> 17  2018     3     2     3     3     5     5     4     5     1     1     3     3
#> # … with 16 more variables: HR <int>, HU <int>, IE <int>, IT <int>, LT <int>,
#> #   LU <int>, LV <int>, MT <int>, NL <int>, PL <int>, PT <int>, RO <int>,
#> #   SE <int>, SI <int>, SK <int>, UK <int>
#> 
#> $res$sco_change
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
#>  2  2003     3     3     5     3     2     2     1     4     4     4     2     4
#>  3  2004     1     4     5     3     2     2     3     4     3     4     3     2
#>  4  2005     5     3     3     1     3     4     2     5     3     5     3     3
#>  5  2006     3     1     5     3     2     4     3     5     3     3     3     1
#>  6  2007     3     3     5     3     3     4     1     3     2     3     3     2
#>  7  2008     4     3     5     2     3     4     2     3     3     1     4     3
#>  8  2009     4     3     3     3     3     4     3     1     4     1     3     3
#>  9  2010     5     5     1     3     3     5     3     1     2     3     3     4
#> 10  2011     3     3     1     2     3     4     3     5     1     3     4     3
#> 11  2012     3     3     3     1     3     3     3     5     1     1     3     3
#> 12  2013     3     3     3     1     4     3     3     4     1     2     2     3
#> 13  2014     1     2     4     2     3     2     2     3     2     3     1     1
#> 14  2015     1     1     5     2     3     2     3     5     4     5     1     2
#> 15  2016     2     2     2     3     5     2     2     1     3     5     2     2
#> 16  2017     1     2     5     4     3     1     1     4     3     3     2     1
#> 17  2018     2     3     3     5     3     1     2     2     4     3     5     1
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> $res$sco_level_num
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002   0.5   0    -1     1     0.5   0     1     0    -0.5  -0.5   0.5     0
#>  2  2003   0.5  -0.5  -1     1     0.5   0     1     0    -0.5  -0.5   0.5     0
#>  3  2004   0     0    -1     1     0     0     1     0    -0.5   0     0.5     0
#>  4  2005   0     0    -1     1     0     0     1     0.5  -0.5   0     0.5     0
#>  5  2006   0    -0.5  -0.5   1     0     0     1     1    -0.5   0     0.5     0
#>  6  2007   0    -0.5   0     1     0     0     1     1    -0.5   0     0.5     0
#>  7  2008   0.5  -0.5   0     1     0     0.5   1     1    -0.5   0     0.5     0
#>  8  2009   0.5   0     0     1     0     1     1     0    -0.5  -0.5   0.5     0
#>  9  2010   1     0    -0.5   1     0     1     1     0    -0.5  -1     0.5     0
#> 10  2011   1     0    -0.5   0.5   0.5   1     1     0    -1    -1     1       0
#> 11  2012   1     0    -0.5   0     0.5   1     1     0.5  -1    -1     1       0
#> 12  2013   1     0    -0.5   0     0.5   1     0.5   0.5  -1    -1     0.5     0
#> 13  2014   0.5   0    -0.5   0     0.5   1     0.5   0.5  -1    -1     0.5     0
#> 14  2015   0.5   0    -0.5   0     0.5   1     0.5   1    -1    -1     0.5     0
#> 15  2016   0.5  -0.5  -0.5   0     1     1     0.5   0.5  -1    -1     0       0
#> 16  2017   0.5  -0.5   0     0     1     1     0.5   1    -1    -1     0       0
#> 17  2018   0    -0.5   0     0     1     1     0.5   1    -1    -1     0       0
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

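where the “$res” component is a list of four components: the summary statistics (sigma_conv), the numerical labels indicating the interval of the partition a level belongs to (sco_level), the numerical labels indicating the interval of the partition a change belongs to (sco_change), and the level labels recoded on a scale from -1 to 1 (sco_level_num).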
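
The printed summary statistics suggest how the level labels relate to the partition: the columns elle1in and elle1su match the mean minus and plus half a standard deviation, while elle2in and elle2su match the mean minus and plus one standard deviation. The sketch below applies this partition to made-up values for a single year. The data, the object names and the treatment of values falling exactly on a threshold are assumptions, so it should be read as an illustration rather than as the code of scoreb_yrs.

```r
# Hypothetical indicator values for one year (not Eurofound data)
x <- c(AA = 60, BB = 64, CC = 68, DD = 72, EE = 76)

m <- mean(x)
# Standard deviation with divisor n, consistent with
# devianceT = n * stdDev^2 in the printed sigma_conv table
s <- sqrt(mean((x - m)^2))

# Thresholds of the partition: mean -1, -0.5, +0.5 and +1 standard deviations
thresholds <- c(m - s, m - 0.5 * s, m + 0.5 * s, m + s)

# Integer labels 1..5, as in sco_level
lvl <- findInterval(x, thresholds) + 1L
names(lvl) <- names(x)

# Recoded labels -1, -0.5, 0, 0.5, 1, as in sco_level_num
lvl_num <- (lvl - 3) / 2
rbind(lvl, lvl_num)
```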

Numerical labels are assigned as described in the DRAFT JOINT EMPLOYMENT REPORT FROM THE COMMISSION AND THE COUNCIL (2018):

For the comparison of a country with the EU average, the following steps are recommended, from raw data:

First, we select the country Italy (“IT”) for the comparison with the EU average:

selectedCountry <- "IT"
timeName <-  "time"

Then, we calculate the Sigma convergence:

outSig <- sigma_conv(emp_20_64_MS, timeName = timeName,
           time_0=2002,time_t=2016)

and we build the ttmp dataset composed of i) the results of the $res component of Sigma convergence, and ii) the emp_20_64_MS dataset without the time variable:

estrattore<-  emp_20_64_MS[,timeName] >= 2002  &  emp_20_64_MS[,timeName] <= 2016
ttmp <- cbind(outSig$res, dplyr::select(emp_20_64_MS[estrattore,], -contains(timeName)))

Next, we extract the minimum and the maximum of the indicator’s values in the emp_20_64_MS dataset:

miniY <- min(emp_20_64_MS[,- which(names(emp_20_64_MS) == timeName )])
maxiY <-  max(emp_20_64_MS[,- which(names(emp_20_64_MS) == timeName )])

and we build the plot for the comparison of Italy and the EU average:

myx_angle <- 45
myG2 <- 
  ggplot(ttmp) + ggtitle(
  paste("EU average (black, solid) and country", selectedCountry, "(red, dotted)")) +
  # EU average
  geom_line(aes(x = ttmp[,timeName], y = ttmp[,"mean"]), colour = "black") +
  geom_point(aes(x = ttmp[,timeName], y = ttmp[,"mean"]), colour = "black") +
  ylim(c(miniY, maxiY)) + xlab("Year") + ylab("Indicator") +
  theme(legend.position = "none") +
  # selected country; the colour is set outside aes() so that it is a
  # fixed colour rather than a mapped variable
  geom_line(aes(x = ttmp[,timeName], y = ttmp[,selectedCountry]),
            colour = "red", linetype = "dotted") + 
  geom_point(aes(x = ttmp[,timeName], y = ttmp[,selectedCountry]), colour = "red") +
  ggplot2::scale_x_continuous(breaks = ttmp[,timeName],
                     labels = ttmp[,timeName]) +
  # rotate the year labels on the x axis
  ggplot2::theme(
         axis.text.x = ggplot2::element_text(angle = myx_angle))
  
myG2

It is also possible to graphically show departures in terms of the above defined partition by invoking the ms_dynam function:

obe_lvl <- scoreb_yrs(emp_20_64_MS,timeName = timeName)$res$sco_level_num
# select subset of time
estrattore <- obe_lvl[,timeName] >= 2009 & obe_lvl[,timeName] <= 2016  
scobelvl <- obe_lvl[estrattore,]

my_MSstd <- ms_dynam( scobelvl,
                timeName = "time",
                displace = 0.25,
                displaceh = 0.45,
                dimeFontNum = 3,
                myfont_scale = 1.35,
                x_angle = 45,
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.9,
                indiType = "highBest")   

my_MSstd

Note that the colours of the labels for the defined partition change depending on the type of indicator. More precisely, in the example considered above, the indicator is of type “highBest”; thus, the colours of the rectangles are red for the label “-1”, ochre for “-0.5”, yellow for “0”, pale green for “0.5” and dark green for “1”. If instead the indicator is of type “lowBest”, these colours change to dark green for “-1”, pale green for “-0.5”, yellow for “0”, ochre for “0.5” and red for “1”.
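
The correspondence between labels and colours can be summarised as a lookup that is reversed when the indicator type changes. The sketch below is only an illustration of that rule: the helper label_colours is hypothetical, and the R colour names are stand-ins chosen to resemble the colours described above, not necessarily those used internally by ms_dynam.

```r
# Labels of the partition and illustrative colours for "highBest"
labels_part <- c("-1", "-0.5", "0", "0.5", "1")
cols_high <- c("red", "darkgoldenrod2", "yellow", "palegreen", "darkgreen")

# Return a named vector label -> colour; for "lowBest" the order of the
# colours is reversed, so that low values are shown in green
label_colours <- function(indiType = c("highBest", "lowBest")) {
  indiType <- match.arg(indiType)
  cols <- if (indiType == "highBest") cols_high else rev(cols_high)
  setNames(cols, labels_part)
}

label_colours("highBest")
label_colours("lowBest")
```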


7 References

The following reference may be consulted for details: