Tutorial: analysis of convergence with the convergEU package

Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini

2020-03-02

The convergEU R package is a set of S3 functions and data objects suited for the analysis of economic and social convergence of Member States (MS) within the European Union (EU).

This vignette is intended to be a “learn-by-doing” tutorial with the convergEU suite of functions. More precisely, the tutorial illustrates in detail how to analyze convergence of MS with data from different sources, including locally accessible Eurofound datasets (Section 1) and data downloaded from the Eurostat repository (Section 2).

To illustrate these points, examples with several EU indicators are described in detail.

IMPORTANT: the analysis of convergence is performed on clean and imputed data, i.e. a tidy dataset in the format years by countries. This means that the dataset must always contain a time variable and must not contain qualitative variables, vectors of strings or missing values.

In the convergEU package, the check_data() function may be called to check for the presence of unsuited features that must be resolved before starting the analysis. The returned object states whether the dataset is ready for calculations; if it is not, the error component states why the check failed.
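As a rough illustration of what such a check involves, the base R sketch below tests a small table for a time variable, non-numeric columns and missing values. The helper name is_ready() and its error strings are hypothetical; this is not the implementation of check_data(), only a sketch that mimics the three-component list (res, msg, err) printed later in this tutorial.

```r
# Hypothetical helper (NOT part of convergEU): checks a years-by-countries table.
is_ready <- function(tb, timeName = "time") {
  if (!timeName %in% names(tb)) {
    return(list(res = NULL, msg = NULL, err = "Error: time variable absent."))
  }
  if (!all(vapply(tb, is.numeric, logical(1)))) {
    return(list(res = NULL, msg = NULL, err = "Error: non-numeric variables present."))
  }
  if (any(is.na(tb))) {
    return(list(res = NULL, msg = NULL, err = "Error: missing values present."))
  }
  list(res = TRUE, msg = NULL, err = NULL)
}

# Toy table: the string column "sex" makes the first check fail.
toy <- data.frame(time = c(2003, 2007), sex = "Total",
                  AT = c(7.85, 6.95), BE = c(7.52, 7.54))
bad  <- is_ready(toy)
good <- is_ready(toy[, names(toy) != "sex"])
```

Here bad$err carries the reason for the failure, while good$res is TRUE once the string column has been dropped.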

Examples on how to transform data to achieve the required format are described in the tutorial.

1 Locally accessible datasets: Eurofound data

Locally accessible Eurofound datasets are available from the convergEU package without any active Internet connection. In this Section, we describe in detail the analysis of convergence within the convergEU package using locally accessible Eurofound data, taking two EU indicators as examples, the first of which is the Mean Life Satisfaction indicator.

Before describing in detail how to analyze convergence with locally accessible datasets, we briefly illustrate the datasets available in the convergEU package, which can be listed as follows:

data(package = "convergEU")

The object dbEUF2018meta contains a description (metadata) of the locally available data within the convergEU package, without any downloading from the Eurofound website. These data refer to two dimensions: quality of life and working conditions. For each dimension, metainformation is provided for several indicators, for example:

names(dbEUF2018meta)
#>  [1] "DIMENSION"           "SUBDIMENSION"        "INDICATOR"          
#>  [4] "Code_in_database"    "Official_code"       "Unit"               
#>  [7] "Source_organisation" "Source_reference"    "Disaggregation"     
#> [10] "Bookmark_URL"
print(dbEUF2018meta, n=200,width=200)   
#> # A tibble: 13 x 10
#>    DIMENSION          SUBDIMENSION      
#>    <chr>              <chr>             
#>  1 Quality of life    Life satisfaction 
#>  2 Quality of life    Health            
#>  3 Quality of life    Health            
#>  4 Quality of life    Quality of society
#>  5 Quality of life    Quality of society
#>  6 Quality of life    Quality of society
#>  7 Quality of life    Quality of society
#>  8 Quality of life    Quality of society
#>  9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#>    INDICATOR        Code_in_database Official_code        Unit  Source_organisa…
#>    <chr>            <chr>            <chr>                <chr> <chr>           
#>  1 Mean life satis… lifesatisf       y16_q4               --    Eurofound       
#>  2 Mean health sta… health           y16_q48              --    Eurofound       
#>  3 Percentage of p… goodhealth_p     y16_q48              %     Eurofound       
#>  4 Mean level of t… trustlocal       y16_q35f             --    Eurofound       
#>  5 Level of involv… volunt           y16_q29a             --    Eurofound       
#>  6 Percentage of p… volunt_p         y16_q29a             %     Eurofound       
#>  7 Hours per week … caring_h         y16_q43a             hours Eurofound       
#>  8 Social Exclusio… socialexc_i      y16_socexindex       index Eurofound       
#>  9 JQI_Skills and … JQIskill_i       wq_slim -  JQI SLIM… index Eurofound       
#> 10 JQI_Physical en… JQIenviron_i     envsec_slim - JQI S… index Eurofound       
#> 11 JQI_Intensity i… JQIintensity_i   intens_slim - JQI S… index Eurofound       
#> 12 JQI_Working tim… JQItime_i        wlb_slim - JQI SLIM… index Eurofound       
#> 13 Exposition to d… exposdiscr_p     disc_d -  Having be… %     Eurofound       
#>    Source_reference Disaggregation Bookmark_URL                                 
#>    <chr>            <chr>          <chr>                                        
#>  1 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  2 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  3 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  4 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  5 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  6 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  7 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  8 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  9 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…

where the “Code_in_database” information is of particular importance, since it must be provided by the user when extracting the local data for a given indicator. The metadata can also be browsed in the following way:

View(dbEUF2018meta)

The second locally accessible object in the convergEU package is dbEurofound, a raw local Eurofound database accessed as follows:

data(dbEurofound)
head(dbEurofound)
#> # A tibble: 6 x 17
#>    time geo   geo_label sex   lifesatisf health goodhealth_p trustlocal volunt
#>   <dbl> <chr> <chr>     <chr>      <dbl>  <dbl>        <dbl>      <dbl>  <dbl>
#> 1  1960 AD    Andorra   Fema…         NA     NA           NA         NA     NA
#> 2  1960 AD    Andorra   Males         NA     NA           NA         NA     NA
#> 3  1960 AD    Andorra   Total         NA     NA           NA         NA     NA
#> 4  1960 AL    Albania   Fema…         NA     NA           NA         NA     NA
#> 5  1960 AL    Albania   Males         NA     NA           NA         NA     NA
#> 6  1960 AL    Albania   Total         NA     NA           NA         NA     NA
#> # … with 8 more variables: volunt_p <dbl>, caring_h <dbl>, socialexc_i <dbl>,
#> #   JQIskill_i <dbl>, JQIenviron_i <dbl>, JQIintensity_i <dbl>,
#> #   JQItime_i <dbl>, exposdiscr_p <dbl>

The names of the variables in dbEurofound are:

names(dbEurofound)
#>  [1] "time"           "geo"            "geo_label"      "sex"           
#>  [5] "lifesatisf"     "health"         "goodhealth_p"   "trustlocal"    
#>  [9] "volunt"         "volunt_p"       "caring_h"       "socialexc_i"   
#> [13] "JQIskill_i"     "JQIenviron_i"   "JQIintensity_i" "JQItime_i"     
#> [17] "exposdiscr_p"

while the time variable ranges over the interval:

c(min(dbEurofound$time), max(dbEurofound$time))
#> [1] 1960 2017

The database is not complete over this time range for all the countries considered.

The third object in the convergEU package is dbMetaEUStat which provides metainformation on downloadable data from the Eurostat repository. For further details on this object, please see Section 2 related to data downloaded from the Eurostat repository.

The object emp_20_64_MS in the convergEU package is a Eurofound dataset used during the development of this package. It refers to the Employment rate indicator from 2002 to 2018 for the age class 20 to 64. The dataset emp_20_64_MS contains source data downloaded from the Eurostat repository and reshaped into the required format time by countries:

data("emp_20_64_MS")
head(emp_20_64_MS)
#> # A tibble: 6 x 29
#>    time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2002  70.9  64.7  56.5  75.1  71.7  68.8  78.3  68.3  62.8  63.2  73.2  68.6
#> 2  2003  71.3  64.5  58.7  75.4  71    68.4  77.4  69.1  63.8  64.3  72.9  69.7
#> 3  2004  68.4  65.8  61.2  75.7  70.1  67.9  78.1  70.2  64.3  65.2  72.5  69.2
#> 4  2005  70.4  66.5  61.9  74.4  70.7  69.4  78    72    64.4  67.5  73    69.4
#> 5  2006  71.6  66.5  65.1  75.8  71.2  71.1  79.4  75.9  65.6  69    73.9  69.4
#> 6  2007  72.8  67.7  68.4  76.8  72    72.9  79    76.9  65.8  69.7  74.8  69.9
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
emp_20_64_MS
#> # A tibble: 17 x 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002  70.9  64.7  56.5  75.1  71.7  68.8  78.3  68.3  62.8  63.2  73.2  68.6
#>  2  2003  71.3  64.5  58.7  75.4  71    68.4  77.4  69.1  63.8  64.3  72.9  69.7
#>  3  2004  68.4  65.8  61.2  75.7  70.1  67.9  78.1  70.2  64.3  65.2  72.5  69.2
#>  4  2005  70.4  66.5  61.9  74.4  70.7  69.4  78    72    64.4  67.5  73    69.4
#>  5  2006  71.6  66.5  65.1  75.8  71.2  71.1  79.4  75.9  65.6  69    73.9  69.4
#>  6  2007  72.8  67.7  68.4  76.8  72    72.9  79    76.9  65.8  69.7  74.8  69.9
#>  7  2008  73.8  68    70.7  76.5  72.4  74    78.7  77.1  66.3  68.5  75.8  70.5
#>  8  2009  73.4  67.1  68.8  75.3  70.9  74.2  76.1  70    65.6  64    73.5  69.5
#>  9  2010  73.9  67.6  64.7  75    70.4  75    74.9  66.8  63.8  62.8  73    69.3
#> 10  2011  74.2  67.3  62.9  73.4  70.9  76.5  74.8  70.6  59.6  62    73.8  69.2
#> 11  2012  74.4  67.2  63    70.2  71.5  76.9  74.3  72.2  55    59.6  74    69.4
#> 12  2013  74.6  67.2  63.5  67.2  72.5  77.3  74.3  73.3  52.9  58.6  73.3  69.5
#> 13  2014  74.2  67.3  65.1  67.6  73.5  77.7  74.7  74.3  53.3  59.9  73.1  69.2
#> 14  2015  74.3  67.2  67.1  67.9  74.8  78    75.4  76.5  54.9  62    72.9  69.5
#> 15  2016  74.8  67.7  67.7  68.7  76.7  78.6  76    76.6  56.2  63.9  73.4  70  
#> 16  2017  75.4  68.5  71.3  70.8  78.5  79.2  76.6  78.7  57.8  65.5  74.2  70.6
#> 17  2018  76.2  69.7  72.4  73.9  79.9  79.9  77.5  79.5  59.5  67    76.3  71.3
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
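The reshaping step mentioned above can be sketched in base R as follows. The long table and its column names (time, geo, values) are assumptions made for illustration; the numbers are copied from the first rows of the emp_20_64_MS printout. This is only a minimal sketch of going from one row per year and country to the time by countries format, not the procedure used to build the packaged dataset.

```r
# Hypothetical long-format input: one row per (time, country) pair.
long <- data.frame(
  time   = rep(c(2002, 2003, 2004), times = 3),
  geo    = rep(c("AT", "BE", "BG"), each = 3),
  values = c(70.9, 71.3, 68.4,   # AT
             64.7, 64.5, 65.8,   # BE
             56.5, 58.7, 61.2)   # BG
)

# One row per year, one column per country.
wide <- reshape(long, idvar = "time", timevar = "geo", direction = "wide")

# reshape() prefixes the new columns with "values."; strip that prefix.
names(wide) <- sub("^values\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

The resulting table has the columns time, AT, BE and BG, which is the layout expected by the convergence functions.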

This dataset is always ready for the analysis without further processing. Details on how to download datasets similar to emp_20_64_MS are provided in Section 2. In what follows, we illustrate how to analyse convergence with locally accessible data, starting from the first example related to the Mean Life Satisfaction indicator.

NOTE: within the convergEU package, Eurofound data are statically stored. Please update this package to have the most recent version of Eurofound data.

1.1 The Mean Life Satisfaction indicator

1.1.1 Data preparation step

All the indicators are extracted from the locally accessible Eurofound database in the first step of the analysis through the extract_indicator_EUF function. This amounts to choosing a time interval, an indicator and a set of countries (MS). As already stated above, information on the locally accessible indicators is provided in the object dbEUF2018meta as follows:

names(dbEUF2018meta)
head(dbEUF2018meta)

where “Code_in_database” is the information that should be provided to extract a given indicator. Information on sets of countries is available through the function convergEU_glb(), which contains lists of constants and setups for the convergEU package. This function generates global static objects describing clusters of countries with their corresponding labels, as well as metainformation on the stored indicators:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"

The first object, EUcodes, in the function convergEU_glb() relates to the labels for all the MS:

convergEU_glb()$EUcodes
#> # A tibble: 28 x 4
#>    pae   paeF   paeN paeS 
#>    <chr> <chr> <dbl> <chr>
#>  1 1-AT  1-AT      1 AT   
#>  2 10-ES 10-ES    10 ES   
#>  3 11-FI 11-FI    11 FI   
#>  4 12-FR 12-FR    12 FR   
#>  5 13-HR 13-HR    13 HR   
#>  6 14-HU 14-HU    14 HU   
#>  7 15-IE 15-IE    15 IE   
#>  8 16-IT 16-IT    16 IT   
#>  9 17-LT 17-LT    17 LT   
#> 10 18-LU 18-LU    18 LU   
#> # … with 18 more rows

The objects containing clusters of MS with their corresponding labels are: Eurozone, EA, EU12, EU15, EU19, EU25, EU27, EU27_2007, EU27_2019, EU27_2020 and EU28. It must be noted that EU27 refers to Member States after the 1st February 2020, while EU28 is a valid tag up to 31 January 2020. The clusters EU27_2020 and EU27_2007 as defined by Eurostat are also available; for further details see https://ec.europa.eu/eurostat/help/faq/brexit. For example, the clusters EU12 and EU27_2020 of MS consist of the following countries with their corresponding labels:

convergEU_glb()$EU12
#> $dates
#> [1] "01-11-1993" "31/12/1994"
#> 
#> $memberStates
#> # A tibble: 12 x 2
#>    MS             codeMS
#>    <chr>          <chr> 
#>  1 Belgium        BE    
#>  2 Denmark        DK    
#>  3 France         FR    
#>  4 Germany        DE    
#>  5 Greece         EL    
#>  6 Ireland        IE    
#>  7 Italy          IT    
#>  8 Luxembourg     LU    
#>  9 Netherlands    NL    
#> 10 Portugal       PT    
#> 11 Spain          ES    
#> 12 United-Kingdom UK
convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#> 
#> $memberStates
#> # A tibble: 27 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Belgium     BE    
#>  2 Denmark     DK    
#>  3 France      FR    
#>  4 Germany     DE    
#>  5 Greece      EL    
#>  6 Ireland     IE    
#>  7 Italy       IT    
#>  8 Luxembourg  LU    
#>  9 Netherlands NL    
#> 10 Portugal    PT    
#> # … with 17 more rows

Please note that EU19, Eurozone and EA are synonyms:

convergEU_glb()$EU19
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES
convergEU_glb()$Eurozone
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES
convergEU_glb()$EA
#> $dates
#> [1] NA NA
#> 
#> $memberStates
#> # A tibble: 19 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Austria     AT    
#>  2 Belgium     BE    
#>  3 Cyprus      CY    
#>  4 Estonia     EE    
#>  5 Finland     FI    
#>  6 France      FR    
#>  7 Germany     DE    
#>  8 Greece      EL    
#>  9 Ireland     IE    
#> 10 Italy       IT    
#> 11 Latvia      LV    
#> 12 Lithuania   LT    
#> 13 Luxembourg  LU    
#> 14 Malta       MT    
#> 15 Netherlands NL    
#> 16 Portugal    PT    
#> 17 Slovakia    SK    
#> 18 Slovenia    SI    
#> 19 Spain       ES

Thus, it is important to note that if the user selects the cluster Eurozone, then the calculations are performed NOT for all the MS in the EU, but only for those in the clusters EA/EU19.
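The effect of choosing a cluster can be sketched in base R as follows. The EU19 codes are copied from the listing above and the toy table uses values from the emp_20_64_MS printout; the object names are hypothetical, and the snippet only illustrates the column selection, not what the package does internally.

```r
# Country codes copied from the EU19 listing above.
eu19 <- c("AT", "BE", "CY", "EE", "FI", "FR", "DE", "EL", "IE", "IT",
          "LV", "LT", "LU", "MT", "NL", "PT", "SK", "SI", "ES")

# Toy years-by-countries table; BG is an MS but not part of EU19.
tb <- data.frame(time = c(2002, 2003),
                 AT = c(70.9, 71.3),
                 BE = c(64.7, 64.5),
                 BG = c(56.5, 58.7))

# Keep the time variable plus the countries belonging to the cluster.
keep    <- c("time", intersect(names(tb), eu19))
tb_eu19 <- tb[, keep]

# Countries that fall outside the chosen cluster.
dropped <- setdiff(names(tb), keep)
```

In this toy case BG is dropped, so any convergence measure computed on tb_eu19 ignores it.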

The object geoRefEUF contains MS as well as other countries (neighbouring countries of the EU):

convergEU_glb()$geoRefEUF
#> # A tibble: 77 x 2
#>    geo   geo_label  
#>    <chr> <chr>      
#>  1 AD    Andorra    
#>  2 AL    Albania    
#>  3 AM    Armenia    
#>  4 AT    Austria    
#>  5 AZ    Azerbaijan 
#>  6 BE    Belgium    
#>  7 BG    Bulgaria   
#>  8 BY    Belarus    
#>  9 CH    Switzerland
#> 10 CY    Cyprus     
#> # … with 67 more rows

The convergEU_glb() function also provides metainformation on different indicators, for example:

convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#>    Official_code Code_in_database indicator Official_code_p… subSelection
#>    <chr>         <chr>            <chr>     <chr>            <chr>       
#>  1 lfsa_argaed   act_15-64_p      Activity… lfsa_argaed      <NA>        
#>  2 lfsa_ergaed   emp_20_64_p      Employme… lfsa_ergaed      <NA>        
#>  3 lfsa_ergacob  emp_nonat_p      Employme… lfsa_ergacob     <NA>        
#>  4 lfsa_urgan    unem_15_74_p     Unemploy… lfsa_urgan       <NA>        
#>  5 une_educ_a    unem_pri_p       Unemploy… une_educ_a       <NA>        
#>  6 lfsa_upgan    ltunemp_p        Long-ter… lfsa_upgan       <NA>        
#>  7 edat_lfse_20  neet_p           NEET's r… edat_lfse_20     <NA>        
#>  8 une_rt_a      youthunemp_p     Youth un… une_rt_a         <NA>        
#>  9 ilc_lvhl36    temptoperm_p     Transiti… ilc_lvhl36       <NA>        
#> 10 ilc_lvhl30    parttofull_p     Transiti… ilc_lvhl30       <NA>        
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>

The general procedure to extract locally accessible data for a given indicator is illustrated below for the Mean Life satisfaction indicator, labelled lifesatisf in the column “Code_in_database” within the object dbEUF2018meta containing the metainformation:

names(dbEUF2018meta)
#>  [1] "DIMENSION"           "SUBDIMENSION"        "INDICATOR"          
#>  [4] "Code_in_database"    "Official_code"       "Unit"               
#>  [7] "Source_organisation" "Source_reference"    "Disaggregation"     
#> [10] "Bookmark_URL"

We proceed to extract the data for the lifesatisf indicator by invoking the extract_indicator_EUF function, and by considering the time interval from 2003 to 2016 and the cluster EU19 of MS:

myTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[1],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
myTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex      AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Total  7.85  7.52  7.22  7.36  5.94  6.69  7.49  8.14  6.96  7.70  7.22
#> 2  2007 Total  6.95  7.54  7.05  7.16  6.72  6.58  7.25  8.23  7.32  7.59  6.59
#> 3  2011 Total  7.66  7.38  7.16  7.20  6.28  6.16  7.47  8.08  7.23  7.39  6.88
#> 4  2016 Total  7.92  7.31  6.54  7.31  6.73  5.26  6.95  8.07  7.17  7.68  6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

which results in a dataset in the format time by countries in the list component named “$res”; further components are “$msg”, which possibly carries messages for the user, and “$err”, which is a string containing an error message if an error occurs. Please note that the extraction is conditional on gender.
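Since many functions in this tutorial return the same three-component list, one possible way to consume it is sketched below in base R. Both fake_extract() and unwrap() are hypothetical helpers written for this illustration; they are not part of convergEU and only mimic the res/msg/err shape shown above.

```r
# Hypothetical stand-in that returns the same list shape as shown above.
fake_extract <- function(ok = TRUE) {
  if (ok) {
    list(res = data.frame(time = 2003, AT = 7.85), msg = NULL, err = NULL)
  } else {
    list(res = NULL, msg = NULL, err = "Error: indicator not available.")
  }
}

# Hypothetical helper: stop on an error, relay messages, return the result.
unwrap <- function(out) {
  if (!is.null(out$err)) stop(out$err, call. = FALSE)
  if (!is.null(out$msg)) message(out$msg)
  out$res
}

tb     <- unwrap(fake_extract(TRUE))
failed <- tryCatch(unwrap(fake_extract(FALSE)),
                   error = function(e) conditionMessage(e))
```

Checking the err component before touching res avoids silently passing NULL to later steps of the analysis.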

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

In this example for the lifesatisf indicator, the data are extracted for males and females together (gender "Total"). Alternatively, data for females only or for males only can be extracted as follows:

# Extract data only for females:
feTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[2],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
feTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex       AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Femal…  7.93  7.38  7.13  7.48  5.91  6.75  7.43  8.27  6.97  7.89  7.12
#> 2  2007 Femal…  7.20  7.49  7.06  7.17  6.77  6.55  7.14  8.30  7.34  7.66  6.54
#> 3  2011 Femal…  7.63  7.47  7.07  7.27  6.26  6.13  7.51  8.27  7.17  7.41  6.86
#> 4  2016 Femal…  7.87  7.27  6.50  7.28  6.81  5.30  6.97  8.10  7.24  7.66  6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
# Extract data only for males:
maTBlf <- extract_indicator_EUF(
  indicator_code = "lifesatisf", #Code_in_database
  fromTime=2003,
  toTime=2016,
  gender= c("Total","Females","Males")[3],
  countries= convergEU_glb()$EU19$memberStates$codeMS
)
maTBlf
#> $res
#> # A tibble: 4 x 21
#>    time sex      AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Males  7.75  7.66  7.32  7.23  5.98  6.63  7.56  7.99  6.95  7.51  7.32
#> 2  2007 Males  6.67  7.60  7.03  7.16  6.67  6.62  7.37  8.16  7.31  7.51  6.64
#> 3  2011 Males  7.69  7.28  7.26  7.13  6.30  6.19  7.41  7.87  7.29  7.37  6.91
#> 4  2016 Males  7.98  7.35  6.58  7.34  6.64  5.23  6.93  8.04  7.09  7.71  6.55
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> #   PT <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

First, it is convenient to assign a meaningful name to the locally extracted data for both males and females of the lifesatisf indicator:

TBlf<-myTBlf$res
dim(TBlf)
#> [1]  4 21

that is, 4 rows and 21 columns, with variable names:

names(TBlf)
#>  [1] "time" "sex"  "AT"   "BE"   "CY"   "DE"   "EE"   "EL"   "ES"   "FI"  
#> [11] "FR"   "IE"   "IT"   "LT"   "LU"   "LV"   "MT"   "NL"   "PT"   "SI"  
#> [21] "SK"

Recall that the analysis of convergence is performed on a clean and imputed tidy dataset in the format years by countries. Thus, to compute convergence measures, the dataset must not contain qualitative variables, vectors of strings or missing values, and a time variable must be present. The check_data() function is called to check for unsuited features that must be resolved before starting the analysis.

For example, for the TBlf data we have:

check_data(TBlf)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: string  variables in the dataframe."

where the offending string variable is sex, which should be removed as follows:

lifesatFin <- dplyr::select(TBlf,-sex)

Once the string variable sex is deleted, the check_data() function is called again:

check_data(lifesatFin)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Thus, lifesatFin is the final object to start the analysis of convergence.

1.1.2 Beta convergence for the lifesatFin dataset

Several measures of convergence have been recently proposed by Eurofound (Eurofound, 2018). In the convergEU package four measures of convergence are implemented: Beta, Sigma, Gamma and Delta convergence. In what follows, each measure is illustrated in detail for the lifesatFin dataset.

Beta convergence is calculated in the convergEU package through the beta_conv() function. The dataset should be in the format years by countries, and no variable other than time and the country indicators may be present. If the time variable is not ordered, the dataset must first be sorted by time. For further details on how to sort an unordered time variable, see the first example in the help documentation:

help(beta_conv)
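As a minimal base R sketch of the sorting step, the toy table below lists the years out of order (values copied from the AT and BE columns of the lifesatisf extraction above) and is then reordered by its time variable. The object names are hypothetical; with dplyr loaded, dplyr::arrange(tb, time) would be an equivalent alternative.

```r
# Toy table with unordered years.
tb <- data.frame(time = c(2011, 2003, 2016, 2007),
                 AT   = c(7.66, 7.85, 7.92, 6.95),
                 BE   = c(7.38, 7.52, 7.31, 7.54))

# Reorder the rows so that time is increasing.
tb_sorted <- tb[order(tb$time), ]
rownames(tb_sorted) <- NULL
tb_sorted
```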

For the lifesatFin dataset, the Beta convergence is calculated as follows:

require(ggplot2)
require(dplyr)
require(tibble)

empBC <- beta_conv(lifesatFin, 
                   time_0 = 2003, 
                   time_t = 2016, 
                   all_within = FALSE, 
                   timeName = "time")

where the reference time (time_0) is 2003 while the target time (time_t) is 2016. Note that the option all_within = FALSE is the default, so that only the first and the last years are considered, i.e. 2003 and 2016 in this example. Otherwise, if all_within = TRUE, all the years in the specified time interval are considered.

Note also that, in the implementation of the beta_conv() function, the same reference time is maintained across different years, and the left-hand side is divided by the amount of time elapsed (i.e. time_t - time_0) because the argument useTau = TRUE by default. The division by the number of years elapsed may be skipped by specifying useTau = FALSE:

beta_conv(lifesatFin, 
          time_0 = 2003, 
          time_t = 2016, 
          all_within = FALSE, 
          timeName = "time",
          useTau = FALSE)
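To make the role of the elapsed time concrete, the base R sketch below fits the regression by hand. It assumes a log-difference formulation, i.e. the left-hand side is (log(y_t) - log(y_0)) / tau regressed on log(y_0); this is an inference from the deltaIndic and indic columns printed by beta_conv() in this tutorial, not a statement about the package source. Only the 11 countries visible in the lifesatisf printout are used, with rounded values, so the estimates will not reproduce the package output.

```r
# Mean life satisfaction for the 11 countries visible in the printout above.
y0 <- c(AT = 7.85, BE = 7.52, CY = 7.22, DE = 7.36, EE = 5.94, EL = 6.69,
        ES = 7.49, FI = 8.14, FR = 6.96, IE = 7.70, IT = 7.22)  # 2003
yt <- c(AT = 7.92, BE = 7.31, CY = 6.54, DE = 7.31, EE = 6.73, EL = 5.26,
        ES = 6.95, FI = 8.07, FR = 7.17, IE = 7.68, IT = 6.56)  # 2016

tau <- 2016 - 2003  # years elapsed between time_0 and time_t

# Assumed transformation: log level at time_0 and scaled log difference.
work <- data.frame(indic      = log(y0),
                   deltaIndic = (log(yt) - log(y0)) / tau)

# Left-hand side divided by tau (analogue of useTau = TRUE).
fit_tau  <- lm(deltaIndic ~ indic, data = work)
# Left-hand side not divided by tau (analogue of useTau = FALSE).
fit_raw  <- lm(I(deltaIndic * tau) ~ indic, data = work)

beta_tau <- unname(coef(fit_tau)["indic"])
beta_raw <- unname(coef(fit_raw)["indic"])
```

Because the regression is linear in the response, beta_raw equals beta_tau multiplied by tau: under this assumed formulation, skipping the division only rescales the coefficients and leaves the sign of the slope unchanged.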

The output of the beta_conv() function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

empBC
#> $res
#> $res$workTB
#> # A tibble: 19 x 3
#>    deltaIndic indic countries
#>         <dbl> <dbl> <chr>    
#>  1   0.000743  2.06 AT       
#>  2  -0.00216   2.02 BE       
#>  3  -0.00768   1.98 CY       
#>  4  -0.000537  2.00 DE       
#>  5   0.00962   1.78 EE       
#>  6  -0.0185    1.90 EL       
#>  7  -0.00576   2.01 ES       
#>  8  -0.000606  2.10 FI       
#>  9   0.00222   1.94 FR       
#> 10  -0.000174  2.04 IE       
#> 11  -0.00736   1.98 IT       
#> 12   0.0134    1.69 LT       
#> 13   0.00211   2.04 LU       
#> 14   0.00970   1.72 LV       
#> 15   0.00254   1.99 MT       
#> 16   0.00193   2.02 NL       
#> 17   0.0105    1.79 PT       
#> 18  -0.00209   1.95 SI       
#> 19   0.00954   1.73 SK       
#> 
#> $res$model
#> 
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#> 
#> Coefficients:
#> (Intercept)        indic  
#>     0.07130     -0.03639  
#> 
#> 
#> $res$summary
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.0713    0.0231      3.09 0.00669
#> 2 indic        -0.0364    0.0119     -3.05 0.00720
#> 
#> $res$beta1
#> [1] -0.0363931
#> 
#> $res$adj.r.squared
#> [1] 0.3160892
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Looking at the component “$res” in more detail, it contains for example:

  • a table (workTB) with the transformed data on which Beta convergence is estimated:
empBC$res$workTB
#> # A tibble: 19 x 3
#>    deltaIndic indic countries
#>         <dbl> <dbl> <chr>    
#>  1   0.000743  2.06 AT       
#>  2  -0.00216   2.02 BE       
#>  3  -0.00768   1.98 CY       
#>  4  -0.000537  2.00 DE       
#>  5   0.00962   1.78 EE       
#>  6  -0.0185    1.90 EL       
#>  7  -0.00576   2.01 ES       
#>  8  -0.000606  2.10 FI       
#>  9   0.00222   1.94 FR       
#> 10  -0.000174  2.04 IE       
#> 11  -0.00736   1.98 IT       
#> 12   0.0134    1.69 LT       
#> 13   0.00211   2.04 LU       
#> 14   0.00970   1.72 LV       
#> 15   0.00254   1.99 MT       
#> 16   0.00193   2.02 NL       
#> 17   0.0105    1.79 PT       
#> 18  -0.00209   1.95 SI       
#> 19   0.00954   1.73 SK
  • summary of the model results:
empBC$res$summary
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.0713    0.0231      3.09 0.00669
#> 2 indic        -0.0364    0.0119     -3.05 0.00720
  • the \(\beta_{1}\) estimated coefficient:
empBC$res$beta1
#> [1] -0.0363931

and its standard error:

empBC$res$summary$std.error[2]
#> [1] 0.01192145
  • the Adjusted R-Squared:
empBC$res$adj.r.squared
#> [1] 0.3160892

as well as several components of the statistical model estimated by OLS (Ordinary Least Squares), e.g. estimated coefficients, residuals, fitted values and the residual degrees of freedom. Note that a standard two-tailed test is reported (p-value), along with the adjusted R-squared.

A plot of the transformed data and of the fitted straight line obtained for Beta convergence may also be useful. The function beta_conv_graph is implemented to obtain this graphical representation as follows:

mybetaplot<-beta_conv_graph(betaRes=empBC,
 indiName = 'Mean Life Satisfaction',
 time_0 = 2003,
 time_t = 2016)

mybetaplot

where in the argument betaRes the user should specify the output of the beta_conv function, while time_0 and time_t refer to the starting and ending time respectively. It must be noted that the name of the indicator is specified in the argument indiName as a string.


1.1.3 Sigma convergence for the lifesatFin dataset

Next, the Sigma convergence is calculated by invoking the sigma_conv function on the lifesatFin dataset, which is already in the required format years by countries. Further details are available in the help documentation:

help(sigma_conv)

Note that if the arguments time_0 and time_t are not specified, then all years are considered:

mySTB <- sigma_conv(lifesatFin,timeName="time")
mySTB
#> $res
#> # A tibble: 4 x 5
#>    time stdDev     CV  mean devianceT
#>   <dbl>  <dbl>  <dbl> <dbl>     <dbl>
#> 1  2003  0.814 0.117   6.97     12.6 
#> 2  2007  0.593 0.0836  7.09      6.69
#> 3  2011  0.542 0.0765  7.09      5.58
#> 4  2016  0.687 0.0977  7.03      8.98
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Otherwise, it is possible to select a time window for which the Sigma convergence is calculated as follows:

sigma_conv(lifesatFin,time_0 = 2007,time_t = 2011)

The output of the sigma_conv function is the usual list:

  • “$res” which contains the values of Sigma convergence (called stdDev or CV) along time. This component also contains the mean and deviance values along time;
  • “$msg” which possibly carries messages for the user;
  • “$err” containing an error message, if an error has occurred. If no error has occurred, the message is ‘NULL’.
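The quantities reported in “$res” can be sketched by hand in base R. The snippet assumes that stdDev is the population standard deviation (division by the number of countries); this is an inference from the printed output, where 19 × 0.814² is approximately the reported deviance of 12.6, and not a statement about the package source. Only four countries from the lifesatisf printout are used, so the numbers differ from those of sigma_conv.

```r
# Four countries from the lifesatisf extraction above (rounded values).
tb <- data.frame(time = c(2003, 2007, 2011, 2016),
                 AT   = c(7.85, 6.95, 7.66, 7.92),
                 BE   = c(7.52, 7.54, 7.38, 7.31),
                 CY   = c(7.22, 7.05, 7.16, 6.54),
                 DE   = c(7.36, 7.16, 7.20, 7.31))

vals <- as.matrix(tb[, names(tb) != "time"])
n    <- ncol(vals)                      # number of countries

mean_t    <- rowMeans(vals)             # cross-country mean for each year
devianceT <- rowSums((vals - mean_t)^2) # sum of squared deviations per year
stdDev    <- sqrt(devianceT / n)        # assumed: population standard deviation

sigma <- data.frame(time = tb$time, stdDev = stdDev,
                    CV = stdDev / mean_t, mean = mean_t,
                    devianceT = devianceT)
sigma
```

A decreasing stdDev (or CV) along time indicates that the countries are getting closer to each other on the indicator.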

It is also possible to obtain a graphical representation of the standard deviation and the coefficient of variation obtained for the Sigma convergence by invoking the sigma_conv_graph function as follows:

mysigmaplot<-sigma_conv_graph(sigmaconvOut=mySTB,
                              time_0 = 2007,
                              time_t = 2011, 
                              aggregation='EU19')
mysigmaplot

where in the argument sigmaconvOut the user should specify the output of the sigma_conv function, while time_0 and time_t refer to the starting and ending time respectively. Note that the considered cluster of MS is specified in the argument aggregation as a string.

1.1.4 Gamma convergence for the lifesatFin dataset

The third measure of convergence implemented in the convergEU package is the Gamma convergence that is computed by invoking the gamma_conv function. As usual, the dataset should be in the format years by countries. For the lifesatFin dataset, the Gamma convergence is calculated as follows:

gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time")
#> $res
#> [1] 0.5762807
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

with 2003 as reference time (argument ref) and 2016 the last time (argument last) to be considered. The output is the usual list of the three components “$res”, “$msg” and “$err”. It is also possible to print the ranks of countries for each time by specifying the option printRanks=T:

gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time", printRanks = T)
#> Ranks:
#>      AT BE CY DE EE EL ES FI FR IE IT LT LU LV MT NL PT SI SK
#> [1,] 18 14 10 12  4  6 13 19  7 17  9  1 16  2 11 15  5  8  3
#> [2,]  8 14  9 10  7  4 12 19 13 16  5  3 18  1 15 17  2 11  6
#> [3,] 16 13  9 10  3  1 15 19 11 14  7  5 18  2 12 17  6  8  4
#> [4,] 18 13  5 12  7  1 10 19 11 15  6  4 17  2 14 16  9  8  3
#> $res
#> [1] 0.5762807
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that the Gamma convergence can also be calculated over multiple time steps by invoking the gamma_conv_msteps function. For example, for the lifesatFin dataset:

GCms <- gamma_conv_msteps(lifesatFin,startTime=2003,endTime=2016, timeName = "time")
GCms
#> $res
#> # A tibble: 4 x 2
#>    time gammaConv
#>   <dbl>     <dbl>
#> 1  2003    NA    
#> 2  2007     0.4  
#> 3  2011     0.510
#> 4  2016     0.576
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

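The gamma_conv_msteps function considers the years chosen (2003 and 2016 in the example above) and returns one value of Gamma convergence for each time after the first one (2007, 2011 and 2016 in the example above). Note that the value reported for 2016 coincides with the result obtained above by invoking gamma_conv with ref=2003 and last=2016, which suggests that each window starts at startTime.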
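
The ranks printed with printRanks make it possible to check these numbers by hand. The sketch below assumes that the statistic is the variance of the rank sums over all the times in the window, divided by the variance of the reference-time ranks multiplied by (T + 1), where T is the number of time points. This formula is inferred from the output shown in this section rather than taken from the package source, and the helper gamma_sketch is a hypothetical name, but it reproduces the printed values:

```r
# Ranks copied from the printRanks output above (rows: 2003, 2007, 2011, 2016).
ranks <- rbind(
  c(18, 14, 10, 12, 4, 6, 13, 19,  7, 17, 9, 1, 16, 2, 11, 15, 5,  8, 3),
  c( 8, 14,  9, 10, 7, 4, 12, 19, 13, 16, 5, 3, 18, 1, 15, 17, 2, 11, 6),
  c(16, 13,  9, 10, 3, 1, 15, 19, 11, 14, 7, 5, 18, 2, 12, 17, 6,  8, 4),
  c(18, 13,  5, 12, 7, 1, 10, 19, 11, 15, 6, 4, 17, 2, 14, 16, 9,  8, 3)
)

# Hypothetical helper: ratio of variances inferred from the printed output.
gamma_sketch <- function(rk) {
  n_times <- nrow(rk)
  var(colSums(rk)) / var((n_times + 1) * rk[1, ])
}

gamma_all <- gamma_sketch(ranks)
gamma_all                      # 0.5762807

# Windows that all start at the first year: 2003-2007, 2003-2011, 2003-2016.
gamma_steps <- sapply(2:4, function(k) gamma_sketch(ranks[1:k, , drop = FALSE]))
round(gamma_steps, 3)          # 0.400 0.510 0.576
```

Under this assumption, each row of the gamma_conv_msteps output corresponds to a window that starts at the first year, which would explain why the 2016 value equals the result of gamma_conv with ref=2003 and last=2016.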

1.1.5 Delta convergence for the lifesatFin dataset

The last measure of convergence implemented in the convergEU package is the Delta convergence, computed by invoking the delta_conv function. Given a dataset of quantitative indicators along time, the Delta convergence is a statistic describing departures from the best performer: at each time, the distances of the MS from the best performing one are added up. To calculate Delta convergence, it is sufficient to specify the dataset and the time variable name. For example, for the lifesatisf indicator, the results are:

delta_conv(lifesatFin,"time")
#> $res
#> # A tibble: 4 x 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  22.3
#> 2  2007  21.7
#> 3  2011  18.8
#> 4  2016  19.8
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

with an output given by the components “$res” (results), “$msg” (message) and “$err” (error). Note that the delta_conv function can also return the declaration of convergence. To this end, the argument extended should be set to TRUE. For example, for the lifesatisf indicator the syntax is as follows:

delta_conv(lifesatFin,"time", extended=TRUE)
#> $res
#> $res$delta_conv
#> # A tibble: 4 x 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  22.3
#> 2  2007  21.7
#> 3  2011  18.8
#> 4  2016  19.8
#> 
#> $res$differences
#>             AT        BE        CY        DE       EE       EL        ES FI
#> [1,] 0.2900786 0.6212335 0.9162025 0.7786398 2.200235 1.444585 0.6465211  0
#> [2,] 1.2864127 0.6902347 1.1851578 1.0695601 1.511724 1.651325 0.9796877  0
#> [3,] 0.4182611 0.6992726 0.9145322 0.8737502 1.800689 1.916749 0.6120100  0
#> [4,] 0.1501288 0.7651687 1.5381799 0.7660656 1.345254 2.811414 1.1226888  0
#>             FR        IE        IT       LT        LU       LV        MT
#> [1,] 1.1742063 0.4352756 0.9211011 2.694273 0.4490004 2.563162 0.8151641
#> [2,] 0.9090123 0.6468849 1.6447601 1.911084 0.3330436 2.193221 0.6741619
#> [3,] 0.8511925 0.6848898 1.1936469 1.376650 0.2882094 1.835249 0.8437681
#> [4,] 0.9063277 0.3888202 1.5160985 1.590870 0.1709514 1.750492 0.5050535
#>             NL       PT       SI       SK
#> [1,] 0.5916190 2.146662 1.094598 2.480780
#> [2,] 0.3640208 2.043251 1.005834 1.555401
#> [3,] 0.3846836 1.310083 1.125367 1.691687
#> [4,] 0.3358555 1.202321 1.219944 1.670450
#> 
#> $res$difference_last_first
#>          AT          BE          CY          DE          EE          EL 
#> -0.13994980  0.14393520  0.62197733 -0.01257420 -0.85498142  1.36682844 
#>          ES          FI          FR          IE          IT          LT 
#>  0.47616768  0.00000000 -0.26787853 -0.04645538  0.59499741 -1.10340261 
#>          LU          LV          MT          NL          PT          SI 
#> -0.27804899 -0.81267023 -0.31011057 -0.25576353 -0.94434166  0.12534523 
#>          SK 
#> -0.81032991 
#> 
#> $res$strict_conv_ini_last
#> [1] FALSE
#> 
#> $res$label_strict
#> [1] " "
#> 
#> $res$converg_ini_last
#> [1] TRUE
#> 
#> $res$label_conver
#> [1] "convergence"
#> 
#> $res$diffe_delta
#> [1] -2.507256
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
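
The numeric components of the extended output can be mimicked on a toy dataset. The sketch below uses invented values for three hypothetical countries and assumes an indicator for which high values are best, so that the best performer is the row maximum. This reading is consistent with the output above, where FI has distance 0 at every time and the first row of $res$differences adds up to about 22.3; it is not the package's own code:

```r
# Toy data in the format years by countries (values are invented).
toy <- data.frame(
  time = c(2003, 2016),
  AA = c(8, 8),
  BB = c(6, 7),
  CC = c(5, 7)
)
vals <- as.matrix(toy[, setdiff(names(toy), "time")])

# Distance of each country from the best performer (row maximum), by year.
best        <- apply(vals, 1, max)
differences <- best - vals           # 'best' is recycled down the rows

delta <- rowSums(differences)        # one Delta value per year

# Changes between the last and the first time.
difference_last_first <- differences[nrow(differences), ] - differences[1, ]
diffe_delta           <- delta[length(delta)] - delta[1]

delta
difference_last_first
diffe_delta
```

Here Delta decreases from 5 to 2, so diffe_delta is negative, which corresponds to the situation labelled "convergence" in the output above.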

In this context, it is also useful to evaluate how much a collection of MS deviates from the EU mean for a given indicator and a period of time. In order to obtain this further information, the demea_change function has been implemented in the convergEU package:

help(demea_change)

This function calculates the differences from the EU mean at each time within each MS; afterwards, the negative and positive changes of these differences (in absolute value) between subsequent times are added over years within each MS. To illustrate the demea_change function in detail, let’s consider the data for the lifesatisf indicator. Suppose that we choose to calculate deviations from the EU mean for Italy, France, Germany and Spain (argument sele_countries), considering 2003 as starting time (argument time_0) and 2016 as ending time (argument time_t):

res1<-demea_change(lifesatFin,
                   timeName="time",
                   time_0 = 2003,
                   time_t = 2016,
                   sele_countries= c('IT','FR','DE','ES'),
                   doplot=TRUE)

where the argument doplot is equal to TRUE for also obtaining the plot of the calculated differences. Note that if the argument sele_countries is equal to NA, then all countries in the dataset are considered. The output of the demea_change function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

names(res1)
#> [1] "res" "msg" "err"

Looking at the component “$res” in more detail, it contains for example:

  • a tibble with the differences from the EU mean at each time for each MS:
res1$res$resDiffe
#> # A tibble: 4 x 20
#>    time     AT    BE      CY     DE     EE     EL      ES    FI       FR    IE
#>   <dbl>  <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>    <dbl> <dbl>
#> 1  2003  0.882 0.551  0.256  0.393  -1.03  -0.273  0.525  1.17  -0.00245 0.736
#> 2  2007 -0.147 0.449 -0.0454 0.0702 -0.372 -0.512  0.160  1.14   0.231   0.493
#> 3  2011  0.572 0.291  0.0760 0.117  -0.810 -0.926  0.379  0.991  0.139   0.306
#> 4  2016  0.890 0.275 -0.498  0.274  -0.305 -1.77  -0.0829 1.04   0.133   0.651
#> # … with 9 more variables: IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>,
#> #   NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
  • a tibble with the changes of the differences in absolute value between subsequent times for each MS:
res1$res$diffe_abs_diff
#> # A tibble: 3 x 20
#>    time     AT      BE      CY      DE     EE    EL     ES      FI       FR
#>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>    <dbl>
#> 1  2007 -0.735 -0.101  -0.210  -0.323  -0.656 0.239 -0.365 -0.0320  0.228  
#> 2  2011  0.426 -0.158   0.0306  0.0466  0.438 0.415  0.219 -0.149  -0.0913 
#> 3  2016  0.317 -0.0167  0.422   0.157  -0.505 0.845 -0.296  0.0492 -0.00590
#> # … with 10 more variables: IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> #   MT <dbl>, NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
  • a tibble containing negative and positive differences for the selected MSs:
res1$res$stats
#> # A tibble: 4 x 4
#>   MS    negaSum posiSum  posi
#>   <chr>   <dbl>   <dbl> <int>
#> 1 DE    -0.323    0.204     4
#> 2 ES    -0.661    0.219     7
#> 3 FR    -0.0972   0.228     9
#> 4 IT    -0.302    0.528    11

and minimum and maximum of the calculated differences:

res1$res$miniX
#> [1] -1.161148
res1$res$maxiX
#> [1] 1.498789
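
The relation among resDiffe, diffe_abs_diff and the columns negaSum and posiSum can be checked on a toy dataset. The sketch below uses invented values and assumes that the EU mean is the plain average over all the countries in the dataset; the steps are inferred from the output printed above (for example, for DE the changes -0.323, 0.0466 and 0.157 give negaSum = -0.323 and posiSum = 0.204) and are not taken from the package source:

```r
# Toy data in the format years by countries (values are invented).
toy <- data.frame(
  time = c(2003, 2007, 2011),
  AA = c(8, 7, 7),
  BB = c(6, 7, 5),
  CC = c(7, 7, 9)
)
vals <- as.matrix(toy[, setdiff(names(toy), "time")])

# Deviation of each country from the mean of all countries, by year.
resDiffe <- vals - rowMeans(vals)

# Change of the absolute deviation between subsequent times.
diffe_abs_diff <- diff(abs(resDiffe))

# Negative and positive changes added over years within each country.
negaSum <- colSums(pmin(diffe_abs_diff, 0))
posiSum <- colSums(pmax(diffe_abs_diff, 0))

rbind(negaSum, posiSum)
```

A negative change means that the country moved closer to the mean between two subsequent times, while a positive change means that it moved away from it.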

To plot the calculated differences, the user should invoke the plot function as follows:

plot(res1$res$res_graph)

1.1.6 Country fiches

The convergEU package provides the function go_ms_fi to prepare country fiches. This function prepares an html file (which can then be saved as pdf) that shows several results for a given reference country, compared to one or more other countries of interest. More precisely, the results relate to several comparisons through the average (i.e. departures from the mean of the selected countries, total increase/decrease in the gap with the EU mean, total gap with respect to the best country performer within each year) as well as patterns of convergence and divergence and scoreboards for the selected countries. By invoking go_ms_fi, the html file can be created for different indicators, timeframes and countries. Note that the go_ms_fi function automatically prepares the country fiches, but it is also able to create a directory along an existing path and to copy into it the Rmarkdown file representing the template. The Rmarkdown file is parameterized, so that by passing different parameters the compilation to HTML format takes place with different data, say different indicators and countries.

It is very important to prepare complete data in the required tidy format time by countries; that is, the dataset is made by a time variable and as many other variables as countries that enter into the calculation of the time average (see Section 1.1 for further information). For example, the lifesatFin dataset is in the required format time by countries:

lifesatFin
#> # A tibble: 4 x 20
#>    time    AT    BE    CY    DE    EE    EL    ES    FI    FR    IE    IT    LT
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003  7.85  7.52  7.22  7.36  5.94  6.69  7.49  8.14  6.96  7.70  7.22  5.44
#> 2  2007  6.95  7.54  7.05  7.16  6.72  6.58  7.25  8.23  7.32  7.59  6.59  6.32
#> 3  2011  7.66  7.38  7.16  7.20  6.28  6.16  7.47  8.08  7.23  7.39  6.88  6.70
#> 4  2016  7.92  7.31  6.54  7.31  6.73  5.26  6.95  8.07  7.17  7.68  6.56  6.48
#> # … with 7 more variables: LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PT <dbl>,
#> #   SI <dbl>, SK <dbl>

Below, a call to the function go_ms_fi() illustrates the syntax for the lifesatFin dataset:

go_ms_fi(
  workDF ='lifesatFin',
  countryRef ='DE',
  otherCountries = "c('IT','UK','FR')",
  time_0 = 2003,
  time_t = 2016,
  tName = 'time',
  indiType = 'highBest',
  aggregation= 'EU12',
  x_angle=  45,
  dataNow=  Sys.time(),
  author = 'A.Student',
  outFile = 'Germany-up2-2016-LIFESATFIN', 
  outDir = 'F:/analysis/IT2018',
  indiName= 'lifesatisf'
)

  

It is very important to emphasize some constraints and some rather unusual ways of passing parameters to this function. In fact, the first argument is the working dataset, which is passed not as an R object but as a string: the name of a dataset that must be available in the R workspace before invoking go_ms_fi.
The second argument countryRef is a string with the short name of the member country that will be shown in one-country plots. One key country must be specified together with some other countries of interest to better decorate graphs and compare performances. Less obviously, the argument indiType specifies whether the considered indicator is built so that a low value is good for a country (indiType = “lowBest”) or a high value is good (indiType = “highBest”). The option lowBest should be used for indicators where the best possible situation is when the indicator has a low value, for instance the poverty rate. The option highBest should be used when the best situation corresponds to a high value of the indicator, for instance the employment rate or, as in the example above, mean life satisfaction.
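
Since some arguments are strings that contain R code or object names, it can help to build and check them before the call. The following base R sketch shows one possible way to do so; the object names others, others_string and lifesat_toy are hypothetical and the toy values are invented:

```r
# Countries of interest as an ordinary character vector.
others <- c("IT", "UK", "FR")

# Build the string "c('IT','UK','FR')" in the form used for otherCountries.
others_string <- paste0("c(", paste0("'", others, "'", collapse = ","), ")")
others_string

# Sanity check: the string parses back to the original vector.
round_trip_ok <- identical(eval(parse(text = others_string)), others)
round_trip_ok

# workDF is the *name* of the dataset, so an object with that name must
# exist in the workspace before the call (toy object with invented values).
lifesat_toy <- data.frame(time = c(2003, 2016), DE = c(7.4, 7.3))
dataset_ok <- exists("lifesat_toy")
dataset_ok
```

The resulting string could then be passed as otherCountries = others_string, and the name "lifesat_toy" as workDF.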

Of particular importance is the argument outFile, a string indicating the name of the output file. Similarly, outDir is the path (unit and folders) in which the final compiled html will be stored. The syntax of the path depends on the operating system; for example outDir=‘F:/analysis/IT2018’ indicates that, in the usb disk called ‘F’, the folder ‘analysis’ contains the folder ‘IT2018’ where R will write the country fiche. Note that a disk called ‘F’ must exist and the folder ‘analysis’ must also exist in that unit, whereas the folder ‘IT2018’ is created by the function if it does not already exist.

Besides the compiled html, the output directory also stores a file whose name is the one specified by the argument outFile with the string ‘-workspace.RData’ appended; this file contains the data and plots produced during the compilation of the country fiche, for subsequent use in other technical reports.
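
The constraints on the output path can be checked before compiling a fiche. The sketch below is only an illustration in base R: a temporary folder stands in for ‘F:/analysis’, and the object names parent_dir, out_dir, out_file and workspace_file are hypothetical:

```r
# Hypothetical locations: a temporary folder stands in for 'F:/analysis'.
parent_dir <- file.path(tempdir(), "analysis")
dir.create(parent_dir, showWarnings = FALSE)   # the parent folder must exist

out_dir  <- file.path(parent_dir, "IT2018")    # the last folder may be missing
out_file <- "Germany-up2-2016-LIFESATFIN"

# Stop early if the parent of the output folder is not there.
stopifnot(dir.exists(dirname(out_dir)))

# Expected name of the workspace file (outFile plus '-workspace.RData').
workspace_file <- file.path(out_dir, paste0(out_file, "-workspace.RData"))
basename(workspace_file)
```

After a fiche has been compiled, the saved objects could presumably be restored in a later session with load(workspace_file).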

NOTE: an Internet connection should be available when invoking the go_ms_fi function in order to properly render the results in the html file.


1.1.7 Indicator fiches

An auxiliary function go_indica_fi() is provided in the R package convergEU to produce an indicator fiche, where the output is an html file (which can then be saved as pdf). This means that the user chooses a given indicator, a time window and a set of countries, and the corresponding fiche is automatically produced by invoking the go_indica_fi() function. The html file produced in output reports measures of convergence as well as time series plots, box-plots, and patterns of convergence and divergence. Unlike the country fiches, which require a reference country and a set of other countries for comparison, the indicator fiches contain the results of the analysis of convergence for a given indicator and a given cluster of countries.

When invoking the go_indica_fi() function, an output directory must also be specified. Note that some arguments are passed as strings instead of objects, as described in the previous section for the country fiches.

An example of syntax to invoke the procedure for the lifesatFin dataset is:

go_indica_fi(
  time_0 = 2003,
  time_t = 2016,
  timeName = 'time',
  workDF = 'lifesatFin' ,
  indicaT = 'lifesatisf',
  indiType = c('highBest','lowBest')[1],
  seleMeasure = 'all',
  seleAggre = 'EU19',
  x_angle =  45,
  data_res_download =  FALSE,
  auth = 'A.Student',
  dataNow =  Sys.time(),
  outFile = 'test_IT-lifesatFin',
  outDir = 'F:/analysis/EU19'
)

Note that the argument seleMeasure refers to the set of measures of convergence for which the analysis should be performed; if seleMeasure = ‘all’ as in the syntax above, then the results are produced for all four measures of convergence (Beta, Sigma, Delta and Gamma). The set of countries for which the analysis of convergence should be performed are specified in the argument seleAggre as a string; the default is the cluster of Member States EU27.

For advanced users: if the selection of countries is different from a standard aggregate (EU12, EU19 etc.), the argument “custom” can be used.

The option outDir specifies the path, made of an existing unit and a number of existing folders, in which the compiled fiche will be saved. Only the last folder of the path may be missing: in that case it is created before compilation.

NOTE: an Internet connection should be available when invoking the go_indica_fi function in order to properly render the results in the html file. The fiches have been tested with the browsers Mozilla Firefox and Google Chrome.



1.2 The JQI_Intensity index indicator

A second example for locally accessible Eurofound data is reported below by considering the JQI_Intensity index indicator, related to the working conditions dimension. Similarly to what was described for the Mean Life Satisfaction indicator, the first step is data preparation, which is needed to subsequently perform the analysis of convergence.

1.2.1 Data preparation step: indicators’ extraction from the Eurofound dataset

To extract the data for the JQI_Intensity index indicator from the locally accessible Eurofound database, its label indication is necessary, and it is provided in the object dbEUF2018meta from the column “Code_in_database”:

print(dbEUF2018meta, n=200,width=200) 
#> # A tibble: 13 x 10
#>    DIMENSION          SUBDIMENSION      
#>    <chr>              <chr>             
#>  1 Quality of life    Life satisfaction 
#>  2 Quality of life    Health            
#>  3 Quality of life    Health            
#>  4 Quality of life    Quality of society
#>  5 Quality of life    Quality of society
#>  6 Quality of life    Quality of society
#>  7 Quality of life    Quality of society
#>  8 Quality of life    Quality of society
#>  9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#>    INDICATOR        Code_in_database Official_code        Unit  Source_organisa…
#>    <chr>            <chr>            <chr>                <chr> <chr>           
#>  1 Mean life satis… lifesatisf       y16_q4               --    Eurofound       
#>  2 Mean health sta… health           y16_q48              --    Eurofound       
#>  3 Percentage of p… goodhealth_p     y16_q48              %     Eurofound       
#>  4 Mean level of t… trustlocal       y16_q35f             --    Eurofound       
#>  5 Level of involv… volunt           y16_q29a             --    Eurofound       
#>  6 Percentage of p… volunt_p         y16_q29a             %     Eurofound       
#>  7 Hours per week … caring_h         y16_q43a             hours Eurofound       
#>  8 Social Exclusio… socialexc_i      y16_socexindex       index Eurofound       
#>  9 JQI_Skills and … JQIskill_i       wq_slim -  JQI SLIM… index Eurofound       
#> 10 JQI_Physical en… JQIenviron_i     envsec_slim - JQI S… index Eurofound       
#> 11 JQI_Intensity i… JQIintensity_i   intens_slim - JQI S… index Eurofound       
#> 12 JQI_Working tim… JQItime_i        wlb_slim - JQI SLIM… index Eurofound       
#> 13 Exposition to d… exposdiscr_p     disc_d -  Having be… %     Eurofound       
#>    Source_reference Disaggregation Bookmark_URL                                 
#>    <chr>            <chr>          <chr>                                        
#>  1 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  2 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  3 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  4 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  5 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  6 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  7 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  8 EQLS             sex            https://www.eurofound.europa.eu/surveys/abou…
#>  9 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS             sex            https://www.eurofound.europa.eu/surveys/abou…

The set of MS is chosen from those available by invoking the convergEU_glb() function:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"
convergEU_glb()$EU15$memberStates$MS
#>  [1] "Belgium"        "Denmark"        "France"         "Germany"       
#>  [5] "Greece"         "Ireland"        "Italy"          "Luxembourg"    
#>  [9] "Netherlands"    "Portugal"       "Spain"          "United-Kingdom"
#> [13] "Austria"        "Finland"        "Sweden"
convergEU_glb()$EU15$memberStates$codeMS
#>  [1] "BE" "DK" "FR" "DE" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "UK" "AT" "FI" "SE"

where the EU15 cluster of MS is selected for this type of indicator. Lastly, a time interval is chosen, say from 1995 to 2015. Therefore, we proceed to extract the “JQIintensity_i” indicator for both males and females:

myTBjqt <- extract_indicator_EUF(
  indicator_code = "JQIintensity_i", #Code_in_database
  fromTime=1995,
  toTime=2015,
  gender= c("Total","Females","Males")[1],
  countries= convergEU_glb()$EU15$memberStates$codeMS
)
myTBjqt
#> $res
#> # A tibble: 5 x 17
#>    time sex      AT    BE    DE    DK    EL    ES    FI    FR    IE    IT    LU
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total  48.8  33.2  40.8  39.0  40.9  34.2  47.1  38.4  39.0  34.1  31.4
#> 2  2000 Total  42.9  37.3  40.9  37.6  43.5  36.2  46.7  39.5  42.2  39.7  37.6
#> 3  2005 Total  47.6  42.8  46.9  47.9  50.5  41.2  49.6  40.5  36.9  41.9  40.6
#> 4  2010 Total  42.1  40.2  44.9  39.1  48.6  38.0  45.9  43.0  47.0  40.8  40.8
#> 5  2015 Total  42.4  41.5  40.2  45.0  49.3  46.5  41.1  42.7  42.8  38.1  42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

which results in a list with the components “$res”, “$msg” and “$err”.
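
Because results are always wrapped in a list of this kind, it may be convenient to check “$err” before using “$res”. The helper get_res below is a hypothetical convenience function, not part of convergEU, and the two toy lists only mimic the structure of the output (their values are invented or copied from messages shown in this tutorial):

```r
# Hypothetical helper: return '$res', or stop with the message stored in '$err'.
get_res <- function(out) {
  if (!is.null(out$err)) stop(out$err, call. = FALSE)
  out$res
}

# Toy objects with the same three components as the package output.
ok  <- list(res = data.frame(time = 2015, AT = 42.4), msg = NULL, err = NULL)
bad <- list(res = NULL, msg = NULL,
            err = "Error: one or more missing values in the dataframe.")

res_ok  <- get_res(ok)
res_bad <- tryCatch(get_res(bad), error = function(e) conditionMessage(e))

res_ok
res_bad
```

With such a helper, a call like get_res(myTBjqt) would return the tibble directly when no error has occurred.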

1.2.2 The issue of missing all the data for a given country

To better explain other data situations that the user could run into, let’s suppose that the values for the country Austria (“AT”) are completely missing, as highlighted in the following dataset myTBjqmiss:

myTBjqmiss <- dplyr::mutate(myTBjqt$res, AT = NA_real_)
myTBjqmiss
#> # A tibble: 5 x 17
#>    time sex      AT    BE    DE    DK    EL    ES    FI    FR    IE    IT    LU
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total    NA  33.2  40.8  39.0  40.9  34.2  47.1  38.4  39.0  34.1  31.4
#> 2  2000 Total    NA  37.3  40.9  37.6  43.5  36.2  46.7  39.5  42.2  39.7  37.6
#> 3  2005 Total    NA  42.8  46.9  47.9  50.5  41.2  49.6  40.5  36.9  41.9  40.6
#> 4  2010 Total    NA  40.2  44.9  39.1  48.6  38.0  45.9  43.0  47.0  40.8  40.8
#> 5  2015 Total    NA  41.5  40.2  45.0  49.3  46.5  41.1  42.7  42.8  38.1  42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>

Thus, if we proceed to check the data for the presence of unsuited features (missing values, qualitative variables, vectors of strings):

check_data(myTBjqmiss, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

it results in an error message due to the presence of missing values for Austria. Note that in the convergEU package, the function impute_dataset is specifically implemented for the imputation of missing values (see Section 2 and later sections). Nevertheless, this function requires at least two non-missing values for a given country. Thus, if all the values for a given country are missing, the imputation is not performed. Consequently, if the user proceeds to calculate the four measures of convergence, the following error message is returned for each measure of convergence:

#> [1] "Error: one or more missing values in the dataframe."

That is, the error message states that missing values are present in the dataset since the values of Austria are completely missing. There are some possible solutions even though in the literature it is still an open issue.

Lastly, it becomes clear that when the values for a given country are completely missing in the dataset, neither the country fiches nor the indicator fiches are computed.
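
Before running the analysis, it may help to find out which countries are completely missing and which are only partially missing. The base R sketch below uses invented toy values; dropping a completely missing country is shown only as one possible workaround, which changes the set of countries entering every average and is not a recommendation of the package:

```r
# Toy data in the format years by countries (values are invented).
toy <- data.frame(
  time = c(1995, 2000, 2005),
  AT = NA_real_,                 # completely missing country
  BE = c(33.2, 37.3, 42.8),
  DE = c(40.8, NA, 46.9)         # partially missing country
)

country_cols <- setdiff(names(toy), "time")
n_missing    <- colSums(is.na(toy[country_cols]))

# Countries with no data at all, and countries with gaps that leave
# at least some observed values.
all_missing  <- names(n_missing)[n_missing == nrow(toy)]
some_missing <- names(n_missing)[n_missing > 0 & n_missing < nrow(toy)]

# One possible workaround: exclude the completely missing countries.
toy_reduced <- toy[, setdiff(names(toy), all_missing), drop = FALSE]

all_missing
some_missing
names(toy_reduced)
```

In this example AT would have to be excluded (or obtained from another source), while DE still has two observed values.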

2 Downloadable data from the Eurostat repository

2.1 Data preparation step: indicator’s downloading from the Eurostat repository

Eurostat data are downloaded on the fly through an active Internet connection.
The heterogeneity in the structure of the different indicators stored in the database requires some attention. A list of covariates for each indicator is sometimes present besides age and gender, thus their values must be set to produce a tidy dataset in the format time by countries. The indicator “Employment rate” is chosen to illustrate how to prepare downloadable data from the Eurostat repository for the analysis of convergence. The data for a specific indicator are downloaded from the Eurostat repository by invoking the function download_indicator_EUS:

help(download_indicator_EUS)

This amounts to choosing an indicator, a time interval, and a set of countries (MS). The function also allows conditioning on gender, age and other variables according to the chosen indicator. Note that when the argument rawDump is TRUE, raw data are downloaded, which do not form a tidy dataset; thus, to obtain a tidy dataset in the format years by countries, the argument rawDump=F should be specified.

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

In order to download the data for the “Employment rate” indicator, its indicator code is necessary. The metainformation on the indicator’s labelling is provided by invoking the convergEU_glb() function, as follows:

convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#>    Official_code Code_in_database indicator Official_code_p… subSelection
#>    <chr>         <chr>            <chr>     <chr>            <chr>       
#>  1 lfsa_argaed   act_15-64_p      Activity… lfsa_argaed      <NA>        
#>  2 lfsa_ergaed   emp_20_64_p      Employme… lfsa_ergaed      <NA>        
#>  3 lfsa_ergacob  emp_nonat_p      Employme… lfsa_ergacob     <NA>        
#>  4 lfsa_urgan    unem_15_74_p     Unemploy… lfsa_urgan       <NA>        
#>  5 une_educ_a    unem_pri_p       Unemploy… une_educ_a       <NA>        
#>  6 lfsa_upgan    ltunemp_p        Long-ter… lfsa_upgan       <NA>        
#>  7 edat_lfse_20  neet_p           NEET's r… edat_lfse_20     <NA>        
#>  8 une_rt_a      youthunemp_p     Youth un… une_rt_a         <NA>        
#>  9 ilc_lvhl36    temptoperm_p     Transiti… ilc_lvhl36       <NA>        
#> 10 ilc_lvhl30    parttofull_p     Transiti… ilc_lvhl30       <NA>        
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>

where both the columns selectorUser and Official_code_purified provide this information:

convergEU_glb()$metaEUStat$selectorUser
#>  [1] "lfsa_argaed"    "lfsa_ergaed"    "lfsa_ergacob"   "lfsa_urgan"    
#>  [5] "une_educ_a"     "lfsa_upgan"     "edat_lfse_20"   "une_rt_a"      
#>  [9] "ilc_lvhl36"     "ilc_lvhl30"     "lfsa_egad"      "lfsa_eppgai"   
#> [13] "lfsi_emp_a"     "lfsi_pt_a"      "lfsa_egised"    "earn_gr_gpgr2" 
#> [17] "selfemp_p"      "selfempwe_p"    "lfsa_etgar"     "prc_ppp_ind"   
#> [21] "gov_10dd_edpt1" "nasa_10_ki"     "gov_10a_main"   "labourcost_i"  
#> [25] "labourprod_i"   "compemp_pps"    "earn_nt_net"    "spr_exp_sum"   
#> [29] "expspr_p"       "ilc_pnp3"       "spr_exp_pens"   "spr_pns_ben"   
#> [33] "edat_lfse_14"   "trng_lfse_01"   "edat_lfse_03"   "expedu_p"      
#> [37] "hlth_silc_08"   "exphlth_p"      "isoc_sk_dskl_i" "isoc_bde15b_h" 
#> [41] "isoc_ci_im_i"   "isoc_ec_ibuy"   "isoc_ci_it_en2" "rd_p_perslf"   
#> [45] "rd_e_gerdtot"   "isoc_ciegi_ac"  "exppubserv_p"   "lfsa_eegan2"   
#> [49] "prc_hicp_aind"  "ilc_di12"       "ilc_di11"       "lfst_r_lmder"  
#> [53] "demo_mlexpec"   "hlth_hlye"      "y16_deprindex"  "hlth_dm060"
convergEU_glb()$metaEUStat$Official_code_purified
#>  [1] "lfsa_argaed"    "lfsa_ergaed"    "lfsa_ergacob"   "lfsa_urgan"    
#>  [5] "une_educ_a"     "lfsa_upgan"     "edat_lfse_20"   "une_rt_a"      
#>  [9] "ilc_lvhl36"     "ilc_lvhl30"     "lfsa_egad"      "lfsa_eppgai"   
#> [13] "lfsi_emp_a"     "lfsi_pt_a"      "lfsa_egised"    "earn_gr_gpgr2" 
#> [17] "lfsa_esgan"     "lfsa_esgan"     "lfsa_etgar"     "prc_ppp_ind"   
#> [21] "gov_10dd_edpt1" "nasa_10_ki"     "gov_10a_main"   "nama_10_lp_ulc"
#> [25] "nama_10_lp_ulc" "nama_10_lp_ulc" "earn_nt_net"    "spr_exp_sum"   
#> [29] "gov_10a_exp"    "ilc_pnp3"       "spr_exp_pens"   "spr_pns_ben"   
#> [33] "edat_lfse_14"   "trng_lfse_01"   "edat_lfse_03"   "gov_10a_exp"   
#> [37] "hlth_silc_08"   "gov_10a_exp"    "isoc_sk_dskl_i" "isoc_bde15b_h" 
#> [41] "isoc_ci_im_i"   "isoc_ec_ibuy"   "isoc_ci_it_en2" "rd_p_perslf"   
#> [45] "rd_e_gerdtot"   "isoc_ciegi_ac"  "gov_10a_exp"    "lfsa_eegan2"   
#> [49] "prc_hicp_aind"  "ilc_di12"       "ilc_di11"       "lfst_r_lmder"  
#> [53] "demo_mlexpec"   "hlth_hlye"      "y16_deprindex"  "hlth_dm060"

Similarly to the locally accessible Eurofound datasets (Section 1), for the data considered in this Section, the sets of countries (MS) are available in the convergEU_glb() function:

names(convergEU_glb())
#>  [1] "EUcodes"         "Eurozone"        "EA"              "EU12"           
#>  [5] "EU15"            "EU19"            "EU25"            "EU27_2007"      
#>  [9] "EU27_2019"       "EU27_2020"       "EU27"            "EU28"           
#> [13] "geoRefEUF"       "metaEUStat"      "tmpl_out"        "paralintags"    
#> [17] "rounDigits"      "epsilonV"        "scoreBoaTB"      "labels_clusters"

where, for example, the cluster EU27_2020 of MS is:

convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#> 
#> $memberStates
#> # A tibble: 27 x 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Belgium     BE    
#>  2 Denmark     DK    
#>  3 France      FR    
#>  4 Germany     DE    
#>  5 Greece      EL    
#>  6 Ireland     IE    
#>  7 Italy       IT    
#>  8 Luxembourg  LU    
#>  9 Netherlands NL    
#> 10 Portugal    PT    
#> # … with 17 more rows

The set of countries for which a given indicator is downloaded should be specified in the argument countries of the function download_indicator_EUS. It is up to the user to take one of the sets of countries available in the convergEU_glb() function or, alternatively, to build a custom vector of country codes.
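
When a custom vector of country codes is built by hand, a quick check against a known cluster can catch typos. The sketch below is a base R illustration: the EU15 codes are copied from the output printed earlier in this tutorial, while my_countries and unknown are hypothetical names and ‘GR’ is a deliberate mistake (the code used for Greece in this tutorial is ‘EL’):

```r
# EU15 codes as printed earlier in this tutorial.
eu15 <- c("BE", "DK", "FR", "DE", "EL", "IE", "IT", "LU",
          "NL", "PT", "ES", "UK", "AT", "FI", "SE")

# Custom selection; 'GR' is a deliberate mistake.
my_countries <- c("IT", "FR", "DE", "ES", "GR")

# Codes that do not belong to the reference cluster.
unknown <- setdiff(my_countries, eu15)
unknown

# Keep only the valid codes, in the order chosen by the user.
my_countries <- intersect(my_countries, eu15)
my_countries
```

The cleaned vector could then be passed to the argument countries.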

Once we have obtained the indicator code for the Employment rate indicator (i.e. lfsa_ergaed) and we have chosen the time interval (say from 2005 to 2015) and the set of countries (say the cluster EU27_2020 of MS), the data are downloaded from the Eurostat repository by invoking the download_indicator_EUS function as follows:

emTB <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=T)
emTB

Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values, rawDump=F should be specified:

emprTB <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F)
emprTB

which returns the data in the format years by countries.

The result is a list with the following components:

It is therefore possible to call the same function several times and specify the argument uniqueIdentif as an integer among those in the leftmost column of $msg$Further_Conditioning$available_seleTagLs, in order to obtain the same indicator under different scales and contexts. That is, the user could download the data and display the values available for the uniqueIdentif argument by invoking:

emprTB$msg$Further_Conditioning$available_seleTagLs

Then, the user can choose the scales and contexts of interest.
For example, uniqueIdentif=3:

emprTB1 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=3)

or uniqueIdentif=5:

emprTB2 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=5)

It is also possible to download data only for females or only for males by explicitly specifying it in the argument gender, for example:

emprTB3 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  gender='F',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=1)

emprTB4 <- download_indicator_EUS(
  indicator_code= "lfsa_ergaed",
  fromTime = 2005,
  toTime = 2015,
  gender="M",
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=1)

The data could also be downloaded for a specific age interval; to this end, it is useful to inspect the age intervals available for the emprTB data:

unique(emprTB$msg$Conditioning$ageInterv)
#> [1] Y15-19
#> 32 Levels: Y15-19 Y15-24 Y15-39 Y15-59 Y15-64 Y15-74 Y20-24 Y20-29 ... Y70-74
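
The output above suggests that ageInterv is stored as a factor: the printed value is the interval currently in use, while the levels list the admissible ones. The following minimal sketch, built on a toy factor with illustrative labels (not on the downloaded object), shows how the two can be separated in base R:

```r
# Toy factor mimicking the structure of $msg$Conditioning$ageInterv:
# one selected value, several admissible levels (labels are illustrative).
ageInterv <- factor("Y15-19",
                    levels = c("Y15-19", "Y15-24", "Y20-29", "Y15-64"))

selected  <- as.character(unique(ageInterv))  # the interval actually in use
available <- levels(ageInterv)                # every interval that can be requested

# Check that a candidate interval is admissible before requesting it.
is_admissible <- "Y20-29" %in% available
```

Applied to the real object, levels() would return the complete set of labels rather than the truncated print shown above.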

For example, to download data by considering the age interval Y20-29 and time interval from 1995 to 2015:

emprTB5 <- download_indicator_EUS(
  indicator_code= 'lfsa_ergaed',
  fromTime = 1995,
  toTime = 2015,
  gender='T',
  ageInterv = 'Y20-29',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=1)

or similarly for the age interval Y15-64:

emprTB6 <- download_indicator_EUS(
  indicator_code= 'lfsa_ergaed',
  fromTime = 2005,
  toTime = 2015,
  gender='T',
  ageInterv = 'Y15-64',
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS,
  rawDump=F,uniqueIdentif=1)
emprTB6

Note that the ageInterv argument is specified as a string.

It is also convenient to assign a meaningful name to the downloaded data for the Employment rate indicator. Among the datasets obtained above, we choose emprTB5, which relates to both males and females in the age interval “Y20-29”:

empTBFin <- emprTB5$res

2.2 Imputation of missing values

As already stated above, the analysis of convergence is performed on clean and imputed data without missing values, that is, on a tidy dataset in the format years by countries (see Section 1.1). In this subsection, we describe in detail how to impute missing values within the convergEU package. It must be emphasized that typical EU indicators are measured over only a few years, so even 10% of missing values within a country may already call for substantive reasoning before results obtained after imputation are interpreted. Depending on the context, the user may or may not decide to go ahead with the analysis.

Note that, as explained in the previous Section, the imputation of missing values in the convergEU package requires at least two non-missing values for a given country. If the values for a given country are completely missing, no imputation is performed.
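
Before imputing, it may help to count the observed values of each country and the share of missing ones. The sketch below uses base R on a toy dataset with hypothetical country labels (AA, BB, CC) and illustrative values; the 10% threshold is the one mentioned above, used here only as a flag for closer inspection:

```r
# Toy dataset in the format years by countries (values are illustrative).
toy <- data.frame(
  time = 2000:2004,
  AA = c(1.0, NA, 3.0, 4.0, 5.0),   # one internal missing value
  BB = c(NA, NA, NA, NA, 2.5),      # a single observed value
  CC = c(NA, NA, NA, NA, NA)        # completely missing
)
countries <- setdiff(names(toy), "time")

n_obs     <- colSums(!is.na(toy[countries]))   # observed values per country
prop_miss <- colMeans(is.na(toy[countries]))   # share of missing values

# Countries with fewer than two observed values cannot be imputed.
not_imputable <- names(n_obs)[n_obs < 2]
# Countries above the 10% threshold deserve a closer look.
to_review <- names(prop_miss)[prop_miss > 0.10]
```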

The final downloaded dataset for the Employment rate indicator is empTBFin:

empTBFin
#> # A tibble: 21 x 30
#>    age   sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-… T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  NA    73.5
#>  2 Y20-… T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#>  3 Y20-… T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#>  4 Y20-… T      1998  53.8  70.8  51.4  NA    63.7  NA    53.9  NA    71.5  79.3
#>  5 Y20-… T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#>  6 Y20-… T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#>  7 Y20-… T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>  8 Y20-… T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2  79.6
#>  9 Y20-… T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2  76.8
#> 10 Y20-… T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6  76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> #   SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> #   MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>

Inspection of the print output reveals that missing values are present. Alternatively, the check_data() function may be called to check for the presence of unsuitable features that must be resolved before starting the analysis (e.g. missing values, string variables).

check_data(empTBFin, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

which confirms the presence of missing values. Thus, before the analysis of convergence, the missing values should be imputed through the impute_dataset function:

help(impute_dataset)

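The output of the impute_dataset function is a list with the usual three components: “$res”, “$msg” and “$err”. For starting or final missing values, the impute_dataset function provides two options for imputation: “cut”, where the initial or final years in which missing values occur are removed from the dataset for all countries, and “constant”, where starting or final missing values are set equal to the nearest observed value of the same country.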

For all other missing values within the dataset, deterministic linear imputation is applied in order to obtain an imputed dataset. The examples below help to explain the imputation of missing values with the options “cut” and “constant”. First, let us impute the missing values of the empTBFin dataset with the option “constant”:

EmpImp <- impute_dataset(empTBFin, timeName = "time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         tailMiss = c("cut", "constant")[2],
                         headMiss = c("cut", "constant")[2])$res 

EmpImp
#> # A tibble: 21 x 30
#>    age   sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-… T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  65.7  73.5
#>  2 Y20-… T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#>  3 Y20-… T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#>  4 Y20-… T      1998  53.8  70.8  51.4  58.3  63.7  62.2  53.9  71.4  71.5  79.3
#>  5 Y20-… T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#>  6 Y20-… T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#>  7 Y20-… T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>  8 Y20-… T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2  79.6
#>  9 Y20-… T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2  76.8
#> 10 Y20-… T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6  76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> #   SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> #   MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>

As an example, consider the Netherlands (“NL”), which has a starting missing value for the year 1995:

View(empTBFin)
print(EmpImp,n=7,width=250)
#> # A tibble: 21 x 30
#>   age    sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT
#>   <fct>  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-29 T      1995  55.5  57.6  53.9  56.5  61.6  56.6  53.8  70.2  65.7  73.5
#> 2 Y20-29 T      1996  55    67.5  51.1  54.9  61.6  56.3  52.3  69.2  65.7  73.9
#> 3 Y20-29 T      1997  51.3  67.4  48.1  54.9  62.4  58.5  51.9  72.3  69.4  75  
#> 4 Y20-29 T      1998  53.8  70.8  51.4  58.3  63.7  62.2  53.9  71.4  71.5  79.3
#> 5 Y20-29 T      1999  56    69.7  51.3  61.7  63.4  65.9  54    70.5  71.6  76.8
#> 6 Y20-29 T      2000  57.6  71.3  52.4  61.3  63.2  66.2  53.6  75.4  72.7  78.5
#> 7 Y20-29 T      2001  50.1  70.9  52.5  59.3  63    66.3  54.6  66.6  75.8  80.1
#>      ES    AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL    SK
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  53.4  68.6  44.6  53.1  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 2  53.4  67    47.2  49.2  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 3  57.3  62.3  44.3  49.8  74.6  45.2  49.4  42.8  53.3  53.9  72.2  41.5  24.5
#> 4  59.6  65.1  54.6  46.9  74.6  45.2  52.2  42.7  53.3  53.9  72.2  40.4  24.5
#> 5  65.2  65.1  57.9  50.8  74.6  40.1  51.7  43.6  54.5  50.9  72.2  32.7  22.3
#> 6  67.3  63.2  58.4  54.1  74.1  38.9  46.7  44    50    46    72.2  36    19  
#> 7  68.4  61.5  56.3  60.2  77.5  41.6  53.7  45.1  55.5  44.3  73.6  35.6  19  
#>      SI    BG    RO    HR
#>   <dbl> <dbl> <dbl> <dbl>
#> 1  70    32.1  60.9  52.8
#> 2  70    32.1  60.9  52.8
#> 3  67.5  32.1  60.9  52.8
#> 4  62.3  32.1  61.3  52.8
#> 5  55.6  32.1  64.3  52.8
#> 6  54    32.1  65.1  52.8
#> 7  56.4  31.9  65.7  52.8
#> # … with 14 more rows

Note that in the print output above, which shows the first seven years, the starting missing value for the Netherlands in 1995 is imputed with the value of its first observed year (i.e. 1996). Similarly, the two starting missing values for Romania (1995 and 1996) are imputed with the value of 1997, its first observed year. The same holds for the other MS with starting missing values (“CY”, “CZ”, “BG”, etc.).
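
The behaviour described above can be mimicked in base R. The sketch below is not the impute_dataset implementation: it relies on stats::approx with rule = 2, which interpolates linearly between observed values and extends the nearest observed value at both ends. The helper fill_constant is hypothetical; the NL and DE series are copied from the printed empTBFin output for 1995 to 1999.

```r
# Series taken from the printed empTBFin output (years 1995-1999).
time <- 1995:1999
NL <- c(NA, 65.7, 69.4, 71.5, 71.6)   # starting missing value
DE <- c(56.5, 54.9, 54.9, NA, 61.7)   # internal missing value

# Hypothetical helper, not part of convergEU: linear interpolation inside
# the observed range, constant extension outside it (rule = 2).
fill_constant <- function(time, y) {
  ok <- !is.na(y)
  stats::approx(time[ok], y[ok], xout = time,
                method = "linear", rule = 2)$y
}

NL_imp <- fill_constant(time, NL)   # 1995 takes the value observed in 1996
DE_imp <- fill_constant(time, DE)   # 1998 is the midpoint of 1997 and 1999
```

With these inputs the imputed values for NL in 1995 and DE in 1998 agree with those printed in EmpImp (65.7 and 58.3). Note that stats::approx also needs at least two observed values to interpolate, in line with the requirement stated earlier.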

Now, we are going to impute the missing values with the option “cut”:


EmpImp1 <- impute_dataset(empTBFin, timeName = "time",
                             countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                             tailMiss = c("cut", "constant")[1],
                             headMiss = c("cut", "constant")[1])$res 
print(EmpImp1,n=35,width=250)
#> # A tibble: 14 x 30
#>    age    sex    time    BE    DK    FR    DE    EL    IE    IT    LU    NL
#>    <fct>  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y20-29 T      2002  51.9  68.6  51.8  57.3  64.5  62.5  56.7  68.3  74.2
#>  2 Y20-29 T      2003  50.4  64.2  54.5  55.2  66.6  61.4  56.6  62.5  71.2
#>  3 Y20-29 T      2004  49.2  66.8  53.5  51.5  64.4  59.4  59.9  59.2  73.6
#>  4 Y20-29 T      2005  51.1  68    52.1  52.1  66.4  60    58.3  59.4  67.2
#>  5 Y20-29 T      2006  51    70    52.9  54.2  68.2  62.4  58.2  57.3  68.6
#>  6 Y20-29 T      2007  51.2  70.6  53.7  55.2  69.2  62.8  57.3  58.4  70.2
#>  7 Y20-29 T      2008  49.8  70.1  53.6  57.5  68.3  56.8  56.1  56.8  70.7
#>  8 Y20-29 T      2009  47.4  67.2  50.8  57    66.4  45.7  52.8  54.4  69  
#>  9 Y20-29 T      2010  47.2  59.5  49.5  57.4  60.3  39.8  49    49.8  66.3
#> 10 Y20-29 T      2011  46.5  56.9  49.7  58.6  52.3  34.5  48.1  50    67.7
#> 11 Y20-29 T      2012  46.9  51.6  48.7  57    45.6  34.1  46.3  51.5  66.5
#> 12 Y20-29 T      2013  42.9  51.2  47.5  57.4  39.3  38.6  41.2  50    62.5
#> 13 Y20-29 T      2014  43.2  51.4  45.8  56.3  42.3  31.9  39.8  44    59.5
#> 14 Y20-29 T      2015  40.3  51.6  43    56.3  42.9  37.1  39.8  56.6  61.3
#>       PT    ES    AT    FI    SE    CY    CZ    EE    HU    LV    LT    MT    PL
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  79.6  68.4  59.8  56.6  62.9  79.3  39.2  48.9  44.9  51.8  52.9  74.9  30.8
#>  2  76.8  69.1  57.1  53.7  57.6  75.4  35.2  38    43.5  52.1  54.2  69.6  30.6
#>  3  76.1  69.6  56.1  52.1  53.4  78    33    55.3  40.9  57.1  53.3  70.1  30.3
#>  4  75.4  70.6  56.1  55.5  53.8  77.1  32.1  50.9  40.2  55.7  57.5  66.9  32.6
#>  5  75.9  72.3  57    54.7  55.2  79.9  34.8  61.2  39.7  61.3  53.9  70.6  34.9
#>  6  75.1  71.9  62.4  58.5  58.7  78.4  38.5  60.1  37.8  61.7  53.9  71.7  39.6
#>  7  75.6  65.7  59.7  60.5  56.9  74.9  38    65.2  37.1  58.4  49.4  72.9  42.9
#>  8  71.1  54.6  55.9  52.4  49.8  74.3  34.5  47.9  32.1  40.9  38.8  68.3  42.1
#>  9  66.4  52.3  55.3  49.4  48    77.1  34.5  43.4  31.2  43.9  24.5  68.7  37  
#> 10  64    49.5  59.3  46.9  49.1  73.1  31.6  53.8  30.8  45.3  28.3  69.9  35.9
#> 11  59.7  43.4  58    47.1  47.2  62    29.9  47.5  29.5  49.9  36.1  70.5  36.4
#> 12  55.1  41.4  54.5  43.9  44.1  58.7  36.4  55.5  31.7  49    36.1  72.1  34  
#> 13  57.3  42.6  54.6  42.2  44.8  63.7  37    57.5  40.7  48.9  42    70.4  34.1
#> 14  60.6  44.2  56.4  38    46.3  56.9  31.6  54.5  43.3  55.7  41.1  67.8  36.2
#>       SK    SI    BG    RO    HR
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  18.7  51.6  32.1  58.8  52.8
#>  2  19.9  51.1  32.2  57.9  48.1
#>  3  19.9  49.6  36.6  54.9  53.3
#>  4  15.1  48.5  33.1  53.4  51.8
#>  5  17.6  46.3  34.6  52.2  48.2
#>  6  19.5  54.8  39.1  51.8  47.9
#>  7  22.5  56.7  40.8  51.6  50.6
#>  8  19    50.7  36.1  49.2  42.7
#>  9  20    47.7  30.8  54.6  41.4
#> 10  23.2  44.3  25.8  50.6  44.7
#> 11  21.9  40.3  26    53    31.4
#> 12  24.2  38.6  26.4  53.4  26.6
#> 13  27.1  38.2  28.9  53.5  23.3
#> 14  30.7  43.6  29.7  54.7  28.7

The print output shows how the option “cut” actually handles starting and final missing values. More precisely, the rows for the years from 1995 to 2001 are now deleted for all countries. This is because Croatia has starting missing values from 1995 to 2001.
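
To make the effect of “cut” concrete, the following base R sketch reproduces the same idea on a toy dataset: only the years in which every country has already started, and none has yet ended, are kept. It is an illustration under these assumptions, not the package code; the country labels and values are hypothetical and the rows are assumed to be sorted by time.

```r
toy <- data.frame(
  time = 2000:2004,
  AA = c(1, 2, 3, 4, 5),
  BB = c(NA, NA, 3, 4, 5),   # starting missing values until 2001
  CC = c(1, 2, 3, 4, NA)     # final missing value in 2004
)
countries <- setdiff(names(toy), "time")

# Position of the first and last observed value of each country.
first_obs <- sapply(toy[countries], function(x) min(which(!is.na(x))))
last_obs  <- sapply(toy[countries], function(x) max(which(!is.na(x))))

# Keep only the rows between the latest start and the earliest end.
keep    <- seq(max(first_obs), min(last_obs))
toy_cut <- toy[keep, ]
```

Here a single country with late-starting data (BB) shortens the series of all the others, which is the same mechanism by which Croatia removes the years 1995 to 2001 above.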

For all other missing values within the dataset, deterministic linear imputation is applied. Note also that if a country is processed but has no missing values, none of its numerical values change. If more information is needed on the years in which missing values are located for a given country, say HR, the following simple code helps:

select(filter(empTBFin, is.na(HR)),time,HR) 
#> # A tibble: 7 x 2
#>    time    HR
#>   <dbl> <dbl>
#> 1  1995    NA
#> 2  1996    NA
#> 3  1997    NA
#> 4  1998    NA
#> 5  1999    NA
#> 6  2000    NA
#> 7  2001    NA

or by using pipes:

empTBFin %>% 
  filter(is.na(HR)) %>%
  select(time,HR)
#> # A tibble: 7 x 2
#>    time    HR
#>   <dbl> <dbl>
#> 1  1995    NA
#> 2  1996    NA
#> 3  1997    NA
#> 4  1998    NA
#> 5  1999    NA
#> 6  2000    NA
#> 7  2001    NA
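
When several countries have to be inspected, the same query can be run over all country columns at once. This is a minimal base R sketch on a toy dataset; the values are illustrative and only the layout (a time column followed by country columns) is taken from the tutorial:

```r
toy <- data.frame(
  time = 1995:1999,
  HR = c(NA, NA, NA, 50.0, 51.0),
  NL = c(NA, 65.7, 69.4, 71.5, 71.6),
  BE = c(55.5, 55.0, 51.3, 53.8, 56.0)
)
countries <- setdiff(names(toy), "time")

# For each country, the years in which the value is missing.
missing_years <- lapply(toy[countries], function(x) toy$time[is.na(x)])

# Drop the countries without missing values.
missing_years <- missing_years[lengths(missing_years) > 0]
```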

Once the missing values are imputed, the check_data function is called again on the imputed data:

check_data(EmpImp)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: qualitative variables  in the dataframe."

where the qualitative variables are age and sex; therefore, these two variables are removed as follows:

EmpImpF <- dplyr::select(EmpImp,-sex,-age)
check_data(EmpImpF)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Now, the check_data function confirms that EmpImpF is the final object on which the analysis of convergence should be performed.
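
The two problems met so far (missing values and qualitative variables) can also be screened with a few lines of base R. The helper quick_check below is hypothetical and much simpler than check_data(); it is shown only to make explicit what is being looked for. The DE and IT values are copied from the printed output for 2013 to 2015.

```r
toy <- data.frame(
  age  = factor("Y20-29"),
  sex  = factor("T"),
  time = 2013:2015,
  DE   = c(57.4, 56.3, 56.3),
  IT   = c(41.2, 39.8, 39.8)
)

# Hypothetical helper, not the check_data() implementation:
# it lists non-numeric columns and flags missing values.
quick_check <- function(df) {
  list(
    non_numeric = names(df)[!vapply(df, is.numeric, logical(1))],
    has_missing = anyNA(df)
  )
}

before <- quick_check(toy)
after  <- quick_check(toy[, setdiff(names(toy), before$non_numeric)])
```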

2.3 Analysis of convergence for the EmpImpF dataset

Once the dataset is clean and imputed, the analysis of convergence is performed through the functions discussed in the previous Section. The four measures of convergence are calculated for the EmpImpF dataset as follows:

require(ggplot2)
require(dplyr)
require(tibble)

EmpBC <- beta_conv(EmpImpF, 
                   time_0 = 2005, 
                   time_t = 2015, 
                   all_within = FALSE, 
                   timeName = "time")
empSC <- sigma_conv(EmpImpF,timeName="time")
gamma_conv(EmpImpF,ref=2005,last=2015,timeName="time")
gamma_conv_msteps(EmpImpF,startTime=2005,endTime=2015, timeName = "time")

and

delta_conv(EmpImpF,"time")
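
As a rough intuition for two of these measures, the sketch below computes a dispersion summary and a growth regression by hand for five countries, using the 2005 and 2015 values printed in the EmpImp1 output. It assumes a common textbook formulation (cross-country standard deviation for sigma, regression of average log growth on the log initial level for beta); the exact definitions used by sigma_conv and beta_conv should be checked in their help pages.

```r
# Values copied from the printed EmpImp1 output for 2005 and 2015.
emp <- data.frame(
  country = c("BE", "DK", "FR", "DE", "EL"),
  y2005   = c(51.1, 68.0, 52.1, 52.1, 66.4),
  y2015   = c(40.3, 51.6, 43.0, 56.3, 42.9)
)
tau <- 2015 - 2005

# Sigma-type summary: cross-country dispersion at each time.
# sd() is the sample standard deviation (divisor n - 1).
sd_2005 <- sd(emp$y2005)
sd_2015 <- sd(emp$y2015)

# Beta-type summary: average log growth regressed on the log initial level.
emp$growth <- (log(emp$y2015) - log(emp$y2005)) / tau
fit  <- lm(growth ~ log(y2005), data = emp)
beta <- unname(coef(fit)[2])
```

Under this formulation, a negative slope would indicate that countries starting from higher levels grew less, and a smaller dispersion at the end of the period would indicate reduced disparities.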

2.4 Country fiches

Below, we report the syntax to obtain Country fiches for the EmpImpF dataset:

go_ms_fi(
  workDF ='EmpImpF',
  countryRef ='DE',
  otherCountries = "c('IT','ES','FR')",
  time_0 = 2005,
  time_t = 2015,
  tName = 'time',
  indiType = 'highBest',
  aggregation= 'EU27_2020',
  x_angle=  45,
  dataNow=  Sys.time(),
  author = 'A.Student',
  outFile = 'Germany-up2-2016-EMP', 
  outDir = 'F:/analysis/EU27_2020',
  indiName= 'EmpImpF'
)
  

As noted previously, the data are specified as a string, e.g. workDF = 'EmpImpF'. The country of reference is Germany (“DE”), which is compared with the other countries “IT”, “ES” and “FR”, according to the highBest type of indicator.
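
In the call above, otherCountries is also passed as a string that contains R code. When the countries are already stored in a character vector, such a string can be assembled programmatically; the following base R sketch shows one way to do so (the object names are illustrative):

```r
others <- c("IT", "ES", "FR")

# Build the string "c('IT','ES','FR')" from the character vector.
others_string <- sprintf("c(%s)",
                         paste(sprintf("'%s'", others), collapse = ","))

# Parsing and evaluating the string gives back the original vector.
round_trip <- eval(parse(text = others_string))
```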

For advanced users: if the selection of countries differs from a standard aggregate (EU12, EU19, etc.), the option “custom” can be used.

2.5 Indicator fiches

Indicator fiches for the EmpImpF dataset are obtained through the following syntax:

go_indica_fi(
  time_0 = 2005,
  time_t = 2015,
  timeName = 'time',
  workDF = 'EmpImpF' ,
  indicaT = 'lfsa_ergaed',
  indiType = c('highBest','lowBest')[1],
  seleMeasure = 'all',
  seleAggre = 'EU27_2020',
  x_angle =  45,
  data_res_download =  FALSE,
  auth = 'A.Student',
  dataNow =  Sys.time(),
  outFile = 'EmpImp',
  outDir = 'C:/Users/UTENTE/Desktop/PROJECT EUROFOUND/FI'
)

where the indicator “lfsa_ergaed” is of type highBest, and all the measures are considered for the analysis of convergence.



3 Downloadable data from the Eurostat web site

In this Section, we describe in detail the procedure to download data from the Eurostat web site (https://ec.europa.eu/eurostat/data/database). The function down_lo_EUS of the convergEU package is specifically implemented for this purpose. In what follows, the procedure is described taking as an example the indicator “hlth_silc_18”, which relates to Self-perceived health by sex, age and degree of urbanisation. For further details on this indicator see: https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=hlth_silc_18&lang=en.

3.1 Data preparation step: indicator’s downloading from the Eurostat web site

The data are downloaded on the fly from the Eurostat website (https://ec.europa.eu/eurostat/data/database), which requires an active Internet connection.
Besides age and gender, a list of covariates is sometimes present for an indicator, thus they must be removed to produce a tidy dataset in the format time by countries. A given indicator is downloaded from the Eurostat web site through the function down_lo_EUS:

help(down_lo_EUS)

IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.

To download the data, a time interval should also be chosen. The function also allows the user to select gender, age and other variables, depending on the specific indicator. Note that when the argument rawDump is TRUE, raw data are downloaded, which are not a tidy dataset; to obtain a tidy dataset in the format years by countries, rawDump=F should be specified. Note also that all indicators are supported and that, after downloading, the data are not filtered by member country (geo) and/or EU cluster. Therefore, it is up to the user to proceed with further filtering/selection so that the desired collection of countries is obtained.
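
Since the downloaded data are not filtered by country, the columns of interest have to be selected afterwards. The sketch below illustrates this step in base R on a toy data frame whose layout mimics a $res component; the values are copied from the hlth_silc_18 output for 2016 to 2018, and the vector wanted is illustrative (in practice it could be, for example, convergEU_glb()$EU27_2020$memberStates$codeMS).

```r
res <- data.frame(
  time = 2016:2018,
  AT = c(0.9, 2.6, 1.7),
  BE = c(2.0, 1.5, 1.3),
  CH = c(1.6, 1.6, 0.5),
  EA = c(1.3, 0.7, 1.3)
)

wanted <- c("AT", "BE", "IT")   # IT is deliberately absent from the toy data

found         <- intersect(wanted, names(res))   # codes that can be selected
missing_codes <- setdiff(wanted, names(res))     # codes to investigate

# Keep the time column and the requested countries only.
res_sel <- res[, c("time", found)]
```

Checking missing_codes before going further avoids silently analysing fewer countries than intended.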

In order to download the data for a given indicator, its indicator code is required, and it is available on the Eurostat web site next to each indicator’s name within brackets. For example, the Self-perceived health indicator code is hlth_silc_18, as illustrated in the following Figure by the red arrow.

Figure 1: Indicator code on the Eurostat web site

We choose to download the data for this indicator from 2003 to 2018 by invoking the down_lo_EUS function:

myTBhea <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump=T,
  uniqueIdentif = 1)
myTBhea
#> # A tibble: 720,216 x 8
#>    deg_urb levels sex   age    unit  geo    time values
#>    <fct>   <fct>  <fct> <fct>  <fct> <fct> <dbl>  <dbl>
#>  1 DEG1    BAD    F     Y16-24 PC    AT     2018    2.9
#>  2 DEG1    BAD    F     Y16-24 PC    BE     2018    1.1
#>  3 DEG1    BAD    F     Y16-24 PC    BG     2018    0  
#>  4 DEG1    BAD    F     Y16-24 PC    CH     2018    1.1
#>  5 DEG1    BAD    F     Y16-24 PC    CY     2018    0.2
#>  6 DEG1    BAD    F     Y16-24 PC    CZ     2018    0  
#>  7 DEG1    BAD    F     Y16-24 PC    DE     2018    3.4
#>  8 DEG1    BAD    F     Y16-24 PC    DK     2018    1.1
#>  9 DEG1    BAD    F     Y16-24 PC    EA     2018    1.3
#> 10 DEG1    BAD    F     Y16-24 PC    EA18   2018    1.3
#> # … with 720,206 more rows
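
The raw dump is in long format, with one row per combination of covariates, country and time. To clarify what a tidy dataset in the format years by countries means, the sketch below fixes one combination of covariates and reshapes a toy long table with base R. It is only an illustration of the idea, not the procedure used internally by down_lo_EUS; the column names follow the raw output above, while the values are illustrative.

```r
raw <- data.frame(
  deg_urb = "DEG1",
  levels  = c("BAD", "BAD", "BAD", "BAD", "GOOD", "GOOD"),
  sex     = "T",
  age     = "Y16-24",
  unit    = "PC",
  geo     = c("AT", "BE", "AT", "BE", "AT", "BE"),
  time    = c(2017, 2017, 2018, 2018, 2018, 2018),
  values  = c(2.6, 1.5, 1.7, 1.3, 60, 61),
  stringsAsFactors = FALSE
)

# Fix one combination of covariates...
sel <- subset(raw, deg_urb == "DEG1" & levels == "BAD" &
                   sex == "T" & age == "Y16-24")

# ...then spread countries over columns, one row per year.
wide <- reshape(sel[, c("time", "geo", "values")],
                idvar = "time", timevar = "geo", direction = "wide")
names(wide) <- sub("^values\\.", "", names(wide))
wide <- wide[order(wide$time), ]
```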

Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values rawDump=F should be specified:

myTBheal <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump=F,
  uniqueIdentif = 1)
myTBheal
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y16-… T      2003   1.7   1.6  NA    NA    NA    NA    NA     0    NA    NA  
#>  2 Y16-… T      2004   1.3   1.2  NA    NA    NA    NA    NA     1.6  NA    NA  
#>  3 Y16-… T      2005   1.1   1.7   1    NA     0.4   0.1   0.9   2.8  NA     1.1
#>  4 Y16-… T      2006   0.9   1.6   1    NA     0.4   0.3   1     1.3  NA     1  
#>  5 Y16-… T      2007   0.7   1.8   1.1   1.6   0.4   0.9   1.2   1.6  NA     1.1
#>  6 Y16-… T      2008   2.5   1.7   0.8   1.1   0.2   1.1   0.8   1.9   1.1   1.1
#>  7 Y16-… T      2009   1.6   1.1   0.6   0.3   0     1.5   0.9   2.6   0.9   0.9
#>  8 Y16-… T      2010   2.4   1.3   0.8   0.3   0.2   0.4   1.4   2.9   1.2   1.2
#>  9 Y16-… T      2011   2.4   1.5   0.7   0.6   0.8   1.6   1     1.7   1.1   1.1
#> 10 Y16-… T      2012   2.3   1.8   0     0.5   1.2   0     1     1.8   1.1   1.1
#> 11 Y16-… T      2013   1.7   1.1   0.5   0.5   1     0     0.8   0.3   1.3   1.3
#> 12 Y16-… T      2014   2.7   2.4   0.9   1.3   0.7   1     0.7   1     1.1   1.1
#> 13 Y16-… T      2015   0.6   1.6   0.4   0.5   0.1   2.1   1.9   1.5   1.4   1.4
#> 14 Y16-… T      2016   0.9   2     0.7   1.6   0.6   1.3   1     0.8   1.3   1.3
#> 15 Y16-… T      2017   2.6   1.5   0.7   1.6   0.2   2.4   1     2.6   0.7   0.7
#> 16 Y16-… T      2018   1.7   1.3   0.1   0.5   0.2   0.5   2.6   0.7   1.3   1.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Age
#> [1] "Age automatically set."
#> 
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] Y16-24
#> 17 Levels: Y16-24 Y16-29 Y16-44 Y16-64 Y25-29 Y25-34 Y35-44 Y45-54 ... TOTAL
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

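The result is a list with three components: $res, which contains the dataset in the format years by countries; $msg, which reports the conditioning applied ($msg$Conditioning) and the available combinations of further conditioning variables ($msg$Further_Conditioning); and $err, which contains an error message, or NULL when no error occurred.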

Similarly to the download_indicator_EUS function, it is therefore possible to call the same function several times, specifying the argument uniqueIdentif as one of the integers in the first column of $msg$Further_Conditioning$available_seleTagLs, to obtain the same indicator under different scales and contexts. For example, for degree 1 of urbanisation and level good (uniqueIdentif = 4), and for degree 3 of urbanisation and level fair (uniqueIdentif = 17), respectively:

myTBheal1 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump=F,
  uniqueIdentif = 4)

myTBheal2 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "T",
  ageInterv = NA,
  rawDump=F,
  uniqueIdentif = 17)
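
Rather than reading the identifier off the printed table, it can be looked up by matching the tags. The sketch below works on a toy copy of a few rows of available_seleTagLs (identifiers and tags as printed above) and uses fixed-string matching; including the "levels: " prefix in the pattern prevents GOOD from also matching VGOOD.

```r
# A few rows copied from the printed available_seleTagLs table.
tags <- data.frame(
  uniqueIdentif = c(1, 4, 17, 25),
  tags = c("deg_urb: DEG1; levels: BAD; unit: PC",
           "deg_urb: DEG1; levels: GOOD; unit: PC",
           "deg_urb: DEG3; levels: FAIR; unit: PC",
           "deg_urb: TOTAL; levels: GOOD; unit: PC"),
  stringsAsFactors = FALSE
)

# Rows matching both the degree of urbanisation and the level.
hit <- grepl("deg_urb: DEG3;", tags$tags, fixed = TRUE) &
       grepl("levels: FAIR;",  tags$tags, fixed = TRUE)

id <- tags$uniqueIdentif[hit]
```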

It is also possible to condition on gender in order to download data only for females or only for males, for example:

myTBheal3 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "F",
  ageInterv = NA,
  rawDump=F,
  uniqueIdentif = 1)


myTBheal4 <- down_lo_EUS(
  indicator_code = "hlth_silc_18",
  fromTime = 2003,
  toTime = 2018,
  gender=  "M",
  ageInterv = NA,
  rawDump=F,
  uniqueIdentif = 1)

The user could also download data for a specific age interval; to see the available age intervals for the myTBheal data:

unique(myTBheal$msg$Conditioning$ageInterv)

Thus, to download data for, say, the age interval Y35-44 and the time interval 2003-2018:

myTBheal5 <- down_lo_EUS(
  indicator_code = 'hlth_silc_18',
  fromTime = 2003,
  toTime = 2018,
  gender=  'T',
  ageInterv = 'Y35-44',
  rawDump=F,
  uniqueIdentif = 1)
myTBheal5
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y35-… T      2003   3.1   4.5  NA    NA    NA    NA    NA     4    NA    NA  
#>  2 Y35-… T      2004   5.5   4.4  NA    NA    NA    NA    NA     5.3  NA    NA  
#>  3 Y35-… T      2005   4.2   4.9   4.7  NA     2     5.6   4.9   5.4  NA     4  
#>  4 Y35-… T      2006   2.6   5.3   4.7  NA     2.7   4.4   4.4   3.9  NA     4.1
#>  5 Y35-… T      2007   4     5.2   3.1   3.1   2.8   4.6   3.8   6.9  NA     3.5
#>  6 Y35-… T      2008   4.6   5.5   4.1   2.9   2.3   3.4   2.8   6.2   3     3  
#>  7 Y35-… T      2009   4.6   6.6   1.9   2.1   2.1   4     3.4   6.8   3.4   3.4
#>  8 Y35-… T      2010   3.4   5.8   1.6   2.4   1.6   2.7   3.6   2.8   3.2   3.2
#>  9 Y35-… T      2011   5.6   6.5   1.2   1.8   1.3   3.1   3.4   5.5   3.4   3.4
#> 10 Y35-… T      2012   7.3   6.8   0.9   1.6   1.8   3     4.5   4.3   3.7   3.7
#> 11 Y35-… T      2013   5.4   6.9   1     2.4   0.9   3.6   4.5   3.3   3.4   3.4
#> 12 Y35-… T      2014   5.7   6.9   1.6   2.3   2.5   2     5.7   5.5   3.5   3.5
#> 13 Y35-… T      2015   5.6   8.9   1.3   0.9   2.4   2     4.9   1.9   3.2   3.2
#> 14 Y35-… T      2016   2.9   7.9   1.5   6.3   1.3   2.7   4.3   3.6   3.1   3.1
#> 15 Y35-… T      2017   4.4   6.2   1.8   2.3   1.5   1.8   2.9   5.6   2.5   2.5
#> 16 Y35-… T      2018   4     5.4   2     4.2   2.1   1.9   3.5   2.5   2.9   2.9
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] "Y35-44"
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

or similarly for the age interval Y55-64:

myTBheal6 <- down_lo_EUS(
  indicator_code = 'hlth_silc_18',
  fromTime = 2003,
  toTime = 2018,
  gender=  'T',
  ageInterv = 'Y55-64',
  rawDump=F,
  uniqueIdentif = 1)
myTBheal6
#> $res
#> # A tibble: 16 x 43
#>    age   sex    time    AT    BE    BG    CH    CY    CZ    DE    DK    EA  EA18
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Y55-… T      2003   4.8  11.1  NA    NA    NA    NA    NA     9.9  NA    NA  
#>  2 Y55-… T      2004   9.2   9.2  NA    NA    NA    NA    NA    10    NA    NA  
#>  3 Y55-… T      2005   8.5   9.6  18.8  NA     8.7  13.8  13    10.9  NA    11.7
#>  4 Y55-… T      2006   7.8  10.3  18.8  NA    11    13.4  12.2  10.2  NA    11.3
#>  5 Y55-… T      2007  12.6  10.3  16     4.5  10.4  14.7  11     8.8  NA    11.2
#>  6 Y55-… T      2008  10     8.5  16.4   4.1  10.6  12.8  10.6   9.2   9.9  10  
#>  7 Y55-… T      2009  10.1  10.1  13.2   4.4  12.5  11.8  10.9   4.6   9.6   9.6
#>  8 Y55-… T      2010  13.1   9.3  11.4   5.1  10.1  13.5  10     6.7   8.9   9  
#>  9 Y55-… T      2011  12.4   9.6  10.2   4.2   8.5  11.9   9.3   7.9   9.3   9.3
#> 10 Y55-… T      2012  12.5  11.4  10.7   6.5   7.8  11.6  10.6  11.5   9.6   9.6
#> 11 Y55-… T      2013  10.7  11     9.2   6.1  10.7  12     9.8  11.4   9.5   9.6
#> 12 Y55-… T      2014  10    14.5   7.5   3.8   7.8  10.8  11.8   9.8  10     9.9
#> 13 Y55-… T      2015  14.4  12.2   8.1   7.3   6.2   9.9  12.6  10.9  10.3  10.2
#> 14 Y55-… T      2016  10.4  12     7.4   5.7   4.7  12.8  12.2   7.6   8.8   8.8
#> 15 Y55-… T      2017  11.1  10.2   8     5.7   4.6  13.5   9.6   9.9   7.5   7.5
#> 16 Y55-… T      2018   9.8   7.8   7.9   7.9   3.7   9.6  12     9.6   8.3   8.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> #   EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> #   HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> #   MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#> 
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#> 
#> $msg$Further_Conditioning$available_seleTagLs
#>    uniqueIdentif                                    tags
#> 1              1    deg_urb: DEG1; levels: BAD; unit: PC
#> 2              2   deg_urb: DEG1; levels: B_VB; unit: PC
#> 3              3   deg_urb: DEG1; levels: FAIR; unit: PC
#> 4              4   deg_urb: DEG1; levels: GOOD; unit: PC
#> 5              5   deg_urb: DEG1; levels: VBAD; unit: PC
#> 6              6  deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7              7   deg_urb: DEG1; levels: VG_G; unit: PC
#> 8              8    deg_urb: DEG2; levels: BAD; unit: PC
#> 9              9   deg_urb: DEG2; levels: B_VB; unit: PC
#> 10            10   deg_urb: DEG2; levels: FAIR; unit: PC
#> 11            11   deg_urb: DEG2; levels: GOOD; unit: PC
#> 12            12   deg_urb: DEG2; levels: VBAD; unit: PC
#> 13            13  deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14            14   deg_urb: DEG2; levels: VG_G; unit: PC
#> 15            15    deg_urb: DEG3; levels: BAD; unit: PC
#> 16            16   deg_urb: DEG3; levels: B_VB; unit: PC
#> 17            17   deg_urb: DEG3; levels: FAIR; unit: PC
#> 18            18   deg_urb: DEG3; levels: GOOD; unit: PC
#> 19            19   deg_urb: DEG3; levels: VBAD; unit: PC
#> 20            20  deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21            21   deg_urb: DEG3; levels: VG_G; unit: PC
#> 22            22   deg_urb: TOTAL; levels: BAD; unit: PC
#> 23            23  deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24            24  deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25            25  deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26            26  deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27            27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28            28  deg_urb: TOTAL; levels: VG_G; unit: PC
#> 
#> 
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#> 
#> $msg$Conditioning$ageInterv
#> [1] "Y55-64"
#> 
#> $msg$Conditioning$gender
#> [1] "T"
#> 
#> 
#> 
#> $err
#> NULL

Note that the age interval is specified in the ageInterv argument as a string.
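The printed result is a list with three components: res (the data), msg (the conditioning information, including the table of available tags) and err. The sketch below shows one possible way to work with such an object; mockOut is a hypothetical stand-in built by hand from a small excerpt of the output above, so that the code runs without an Internet connection.

```r
# Mock object with the same top-level components as the printed result
# ($res, $msg, $err); the values are a small excerpt of the output above
mockOut <- list(
  res = data.frame(time = 2017:2018, AT = c(11.1, 9.8), BE = c(10.2, 7.8)),
  msg = list(
    Further_Conditioning = list(
      available_seleTagLs = data.frame(
        uniqueIdentif = c(1, 22),
        tags = c("deg_urb: DEG1; levels: BAD; unit: PC",
                 "deg_urb: TOTAL; levels: BAD; unit: PC"),
        stringsAsFactors = FALSE))),
  err = NULL
)

# Stop early if an error was reported, otherwise keep the tag table
if (!is.null(mockOut$err)) stop(mockOut$err)
tags <- mockOut$msg$Further_Conditioning$available_seleTagLs

# Look up the identifier of the series tagged TOTAL / BAD
idTotalBad <- tags$uniqueIdentif[tags$tags == "deg_urb: TOTAL; levels: BAD; unit: PC"]
idTotalBad
```

With the real object, the same lookup indicates which value of uniqueIdentif to pass to down_lo_EUS in order to obtain a series other than the first one.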

Once the data are downloaded, we assign them a meaningful name. Among the datasets obtained above, we choose myTBheal5, which refers to both males and females aged 35 to 44:

TBheal<-myTBheal5$res

It is worth remembering that the data are not filtered by member country or EU cluster. Therefore, some further data processing is needed if the user wishes to obtain the data for a given collection of countries only. For example, to select only the cluster EU27_2020 of MS:

EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
TBhealth<- dplyr::select(TBheal,time,EU27estr)
TBhealth
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

or by using pipes:

TBheal %>%
  as_tibble %>%
  select(time,EU27estr)
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

Similarly, the user can select the EU27_2020 cluster of MS together with other available countries, for example Serbia (coded RS):

TBhealthR<- select(TBheal,time,EU27estr,RS)
TBhealthR
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>

and by using pipes:

TBheal %>%
  as_tibble %>%
  select(time,EU27estr,RS)
#> # A tibble: 16 x 29
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4    NA    NA     1.6   2.2  NA     6.6  NA    NA    NA     3.1
#>  2  2004   4.4   5.3   3.9  NA     1.1   1.8   3.2   5.2  NA     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>
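When the vector of country codes comes from a different source than the data, some codes may be absent from the downloaded table. The following base R sketch is one way to guard against this; toyHeal and wanted are hypothetical objects, with values copied from the first columns of the output above.

```r
# Toy stand-in for TBheal: a time column plus three country columns
toyHeal <- data.frame(time = 2017:2018,
                      AT = c(4.4, 4.0),
                      BE = c(6.2, 5.4),
                      DK = c(5.6, 2.5))

# Hypothetical list of wanted codes; "XX" is deliberately absent from the data
wanted <- c("BE", "DK", "XX")

# Report the codes that cannot be selected
missing_codes <- setdiff(wanted, names(toyHeal))
if (length(missing_codes) > 0) {
  message("Codes not found in the data: ", paste(missing_codes, collapse = ", "))
}

# Keep time plus the wanted codes that exist, in the order given by `wanted`
toySel <- toyHeal[, c("time", intersect(wanted, names(toyHeal))), drop = FALSE]
toySel
```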

3.2 Imputation of missing values

Prior to the analysis of convergence, we have to impute the missing values in the data previously downloaded from the Eurostat website. The final downloaded dataset is TBhealth, in which the countries are the EU27_2020 cluster of MS. Inspection of the printed output in the previous Subsection reveals that missing values are present. Alternatively, we can invoke the check_data() function to check for the presence of unsuited features that must be resolved before starting the analysis (e.g. missing values, string variables).

check_data(TBhealth, timeName = "time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."
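The error message states that missing values are present, but not where. A minimal base R sketch for locating them is given below; toyTB is a hypothetical excerpt holding the first four years of three countries from the TBhealth output above.

```r
# Toy data in the years-by-countries format (values copied from the first
# rows of TBhealth printed above; only three countries are kept)
toyTB <- data.frame(
  time = 2003:2006,
  BE = c(4.5, 4.4, 4.9, 5.3),
  FR = c(NA, 3.9, 4.4, 4.9),
  DE = c(NA, NA, 4.9, 4.4)
)

# Number of missing values per country column
na_per_country <- colSums(is.na(toyTB[, names(toyTB) != "time"]))
na_per_country

# Years with at least one missing value
na_years <- toyTB$time[!complete.cases(toyTB)]
na_years
```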

Thus, we proceed to impute the missing values through the impute_dataset function. Missing values may occur both at the start of the series (from the oldest year onwards) and at its end (up to the last year); in the printed columns of TBhealth they appear in the earliest years. Therefore, the arguments tailMiss and headMiss should be specified according to how the user decides to impute the missing values. The example for the TBhealth dataset helps to illustrate this point. Suppose that missing values starting at the oldest year should be propagated (i.e. option constant in the tailMiss argument), while those ending at the last year should be deleted (i.e. option cut in the headMiss argument), that is:

TBhealthF <- impute_dataset(TBhealth, timeName = "time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         tailMiss = c("cut", "constant")[2],
                         headMiss = c("cut", "constant")[1])$res 

TBhealthF
#> # A tibble: 16 x 28
#>     time    BE    DK    FR    DE    EL    IE    IT    LU    NL    PT    ES    AT
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   4.5   4     3.9   4.9   1.6   2.2   3.2   6.6   3     5.5   3.5   3.1
#>  2  2004   4.4   5.3   3.9   4.9   1.1   1.8   3.2   5.2   3     5.5   3.5   5.5
#>  3  2005   4.9   5.4   4.4   4.9   1.6   2.8   2.6   3.5   3     7.1   3.5   4.2
#>  4  2006   5.3   3.9   4.9   4.4   2.3   0.9   3.1   3.2   4     5.2   4     2.6
#>  5  2007   5.2   6.9   3.1   3.8   1.6   3     3     1.9   3.4   5.8   3.5   4  
#>  6  2008   5.5   6.2   3.4   2.8   2.2   2.8   2.6   5.5   3.2   3.5   2.7   4.6
#>  7  2009   6.6   6.8   3.4   3.4   1.4   4     2.7   6.6   3.5   4.8   3.2   4.6
#>  8  2010   5.8   2.8   3.8   3.6   1.1   2.6   2.2   4.3   2.9   5.3   3     3.4
#>  9  2011   6.5   5.5   4.3   3.4   1.9   1.9   3.4   5.3   2     5     2.3   5.6
#> 10  2012   6.8   4.3   4.9   4.5   1.8   2.9   3.1   1.6   3.9   3.2   2.3   7.3
#> 11  2013   6.9   3.3   4.1   4.5   1.4   0.6   3.2   1.9   3.3   5     1.7   5.4
#> 12  2014   6.9   5.5   3.7   5.7   2.4   2.5   3.3   4.7   3.5   2.9   1.8   5.7
#> 13  2015   8.9   1.9   2.9   4.9   2     1.7   3.2   3.4   2.8   4     1.6   5.6
#> 14  2016   7.9   3.6   5     4.3   2.4   0.9   1.8   3.8   1.9   3.3   1.2   2.9
#> 15  2017   6.2   5.6   3.6   2.9   1.8   1.6   1     3.7   2.8   3.5   1.4   4.4
#> 16  2018   5.4   2.5   4.3   3.5   2.2   3.2   1.2   3.1   2.3   3.4   1.8   4  
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> #   HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> #   BG <dbl>, RO <dbl>, HR <dbl>

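As the output reveals, the missing values in the earliest years (between 2003 and 2009, depending on the country) have been filled by propagating the first observed value, for example FR in 2003 and DE in 2003 and 2004. The printed output still has 16 rows, from 2003 to 2018, so no year has been removed by the cut option here. TBhealthF is the final imputed dataset, on which the check_data function is called again: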

check_data(TBhealthF)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

and TBhealthF is the final object on which the analysis should be performed.
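To make the effect of the constant option on the earliest years concrete, the following sketch reproduces it on two short vectors. It only illustrates the behaviour visible in the output above; fill_leading_na() is a hypothetical helper and not the code used inside impute_dataset.

```r
# Illustration only: fill leading NAs with the first observed value.
# fill_leading_na() is a hypothetical helper, not a convergEU function.
fill_leading_na <- function(x) {
  first_obs <- which(!is.na(x))[1]
  if (is.na(first_obs)) return(x)   # all values missing: nothing to propagate
  if (first_obs > 1) x[seq_len(first_obs - 1)] <- x[first_obs]
  x
}

# Values of FR and DE for 2003 to 2006 in TBhealth, as printed above
fr <- c(NA, 3.9, 4.4, 4.9)
de <- c(NA, NA, 4.9, 4.4)

frFilled <- fill_leading_na(fr)
deFilled <- fill_leading_na(de)
frFilled
deFilled
```

The filled values agree with the first rows of the FR and DE columns of TBhealthF.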

3.3 Weighted average smoothing of a complete dataset

It may be of interest to assume that part of the variability observed in a country on a given index is not structural, i.e. not due to causal determinants but to transient fluctuations. Furthermore, the interest here is not in prediction but in smoothing the values observed over the whole time interval considered. In such a case, a smoothing procedure removes sudden large changes, producing a less variable time series than the original. Given that short time series (panel data) are considered here, a three-point weighted average is proposed in the convergEU package and implemented in the smoo_dataset function:

help(smoo_dataset)

It must be noted that the smoothing is performed on a dataset without missing values, e.g. on the imputed dataset. Note also that the time variable should be deleted when performing the smoothing with the smoo_dataset function. To better illustrate this point, let’s suppose that we want to perform smoothing for the TBhealthF dataset in which the missing values are already imputed. Furthermore, suppose that we choose the following three MS on which the smoothing is to be performed: Italy, France and Germany, coded “IT”, “FR” and “DE” respectively. Thus, we select the data with just these three countries and the time variable:

workTB <- dplyr::select(TBhealthF, time, IT,DE,FR)
check_data(workTB)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The check_data function confirms that the dataset is ready for smoothing. Thus, once the check is passed, we proceed with the smoothing step after removing the time variable:

resSM <- smoo_dataset(select(workTB,-time), leadW = 0.15, timeTB= select(workTB,time))
resSM
#> # A tibble: 16 x 4
#>     time    IT    DE    FR
#>    <dbl> <dbl> <dbl> <dbl>
#>  1  2003  3.2   4.9   3.9 
#>  2  2004  2.94  4.9   4.11
#>  3  2005  3.07  4.69  4.4 
#>  4  2006  2.84  4.36  3.92
#>  5  2007  2.87  3.63  3.99
#>  6  2008  2.81  3.48  3.27
#>  7  2009  2.44  3.23  3.57
#>  8  2010  2.92  3.43  3.84
#>  9  2011  2.76  3.95  4.34
#> 10  2012  3.27  4.03  4.30
#> 11  2013  3.20  5.01  4.27
#> 12  2014  3.22  4.85  3.53
#> 13  2015  2.65  4.98  4.13
#> 14  2016  2.06  3.96  3.51
#> 15  2017  1.42  3.75  4.49
#> 16  2018  1.03  2.99  3.70

Note that the first argument is the complete dataset workTB restricted to the country columns, while the third argument timeTB supplies the time variable, which is added back to the displayed output after smoothing. The argument leadW is the leading positive weight \(w\), where \(0< w \leq 1\). The special case \(w=1\) corresponds to no smoothing. In case of missing values, an NA is returned. If the weight is outside the interval \((0,1]\), then an NA is returned. In our example, the smoothing is first performed with a weight equal to 0.15. To compare the original indicator’s values with the smoothed ones, the following dataset is built:

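smooTB <- select(resSM, -time)
names(smooTB) <- paste0(names(smooTB), "1")  # explicit suffix, independent of bind_cols() name repair
compaTB <- select(bind_cols(workTB, smooTB), time, IT, IT1, DE, DE1, FR, FR1)
compaTB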
#> # A tibble: 16 x 7
#>     time    IT   IT1    DE   DE1    FR   FR1
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2003   3.2  3.2    4.9  4.9    3.9  3.9 
#>  2  2004   3.2  2.94   4.9  4.9    3.9  4.11
#>  3  2005   2.6  3.07   4.9  4.69   4.4  4.4 
#>  4  2006   3.1  2.84   4.4  4.36   4.9  3.92
#>  5  2007   3    2.87   3.8  3.63   3.1  3.99
#>  6  2008   2.6  2.81   2.8  3.48   3.4  3.27
#>  7  2009   2.7  2.44   3.4  3.23   3.4  3.57
#>  8  2010   2.2  2.92   3.6  3.43   3.8  3.84
#>  9  2011   3.4  2.76   3.4  3.95   4.3  4.34
#> 10  2012   3.1  3.27   4.5  4.03   4.9  4.30
#> 11  2013   3.2  3.20   4.5  5.01   4.1  4.27
#> 12  2014   3.3  3.22   5.7  4.85   3.7  3.53
#> 13  2015   3.2  2.65   4.9  4.98   2.9  4.13
#> 14  2016   1.8  2.06   4.3  3.96   5    3.51
#> 15  2017   1    1.42   2.9  3.75   3.6  4.49
#> 16  2018   1.2  1.03   3.5  2.99   4.3  3.70
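The smoothed columns above are reproduced, to the printed precision, by the weighting sketched below: an interior value receives weight \(w\) and its two neighbours \((1-w)/2\) each, while an end point receives \(w\) and its single neighbour \(1-w\). This rule is inferred from the printed numbers and is not taken from the package source; smooth3() is a hypothetical helper.

```r
# smooth3() is a hypothetical helper written for this illustration;
# it is not the source code of smoo_dataset
smooth3 <- function(x, w) {
  stopifnot(is.numeric(x), length(x) >= 2, !anyNA(x), w > 0, w <= 1)
  n <- length(x)
  out <- numeric(n)
  # End points have a single neighbour, which receives the whole weight 1 - w
  out[1] <- w * x[1] + (1 - w) * x[2]
  out[n] <- w * x[n] + (1 - w) * x[n - 1]
  # Interior points share the weight 1 - w equally between the two neighbours
  if (n > 2) {
    i <- 2:(n - 1)
    out[i] <- w * x[i] + (1 - w) / 2 * (x[i - 1] + x[i + 1])
  }
  out
}

# Original IT values from compaTB
it <- c(3.2, 3.2, 2.6, 3.1, 3, 2.6, 2.7, 2.2, 3.4, 3.1, 3.2, 3.3, 3.2, 1.8, 1, 1.2)
itSmooth <- smooth3(it, w = 0.15)
round(itSmooth, 2)

# With w = 1 the series is returned unchanged
all.equal(smooth3(it, w = 1), it)
```

For 2004, for instance, the rule gives 0.15 × 3.2 + 0.425 × (3.2 + 2.6) = 2.945, in line with the value 2.94 printed in the IT1 column.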

The following plots show the changes for Italy (“IT”), Germany (“DE”) and France (“FR”), with the original index in blue and the smoothed index in red:

require(gridExtra)
ITq<-qplot(time,IT,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=IT1),colour="red") +
  geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq<-qplot(time,DE,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=DE1),colour="red") +
  geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq<-qplot(time,FR,data=compaTB) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=FR1),colour="red") +
  geom_point(aes(x=time,y=FR1),colour="red",shape=8)

grid.arrange(ITq, DEq, FRq, nrow=3)

Note that the higher the weight, the less the indicator’s values are smoothed. For example, with \(w=0.5\):

require(gridExtra)
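resSM2 <- smoo_dataset(select(workTB,-time), leadW = 0.5, timeTB= select(workTB,time))
smooTB2 <- select(resSM2, -time)
names(smooTB2) <- paste0(names(smooTB2), "1")  # explicit suffix, independent of bind_cols() name repair
compaTB2 <- select(bind_cols(workTB, smooTB2), time, IT, IT1, DE, DE1, FR, FR1)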
ITq2<-qplot(time,IT,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=IT1),colour="red") +
  geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq2<-qplot(time,DE,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=DE1),colour="red") +
  geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq2<-qplot(time,FR,data=compaTB2) + 
  geom_line(colour="navyblue") +
  geom_line(aes(x=time,y=FR1),colour="red") +
  geom_point(aes(x=time,y=FR1),colour="red",shape=8)
grid.arrange(ITq2, DEq2, FRq2, nrow=3)
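The panels above repeat the same plotting recipe for each country. One possible way to avoid the repetition is a small helper function. In the sketch below, plot_smooth() is a hypothetical name, and ggplot() is used instead of qplot(), which is deprecated in recent ggplot2 releases; toyIT holds the first four rows of the IT and IT1 columns printed for \(w=0.15\).

```r
library(ggplot2)

# plot_smooth() is a hypothetical helper, not part of convergEU.
# `raw` and `smooth` are column names given as strings.
plot_smooth <- function(df, raw, smooth) {
  ggplot(df, aes(x = time)) +
    geom_point(aes(y = .data[[raw]])) +
    geom_line(aes(y = .data[[raw]]), colour = "navyblue") +
    geom_line(aes(y = .data[[smooth]]), colour = "red") +
    geom_point(aes(y = .data[[smooth]]), colour = "red", shape = 8) +
    ylab(raw)
}

# First rows of the original (IT) and smoothed (IT1) values for Italy
toyIT <- data.frame(time = 2003:2006,
                    IT  = c(3.2, 3.2, 2.6, 3.1),
                    IT1 = c(3.2, 2.94, 3.07, 2.84))
ITtoy <- plot_smooth(toyIT, "IT", "IT1")
```

The objects returned by such a helper can be passed to grid.arrange in the same way as the plots built above.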