The convergEU R package is a set of S3 functions and data objects suited for the analysis of economic and social convergence of Member States (MS) within the European Union (EU).
This vignette is intended to be a “learn-by-doing” tutorial with the convergEU suite of functions. More precisely, the tutorial illustrates in detail how to analyze convergence of MS with data coming from the following sources:
To better illustrate the points listed above, examples with several EU indicators are described in detail.
IMPORTANT: the analysis of convergence is performed on clean and imputed data, i.e. a tidy dataset in the format years by countries. This means that the dataset must always have these characteristics:
no qualitative variables;
no vectors of strings and no missing values;
a time variable must be present;
it must be a tibble (a plain dataframe will not work; further details are in the tutorial).
In the convergEU package, the check_data() function may be called to check for the presence of unsuitable features that must be resolved before starting the analysis. The returned object states whether the dataset is ready for calculations; if it is not, the error component states why the check failed:
Examples on how to transform data to achieve the required format are described in the tutorial.
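The requirements listed above can be approximated with a few base R checks. The following is a minimal sketch on made-up data: the helper looks_ready() is hypothetical and is not part of convergEU, whose own check_data() function should be preferred in practice. For self-containment it works on a plain data frame, so it does not test the tibble requirement.

```r
# Hypothetical helper, NOT part of convergEU: a rough pre-check of the
# "time by countries" layout described above.
looks_ready <- function(df, timeName = "time") {
  problems <- character(0)
  if (!timeName %in% names(df)) {
    problems <- c(problems, "no time variable")
  }
  if (!all(vapply(df, is.numeric, logical(1)))) {
    problems <- c(problems, "non-numeric (qualitative or string) columns")
  }
  if (any(vapply(df, anyNA, logical(1)))) {
    problems <- c(problems, "missing values")
  }
  list(ready = length(problems) == 0, problems = problems)
}

# Toy data: made-up values for invented country codes AA and BB.
ok_df <- data.frame(time = c(2001, 2002, 2003),
                    AA = c(1.0, 1.5, 2.0),
                    BB = c(3.0, 2.5, 2.0))
bad_df <- data.frame(time = c(2001, 2002, 2003),
                     sex = c("Total", "Total", "Total"),
                     AA = c(1.0, NA, 2.0))

ready_ok <- looks_ready(ok_df)
ready_bad <- looks_ready(bad_df)
print(ready_ok$ready)
print(ready_bad$problems)
```

The helper collects all problems instead of stopping at the first one, which may be convenient when several columns need fixing at once.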
Locally accessible Eurofound datasets are available from the convergEU package without any active Internet connection. In this Section, we describe in detail the analysis of convergence within the convergEU package by considering locally accessible Eurofound data, taking as an example the following two EU indicators:
Before describing in detail how to analyze convergence with locally accessible datasets, we briefly illustrate the datasets available in the convergEU package, which are as follows:
The object dbEUF2018meta contains a description (metadata) of the locally available data within the convergEU package, without any downloading from the Eurofound website. These data refer to two dimensions: quality of life and working conditions. For each dimension, metainformation is provided for several indicators, for example:
names(dbEUF2018meta)
#> [1] "DIMENSION" "SUBDIMENSION" "INDICATOR"
#> [4] "Code_in_database" "Official_code" "Unit"
#> [7] "Source_organisation" "Source_reference" "Disaggregation"
#> [10] "Bookmark_URL"
print(dbEUF2018meta, n=200,width=200)
#> # A tibble: 13 x 10
#> DIMENSION SUBDIMENSION
#> <chr> <chr>
#> 1 Quality of life Life satisfaction
#> 2 Quality of life Health
#> 3 Quality of life Health
#> 4 Quality of life Quality of society
#> 5 Quality of life Quality of society
#> 6 Quality of life Quality of society
#> 7 Quality of life Quality of society
#> 8 Quality of life Quality of society
#> 9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#> INDICATOR Code_in_database Official_code Unit Source_organisa…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Mean life satis… lifesatisf y16_q4 -- Eurofound
#> 2 Mean health sta… health y16_q48 -- Eurofound
#> 3 Percentage of p… goodhealth_p y16_q48 % Eurofound
#> 4 Mean level of t… trustlocal y16_q35f -- Eurofound
#> 5 Level of involv… volunt y16_q29a -- Eurofound
#> 6 Percentage of p… volunt_p y16_q29a % Eurofound
#> 7 Hours per week … caring_h y16_q43a hours Eurofound
#> 8 Social Exclusio… socialexc_i y16_socexindex index Eurofound
#> 9 JQI_Skills and … JQIskill_i wq_slim - JQI SLIM… index Eurofound
#> 10 JQI_Physical en… JQIenviron_i envsec_slim - JQI S… index Eurofound
#> 11 JQI_Intensity i… JQIintensity_i intens_slim - JQI S… index Eurofound
#> 12 JQI_Working tim… JQItime_i wlb_slim - JQI SLIM… index Eurofound
#> 13 Exposition to d… exposdiscr_p disc_d - Having be… % Eurofound
#> Source_reference Disaggregation Bookmark_URL
#> <chr> <chr> <chr>
#> 1 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 2 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 3 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 4 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 5 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 6 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 7 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 8 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 9 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
where the “Code_in_database” information is of particular importance, since it must be provided by the user when extracting the local data for a given indicator. The data can also be visualized in the following way:
The second locally accessible object in the convergEU package is dbEurofound which is a raw local Eurofound database accessed as follows:
data(dbEurofound)
head(dbEurofound)
#> # A tibble: 6 x 17
#> time geo geo_label sex lifesatisf health goodhealth_p trustlocal volunt
#> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1960 AD Andorra Fema… NA NA NA NA NA
#> 2 1960 AD Andorra Males NA NA NA NA NA
#> 3 1960 AD Andorra Total NA NA NA NA NA
#> 4 1960 AL Albania Fema… NA NA NA NA NA
#> 5 1960 AL Albania Males NA NA NA NA NA
#> 6 1960 AL Albania Total NA NA NA NA NA
#> # … with 8 more variables: volunt_p <dbl>, caring_h <dbl>, socialexc_i <dbl>,
#> # JQIskill_i <dbl>, JQIenviron_i <dbl>, JQIintensity_i <dbl>,
#> # JQItime_i <dbl>, exposdiscr_p <dbl>
The names of the variables in dbEurofound are:
names(dbEurofound)
#> [1] "time" "geo" "geo_label" "sex"
#> [5] "lifesatisf" "health" "goodhealth_p" "trustlocal"
#> [9] "volunt" "volunt_p" "caring_h" "socialexc_i"
#> [13] "JQIskill_i" "JQIenviron_i" "JQIintensity_i" "JQItime_i"
#> [17] "exposdiscr_p"
while the time variable ranges over the interval:
The database is not complete over this time range for all the countries considered.
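As a rough illustration of what incompleteness over the time range means, the sketch below builds a small made-up table in the same long layout as dbEurofound (the values and the country codes AA and BB are invented) and reports the time span and the years actually observed for each country, using base R only.

```r
# Toy stand-in for a long-format table such as dbEurofound (made-up values).
toy <- data.frame(
  time = c(2003, 2003, 2007, 2007, 2011, 2011),
  geo = c("AA", "BB", "AA", "BB", "AA", "BB"),
  lifesatisf = c(7.1, NA, 7.3, 6.8, NA, 6.9)
)

# Interval spanned by the time variable.
time_span <- range(toy$time)

# Number of non-missing observations per country.
n_obs <- tapply(!is.na(toy$lifesatisf), toy$geo, sum)

# Years in which each country actually has a value.
observed <- !is.na(toy$lifesatisf)
years_available <- split(toy$time[observed], toy$geo[observed])

print(time_span)
print(n_obs)
print(years_available)
```

On the real database, the same calls could be applied to one indicator column at a time in order to choose a time window that is covered for all the countries of interest.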
The third object in the convergEU package is dbMetaEUStat which provides metainformation on downloadable data from the Eurostat repository. For further details on this object, please see Section 2 related to data downloaded from the Eurostat repository.
The object emp_20_64_MS in the convergEU package is a Eurofound dataset exploited during the development of this package. It refers to the Employment rate indicator from 2002 to 2018 for the age class 20 to 64. The dataset emp_20_64_MS contains source data downloaded from the Eurostat repository, reshaped into the required format time by countries:
data("emp_20_64_MS")
head(emp_20_64_MS)
#> # A tibble: 6 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 70.9 64.7 56.5 75.1 71.7 68.8 78.3 68.3 62.8 63.2 73.2 68.6
#> 2 2003 71.3 64.5 58.7 75.4 71 68.4 77.4 69.1 63.8 64.3 72.9 69.7
#> 3 2004 68.4 65.8 61.2 75.7 70.1 67.9 78.1 70.2 64.3 65.2 72.5 69.2
#> 4 2005 70.4 66.5 61.9 74.4 70.7 69.4 78 72 64.4 67.5 73 69.4
#> 5 2006 71.6 66.5 65.1 75.8 71.2 71.1 79.4 75.9 65.6 69 73.9 69.4
#> 6 2007 72.8 67.7 68.4 76.8 72 72.9 79 76.9 65.8 69.7 74.8 69.9
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
emp_20_64_MS
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 70.9 64.7 56.5 75.1 71.7 68.8 78.3 68.3 62.8 63.2 73.2 68.6
#> 2 2003 71.3 64.5 58.7 75.4 71 68.4 77.4 69.1 63.8 64.3 72.9 69.7
#> 3 2004 68.4 65.8 61.2 75.7 70.1 67.9 78.1 70.2 64.3 65.2 72.5 69.2
#> 4 2005 70.4 66.5 61.9 74.4 70.7 69.4 78 72 64.4 67.5 73 69.4
#> 5 2006 71.6 66.5 65.1 75.8 71.2 71.1 79.4 75.9 65.6 69 73.9 69.4
#> 6 2007 72.8 67.7 68.4 76.8 72 72.9 79 76.9 65.8 69.7 74.8 69.9
#> 7 2008 73.8 68 70.7 76.5 72.4 74 78.7 77.1 66.3 68.5 75.8 70.5
#> 8 2009 73.4 67.1 68.8 75.3 70.9 74.2 76.1 70 65.6 64 73.5 69.5
#> 9 2010 73.9 67.6 64.7 75 70.4 75 74.9 66.8 63.8 62.8 73 69.3
#> 10 2011 74.2 67.3 62.9 73.4 70.9 76.5 74.8 70.6 59.6 62 73.8 69.2
#> 11 2012 74.4 67.2 63 70.2 71.5 76.9 74.3 72.2 55 59.6 74 69.4
#> 12 2013 74.6 67.2 63.5 67.2 72.5 77.3 74.3 73.3 52.9 58.6 73.3 69.5
#> 13 2014 74.2 67.3 65.1 67.6 73.5 77.7 74.7 74.3 53.3 59.9 73.1 69.2
#> 14 2015 74.3 67.2 67.1 67.9 74.8 78 75.4 76.5 54.9 62 72.9 69.5
#> 15 2016 74.8 67.7 67.7 68.7 76.7 78.6 76 76.6 56.2 63.9 73.4 70
#> 16 2017 75.4 68.5 71.3 70.8 78.5 79.2 76.6 78.7 57.8 65.5 74.2 70.6
#> 17 2018 76.2 69.7 72.4 73.9 79.9 79.9 77.5 79.5 59.5 67 76.3 71.3
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
This dataset is always ready for the analysis without further processing. Details on how to download datasets similar to emp_20_64_MS are provided in Section 2. Below, we illustrate how to analyse convergence with locally accessible data, starting from the first example, which relates to the Mean Life Satisfaction indicator.
NOTE: within the convergEU package, Eurofound data are statically stored. Please update this package to have the most recent version of Eurofound data.
All the indicators are extracted from the locally accessible Eurofound database in the first step of the analysis through the extract_indicator_EUF function. This amounts to choosing a time interval, an indicator and a set of countries (MS). As already stated above, the information on the locally accessible indicators is provided in the object dbEUF2018meta as follows:
where the “Code_in_database” is the information that should be provided to extract a given indicator. Information on sets of countries is available within the function convergEU_glb(), which contains lists of constants and setups for the convergEU package. This function generates global static objects of clusters of countries with their corresponding labels, as well as metainformation on the stored indicators:
names(convergEU_glb())
#> [1] "EUcodes" "Eurozone" "EA" "EU12"
#> [5] "EU15" "EU19" "EU25" "EU27_2007"
#> [9] "EU27_2019" "EU27_2020" "EU27" "EU28"
#> [13] "geoRefEUF" "metaEUStat" "tmpl_out" "paralintags"
#> [17] "rounDigits" "epsilonV" "scoreBoaTB" "labels_clusters"
The first object returned by convergEU_glb(), EUcodes, contains the labels for all the MS:
convergEU_glb()$EUcodes
#> # A tibble: 28 x 4
#> pae paeF paeN paeS
#> <chr> <chr> <dbl> <chr>
#> 1 1-AT 1-AT 1 AT
#> 2 10-ES 10-ES 10 ES
#> 3 11-FI 11-FI 11 FI
#> 4 12-FR 12-FR 12 FR
#> 5 13-HR 13-HR 13 HR
#> 6 14-HU 14-HU 14 HU
#> 7 15-IE 15-IE 15 IE
#> 8 16-IT 16-IT 16 IT
#> 9 17-LT 17-LT 17 LT
#> 10 18-LU 18-LU 18 LU
#> # … with 18 more rows
The objects containing clusters of MS with their corresponding labels are: Eurozone, EA, EU12, EU15, EU19, EU25, EU27, EU27_2007, EU27_2019, EU27_2020 and EU28. It must be noted that EU27 refers to Member States after 1 February 2020, while EU28 is a valid tag up to 31 January 2020. The clusters EU27_2020 and EU27_2007 as defined by Eurostat are also available; for further details see https://ec.europa.eu/eurostat/help/faq/brexit. For example, the clusters EU12 and EU27_2020 consist of the following countries with their corresponding labels:
convergEU_glb()$EU12
#> $dates
#> [1] "01-11-1993" "31/12/1994"
#>
#> $memberStates
#> # A tibble: 12 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 United-Kingdom UK
convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#>
#> $memberStates
#> # A tibble: 27 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> # … with 17 more rows
Please note that EU19, Eurozone and EA are synonyms:
convergEU_glb()$EU19
#> $dates
#> [1] NA NA
#>
#> $memberStates
#> # A tibble: 19 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Austria AT
#> 2 Belgium BE
#> 3 Cyprus CY
#> 4 Estonia EE
#> 5 Finland FI
#> 6 France FR
#> 7 Germany DE
#> 8 Greece EL
#> 9 Ireland IE
#> 10 Italy IT
#> 11 Latvia LV
#> 12 Lithuania LT
#> 13 Luxembourg LU
#> 14 Malta MT
#> 15 Netherlands NL
#> 16 Portugal PT
#> 17 Slovakia SK
#> 18 Slovenia SI
#> 19 Spain ES
convergEU_glb()$Eurozone
#> $dates
#> [1] NA NA
#>
#> $memberStates
#> # A tibble: 19 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Austria AT
#> 2 Belgium BE
#> 3 Cyprus CY
#> 4 Estonia EE
#> 5 Finland FI
#> 6 France FR
#> 7 Germany DE
#> 8 Greece EL
#> 9 Ireland IE
#> 10 Italy IT
#> 11 Latvia LV
#> 12 Lithuania LT
#> 13 Luxembourg LU
#> 14 Malta MT
#> 15 Netherlands NL
#> 16 Portugal PT
#> 17 Slovakia SK
#> 18 Slovenia SI
#> 19 Spain ES
convergEU_glb()$EA
#> $dates
#> [1] NA NA
#>
#> $memberStates
#> # A tibble: 19 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Austria AT
#> 2 Belgium BE
#> 3 Cyprus CY
#> 4 Estonia EE
#> 5 Finland FI
#> 6 France FR
#> 7 Germany DE
#> 8 Greece EL
#> 9 Ireland IE
#> 10 Italy IT
#> 11 Latvia LV
#> 12 Lithuania LT
#> 13 Luxembourg LU
#> 14 Malta MT
#> 15 Netherlands NL
#> 16 Portugal PT
#> 17 Slovakia SK
#> 18 Slovenia SI
#> 19 Spain ES
Thus, it is important to note that if the user selects the cluster Eurozone, the calculations are performed NOT for all the MS in the EU, but only for those in the clusters EA/EU19.
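To make the effect of choosing a cluster concrete, the following base R sketch restricts a made-up time-by-countries table to a vector of cluster codes. The data, the code XX and the shortened code vector are assumptions for illustration only; in practice the codes could be taken from convergEU_glb()$EU19$memberStates$codeMS.

```r
# Toy time-by-countries table (made-up values; "XX" is outside the cluster).
toy <- data.frame(time = c(2003, 2007),
                  AT = c(7.8, 7.0),
                  BE = c(7.5, 7.5),
                  XX = c(6.0, 6.1))

# Shortened, illustrative vector of cluster codes.
cluster_codes <- c("AT", "BE", "CY")

keep <- intersect(cluster_codes, names(toy))   # cluster codes present in the data
absent <- setdiff(cluster_codes, names(toy))   # cluster codes with no column
toy_cluster <- toy[, c("time", keep), drop = FALSE]

print(names(toy_cluster))
print(absent)
```

Listing the absent codes explicitly helps to notice when a cluster is only partially covered by the data at hand.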
The object geoRefEUF contains MS as well as other countries (neighbouring countries of the EU):
convergEU_glb()$geoRefEUF
#> # A tibble: 77 x 2
#> geo geo_label
#> <chr> <chr>
#> 1 AD Andorra
#> 2 AL Albania
#> 3 AM Armenia
#> 4 AT Austria
#> 5 AZ Azerbaijan
#> 6 BE Belgium
#> 7 BG Bulgaria
#> 8 BY Belarus
#> 9 CH Switzerland
#> 10 CY Cyprus
#> # … with 67 more rows
The convergEU_glb() function also provides metainformation on different indicators, for example:
convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#> Official_code Code_in_database indicator Official_code_p… subSelection
#> <chr> <chr> <chr> <chr> <chr>
#> 1 lfsa_argaed act_15-64_p Activity… lfsa_argaed <NA>
#> 2 lfsa_ergaed emp_20_64_p Employme… lfsa_ergaed <NA>
#> 3 lfsa_ergacob emp_nonat_p Employme… lfsa_ergacob <NA>
#> 4 lfsa_urgan unem_15_74_p Unemploy… lfsa_urgan <NA>
#> 5 une_educ_a unem_pri_p Unemploy… une_educ_a <NA>
#> 6 lfsa_upgan ltunemp_p Long-ter… lfsa_upgan <NA>
#> 7 edat_lfse_20 neet_p NEET's r… edat_lfse_20 <NA>
#> 8 une_rt_a youthunemp_p Youth un… une_rt_a <NA>
#> 9 ilc_lvhl36 temptoperm_p Transiti… ilc_lvhl36 <NA>
#> 10 ilc_lvhl30 parttofull_p Transiti… ilc_lvhl30 <NA>
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>
The general procedure to extract locally accessible data for a given indicator is illustrated below for the Mean Life Satisfaction indicator, labelled lifesatisf in the column “Code_in_database” of the object dbEUF2018meta containing the metainformation:
names(dbEUF2018meta)
#> [1] "DIMENSION" "SUBDIMENSION" "INDICATOR"
#> [4] "Code_in_database" "Official_code" "Unit"
#> [7] "Source_organisation" "Source_reference" "Disaggregation"
#> [10] "Bookmark_URL"
We proceed to extract the data for the lifesatisf indicator by invoking the extract_indicator_EUF function, considering the time interval from 2003 to 2016 and the cluster EU19 of MS:
myTBlf <- extract_indicator_EUF(
indicator_code = "lifesatisf", #Code_in_database
fromTime=2003,
toTime=2016,
gender= c("Total","Females","Males")[1],
countries= convergEU_glb()$EU19$memberStates$codeMS
)
myTBlf
#> $res
#> # A tibble: 4 x 21
#> time sex AT BE CY DE EE EL ES FI FR IE IT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 Total 7.85 7.52 7.22 7.36 5.94 6.69 7.49 8.14 6.96 7.70 7.22
#> 2 2007 Total 6.95 7.54 7.05 7.16 6.72 6.58 7.25 8.23 7.32 7.59 6.59
#> 3 2011 Total 7.66 7.38 7.16 7.20 6.28 6.16 7.47 8.08 7.23 7.39 6.88
#> 4 2016 Total 7.92 7.31 6.54 7.31 6.73 5.26 6.95 8.07 7.17 7.68 6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> # PT <dbl>, SI <dbl>, SK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
which results in a dataset in the format time by countries in the list component named “$res”; the further components are “$msg”, which possibly carries messages for the user, and “$err”, which is a string containing an error message if an error occurs. Please note that the function is conditioned on gender.
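Since the functions return the same three components, access to “$res” can be wrapped in a small helper that surfaces errors and messages first. The sketch below is only an illustration of this convention: unwrap_res() is a hypothetical helper, not a convergEU function, and the two result lists are mock objects with invented content.

```r
# Hypothetical helper, NOT part of convergEU: return `$res`, or stop with `$err`.
unwrap_res <- function(out) {
  if (!is.null(out$err)) {
    stop(out$err, call. = FALSE)
  }
  if (!is.null(out$msg)) {
    message(out$msg)
  }
  out$res
}

# Two mock results shaped like the three-component lists described above.
ok_out <- list(res = data.frame(time = 2003, AA = 7.1), msg = NULL, err = NULL)
bad_out <- list(res = NULL, msg = NULL, err = "toy error message")

good <- unwrap_res(ok_out)
failed <- tryCatch(unwrap_res(bad_out), error = function(e) conditionMessage(e))
print(good)
print(failed)
```

Turning a non-NULL “$err” into an R error makes a failed step stop a script immediately, instead of passing a NULL result to the next function.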
IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.
In this example for the lifesatisf indicator, the data are extracted for both males and females. Alternatively, data only for females or only for males can be extracted as follows:
# Extract data only for females:
feTBlf <- extract_indicator_EUF(
indicator_code = "lifesatisf", #Code_in_database
fromTime=2003,
toTime=2016,
gender= c("Total","Females","Males")[2],
countries= convergEU_glb()$EU19$memberStates$codeMS
)
feTBlf
#> $res
#> # A tibble: 4 x 21
#> time sex AT BE CY DE EE EL ES FI FR IE IT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 Femal… 7.93 7.38 7.13 7.48 5.91 6.75 7.43 8.27 6.97 7.89 7.12
#> 2 2007 Femal… 7.20 7.49 7.06 7.17 6.77 6.55 7.14 8.30 7.34 7.66 6.54
#> 3 2011 Femal… 7.63 7.47 7.07 7.27 6.26 6.13 7.51 8.27 7.17 7.41 6.86
#> 4 2016 Femal… 7.87 7.27 6.50 7.28 6.81 5.30 6.97 8.10 7.24 7.66 6.56
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> # PT <dbl>, SI <dbl>, SK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
# Extract data only for males:
maTBlf <- extract_indicator_EUF(
indicator_code = "lifesatisf", #Code_in_database
fromTime=2003,
toTime=2016,
gender= c("Total","Females","Males")[3],
countries= convergEU_glb()$EU19$memberStates$codeMS
)
maTBlf
#> $res
#> # A tibble: 4 x 21
#> time sex AT BE CY DE EE EL ES FI FR IE IT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 Males 7.75 7.66 7.32 7.23 5.98 6.63 7.56 7.99 6.95 7.51 7.32
#> 2 2007 Males 6.67 7.60 7.03 7.16 6.67 6.62 7.37 8.16 7.31 7.51 6.64
#> 3 2011 Males 7.69 7.28 7.26 7.13 6.30 6.19 7.41 7.87 7.29 7.37 6.91
#> 4 2016 Males 7.98 7.35 6.58 7.34 6.64 5.23 6.93 8.04 7.09 7.71 6.55
#> # … with 8 more variables: LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>,
#> # PT <dbl>, SI <dbl>, SK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
First, it is convenient to assign a meaningful name, here TBlf, to the locally extracted data (the “$res” component of myTBlf) for both males and females of the lifesatisf indicator:
that is, 4 rows and 21 columns, with variable names:
names(TBlf)
#> [1] "time" "sex" "AT" "BE" "CY" "DE" "EE" "EL" "ES" "FI"
#> [11] "FR" "IE" "IT" "LT" "LU" "LV" "MT" "NL" "PT" "SI"
#> [21] "SK"
It is worth recalling that the analysis of convergence is performed on a clean and imputed tidy dataset in the format years by countries. Thus, to compute convergence measures, the dataset cannot contain qualitative variables, vectors of strings or missing values, and a time variable must be present. The check_data() function is called to check for the presence of unsuitable features that must be resolved before starting the analysis.
For example, for the TBlf data we have:
check_data(TBlf)
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: string variables in the dataframe."
where the offending string variable is sex, which should be deleted as follows:
lifesatFin <- dplyr::select(TBlf,-sex)
Once the string variable sex is deleted, the check_data() function is called again:
check_data(lifesatFin)
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Thus, lifesatFin is the final object to start the analysis of convergence.
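When more than one qualitative column is present, the same clean-up can be done generically by keeping only the numeric columns. This is a minimal base R sketch on made-up data with invented country codes AA and BB; it uses a plain data frame, whereas the package expects a tibble, so it illustrates the idea rather than the exact workflow.

```r
# Toy extraction result with a string column, as in the `sex` example above.
toy <- data.frame(time = c(2003, 2007),
                  sex = c("Total", "Total"),
                  AA = c(7.1, 7.3),
                  BB = c(6.5, 6.8))

# Keep only the numeric columns and record which ones were dropped.
is_num <- vapply(toy, is.numeric, logical(1))
toy_fin <- toy[, is_num, drop = FALSE]
dropped <- names(toy)[!is_num]

print(dropped)
print(names(toy_fin))
```

Here the result has the same columns that dplyr::select(toy, -sex) would keep; the generic version is only worthwhile when the offending columns are not known in advance.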
Several measures of convergence have been recently proposed by Eurofound (Eurofound, 2018). In the convergEU package four measures of convergence are implemented: Beta, Sigma, Gamma and Delta convergence. Below, each measure is illustrated in detail for the lifesatFin dataset.
Beta convergence is calculated in the convergEU package through the beta_conv() function. The dataset should be in the format years by countries, and no variables other than time and the country indicators may be present. Note that if the time variable is not ordered, it must be sorted first. For further details on how to sort an unordered time variable, see the first example in the help documentation:
For the lifesatFin dataset, the Beta convergence is calculated as follows:
require(ggplot2)
require(dplyr)
require(tibble)
empBC <- beta_conv(lifesatFin,
time_0 = 2003,
time_t = 2016,
all_within = FALSE,
timeName = "time")
where the reference time (time_0) is 2003 while the target time (time_t) is 2016. Note that the option all_within = FALSE is the default, which ensures that only the two years time_0 and time_t are considered, i.e. 2003 and 2016 in this example. Otherwise, if all_within = TRUE, all the years in the specified time interval are considered.
It must be also noted that in the implementation of the beta_conv() function, the same reference time is maintained across different years and the left hand side is divided by the amount of time elapsed (i.e. time_t-time_0) with the argument useTau = TRUE by default. Thus, the division by the number of years elapsed may be skipped when the argument useTau = FALSE is specified:
beta_conv(lifesatFin,
time_0 = 2003,
time_t = 2016,
all_within = FALSE,
timeName = "time",
useTau = FALSE)
The output of the beta_conv() function consists of the usual three list components: “$res”, which contains the results; “$msg”, which possibly carries messages for the user; and “$err”, which is a string containing an error message if an error occurs:
empBC
#> $res
#> $res$workTB
#> # A tibble: 19 x 3
#> deltaIndic indic countries
#> <dbl> <dbl> <chr>
#> 1 0.000743 2.06 AT
#> 2 -0.00216 2.02 BE
#> 3 -0.00768 1.98 CY
#> 4 -0.000537 2.00 DE
#> 5 0.00962 1.78 EE
#> 6 -0.0185 1.90 EL
#> 7 -0.00576 2.01 ES
#> 8 -0.000606 2.10 FI
#> 9 0.00222 1.94 FR
#> 10 -0.000174 2.04 IE
#> 11 -0.00736 1.98 IT
#> 12 0.0134 1.69 LT
#> 13 0.00211 2.04 LU
#> 14 0.00970 1.72 LV
#> 15 0.00254 1.99 MT
#> 16 0.00193 2.02 NL
#> 17 0.0105 1.79 PT
#> 18 -0.00209 1.95 SI
#> 19 0.00954 1.73 SK
#>
#> $res$model
#>
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#>
#> Coefficients:
#> (Intercept) indic
#> 0.07130 -0.03639
#>
#>
#> $res$summary
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.0713 0.0231 3.09 0.00669
#> 2 indic -0.0364 0.0119 -3.05 0.00720
#>
#> $res$beta1
#> [1] -0.0363931
#>
#> $res$adj.r.squared
#> [1] 0.3160892
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Looking at the “$res” component in more detail, it contains for example:
empBC$res$workTB
#> # A tibble: 19 x 3
#> deltaIndic indic countries
#> <dbl> <dbl> <chr>
#> 1 0.000743 2.06 AT
#> 2 -0.00216 2.02 BE
#> 3 -0.00768 1.98 CY
#> 4 -0.000537 2.00 DE
#> 5 0.00962 1.78 EE
#> 6 -0.0185 1.90 EL
#> 7 -0.00576 2.01 ES
#> 8 -0.000606 2.10 FI
#> 9 0.00222 1.94 FR
#> 10 -0.000174 2.04 IE
#> 11 -0.00736 1.98 IT
#> 12 0.0134 1.69 LT
#> 13 0.00211 2.04 LU
#> 14 0.00970 1.72 LV
#> 15 0.00254 1.99 MT
#> 16 0.00193 2.02 NL
#> 17 0.0105 1.79 PT
#> 18 -0.00209 1.95 SI
#> 19 0.00954 1.73 SK
empBC$res$summary
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.0713 0.0231 3.09 0.00669
#> 2 indic -0.0364 0.0119 -3.05 0.00720
and its standard error:
as well as several components of the statistical model estimated by OLS (Ordinary Least Squares), e.g. estimated coefficients, residuals, fitted values and the residual degrees of freedom. Note that a standard two-tailed test is reported (p-value), together with the adjusted R-squared.
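The regression behind these results can be sketched in base R on made-up data. The sketch assumes that indic is the logarithm of the indicator at time_0 and that deltaIndic is the difference of logarithms divided by the years elapsed; this is consistent with the printed values (for example indic = 2.06 for AT, whose 2003 value is 7.85), but it is an inference from the output and not the package's own code. Country codes and numbers are invented.

```r
# Toy indicator values at the reference and target times (made-up numbers).
toy <- data.frame(countries = c("AA", "BB", "CC", "DD"),
                  y0 = c(5.0, 6.0, 7.0, 8.0),   # values at time_0
                  yt = c(6.0, 6.6, 7.2, 8.0))   # values at time_t
tau <- 13                                        # years elapsed, e.g. 2016 - 2003

# Assumed transformation: log level at time_0 and average log growth per year.
work <- data.frame(
  countries = toy$countries,
  indic = log(toy$y0),
  deltaIndic = (log(toy$yt) - log(toy$y0)) / tau
)

# OLS of growth on the initial level; the slope plays the role of beta.
fit <- stats::lm(deltaIndic ~ indic, data = work)
beta1 <- unname(stats::coef(fit)["indic"])
print(beta1)
```

In this toy example the countries starting from lower levels grow faster, so the estimated slope is negative, which is the pattern usually read as catching up.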
A plot of the transformed data and the fitted straight line for the Beta convergence results may also be useful. To this end, the function beta_conv_graph is implemented to obtain this graphical representation as follows:
mybetaplot<-beta_conv_graph(betaRes=empBC,
indiName = 'Mean Life Satisfaction',
time_0 = 2003,
time_t = 2016)
mybetaplot
where the argument betaRes takes the output of the beta_conv function, while time_0 and time_t refer to the starting and ending time respectively. Note that the name of the indicator is specified in the argument indiName as a string.
Next, the Sigma convergence is calculated by invoking the sigma_conv function on the lifesatFin dataset, which is already in the required format years by countries.
Note that if the arguments time_0 and time_t are not specified, then all years are considered:
mySTB <- sigma_conv(lifesatFin,timeName="time")
mySTB
#> $res
#> # A tibble: 4 x 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 0.814 0.117 6.97 12.6
#> 2 2007 0.593 0.0836 7.09 6.69
#> 3 2011 0.542 0.0765 7.09 5.58
#> 4 2016 0.687 0.0977 7.03 8.98
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Otherwise, it is possible to select a time window for which the Sigma convergence is calculated as follows:
The output of the sigma_conv function is the usual list:
It is also possible to obtain a graphical representation of the standard deviation and the coefficient of variation obtained for the Sigma convergence by invoking the sigma_conv_graph function as follows:
mysigmaplot<-sigma_conv_graph(sigmaconvOut=mySTB,
time_0 = 2007,
time_t = 2011,
aggregation='EU19')
mysigmaplot
where the argument sigmaconvOut takes the output of the sigma_conv function, while time_0 and time_t refer to the starting and ending time respectively. Note that the considered cluster of MS is specified in the argument aggregation as a string.
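The columns returned by sigma_conv (stdDev, CV, mean and devianceT) can be approximated in base R as follows. The sketch assumes the population form of the standard deviation (deviance divided by the number of countries), which agrees with the figures printed above (for 2003, the square root of 12.6/19 is about 0.814); the function sigma_sketch() is hypothetical and the data are made up.

```r
# Toy time-by-countries table (made-up values, invented country codes).
toy <- data.frame(time = c(2003, 2007),
                  AA = c(2, 3),
                  BB = c(4, 4),
                  CC = c(6, 5))

# Hypothetical helper, NOT the package implementation.
sigma_sketch <- function(df, timeName = "time") {
  vals <- as.matrix(df[, names(df) != timeName, drop = FALSE])
  mean_t <- unname(rowMeans(vals))                  # yearly mean across countries
  deviance_t <- unname(rowSums((vals - mean_t)^2))  # yearly sum of squared deviations
  std_dev <- sqrt(deviance_t / ncol(vals))          # assumed: population form
  data.frame(time = df[[timeName]],
             stdDev = std_dev,
             CV = std_dev / mean_t,
             mean = mean_t,
             devianceT = deviance_t)
}

res_sigma <- sigma_sketch(toy)
print(res_sigma)
```

A standard deviation or coefficient of variation that decreases from one year to the next, as in this toy table, is what is read as reduced dispersion among countries.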
The third measure of convergence implemented in the convergEU package is the Gamma convergence that is computed by invoking the gamma_conv function. As usual, the dataset should be in the format years by countries. For the lifesatFin dataset, the Gamma convergence is calculated as follows:
gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time")
#> $res
#> [1] 0.5762807
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
with 2003 as reference time (argument ref) and 2016 as the last time (argument last) to be considered. The output is the usual list of the three components “$res”, “$msg” and “$err”. It is also possible to print the ranks of countries for each time by specifying the option printRanks=T:
gamma_conv(lifesatFin,ref=2003,last=2016,timeName="time", printRanks = T)
#> Ranks:
#> AT BE CY DE EE EL ES FI FR IE IT LT LU LV MT NL PT SI SK
#> [1,] 18 14 10 12 4 6 13 19 7 17 9 1 16 2 11 15 5 8 3
#> [2,] 8 14 9 10 7 4 12 19 13 16 5 3 18 1 15 17 2 11 6
#> [3,] 16 13 9 10 3 1 15 19 11 14 7 5 18 2 12 17 6 8 4
#> [4,] 18 13 5 12 7 1 10 19 11 15 6 4 17 2 14 16 9 8 3
#> $res
#> [1] 0.5762807
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Note that the Gamma convergence can also be calculated between pairs of subsequent years by invoking the gamma_conv_msteps function. For example, for the lifesatFin dataset:
GCms <- gamma_conv_msteps(lifesatFin,startTime=2003,endTime=2016, timeName = "time")
GCms
#> $res
#> # A tibble: 4 x 2
#> time gammaConv
#> <dbl> <dbl>
#> 1 2003 NA
#> 2 2007 0.4
#> 3 2011 0.510
#> 4 2016 0.576
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The gamma_conv_msteps function considers the years chosen (in the example above 2003 and 2016) and calculates the Gamma convergence for each pair of subsequent years (2003-2007, 2007-2011 and 2011-2016 in the example above).
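The rank matrix printed with printRanks can be reproduced in spirit with base R, ranking countries within each year so that the lowest value gets rank 1 (in the output above the highest value, FI, has rank 19). The ratio computed at the end is the textbook two-period rank-concordance index, given here as an assumption for illustration only: the normalisation used inside gamma_conv() may differ, so the numbers are not expected to match the package output. Data and country codes are made up.

```r
# Toy time-by-countries table (made-up values, invented country codes).
toy <- data.frame(time = c(2003, 2016),
                  AA = c(5.0, 6.5),
                  BB = c(6.0, 6.0),
                  CC = c(7.0, 7.0))

vals <- as.matrix(toy[, -1])

# Rank countries within each year (1 = lowest value); one row per year.
ranks <- t(apply(vals, 1, rank))

r_ref <- ranks[1, ]    # ranks at the reference time
r_last <- ranks[2, ]   # ranks at the last time

# Textbook two-period rank-concordance ratio (an assumption, see the lead-in).
gamma_two <- stats::var(r_ref + r_last) / stats::var(2 * r_ref)

print(ranks)
print(gamma_two)
```

A ratio of 1 would mean that the ranking is unchanged between the two times; values below 1 indicate that some countries have swapped positions, as AA and BB do here.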
The last measure of convergence implemented in the convergEU package is the Delta convergence, computed by invoking the delta_conv function. Given a dataset of quantitative indicators over time, the Delta convergence is a statistic describing departures from the best performer. To calculate Delta convergence, it is sufficient to specify the dataset and the time variable name. For example, for the lifesatisf indicator, the results will be:
delta_conv(lifesatFin,"time")
#> $res
#> # A tibble: 4 x 2
#> time delta
#> <dbl> <dbl>
#> 1 2003 22.3
#> 2 2007 21.7
#> 3 2011 18.8
#> 4 2016 19.8
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
with an output given by the components “$res” (results), “$msg” (message) and “$err” (error). Note that the delta_conv function can also return the declaration of convergence. To this end, the argument extended should be set to TRUE. For example, for the lifesatisf indicator the syntax is as follows:
delta_conv(lifesatFin,"time", extended=TRUE)
#> $res
#> $res$delta_conv
#> # A tibble: 4 x 2
#> time delta
#> <dbl> <dbl>
#> 1 2003 22.3
#> 2 2007 21.7
#> 3 2011 18.8
#> 4 2016 19.8
#>
#> $res$differences
#> AT BE CY DE EE EL ES FI
#> [1,] 0.2900786 0.6212335 0.9162025 0.7786398 2.200235 1.444585 0.6465211 0
#> [2,] 1.2864127 0.6902347 1.1851578 1.0695601 1.511724 1.651325 0.9796877 0
#> [3,] 0.4182611 0.6992726 0.9145322 0.8737502 1.800689 1.916749 0.6120100 0
#> [4,] 0.1501288 0.7651687 1.5381799 0.7660656 1.345254 2.811414 1.1226888 0
#> FR IE IT LT LU LV MT
#> [1,] 1.1742063 0.4352756 0.9211011 2.694273 0.4490004 2.563162 0.8151641
#> [2,] 0.9090123 0.6468849 1.6447601 1.911084 0.3330436 2.193221 0.6741619
#> [3,] 0.8511925 0.6848898 1.1936469 1.376650 0.2882094 1.835249 0.8437681
#> [4,] 0.9063277 0.3888202 1.5160985 1.590870 0.1709514 1.750492 0.5050535
#> NL PT SI SK
#> [1,] 0.5916190 2.146662 1.094598 2.480780
#> [2,] 0.3640208 2.043251 1.005834 1.555401
#> [3,] 0.3846836 1.310083 1.125367 1.691687
#> [4,] 0.3358555 1.202321 1.219944 1.670450
#>
#> $res$difference_last_first
#> AT BE CY DE EE EL
#> -0.13994980 0.14393520 0.62197733 -0.01257420 -0.85498142 1.36682844
#> ES FI FR IE IT LT
#> 0.47616768 0.00000000 -0.26787853 -0.04645538 0.59499741 -1.10340261
#> LU LV MT NL PT SI
#> -0.27804899 -0.81267023 -0.31011057 -0.25576353 -0.94434166 0.12534523
#> SK
#> -0.81032991
#>
#> $res$strict_conv_ini_last
#> [1] FALSE
#>
#> $res$label_strict
#> [1] " "
#>
#> $res$converg_ini_last
#> [1] TRUE
#>
#> $res$label_conver
#> [1] "convergence"
#>
#> $res$diffe_delta
#> [1] -2.507256
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
In this context, it is also useful to evaluate how much a collection of MS deviates from the EU mean for a given indicator and period of time. To obtain this further information, the demea_change function has been implemented in the convergEU package:
This function calculates the differences from the EU mean at subsequent times within each MS; these negative and positive differences are then summed over the years within each MS. To illustrate the demea_change function in detail, consider the data for the lifesatisf indicator. Suppose that we choose to calculate deviations from the EU mean for Italy, France, Germany and Spain (i.e. argument sele_countries), taking 2003 as starting time (i.e. the argument time_0) and 2016 as ending time (i.e. the argument time_t):
res1<-demea_change(lifesatFin,
timeName="time",
time_0 = 2003,
time_t = 2016,
sele_countries= c('IT','FR','DE','ES'),
doplot=TRUE)
where the argument doplot is set to TRUE to also obtain the plot of the calculated differences. Note that if the argument sele_countries is equal to NA, all countries in the dataset are considered. The output of the demea_change function consists of the usual three list components: “$res”, which contains the results; “$msg”, which possibly carries messages for the user; and “$err”, which is a string containing an error message if an error occurs:
Looking at the “$res” component in more detail, it contains for example:
res1$res$resDiffe
#> # A tibble: 4 x 20
#> time AT BE CY DE EE EL ES FI FR IE
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 0.882 0.551 0.256 0.393 -1.03 -0.273 0.525 1.17 -0.00245 0.736
#> 2 2007 -0.147 0.449 -0.0454 0.0702 -0.372 -0.512 0.160 1.14 0.231 0.493
#> 3 2011 0.572 0.291 0.0760 0.117 -0.810 -0.926 0.379 0.991 0.139 0.306
#> 4 2016 0.890 0.275 -0.498 0.274 -0.305 -1.77 -0.0829 1.04 0.133 0.651
#> # … with 9 more variables: IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>,
#> # NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
res1$res$diffe_abs_diff
#> # A tibble: 3 x 20
#> time AT BE CY DE EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2007 -0.735 -0.101 -0.210 -0.323 -0.656 0.239 -0.365 -0.0320 0.228
#> 2 2011 0.426 -0.158 0.0306 0.0466 0.438 0.415 0.219 -0.149 -0.0913
#> 3 2016 0.317 -0.0167 0.422 0.157 -0.505 0.845 -0.296 0.0492 -0.00590
#> # … with 10 more variables: IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> # MT <dbl>, NL <dbl>, PT <dbl>, SI <dbl>, SK <dbl>
res1$res$stats
#> # A tibble: 4 x 4
#> MS negaSum posiSum posi
#> <chr> <dbl> <dbl> <int>
#> 1 DE -0.323 0.204 4
#> 2 ES -0.661 0.219 7
#> 3 FR -0.0972 0.228 9
#> 4 IT -0.302 0.528 11
and minimum and maximum of the calculated differences:
To plot the calculated differences, the user should invoke the plot function as follows:
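Independently of the package, the arithmetic behind the components shown above can be reproduced in a few lines of base R. The sketch below is not the implementation of demea_change: it is a minimal illustration on made-up values, with hypothetical country labels AA, BB and CC, under the assumption (consistent with the printed resDiffe and diffe_abs_diff tables) that the changes being summed are changes in absolute deviation from the mean between subsequent times.

```r
# Toy data in the "time by countries" format (values are made up).
toy <- data.frame(
  time = c(2003, 2007, 2011),
  AA   = c(7.0, 6.5, 7.2),
  BB   = c(5.0, 5.5, 5.2),
  CC   = c(6.0, 6.9, 6.8)
)
countries <- setdiff(names(toy), "time")
vals <- as.matrix(toy[, countries])

# Step 1: deviation of each country from the cross-country mean, per year.
dev_from_mean <- vals - rowMeans(vals)

# Step 2: change in absolute deviation between subsequent times
# (diff() on a matrix works column by column, i.e. within each country).
abs_change <- diff(abs(dev_from_mean))

# Step 3: sum negative and positive changes separately within each country.
nega_sum <- colSums(pmin(abs_change, 0))
posi_sum <- colSums(pmax(abs_change, 0))

stats <- data.frame(MS = countries, negaSum = nega_sum, posiSum = posi_sum,
                    row.names = NULL)
print(stats)
```

A negative sum indicates periods in which the country moved closer to the mean, a positive sum periods in which it moved away from it.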
The convergEU package provides the function go_ms_fi to prepare country fiches. This function produces an html file (which can then be saved as pdf) that shows several results for a given reference country, compared to one or more other countries of interest. More precisely, the results relate to several comparisons through the average (i.e. departures from the mean of the selected countries, total increase/decrease in the gap with the EU mean, total gap with respect to the best country performer within each year), as well as patterns of convergence and divergence and scoreboards for the selected countries. By invoking go_ms_fi, the html file can be created for different indicators, timeframes and countries. Note that the go_ms_fi function not only prepares the country fiche automatically, but can also create a directory along an existing path and copy into it the rmarkdown file representing the template. The Rmarkdown file is parameterized, so that passing different parameters makes the compilation to HTML take place with different data, say different indicators and countries.
It is very important to prepare complete data in the required tidy format time by countries; that is, the dataset consists of a time variable and as many other variables as the countries that enter into the calculation of the time average (see Section 1.1 for further information). For example, the lifesatFin dataset is in the required format time by countries:
lifesatFin
#> # A tibble: 4 x 20
#> time AT BE CY DE EE EL ES FI FR IE IT LT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 7.85 7.52 7.22 7.36 5.94 6.69 7.49 8.14 6.96 7.70 7.22 5.44
#> 2 2007 6.95 7.54 7.05 7.16 6.72 6.58 7.25 8.23 7.32 7.59 6.59 6.32
#> 3 2011 7.66 7.38 7.16 7.20 6.28 6.16 7.47 8.08 7.23 7.39 6.88 6.70
#> 4 2016 7.92 7.31 6.54 7.31 6.73 5.26 6.95 8.07 7.17 7.68 6.56 6.48
#> # … with 7 more variables: LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PT <dbl>,
#> # SI <dbl>, SK <dbl>
Below, a call to the function go_ms_fi() illustrates the syntax for the lifesatFin dataset:
go_ms_fi(
workDF ='lifesatFin',
countryRef ='DE',
otherCountries = "c('IT','UK','FR')",
time_0 = 2003,
time_t = 2016,
tName = 'time',
indiType = 'highBest',
aggregation= 'EU12',
x_angle= 45,
dataNow= Sys.time(),
author = 'A.Student',
outFile = 'Germany-up2-2016-LIFESATFIN',
outDir = 'F:/analysis/IT2018',
indiName= 'lifesatisf'
)
It is very important to emphasize some constraints and some quite unusual ways of passing parameters to this function. Note that the first argument, the working dataset, is passed not as an R object but as a string: the name of a dataset that must be available in the R workspace before invoking go_ms_fi. Similarly, in the call above the argument otherCountries is passed as a string containing the R expression that builds the vector of country codes.
The second argument, countryRef, is a string with the short name of a member country that will be shown in one-country plots. One key country must be specified, together with some other countries of interest, to better decorate graphs and compare performances. Less obviously, the argument indiType specifies whether the considered indicator is built so that a low value is good for a country (indiType = “lowBest”) or a high value is good (indiType = “highBest”). The option lowBest should be used for indicators where the best possible situation corresponds to a low value, for instance the poverty rate. The option highBest should be used when the best situation corresponds to a high value, for instance the employment rate or, as in the example above, mean life satisfaction.
Of particular importance is the argument outFile, a string indicating the name of the output file. Similarly, outDir is the path (unit and folders) in which the final compiled html will be stored. The syntax of the path depends on the operating system; for example, outDir=‘F:/analysis/IT2018’ indicates that the folder ‘IT2018’, where R will write the country fiche, is located within the folder ‘analysis’ on the USB disk called ‘F’. Note that the disk ‘F’ and the folder ‘analysis’ must already exist, whereas the folder ‘IT2018’ is created by the function if it does not already exist.
Besides the compiled html, the output directory also stores a file whose name is the string given in outFile followed by ‘-workspace.RData’. It contains the data and plots produced during the compilation of the country fiche, for subsequent use in other technical reports.
NOTE: An Internet connection should be available when invoking the go_ms_fi function, in order to properly render the results in the html file.
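The constraints described above (dataset passed by name, all folders of the path except the last one already existing) can be checked before launching a compilation. The following base R sketch shows one way to do it; the helper check_fiche_args is hypothetical and is not part of convergEU, and the dataset name and folder used in the usage lines are made up.

```r
# Hypothetical helper (not part of convergEU): collects the problems that
# would violate the constraints on workDF and outDir described above.
check_fiche_args <- function(workDF, outDir, envir = parent.frame()) {
  problems <- character(0)
  if (!is.character(workDF) || length(workDF) != 1) {
    # The dataset is passed by name, so a single string is expected ...
    problems <- c(problems,
                  "workDF must be a single string (the dataset name)")
  } else if (!exists(workDF, envir = envir)) {
    # ... and an object with that name must already exist in the workspace.
    problems <- c(problems,
                  paste0("no object named '", workDF, "' in the workspace"))
  }
  # Every folder of the path except the last one must already exist.
  if (!dir.exists(dirname(outDir))) {
    problems <- c(problems,
                  paste0("parent folder '", dirname(outDir), "' does not exist"))
  }
  problems
}

# Usage on made-up inputs: an empty result means no problem was found.
lifesat_toy <- data.frame(time = c(2003, 2016), DE = c(7.4, 7.3))
check_fiche_args("lifesat_toy", file.path(tempdir(), "IT2018"))
check_fiche_args("not_in_workspace", file.path(tempdir(), "IT2018"))
```

Returning a character vector of problems, rather than stopping at the first one, lets the user fix everything in a single pass.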
An auxiliary function go_indica_fi() is provided in the R package convergEU to produce an indicator fiche, where the output is an html file (which can then be saved as pdf). This means that the user chooses a given indicator, a time window and a set of countries, and the corresponding fiche is automatically produced by invoking the go_indica_fi() function. The html file produced as output reports measures of convergence as well as time series plots, box-plots, and patterns of convergence and divergence. Unlike the country fiches, which require a reference country and a set of other countries for comparison, the indicator fiches contain the results of the analysis of convergence for a given indicator and a given cluster of countries.
When invoking the go_indica_fi() function, an output directory must be also specified. Note that some arguments are passed as strings instead of objects, as described in the last section above for the country fiches.
An example of syntax to invoke the procedure for the lifesatFin dataset is:
go_indica_fi(
time_0 = 2003,
time_t = 2016,
timeName = 'time',
workDF = 'lifesatFin' ,
indicaT = 'lifesatisf',
indiType = c('highBest','lowBest')[1],
seleMeasure = 'all',
seleAggre = 'EU19',
x_angle = 45,
data_res_download = FALSE,
auth = 'A.Student',
dataNow = Sys.time(),
outFile = 'test_IT-lifesatFin',
outDir = 'F:/analysis/EU19'
)
Note that the argument seleMeasure refers to the set of measures of convergence for which the analysis should be performed; if seleMeasure = ‘all’ as in the syntax above, then the results are produced for all four measures of convergence (Beta, Sigma, Delta and Gamma). The set of countries for which the analysis of convergence should be performed is specified in the argument seleAggre as a string; the default is the cluster of Member States EU27.
For advanced users: if the selection of countries is different from a standard aggregate (EU12, EU19 etc.), the argument “custom” can be used.
The option outDir specifies the path, i.e. an existing unit and a sequence of existing folders, in which the compiled fiche will be saved. Only the last folder of the path may be missing: it is created before compilation if it does not exist.
NOTE: An Internet connection should be available when invoking the go_indica_fi function, in order to properly render the results in the html file. The fiches have been tested with the browsers Mozilla Firefox and Google Chrome.
A second example for locally accessible Eurofound data is reported below, considering the JQI_Intensity index indicator, which relates to the working conditions dimension. As described for the Mean Life Satisfaction indicator, the first step is data preparation, in order to subsequently perform the analysis of convergence.
To extract the data for the JQI_Intensity index indicator from the locally accessible Eurofound database, its code is necessary; it is provided in the column “Code_in_database” of the object dbEUF2018meta:
print(dbEUF2018meta, n=200,width=200)
#> # A tibble: 13 x 10
#> DIMENSION SUBDIMENSION
#> <chr> <chr>
#> 1 Quality of life Life satisfaction
#> 2 Quality of life Health
#> 3 Quality of life Health
#> 4 Quality of life Quality of society
#> 5 Quality of life Quality of society
#> 6 Quality of life Quality of society
#> 7 Quality of life Quality of society
#> 8 Quality of life Quality of society
#> 9 Working conditions Working conditions
#> 10 Working conditions Working conditions
#> 11 Working conditions Working conditions
#> 12 Working conditions Working conditions
#> 13 Working conditions Working conditions
#> INDICATOR Code_in_database Official_code Unit Source_organisa…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Mean life satis… lifesatisf y16_q4 -- Eurofound
#> 2 Mean health sta… health y16_q48 -- Eurofound
#> 3 Percentage of p… goodhealth_p y16_q48 % Eurofound
#> 4 Mean level of t… trustlocal y16_q35f -- Eurofound
#> 5 Level of involv… volunt y16_q29a -- Eurofound
#> 6 Percentage of p… volunt_p y16_q29a % Eurofound
#> 7 Hours per week … caring_h y16_q43a hours Eurofound
#> 8 Social Exclusio… socialexc_i y16_socexindex index Eurofound
#> 9 JQI_Skills and … JQIskill_i wq_slim - JQI SLIM… index Eurofound
#> 10 JQI_Physical en… JQIenviron_i envsec_slim - JQI S… index Eurofound
#> 11 JQI_Intensity i… JQIintensity_i intens_slim - JQI S… index Eurofound
#> 12 JQI_Working tim… JQItime_i wlb_slim - JQI SLIM… index Eurofound
#> 13 Exposition to d… exposdiscr_p disc_d - Having be… % Eurofound
#> Source_reference Disaggregation Bookmark_URL
#> <chr> <chr> <chr>
#> 1 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 2 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 3 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 4 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 5 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 6 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 7 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 8 EQLS sex https://www.eurofound.europa.eu/surveys/abou…
#> 9 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 10 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 11 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 12 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
#> 13 EWCS sex https://www.eurofound.europa.eu/surveys/abou…
The set of MS is chosen from those available by invoking convergEU_glb():
names(convergEU_glb())
#> [1] "EUcodes" "Eurozone" "EA" "EU12"
#> [5] "EU15" "EU19" "EU25" "EU27_2007"
#> [9] "EU27_2019" "EU27_2020" "EU27" "EU28"
#> [13] "geoRefEUF" "metaEUStat" "tmpl_out" "paralintags"
#> [17] "rounDigits" "epsilonV" "scoreBoaTB" "labels_clusters"
convergEU_glb()$EU15$memberStates$MS
#> [1] "Belgium" "Denmark" "France" "Germany"
#> [5] "Greece" "Ireland" "Italy" "Luxembourg"
#> [9] "Netherlands" "Portugal" "Spain" "United-Kingdom"
#> [13] "Austria" "Finland" "Sweden"
convergEU_glb()$EU15$memberStates$codeMS
#> [1] "BE" "DK" "FR" "DE" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "UK" "AT" "FI" "SE"
where the EU15 cluster of MS is selected for this type of indicator. Lastly, a time interval is chosen, say from 1995 to 2015. We then proceed to extract the “JQIintensity_i” indicator for males and females together (gender “Total”):
myTBjqt <- extract_indicator_EUF(
indicator_code = "JQIintensity_i", #Code_in_database
fromTime=1995,
toTime=2015,
gender= c("Total","Females","Males")[1],
countries= convergEU_glb()$EU15$memberStates$codeMS
)
myTBjqt
#> $res
#> # A tibble: 5 x 17
#> time sex AT BE DE DK EL ES FI FR IE IT LU
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1995 Total 48.8 33.2 40.8 39.0 40.9 34.2 47.1 38.4 39.0 34.1 31.4
#> 2 2000 Total 42.9 37.3 40.9 37.6 43.5 36.2 46.7 39.5 42.2 39.7 37.6
#> 3 2005 Total 47.6 42.8 46.9 47.9 50.5 41.2 49.6 40.5 36.9 41.9 40.6
#> 4 2010 Total 42.1 40.2 44.9 39.1 48.6 38.0 45.9 43.0 47.0 40.8 40.8
#> 5 2015 Total 42.4 41.5 40.2 45.0 49.3 46.5 41.1 42.7 42.8 38.1 42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
which results in a list with the components “$res”, “$msg” and “$err”.
To better explain other data situations that the user could run into, let’s suppose that the values for the country Austria (“AT”) are completely missing, as in the following dataset myTBjqmiss:
myTBjqmiss<- mutate(myTBjqt$res, AT = as.numeric(NA))
myTBjqmiss
#> # A tibble: 5 x 17
#> time sex AT BE DE DK EL ES FI FR IE IT LU
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1995 Total NA 33.2 40.8 39.0 40.9 34.2 47.1 38.4 39.0 34.1 31.4
#> 2 2000 Total NA 37.3 40.9 37.6 43.5 36.2 46.7 39.5 42.2 39.7 37.6
#> 3 2005 Total NA 42.8 46.9 47.9 50.5 41.2 49.6 40.5 36.9 41.9 40.6
#> 4 2010 Total NA 40.2 44.9 39.1 48.6 38.0 45.9 43.0 47.0 40.8 40.8
#> 5 2015 Total NA 41.5 40.2 45.0 49.3 46.5 41.1 42.7 42.8 38.1 42.4
#> # … with 4 more variables: NL <dbl>, PT <dbl>, SE <dbl>, UK <dbl>
Thus, if we proceed to check the data for the presence of unsuitable features (missing values, qualitative variables, vectors of strings):
check_data(myTBjqmiss, timeName = "time")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: one or more missing values in the dataframe."
it results in an error message due to the presence of missing values for Austria. Note that in the convergEU package, the function impute_dataset is specifically implemented for the imputation of missing values (see Section 2 and later sections). Nevertheless, this function requires at least two non-missing values for a given country. Thus, if all the values for a given country are missing, the imputation is not performed. Consequently, if the user proceeds to calculate the four measures of convergence, the following error message is returned for each measure of convergence:
#> [1] "Error: one or more missing values in the dataframe."
That is, the error message states that missing values are present in the dataset since the values of Austria are completely missing. There are some possible solutions even though in the literature it is still an open issue.
Lastly, it becomes clear that when the values for a given country are completely missing in the dataset, neither the country fiches nor the indicator fiches are computed.
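One pragmatic workaround, among the possible solutions mentioned above, is to detect countries with no observed value at all and exclude them before the analysis. The base R sketch below illustrates this on a small made-up extract; it is not a recommendation of the package, and it should be kept in mind that dropping a country changes the set of MS over which averages and convergence measures are computed.

```r
# Made-up extract in the "time by countries" format, with AT fully missing.
toy <- data.frame(
  time = c(1995, 2000, 2005),
  AT   = NA_real_,
  BE   = c(33.2, 37.3, 42.8),
  DE   = c(40.8, 40.9, 46.9)
)
countries <- setdiff(names(toy), "time")

# Flag the countries for which every value is missing.
all_missing <- vapply(toy[countries], function(x) all(is.na(x)), logical(1))
dropped <- names(all_missing)[all_missing]
dropped

# Keep the time variable and the countries with at least one observed value.
toy_reduced <- toy[, c("time", countries[!all_missing]), drop = FALSE]
toy_reduced
```

Reporting the dropped countries alongside the results keeps the change of aggregate explicit for the reader.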
Eurostat data are downloaded on the fly from an active Internet connection.
The heterogeneity in the structure of the different indicators normalized into the database requires some attention. Besides age and gender, a list of covariates is sometimes present for an indicator, and their values must be set to produce a tidy dataset in the format time by countries. The indicator “Employment rate” is chosen to illustrate how to prepare data downloaded from the Eurostat repository for the analysis of convergence. The data for a specific indicator are downloaded from the Eurostat repository by invoking the function download_indicator_EUS:
This amounts to choosing an indicator, a time interval, and a set of countries (MS). The function also allows conditioning on gender, age and other variables, according to the chosen indicator. It must be noted that when the argument rawDump is TRUE, raw data are downloaded, which do not form a tidy dataset; thus, to obtain a tidy dataset in the format years by countries, the argument rawDump=FALSE should be specified.
IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.
In order to download the data for the “Employment rate” indicator, its indicator code is necessary. The metainformation on the indicator’s labelling is provided by invoking the convergEU_glb() function, as follows:
convergEU_glb()$metaEUStat
#> # A tibble: 56 x 6
#> Official_code Code_in_database indicator Official_code_p… subSelection
#> <chr> <chr> <chr> <chr> <chr>
#> 1 lfsa_argaed act_15-64_p Activity… lfsa_argaed <NA>
#> 2 lfsa_ergaed emp_20_64_p Employme… lfsa_ergaed <NA>
#> 3 lfsa_ergacob emp_nonat_p Employme… lfsa_ergacob <NA>
#> 4 lfsa_urgan unem_15_74_p Unemploy… lfsa_urgan <NA>
#> 5 une_educ_a unem_pri_p Unemploy… une_educ_a <NA>
#> 6 lfsa_upgan ltunemp_p Long-ter… lfsa_upgan <NA>
#> 7 edat_lfse_20 neet_p NEET's r… edat_lfse_20 <NA>
#> 8 une_rt_a youthunemp_p Youth un… une_rt_a <NA>
#> 9 ilc_lvhl36 temptoperm_p Transiti… ilc_lvhl36 <NA>
#> 10 ilc_lvhl30 parttofull_p Transiti… ilc_lvhl30 <NA>
#> # … with 46 more rows, and 1 more variable: selectorUser <chr>
where both the columns selectorUser and Official_code_purified provide this information:
convergEU_glb()$metaEUStat$selectorUser
#> [1] "lfsa_argaed" "lfsa_ergaed" "lfsa_ergacob" "lfsa_urgan"
#> [5] "une_educ_a" "lfsa_upgan" "edat_lfse_20" "une_rt_a"
#> [9] "ilc_lvhl36" "ilc_lvhl30" "lfsa_egad" "lfsa_eppgai"
#> [13] "lfsi_emp_a" "lfsi_pt_a" "lfsa_egised" "earn_gr_gpgr2"
#> [17] "selfemp_p" "selfempwe_p" "lfsa_etgar" "prc_ppp_ind"
#> [21] "gov_10dd_edpt1" "nasa_10_ki" "gov_10a_main" "labourcost_i"
#> [25] "labourprod_i" "compemp_pps" "earn_nt_net" "spr_exp_sum"
#> [29] "expspr_p" "ilc_pnp3" "spr_exp_pens" "spr_pns_ben"
#> [33] "edat_lfse_14" "trng_lfse_01" "edat_lfse_03" "expedu_p"
#> [37] "hlth_silc_08" "exphlth_p" "isoc_sk_dskl_i" "isoc_bde15b_h"
#> [41] "isoc_ci_im_i" "isoc_ec_ibuy" "isoc_ci_it_en2" "rd_p_perslf"
#> [45] "rd_e_gerdtot" "isoc_ciegi_ac" "exppubserv_p" "lfsa_eegan2"
#> [49] "prc_hicp_aind" "ilc_di12" "ilc_di11" "lfst_r_lmder"
#> [53] "demo_mlexpec" "hlth_hlye" "y16_deprindex" "hlth_dm060"
convergEU_glb()$metaEUStat$Official_code_purified
#> [1] "lfsa_argaed" "lfsa_ergaed" "lfsa_ergacob" "lfsa_urgan"
#> [5] "une_educ_a" "lfsa_upgan" "edat_lfse_20" "une_rt_a"
#> [9] "ilc_lvhl36" "ilc_lvhl30" "lfsa_egad" "lfsa_eppgai"
#> [13] "lfsi_emp_a" "lfsi_pt_a" "lfsa_egised" "earn_gr_gpgr2"
#> [17] "lfsa_esgan" "lfsa_esgan" "lfsa_etgar" "prc_ppp_ind"
#> [21] "gov_10dd_edpt1" "nasa_10_ki" "gov_10a_main" "nama_10_lp_ulc"
#> [25] "nama_10_lp_ulc" "nama_10_lp_ulc" "earn_nt_net" "spr_exp_sum"
#> [29] "gov_10a_exp" "ilc_pnp3" "spr_exp_pens" "spr_pns_ben"
#> [33] "edat_lfse_14" "trng_lfse_01" "edat_lfse_03" "gov_10a_exp"
#> [37] "hlth_silc_08" "gov_10a_exp" "isoc_sk_dskl_i" "isoc_bde15b_h"
#> [41] "isoc_ci_im_i" "isoc_ec_ibuy" "isoc_ci_it_en2" "rd_p_perslf"
#> [45] "rd_e_gerdtot" "isoc_ciegi_ac" "gov_10a_exp" "lfsa_eegan2"
#> [49] "prc_hicp_aind" "ilc_di12" "ilc_di11" "lfst_r_lmder"
#> [53] "demo_mlexpec" "hlth_hlye" "y16_deprindex" "hlth_dm060"
Similarly to the locally accessible Eurofound datasets (Section 1), for the data considered in this Section the sets of countries (MS) are available in the convergEU_glb() function:
names(convergEU_glb())
#> [1] "EUcodes" "Eurozone" "EA" "EU12"
#> [5] "EU15" "EU19" "EU25" "EU27_2007"
#> [9] "EU27_2019" "EU27_2020" "EU27" "EU28"
#> [13] "geoRefEUF" "metaEUStat" "tmpl_out" "paralintags"
#> [17] "rounDigits" "epsilonV" "scoreBoaTB" "labels_clusters"
where, for example, the cluster EU27_2020 of MS is:
convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#>
#> $memberStates
#> # A tibble: 27 x 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> # … with 17 more rows
The set of countries for which a given indicator is downloaded should be specified in the argument countries of the function download_indicator_EUS. It is up to the user to take one of the sets of countries available in the convergEU_glb() function or, alternatively, to build a custom vector of country codes.
Once we have obtained the indicator code for the Employment rate indicator (e.g. lfsa_ergaed) and we have chosen the time interval (say from 2005 to 2015), and the set of countries (say the cluster EU27_2020 of MS), the data are downloaded from the Eurostat repository by invoking the download_indicator_EUS function as follows:
emTB <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=TRUE)
emTB
Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values, rawDump=FALSE should be specified:
emprTB <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE)
emprTB
which returns the data in the format years by countries.
The result is a list with the following components:
It is therefore possible to call the same function several times, specifying the argument uniqueIdentif as one of the integers in the leftmost column of $msg$Further_Conditioning$available_seleTagLs, to obtain the same indicator under different scales and contexts. That is, the user could download the data and visualize the list of values available for the uniqueIdentif argument by invoking:
Then, the user could choose the scales and contexts of interest.
For example, uniqueIdentif=3:
emprTB1 <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=3)
or uniqueIdentif=5:
emprTB2 <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=5)
It is also possible to download data only for females or only for males by explicitly specifying it in the argument gender, for example:
emprTB3 <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
gender='F',
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=1)
emprTB4 <- download_indicator_EUS(
indicator_code= "lfsa_ergaed",
fromTime = 2005,
toTime = 2015,
gender="M",
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=1)
The data can also be downloaded for a specific age interval; to this end, it is useful to visualize the age intervals available for the emprTB data:
unique(emprTB$msg$Conditioning$ageInterv)
#> [1] Y15-19
#> 32 Levels: Y15-19 Y15-24 Y15-39 Y15-59 Y15-64 Y15-74 Y20-24 Y20-29 ... Y70-74
For example, to download data for the age interval Y20-29 and the time interval from 1995 to 2015:
emprTB5 <- download_indicator_EUS(
indicator_code= 'lfsa_ergaed',
fromTime = 1995,
toTime = 2015,
gender='T',
ageInterv = 'Y20-29',
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=1)
or similarly for the age interval Y15-64:
emprTB6 <- download_indicator_EUS(
indicator_code= 'lfsa_ergaed',
fromTime = 2005,
toTime = 2015,
gender='T',
ageInterv = 'Y15-64',
countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
rawDump=FALSE, uniqueIdentif=1)
emprTB6
Note that the ageInterv argument is specified as a string.
Next, it is convenient to assign a meaningful name to the locally downloaded data for the Employment rate indicator. Among the datasets downloaded above, we choose emprTB5, which relates to both males and females in the age interval “Y20-29”:
As already stated above, the analysis of convergence is performed on clean and imputed data without missing values: a tidy dataset in the format years by countries (see Section 1.1). In this subsection, we describe in detail how to perform imputation of missing values within the convergEU package. It must be emphasized that typical EU indicators are measured over a few years, thus 10% of missing values within a country may already represent a context that requires substantive reasoning before interpreting results after imputation. The user may or may not go ahead with the analysis depending on the considered context.
Note that as explained in the previous Section, the imputation of missing values in the convergEU package requires at least two non missing values for a given country. If the values for a given country are completely missing, the imputation is not covered.
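Before imputing, it can help to quantify how much is missing for each country and whether the requirement of at least two non-missing values is met. The following base R sketch does this on a small extract: the NL and DE values are the first ten years printed for empTBFin in this Section, while the column XX is a hypothetical country with no observed values, added only to show the non-imputable case.

```r
# Small extract in the "time by countries" format; XX is hypothetical.
toy <- data.frame(
  time = 1995:2004,
  NL   = c(NA, 65.7, 69.4, 71.5, 71.6, 72.7, 75.8, 74.2, 71.2, 73.6),
  DE   = c(56.5, 54.9, 54.9, NA, 61.7, 61.3, 59.3, 57.3, 55.2, 51.5),
  XX   = NA_real_
)
countries <- setdiff(names(toy), "time")

# Number of observed values and share of missing values per country.
n_obs      <- colSums(!is.na(toy[countries]))
share_miss <- colMeans(is.na(toy[countries]))

# A country can be imputed only with at least two non-missing values.
overview <- data.frame(MS = countries, n_obs = n_obs,
                       share_miss = share_miss, imputable = n_obs >= 2,
                       row.names = NULL)
print(overview)
```

Whether a given share of missing values is acceptable remains a substantive judgement that depends on the indicator and on the number of years available.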
The final downloaded dataset for the Employment rate indicator is empTBFin:
empTBFin
#> # A tibble: 21 x 30
#> age sex time BE DK FR DE EL IE IT LU NL PT
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-… T 1995 55.5 57.6 53.9 56.5 61.6 56.6 53.8 70.2 NA 73.5
#> 2 Y20-… T 1996 55 67.5 51.1 54.9 61.6 56.3 52.3 69.2 65.7 73.9
#> 3 Y20-… T 1997 51.3 67.4 48.1 54.9 62.4 58.5 51.9 72.3 69.4 75
#> 4 Y20-… T 1998 53.8 70.8 51.4 NA 63.7 NA 53.9 NA 71.5 79.3
#> 5 Y20-… T 1999 56 69.7 51.3 61.7 63.4 65.9 54 70.5 71.6 76.8
#> 6 Y20-… T 2000 57.6 71.3 52.4 61.3 63.2 66.2 53.6 75.4 72.7 78.5
#> 7 Y20-… T 2001 50.1 70.9 52.5 59.3 63 66.3 54.6 66.6 75.8 80.1
#> 8 Y20-… T 2002 51.9 68.6 51.8 57.3 64.5 62.5 56.7 68.3 74.2 79.6
#> 9 Y20-… T 2003 50.4 64.2 54.5 55.2 66.6 61.4 56.6 62.5 71.2 76.8
#> 10 Y20-… T 2004 49.2 66.8 53.5 51.5 64.4 59.4 59.9 59.2 73.6 76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> # SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> # MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>
Inspection of the print output reveals that missing values are present. Alternatively, the check_data() function may be called to check for the presence of unsuitable features that must be resolved before starting the analysis (e.g. missing values, string variables).
check_data(empTBFin, timeName = "time")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: one or more missing values in the dataframe."
which reveals the presence of missing values. Thus, imputation of the missing values should be performed prior to the analysis of convergence, through the impute_dataset function:
The output of the impute_dataset is a list with the usual three components: “$res”, “$msg” and “$err”. It must be noted that for starting or final missing values, the impute_dataset function provides two options for imputation:
they could be completely deleted, i.e. “cut” option in the arguments tailMiss (missing values starting at the oldest year) and headMiss (missing values ending at the last year), or
they could be cloned, i.e. “constant” option in the arguments tailMiss and headMiss.
For all other missing values within the dataset, deterministic linear imputation is applied in order to obtain an imputed dataset. The examples below help to better explain the missing values imputation procedure with the options “cut” and “constant”. First, let’s impute the starting missing values for the empTBFin dataset with the option “constant”:
EmpImp <- impute_dataset(empTBFin, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])$res
EmpImp
#> # A tibble: 21 x 30
#> age sex time BE DK FR DE EL IE IT LU NL PT
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-… T 1995 55.5 57.6 53.9 56.5 61.6 56.6 53.8 70.2 65.7 73.5
#> 2 Y20-… T 1996 55 67.5 51.1 54.9 61.6 56.3 52.3 69.2 65.7 73.9
#> 3 Y20-… T 1997 51.3 67.4 48.1 54.9 62.4 58.5 51.9 72.3 69.4 75
#> 4 Y20-… T 1998 53.8 70.8 51.4 58.3 63.7 62.2 53.9 71.4 71.5 79.3
#> 5 Y20-… T 1999 56 69.7 51.3 61.7 63.4 65.9 54 70.5 71.6 76.8
#> 6 Y20-… T 2000 57.6 71.3 52.4 61.3 63.2 66.2 53.6 75.4 72.7 78.5
#> 7 Y20-… T 2001 50.1 70.9 52.5 59.3 63 66.3 54.6 66.6 75.8 80.1
#> 8 Y20-… T 2002 51.9 68.6 51.8 57.3 64.5 62.5 56.7 68.3 74.2 79.6
#> 9 Y20-… T 2003 50.4 64.2 54.5 55.2 66.6 61.4 56.6 62.5 71.2 76.8
#> 10 Y20-… T 2004 49.2 66.8 53.5 51.5 64.4 59.4 59.9 59.2 73.6 76.1
#> # … with 11 more rows, and 17 more variables: ES <dbl>, AT <dbl>, FI <dbl>,
#> # SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>, HU <dbl>, LV <dbl>, LT <dbl>,
#> # MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>, BG <dbl>, RO <dbl>, HR <dbl>
As an example, consider the MS Netherlands (“NL”), which has a starting missing value for the year 1995:
print(EmpImp,n=7,width=250)
#> # A tibble: 21 x 30
#> age sex time BE DK FR DE EL IE IT LU NL PT
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-29 T 1995 55.5 57.6 53.9 56.5 61.6 56.6 53.8 70.2 65.7 73.5
#> 2 Y20-29 T 1996 55 67.5 51.1 54.9 61.6 56.3 52.3 69.2 65.7 73.9
#> 3 Y20-29 T 1997 51.3 67.4 48.1 54.9 62.4 58.5 51.9 72.3 69.4 75
#> 4 Y20-29 T 1998 53.8 70.8 51.4 58.3 63.7 62.2 53.9 71.4 71.5 79.3
#> 5 Y20-29 T 1999 56 69.7 51.3 61.7 63.4 65.9 54 70.5 71.6 76.8
#> 6 Y20-29 T 2000 57.6 71.3 52.4 61.3 63.2 66.2 53.6 75.4 72.7 78.5
#> 7 Y20-29 T 2001 50.1 70.9 52.5 59.3 63 66.3 54.6 66.6 75.8 80.1
#> ES AT FI SE CY CZ EE HU LV LT MT PL SK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 53.4 68.6 44.6 53.1 74.6 45.2 49.4 42.8 53.3 53.9 72.2 41.5 24.5
#> 2 53.4 67 47.2 49.2 74.6 45.2 49.4 42.8 53.3 53.9 72.2 41.5 24.5
#> 3 57.3 62.3 44.3 49.8 74.6 45.2 49.4 42.8 53.3 53.9 72.2 41.5 24.5
#> 4 59.6 65.1 54.6 46.9 74.6 45.2 52.2 42.7 53.3 53.9 72.2 40.4 24.5
#> 5 65.2 65.1 57.9 50.8 74.6 40.1 51.7 43.6 54.5 50.9 72.2 32.7 22.3
#> 6 67.3 63.2 58.4 54.1 74.1 38.9 46.7 44 50 46 72.2 36 19
#> 7 68.4 61.5 56.3 60.2 77.5 41.6 53.7 45.1 55.5 44.3 73.6 35.6 19
#> SI BG RO HR
#> <dbl> <dbl> <dbl> <dbl>
#> 1 70 32.1 60.9 52.8
#> 2 70 32.1 60.9 52.8
#> 3 67.5 32.1 60.9 52.8
#> 4 62.3 32.1 61.3 52.8
#> 5 55.6 32.1 64.3 52.8
#> 6 54 32.1 65.1 52.8
#> 7 56.4 31.9 65.7 52.8
#> # … with 14 more rows
Note that in the print output above, which shows the first seven years, the starting missing value for the Netherlands at time 1995 is imputed as equal to the value of its first observed year (i.e. 1996). Similarly, the first two missing values for Romania (times 1995 and 1996) are imputed as equal to the value of the year 1997 (its first observed year). The same holds for the other MS with starting missing values (“CY”, “CZ”, “BG” etc.).
Now, we are going to impute the missing values with the option “cut”:
EmpImp1 <- impute_dataset(empTBFin, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
tailMiss = c("cut", "constant")[1],
headMiss = c("cut", "constant")[1])$res
print(EmpImp1,n=35,width=250)
#> # A tibble: 14 x 30
#> age sex time BE DK FR DE EL IE IT LU NL
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y20-29 T 2002 51.9 68.6 51.8 57.3 64.5 62.5 56.7 68.3 74.2
#> 2 Y20-29 T 2003 50.4 64.2 54.5 55.2 66.6 61.4 56.6 62.5 71.2
#> 3 Y20-29 T 2004 49.2 66.8 53.5 51.5 64.4 59.4 59.9 59.2 73.6
#> 4 Y20-29 T 2005 51.1 68 52.1 52.1 66.4 60 58.3 59.4 67.2
#> 5 Y20-29 T 2006 51 70 52.9 54.2 68.2 62.4 58.2 57.3 68.6
#> 6 Y20-29 T 2007 51.2 70.6 53.7 55.2 69.2 62.8 57.3 58.4 70.2
#> 7 Y20-29 T 2008 49.8 70.1 53.6 57.5 68.3 56.8 56.1 56.8 70.7
#> 8 Y20-29 T 2009 47.4 67.2 50.8 57 66.4 45.7 52.8 54.4 69
#> 9 Y20-29 T 2010 47.2 59.5 49.5 57.4 60.3 39.8 49 49.8 66.3
#> 10 Y20-29 T 2011 46.5 56.9 49.7 58.6 52.3 34.5 48.1 50 67.7
#> 11 Y20-29 T 2012 46.9 51.6 48.7 57 45.6 34.1 46.3 51.5 66.5
#> 12 Y20-29 T 2013 42.9 51.2 47.5 57.4 39.3 38.6 41.2 50 62.5
#> 13 Y20-29 T 2014 43.2 51.4 45.8 56.3 42.3 31.9 39.8 44 59.5
#> 14 Y20-29 T 2015 40.3 51.6 43 56.3 42.9 37.1 39.8 56.6 61.3
#> PT ES AT FI SE CY CZ EE HU LV LT MT PL
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 79.6 68.4 59.8 56.6 62.9 79.3 39.2 48.9 44.9 51.8 52.9 74.9 30.8
#> 2 76.8 69.1 57.1 53.7 57.6 75.4 35.2 38 43.5 52.1 54.2 69.6 30.6
#> 3 76.1 69.6 56.1 52.1 53.4 78 33 55.3 40.9 57.1 53.3 70.1 30.3
#> 4 75.4 70.6 56.1 55.5 53.8 77.1 32.1 50.9 40.2 55.7 57.5 66.9 32.6
#> 5 75.9 72.3 57 54.7 55.2 79.9 34.8 61.2 39.7 61.3 53.9 70.6 34.9
#> 6 75.1 71.9 62.4 58.5 58.7 78.4 38.5 60.1 37.8 61.7 53.9 71.7 39.6
#> 7 75.6 65.7 59.7 60.5 56.9 74.9 38 65.2 37.1 58.4 49.4 72.9 42.9
#> 8 71.1 54.6 55.9 52.4 49.8 74.3 34.5 47.9 32.1 40.9 38.8 68.3 42.1
#> 9 66.4 52.3 55.3 49.4 48 77.1 34.5 43.4 31.2 43.9 24.5 68.7 37
#> 10 64 49.5 59.3 46.9 49.1 73.1 31.6 53.8 30.8 45.3 28.3 69.9 35.9
#> 11 59.7 43.4 58 47.1 47.2 62 29.9 47.5 29.5 49.9 36.1 70.5 36.4
#> 12 55.1 41.4 54.5 43.9 44.1 58.7 36.4 55.5 31.7 49 36.1 72.1 34
#> 13 57.3 42.6 54.6 42.2 44.8 63.7 37 57.5 40.7 48.9 42 70.4 34.1
#> 14 60.6 44.2 56.4 38 46.3 56.9 31.6 54.5 43.3 55.7 41.1 67.8 36.2
#> SK SI BG RO HR
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.7 51.6 32.1 58.8 52.8
#> 2 19.9 51.1 32.2 57.9 48.1
#> 3 19.9 49.6 36.6 54.9 53.3
#> 4 15.1 48.5 33.1 53.4 51.8
#> 5 17.6 46.3 34.6 52.2 48.2
#> 6 19.5 54.8 39.1 51.8 47.9
#> 7 22.5 56.7 40.8 51.6 50.6
#> 8 19 50.7 36.1 49.2 42.7
#> 9 20 47.7 30.8 54.6 41.4
#> 10 23.2 44.3 25.8 50.6 44.7
#> 11 21.9 40.3 26 53 31.4
#> 12 24.2 38.6 26.4 53.4 26.6
#> 13 27.1 38.2 28.9 53.5 23.3
#> 14 30.7 43.6 29.7 54.7 28.7
The print output shows how the option “cut” actually handles starting missing values. More precisely, note that the years from 1995 to 2001 are now deleted for all countries. This is because Croatia has initial missing values from 1995 to 2001.
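The effect of “cut” on starting missing values can be mimicked with a small base R sketch. It is an illustration only, not the implementation used by impute_dataset; the data frame toy is hypothetical and the DE values before 2002 are made up.

```r
# Toy data: HR is missing for the first three years.
toy <- data.frame(time = 1999:2004,
                  DE = c(57.0, 56.2, 57.1, 57.3, 55.2, 51.5),
                  HR = c(NA, NA, NA, 52.8, 48.1, 53.3))
countries <- c("DE", "HR")

# Row index of the first observed value, per country.
first_obs <- sapply(toy[countries], function(x) which(!is.na(x))[1])

# Keep only the rows from the latest of those first observations onward,
# so that a single country with leading NAs shortens every series.
toy_cut <- toy[seq(max(first_obs), nrow(toy)), ]
print(toy_cut)
```

A single country with a late start therefore shortens the time window for every country, which is the price paid for not inventing values.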
For all other missing values within the dataset, deterministic linear imputation is applied. Note also that if a country is processed but has no missing values, its values are left unchanged. If more information is needed on the years in which missing values occur for a given country, say HR, the following simple code helps:
select(filter(empTBFin, is.na(HR)),time,HR)
#> # A tibble: 7 x 2
#> time HR
#> <dbl> <dbl>
#> 1 1995 NA
#> 2 1996 NA
#> 3 1997 NA
#> 4 1998 NA
#> 5 1999 NA
#> 6 2000 NA
#> 7 2001 NA
or by using pipes:
empTBFin %>%
filter(is.na(HR)) %>%
select(time,HR)
#> # A tibble: 7 x 2
#> time HR
#> <dbl> <dbl>
#> 1 1995 NA
#> 2 1996 NA
#> 3 1997 NA
#> 4 1998 NA
#> 5 1999 NA
#> 6 2000 NA
#> 7 2001 NA
Once the missing values are imputed, the check_data function is called again on the imputed data:
check_data(EmpImp)
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: qualitative variables in the dataframe."
where the offending qualitative variables are age and sex; these two variables are therefore removed as follows:
EmpImpF <- dplyr::select(EmpImp,-sex,-age)
check_data(EmpImpF)
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Now, the check_data function confirms that EmpImpF is the final object on which the analysis of convergence should be performed.
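The kind of conditions verified at this stage can be sketched with a toy function in base R. This is not the source of check_data: the function toy_check, its messages and the order of its checks are assumptions made for illustration.

```r
# Hypothetical checker: time variable present, only numeric columns, no NA.
toy_check <- function(df, timeName = "time") {
  if (!timeName %in% names(df)) return("time variable missing")
  if (!all(vapply(df, is.numeric, logical(1)))) return("qualitative variables present")
  if (any(is.na(df))) return("missing values present")
  TRUE
}

# Toy data with two factor columns, similar in shape to the imputed dataset.
with_factors <- data.frame(age = factor("Y20-29"), sex = factor("T"),
                           time = 2002:2003, BE = c(51.9, 50.4))
print(toy_check(with_factors))

# Dropping the qualitative columns lets the check pass.
clean <- with_factors[, !(names(with_factors) %in% c("age", "sex"))]
print(toy_check(clean))
```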
Once the dataset is clean and imputed, the analysis of convergence is performed through the functions discussed in the previous Section. Therefore, the four measures of convergence are calculated for the EmpImpF dataset as follows (details in the Section above):
require(ggplot2)
require(dplyr)
require(tibble)
EmpBC <- beta_conv(EmpImpF,
time_0 = 2005,
time_t = 2015,
all_within = FALSE,
timeName = "time")
gamma_conv(EmpImpF,ref=2005,last=2015,timeName="time")
gamma_conv_msteps(EmpImpF,startTime=2005,endTime=2015, timeName = "time")
and
The syntax to obtain country fiches for the EmpImpF dataset is as follows:
go_ms_fi(
workDF ='EmpImpF',
countryRef ='DE',
otherCountries = "c('IT','ES','FR')",
time_0 = 2005,
time_t = 2015,
tName = 'time',
indiType = 'highBest',
aggregation= 'EU27_2020',
x_angle= 45,
dataNow= Sys.time(),
author = 'A.Student',
outFile = 'Germany-up2-2016-EMP',
outDir = 'F:/analysis/EU27_2020',
indiName= 'EmpImpF'
)
As noted previously, the dataset is passed by name as a string, e.g. workDF = 'EmpImpF'. The country of reference is Germany (“DE”), which is compared with the other countries “IT”, “ES” and “FR”, according to the highBest type of indicator.
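To see why passing names as strings is workable, consider the following base R sketch. How convergEU resolves these strings internally is not shown in this tutorial, so the helper resolve_args and the object EmpToy are purely hypothetical.

```r
# Toy dataset standing in for the real one (illustrative values).
EmpToy <- data.frame(time = c(2005, 2015), DE = c(52.1, 56.3), IT = c(58.3, 39.8))

# Hypothetical helper: turn a dataset name and a quoted vector into R objects.
resolve_args <- function(workDF, otherCountries) {
  caller <- parent.frame()                       # where the dataset lives
  list(data = get(workDF, envir = caller),       # look the object up by name
       countries = eval(parse(text = otherCountries)))  # parse "c('IT',...)"
}

args <- resolve_args(workDF = "EmpToy", otherCountries = "c('IT','ES','FR')")
print(args$countries)
print(args$data)
```

Because only the names travel as text, they can later be written into a generated report script, where the objects are looked up again.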
For advanced users: if the selection of countries is different from a standard aggregate (EU12, EU19 etc.) the argument “custom” can be used.
Indicators fiches for the EmpImpF dataset are obtained through the following syntax:
go_indica_fi(
time_0 = 2005,
time_t = 2015,
timeName = 'time',
workDF = 'EmpImpF' ,
indicaT = 'lfsa_ergaed',
indiType = c('highBest','lowBest')[1],
seleMeasure = 'all',
seleAggre = 'EU27_2020',
x_angle = 45,
data_res_download = FALSE,
auth = 'A.Student',
dataNow = Sys.time(),
outFile = 'EmpImp',
outDir = 'C:/Users/UTENTE/Desktop/PROJECT EUROFOUND/FI'
)
where the indicator “lfsa_ergaed” is of type highBest, and all the measures are considered for the analysis of convergence.
In this Section, we describe in detail the procedure to download data from the Eurostat web site (https://ec.europa.eu/eurostat/data/database). The function down_lo_EUS is specifically implemented to achieve this goal in the convergEU package. In what follows, the procedure is described taking as an example the indicator “hlth_silc_18”, which concerns self-perceived health by sex, age and degree of urbanisation. For further details on this indicator see: https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=hlth_silc_18&lang=en.
The data are downloaded from the Eurostat website (https://ec.europa.eu/eurostat/data/database) on the fly, which requires an active Internet connection.
A list of covariates for each indicator is sometimes present besides age and gender, thus their values must be deleted to produce a tidy dataset in the format time by countries. A given indicator is downloaded from the Eurostat web site through the function down_lo_EUS:
IMPORTANT: Users exploiting automated access to the database through web services or bulk download should pay particular attention to changes of labels. Further information about EU aggregates is available at “https://ec.europa.eu/eurostat/help/faq/brexit”.
To download the data, a time interval should also be chosen. The function also allows selecting gender, age and other variables depending on the specific indicator. Note that when the argument rawDump is TRUE, raw data are downloaded, which does not produce a tidy dataset; to obtain a tidy dataset in the format years by countries, rawDump=F should be specified. Note also that all indicators are supported and that, after downloading, the data are not filtered by member countries (geo) and/or EU clusters. Therefore, it is up to the user to proceed with further filtering/selection so that the desired collection of countries is obtained.
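The difference between the raw (long) layout and the tidy years by countries layout can be illustrated with base R. The sketch below is not what down_lo_EUS does internally; raw_toy is a hypothetical extract in which the covariates have already been fixed to a single combination.

```r
# Long layout: one row per country and year.
raw_toy <- data.frame(
  geo    = c("AT", "BE", "AT", "BE"),
  time   = c(2017, 2017, 2018, 2018),
  values = c(2.6, 1.5, 1.7, 1.3)
)

# Wide layout: one row per year, one column per country.
wide <- reshape(raw_toy, idvar = "time", timevar = "geo", direction = "wide")
names(wide) <- sub("^values\\.", "", names(wide))  # "values.AT" -> "AT"
rownames(wide) <- NULL
print(wide)
```

This reshaping is only well defined once every other covariate (sex, age, unit and so on) is held fixed, which is why the download function asks for them.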
In order to download the data for a given indicator, its indicator code is required, and it is available on the Eurostat web site next to each indicator’s name within brackets. For example, the Self-perceived health indicator code is hlth_silc_18, as illustrated in the following Figure by the red arrow.
Figure 1: Indicators code on the Eurostat web site
We choose to download the data for this indicator from 2003 to 2018 by invoking the down_lo_EUS function:
myTBhea <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "T",
ageInterv = NA,
rawDump=T,
uniqueIdentif = 1)
myTBhea
#> # A tibble: 720,216 x 8
#> deg_urb levels sex age unit geo time values
#> <fct> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 DEG1 BAD F Y16-24 PC AT 2018 2.9
#> 2 DEG1 BAD F Y16-24 PC BE 2018 1.1
#> 3 DEG1 BAD F Y16-24 PC BG 2018 0
#> 4 DEG1 BAD F Y16-24 PC CH 2018 1.1
#> 5 DEG1 BAD F Y16-24 PC CY 2018 0.2
#> 6 DEG1 BAD F Y16-24 PC CZ 2018 0
#> 7 DEG1 BAD F Y16-24 PC DE 2018 3.4
#> 8 DEG1 BAD F Y16-24 PC DK 2018 1.1
#> 9 DEG1 BAD F Y16-24 PC EA 2018 1.3
#> 10 DEG1 BAD F Y16-24 PC EA18 2018 1.3
#> # … with 720,206 more rows
Note that raw downloaded data are returned since the argument rawDump is TRUE. To obtain filtered values, rawDump=F should be specified:
myTBheal <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "T",
ageInterv = NA,
rawDump=F,
uniqueIdentif = 1)
myTBheal
#> $res
#> # A tibble: 16 x 43
#> age sex time AT BE BG CH CY CZ DE DK EA EA18
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y16-… T 2003 1.7 1.6 NA NA NA NA NA 0 NA NA
#> 2 Y16-… T 2004 1.3 1.2 NA NA NA NA NA 1.6 NA NA
#> 3 Y16-… T 2005 1.1 1.7 1 NA 0.4 0.1 0.9 2.8 NA 1.1
#> 4 Y16-… T 2006 0.9 1.6 1 NA 0.4 0.3 1 1.3 NA 1
#> 5 Y16-… T 2007 0.7 1.8 1.1 1.6 0.4 0.9 1.2 1.6 NA 1.1
#> 6 Y16-… T 2008 2.5 1.7 0.8 1.1 0.2 1.1 0.8 1.9 1.1 1.1
#> 7 Y16-… T 2009 1.6 1.1 0.6 0.3 0 1.5 0.9 2.6 0.9 0.9
#> 8 Y16-… T 2010 2.4 1.3 0.8 0.3 0.2 0.4 1.4 2.9 1.2 1.2
#> 9 Y16-… T 2011 2.4 1.5 0.7 0.6 0.8 1.6 1 1.7 1.1 1.1
#> 10 Y16-… T 2012 2.3 1.8 0 0.5 1.2 0 1 1.8 1.1 1.1
#> 11 Y16-… T 2013 1.7 1.1 0.5 0.5 1 0 0.8 0.3 1.3 1.3
#> 12 Y16-… T 2014 2.7 2.4 0.9 1.3 0.7 1 0.7 1 1.1 1.1
#> 13 Y16-… T 2015 0.6 1.6 0.4 0.5 0.1 2.1 1.9 1.5 1.4 1.4
#> 14 Y16-… T 2016 0.9 2 0.7 1.6 0.6 1.3 1 0.8 1.3 1.3
#> 15 Y16-… T 2017 2.6 1.5 0.7 1.6 0.2 2.4 1 2.6 0.7 0.7
#> 16 Y16-… T 2018 1.7 1.3 0.1 0.5 0.2 0.5 2.6 0.7 1.3 1.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> # EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> # HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> # MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#>
#> $msg
#> $msg$Age
#> [1] "Age automatically set."
#>
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#>
#> $msg$Further_Conditioning$available_seleTagLs
#> uniqueIdentif tags
#> 1 1 deg_urb: DEG1; levels: BAD; unit: PC
#> 2 2 deg_urb: DEG1; levels: B_VB; unit: PC
#> 3 3 deg_urb: DEG1; levels: FAIR; unit: PC
#> 4 4 deg_urb: DEG1; levels: GOOD; unit: PC
#> 5 5 deg_urb: DEG1; levels: VBAD; unit: PC
#> 6 6 deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7 7 deg_urb: DEG1; levels: VG_G; unit: PC
#> 8 8 deg_urb: DEG2; levels: BAD; unit: PC
#> 9 9 deg_urb: DEG2; levels: B_VB; unit: PC
#> 10 10 deg_urb: DEG2; levels: FAIR; unit: PC
#> 11 11 deg_urb: DEG2; levels: GOOD; unit: PC
#> 12 12 deg_urb: DEG2; levels: VBAD; unit: PC
#> 13 13 deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14 14 deg_urb: DEG2; levels: VG_G; unit: PC
#> 15 15 deg_urb: DEG3; levels: BAD; unit: PC
#> 16 16 deg_urb: DEG3; levels: B_VB; unit: PC
#> 17 17 deg_urb: DEG3; levels: FAIR; unit: PC
#> 18 18 deg_urb: DEG3; levels: GOOD; unit: PC
#> 19 19 deg_urb: DEG3; levels: VBAD; unit: PC
#> 20 20 deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21 21 deg_urb: DEG3; levels: VG_G; unit: PC
#> 22 22 deg_urb: TOTAL; levels: BAD; unit: PC
#> 23 23 deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24 24 deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25 25 deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26 26 deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27 27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28 28 deg_urb: TOTAL; levels: VG_G; unit: PC
#>
#>
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#>
#> $msg$Conditioning$ageInterv
#> [1] Y16-24
#> 17 Levels: Y16-24 Y16-29 Y16-44 Y16-64 Y25-29 Y25-34 Y35-44 Y45-54 ... TOTAL
#>
#> $msg$Conditioning$gender
#> [1] "T"
#>
#>
#>
#> $err
#> NULL
The result is a list with the following components:
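As the print output above shows, the components are named res, msg and err. A typical way of inspecting such a list is sketched below on a toy object; toy_download is a hypothetical stand-in, not an actual download.

```r
# Toy list with the same three component names as the printed result.
toy_download <- list(
  res = data.frame(time = c(2017, 2018), AT = c(2.6, 1.7), BE = c(1.5, 1.3)),
  msg = list(Conditioning = list(indicator_code = "hlth_silc_18", gender = "T")),
  err = NULL
)

print(names(toy_download))

# Use the data only when no error has been recorded.
download_ok <- is.null(toy_download$err)
if (download_ok) {
  print(toy_download$msg$Conditioning$indicator_code)
  print(dim(toy_download$res))
} else {
  print(toy_download$err)
}
```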
Similarly to the download_indicator_EUS function, it is therefore possible to call the same function several times, setting the argument uniqueIdentif to one of the integers in the first column of $msg$Further_Conditioning$available_seleTagLs, to obtain the same indicator under different scales and contexts. For example, for degree 1 of urbanization and level good, and for degree 3 of urbanization and level fair, respectively:
myTBheal1 <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "T",
ageInterv = NA,
rawDump=F,
uniqueIdentif = 4)
myTBheal2 <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "T",
ageInterv = NA,
rawDump=F,
uniqueIdentif = 17)
It is also possible to condition on gender in order to download data only for females or only for males, for example:
myTBheal3 <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "F",
ageInterv = NA,
rawDump=F,
uniqueIdentif = 1)
myTBheal4 <- down_lo_EUS(
indicator_code = "hlth_silc_18",
fromTime = 2003,
toTime = 2018,
gender= "M",
ageInterv = NA,
rawDump=F,
uniqueIdentif = 1)
The user could also download data for a specific age interval; to visualize the available age intervals for the myTBheal data:
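In the print output of myTBheal shown earlier, the age intervals appear as the levels of the factor stored in $msg$Conditioning$ageInterv. Assuming that structure, they can be listed with levels(); the sketch below uses a hypothetical toy object with only a few of the 17 levels.

```r
# Toy factor mimicking the age component of the printed message list.
toy_age <- factor("Y16-24",
                  levels = c("Y16-24", "Y16-29", "Y25-34", "Y35-44", "Y55-64", "TOTAL"))
toy_download <- list(msg = list(Conditioning = list(ageInterv = toy_age)))

# All available age intervals, not just the one currently selected.
available_ages <- levels(toy_download$msg$Conditioning$ageInterv)
print(available_ages)
```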
Thus, to download data, say in the age interval Y35-44 and time interval 2003-2018:
myTBheal5 <- down_lo_EUS(
indicator_code = 'hlth_silc_18',
fromTime = 2003,
toTime = 2018,
gender= 'T',
ageInterv = 'Y35-44',
rawDump=F,
uniqueIdentif = 1)
myTBheal5
#> $res
#> # A tibble: 16 x 43
#> age sex time AT BE BG CH CY CZ DE DK EA EA18
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y35-… T 2003 3.1 4.5 NA NA NA NA NA 4 NA NA
#> 2 Y35-… T 2004 5.5 4.4 NA NA NA NA NA 5.3 NA NA
#> 3 Y35-… T 2005 4.2 4.9 4.7 NA 2 5.6 4.9 5.4 NA 4
#> 4 Y35-… T 2006 2.6 5.3 4.7 NA 2.7 4.4 4.4 3.9 NA 4.1
#> 5 Y35-… T 2007 4 5.2 3.1 3.1 2.8 4.6 3.8 6.9 NA 3.5
#> 6 Y35-… T 2008 4.6 5.5 4.1 2.9 2.3 3.4 2.8 6.2 3 3
#> 7 Y35-… T 2009 4.6 6.6 1.9 2.1 2.1 4 3.4 6.8 3.4 3.4
#> 8 Y35-… T 2010 3.4 5.8 1.6 2.4 1.6 2.7 3.6 2.8 3.2 3.2
#> 9 Y35-… T 2011 5.6 6.5 1.2 1.8 1.3 3.1 3.4 5.5 3.4 3.4
#> 10 Y35-… T 2012 7.3 6.8 0.9 1.6 1.8 3 4.5 4.3 3.7 3.7
#> 11 Y35-… T 2013 5.4 6.9 1 2.4 0.9 3.6 4.5 3.3 3.4 3.4
#> 12 Y35-… T 2014 5.7 6.9 1.6 2.3 2.5 2 5.7 5.5 3.5 3.5
#> 13 Y35-… T 2015 5.6 8.9 1.3 0.9 2.4 2 4.9 1.9 3.2 3.2
#> 14 Y35-… T 2016 2.9 7.9 1.5 6.3 1.3 2.7 4.3 3.6 3.1 3.1
#> 15 Y35-… T 2017 4.4 6.2 1.8 2.3 1.5 1.8 2.9 5.6 2.5 2.5
#> 16 Y35-… T 2018 4 5.4 2 4.2 2.1 1.9 3.5 2.5 2.9 2.9
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> # EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> # HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> # MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#>
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#>
#> $msg$Further_Conditioning$available_seleTagLs
#> uniqueIdentif tags
#> 1 1 deg_urb: DEG1; levels: BAD; unit: PC
#> 2 2 deg_urb: DEG1; levels: B_VB; unit: PC
#> 3 3 deg_urb: DEG1; levels: FAIR; unit: PC
#> 4 4 deg_urb: DEG1; levels: GOOD; unit: PC
#> 5 5 deg_urb: DEG1; levels: VBAD; unit: PC
#> 6 6 deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7 7 deg_urb: DEG1; levels: VG_G; unit: PC
#> 8 8 deg_urb: DEG2; levels: BAD; unit: PC
#> 9 9 deg_urb: DEG2; levels: B_VB; unit: PC
#> 10 10 deg_urb: DEG2; levels: FAIR; unit: PC
#> 11 11 deg_urb: DEG2; levels: GOOD; unit: PC
#> 12 12 deg_urb: DEG2; levels: VBAD; unit: PC
#> 13 13 deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14 14 deg_urb: DEG2; levels: VG_G; unit: PC
#> 15 15 deg_urb: DEG3; levels: BAD; unit: PC
#> 16 16 deg_urb: DEG3; levels: B_VB; unit: PC
#> 17 17 deg_urb: DEG3; levels: FAIR; unit: PC
#> 18 18 deg_urb: DEG3; levels: GOOD; unit: PC
#> 19 19 deg_urb: DEG3; levels: VBAD; unit: PC
#> 20 20 deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21 21 deg_urb: DEG3; levels: VG_G; unit: PC
#> 22 22 deg_urb: TOTAL; levels: BAD; unit: PC
#> 23 23 deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24 24 deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25 25 deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26 26 deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27 27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28 28 deg_urb: TOTAL; levels: VG_G; unit: PC
#>
#>
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#>
#> $msg$Conditioning$ageInterv
#> [1] "Y35-44"
#>
#> $msg$Conditioning$gender
#> [1] "T"
#>
#>
#>
#> $err
#> NULL
or similarly for the age interval Y55-64:
myTBheal6 <- down_lo_EUS(
indicator_code = 'hlth_silc_18',
fromTime = 2003,
toTime = 2018,
gender= 'T',
ageInterv = 'Y55-64',
rawDump=F,
uniqueIdentif = 1)
myTBheal6
#> $res
#> # A tibble: 16 x 43
#> age sex time AT BE BG CH CY CZ DE DK EA EA18
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Y55-… T 2003 4.8 11.1 NA NA NA NA NA 9.9 NA NA
#> 2 Y55-… T 2004 9.2 9.2 NA NA NA NA NA 10 NA NA
#> 3 Y55-… T 2005 8.5 9.6 18.8 NA 8.7 13.8 13 10.9 NA 11.7
#> 4 Y55-… T 2006 7.8 10.3 18.8 NA 11 13.4 12.2 10.2 NA 11.3
#> 5 Y55-… T 2007 12.6 10.3 16 4.5 10.4 14.7 11 8.8 NA 11.2
#> 6 Y55-… T 2008 10 8.5 16.4 4.1 10.6 12.8 10.6 9.2 9.9 10
#> 7 Y55-… T 2009 10.1 10.1 13.2 4.4 12.5 11.8 10.9 4.6 9.6 9.6
#> 8 Y55-… T 2010 13.1 9.3 11.4 5.1 10.1 13.5 10 6.7 8.9 9
#> 9 Y55-… T 2011 12.4 9.6 10.2 4.2 8.5 11.9 9.3 7.9 9.3 9.3
#> 10 Y55-… T 2012 12.5 11.4 10.7 6.5 7.8 11.6 10.6 11.5 9.6 9.6
#> 11 Y55-… T 2013 10.7 11 9.2 6.1 10.7 12 9.8 11.4 9.5 9.6
#> 12 Y55-… T 2014 10 14.5 7.5 3.8 7.8 10.8 11.8 9.8 10 9.9
#> 13 Y55-… T 2015 14.4 12.2 8.1 7.3 6.2 9.9 12.6 10.9 10.3 10.2
#> 14 Y55-… T 2016 10.4 12 7.4 5.7 4.7 12.8 12.2 7.6 8.8 8.8
#> 15 Y55-… T 2017 11.1 10.2 8 5.7 4.6 13.5 9.6 9.9 7.5 7.5
#> 16 Y55-… T 2018 9.8 7.8 7.9 7.9 3.7 9.6 12 9.6 8.3 8.3
#> # … with 30 more variables: EA19 <dbl>, EE <dbl>, EL <dbl>, ES <dbl>, EU <dbl>,
#> # EU27_2007 <dbl>, EU27_2020 <dbl>, EU28 <dbl>, FI <dbl>, FR <dbl>, HR <dbl>,
#> # HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MK <dbl>,
#> # MT <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>, IS <dbl>
#>
#> $msg
#> $msg$Further_Conditioning
#> $msg$Further_Conditioning$current
#> [1] "Selected uniqueIdentif = 1 -> deg_urb: DEG1; levels: BAD; unit: PC"
#>
#> $msg$Further_Conditioning$available_seleTagLs
#> uniqueIdentif tags
#> 1 1 deg_urb: DEG1; levels: BAD; unit: PC
#> 2 2 deg_urb: DEG1; levels: B_VB; unit: PC
#> 3 3 deg_urb: DEG1; levels: FAIR; unit: PC
#> 4 4 deg_urb: DEG1; levels: GOOD; unit: PC
#> 5 5 deg_urb: DEG1; levels: VBAD; unit: PC
#> 6 6 deg_urb: DEG1; levels: VGOOD; unit: PC
#> 7 7 deg_urb: DEG1; levels: VG_G; unit: PC
#> 8 8 deg_urb: DEG2; levels: BAD; unit: PC
#> 9 9 deg_urb: DEG2; levels: B_VB; unit: PC
#> 10 10 deg_urb: DEG2; levels: FAIR; unit: PC
#> 11 11 deg_urb: DEG2; levels: GOOD; unit: PC
#> 12 12 deg_urb: DEG2; levels: VBAD; unit: PC
#> 13 13 deg_urb: DEG2; levels: VGOOD; unit: PC
#> 14 14 deg_urb: DEG2; levels: VG_G; unit: PC
#> 15 15 deg_urb: DEG3; levels: BAD; unit: PC
#> 16 16 deg_urb: DEG3; levels: B_VB; unit: PC
#> 17 17 deg_urb: DEG3; levels: FAIR; unit: PC
#> 18 18 deg_urb: DEG3; levels: GOOD; unit: PC
#> 19 19 deg_urb: DEG3; levels: VBAD; unit: PC
#> 20 20 deg_urb: DEG3; levels: VGOOD; unit: PC
#> 21 21 deg_urb: DEG3; levels: VG_G; unit: PC
#> 22 22 deg_urb: TOTAL; levels: BAD; unit: PC
#> 23 23 deg_urb: TOTAL; levels: B_VB; unit: PC
#> 24 24 deg_urb: TOTAL; levels: FAIR; unit: PC
#> 25 25 deg_urb: TOTAL; levels: GOOD; unit: PC
#> 26 26 deg_urb: TOTAL; levels: VBAD; unit: PC
#> 27 27 deg_urb: TOTAL; levels: VGOOD; unit: PC
#> 28 28 deg_urb: TOTAL; levels: VG_G; unit: PC
#>
#>
#> $msg$Conditioning
#> $msg$Conditioning$indicator_code
#> [1] "hlth_silc_18"
#>
#> $msg$Conditioning$ageInterv
#> [1] "Y55-64"
#>
#> $msg$Conditioning$gender
#> [1] "T"
#>
#>
#>
#> $err
#> NULL
Note that the age interval is specified in the ageInterv argument as a string.
Once the data are downloaded, we assign them a meaningful name. Among the datasets downloaded above, we choose myTBheal5, which relates to both males and females aged 35 to 44:
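The object TBheal used in what follows is presumably the res component of the chosen download. That is an assumption, since the assignment itself is not shown here. The idea is sketched below on a hypothetical toy list, with the names myTBheal5_toy and TBheal_toy chosen for illustration.

```r
# Toy stand-in for a download made with rawDump = FALSE (illustrative values).
myTBheal5_toy <- list(
  res = data.frame(time = c(2017, 2018), AT = c(4.4, 4.0), BE = c(6.2, 5.4)),
  msg = NULL,
  err = NULL
)

# Keep only the tidy data under a shorter, more meaningful name.
TBheal_toy <- myTBheal5_toy$res
print(TBheal_toy)
```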
It is worth remembering that the data are not filtered by member countries or EU clusters. Therefore, some further data processing is needed if the user wishes to obtain the data only for a given collection of countries. For example, to select only the cluster EU27_2020 of MS:
EU27estr <- convergEU_glb()$EU27_2020$memberStates$codeMS
# all_of() selects the columns named in an external character vector
TBhealth <- dplyr::select(TBheal, time, dplyr::all_of(EU27estr))
TBhealth
#> # A tibble: 16 x 28
#> time BE DK FR DE EL IE IT LU NL PT ES AT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 4.5 4 NA NA 1.6 2.2 NA 6.6 NA NA NA 3.1
#> 2 2004 4.4 5.3 3.9 NA 1.1 1.8 3.2 5.2 NA 5.5 3.5 5.5
#> 3 2005 4.9 5.4 4.4 4.9 1.6 2.8 2.6 3.5 3 7.1 3.5 4.2
#> 4 2006 5.3 3.9 4.9 4.4 2.3 0.9 3.1 3.2 4 5.2 4 2.6
#> 5 2007 5.2 6.9 3.1 3.8 1.6 3 3 1.9 3.4 5.8 3.5 4
#> 6 2008 5.5 6.2 3.4 2.8 2.2 2.8 2.6 5.5 3.2 3.5 2.7 4.6
#> 7 2009 6.6 6.8 3.4 3.4 1.4 4 2.7 6.6 3.5 4.8 3.2 4.6
#> 8 2010 5.8 2.8 3.8 3.6 1.1 2.6 2.2 4.3 2.9 5.3 3 3.4
#> 9 2011 6.5 5.5 4.3 3.4 1.9 1.9 3.4 5.3 2 5 2.3 5.6
#> 10 2012 6.8 4.3 4.9 4.5 1.8 2.9 3.1 1.6 3.9 3.2 2.3 7.3
#> 11 2013 6.9 3.3 4.1 4.5 1.4 0.6 3.2 1.9 3.3 5 1.7 5.4
#> 12 2014 6.9 5.5 3.7 5.7 2.4 2.5 3.3 4.7 3.5 2.9 1.8 5.7
#> 13 2015 8.9 1.9 2.9 4.9 2 1.7 3.2 3.4 2.8 4 1.6 5.6
#> 14 2016 7.9 3.6 5 4.3 2.4 0.9 1.8 3.8 1.9 3.3 1.2 2.9
#> 15 2017 6.2 5.6 3.6 2.9 1.8 1.6 1 3.7 2.8 3.5 1.4 4.4
#> 16 2018 5.4 2.5 4.3 3.5 2.2 3.2 1.2 3.1 2.3 3.4 1.8 4
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> # HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> # BG <dbl>, RO <dbl>, HR <dbl>
or by using pipes:
TBheal %>%
as_tibble %>%
select(time, all_of(EU27estr))
#> # A tibble: 16 x 28
#> time BE DK FR DE EL IE IT LU NL PT ES AT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 4.5 4 NA NA 1.6 2.2 NA 6.6 NA NA NA 3.1
#> 2 2004 4.4 5.3 3.9 NA 1.1 1.8 3.2 5.2 NA 5.5 3.5 5.5
#> 3 2005 4.9 5.4 4.4 4.9 1.6 2.8 2.6 3.5 3 7.1 3.5 4.2
#> 4 2006 5.3 3.9 4.9 4.4 2.3 0.9 3.1 3.2 4 5.2 4 2.6
#> 5 2007 5.2 6.9 3.1 3.8 1.6 3 3 1.9 3.4 5.8 3.5 4
#> 6 2008 5.5 6.2 3.4 2.8 2.2 2.8 2.6 5.5 3.2 3.5 2.7 4.6
#> 7 2009 6.6 6.8 3.4 3.4 1.4 4 2.7 6.6 3.5 4.8 3.2 4.6
#> 8 2010 5.8 2.8 3.8 3.6 1.1 2.6 2.2 4.3 2.9 5.3 3 3.4
#> 9 2011 6.5 5.5 4.3 3.4 1.9 1.9 3.4 5.3 2 5 2.3 5.6
#> 10 2012 6.8 4.3 4.9 4.5 1.8 2.9 3.1 1.6 3.9 3.2 2.3 7.3
#> 11 2013 6.9 3.3 4.1 4.5 1.4 0.6 3.2 1.9 3.3 5 1.7 5.4
#> 12 2014 6.9 5.5 3.7 5.7 2.4 2.5 3.3 4.7 3.5 2.9 1.8 5.7
#> 13 2015 8.9 1.9 2.9 4.9 2 1.7 3.2 3.4 2.8 4 1.6 5.6
#> 14 2016 7.9 3.6 5 4.3 2.4 0.9 1.8 3.8 1.9 3.3 1.2 2.9
#> 15 2017 6.2 5.6 3.6 2.9 1.8 1.6 1 3.7 2.8 3.5 1.4 4.4
#> 16 2018 5.4 2.5 4.3 3.5 2.2 3.2 1.2 3.1 2.3 3.4 1.8 4
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> # HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> # BG <dbl>, RO <dbl>, HR <dbl>
Similarly, the user can select the cluster EU27_2020 of MS together with other available countries, say Serbia (geo code RS) for example:
TBhealthR <- select(TBheal, time, all_of(EU27estr), RS)
TBhealthR
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES AT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 4.5 4 NA NA 1.6 2.2 NA 6.6 NA NA NA 3.1
#> 2 2004 4.4 5.3 3.9 NA 1.1 1.8 3.2 5.2 NA 5.5 3.5 5.5
#> 3 2005 4.9 5.4 4.4 4.9 1.6 2.8 2.6 3.5 3 7.1 3.5 4.2
#> 4 2006 5.3 3.9 4.9 4.4 2.3 0.9 3.1 3.2 4 5.2 4 2.6
#> 5 2007 5.2 6.9 3.1 3.8 1.6 3 3 1.9 3.4 5.8 3.5 4
#> 6 2008 5.5 6.2 3.4 2.8 2.2 2.8 2.6 5.5 3.2 3.5 2.7 4.6
#> 7 2009 6.6 6.8 3.4 3.4 1.4 4 2.7 6.6 3.5 4.8 3.2 4.6
#> 8 2010 5.8 2.8 3.8 3.6 1.1 2.6 2.2 4.3 2.9 5.3 3 3.4
#> 9 2011 6.5 5.5 4.3 3.4 1.9 1.9 3.4 5.3 2 5 2.3 5.6
#> 10 2012 6.8 4.3 4.9 4.5 1.8 2.9 3.1 1.6 3.9 3.2 2.3 7.3
#> 11 2013 6.9 3.3 4.1 4.5 1.4 0.6 3.2 1.9 3.3 5 1.7 5.4
#> 12 2014 6.9 5.5 3.7 5.7 2.4 2.5 3.3 4.7 3.5 2.9 1.8 5.7
#> 13 2015 8.9 1.9 2.9 4.9 2 1.7 3.2 3.4 2.8 4 1.6 5.6
#> 14 2016 7.9 3.6 5 4.3 2.4 0.9 1.8 3.8 1.9 3.3 1.2 2.9
#> 15 2017 6.2 5.6 3.6 2.9 1.8 1.6 1 3.7 2.8 3.5 1.4 4.4
#> 16 2018 5.4 2.5 4.3 3.5 2.2 3.2 1.2 3.1 2.3 3.4 1.8 4
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> # HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> # BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>
and by using pipes:
TBheal %>%
as_tibble %>%
select(time, all_of(EU27estr), RS)
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES AT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 4.5 4 NA NA 1.6 2.2 NA 6.6 NA NA NA 3.1
#> 2 2004 4.4 5.3 3.9 NA 1.1 1.8 3.2 5.2 NA 5.5 3.5 5.5
#> 3 2005 4.9 5.4 4.4 4.9 1.6 2.8 2.6 3.5 3 7.1 3.5 4.2
#> 4 2006 5.3 3.9 4.9 4.4 2.3 0.9 3.1 3.2 4 5.2 4 2.6
#> 5 2007 5.2 6.9 3.1 3.8 1.6 3 3 1.9 3.4 5.8 3.5 4
#> 6 2008 5.5 6.2 3.4 2.8 2.2 2.8 2.6 5.5 3.2 3.5 2.7 4.6
#> 7 2009 6.6 6.8 3.4 3.4 1.4 4 2.7 6.6 3.5 4.8 3.2 4.6
#> 8 2010 5.8 2.8 3.8 3.6 1.1 2.6 2.2 4.3 2.9 5.3 3 3.4
#> 9 2011 6.5 5.5 4.3 3.4 1.9 1.9 3.4 5.3 2 5 2.3 5.6
#> 10 2012 6.8 4.3 4.9 4.5 1.8 2.9 3.1 1.6 3.9 3.2 2.3 7.3
#> 11 2013 6.9 3.3 4.1 4.5 1.4 0.6 3.2 1.9 3.3 5 1.7 5.4
#> 12 2014 6.9 5.5 3.7 5.7 2.4 2.5 3.3 4.7 3.5 2.9 1.8 5.7
#> 13 2015 8.9 1.9 2.9 4.9 2 1.7 3.2 3.4 2.8 4 1.6 5.6
#> 14 2016 7.9 3.6 5 4.3 2.4 0.9 1.8 3.8 1.9 3.3 1.2 2.9
#> 15 2017 6.2 5.6 3.6 2.9 1.8 1.6 1 3.7 2.8 3.5 1.4 4.4
#> 16 2018 5.4 2.5 4.3 3.5 2.2 3.2 1.2 3.1 2.3 3.4 1.8 4
#> # … with 16 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> # HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> # BG <dbl>, RO <dbl>, HR <dbl>, RS <dbl>
Prior to the analysis of convergence, we have to impute the missing values in the data previously downloaded from the Eurostat web site. The final downloaded dataset is TBhealth, in which the countries are the cluster EU27_2020 of MS. Inspection of the print output in the previous Subsection reveals that missing values are present. Alternatively, we can invoke the check_data() function to check for the presence of unsuited features that must be resolved before starting the analysis (e.g. missing values, string variables).
check_data(TBhealth, timeName = "time")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: one or more missing values in the dataframe."
Thus, we proceed to impute the missing values through the impute_dataset function. Note that the print output of the TBhealth dataset reveals both missing values starting at the oldest year and missing values ending at the last year. Therefore, the arguments tailMiss and headMiss should be specified according to how the user decides to impute the missing values. The example for the TBhealth dataset helps to illustrate this point. Suppose that missing values starting at the oldest year should be propagated (i.e. option constant in the tailMiss argument), while those ending at the last year should be deleted (i.e. option cut in the headMiss argument), that is:
TBhealthF <- impute_dataset(TBhealth, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1])$res
TBhealthF
#> # A tibble: 16 x 28
#> time BE DK FR DE EL IE IT LU NL PT ES AT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 4.5 4 3.9 4.9 1.6 2.2 3.2 6.6 3 5.5 3.5 3.1
#> 2 2004 4.4 5.3 3.9 4.9 1.1 1.8 3.2 5.2 3 5.5 3.5 5.5
#> 3 2005 4.9 5.4 4.4 4.9 1.6 2.8 2.6 3.5 3 7.1 3.5 4.2
#> 4 2006 5.3 3.9 4.9 4.4 2.3 0.9 3.1 3.2 4 5.2 4 2.6
#> 5 2007 5.2 6.9 3.1 3.8 1.6 3 3 1.9 3.4 5.8 3.5 4
#> 6 2008 5.5 6.2 3.4 2.8 2.2 2.8 2.6 5.5 3.2 3.5 2.7 4.6
#> 7 2009 6.6 6.8 3.4 3.4 1.4 4 2.7 6.6 3.5 4.8 3.2 4.6
#> 8 2010 5.8 2.8 3.8 3.6 1.1 2.6 2.2 4.3 2.9 5.3 3 3.4
#> 9 2011 6.5 5.5 4.3 3.4 1.9 1.9 3.4 5.3 2 5 2.3 5.6
#> 10 2012 6.8 4.3 4.9 4.5 1.8 2.9 3.1 1.6 3.9 3.2 2.3 7.3
#> 11 2013 6.9 3.3 4.1 4.5 1.4 0.6 3.2 1.9 3.3 5 1.7 5.4
#> 12 2014 6.9 5.5 3.7 5.7 2.4 2.5 3.3 4.7 3.5 2.9 1.8 5.7
#> 13 2015 8.9 1.9 2.9 4.9 2 1.7 3.2 3.4 2.8 4 1.6 5.6
#> 14 2016 7.9 3.6 5 4.3 2.4 0.9 1.8 3.8 1.9 3.3 1.2 2.9
#> 15 2017 6.2 5.6 3.6 2.9 1.8 1.6 1 3.7 2.8 3.5 1.4 4.4
#> 16 2018 5.4 2.5 4.3 3.5 2.2 3.2 1.2 3.1 2.3 3.4 1.8 4
#> # … with 15 more variables: FI <dbl>, SE <dbl>, CY <dbl>, CZ <dbl>, EE <dbl>,
#> # HU <dbl>, LV <dbl>, LT <dbl>, MT <dbl>, PL <dbl>, SK <dbl>, SI <dbl>,
#> # BG <dbl>, RO <dbl>, HR <dbl>
Thus, as the output reveals, the starting missing values for the corresponding countries (e.g. 2003 and 2004 for Germany) have been replaced by the first observed value. Note that all 16 years, 2018 included, are still present in the printed output, so no year has been dropped here. TBhealthF is therefore the final imputed dataset, on which the check_data function is called again:
and TBhealthF is the final object on which the analysis should be performed.
It may be of interest to assume that part of the variability observed in a country on a given index is not structural, i.e. not due to causal determinants but to transient fluctuations. Furthermore, the interest here is not in prediction but in smoothing the values observed over the whole time interval considered. In such a case, a smoothing procedure removes sudden large changes, yielding a time series that is less variable than the original. Given that short time series (panel data) are considered here, a three-point weighted average is proposed in the convergEU package and implemented in the smoo_dataset function:
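A three-point weighted average of this kind can be sketched in base R as follows. The weighting scheme is inferred from the smoothed values printed later in this tutorial for a leading weight of 0.15 and should be read as an assumption, not as the source of smoo_dataset. The function name smooth3 is hypothetical, and the sketch stops on an invalid weight instead of returning NA.

```r
# Hypothetical smoother: weight w on the current value, the remaining
# (1 - w) shared equally between the two neighbours; at the two ends
# the single available neighbour receives the whole (1 - w).
smooth3 <- function(y, w) {
  stopifnot(w > 0, w <= 1, length(y) >= 2)
  n <- length(y)
  out <- numeric(n)
  out[1] <- w * y[1] + (1 - w) * y[2]
  out[n] <- w * y[n] + (1 - w) * y[n - 1]
  if (n > 2) {
    i <- 2:(n - 1)
    out[i] <- w * y[i] + (1 - w) / 2 * (y[i - 1] + y[i + 1])
  }
  out
}

# First imputed values for Italy, 2003 to 2007, as printed in TBhealthF.
it <- c(3.2, 3.2, 2.6, 3.1, 3.0)
print(smooth3(it, w = 0.15))
print(smooth3(it, w = 1))   # w = 1 leaves the series unchanged
```

Only the first four results are comparable with the tutorial output, because the toy series is truncated at 2007 and its last point is therefore treated as an end point.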
It must be noted that the smoothing is performed on a dataset without missing values, e.g. on the imputed dataset. Note also that the time variable should be deleted when performing the smoothing with the smoo_dataset function. To better illustrate this point, let’s suppose that we want to perform smoothing for the TBhealthF dataset in which the missing values are already imputed. Furthermore, suppose that we choose the following three MS on which the smoothing is to be performed: Italy, France and Germany, coded “IT”, “FR” and “DE” respectively. Thus, we select the data with just these three countries and the time variable:
workTB <- dplyr::select(TBhealthF, time, IT,DE,FR)
check_data(workTB)
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The check_data function confirms that the dataset is ready for performing smoothing. Thus, once checking is passed, we go with the smoothing step after deleting the time variable:
resSM <- smoo_dataset(select(workTB,-time), leadW = 0.15, timeTB= select(workTB,time))
resSM
#> # A tibble: 16 x 4
#> time IT DE FR
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 3.2 4.9 3.9
#> 2 2004 2.94 4.9 4.11
#> 3 2005 3.07 4.69 4.4
#> 4 2006 2.84 4.36 3.92
#> 5 2007 2.87 3.63 3.99
#> 6 2008 2.81 3.48 3.27
#> 7 2009 2.44 3.23 3.57
#> 8 2010 2.92 3.43 3.84
#> 9 2011 2.76 3.95 4.34
#> 10 2012 3.27 4.03 4.30
#> 11 2013 3.20 5.01 4.27
#> 12 2014 3.22 4.85 3.53
#> 13 2015 2.65 4.98 4.13
#> 14 2016 2.06 3.96 3.51
#> 15 2017 1.42 3.75 4.49
#> 16 2018 1.03 2.99 3.70
Note that the first argument is the dataset workTB restricted to the country columns, while the third argument timeTB supplies the time variable, which is added back to the output after smoothing. The argument leadW is the leading positive weight \(w\), where \(0< w \leq 1\). The special case \(w=1\) corresponds to no smoothing. In case of missing values, an NA is returned. If the weight is outside the interval \((0,1]\), then an NA is returned. In our example, the smoothing is first performed with a weight equal to 0.15. To compare the original indicator’s values with the smoothed ones, the following dataset is built:
compaTB <- select(bind_cols(workTB, select(resSM,-time)), time,IT,IT1,DE,DE1, FR, FR1)
compaTB
#> # A tibble: 16 x 7
#> time IT IT1 DE DE1 FR FR1
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 3.2 3.2 4.9 4.9 3.9 3.9
#> 2 2004 3.2 2.94 4.9 4.9 3.9 4.11
#> 3 2005 2.6 3.07 4.9 4.69 4.4 4.4
#> 4 2006 3.1 2.84 4.4 4.36 4.9 3.92
#> 5 2007 3 2.87 3.8 3.63 3.1 3.99
#> 6 2008 2.6 2.81 2.8 3.48 3.4 3.27
#> 7 2009 2.7 2.44 3.4 3.23 3.4 3.57
#> 8 2010 2.2 2.92 3.6 3.43 3.8 3.84
#> 9 2011 3.4 2.76 3.4 3.95 4.3 4.34
#> 10 2012 3.1 3.27 4.5 4.03 4.9 4.30
#> 11 2013 3.2 3.20 4.5 5.01 4.1 4.27
#> 12 2014 3.3 3.22 5.7 4.85 3.7 3.53
#> 13 2015 3.2 2.65 4.9 4.98 2.9 4.13
#> 14 2016 1.8 2.06 4.3 3.96 5 3.51
#> 15 2017 1 1.42 2.9 3.75 3.6 4.49
#> 16 2018 1.2 1.03 3.5 2.99 4.3 3.70
A graphical output then shows the changes for Italy (“IT”), Germany (“DE”) and France (“FR”), with the original index in blue and the smoothed index in red:
require(gridExtra)
ITq<-qplot(time,IT,data=compaTB) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=IT1),colour="red") +
geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq<-qplot(time,DE,data=compaTB) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=DE1),colour="red") +
geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq<-qplot(time,FR,data=compaTB) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=FR1),colour="red") +
geom_point(aes(x=time,y=FR1),colour="red",shape=8)
grid.arrange(ITq, DEq, FRq, nrow=3)
Note that the higher the weight, the less smoothed the indicator’s values are. For example, with \(w=0.5\):
require(gridExtra)
resSM2 <- smoo_dataset(select(workTB,-time), leadW = 0.5, timeTB= select(workTB,time))
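# Explicit names for the smoothed columns, independent of the dplyr version.
compaTB2 <- select(bind_cols(workTB, setNames(select(resSM2,-time), c("IT1","DE1","FR1"))), time,IT,IT1,DE,DE1, FR, FR1)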
ITq2<-qplot(time,IT,data=compaTB2) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=IT1),colour="red") +
geom_point(aes(x=time,y=IT1),colour="red",shape=8)
DEq2<-qplot(time,DE,data=compaTB2) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=DE1),colour="red") +
geom_point(aes(x=time,y=DE1),colour="red",shape=8)
FRq2<-qplot(time,FR,data=compaTB2) +
geom_line(colour="navyblue") +
geom_line(aes(x=time,y=FR1),colour="red") +
geom_point(aes(x=time,y=FR1),colour="red",shape=8)
grid.arrange(ITq2, DEq2, FRq2, nrow=3)
Lastly, note that a weight equal to 1 leaves the data unchanged:
resSM3 <- smoo_dataset(select(workTB,-time),leadW=1,
timeTB= select(workTB,time))
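# Explicit names for the smoothed columns, independent of the dplyr version.
compaTB3 <- select(bind_cols(workTB,
setNames(select(resSM3,-time), c("IT1","DE1","FR1"))),
time,IT,IT1,DE,DE1, FR, FR1)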
ITq3<-qplot(time,IT,data=compaTB3) +
geom_line(colour="navyblue",
position=position_jitter(width=0,height=0.07)) +
geom_line(aes(x=time,y=IT1),colour="red",
position=position_jitter(width=0,
height=0.07)) +
geom_point(aes(x=time,y=IT1),colour="red",
shape=8,position=position_jitter(width=0,
height=0.07))
DEq3<-qplot(time,DE,data=compaTB3) +
geom_line(colour="navyblue",
position=position_jitter(width=0,
height=0.07)) +
geom_line(aes(x=time,y=DE1),colour="red",
position=position_jitter(width=0,
height=0.07)) +
geom_point(aes(x=time,y=DE1),colour="red",shape=8,
position=position_jitter(width=0,
height=0.07))
FRq3<-qplot(time,FR,data=compaTB3) +
geom_line(colour="navyblue",
position=position_jitter(width=0,
height=0.07)) +
geom_line(aes(x=time,y=FR1),colour="red",
position=position_jitter(width=0,
height=0.07)) +
geom_point(aes(x=time,y=FR1),colour="red",
position=position_jitter(width=0,
height=0.07))
grid.arrange(ITq3, DEq3, FRq3, nrow=3)
A time window larger than 3 could be considered, but careful thought should be given to the economic and social changes that may occur in the EU over 5 consecutive years.
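The weighting scheme described above can be sketched in base R. The helper smooth3 below is hypothetical and is not part of convergEU. It assumes that each interior value is replaced by \(w\) times itself plus \((1-w)/2\) times each of its two neighbours, and that an end point gives the whole weight \(1-w\) to its single neighbour. This assumption is consistent with the IT, DE and FR columns printed above for \(w=0.15\), but the actual implementation of smoo_dataset may differ, in particular for the first observation, where the printed data cannot distinguish between alternatives.

```r
# Hypothetical helper, not part of convergEU: three-term weighted smoother.
smooth3 <- function(x, w) {
  # Mirror the documented behaviour: an invalid weight or missing data give NA.
  if (!is.numeric(w) || length(w) != 1 || w <= 0 || w > 1 || anyNA(x)) {
    return(NA)
  }
  n <- length(x)
  if (n < 2) {
    return(x)
  }
  out <- numeric(n)
  # End points have one neighbour only, which takes the whole weight 1 - w
  # (assumption for the first point; it matches the printed last point).
  out[1] <- w * x[1] + (1 - w) * x[2]
  out[n] <- w * x[n] + (1 - w) * x[n - 1]
  if (n > 2) {
    i <- 2:(n - 1)
    out[i] <- w * x[i] + (1 - w) / 2 * (x[i - 1] + x[i + 1])
  }
  out
}

# Original IT values of workTB, copied from the printed compaTB dataset.
it <- c(3.2, 3.2, 2.6, 3.1, 3, 2.6, 2.7, 2.2,
        3.4, 3.1, 3.2, 3.3, 3.2, 1.8, 1, 1.2)

it_smooth <- smooth3(it, 0.15)  # strong smoothing
it_same   <- smooth3(it, 1)     # w = 1 leaves the series unchanged
it_bad    <- smooth3(it, 1.5)   # weight outside (0, 1] gives NA
print(it_smooth)
```

A small weight pulls each value towards the average of its neighbours, which is why the curve for \(w=0.15\) is visibly smoother than the one for \(w=0.5\).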
Several alternative smoothing algorithms are available in R. Classical moving average smoothers are provided by the caTools package. See for example:
In the convergEU package, a smoother based on the moving average is implemented in the function ma_dataset:
where the time window of the moving average is set through the argument kappa. Next, we compute a moving average smoother for the TBhealthF dataset by considering Italy (“IT”), Germany (“DE”) and France (“FR”). Thus, the time by countries dataset is:
We proceed by considering the following time windows of the moving average:
The output is a list whose res component is a standard tidy dataset of smoothed values; for example, the ma4 object with \(k=4\):
ma4
#> $res
#> # A tibble: 16 x 4
#> time IT DE FR
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 3.2 4.9 3.9
#> 2 2004 3.03 4.78 4.28
#> 3 2005 2.98 4.5 4.08
#> 4 2006 2.82 3.98 3.95
#> 5 2007 2.85 3.6 3.7
#> 6 2008 2.62 3.4 3.42
#> 7 2009 2.72 3.3 3.72
#> 8 2010 2.85 3.72 4.1
#> 9 2011 2.98 4 4.28
#> 10 2012 3.25 4.53 4.25
#> 11 2013 3.2 4.9 3.9
#> 12 2014 2.88 4.85 3.92
#> 13 2015 2.33 4.45 3.8
#> 14 2016 1.8 3.9 3.95
#> 15 2017 1 2.9 3.6
#> 16 2018 1.2 3.5 4.3
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The figure below shows the results for Italy, Germany and France for different degrees of smoothing: original (black), \(k=3\) (red), \(k=4\) (blue), \(k=5\) (green):
myGit <- ggplot(cuTB1,aes(x=time,y=cuTB1$IT))+geom_line()+geom_point()+
geom_line(aes(x=time,y=ma3$res$IT),colour="red")+
geom_point(aes(x=time,y=ma3$res$IT),colour="red")+
labs(y="IT")+
#
geom_line(aes(x=time,y=ma4$res$IT),colour="blue")+
geom_point(aes(x=time,y=ma4$res$IT),colour="blue")+
#
geom_line(aes(x=time,y=ma5$res$IT),colour="green")+
geom_point(aes(x=time,y=ma5$res$IT),colour="green")
myGde <- ggplot(cuTB1,aes(x=time,y=cuTB1$DE))+geom_line()+geom_point()+
geom_line(aes(x=time,y=ma3$res$DE),colour="red")+
geom_point(aes(x=time,y=ma3$res$DE),colour="red")+
labs(y="DE")+
#
geom_line(aes(x=time,y=ma4$res$DE),colour="blue")+
geom_point(aes(x=time,y=ma4$res$DE),colour="blue")+
#
geom_line(aes(x=time,y=ma5$res$DE),colour="green")+
geom_point(aes(x=time,y=ma5$res$DE),colour="green")
myGfr <- ggplot(cuTB1,aes(x=time,y=cuTB1$FR))+geom_line()+geom_point()+
geom_line(aes(x=time,y=ma3$res$FR),colour="red")+
geom_point(aes(x=time,y=ma3$res$FR),colour="red")+
labs(y="FR")+
#
geom_line(aes(x=time,y=ma4$res$FR),colour="blue")+
geom_point(aes(x=time,y=ma4$res$FR),colour="blue")+
#
geom_line(aes(x=time,y=ma5$res$FR),colour="green")+
geom_point(aes(x=time,y=ma5$res$FR),colour="green")
grid.arrange(myGit, myGde, myGfr, nrow=3)
Note that the time series is typically so short (for example, 15 observations) that with \(k=7\) many values are smoothed over a different number of observations, since fewer are available at the start and at the end of the series. Therefore, in this context smoothing is typically performed over a maximum of 5 consecutive years.
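The effect of the window size on a short series can be illustrated with a plain centred moving average in base R. This is a generic sketch built on stats::filter, not the algorithm used by ma_dataset, and the helper name centred_ma is hypothetical. Here the positions where a full window is not available are simply returned as NA, which makes it easy to count how many observations are affected for a given \(k\).

```r
# Hypothetical helper, not part of convergEU: centred moving average with an
# odd window k; positions without a full window are returned as NA.
centred_ma <- function(x, k) {
  stopifnot(k %% 2 == 1, k <= length(x))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# IT values of the TBhealthF extract used above (16 yearly observations).
it <- c(3.2, 3.2, 2.6, 3.1, 3, 2.6, 2.7, 2.2,
        3.4, 3.1, 3.2, 3.3, 3.2, 1.8, 1, 1.2)

ma3 <- centred_ma(it, 3)
ma7 <- centred_ma(it, 7)

# Number of observations that lack a full window at the two ends.
edge3 <- sum(is.na(ma3))
edge7 <- sum(is.na(ma7))
print(c(k3 = edge3, k7 = edge7))
```

With 16 observations, a window of 7 leaves 6 of them without a full window, against 2 for a window of 3, which is the practical reason for keeping the window short.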
Users often have data stored in external source files, typically Excel files. In this Section we illustrate in detail how to import data stored in several types of Excel files and prepare them for the analysis of convergence. More precisely, in the following Subsection we describe how to import and process data stored in two types of Excel file:
In Subsection 4.2 we instead describe how to import and process Excel files downloaded directly from the Eurostat web site.
First, let’s suppose that we have data for a given indicator, for example demo_mlexpec (Life expectancy by age and sex), stored locally in an .xlsx or .xls Excel file. The function read_excel in the R package readxl imports data stored in both .xls and .xlsx files:
To import the data from the .xlsx file for the indicator demo_mlexpec, we proceed as follows:
print(myExc1,n=35,width=250)
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES UK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1 77.7
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2 77.7
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79 77.8
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7 78.4
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6 78.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4 78.9
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4 79.1
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8 79.2
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2 79.8
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6 80
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8 80.4
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8 80.3
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4 80.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5 80.7
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2 80.3
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7 80.6
#> AT FI SE CY CZ EE HU LV LT MT PL SK SI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 78.2 77.5 79.2 78.4 74.6 70.5 72.1 NA 71.2 78.2 73.8 73.1 75.7
#> 2 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1 73.3 75.9
#> 3 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2 73.4 75.7
#> 4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4 73.7 76.5
#> 5 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5 73.7 76.8
#> 6 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8 73.9 77.5
#> 7 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8 74 77.6
#> 8 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1 74.4 78.3
#> 9 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3 74.8 78.5
#> 10 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8 75 79
#> 11 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2 75.5 79.4
#> 12 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2 75.7 79.4
#> 13 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5 76 79.7
#> 14 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1 76.4 80.4
#> 15 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9 76.1 80
#> 16 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3 76.7 80.4
#> BG RO HR
#> <dbl> <dbl> <dbl>
#> 1 71.9 71.4 74.1
#> 2 72.1 71.1 74.2
#> 3 72.2 71.2 74
#> 4 72.4 71.6 74.9
#> 5 72.3 72 74.7
#> 6 72.5 72.5 75.3
#> 7 72.7 73 75.2
#> 8 73 73.4 75.3
#> 9 73.4 73.5 75.7
#> 10 73.5 73.5 76.1
#> 11 73.9 74.2 76.5
#> 12 74 74.2 76.6
#> 13 74.5 74.8 77.1
#> 14 74.1 74.6 77.3
#> 15 74.1 74.5 76.8
#> 16 74.4 74.9 77.5
where “excel1.xlsx” is the path (drive and folders) at which the Excel file is stored. The sheet argument specifies the name of the sheet in which the data are stored, named “Data” in our example; if the sheet is not specified, the first sheet is read by default.
Note that by default the read_excel function treats blank cells as missing values. If missing values are marked in any other way, the argument na should be specified accordingly. For example, suppose that the “excel1.xlsx” file (sheet “Datamiss”) contains missing values marked by “.” instead of blank cells. To import the data correctly, we specify the na argument as follows:
print(myExc1,n=35,width=250)
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES UK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1 77.7
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2 77.7
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79 77.8
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7 78.4
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6 78.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4 78.9
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4 79.1
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8 79.2
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2 79.8
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6 80
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8 80.4
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8 80.3
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4 80.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5 80.7
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2 80.3
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7 80.6
#> AT FI SE CY CZ EE HU LV LT MT PL SK SI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 78.2 77.5 79.2 78.4 74.6 70.5 72.1 NA 71.2 78.2 73.8 73.1 75.7
#> 2 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1 73.3 75.9
#> 3 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2 73.4 75.7
#> 4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4 73.7 76.5
#> 5 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5 73.7 76.8
#> 6 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8 73.9 77.5
#> 7 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8 74 77.6
#> 8 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1 74.4 78.3
#> 9 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3 74.8 78.5
#> 10 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8 75 79
#> 11 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2 75.5 79.4
#> 12 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2 75.7 79.4
#> 13 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5 76 79.7
#> 14 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1 76.4 80.4
#> 15 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9 76.1 80
#> 16 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3 76.7 80.4
#> BG RO HR
#> <dbl> <dbl> <dbl>
#> 1 71.9 71.4 74.1
#> 2 72.1 71.1 74.2
#> 3 72.2 71.2 74
#> 4 72.4 71.6 74.9
#> 5 72.3 72 74.7
#> 6 72.5 72.5 75.3
#> 7 72.7 73 75.2
#> 8 73 73.4 75.3
#> 9 73.4 73.5 75.7
#> 10 73.5 73.5 76.1
#> 11 73.9 74.2 76.5
#> 12 74 74.2 76.6
#> 13 74.5 74.8 77.1
#> 14 74.1 74.6 77.3
#> 15 74.1 74.5 76.8
#> 16 74.4 74.9 77.5
The dataset myExc1 contains the data imported from the .xlsx file:
print(myExc1,n=35,width=250)
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES UK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1 77.7
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2 77.7
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79 77.8
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7 78.4
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6 78.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4 78.9
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4 79.1
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8 79.2
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2 79.8
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6 80
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8 80.4
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8 80.3
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4 80.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5 80.7
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2 80.3
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7 80.6
#> AT FI SE CY CZ EE HU LV LT MT PL SK SI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 78.2 77.5 79.2 78.4 74.6 70.5 72.1 NA 71.2 78.2 73.8 73.1 75.7
#> 2 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1 73.3 75.9
#> 3 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2 73.4 75.7
#> 4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4 73.7 76.5
#> 5 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5 73.7 76.8
#> 6 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8 73.9 77.5
#> 7 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8 74 77.6
#> 8 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1 74.4 78.3
#> 9 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3 74.8 78.5
#> 10 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8 75 79
#> 11 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2 75.5 79.4
#> 12 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2 75.7 79.4
#> 13 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5 76 79.7
#> 14 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1 76.4 80.4
#> 15 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9 76.1 80
#> 16 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3 76.7 80.4
#> BG RO HR
#> <dbl> <dbl> <dbl>
#> 1 71.9 71.4 74.1
#> 2 72.1 71.1 74.2
#> 3 72.2 71.2 74
#> 4 72.4 71.6 74.9
#> 5 72.3 72 74.7
#> 6 72.5 72.5 75.3
#> 7 72.7 73 75.2
#> 8 73 73.4 75.3
#> 9 73.4 73.5 75.7
#> 10 73.5 73.5 76.1
#> 11 73.9 74.2 76.5
#> 12 74 74.2 76.6
#> 13 74.5 74.8 77.1
#> 14 74.1 74.6 77.3
#> 15 74.1 74.5 76.8
#> 16 74.4 74.9 77.5
The print output reveals that the dataset is in the required format, time by countries. Next, the data are checked for missing values and other unsuitable features:
check_data(myExc1)
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: one or more missing values in the dataframe."
revealing that missing values are present. Thus, the missing values are imputed by invoking the impute_dataset function:
TBex <- impute_dataset(myExc1, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
headMiss = c("cut", "constant")[2])$res
check_data(TBex)
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The check_data function, called again on the imputed dataset, confirms that TBex is the final dataset on which to perform the analysis of convergence.
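The kind of deterministic imputation used in this step can be sketched in base R with stats::approx. The helper impute_linear below is hypothetical and is not the implementation of impute_dataset: it interpolates interior gaps linearly and, through rule = 2, repeats the nearest observed value at the start and at the end of the series, which is assumed here to resemble the option headMiss = "constant". The toy series uses the first LV values printed above, with one interior gap introduced purely for illustration.

```r
# Hypothetical helper, not part of convergEU: linear interpolation of gaps,
# with the nearest observed value repeated at the start and end of the series.
impute_linear <- function(time, values) {
  ok <- !is.na(values)
  stopifnot(sum(ok) >= 2)
  stats::approx(x = time[ok], y = values[ok], xout = time, rule = 2)$y
}

years <- 2001:2005
# LV life expectancy: 2001 is missing in the source data; the gap in 2004
# is artificial and only serves to show the interior case.
lv <- c(NA, 69.9, 70.3, NA, 70.2)

lv_imputed <- impute_linear(years, lv)
print(lv_imputed)
```

The leading missing value is filled with the first observed value, as in the imputed LV column shown later in this Section, while the artificial interior gap receives the midpoint of its two neighbours.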
Note that when importing data from Excel files, particular attention should also be paid to the names assigned to the variables in the dataset. As an example, suppose that the data imported for the demo_mlexpec indicator are as follows:
print(myExc3,n=35,width=250)
#> # A tibble: 16 x 29
#> time BE DK France DE EL IE IT LU NL PT ES
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7
#> UK AT Finland SE CY CZ EE HU LV LT MT PL
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 77.7 78.2 77.5 79.2 78.4 74.6 70.5 72.1 NA 71.2 78.2 73.8
#> 2 77.7 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1
#> 3 77.8 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2
#> 4 78.4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4
#> 5 78.6 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5
#> 6 78.9 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8
#> 7 79.1 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8
#> 8 79.2 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1
#> 9 79.8 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3
#> 10 80 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8
#> 11 80.4 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2
#> 12 80.3 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2
#> 13 80.4 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5
#> 14 80.7 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1
#> 15 80.3 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9
#> 16 80.6 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3
#> SK SI Bulgaria RO HR
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 73.1 75.7 71.9 71.4 74.1
#> 2 73.3 75.9 72.1 71.1 74.2
#> 3 73.4 75.7 72.2 71.2 74
#> 4 73.7 76.5 72.4 71.6 74.9
#> 5 73.7 76.8 72.3 72 74.7
#> 6 73.9 77.5 72.5 72.5 75.3
#> 7 74 77.6 72.7 73 75.2
#> 8 74.4 78.3 73 73.4 75.3
#> 9 74.8 78.5 73.4 73.5 75.7
#> 10 75 79 73.5 73.5 76.1
#> 11 75.5 79.4 73.9 74.2 76.5
#> 12 75.7 79.4 74 74.2 76.6
#> 13 76 79.7 74.5 74.8 77.1
#> 14 76.4 80.4 74.1 74.6 77.3
#> 15 76.1 80 74.1 74.5 76.8
#> 16 76.7 80.4 74.4 74.9 77.5
where the names of the country columns are a mix of MS labels (for example “BE” and “IT”) and MS names (for example “France” and “Finland”). Thus, when invoking the impute_dataset function, the argument countries can no longer be specified with convergEU_glb()$EU27_2020$memberStates$codeMS as previously done, since the function then returns a NULL res component:
colnames(myExc3)
#> [1] "time" "BE" "DK" "France" "DE" "EL"
#> [7] "IE" "IT" "LU" "NL" "PT" "ES"
#> [13] "UK" "AT" "Finland" "SE" "CY" "CZ"
#> [19] "EE" "HU" "LV" "LT" "MT" "PL"
#> [25] "SK" "SI" "Bulgaria" "RO" "HR"
TBex1 <- impute_dataset(myExc3, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
headMiss = c("cut", "constant")[2])$res
TBex1
#> NULL
A first strategy to overcome this issue is to extract the country names as given in the Excel file and then pass them to the argument countries as follows:
ms<-names(myExc3)[2:29]
TBexF4 <- impute_dataset(myExc3, timeName = "time",
countries=ms,
headMiss = c("cut", "constant")[2])$res
print(TBexF4,n=35,width=250)
#> # A tibble: 16 x 29
#> time BE DK France DE EL IE IT LU NL PT ES
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7
#> UK AT Finland SE CY CZ EE HU LV LT MT PL
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 77.7 78.2 77.5 79.2 78.4 74.6 70.5 72.1 69.9 71.2 78.2 73.8
#> 2 77.7 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1
#> 3 77.8 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2
#> 4 78.4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4
#> 5 78.6 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5
#> 6 78.9 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8
#> 7 79.1 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8
#> 8 79.2 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1
#> 9 79.8 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3
#> 10 80 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8
#> 11 80.4 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2
#> 12 80.3 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2
#> 13 80.4 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5
#> 14 80.7 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1
#> 15 80.3 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9
#> 16 80.6 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3
#> SK SI Bulgaria RO HR
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 73.1 75.7 71.9 71.4 74.1
#> 2 73.3 75.9 72.1 71.1 74.2
#> 3 73.4 75.7 72.2 71.2 74
#> 4 73.7 76.5 72.4 71.6 74.9
#> 5 73.7 76.8 72.3 72 74.7
#> 6 73.9 77.5 72.5 72.5 75.3
#> 7 74 77.6 72.7 73 75.2
#> 8 74.4 78.3 73 73.4 75.3
#> 9 74.8 78.5 73.4 73.5 75.7
#> 10 75 79 73.5 73.5 76.1
#> 11 75.5 79.4 73.9 74.2 76.5
#> 12 75.7 79.4 74 74.2 76.6
#> 13 76 79.7 74.5 74.8 77.1
#> 14 76.4 80.4 74.1 74.6 77.3
#> 15 76.1 80 74.1 74.5 76.8
#> 16 76.7 80.4 74.4 74.9 77.5
where ms contains the country names extracted from the dataset myExc3. Thus, the impute_dataset function works with any country name or label as long as it is correctly specified in the countries argument.
A second strategy is to replace the MS names with their corresponding labels, so that they comply with those reported in convergEU_glb()$EU27_2020$memberStates$codeMS, as follows:
Note that in the function rename the new name is specified first, followed by the original name. For example, FR=France indicates that France should be renamed to FR. Whether the country labels in the working dataset comply with those reported in convergEU_glb() can be further checked with the function not_in implemented in the convergEU package:
cnmyExc<-colnames(myExc4[2:29])
cnConv<-convergEU_glb()$EU27_2020$memberStates$codeMS
not_in(cnmyExc,cnConv)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE FALSE FALSE
where cnmyExc are the country labels in the working dataset and cnConv are the country labels in the object convergEU_glb()$EU27_2020$memberStates$codeMS. A value of FALSE indicates that the label is present in that object. Here all the labels in myExc4 are found except the twelfth, “UK”, which is flagged TRUE because it does not belong to the EU27_2020 group. Note also that deterministic linear imputation is not the only method available. Further methods for imputing missing values are available in other R packages, for example mice, Amelia, missForest, Hmisc, mi.
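The renaming and the label check described above can also be sketched in base R on a toy dataset. The sketch does not use dplyr::rename or not_in; the negated %in% test is only assumed to behave like not_in, which is consistent with the output printed above. The lookup table and the toy data frame are illustrative, and the vector eu27 lists the 27 Member State codes typed by hand rather than taken from convergEU_glb().

```r
# Toy dataset mixing MS labels ("BE", "UK") and an MS name ("France").
toy <- data.frame(time = 2001:2002,
                  BE = c(77.5, 77.5),
                  France = c(78.6, 78.7),
                  UK = c(77.7, 77.7))

# Illustrative lookup table: original name -> MS label.
lookup <- c(France = "FR", Finland = "FI", Bulgaria = "BG")

# Rename only the columns that appear in the lookup table.
hit <- names(toy) %in% names(lookup)
names(toy)[hit] <- unname(lookup[names(toy)[hit]])

# EU27 codes typed by hand (assumed to match the EU27_2020 group).
eu27 <- c("BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR", "HR",
          "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT", "PL", "PT",
          "RO", "SI", "SK", "FI", "SE")

# TRUE marks a country column whose label is not in the reference vector.
country_cols <- names(toy)[-1]
outside <- !(country_cols %in% eu27)
flagged <- country_cols[outside]
print(flagged)
```

Printing the flagged labels rather than the logical vector makes it immediately clear which columns need attention before calling impute_dataset.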
Once the country names comply with those reported in the object convergEU_glb()$EU27_2020$memberStates$codeMS, the countries argument in the impute_dataset function can be specified as:
Files in .csv format are imported by invoking the read_csv function in the package readr:
Let’s suppose that the data for the demo_mlexpec indicator described in the previous Subsection are stored in a .csv file, and that missing data are reported with “.” instead of blank cells. The data are then imported as follows:
where “inst/vign/excel2.csv” is the path (possibly including drive and folders) at which the .csv file is stored, while the argument na indicates how the missing values are reported in the .csv file.
print(myCsv1, n=35, width = 250)
#> # A tibble: 16 x 29
#> time BE DK FR DE EL IE IT LU NL PT ES UK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001 77.5 76.4 78.6 77.9 78.5 76.6 79.6 77.5 77.8 76.6 79.1 77.7
#> 2 2002 77.5 76.4 78.7 77.9 78.6 77.1 79.7 77.5 77.8 76.8 79.2 77.7
#> 3 2003 77.6 76.8 78.6 78 78.6 77.6 79.5 77.2 78.1 76.8 79 77.8
#> 4 2004 78.3 77.2 79.7 78.6 78.7 78 80.2 78.6 78.7 77.7 79.7 78.4
#> 5 2005 78.4 77.6 79.7 78.7 78.9 78.3 80.2 78.8 79 77.5 79.6 78.6
#> 6 2006 78.8 77.7 80.3 79.2 79.2 78.6 80.7 78.6 79.3 78.3 80.4 78.9
#> 7 2007 79.2 77.7 80.6 79.4 79 79 80.8 78.7 79.7 78.5 80.4 79.1
#> 8 2008 79.1 78.1 80.7 79.5 79.5 79.5 80.9 79.8 79.8 78.8 80.8 79.2
#> 9 2009 79.4 78.3 80.9 79.6 79.6 79.5 81.1 80 80.2 79 81.2 79.8
#> 10 2010 79.6 78.6 81.1 79.8 79.9 80.1 81.5 80.1 80.3 79.3 81.6 80
#> 11 2011 79.9 79.1 81.6 79.9 80 80.1 81.6 80.4 80.6 79.9 81.8 80.4
#> 12 2012 79.8 79.4 81.4 80 79.9 80.2 81.6 80.7 80.5 79.8 81.8 80.3
#> 13 2013 80 79.6 81.7 79.9 80.7 80.3 82.1 81.2 80.7 80.1 82.4 80.4
#> 14 2014 80.6 80.1 82.2 80.4 80.8 80.7 82.5 81.5 81.1 80.6 82.5 80.7
#> 15 2015 80.4 80.1 81.8 80 80.4 80.8 81.9 81.6 80.9 80.5 82.2 80.3
#> 16 2016 80.8 80.2 82 80.3 80.8 81 82.6 82 81 80.6 82.7 80.6
#> AT FI SE CY CZ EE HU LV LT MT PL SK SI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 78.2 77.5 79.2 78.4 74.6 70.5 72.1 NA 71.2 78.2 73.8 73.1 75.7
#> 2 78.2 77.6 79.3 78 74.7 70.8 72.1 69.9 71.3 78.3 74.1 73.3 75.9
#> 3 78.1 77.8 79.5 78.3 74.6 71.4 72.1 70.3 71.5 78.1 74.2 73.4 75.7
#> 4 78.7 78.2 79.9 78.4 75.2 71.9 72.5 70.6 71.5 78.8 74.4 73.7 76.5
#> 5 78.9 78.4 79.9 78 75.4 72.4 72.4 70.2 70.7 78.9 74.5 73.7 76.8
#> 6 79.4 78.8 80.2 79.4 76 72.5 72.9 70.1 70.5 78.8 74.8 73.9 77.5
#> 7 79.6 78.8 80.3 79.1 76.3 72.6 73 70.4 70.2 79.5 74.8 74 77.6
#> 8 79.9 79.1 80.5 79.9 76.5 73.7 73.6 71.6 71.1 79.4 75.1 74.4 78.3
#> 9 79.8 79.3 80.7 80.2 76.6 74.6 73.8 72.3 72.4 79.8 75.3 74.8 78.5
#> 10 80.1 79.4 80.9 80.8 76.9 75.3 74.1 72.5 72.6 80.9 75.8 75 79
#> 11 80.4 79.8 81 80.5 77.2 75.8 74.5 73.4 73.1 80.4 76.2 75.5 79.4
#> 12 80.3 79.9 81 80.4 77.4 76 74.6 73.6 73.4 80.3 76.2 75.7 79.4
#> 13 80.5 80.3 81.2 81.7 77.5 76.7 75.2 73.7 73.4 81.4 76.5 76 79.7
#> 14 80.9 80.5 81.5 81.5 78.1 76.6 75.3 73.7 74 81.5 77.1 76.4 80.4
#> 15 80.6 80.7 81.4 81 77.9 77.2 75 74.1 73.9 81.5 76.9 76.1 80
#> 16 81 80.6 81.6 81.9 78.4 77.2 75.5 74.1 74.3 82.2 77.3 76.7 80.4
#> BG RO HR
#> <dbl> <dbl> <dbl>
#> 1 71.9 71.4 74.1
#> 2 72.1 71.1 74.2
#> 3 72.2 71.2 74
#> 4 72.4 71.6 74.9
#> 5 72.3 72 74.7
#> 6 72.5 72.5 75.3
#> 7 72.7 73 75.2
#> 8 73 73.4 75.3
#> 9 73.4 73.5 75.7
#> 10 73.5 73.5 76.1
#> 11 73.9 74.2 76.5
#> 12 74 74.2 76.6
#> 13 74.5 74.8 77.1
#> 14 74.1 74.6 77.3
#> 15 74.1 74.5 76.8
#> 16 74.4 74.9 77.5
Once the data from the .csv file are imported, the usual operations of data checking and imputation of missing values are performed as described previously for the excel1.xlsx file.
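The role of the missing-value marker can be illustrated without any external file. The sketch below uses base R read.csv and its na.strings argument as a stand-in for readr::read_csv and its na argument; it is not the code used in this tutorial. The inline text reproduces the first three IT and LV values of the dataset above, with the missing LV value written as “.”.

```r
# Inline CSV text standing in for a file; "." marks a missing value.
csv_text <- "time,IT,LV
2001,79.6,.
2002,79.7,69.9
2003,79.5,70.3"

# Without declaring the marker, the LV column cannot be parsed as numeric.
as_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Declaring "." as the missing-value marker gives a numeric column with NA.
with_na <- read.csv(text = csv_text, na.strings = ".")

print(class(as_read$LV))
print(class(with_na$LV))
```

A country column of type character after import, rather than numeric, is therefore a useful hint that the missing-value marker was not recognised.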
Figure 2: Data explorer button on the Eurostat web site
Figure 3: Data tables customization on the Eurostat web site
Lastly, to download the data we click on the Download button (blue arrow, Figure 3).
As outlined above, for the indicator une_educ_a, we download the .xls file from the web site http://appsso.eurostat.ec.europa.eu/nui/submitViewTableAction.do (blue arrow, Figure 3) by considering:
Codes for Labeling (green arrow, Figure 3);
Data tables arranged in the format years by countries (red arrow, Figure 3).
Figure 4: Data extraction Eurostat web site (.xls file)
Once the .xls file is downloaded and saved in a local folder, we import the data by considering the first data table extracted in the sheet “Data”. Note that each Excel sheet contains different data tables, together with heading information on the extracted data, e.g. the indicator’s name and label, dates of update and extraction, and further conditioning variables (sex, age, education). Therefore, to select only the data on the indicator itself, the user should specify the appropriate range of cells. For example, for the une_educ_a indicator:
where “inst/vign/une_educ_a.xls” specifies the path (possibly including drive and folders) at which the file is stored; the argument sheet indicates the Excel sheet from which the data should be imported; the argument range specifies the cells in which the data are arranged.
print(myxls1, n=100, width = 250)
#> # A tibble: 10 x 42
#> `TIME/GEO` EU28 EU15 EA19 EA18 EA17 BE BG CZ DK DE EE
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009 14.4 14.5 15 14.9 14.9 13.6 15.5 24.1 9.1 15.7 28.1
#> 2 2010 15.7 15.7 16.2 16.2 16.1 15.3 22.7 25 11.1 14.7 30.5
#> 3 2011 16.2 16.2 16.7 16.7 16.7 13.9 26.4 24.3 11.3 13 25.7
#> 4 2012 18.2 18.3 19.2 19.1 19.1 14 28 28.5 11.7 12.2 23.1
#> 5 2013 19.2 19.5 20.5 20.5 20.5 15.8 29.9 25.6 11.1 11.9 15
#> 6 2014 18.5 18.8 20.1 20.1 20.1 16.2 28.3 22.1 10.3 11.8 13
#> 7 2015 17.3 17.6 18.9 18.9 18.9 16.8 25.1 22.7 9.7 11.2 12.2
#> 8 2016 16.1 16.4 17.8 17.7 17.7 15.9 22.2 20.5 9.1 10.1 12.7
#> 9 2017 14.7 15.1 16.4 16.3 16.3 14.6 18.1 13.1 9 9.5 10.9
#> 10 2018 13.2 13.6 14.8 14.8 14.8 13.2 15.5 10.7 8 8.8 10.3
#> IE EL ES FR FX HR IT CY LV LT LU HU MT
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.1 9.5 24.5 : 13.6 9.9 9.5 6.1 31.1 29.7 8.1 23.2 9
#> 2 22.1 12.7 27.1 : 14.5 13 10.3 7.1 32.8 39.4 6.1 25.1 9.6
#> 3 24 18.1 28.9 : 14.4 17.4 10.6 7.5 29.2 38.4 8.3 25 9
#> 4 25.8 26 33.7 : 15.2 18.6 13.6 13.4 26.6 34.7 8.4 24.8 9.5
#> 5 22.2 29.8 35.3 : 16.2 21.5 15.9 19.3 25.1 32.8 10.2 23.7 9.5
#> 6 19.8 28.2 33.8 17.1… 16.3 25.7 16.6 19.4 24 29.8 10.2 18.5 9
#> 7 17.1 26.7 31 17.6… 16.9 21.5 15.6 18.5 21.9 26.3 10.6 17.4 8.7
#> 8 14.5 26.4 28 18 17.3 17.4 15.7 15.6 20.6 25.1 9.9 13.2 7.4
#> 9 11.6 24.3 25 17 16.3 19.8 15.5 14.1 18.8 20.9 8.9 11.1 6.1
#> 10 9.9 22.3 22.1 16.1… 15.5 11.6 14.6 9.8 16.5 17.9 8.4 10.3 5.6
#> NL AT PL PT RO SI SK FI SE UK IS NO CH
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 6.8 10.7 14.7 10.3 7.5 8.8 41.5 14.8 15.8 12.9 10.2 6.5 :
#> 2 8.1 9.2 17.5 11.8 5.9 11.7 44.2 16.1 17 13.8 10.5 7.3 8
#> 3 7.8 9.1 18.3 13.9 7.2 13.4 42.5 16.1 16.5 14.1 10.3 6.9 8
#> 4 9.3 9.8 19.5 16.5 6.7 14.8 44.6 16 17.6 13.9 10.3 6.2 7.79…
#> 5 11.3 10.4 20.5 17.4 6.7 17.8 42.4 17.1 18.8 13.8 8.4 7.7 8.30…
#> 6 12.1 11.4 19 15.3 6.7 15.4 41.1 17.2 19.1 11.3 7 7.5 8.69…
#> 7 11.2 11.2 16.9 13.6 8.1 13.9 37.6 17.6 18.9 9.6 6.1 9.3 8.80…
#> 8 9.8 12.7 14.5 12.1 7.6 14.5 31.6 16.7 18.8 8.3 4.5 9.9 9.40…
#> 9 8.3 13 12.2 9.8 6.8 11.1 29.8 17.9 18.5 7.3 4.7 9.4 8.30…
#> 10 6.6 11.4 9.9 7.4 5.8 8.6 29.8 15.9 19 6.4 4.5 8.4 8.30…
#> ME MK RS TR
#> <chr> <dbl> <chr> <dbl>
#> 1 : 38.6 : 12.2
#> 2 : 39 : 10.2
#> 3 30.300000000000001 37.6 : 8.1
#> 4 34.899999999999999 37.8 : 7.5
#> 5 41.399999999999999 34.3 : 8.1
#> 6 32.200000000000003 32.2 18.199999999999999 9.3
#> 7 28.300000000000001 29.9 15.6 9.7
#> 8 24.600000000000001 29.2 13.1 9.9
#> 9 22.199999999999999 26.5 11.6 9.6
#> 10 20.100000000000001 23.8 12.6 9.9
Note that the printed output reveals the character “:”, which indicates missing values. Thus, the na argument should be specified in order to import such cells as missing values. Moreover, all the variables should be numeric, including the time variable, which is currently of type “chr”; therefore the argument col_types, which sets the type of the imported columns, should also be set to “numeric” when importing the data from the .xls file:
require(readxl)
myxls2<-read_excel(file.path("vign","une_educ_a.xls"),
sheet="Data",range = "A12:AP22", na=":", col_types = "numeric")
print(myxls2, n=100, width = 250)
#> # A tibble: 10 x 42
#> `TIME/GEO` EU28 EU15 EA19 EA18 EA17 BE BG CZ DK DE EE
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009 14.4 14.5 15 14.9 14.9 13.6 15.5 24.1 9.1 15.7 28.1
#> 2 2010 15.7 15.7 16.2 16.2 16.1 15.3 22.7 25 11.1 14.7 30.5
#> 3 2011 16.2 16.2 16.7 16.7 16.7 13.9 26.4 24.3 11.3 13 25.7
#> 4 2012 18.2 18.3 19.2 19.1 19.1 14 28 28.5 11.7 12.2 23.1
#> 5 2013 19.2 19.5 20.5 20.5 20.5 15.8 29.9 25.6 11.1 11.9 15
#> 6 2014 18.5 18.8 20.1 20.1 20.1 16.2 28.3 22.1 10.3 11.8 13
#> 7 2015 17.3 17.6 18.9 18.9 18.9 16.8 25.1 22.7 9.7 11.2 12.2
#> 8 2016 16.1 16.4 17.8 17.7 17.7 15.9 22.2 20.5 9.1 10.1 12.7
#> 9 2017 14.7 15.1 16.4 16.3 16.3 14.6 18.1 13.1 9 9.5 10.9
#> 10 2018 13.2 13.6 14.8 14.8 14.8 13.2 15.5 10.7 8 8.8 10.3
#> IE EL ES FR FX HR IT CY LV LT LU HU MT
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.1 9.5 24.5 NA 13.6 9.9 9.5 6.1 31.1 29.7 8.1 23.2 9
#> 2 22.1 12.7 27.1 NA 14.5 13 10.3 7.1 32.8 39.4 6.1 25.1 9.6
#> 3 24 18.1 28.9 NA 14.4 17.4 10.6 7.5 29.2 38.4 8.3 25 9
#> 4 25.8 26 33.7 NA 15.2 18.6 13.6 13.4 26.6 34.7 8.4 24.8 9.5
#> 5 22.2 29.8 35.3 NA 16.2 21.5 15.9 19.3 25.1 32.8 10.2 23.7 9.5
#> 6 19.8 28.2 33.8 17.1 16.3 25.7 16.6 19.4 24 29.8 10.2 18.5 9
#> 7 17.1 26.7 31 17.6 16.9 21.5 15.6 18.5 21.9 26.3 10.6 17.4 8.7
#> 8 14.5 26.4 28 18 17.3 17.4 15.7 15.6 20.6 25.1 9.9 13.2 7.4
#> 9 11.6 24.3 25 17 16.3 19.8 15.5 14.1 18.8 20.9 8.9 11.1 6.1
#> 10 9.9 22.3 22.1 16.2 15.5 11.6 14.6 9.8 16.5 17.9 8.4 10.3 5.6
#> NL AT PL PT RO SI SK FI SE UK IS NO CH
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 6.8 10.7 14.7 10.3 7.5 8.8 41.5 14.8 15.8 12.9 10.2 6.5 NA
#> 2 8.1 9.2 17.5 11.8 5.9 11.7 44.2 16.1 17 13.8 10.5 7.3 8
#> 3 7.8 9.1 18.3 13.9 7.2 13.4 42.5 16.1 16.5 14.1 10.3 6.9 8
#> 4 9.3 9.8 19.5 16.5 6.7 14.8 44.6 16 17.6 13.9 10.3 6.2 7.8
#> 5 11.3 10.4 20.5 17.4 6.7 17.8 42.4 17.1 18.8 13.8 8.4 7.7 8.3
#> 6 12.1 11.4 19 15.3 6.7 15.4 41.1 17.2 19.1 11.3 7 7.5 8.7
#> 7 11.2 11.2 16.9 13.6 8.1 13.9 37.6 17.6 18.9 9.6 6.1 9.3 8.8
#> 8 9.8 12.7 14.5 12.1 7.6 14.5 31.6 16.7 18.8 8.3 4.5 9.9 9.4
#> 9 8.3 13 12.2 9.8 6.8 11.1 29.8 17.9 18.5 7.3 4.7 9.4 8.3
#> 10 6.6 11.4 9.9 7.4 5.8 8.6 29.8 15.9 19 6.4 4.5 8.4 8.3
#> ME MK RS TR
#> <dbl> <dbl> <dbl> <dbl>
#> 1 NA 38.6 NA 12.2
#> 2 NA 39 NA 10.2
#> 3 30.3 37.6 NA 8.1
#> 4 34.9 37.8 NA 7.5
#> 5 41.4 34.3 NA 8.1
#> 6 32.2 32.2 18.2 9.3
#> 7 28.3 29.9 15.6 9.7
#> 8 24.6 29.2 13.1 9.9
#> 9 22.2 26.5 11.6 9.6
#> 10 20.1 23.8 12.6 9.9
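The two requirements on the imported table (numeric columns only, missing values coded as NA) can also be checked directly. The sketch below uses a small hand-made data frame, toy_xls, as a stand-in for the imported table; its values are copied from the FR and CH columns for 2013 to 2015, and the object names are illustrative only.

```r
# Toy stand-in for the imported table (FR and CH, years 2013 to 2015)
toy_xls <- data.frame(
  TIME = c(2013, 2014, 2015),
  FR   = c(NA, 17.1, 17.6),
  CH   = c(8.3, 8.7, 8.8)
)

# TRUE for every column that is numeric
all_numeric <- vapply(toy_xls, is.numeric, logical(1))

# Number of missing values per column
n_missing <- colSums(is.na(toy_xls))

all_numeric
n_missing
```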
Now, the missing values are correctly imported in the time by countries dataset myxls2 and all the variables are of type numeric. Once the data are correctly imported, the data processing could take place as described in the previous sections. For instance, we can choose only a set of countries, say the cluster EU27_2020:
names(myxls2)
#> [1] "TIME/GEO" "EU28" "EU15" "EA19" "EA18" "EA17"
#> [7] "BE" "BG" "CZ" "DK" "DE" "EE"
#> [13] "IE" "EL" "ES" "FR" "FX" "HR"
#> [19] "IT" "CY" "LV" "LT" "LU" "HU"
#> [25] "MT" "NL" "AT" "PL" "PT" "RO"
#> [31] "SI" "SK" "FI" "SE" "UK" "IS"
#> [37] "NO" "CH" "ME" "MK" "RS" "TR"
EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
myxls<- select(myxls2,`TIME/GEO`,EU27estr)
where the function select interprets the character vector EU27estr as a collection of column names. For the list of all possible clusters of countries, see the convergEU_glb() function described in detail in Section 1.
Subsequently, we can proceed with data checking and missing values imputation in order to obtain the final dataset on which to perform the analysis of convergence:
EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
myxls<- select(myxls2,`TIME/GEO`,EU27estr)
check_data(myxls)
myxls3<-myxls %>% rename(TIME=`TIME/GEO`)
myxlsf <- impute_dataset(myxls3, timeName ="TIME",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
headMiss = c("cut", "constant")[2],
tailMiss = c("cut", "constant")[2])$res
Note that, for convenience, the time variable is also renamed (from “TIME/GEO” to “TIME”) before the imputation.
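As a rough illustration of the “constant” option used for headMiss and tailMiss, the sketch below fills missing values at the beginning or at the end of a series with the nearest observed value. This reading of “constant” is an assumption, and fill_head_tail is a hypothetical helper written for this example, not a convergEU function; impute_dataset remains the tool to use in practice.

```r
# Hypothetical helper (not part of convergEU): replace leading and trailing
# NAs with the first and last observed value; inner NAs are left untouched.
fill_head_tail <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)            # nothing observed, nothing to copy
  first <- obs[1]
  last  <- obs[length(obs)]
  if (first > 1) x[seq_len(first - 1)] <- x[first]
  if (last < length(x)) x[(last + 1):length(x)] <- x[last]
  x
}

# FR column of the imported table: missing from 2009 to 2013
fr <- c(NA, NA, NA, NA, NA, 17.1, 17.6, 18, 17, 16.2)
fr_filled <- fill_head_tail(fr)
fr_filled
```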
Figure 5: Data extraction Eurostat web site (.csv file)
To import the .csv file, we use the function read_csv from the package readr, as follows:
where “vign/une_educ_a_1_Data.csv” is the path (possibly including disk unit and folders) in which the .csv file is located, while the argument na specifies how the missing values are reported in the .csv file.
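The role of the na argument can be illustrated without the file. The sketch below uses base R's read.csv on a few inline rows, with na.strings playing the part that na plays in readr's read_csv; the rows are a hand-made excerpt and the marker “:” is assumed to be the same one used in the .xls export.

```r
# Inline stand-in for the downloaded .csv file (hand-made excerpt)
csv_text <- "GEO,TIME,Value
FR,2013,:
FR,2014,17.1
CH,2009,:
CH,2010,8"

# ":" is declared as the missing-value marker, so Value is read as numeric
toy_csv <- read.csv(text = csv_text, na.strings = ":")
str(toy_csv)
```

With the marker declared at import time, no later conversion from character to numeric is needed.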
mycsv
#> # A tibble: 410 x 7
#> GEO TIME SEX ISCED11 AGE UNIT Value
#> <chr> <dbl> <lgl> <chr> <chr> <chr> <dbl>
#> 1 EU28 2009 TRUE ED0-2 Y15-74 PC_ACT 14.4
#> 2 EU15 2009 TRUE ED0-2 Y15-74 PC_ACT 14.5
#> 3 EA19 2009 TRUE ED0-2 Y15-74 PC_ACT 15
#> 4 EA18 2009 TRUE ED0-2 Y15-74 PC_ACT 14.9
#> 5 EA17 2009 TRUE ED0-2 Y15-74 PC_ACT 14.9
#> 6 BE 2009 TRUE ED0-2 Y15-74 PC_ACT 13.6
#> 7 BG 2009 TRUE ED0-2 Y15-74 PC_ACT 15.5
#> 8 CZ 2009 TRUE ED0-2 Y15-74 PC_ACT 24.1
#> 9 DK 2009 TRUE ED0-2 Y15-74 PC_ACT 9.1
#> 10 DE 2009 TRUE ED0-2 Y15-74 PC_ACT 15.7
#> # … with 400 more rows
Note that the data imported from the .csv file are in long format. Therefore, further processing is needed to convert the data to the required years-by-countries format. To this end, the function spread in the package tidyr is used as follows:
mycsv1<- select(mycsv,GEO,TIME, Value)
mycsvf <- spread(mycsv1, GEO, Value)
mycsvf
#> # A tibble: 10 x 42
#> TIME AT BE BG CH CY CZ DE DK EA17 EA18 EA19 EE
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009 10.7 13.6 15.5 NA 6.1 24.1 15.7 9.1 14.9 14.9 15 28.1
#> 2 2010 9.2 15.3 22.7 8 7.1 25 14.7 11.1 16.1 16.2 16.2 30.5
#> 3 2011 9.1 13.9 26.4 8 7.5 24.3 13 11.3 16.7 16.7 16.7 25.7
#> 4 2012 9.8 14 28 7.8 13.4 28.5 12.2 11.7 19.1 19.1 19.2 23.1
#> 5 2013 10.4 15.8 29.9 8.3 19.3 25.6 11.9 11.1 20.5 20.5 20.5 15
#> 6 2014 11.4 16.2 28.3 8.7 19.4 22.1 11.8 10.3 20.1 20.1 20.1 13
#> 7 2015 11.2 16.8 25.1 8.8 18.5 22.7 11.2 9.7 18.9 18.9 18.9 12.2
#> 8 2016 12.7 15.9 22.2 9.4 15.6 20.5 10.1 9.1 17.7 17.7 17.8 12.7
#> 9 2017 13 14.6 18.1 8.3 14.1 13.1 9.5 9 16.3 16.3 16.4 10.9
#> 10 2018 11.4 13.2 15.5 8.3 9.8 10.7 8.8 8 14.8 14.8 14.8 10.3
#> # … with 29 more variables: EL <dbl>, ES <dbl>, EU15 <dbl>, EU28 <dbl>,
#> # FI <dbl>, FR <dbl>, FX <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IS <dbl>,
#> # IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, ME <dbl>, MK <dbl>, MT <dbl>,
#> # NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, RS <dbl>, SE <dbl>,
#> # SI <dbl>, SK <dbl>, TR <dbl>, UK <dbl>
The dataset mycsvf is now in the format time by countries, and all the variables are of type “numeric”. So, we can proceed with data checking and imputation of missing values:
EU27estr<-convergEU_glb()$EU27_2020$memberStates$codeMS
mycsvf1<- select(mycsvf,TIME,EU27estr)
check_data(mycsvf1)
mycsvf2 <- impute_dataset(mycsvf1, timeName = "TIME",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
headMiss = c("cut", "constant")[2],
tailMiss = c("cut", "constant")[2])$res
Absolute change, as described in the reserved Eurofound Annex, is defined as: \[ \Delta y_{m,i,t} = y_{m,i,t} - y_{m,i,t-1} \] for country \(m\), indicator \(i\) at time \(t\).
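The definition can be checked by hand with base R. The sketch below is not the package implementation: it applies diff to a toy table holding the first four years of AT and BE from the emp_20_64_MS dataset, and the object names are illustrative.

```r
# Toy excerpt: AT and BE, 2002 to 2005, from emp_20_64_MS
toy <- data.frame(
  time = 2002:2005,
  AT   = c(70.9, 71.3, 68.4, 70.4),
  BE   = c(64.7, 64.5, 65.8, 66.5)
)
countries <- setdiff(names(toy), "time")

# Delta y_{m,i,t} = y_{m,i,t} - y_{m,i,t-1}, one column per country
delta <- sapply(toy[countries], diff)

# Year-by-year changes, plus the sum and the average of their absolute values
abs_change <- data.frame(time = toy$time[-1], round(delta, 1))
sum_abs    <- colSums(abs(delta))
avg_abs    <- colMeans(abs(delta))

abs_change
```

The first year has no predecessor, so the table of changes starts one year later than the data.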
In the convergEU package, absolute changes across countries are calculated by invoking the abso_change function on a sorted dataframe in the format time by countries and without missing values. To illustrate this function, we take as an example the locally available dataset emp_20_64_MS described in Section 1:
emp_20_64_MS
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 70.9 64.7 56.5 75.1 71.7 68.8 78.3 68.3 62.8 63.2 73.2 68.6
#> 2 2003 71.3 64.5 58.7 75.4 71 68.4 77.4 69.1 63.8 64.3 72.9 69.7
#> 3 2004 68.4 65.8 61.2 75.7 70.1 67.9 78.1 70.2 64.3 65.2 72.5 69.2
#> 4 2005 70.4 66.5 61.9 74.4 70.7 69.4 78 72 64.4 67.5 73 69.4
#> 5 2006 71.6 66.5 65.1 75.8 71.2 71.1 79.4 75.9 65.6 69 73.9 69.4
#> 6 2007 72.8 67.7 68.4 76.8 72 72.9 79 76.9 65.8 69.7 74.8 69.9
#> 7 2008 73.8 68 70.7 76.5 72.4 74 78.7 77.1 66.3 68.5 75.8 70.5
#> 8 2009 73.4 67.1 68.8 75.3 70.9 74.2 76.1 70 65.6 64 73.5 69.5
#> 9 2010 73.9 67.6 64.7 75 70.4 75 74.9 66.8 63.8 62.8 73 69.3
#> 10 2011 74.2 67.3 62.9 73.4 70.9 76.5 74.8 70.6 59.6 62 73.8 69.2
#> 11 2012 74.4 67.2 63 70.2 71.5 76.9 74.3 72.2 55 59.6 74 69.4
#> 12 2013 74.6 67.2 63.5 67.2 72.5 77.3 74.3 73.3 52.9 58.6 73.3 69.5
#> 13 2014 74.2 67.3 65.1 67.6 73.5 77.7 74.7 74.3 53.3 59.9 73.1 69.2
#> 14 2015 74.3 67.2 67.1 67.9 74.8 78 75.4 76.5 54.9 62 72.9 69.5
#> 15 2016 74.8 67.7 67.7 68.7 76.7 78.6 76 76.6 56.2 63.9 73.4 70
#> 16 2017 75.4 68.5 71.3 70.8 78.5 79.2 76.6 78.7 57.8 65.5 74.2 70.6
#> 17 2018 76.2 69.7 72.4 73.9 79.9 79.9 77.5 79.5 59.5 67 76.3 71.3
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
The dataset is ready for the analysis given that missing values are already imputed. Thus, we proceed to calculate the absolute changes across all the countries in the emp_20_64_MS dataset by considering as reference time 2007 (argument time_0) and as focus time 2018 (argument time_t):
myAC <- abso_change(emp_20_64_MS,
time_0 = 2007,
time_t = 2018,
all_within=TRUE,
timeName = "time")
names(myAC$res)
#> [1] "abso_change" "sum_abs_change" "average_abs_change"
The results consist of:
myAC$res$abso_change
#> time AT BE BG CY CZ DE DK EE EL ES FI FR HR HU
#> 1 2003 0.4 -0.2 2.2 0.3 -0.7 -0.4 -0.9 0.8 1.0 1.1 -0.3 1.1 0.5 1.0
#> 2 2004 -2.9 1.3 2.5 0.3 -0.9 -0.5 0.7 1.1 0.5 0.9 -0.4 -0.5 1.3 -0.4
#> 3 2005 2.0 0.7 0.7 -1.3 0.6 1.5 -0.1 1.8 0.1 2.3 0.5 0.2 0.3 0.2
#> 4 2006 1.2 0.0 3.2 1.4 0.5 1.7 1.4 3.9 1.2 1.5 0.9 0.0 0.6 0.4
#> 5 2007 1.2 1.2 3.3 1.0 0.8 1.8 -0.4 1.0 0.2 0.7 0.9 0.5 3.3 -0.3
#> 6 2008 1.0 0.3 2.3 -0.3 0.4 1.1 -0.3 0.2 0.5 -1.2 1.0 0.6 1.0 -0.8
#> 7 2009 -0.4 -0.9 -1.9 -1.2 -1.5 0.2 -2.6 -7.1 -0.7 -4.5 -2.3 -1.0 -0.7 -1.4
#> 8 2010 0.5 0.5 -4.1 -0.3 -0.5 0.8 -1.2 -3.2 -1.8 -1.2 -0.5 -0.2 -2.1 -0.2
#> 9 2011 0.3 -0.3 -1.8 -1.6 0.5 1.5 -0.1 3.8 -4.2 -0.8 0.8 -0.1 -2.3 0.5
#> 10 2012 0.2 -0.1 0.1 -3.2 0.6 0.4 -0.5 1.6 -4.6 -2.4 0.2 0.2 -1.7 1.2
#> 11 2013 0.2 0.0 0.5 -3.0 1.0 0.4 0.0 1.1 -2.1 -1.0 -0.7 0.1 -0.9 1.4
#> IE IT LT LU LV MT NL PL PT RO SE SI SK UK
#> 1 -0.4 0.9 2.7 -1.2 1.1 -0.4 -0.5 -0.4 -1.1 0.5 -0.3 -1.9 1.8 0.4
#> 2 0.6 1.6 -1.1 0.5 0.0 -0.5 -0.4 -0.3 -0.4 -0.1 -0.7 2.9 -1.5 0.2
#> 3 1.6 -0.2 1.1 1.3 1.7 0.1 -2.2 1.3 -0.4 -1.1 0.3 0.1 1.0 0.3
#> 4 0.8 0.9 0.6 0.1 4.1 0.5 1.0 1.8 0.4 1.2 0.7 0.4 1.5 0.0
#> 5 1.7 0.3 1.4 0.5 2.0 0.7 1.8 2.6 -0.1 -0.4 1.3 0.9 1.2 0.0
#> 6 -1.6 0.2 -0.7 -0.8 0.2 0.6 1.4 2.3 0.6 0.0 0.3 0.6 1.6 0.0
#> 7 -5.5 -1.3 -5.0 1.6 -8.8 -0.2 -0.1 -0.1 -2.0 -0.9 -2.1 -1.1 -2.4 -1.3
#> 8 -2.5 -0.6 -2.7 0.3 -2.3 1.1 -0.6 -0.6 -0.8 1.3 -0.2 -1.6 -1.8 -0.4
#> 9 -0.9 0.0 2.6 -0.6 2.0 1.5 0.2 0.2 -1.5 -1.0 1.3 -1.9 0.4 0.0
#> 10 -0.1 -0.1 1.6 1.3 1.8 2.3 0.2 0.2 -2.5 1.0 0.0 -0.1 0.1 0.6
#> 11 2.0 -1.2 1.4 -0.3 1.6 2.3 -0.7 0.2 -0.9 -0.1 0.4 -1.1 -0.1 0.7
myAC$res$sum_abs_change
#> AT BE BG CY CZ DE DK EE EL ES FI FR HR HU IE IT
#> 10.3 5.5 22.6 13.9 8.0 10.3 8.2 25.6 16.9 17.6 8.5 4.5 14.7 7.8 17.7 7.3
#> LT LU LV MT NL PL PT RO SE SI SK UK
#> 20.9 8.5 25.6 10.2 9.1 10.0 10.7 7.6 7.6 12.6 13.4 3.9
myAC$res$average_abs_change
#> AT BE BG CY CZ DE DK EE
#> 0.9363636 0.5000000 2.0545455 1.2636364 0.7272727 0.9363636 0.7454545 2.3272727
#> EL ES FI FR HR HU IE IT
#> 1.5363636 1.6000000 0.7727273 0.4090909 1.3363636 0.7090909 1.6090909 0.6636364
#> LT LU LV MT NL PL PT RO
#> 1.9000000 0.7727273 2.3272727 0.9272727 0.8272727 0.9090909 0.9727273 0.6909091
#> SE SI SK UK
#> 0.6909091 1.1454545 1.2181818 0.3545455
If desired, fewer digits may be displayed in the outputs, for example after rounding. The user can select how many decimals are displayed in the results by using the functions round or format as follows:
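A minimal sketch of both options is given here, using three of the average absolute changes printed above as a stand-in vector (avg_abs_change is an illustrative name, not an object created by the package):

```r
# Three values copied from the average_abs_change output above
avg_abs_change <- c(AT = 0.9363636, BE = 0.5000000, BG = 2.0545455)

# round() returns numbers rounded to two decimals
rounded <- round(avg_abs_change, 2)

# format() returns character strings with a fixed number of decimals
formatted <- format(rounded, nsmall = 2)

rounded
formatted
```

round keeps the values numeric, so they can still be used in calculations, whereas format is meant for display only.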
An important summary is the unweighted average of country values. The cluster of countries to consider may be specified by its label; the available clusters are stored in the object returned by the function generating global static objects and tables, i.e. the convergEU_glb() function (see Section 1 for further details). For example, the unweighted average in the emp_20_64_MS dataset for the EU25 cluster is:
average_clust(emp_20_64_MS,
timeName = "time",
cluster = "EU25")$res[,c(1,30)]
#> # A tibble: 17 x 2
#> time EU25
#> <dbl> <dbl>
#> 1 2002 68.5
#> 2 2003 68.6
#> 3 2004 68.6
#> 4 2005 69.2
#> 5 2006 70.3
#> 6 2007 71.2
#> 7 2008 71.5
#> 8 2009 69.4
#> 9 2010 68.6
#> 10 2011 68.8
#> 11 2012 68.7
#> 12 2013 68.8
#> 13 2014 69.7
#> 14 2015 70.6
#> 15 2016 71.7
#> 16 2017 73.1
#> 17 2018 74.4
while for EU12 it is:
average_clust(emp_20_64_MS,timeName = "time",cluster = "EU12")$res[,c(1,30)]
#> # A tibble: 17 x 2
#> time EU12
#> <dbl> <dbl>
#> 1 2002 69.1
#> 2 2003 69.1
#> 3 2004 69.4
#> 4 2005 69.9
#> 5 2006 70.6
#> 6 2007 71.3
#> 7 2008 71.4
#> 8 2009 69.9
#> 9 2010 69.2
#> 10 2011 68.6
#> 11 2012 68.0
#> 12 2013 67.8
#> 13 2014 68.4
#> 14 2015 69.2
#> 15 2016 70.1
#> 16 2017 71.2
#> 17 2018 72.3
An unknown label, like “EUspirit”, causes a computation error:
average_clust(emp_20_64_MS,timeName = "time",cluster = "EUspirit")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: badly specified countries."
and similarly for a wrong time name:
average_clust(emp_20_64_MS,timeName = "TTime",cluster = "EU19")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: Time variable not in the dataframe."
Time series can also be plotted:
wwTB <- average_clust(emp_20_64_MS,timeName = "time",cluster = "EU27")$res[,c(1,30)]
mini_EU <- min(wwTB$EU27)
maxi_EU <- max(wwTB$EU27)
qplot(time, EU27, data=wwTB,
ylim=c(mini_EU,maxi_EU))+geom_line(colour="navy blue")+
ylab("emp_20_64")
In the convergEU package, the function departure_mean calculates for each country the departure from the average:
The calculations are performed with respect to the results obtained for the Sigma convergence. An example follows for the emp_20_64_MS dataset. First, the Sigma convergence is calculated:
mySTB <- sigma_conv(emp_20_64_MS,timeName="time")
mySTB
#> $res
#> # A tibble: 17 x 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125.
#> 2 2003 5.95 0.0878 67.8 991.
#> 3 2004 5.70 0.0839 67.9 909.
#> 4 2005 5.54 0.0809 68.4 858.
#> 5 2006 5.57 0.0801 69.6 869.
#> 6 2007 5.47 0.0775 70.6 838.
#> 7 2008 5.36 0.0755 71.0 804.
#> 8 2009 5.03 0.0730 69.0 710.
#> 9 2010 5.24 0.0769 68.1 768.
#> 10 2011 5.59 0.0821 68.1 875.
#> 11 2012 5.98 0.0880 68 1002.
#> 12 2013 6.28 0.0922 68.0 1103.
#> 13 2014 5.98 0.0867 69.0 1000.
#> 14 2015 5.74 0.0820 70.0 922.
#> 15 2016 5.60 0.0789 71.0 879.
#> 16 2017 5.37 0.0741 72.5 808.
#> 17 2018 5.30 0.0717 73.8 786.
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Then, the departure from the mean can be characterized:
res <- departure_mean(oriTB = emp_20_64_MS, sigmaTB = mySTB$res)
names(res$res)
#> [1] "departures" "squaredContrib" "devianceContrib"
res$res$departures
#> # A tibble: 17 x 33
#> time stdDev CV mean devianceT AT BE BG CY CZ DE DK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125. 0 0 -1 1 0 0 1
#> 2 2003 5.95 0.0878 67.8 991. 0 0 -1 1 0 0 1
#> 3 2004 5.70 0.0839 67.9 909. 0 0 -1 1 0 0 1
#> 4 2005 5.54 0.0809 68.4 858. 0 0 -1 1 0 0 1
#> 5 2006 5.57 0.0801 69.6 869. 0 0 0 1 0 0 1
#> 6 2007 5.47 0.0775 70.6 838. 0 0 0 1 0 0 1
#> 7 2008 5.36 0.0755 71.0 804. 0 0 0 1 0 0 1
#> 8 2009 5.03 0.0730 69.0 710. 0 0 0 1 0 1 1
#> 9 2010 5.24 0.0769 68.1 768. 1 0 0 1 0 1 1
#> 10 2011 5.59 0.0821 68.1 875. 1 0 0 0 0 1 1
#> 11 2012 5.98 0.0880 68 1002. 1 0 0 0 0 1 1
#> 12 2013 6.28 0.0922 68.0 1103. 1 0 0 0 0 1 0
#> 13 2014 5.98 0.0867 69.0 1000. 0 0 0 0 0 1 0
#> 14 2015 5.74 0.0820 70.0 922. 0 0 0 0 0 1 0
#> 15 2016 5.60 0.0789 71.0 879. 0 0 0 0 1 1 0
#> 16 2017 5.37 0.0741 72.5 808. 0 0 0 0 1 1 0
#> 17 2018 5.30 0.0717 73.8 786. 0 0 0 0 1 1 0
#> # … with 21 more variables: EE <dbl>, EL <dbl>, ES <dbl>, FI <dbl>, FR <dbl>,
#> # HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> # MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>,
#> # SK <dbl>, UK <dbl>
where \(-1\), \(0\) and \(1\) indicate values respectively below \(-1\), within the interval \((-1,1)\) and above \(+1\). The contribution of each MS to the variance at a given time \(t\) is evaluated by the squared difference from the mean, say EU27_2020:
res$res$squaredContrib
#> # A tibble: 17 x 28
#> AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 11.4 7.94 1.21e+2 57.5 17.5 1.64e+0 116. 0.612 22.3 18.6 32.3
#> 2 12.5 10.7 8.23e+1 58.2 10.4 3.95e-1 92.7 1.77 15.8 12.1 26.3
#> 3 0.243 4.44 4.50e+1 60.7 4.81 5.10e-5 104. 5.26 13.0 7.33 21.1
#> 4 3.91 3.69 4.25e+1 35.7 5.19 9.58e-1 91.7 12.8 16.2 0.849 21.0
#> 5 4.16 9.37 1.99e+1 38.9 2.69 2.37e+0 96.8 40.2 15.7 0.314 18.8
#> 6 4.84 8.41 4.84e+0 38.4 1.96 5.29e+0 70.6 39.7 23.0 0.810 17.6
#> 7 7.98 8.85 7.56e-2 30.5 2.03 9.15e+0 59.7 37.5 21.9 6.13 23.3
#> 8 19.3 3.62 4.14e-2 39.6 3.60 2.70e+1 50.4 0.993 11.6 25.0 20.2
#> 9 33.5 0.264 1.17e+1 47.4 5.22 4.74e+1 46.0 1.73 18.6 28.2 23.9
#> 10 37.7 0.579 2.66e+1 28.5 8.06 7.12e+1 45.4 6.45 71.6 36.7 32.9
#> 11 41.0 0.640 2.50e+1 4.84 12.2 7.92e+1 39.7 17.6 169 70.6 36
#> 12 43.0 0.710 2.06e+1 0.710 19.9 8.57e+1 39.2 27.6 229. 89.2 27.6
#> 13 27.3 2.81 1.50e+1 1.89 20.5 7.61e+1 32.8 28.4 246. 82.4 17.0
#> 14 18.6 7.74 8.31e+0 4.34 23.2 6.43e+1 29.4 42.5 227. 63.7 8.51
#> 15 14.4 11.0 1.10e+1 5.34 32.4 5.76e+1 24.9 31.2 219. 50.6 5.71
#> 16 8.37 16.1 1.46e+0 2.91 35.9 4.48e+1 16.8 38.4 216. 49.1 2.87
#> 17 5.52 17.2 2.10e+0 0.0025 36.6 3.66e+1 13.3 31.9 206. 46.9 6.00
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
It is also possible to decompose the numerator of the variance, called deviance, at each time in order to appreciate the percentage contribution of each MS to the total deviance:
res$res$devianceContrib
#> # A tibble: 17 x 28
#> AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.02 0.706 1.08e+1 5.11e+0 1.56 1.46e-1 10.3 0.0544 1.98 1.66 2.87
#> 2 1.26 1.08 8.30e+0 5.87e+0 1.05 3.99e-2 9.35 0.178 1.59 1.22 2.65
#> 3 0.0267 0.488 4.95e+0 6.68e+0 0.529 5.61e-6 11.4 0.578 1.43 0.806 2.32
#> 4 0.456 0.430 4.95e+0 4.16e+0 0.605 1.12e-1 10.7 1.49 1.88 0.0989 2.44
#> 5 0.479 1.08 2.29e+0 4.48e+0 0.309 2.73e-1 11.1 4.63 1.81 0.0362 2.17
#> 6 0.578 1.00 5.78e-1 4.59e+0 0.234 6.31e-1 8.42 4.74 2.75 0.0967 2.11
#> 7 0.993 1.10 9.41e-3 3.80e+0 0.253 1.14e+0 7.43 4.67 2.72 0.762 2.90
#> 8 2.72 0.511 5.84e-3 5.59e+0 0.507 3.81e+0 7.10 0.140 1.63 3.53 2.85
#> 9 4.36 0.0344 1.52e+0 6.17e+0 0.680 6.17e+0 6.00 0.225 2.42 3.68 3.11
#> 10 4.31 0.0662 3.04e+0 3.26e+0 0.922 8.14e+0 5.19 0.737 8.18 4.20 3.77
#> 11 4.09 0.0639 2.50e+0 4.83e-1 1.22 7.91e+0 3.96 1.76 16.9 7.04 3.59
#> 12 3.90 0.0644 1.87e+0 6.44e-2 1.80 7.77e+0 3.55 2.51 20.8 8.08 2.51
#> 13 2.73 0.280 1.50e+0 1.89e-1 2.05 7.61e+0 3.28 2.83 24.6 8.23 1.70
#> 14 2.02 0.839 9.01e-1 4.70e-1 2.52 6.97e+0 3.18 4.61 24.7 6.91 0.923
#> 15 1.63 1.25 1.25e+0 6.08e-1 3.68 6.55e+0 2.83 3.56 25.0 5.75 0.650
#> 16 1.04 1.99 1.80e-1 3.60e-1 4.44 5.54e+0 2.07 4.74 26.8 6.07 0.354
#> 17 0.703 2.19 2.68e-1 3.18e-4 4.66 4.66e+0 1.70 4.06 26.2 5.97 0.764
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
Thus, each row adds up to 100. A graphical output of the main features of the country time series can be produced by invoking the function graph_departure, as shown below:
myGD <- graph_departure(res$res$departures,
timeName = "time",
indiType = "highBest",
displace = 0.25,
displaceh = 0.5,
dimeFontNum = 2,
myfont_scale = 1.25,
x_angle = 45,
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.9
)
myGD
#> $res
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Note that, since the indicator is of type “highBest”, the default colours of the rectangles are red for “-1”, grey for “0” and light sky blue for “1”. If instead an indicator is of type “lowBest”, the default colours change to light sky blue for “-1”, grey for “0” and red for “1”. The user can also change these colours through the argument color_rect. For example, with “-1” coloured in red, “0” in yellow and “1” in dark green (green4), and selecting only a subset of the available countries:
myGD1 <- graph_departure(res$res$departures[,1:10],
timeName = "time",
indiType = "highBest",
displace = 0.25,
displaceh = 0.45,
dimeFontNum = 4,
myfont_scale = 1.35,
x_angle = 45,
color_rect = c("-1"='red', "0"='yellow',"1"='green4'),
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.29
)
myGD1
#> $res
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Moreover, the graphical output can be further customized by the user, for example by changing the value of the argument displaceh, or that of x_angle, which sets the angle at which the years are displayed on the x axis.
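As a cross-check of the departure_mean results discussed above, the three components can be recomputed by hand on a toy vector. The sketch assumes that the labels -1, 0 and 1 come from departures standardized by a standard deviation with divisor n (which appears consistent with the stdDev and devianceT columns shown earlier); it uses five 2002 values only, so the numbers differ from the package output computed on all countries.

```r
# Toy vector: 2002 values of five countries from emp_20_64_MS
y <- c(AT = 70.9, BE = 64.7, BG = 56.5, CY = 75.1, DK = 78.3)

m               <- mean(y)
squared_contrib <- (y - m)^2                       # squared difference from the mean
deviance_t      <- sum(squared_contrib)            # numerator of the variance
std_dev         <- sqrt(deviance_t / length(y))    # divisor n (assumption)

# Labels -1, 0, 1 from the standardized departures (assumed thresholds at -1 and +1)
z          <- (y - m) / std_dev
departures <- ifelse(z < -1, -1, ifelse(z > 1, 1, 0))

# Percentage contribution of each country to the total deviance
deviance_contrib <- 100 * squared_contrib / deviance_t

departures
round(deviance_contrib, 1)
```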
The ranking of countries for each year considered is implemented in the country_ranking function. The argument typeInd selects the type of indicator according to which the ranking is performed: highBest (“higher is the best”) or lowBest (“lower is the best”). For example, for the emp_20_64_MS data, the ranking is performed for a highBest indicator from 2005 to 2010:
data(emp_20_64_MS)
myCR<-country_ranking(emp_20_64_MS,timeName = "time", time_0 = 2005, time_t = 2010,
typeInd= "highBest")
myCR
#> $res
#> # A tibble: 6 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2005 13 19 24 4 11 14 2 9 21 18 5 14
#> 2 2006 11 19 22 4 14 15 1 3 21 18 6 16
#> 3 2007 11 20 19 4 15 10 2 3 22 17 9 16
#> 4 2008 10 21 16 5 14 9 2 3 22 20 6 17
#> 5 2009 8 17 15 4 11 5 3 13 21 24 7 14
#> 6 2010 6 14 18 3 10 3 5 15 23 24 8 13
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#> $msg
#> NULL
The function departure_best in the convergEU package calculates for each country the departure from the best performing MS. For example, for the emp_20_64_MS data, considering a highBest indicator:
mySTB2 <- departure_best(emp_20_64_MS,timeName="time",indiType = "highBest")
mySTB2
#> $res
#> $res$raw_departures
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 -7.90 -14.1 -22.3 -3.7 -7.10 -10 -0.5 -10.5 -16 -15.6 -5.60
#> 2 2003 -7.2 -14 -19.8 -3.10 -7.5 -10.1 -1.10 -9.4 -14.7 -14.2 -5.60
#> 3 2004 -9.70 -12.3 -16.9 -2.40 -8 -10.2 0 -7.90 -13.8 -12.9 -5.60
#> 4 2005 -7.70 -11.6 -16.2 -3.70 -7.40 -8.70 -0.100 -6.10 -13.7 -10.6 -5.10
#> 5 2006 -7.8 -12.9 -14.3 -3.6 -8.2 -8.3 0 -3.5 -13.8 -10.4 -5.5
#> 6 2007 -7.30 -12.4 -11.7 -3.30 -8.10 -7.20 -1.10 -3.20 -14.3 -10.4 -5.30
#> 7 2008 -6.6 -12.4 -9.7 -3.9 -8 -6.4 -1.7 -3.3 -14.1 -11.9 -4.6
#> 8 2009 -4.90 -11.2 -9.5 -3 -7.40 -4.10 -2.2 -8.30 -12.7 -14.3 -4.80
#> 9 2010 -4.20 -10.5 -13.4 -3.10 -7.70 -3.10 -3.20 -11.3 -14.3 -15.3 -5.10
#> 10 2011 -5.2 -12.1 -16.5 -6 -8.5 -2.9 -4.6 -8.8 -19.8 -17.4 -5.6
#> 11 2012 -5 -12.2 -16.4 -9.2 -7.9 -2.5 -5.1 -7.2 -24.4 -19.8 -5.4
#> 12 2013 -5.2 -12.6 -16.3 -12.6 -7.30 -2.5 -5.5 -6.5 -26.9 -21.2 -6.5
#> 13 2014 -5.80 -12.7 -14.9 -12.4 -6.5 -2.30 -5.30 -5.7 -26.7 -20.1 -6.9
#> 14 2015 -6.2 -13.3 -13.4 -12.6 -5.7 -2.5 -5.10 -4 -25.6 -18.5 -7.60
#> 15 2016 -6.4 -13.5 -13.5 -12.5 -4.5 -2.6 -5.2 -4.6 -25 -17.3 -7.80
#> 16 2017 -6.40 -13.3 -10.5 -11 -3.30 -2.60 -5.2 -3.10 -24 -16.3 -7.60
#> 17 2018 -6.40 -12.9 -10.2 -8.70 -2.70 -2.70 -5.10 -3.10 -23.1 -15.6 -6.30
#> # … with 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#> $res$cumulated_dif
#> AT BE BG CY CZ DE DK EE EL ES FI
#> -109.9 -214.0 -245.5 -114.8 -115.8 -88.7 -51.0 -106.5 -322.9 -261.8 -100.9
#> FR HR HU IE IT LT LU LV MT NL PL
#> -170.3 -317.7 -260.3 -162.1 -312.5 -148.8 -163.7 -157.5 -280.8 -61.7 -266.4
#> PT RO SE SI SK UK
#> -146.4 -245.2 -0.9 -155.6 -223.3 -72.6
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
where the “$res” component is a list containing the departures from the best performer and the cumulated differences over the years.
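Both components can be reproduced by hand for a highBest indicator: the best performer of each year is the row maximum, the raw departures are the differences from it, and the cumulated differences are the column sums. The sketch below is not the package code and uses only three countries and three years of emp_20_64_MS, so the best performer (and hence the numbers) differs from the full output above.

```r
# Toy excerpt: AT, DE and DK, 2002 to 2004, from emp_20_64_MS
toy <- data.frame(
  time = 2002:2004,
  AT   = c(70.9, 71.3, 68.4),
  DE   = c(68.8, 68.4, 67.9),
  DK   = c(78.3, 77.4, 78.1)
)
vals <- as.matrix(toy[, setdiff(names(toy), "time")])

# Best performer in each year (highBest: the maximum across countries)
best <- apply(vals, 1, max)

# Departure of each country from the best performer, year by year
raw_departures <- sweep(vals, 1, best, "-")

# Cumulated differences over the years
cumulated_dif <- colSums(raw_departures)

round(raw_departures, 1)
round(cumulated_dif, 1)
```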
It is also possible to plot for each country the deviations from the best performer as follows:
departure_best_plot(mySTB2$res$cumulated_dif,
mainCountry = "IT",
countries=c("DE", "FR","SE"),
displace = 0.25,
axis_name_y = "Countries",
val_alpha = 0.95,
debug=FALSE)
#> $res
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Note that in the argument mainCountry, the country of interest should be specified (that is Italy in our example), while the argument countries reports the set of countries with respect to which the deviations are plotted.
Scoreboards of countries show the departure of an indicator level from the average for each year in the dataset. The syntax to calculate such quantities for the dataset emp_20_64_MS is the following:
data(emp_20_64_MS)
resTB <- scoreb_yrs(emp_20_64_MS,timeName = "time")
resTB
#> $res
#> $res$sigma_conv
#> # A tibble: 17 x 9
#> time stdDev CV mean devianceT elle1in elle1su elle2in elle2su
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125. 64.3 70.7 61.2 73.9
#> 2 2003 5.95 0.0878 67.8 991. 64.8 70.7 61.8 73.7
#> 3 2004 5.70 0.0839 67.9 909. 65.1 70.8 62.2 73.6
#> 4 2005 5.54 0.0809 68.4 858. 65.7 71.2 62.9 74.0
#> 5 2006 5.57 0.0801 69.6 869. 66.8 72.3 64.0 75.1
#> 6 2007 5.47 0.0775 70.6 838. 67.9 73.3 65.1 76.1
#> 7 2008 5.36 0.0755 71.0 804. 68.3 73.7 65.6 76.3
#> 8 2009 5.03 0.0730 69.0 710. 66.5 71.5 64.0 74.0
#> 9 2010 5.24 0.0769 68.1 768. 65.5 70.7 62.9 73.4
#> 10 2011 5.59 0.0821 68.1 875. 65.3 70.9 62.5 73.7
#> 11 2012 5.98 0.0880 68 1002. 65.0 71.0 62.0 74.0
#> 12 2013 6.28 0.0922 68.0 1103. 64.9 71.2 61.8 74.3
#> 13 2014 5.98 0.0867 69.0 1000. 66.0 72.0 63.0 75.0
#> 14 2015 5.74 0.0820 70.0 922. 67.1 72.9 64.2 75.7
#> 15 2016 5.60 0.0789 71.0 879. 68.2 73.8 65.4 76.6
#> 16 2017 5.37 0.0741 72.5 808. 69.8 75.2 67.1 77.9
#> 17 2018 5.30 0.0717 73.8 786. 71.2 76.5 68.6 79.1
#>
#> $res$sco_level
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 2002 4 3 1 5 4 3 5 3 2 2 4 3
#> 2 2003 4 2 1 5 4 3 5 3 2 2 4 3
#> 3 2004 3 3 1 5 3 3 5 3 2 3 4 3
#> 4 2005 3 3 1 5 3 3 5 4 2 3 4 3
#> 5 2006 3 2 2 5 3 3 5 5 2 3 4 3
#> 6 2007 3 2 3 5 3 3 5 5 2 3 4 3
#> 7 2008 4 2 3 5 3 4 5 5 2 3 4 3
#> 8 2009 4 3 3 5 3 5 5 3 2 2 4 3
#> 9 2010 5 3 2 5 3 5 5 3 2 1 4 3
#> 10 2011 5 3 2 4 4 5 5 3 1 1 5 3
#> 11 2012 5 3 2 3 4 5 5 4 1 1 5 3
#> 12 2013 5 3 2 3 4 5 4 4 1 1 4 3
#> 13 2014 4 3 2 3 4 5 4 4 1 1 4 3
#> 14 2015 4 3 2 3 4 5 4 5 1 1 4 3
#> 15 2016 4 2 2 3 5 5 4 4 1 1 3 3
#> 16 2017 4 2 3 3 5 5 4 5 1 1 3 3
#> 17 2018 3 2 3 3 5 5 4 5 1 1 3 3
#> # … with 16 more variables: HR <int>, HU <int>, IE <int>, IT <int>, LT <int>,
#> # LU <int>, LV <int>, MT <int>, NL <int>, PL <int>, PT <int>, RO <int>,
#> # SE <int>, SI <int>, SK <int>, UK <int>
#>
#> $res$sco_change
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 NA NA NA NA NA NA NA NA NA NA NA NA
#> 2 2003 3 3 5 3 2 2 1 4 4 4 2 4
#> 3 2004 1 4 5 3 2 2 3 4 3 4 3 2
#> 4 2005 5 3 3 1 3 4 2 5 3 5 3 3
#> 5 2006 3 1 5 3 2 4 3 5 3 3 3 1
#> 6 2007 3 3 5 3 3 4 1 3 2 3 3 2
#> 7 2008 4 3 5 2 3 4 2 3 3 1 4 3
#> 8 2009 4 3 3 3 3 4 3 1 4 1 3 3
#> 9 2010 5 5 1 3 3 5 3 1 2 3 3 4
#> 10 2011 3 3 1 2 3 4 3 5 1 3 4 3
#> 11 2012 3 3 3 1 3 3 3 5 1 1 3 3
#> 12 2013 3 3 3 1 4 3 3 4 1 2 2 3
#> 13 2014 1 2 4 2 3 2 2 3 2 3 1 1
#> 14 2015 1 1 5 2 3 2 3 5 4 5 1 2
#> 15 2016 2 2 2 3 5 2 2 1 3 5 2 2
#> 16 2017 1 2 5 4 3 1 1 4 3 3 2 1
#> 17 2018 2 3 3 5 3 1 2 2 4 3 5 1
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#> $res$sco_level_num
#> # A tibble: 17 x 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 0.5 0 -1 1 0.5 0 1 0 -0.5 -0.5 0.5 0
#> 2 2003 0.5 -0.5 -1 1 0.5 0 1 0 -0.5 -0.5 0.5 0
#> 3 2004 0 0 -1 1 0 0 1 0 -0.5 0 0.5 0
#> 4 2005 0 0 -1 1 0 0 1 0.5 -0.5 0 0.5 0
#> 5 2006 0 -0.5 -0.5 1 0 0 1 1 -0.5 0 0.5 0
#> 6 2007 0 -0.5 0 1 0 0 1 1 -0.5 0 0.5 0
#> 7 2008 0.5 -0.5 0 1 0 0.5 1 1 -0.5 0 0.5 0
#> 8 2009 0.5 0 0 1 0 1 1 0 -0.5 -0.5 0.5 0
#> 9 2010 1 0 -0.5 1 0 1 1 0 -0.5 -1 0.5 0
#> 10 2011 1 0 -0.5 0.5 0.5 1 1 0 -1 -1 1 0
#> 11 2012 1 0 -0.5 0 0.5 1 1 0.5 -1 -1 1 0
#> 12 2013 1 0 -0.5 0 0.5 1 0.5 0.5 -1 -1 0.5 0
#> 13 2014 0.5 0 -0.5 0 0.5 1 0.5 0.5 -1 -1 0.5 0
#> 14 2015 0.5 0 -0.5 0 0.5 1 0.5 1 -1 -1 0.5 0
#> 15 2016 0.5 -0.5 -0.5 0 1 1 0.5 0.5 -1 -1 0 0
#> 16 2017 0.5 -0.5 0 0 1 1 0.5 1 -1 -1 0 0
#> 17 2018 0 -0.5 0 0 1 1 0.5 1 -1 -1 0 0
#> # … with 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
where the result is a list of three components: the summary statistics, the numerical labels indicating the interval of the partition a level belongs to, and the interval of the partition a change belongs to.
Numerical labels are assigned as described in the DRAFT JOINT EMPLOYMENT REPORT FROM THE COMMISSION AND THE COUNCIL (2018):
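As a rough illustration of how such labels can be assigned, the sketch below maps the values observed in one year to five intervals and then to the numerical labels -1, -0.5, 0, 0.5 and 1. The correspondence between the integer labels 1 to 5 and the numerical labels matches the output shown above; the cut points (mean minus/plus 0.5 and 1 standard deviations), the use of the population standard deviation, and the helper name `score_levels` are assumptions made for this example and are not taken from the package.

```r
# Hypothetical helper (not part of convergEU): label the values of one year.
# Assumed cut points: mean - 1 sd, mean - 0.5 sd, mean + 0.5 sd, mean + 1 sd.
score_levels <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population standard deviation (assumption)
  breaks <- c(-Inf, m - s, m - 0.5 * s, m + 0.5 * s, m + s, Inf)
  # integer label 1..5 of the interval each value falls in (right-closed)
  lab_int <- cut(x, breaks = breaks, labels = FALSE, right = TRUE)
  # corresponding numerical label, as in sco_level_num
  lab_num <- c(-1, -0.5, 0, 0.5, 1)[lab_int]
  data.frame(value = x, sco_level = lab_int, sco_level_num = lab_num)
}

# Invented values for five countries in a single year
toy <- c(A = 60, B = 68, C = 70, D = 72, E = 80)
score_levels(toy)
```

With these invented values, the lowest country falls in the first interval, the three central countries in the middle one, and the highest country in the last one.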
For the comparison of a country with the EU average, the following steps are recommended, from raw data:
First, we select the country Italy (“IT”) to perform comparison with the EU average:
Then, we calculate the Sigma convergence:
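A minimal sketch of these two steps on invented data is given here. It assumes that the objects used later in this section are defined at this point (`selectedCountry`, `timeName`) and that the `$res` component of Sigma convergence holds, for each year, the mean across countries together with dispersion measures; only the `mean` and time columns are confirmed by the code that follows. In the package the calculation is presumably performed by its Sigma convergence function on `emp_20_64_MS` for the years 2002 to 2016; the toy data frame, the helper `pop_sd` and the column names `stdDev` and `CV` are assumptions.

```r
selectedCountry <- "IT"   # country to compare with the average
timeName <- "time"        # assumed name of the time variable

# Toy data in the years-by-countries format (values are invented)
toy <- data.frame(time = 2002:2004,
                  IT = c(59, 60, 61),
                  DE = c(68, 69, 71),
                  FR = c(68, 69, 69))

# Indicator values only, without the time variable
vals <- as.matrix(toy[, names(toy) != timeName])

# Population standard deviation (assumption about the dispersion measure)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Per-year summaries across countries
sigma_res <- data.frame(time = toy[[timeName]],
                        mean = rowMeans(vals),
                        stdDev = apply(vals, 1, pop_sd))
sigma_res$CV <- sigma_res$stdDev / sigma_res$mean  # coefficient of variation
sigma_res
```

A dispersion measure that decreases over time would point towards Sigma convergence; the `mean` column is the average plotted against the selected country below.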
and we build the ttmp dataset composed of i) the results of the $res component of Sigma convergence, and ii) the emp_20_64_MS dataset without the time variable:
# logical vector selecting the years 2002-2016
estrattore <- emp_20_64_MS[[timeName]] >= 2002 & emp_20_64_MS[[timeName]] <= 2016
ttmp <- cbind(outSig$res, dplyr::select(emp_20_64_MS[estrattore, ], -dplyr::all_of(timeName)))
Next, we extract the minimum and the maximum of the indicator’s values in the emp_20_64_MS dataset:
miniY <- min(emp_20_64_MS[, -which(names(emp_20_64_MS) == timeName)])
maxiY <- max(emp_20_64_MS[, -which(names(emp_20_64_MS) == timeName)])
and we build the plot for the comparison of Italy and the EU average:
myx_angle <- 45
myG2 <- ggplot(ttmp) +
  ggtitle(paste("EU average (black, solid) and country", selectedCountry, "(red, dotted)")) +
  # EU average
  geom_line(aes(x = .data[[timeName]], y = .data[["mean"]]), colour = "black") +
  geom_point(aes(x = .data[[timeName]], y = .data[["mean"]]), colour = "black") +
  ylim(c(miniY, maxiY)) + xlab("Year") + ylab("Indicator") +
  theme(legend.position = "none") +
  # selected country; the colour is set outside aes() so that it is really red
  geom_line(aes(x = .data[[timeName]], y = .data[[selectedCountry]]),
            colour = "red", linetype = "dotted") +
  geom_point(aes(x = .data[[timeName]], y = .data[[selectedCountry]]), colour = "red") +
  # one tick per year, with rotated labels
  scale_x_continuous(breaks = ttmp[[timeName]], labels = ttmp[[timeName]]) +
  theme(axis.text.x = element_text(angle = myx_angle))
myG2
It is also possible to graphically show departures in terms of the partition defined above by invoking the ms_dynam function:
obe_lvl <- scoreb_yrs(emp_20_64_MS, timeName = timeName)$res$sco_level_num
# select a subset of years
estrattore <- obe_lvl[[timeName]] >= 2009 & obe_lvl[[timeName]] <= 2016
scobelvl <- obe_lvl[estrattore, ]
my_MSstd <- ms_dynam( scobelvl,
timeName = "time",
displace = 0.25,
displaceh = 0.45,
dimeFontNum = 3,
myfont_scale = 1.35,
x_angle = 45,
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.9,
indiType = "highBest")
my_MSstd
Note that the colours of the labels for the defined partition depend on the type of indicator. In the example above, the indicator is of type “highBest”; thus, the rectangles are red for the label “-1”, ochre for “-0.5”, yellow for “0”, pale green for “0.5” and dark green for “1”. If instead an indicator is of type “lowBest”, the colours are reversed: dark green for “-1”, pale green for “-0.5”, yellow for “0”, ochre for “0.5” and red for “1”.
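The colour rule just described can be summarised with a small lookup. This is only an illustrative sketch: the helper `label_colours` is hypothetical and is not part of convergEU, and the colour names are the descriptive ones used in the text, not necessarily valid R colour names or the exact shades drawn by ms_dynam.

```r
# Hypothetical helper (not part of convergEU): descriptive colour of each
# numerical label, depending on the indicator type.
label_colours <- function(indiType = c("highBest", "lowBest")) {
  indiType <- match.arg(indiType)
  labs <- c("-1", "-0.5", "0", "0.5", "1")
  cols <- c("red", "ochre", "yellow", "pale green", "dark green")
  # for "lowBest" indicators the order of the colours is reversed
  if (indiType == "lowBest") cols <- rev(cols)
  stats::setNames(cols, labs)
}

label_colours("highBest")
label_colours("lowBest")
```

The middle label “0” is yellow for both indicator types, while the extreme labels swap between red and dark green.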
The following references may be consulted for details:
Brussels, 21.11.2018, COM(2018) 761 final, DRAFT JOINT EMPLOYMENT REPORT FROM THE COMMISSION AND THE COUNCIL, accompanying the Communication from the Commission on the Annual Growth Survey 2019.
Eurofound (2018), Upward convergence in the EU: Concepts, measurements and indicators, Publications Office of the European Union, Luxembourg; by: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe.
Tuszynski, J. (2015). caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc. R package version 1.17.1.2, URL https://CRAN.R-project.org/package=caTools.
Wickham, H., Bryan, J., Kalicinski, M., Valery, K., Leitienne, C., Colbert, B., Hoerl, D., Miller, E. (2019). readxl: Read Excel Files. R package version 1.3.1, URL https://CRAN.R-project.org/package=readxl.
Wickham, H., Hester, J., Francois, R., Jukka Jylanki, Jorgensen M (2018). readr: Read Rectangular Text Data. R package version 1.3.1, URL https://CRAN.R-project.org/package=readr.
Buuren van, S., Groothuis-Oudshoorn, K. (2011) mice: Multivariate Imputation via Chained Equations in R. Journal of Statistical Software, 45: 1-67.
Honaker J, King G, Blackwell M, Blackwell MM. Amelia: A Program for Missing Data. R package version 1.7.6, URL https://cran.r-project.org/web/packages/Amelia/index.html.
Stekhoven, D.J. missForest: Nonparametric Missing Value Imputation using Random Forest. R package version 1.4, URL https://cran.r-project.org/web/packages/missForest/index.html.
Harrell, F.E., Jr. Hmisc: Harrell Miscellaneous. R package version 4.3-0, URL https://cran.r-project.org/web/packages/Hmisc/index.html.
Gelman, A., Hill, J., Yajima, M., Su, Y., Pittau, M., Goodrich, B., Si, Y., Kropko, J. mi: Missing Data Imputation and Model Checking, R package version 1.0, URL https://cran.r-project.org/web/packages/mi/index.html.
Wickham, H. and Henry, L. tidyr: Tidy Messy Data, R package version 1.0.0, URL https://cran.r-project.org/web/packages/tidyr/index.html.