DiSIA Seminars
16/12/2021 at 15.00 - On-site participation is restricted and requires registration at https://labdisia.disia.unifi.it/reserve205/. The lecture will also be available online; to participate, register at https://us02web.zoom.us/webinar/register/WN_0QB9L_b3RlS9Zp5Rix30_g
Bayesian Models for Microbiome Data with Variable Selection
Marina Vannucci (Rice University, Houston, USA)
I will describe Bayesian models developed for understanding how the microbiome varies within a population of interest. I will focus on integrative analyses, where the goal is to combine microbiome data with other available information (e.g. dietary patterns) to identify significant associations between taxa and a set of predictors. For this, I will describe a general class of hierarchical Dirichlet-Multinomial (DM) regression models which use spike-and-slab priors for the selection of the significant associations. I will also describe a joint model that efficiently embeds DM regression models and compositional regression frameworks, in order to investigate how the microbiome may affect the relation between dietary factors and phenotypic responses, such as body mass index. I will discuss advantages and limitations of the proposed methods with respect to current standard approaches used in the microbiome community, and will present results on the analysis of real datasets. If time allows, I will also briefly present extensions of the model to mediation analysis.
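As a rough illustration of the Dirichlet-Multinomial regression idea described above, the following sketch evaluates the DM log-probability of one sample's taxon counts, with the concentration parameters tied to covariates. The log link, the function names and the toy numbers are assumptions made for illustration; the abstract does not specify them, and the spike-and-slab selection step is not shown (a zero coefficient simply stands in for an association that was not selected).

```python
from math import exp, lgamma

def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-Multinomial with parameters alpha."""
    n, a = sum(counts), sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out

def dm_regression_alpha(covariates, intercepts, betas):
    """Assumed log link: alpha_j = exp(intercept_j + sum_p covariates[p] * betas[p][j])."""
    return [
        exp(b0 + sum(x * row[j] for x, row in zip(covariates, betas)))
        for j, b0 in enumerate(intercepts)
    ]

# Hypothetical toy data: 3 taxa, 2 covariates.
covariates = [0.5, -1.0]
intercepts = [0.0, 0.2, -0.1]
betas = [[0.8, 0.0, 0.0],   # covariate 1 is associated with taxon 1 only
         [0.0, 0.0, -0.5]]  # covariate 2 is associated with taxon 3 only
alpha = dm_regression_alpha(covariates, intercepts, betas)
print(dm_log_pmf([12, 7, 3], alpha))
```

With two taxa and both parameters equal to 1, the DM reduces to a uniform distribution over the possible splits of the total count, which gives a quick sanity check of the formula.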
10/12/2021 at 15.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/9th-seminar-of-the-d2-seminar-series-florence-center-for-data-science
D2 Seminar Series
(RG) Italy’s lowest-low fertility in times of uncertainty
(AM) Approximating the Neighborhood Function of (Temporal) Graphs
Double FDS seminar:
Raffaele Guetto (DiSIA) & Andrea Marino (DiSIA)
The generalized and relatively homogeneous fertility decline across European countries in the aftermath of the Great Recession poses serious challenges to our knowledge of contemporary low fertility patterns. The rise of economic uncertainty has often been identified, in the sociological and demographic literature, as the main cause of this state of affairs. The forces of uncertainty have been traditionally operationalized through objective indicators of individuals’ actual and past labour market situations. However, this presentation argues that the role of uncertainty needs to be conceptualized and operationalized taking into account that people use works of imagination, producing their own narrative of the future, also influenced by the media. To outline such an approach, I review contemporary drivers of Italy’s lowest-low fertility, placing special emphasis on the role of uncertainty fueled by labour market deregulations and – more recently – the Covid-19 pandemic. I discuss the effects of the objective (labour-market related) and subjective (individuals’ perceptions, including future outlooks) sides of uncertainty on fertility, based on a set of recent empirical findings obtained through a variety of data and methods. In doing so, I highlight the potential contribution of so-called “big data” and techniques of media content analysis and Natural Language Processing for the analysis of the effects of media-conveyed narratives of the economy.
The average distance in graphs (for instance, the Facebook friendship network or the Internet Movie Database collaboration network), often referred to as degrees of separation, has been extensively investigated. However, if the number of nodes is very large (millions or billions), computing this measure incurs prohibitive time and space costs, as it requires computing the so-called neighbourhood function, i.e. for each vertex v and each h, how many nodes are within distance h from v. Temporal graphs are a special kind of graph in which edges have temporal labels specifying when they occur, in the same way as the connections of a city's public transportation network are available only at scheduled times. Here, paths make sense only if they correspond to sequences of edges with strictly increasing labels. A possible notion of distance between two nodes in a temporal network is the earliest arrival time of the temporal paths connecting the two nodes. In this case, the temporal neighbourhood function is defined as the number of nodes reachable from a given one in a given time interval, and it is also expensive to compute. We introduce probabilistic counting in order to approximate the size of sets and we show how both plain and temporal neighbourhood functions can be approximated by plugging this technique into a simple dynamic programming algorithm.
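The combination of probabilistic counting and dynamic programming described above can be sketched for the plain (non-temporal) case as follows. This is a minimal illustration assuming Flajolet-Martin style bitmask sketches; the abstract does not say which counter the speakers use, all function names are hypothetical, and the temporal variant is not covered.

```python
import random

PHI = 0.77351  # Flajolet-Martin correction constant

def fm_bitmask(rng, bits=32):
    """Bitmask with a single bit set; bit i is chosen with probability 2^-(i+1)."""
    i = 0
    while i < bits - 1 and rng.random() < 0.5:
        i += 1
    return 1 << i

def lowest_zero_bit(mask):
    i = 0
    while mask & (1 << i):
        i += 1
    return i

def estimate(masks):
    """Estimate a set's size from k independent bitmasks."""
    avg = sum(lowest_zero_bit(m) for m in masks) / len(masks)
    return (2 ** avg) / PHI

def approx_neighbourhood(adj, max_h, k=64, seed=0):
    """Return {v: [estimate of |N(v, h)| for h = 0..max_h]}."""
    rng = random.Random(seed)
    # Initially sketch[v] summarises the singleton set {v}.
    sketch = {v: [fm_bitmask(rng) for _ in range(k)] for v in adj}
    result = {v: [estimate(sketch[v])] for v in adj}
    for _ in range(max_h):
        new = {}
        for v in adj:
            masks = list(sketch[v])
            for u in adj[v]:
                # Set union becomes a bitwise OR of the sketches.
                masks = [a | b for a, b in zip(masks, sketch[u])]
            new[v] = masks
        sketch = new
        for v in adj:
            result[v].append(estimate(sketch[v]))
    return result

# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for v, values in approx_neighbourhood(adj, max_h=3).items():
    print(v, [round(x, 1) for x in values])
```

Each node stores k small bitmasks instead of an explicit set, so memory stays fixed per node regardless of how many nodes become reachable. The estimates are noisy for sets this small; the sketch only shows the mechanics of the recurrence.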
26/11/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/8th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(GR) State Space Model to Detect Cycles in Heterogeneous Agents Models (joint work with Filippo Gusella)
(MB) High quality video experience using deep neural networks
Double FDS seminar:
Giorgio Ricchiuti (DISEI) & Marco Bertini (DINFO)
We propose an empirical test to detect possible endogenous cycles within Heterogeneous Agent Models (HAMs). We consider a 2-type HAM within a standard small-scale dynamic asset pricing framework. On the one hand, fundamentalists base their expectations on the deviation of the fundamental value from the market price, expecting convergence between them. On the other hand, chartists, subject to self-fulfilling moods, consider the level of past prices and relate it to the fundamental value, acting as contrarians. These pricing strategies, by their nature, cannot be directly observed but can affect the observed data. For this reason, we consider the agents' beliefs as unobserved state components from which, through a state space model formulation, the heterogeneity of fundamentalist-chartist trader cycles can be mathematically derived and empirically tested. The model is estimated using the S&P500 index for the period 1990-2020 at different time scales, specifically daily, monthly, and quarterly.
Lossy image and video compression algorithms are the enabling technology for a large variety of multimedia applications, reducing the bandwidth required for image transmission and video streaming. However, lossy image and video compression codecs decrease the perceived visual quality, eliminate higher frequency details and in certain cases add noise or small image structures. There are two main drawbacks of this phenomenon. First, images and videos appear much less pleasant to the human eye, reducing the quality of experience. Second, computer vision algorithms such as object detectors may be hindered and their performance reduced. Removing such artefacts means recovering the original image from a perturbed version of it. This means that one ideally should invert the compression process through a complicated non-linear image transformation. In this talk, I’ll present our most recent works based on the GAN framework that allows us to produce images with photorealistic details from highly compressed inputs.
25/11/2021 at 12.00 - To attend in person (max 20 people), book at https://labdisia.disia.unifi.it/reserve205/. To attend remotely: meet.google.com/aio-dhut-nah
Mortality and morbidity drivers of the global distribution of health
Inaki Permanyer (Centre d’Estudis Demogràfics, Barcelona)
Increasing life expectancy (LE) and reducing its variability across countries (the so-called “International Health Inequality”, IHI) are progressively prominent goals in global development agendas. Yet, LE is composed of two components: the number of years individuals are expected to live in “good” and in “less-than-good” health. While the first component (“Health-adjusted life expectancy”, HALE) is normatively desirable, the second one (“Unhealthy life expectancy”, UHLE, or LE – HALE) is highly controversial because of the high personal, social, and economic costs often associated with the presence of disease or disability – an issue that can muddy the waters when interpreting global health dynamics and calls for new conceptual approaches. Here we document how the evolution of HALE and UHLE between 1990 and 2019 has shaped (i) the trends and composition of LE at the country, regional and global levels, and (ii) the levels and trends in IHI. Our findings indicate that UHLE has tended to grow at a faster rate than HALE, thus leading to an expansion of morbidity in 75% of the world's countries. IHI increased until 2000 and has declined since then – a trend that is mostly determined by the evolution of HALE across countries. While still the smaller component, UHLE is a non-negligible and increasingly relevant factor determining the levels of IHI. These findings and ideas are useful for understanding the role that the healthy and unhealthy components of LE are playing in contemporary health dynamics and for designing policies aimed at tackling health inequalities both across and within countries.
24/11/2021 at 12.00 - To attend in person, in Room 205 (formerly 32), max 20 people, book at https://labdisia.disia.unifi.it/reserve205/. To attend remotely: meet.google.com/bas-vwed-bcv
Innovation on methods for the evaluation of the short-term health effects of environmental hazards.
Francesco Sera (DiSIA) and Dominic Royé (University of Santiago de Compostela)
Numerous studies have evaluated the short-term effects of environmental exposures (e.g. pollutants, temperature) on health, giving valuable information for preventive public health strategies. In this seminar we’ll discuss some advances in the methodology and in the definition of the exposures and the outcomes that allow a better description of the epidemiological association.
The two-stage design has become a standard tool in environmental epidemiology to model short-term effects with multi-location data. We’ll illustrate multiple design extensions of the classical two-stage method. These are based on improvements of the standard two-stage meta-analytic models along the lines of linear mixed-effects models, allowing location-specific estimates to be pooled through flexible fixed- and random-effects structures. This permits the analysis of associations characterised by combinations of multivariate outcomes, hierarchical geographical structures, repeated measures, and/or longitudinal settings. The design extensions will be illustrated in examples using data collected by the Multi-country Multi-city research network.
In terms of new exposures, we defined new indices that emphasize the importance of hot nights, which may prevent necessary nocturnal rest. We recently used hot-night duration and excess to predict daily cause-specific mortality in summer across multiple cities in Southern Europe. We found positive but generally non-linear associations between the relative risk of cause-specific mortality and the duration and excess of hot nights.
Another aspect is that most studies use daily counts of deaths or hospitalisations as health outcomes, although these sit at the top of the health impact pyramid and reflect only the limited proportion of patients with the most severe conditions. In a recent study we evaluated the relationship between short-term exposure to daily mean temperature and medication prescribed for the respiratory system in five Spanish cities.
The proposed innovations in the methodology and in the definition of the exposures and the outcomes provide new evidence on the short-term health effects of environmental hazards that could improve decision-making for preventive public health strategies.
12/11/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/7th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(LB) Recent advances in bibliometric indexes and their implementation
(VB) Fisher’s Noncentral Hypergeometric Distribution for the Size Estimation of Unemployed Graduates in Italy (joint work with Brunero Liseo, University Sapienza, Roma)
Double FDS seminar:
Luigi Brugnano (DIMAI) & Veronica Ballerini (DiSIA)
Bibliometric indexes are nowadays very commonly used for assessing scientific production, research groups, journals, etc. It must be stressed that such indexes cannot replace an assessment of the merits of the specific research; nonetheless, they can provide a rough evaluation of its impact on the scientific community. That said, the currently used indexes often have drawbacks and/or vary considerably across subjects of investigation. For this reason, in [1] an alternative index has been proposed, based on an idea akin to that of the Google PageRank. Its actual implementation has recently been carried out in the Scopus database [2]. In this talk, the basic facts and results of this approach will be recalled.
[1] P. Amodio, L. Brugnano. Recent advances in bibliometric indexes and the PaperRank problem. Journal of Computational and Applied Mathematics 267 (2014) 182-194. http://doi.org/10.1016/j.cam.2014.02.018.
[2] P. Amodio, L. Brugnano, F. Scarselli. Implementation of the PaperRank and AuthorRank indices in the Scopus database. Journal of Informetrics 15 (2021) 101206. https://doi.org/10.1016/j.joi.2021.101206.
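Since the abstract describes the proposed index only as "akin to" Google PageRank, the sketch below shows plain PageRank by power iteration on a tiny citation graph. It is not the PaperRank or AuthorRank index of the cited papers; the damping value, the treatment of papers that cite nothing and the toy graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iters=100):
    """Plain PageRank by power iteration; links maps each node to the nodes it cites."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Assumption: a paper citing nothing spreads its score evenly over all papers.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for u in targets:
                new[u] += share
        rank = new
    return rank

# Hypothetical toy citation graph: papers "b" and "c" both cite paper "a".
citations = {"a": [], "b": ["a"], "c": ["a"]}
for paper, score in sorted(pagerank(citations).items()):
    print(paper, round(score, 3))
```

The cited paper ends up with the highest score, and the two citing papers share the same lower score, which is the qualitative behaviour one expects from a citation-based ranking.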
Quantifying unemployment among those who have never been employed is often difficult. The lack of an administrative data flow attributable to such individuals makes them an elusive population. Hence, one must rely on surveys. However, individuals’ response rates to questions on their occupation may differ according to their employment status, implying a not-at-random missing data generation mechanism. We exploit the underused Fisher’s noncentral hypergeometric distribution (FNCH) to solve such a biased urn experiment. FNCH has been underemployed in the statistical literature mainly because of the computational burden given by its probability mass function. Indeed, as the number of draws and the number of different categories in the population increase, any method involving the evaluation of the likelihood is practically unfeasible. First, we present a methodology that allows the approximation of the posterior distribution of the population size via MCMC and ABC methods. Then, we apply this methodology to the case of unemployed graduates in Italy, exploiting information from different data sources.
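To make the biased urn concrete, here is a small sketch of the probability mass function of Fisher's noncentral hypergeometric distribution in its simplest, two-category form. The function and parameter names are illustrative, and the talk concerns the harder multivariate case. The normalising sum in the denominator is the part that becomes costly as the number of draws and categories grows.

```python
from math import comb

def fnch_pmf(x, m1, m2, n, omega):
    """P(X = x) when n items are taken from m1 type-1 and m2 type-2 items with odds ratio omega."""
    lo, hi = max(0, n - m2), min(n, m1)
    if not lo <= x <= hi:
        return 0.0

    def weight(y):
        return comb(m1, y) * comb(m2, n - y) * omega ** y

    # The denominator sums over the whole support; in the multivariate case
    # this sum runs over all admissible count vectors.
    return weight(x) / sum(weight(y) for y in range(lo, hi + 1))

# Hypothetical example: 3 respondents of type 1, 2 of type 2, 2 responses observed,
# with type-1 individuals twice as likely to respond.
print([fnch_pmf(x, 3, 2, 2, 2.0) for x in range(3)])
```

Setting `omega` to 1 recovers the ordinary hypergeometric distribution, which is a convenient check.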
29/10/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/6th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(GI) Performance-based research funding: Evidence from the largest natural experiment worldwide
(MF) Consensus-based optimization
Double FDS seminar: Giulia Iori (Department of Economics of the City University of London) & Massimo Fornasier (Department of Mathematics of the Technical University of Munich)
The Research Excellence Framework (REF) is the main UK government policy on public research in the last 30 years. The primary aim of this policy is to promote and reward research excellence through competition for scarce research resources. Surprisingly, and despite the severe criticisms, little has been done to systematically evaluate its effects. In this paper, we evaluate the impact of the REF 2014. We exploit a large database that contains all publications in Economics, Business, Management, and Finance available in Scopus since 2001. We use a synthetic control method to compare the performance of each of the universities from the UK with counter-factual similar units in terms of past research constructed using data for US universities. We find a significant positive increase, relative to the control group, in the number of published papers, and in the proportion of papers published in highly ranked journals within the Economics/Econometrics area and the Business, Management and Finance area. Both Russell and non-Russell Group universities benefited from the REF, with the Russell Group universities experiencing an overall significant increase in the number of publications and number of publications in top journals, and the non-Russell group experiencing a significant increase in the proportion of publications in top journals in all areas. Interestingly, the non-Russell group experienced a comparatively stronger increase in the proportion of top publications in Economics/Econometrics while the Russell Group experienced a comparatively stronger increase in the proportion of top publications in Business, Management and Finance. However, we see an insignificant effect when we focus on per-author output measures indicating that growth in output was mostly achieved by an increase in the number of research active academics rather than an overall increase in research productivity.
Consensus-based optimization (CBO) is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. In fact, optimizing agents (particles) move on the optimization domain driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of particle locations, weighted by the cost function according to Laplace’s principle, and it represents an approximation to a global minimizer. The dynamics are further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. Based on an experimentally supported intuition that CBO always performs a gradient descent of the squared Euclidean distance to the global minimizer, we show a novel technique for proving the global convergence to the global minimizer in mean-field law for a rich class of objective functions. We further present formulations of CBO over compact hypersurfaces. We conclude the talk with a few numerical experiments, which show that CBO scales well with the dimension and is extremely versatile.
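The dynamics described above can be sketched with a simple Euler-Maruyama discretisation: particles drift towards a weighted consensus point and receive noise scaled by their distance from it. This is a minimal, dependency-free illustration; the step size, the parameter values, the isotropic noise and the initial sampling box are all assumptions rather than the speaker's settings.

```python
import math
import random

def consensus_point(particles, f, alpha):
    """Convex combination of particle positions with weights exp(-alpha * f(x))."""
    values = [f(x) for x in particles]
    fmin = min(values)  # shift for numerical stability; cancels after normalisation
    weights = [math.exp(-alpha * (v - fmin)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * x[d] for w, x in zip(weights, particles)) / total for d in range(dim)]

def cbo_minimize(f, dim, n_particles=50, steps=300, dt=0.05,
                 lam=1.0, sigma=0.7, alpha=30.0, seed=0):
    rng = random.Random(seed)
    particles = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
    for _ in range(steps):
        v = consensus_point(particles, f, alpha)
        for x in particles:
            dist = math.sqrt(sum((xd - vd) ** 2 for xd, vd in zip(x, v)))
            for d in range(dim):
                drift = -lam * (x[d] - v[d]) * dt
                # Noise magnitude grows with the distance to the consensus point.
                noise = sigma * dist * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                x[d] += drift + noise
    return consensus_point(particles, f, alpha)

# Hypothetical nonconvex test function with its global minimum at (1, -2).
def objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 0.5 * (1.0 - math.cos(3.0 * (x[0] - 1.0)))

print(cbo_minimize(objective, dim=2))
```

A larger `alpha` concentrates the weights on the best particle, in the spirit of the Laplace principle mentioned in the abstract. The method never evaluates a gradient of the objective.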
28/10/2021 at 12.00 - To attend in person (max 20 people), book at https://labdisia.disia.unifi.it/reserve205/. To attend remotely: https://meet.google.com/atb-qtxf-nvb
Uncertainty across the “Contact Line”: armed conflict, COVID-19, and perceptions of fertility decline in Eastern Ukraine
Brienna Perelli-Harris (University of Southampton)
While economic uncertainty has been a key explanation for very low fertility throughout Europe, few studies have conducted in-depth investigations into social and political sources of uncertainty. Here we study Ukraine, which has recently experienced fertility rates around 1.3. In 2014, armed conflict wracked Ukraine’s eastern regions, producing around 1.7 million internally displaced persons (IDPs). To better understand how continuing crises shape Ukrainians’ perceptions of childbearing, we analyse 16 online focus groups conducted in July 2021. We compare IDPs and locals and residents of the Government and non-Government Controlled Areas. The discussions reveal how Covid and political instability have intensified uncertainties, discouraging couples from having more than one child. Participants mentioned poverty, insecurity, and uncertain relationships as reasons to curtail childbearing. Some blamed the government or delved into conspiracy theories. Nonetheless, others were optimistic and planned to have more children, indicating mixed reactions to uncertainty.
15/10/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/5th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(LB) Endogenous and Exogenous Volatility in the Foreign Exchange Market (with G. Cifarelli)
(CM) Artificial Intelligence in Neuroimaging
Double FDS seminar: Leonardo Bargigli (DISEI) & Chiara Marzi (IFAC – CNR)
We study two sources of heteroscedasticity in high-frequency financial data. The first, endogenous, source is the behaviour of bounded rational market participants. The second, exogenous, source is the flow of market relevant information. We estimate the impact of the two sources jointly by means of a Markov switching (MS) SVAR model. Following the original intuition of Rigobon (2003), we achieve identification for all coefficients by assuming that the structural errors of the MS-SVAR model follow a GARCH-DCC process. Using transaction data of the EUR/USD interdealer market in 2016, we firstly detect three regimes of endogenous volatility. Then we show that both kinds of volatility matter for the transmission of shocks, and that the exogenous information is channelled to the market mostly through price variations. This suggests that, on the FX market, liquidity providers are better informed than liquidity takers, who act mostly as feedback traders. The latter are able to profit from trade because, unlike noise traders, they respond immediately to the informative price shocks.
Life sciences data coupled with Artificial Intelligence (AI) techniques can help researchers accurately pinpoint novel biomarkers. AI can propose new indices as potential biomarkers while simultaneously aiding in searching for hidden patterns among “well-established” indices. In this webinar, we will take a brief journey through some applications of Machine Learning in neuroimaging. In the first part of the webinar, we will talk about the not so easy “marriage” between AI and clinical data, focusing on Big Data from Radiology Imaging and related issues. In the second part, we will see how we can transfer mathematical, physical, and statistical ideas to the Neuroimaging domain and how AI can help this transfer. An example is the study of the structural complexity of the brain starting from MRI images, using fractal analysis. The Fractal Dimension (FD) can be considered a measure of morphological changes due to healthy ageing and/or the onset of neurological diseases. The use of ML techniques can promote the candidature of FD as a biomarker for many neurological diseases.
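The fractal dimension mentioned above is commonly estimated by box counting: cover the structure with boxes of decreasing size, count the occupied boxes, and fit the slope of log(count) against log(1/size). The sketch below does this for a set of 2D pixel coordinates. It is a toy illustration with hypothetical names; the abstract does not state which estimator is used, and real MRI data would be 3D and need segmentation first.

```python
import math

def box_count(points, size):
    """Number of size-by-size boxes that contain at least one point."""
    return len({(x // size, y // size) for x, y in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(sizes)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy shapes on a 64 x 64 grid: a filled square and a straight line.
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 0) for x in range(64)]
sizes = [1, 2, 4, 8]
print(box_counting_dimension(square, sizes))  # close to 2
print(box_counting_dimension(line, sizes))    # close to 1
```

Irregular structures such as a cortical boundary would typically give non-integer values between these two extremes.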
Business Excellence 5.0
Gabriele Arcidiacono (DIIE - Università Guglielmo Marconi, Roma)
The webinar offers a privileged vantage point on Business Excellence: all-round excellence, also known as Global Excellence, which makes it possible to achieve results and benefits that are sustainable over time. This approach operates in the technical domain of processes (Process Excellence), on a sound understanding of behaviours and interpersonal dynamics (Human Excellence), and on the most appropriate use of information and digital technology (Digital Excellence). The synergy of these three areas of action makes it possible to structure a Continuous Improvement path that includes processes, people and digitalisation and leads to increasingly challenging results.
30/09/2021 at 15.30 - In-person attendance (Room 205, formerly 32) is limited to a maximum of 15 people; contact email@example.com. To attend remotely: https://meet.google.com/hfw-mdvm-ity
Ethnicity and Governing Party Support in Africa
Carlos G. Rivero (Valencia University & Centre for International and Comparative Politics, Stellenbosch University)
Historically, ethnicity has been considered to play a fundamental role in voting behaviour in Africa. However, researchers on the issue have reached contradictory conclusions. The most recent research concludes that the African voter is more rational than expected. Overall, ethnicity seems to be less influential than theory used to suggest. Against this background, this paper analyses the vote for the governing party in Africa and presents evidence that the method and data set used have an important influence on the final result. The research takes the form of a quantitative analysis making extensive use of survey data from 2005 to 2019. Results indicate that ethnicity is still an explanatory factor, although not the only one. In short, the African vote is rationally ethnic.
23/09/2021 at 12.00 - In-person attendance (Room 205, formerly 32) is limited to a maximum of 15 people; contact firstname.lastname@example.org. To attend remotely: https://meet.google.com/ezd-dhpt-uvh
Incentivizing family savings as way to fight educational inequality: Preliminary results from an RCT in Italy
Loris Vergolini (University of Bologna & FBK-IRVAPP)
The paper deals with the programme WILL “Educare al Futuro”, a policy experiment aimed at implementing Children's Savings Accounts and at evaluating their effects on a set of school-related outcomes. The programme, launched in 2019 in four Italian cities (Cagliari, Florence, Teramo and Torino), targets 9th grade students from low-income families, offering them the opportunity to regularly save small amounts of money in a bank account that provides a generous multiplier (x4 if the money is spent on proven school expenses). In addition to the savings account, the beneficiaries are entitled to financial education courses, educational support and a guidance programme. The impacts of the programme will be assessed using a randomized controlled trial by 2024. In this paper we present a set of preliminary findings emerging from the first follow-up survey carried out in Spring 2021. We found that families in the treatment group show an increase in both general savings and savings for school activities, and that this does not come at the expense of other spending. Moreover, the treated families used the savings accounts to buy ICT equipment (a computer, a tablet or an internet connection) to allow their children to attend online learning during the Covid-19 pandemic. Our preliminary results also suggest that WILL has no effects on parental involvement, parental aspirations and expectations, or children's school performance and socio-emotional well-being. The lack of effects on these dimensions is discussed in terms of theory of change and of implementation issues due to the pandemic.
Reliability for components and systems (RELIA)
Marcantonio Catelani (Department of Information Engineering)
Drawing on examples of the "critical issues" and "points of attention" that characterise the topic, the aim of the meeting is to offer some food for thought and discussion, starting from simple basic concepts of reliability, both methodological and experimental.
The proposed seminar, "Reliability for components and systems", is a first event that will be followed by further in-depth sessions which, in the speakers' view, are useful for understanding the broader context of RAMS (Reliability, Availability, Maintainability and Safety) requirements and performance, typical of some industrial scenarios.
Design of experiments and robust design (DoE-for-RD)
Rossella Berni (Department of Statistics, Computer Science, Applications "Giuseppe Parenti" - UniFI)
The aim of the meeting is to offer some food for thought and discussion, starting from simple basic concepts of statistics and experimental design, for robust design.
The proposed webinar, "Design of experiments and robust design", is a first event that will be followed by a second, more in-depth session, useful for clarifying which skills and tools are needed for robust process optimisation, which is widely used in technological and industrial settings, though sometimes in a way that is not fully complete or up to date.
08/07/2021 at 10.30 - Link to join the seminar on Meet: meet.google.com/psr-ggem-mhv
Dimensionality reduction and ranking extraction algorithms for multidimensional systems of ordinal indicators and partially ordered data
Marco Fattore (Università degli Studi di Milano-Bicocca)
The problem of building synthetic indices and rankings from multidimensional systems of ordinal indicators is increasingly common in socio-economic statistics and in support of evaluation and multi-criteria decision/policy making processes; nevertheless, the methodological apparatus for analysing multidimensional ordinal data is still limited and largely borrowed from the statistical analysis of quantitative variables. The aim of the seminar is to show that it is instead possible to set up an “ordinal data analysis” by importing the correct mathematical structures into statistical methodology, starting from the Theory of Order Relations, a branch of discrete mathematics devoted to the properties of partially ordered and quasi-ordered sets. In particular, taking as its starting point the problem of measuring multidimensional poverty, well-being or sustainability, the seminar addresses dimensionality reduction and ranking extraction for systems of ordinal and partially ordered data, introducing the most recent methodological developments, illustrating both the algorithms already available and those under development, and discussing their strengths and weaknesses, with particular attention to computational aspects. The seminar concludes with an overview of the theoretical and applied lines of research under development, providing a map of solved and still open problems in the multidimensional ordinal analysis of socio-economic data.
02/07/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/4th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(AG) Circular data, conditional independence & graphical models
(CC) Penalized hyperbolic-polynomial splines
Double FDS seminar: Anna Gottard (DiSIA) & Costanza Conti (DIEF)
Circular variables, arising in several contexts and fields, are characterized by periodicity. Models for studying the dependence/independence structure of circular variables are under-explored. We will discuss three multivariate circular distributions, the von Mises, the Wrapped Normal and the Inverse Stereographic distributions, focusing on their properties concerning conditional independence. For each of these distributions, we examine the main properties related to conditional independence and introduce suitable classes of graphical models. The usefulness of the proposal is shown by modelling the conditional independence among dihedral angles characterizing the three-dimensional structure of some proteins.
The advent of P-splines, first introduced by Eilers and Marx in 2010, has led to important developments in data regression through splines. With the aim of generalizing polynomial P-splines, we have recently defined a model of penalized regression spline, called HP-spline, in which polynomial B-spline functions are replaced by Hyperbolic-Polynomial bell-shaped basis functions. HP-splines are defined as a solution to a minimum problem characterized by a discrete penalty term. They inherit from P-splines the advantages of the model, like the separation of the data from the spline nodes, so avoiding the problems of overfitting and the consequent oscillations at the edges. HP-splines are particularly interesting in different applications that require analysis and forecasting of data with exponential trends. Indeed, the starting idea is the definition of a polynomial-exponential smoothing spline model to be used in the framework of the Laplace transform inversion. We present some recent results on the existence, uniqueness, and reproduction properties of HP-splines, also with the aim of extending their usage to data analysis.
18/06/2021 at 14.00 - The seminar will be held online: https://datascience.unifi.it/index.php/event/3rd-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(ER) From algebra to biology: what does the math of ensemble averaging methods can tell us
(GC) Big data from space. Recent Advances in Remote Sensing Technologies
Double FDS seminar:
Enrico Ravera & Gherardo Chirici (University of Florence)
Our work aims at a quantitative comparison of different methods for reconstructing conformational ensembles of biological macromolecules integrating molecular simulations and experimental data. This field has evolved over the years, reflecting the evolution of computational power and sampling schemes, and a plethora of different methods have been proposed. These methods can vary extensively in how the prior information from the simulation is used to reproduce the experimental data, but can be broadly assigned to two categories: Maximum Entropy or Maximum Parsimony. In any case, the problem is severely underdetermined and therefore additional information needs to be provided on the basis of the chemical knowledge about the system under investigation. Maximum Entropy looks for the minimal perturbation of the prior distribution, whereas Maximum Parsimony looks for the smallest possible ensemble that can fully explain the experimental data. On these grounds, one can expect radically different solutions in the reconstruction, but surprises are still possible, and can be justified by a rigorous geometrical description of the different methods.
Since the 1970s, remote sensing technologies for terrestrial observation have generated a constant flow of data from different platforms, in different formats and with different purposes. From these, through successive steps, spatial information is generated to support the monitoring and planning of Earth resources. Such information is indispensable in various sectors: from urban planning to geology, from agriculture to forest monitoring and, more generally, for any type of environmental monitoring. For this reason, remotely sensed information is recognized as a typical example of big data ante litteram. Today, new cloud computing technologies (such as Google Earth Engine) make it possible to tackle the complex problem of managing and processing big data from remote sensing with new strategies that have revolutionized the way these data sources are used. From experiments on small areas, we have moved to the possibility of operationally processing vast multidimensional and multitemporal datasets on a global scale. The increased availability of information from space is exemplified by the numerous services offered by the European Copernicus program. The presentation, starting from a brief introduction to remote sensing techniques, illustrates some examples of applications developed within the geoLAB – Geomatics Laboratory of the Department of Agriculture, Food, Environment and Forestry (DAGRI) and the UNIFI COPERNICUS Research Unit.
10/06/2021 at 12.00 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms
Same-Sex Marriage, Relationship Dynamics and Well-being among Sexual Minorities in the UK
Diederik Boertien (Centre d’Estudis Demogràfics, Barcelona)
Changing laws and attitudes have reduced obstacles for sexual minorities to parts of family life including marriage and parenthood. The question arises to what extent these changes affected the relationship dynamics of sexual minorities and which obstacles to achieving family-related outcomes remain. In this presentation, a closer look is taken at the impact of the legalization of same-sex marriage in the UK (2014). To what extent did legalizing same-sex marriage have an impact on the relationships of sexual minorities? Beyond the uptake of marriage, are there any changes in union formation, separation and partnering? What has the impact been on relationship satisfaction among same-sex couples? Data from Understanding Society is used which allows for a longitudinal study of the relationship dynamics of individuals in same-sex couples, but also includes information on sexual identity, which allows studying how the relationship dynamics of single individuals identifying as a sexual minority changed over time.
04/06/2021 at 14.00 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/2nd-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(CB) Scattered data: surface reconstruction and fault detection
(LB) Game-based education promotes sustainable water use
Double FDS seminar:
Cesare Bracco & Leonardo Boncinelli (University of Florence)
We will consider two aspects concerning scattered data approximation. The first is reconstructing a (parametrized) surface from a set of scattered points: the lack of structure in the data requires approximation methods that automatically adapt to the distribution and shape of the data themselves. We will discuss an effective approach to this issue based on hierarchical spline spaces, which can be locally refined, and therefore naturally lead to adaptive algorithms. The reconstruction problem contains another interesting problem: detecting the discontinuities the surface may have in order to reproduce them. Finding the discontinuity curves, usually called faults (or gradient faults when gradient discontinuities are considered), is actually an important issue in itself, with several applications, for example in image processing and geophysics. I will present a method to determine which points in the scattered data set lie close to a (gradient) fault, based on indicators obtained by using numerical differentiation formulas.
In this study, we estimate the impact of a game-based educational program aimed at promoting sustainable water usage among 2nd-4th grade students and their families living in the municipality of Lucca, Italy. To this purpose, we exploited unique data from a quasi-experiment involving about two thousand students, one thousand participating (the treatment group), and one thousand not participating (the control group) in the program. Data were collected by means of a survey that we specifically designed and implemented for collecting students’ self-reported behaviours. Our estimates indicate that the program has been successful: the students in the program reported an increase in efficient water usage and an increase in the frequency of discussions with their parents about water usage; moreover, positive effects were still observed after six months. Our findings suggest that game-based educational programs can be an effective instrument to promote sustainable water consumption behaviors in children and their parents.
21/05/2021 at 14.00 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/1st-seminar-of-the-d2-seminar-series-florence-center-for-data-science/
D2 Seminar Series
(FM) Assessing causality under interference
(AB) Lifelong Learning at the end of the (new) Early Years
Double FDS seminar:
Fabrizia Mealli & Andrew Bagdanov (University of Florence)
Causal inference from non-experimental data is challenging; it is even more challenging when units are connected through a network. Interference issues may arise, in that potential outcomes of a unit depend on its treatment as well as on the treatments of other units, such as their neighbours in the network. In addition, the typical unconfoundedness assumption must be extended—say, to include the treatment of neighbours, and individual and neighbourhood covariates—to guarantee identification and valid inference. These issues will be discussed, new estimands introduced to define treatment and interference effects and the bias of a naive estimator that wrongly assumes away interference will be shown. A covariate-adjustment method leading to valid estimates of treatment and interference effects in observational studies on networks will be introduced and applied to a problem of assessing the effect of air quality regulations (installation of scrubbers on power plants) on health in the USA.
Lifelong learning, also often referred to as continual or incremental learning, refers to the training of artificially intelligent systems able to continuously learn to address new tasks from new data while preserving knowledge acquired from previous ones. Lifelong learning is currently enjoying a sort of renaissance due to renewed interest from the Deep Learning community. In this seminar, I will introduce the overall framework of continual learning, discuss the fundamental role played by the stability-plasticity dilemma in understanding catastrophic forgetting in lifelong learning systems, and present a broad panorama of recent results in class-incremental learning. I will conclude the discussion with a look at current trends, open problems, and low-hanging opportunities in this area.
Modelling international migration flows in Europe by integrating multiple data sources
Emanuele Del Fava (MPIDR, Germany)
Although understanding the drivers of migration is critical to enacting effective policies, theoretical advances in the study of migration processes have been limited by the lack of data on flows of migrants, or by the fragmented nature of these flows. In this work, we build on existing Bayesian modeling strategies to develop a statistical framework for integrating different types of data on migration flows. We offer estimates, as well as associated measures of uncertainty, for inflows and outflows among 31 European countries, by combining administrative and household survey data from 2002 to 2018. Methodologically, we demonstrate the added value of survey data when administrative data are lacking or of poor quality. Substantively, we document the historical impact of the EU enlargement and the free movement of workers in Europe on migration flows.
29/04/2021 at 12.00 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms
In-work poverty in Europe. Trends and determinants in longitudinal perspective
Stefani Scherer (Università degli Studi di Trento)
Employment remains among the most important factors protecting individuals and their families from economic poverty. However, recent years have witnessed an alarming increase of in-work poverty (IWP) in some contexts, that is, of being poor despite being employed. This paper investigates trends and determinants of in-work poverty in Europe, using EU-SILC data for the period 2004-2015. The contribution is threefold. First, we provide an analysis of the risk and the persistence of IWP for different social groups and household employment patterns and discuss potential implications for social stratification dynamics. Second, we look into the (causal) dynamics of poverty and test for the presence of genuine state dependence (GSD) and the role of unobserved heterogeneity in shaping the accumulation of economic disadvantages over time. Third, we adopt a comparative perspective across countries (and time periods), analysing how different institutional features affect exposure to and dynamics of economic disadvantages. We show that one income is often no longer enough to keep families out of poverty and thus confirm the importance of a second earner, and the need to have at least one non-low-pay source of income. This is particularly true for Europe’s South and for the less privileged social groups. We find no evidence for GSD, which has important implications for combating poverty through activation policies rather than transfers. Finally, notwithstanding different levels of exposure to and “stickiness” of IWP, the drivers turn out to be quite similar across European countries.
01/04/2021 at 12.00 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms
The intersection of partnership and fertility histories among immigrants and their descendants in the United Kingdom: A multistate approach
Julia Mikolai (University of St. Andrews, UK)
We study the interrelationship between partnership and fertility histories of immigrants and their descendants in the UK using data from the UK Household Longitudinal Study. Previous studies have either focused on immigrants’ fertility or their partnership experiences. However, no studies have analysed the intersection of these two interrelated life domains among immigrants and their descendants. Using multistate event history models, we analyse the outcomes of 1) unpartnered women (cohabitation, marriage, or childbirth), 2) cohabiting women (marriage, separation, or childbirth), and 3) married women (separation or childbirth). Our innovative modelling strategy allows us to jointly analyse repeated partnership and fertility transitions and to incorporate duration in the models. We also study how the interrelationship between these processes has changed across birth cohorts. We find distinct patterns by migrant groups. Among unpartnered native and European/Western women, cohabitation is the most likely outcome followed by marriage and childbirth. Unpartnered women from other origins show different patterns. South Asians tend to marry and have low risks of childbirth and cohabitation. Caribbean women are most likely to have a child, followed by cohabitation and marriage. Among cohabiting native and European/Western women, marriage is the most likely outcome followed by separation and childbirth. We find few significant differences between the outcomes of cohabiting women from all other origin groups. Married women in all migrant origin groups are most likely to have a child and much less likely to separate. Separation risks are the lowest among South Asian and the highest among Caribbean women. We find the largest change across birth cohorts among native and European/Western women whereas the same patterns persist among women from all other migrant groups.
25/03/2021 at 14.10 - Seminar via Webex
The later life consequences of natural disasters: earthquakes, tsunami, and radioactive fall-out
Marco Cozzani (European University Institute (EUI))
24/03/2021 at 14.10 - Seminar via Webex
The early life consequences of natural disasters: earthquakes and hurricanes
Marco Cozzani (European University Institute (EUI))
11/03/2021 at 14.10 - Seminar via Webex
Air pollution and human health
Risto Conte Keivabu (European University Institute (EUI))
10/03/2021 at 14.10 - Seminar via Webex
Climate change and temperature extremes: why does it matter for humans and especially for the poor?
Risto Conte Keivabu (European University Institute (EUI))
Agnese Vitali (Università degli Studi di Trento)
In this seminar I will reflect on the drivers leading couples into female breadwinning, and on (some of) the consequences that female breadwinning can have for couples. After reviewing theoretical explanations based on the gender revolution, the reversal of the gender gap in education and macro-economic change, I will present results based on data from the Luxembourg Income Study database showing how partners’ relative incomes have changed across four decades for a selection of developed countries. The seminar will reflect on the importance of distinguishing dual-earner couples with women as main earners from ‘pure’ female-breadwinner couples, as these are characterized by essentially different drivers and socio-economic backgrounds. When moving to study the consequences of female breadwinning, distinguishing between the two subgroups also appears important: the “female-breadwinning” penalty frequently found in previous studies on a series of outcomes, such as wellbeing, the risk of union dissolution and time spent on domestic and care work, appears reduced for couples with women as main earners compared to ‘pure’ female-breadwinner couples.
29/05/2020 at 11.00 - The seminar will be held online (those interested should contact Prof. Alessandra Petrucci).
Sustainability, Climate Change and Hazardous events: Statistical Measures, Challenges and Innovations
Angela Ferruzza (ISTAT, Roma)
The Sustainable Development Goals, Climate Change and Hazardous Events frameworks will be presented, together with their interactions. Challenges and innovations in building capacity to improve the statistical measures will also be considered.
05/05/2020 at 15.00 - The seminar will be held online.
Statistics for sustainable development (La statistica per lo sviluppo sostenibile)
Gianluigi Bovini (Fondazione Unipolis, ASviS)
19/02/2020 at 12.00
From logit to linear regression and back
Giovanni Maria Marchetti (DiSIA)
For binary variables, I show necessary and sufficient conditions for the parameter equivalence of linear- and logit-regression coefficients, discussing consequences for some Ising models.
18/02/2020 at 12.00
On Spatial Lag Models estimated using crowdsourcing, web-scraping or other unconventionally collected Big Data
Giuseppe Arbia (Catholic University of the Sacred Heart Milan – Rome)
The Big Data revolution is challenging the state-of-the-art statistical and econometric techniques not only because of the computational burden connected with the high volume and speed with which data are generated, but even more because of the variety of sources through which data are collected (Arbia, 2019). This paper concentrates specifically on this last aspect. Common examples of non-traditional Big Data sources are crowdsourcing (data voluntarily collected by individuals) and web scraping (data extracted from websites and reshaped into structured datasets). A characteristic common to these unconventional data collections is the lack of any precise statistical sample design, a situation described in statistics as “convenience sampling”. As is well known, in these conditions no probabilistic inference is possible. To overcome this problem, Arbia et al. (2018) proposed the use of a special form of post-stratification (termed “post-sampling”), with which data are manipulated prior to their use in an inferential context. In this paper we generalize this approach, using the same idea to estimate a Spatial Lag Econometric Model (SLM). We start by showing through a Monte Carlo study that, using data collected without a proper design, parameter estimates can be biased. Secondly, we propose a post-sampling strategy to tackle this problem. We show that the proposed strategy indeed achieves a bias reduction, but at the price of a concomitant increase in the variance of the estimators. We thus suggest an MSE-correction operational strategy. The paper also contains a formal derivation of the increase in variance implied by the post-sampling procedure and concludes with an empirical application of the method to the estimation of a hedonic price model in the city of Milan using web-scraped data.
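To illustrate the general idea behind post-stratification, the sketch below reweights a convenience sample so that each stratum counts according to its known population share. This is ordinary textbook post-stratification of a mean, not the "post-sampling" procedure of Arbia et al. or its spatial-lag extension. The function name, the strata labels and the numbers are hypothetical.

```python
from collections import defaultdict

def poststratified_mean(values, strata, population_shares):
    """Average the per-stratum sample means, weighted by population shares.

    values: observed values from the (non-designed) sample
    strata: stratum label of each observation
    population_shares: dict mapping stratum label -> known population share
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, stratum in zip(values, strata):
        sums[stratum] += value
        counts[stratum] += 1
    missing = set(population_shares) - set(counts)
    if missing:
        # A stratum with no observations cannot be reweighted.
        raise ValueError(f"no observations in strata: {sorted(missing)}")
    return sum(share * sums[s] / counts[s]
               for s, share in population_shares.items())

# Hypothetical web-scraped prices: the city centre is over-represented.
prices = [10.0, 10.0, 10.0, 10.0, 2.0]
zones = ["centre", "centre", "centre", "centre", "periphery"]
shares = {"centre": 0.5, "periphery": 0.5}

raw_mean = sum(prices) / len(prices)
adjusted_mean = poststratified_mean(prices, zones, shares)
```

In this toy sample the raw mean is pulled towards the over-represented stratum, while the adjusted mean gives both zones equal weight. The single observation in the periphery then carries half of the total weight, which shows in miniature the bias-variance trade-off described in the abstract.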
20/01/2020 at 12.00
Statistical scalability of approximate likelihood inference
Helen Ogden (Mathematical Sciences - University of Southampton)
In cases where it is not possible to evaluate the likelihood function exactly, an alternative is to find a numerical approximation to the likelihood, then to use this approximate likelihood in place of the true likelihood to do inference about the model parameters. Approximate likelihoods are typically designed to be computationally scalable, but the statistical properties of these methods are often not well understood: fitting the model may be fast, but is the resulting inference any good? I will describe conditions which ensure that the approximate likelihood inference retains good statistical properties, and discuss the statistical scalability of inference with an approximate likelihood, in terms of how the cost of conducting statistically valid inference scales as the amount of data increases. I will demonstrate the implications of these results for a particular family of approximations to the likelihood used for inference on an Ising model, and for Laplace approximations to the likelihood used for inference in mixed-effects models.
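As a small illustration of the last point, the sketch below computes a Laplace approximation to the log-likelihood of a Poisson model with one normally distributed random intercept and compares it with brute-force numerical integration. The model, the function names and the parameter values are assumptions made for this example and are not taken from the talk.

```python
import math

def log_integrand(u, y, beta, sigma):
    """log[f(y | u) f(u)]: Poisson counts with log-mean beta + u, u ~ N(0, sigma^2)."""
    eta = beta + u
    loglik = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1) for yi in y)
    logprior = -0.5 * (u / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return loglik + logprior

def laplace_loglik(y, beta, sigma, n_iter=50):
    """Laplace approximation to log of the integral of f(y | u) f(u) over u."""
    n, total = len(y), sum(y)
    u = 0.0
    # Newton iterations for the mode of the (concave) log-integrand.
    for _ in range(n_iter):
        grad = total - n * math.exp(beta + u) - u / sigma ** 2
        hess = -n * math.exp(beta + u) - 1.0 / sigma ** 2
        u -= grad / hess
    hess = -n * math.exp(beta + u) - 1.0 / sigma ** 2
    # Gaussian integral of the second-order expansion around the mode.
    return (log_integrand(u, y, beta, sigma)
            + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-hess))

def numeric_loglik(y, beta, sigma, lo=-8.0, hi=8.0, n_grid=16001):
    """Reference value: Riemann sum of the integrand on a fine grid."""
    step = (hi - lo) / (n_grid - 1)
    total = sum(math.exp(log_integrand(lo + i * step, y, beta, sigma))
                for i in range(n_grid))
    return math.log(total * step)

counts = [3, 4, 2, 5, 3]  # hypothetical observations from one cluster
approx = laplace_loglik(counts, beta=1.0, sigma=1.0)
exact = numeric_loglik(counts, beta=1.0, sigma=1.0)
```

Here the integral is one-dimensional, so the reference value is cheap to compute. In a mixed-effects model with many random effects the integral can be too expensive to evaluate directly, which is where the Laplace approximation, and the question of how its error behaves as the data grow, becomes relevant.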
15/01/2020 at 14.30
Modern Modeling: Guidelines for Best Practice and Useful Innovations
Todd D. Little (Texas Tech University)
In this talk, I will highlight an approach to modeling interventions that puts a premium on avoiding Type II error. It relies on latent variable structural equation modeling using multiple groups and other innovations in SEM modeling. These innovations, along with modern principled approaches to modeling, allow effective evaluation of intervention effects. I will highlight the advantages as well as discuss potential pitfalls.
Last updated 6 December 2021.