# DiSIA seminar archive

## Abstract

### Bayesian Models for Microbiome Data with Variable Selection

### Marina Vannucci (Rice University, Houston (USA))

I will describe Bayesian models developed for understanding how the microbiome varies within a population of interest. I will focus on integrative analyses, where the goal is to combine microbiome data with other available information (e.g. dietary patterns) to identify significant associations between taxa and a set of predictors. For this, I will describe a general class of hierarchical Dirichlet-Multinomial (DM) regression models which use spike-and-slab priors for the selection of the significant associations. I will also describe a joint model that efficiently embeds DM regression models and compositional regression frameworks, in order to investigate how the microbiome may affect the relation between dietary factors and phenotypic responses, such as body mass index. I will discuss advantages and limitations of the proposed methods with respect to current standard approaches used in the microbiome community, and will present results on the analysis of real datasets. If time allows, I will also briefly present extensions of the model to mediation analysis.
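As a rough illustration of the likelihood underlying a DM regression, the sketch below evaluates the Dirichlet-Multinomial log-probability of one sample's taxon counts, with concentration parameters tied to the predictors through a log-linear link. The link function, the function names (`dm_concentrations`, `dm_logpmf`) and the parameter layout are assumptions made for illustration; the abstract does not specify them. Under a spike-and-slab prior, a coefficient held exactly at zero would correspond to an excluded taxon-predictor association.

```python
import math


def dm_concentrations(x, alpha, beta):
    """Concentration per taxon j: gamma_j = exp(alpha_j + sum_p beta[j][p] * x[p]).

    The log-linear link is an assumption of this sketch.
    """
    return [
        math.exp(a_j + sum(b * x_p for b, x_p in zip(beta_j, x)))
        for a_j, beta_j in zip(alpha, beta)
    ]


def dm_logpmf(counts, gamma):
    """Log-probability of a count vector under a Dirichlet-Multinomial."""
    n = sum(counts)
    total = sum(gamma)
    # Multinomial coefficient numerator and the ratio Gamma(total) / Gamma(n + total).
    out = math.lgamma(n + 1) + math.lgamma(total) - math.lgamma(n + total)
    for y, g in zip(counts, gamma):
        out += math.lgamma(y + g) - math.lgamma(g) - math.lgamma(y + 1)
    return out


# Two taxa, one predictor; the zero coefficients mimic "spike" (excluded) associations.
gamma = dm_concentrations(x=[1.0], alpha=[0.0, 0.0], beta=[[0.0], [0.0]])
log_p = dm_logpmf([2, 3], gamma)
```

With both concentrations equal to 1 and two taxa, the distribution reduces to a uniform beta-binomial, which gives a convenient sanity check on the formula.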

### D2 Seminar Series

(RG) Italy’s lowest-low fertility in times of uncertainty

(AM) Approximating the Neighborhood Function of (Temporal) Graphs

### FDS double seminar:

Raffaele Guetto (DiSIA) & Andrea Marino (DiSIA)

Raffaele Guetto:

The generalized and relatively homogeneous fertility decline across European countries in the aftermath of the Great Recession poses serious challenges to our knowledge of contemporary low fertility patterns. The rise of economic uncertainty has often been identified, in the sociological and demographic literature, as the main cause of this state of affairs. The forces of uncertainty have been traditionally operationalized through objective indicators of individuals’ actual and past labour market situations. However, this presentation argues that the role of uncertainty needs to be conceptualized and operationalized taking into account that people use works of imagination, producing their own narrative of the future, also influenced by the media. To outline such an approach, I review contemporary drivers of Italy’s lowest-low fertility, placing special emphasis on the role of uncertainty fueled by labour market deregulations and – more recently – the Covid-19 pandemic. I discuss the effects of the objective (labour-market related) and subjective (individuals’ perceptions, including future outlooks) sides of uncertainty on fertility, based on a set of recent empirical findings obtained through a variety of data and methods. In doing so, I highlight the potential contribution of so-called “big data” and techniques of media content analysis and Natural Language Processing for the analysis of the effects of media-conveyed narratives of the economy.

Andrea Marino:

The average distance in graphs (for instance, the Facebook friendship network or the Internet Movie Database collaboration network), often referred to as the degrees of separation, has been widely investigated. However, if the number of nodes is very large (millions or billions), computing this measure incurs prohibitive time and space costs, as it requires computing for each node the so-called neighbourhood function, i.e. for each vertex v and for each h, how many nodes are within distance h from v. Temporal graphs are a special kind of graph in which edges carry temporal labels specifying when they occur, in the same way as the connections of a city's public transportation network are available only at scheduled times. Here, paths make sense only if they correspond to sequences of edges with strictly increasing labels. A possible notion of distance between two nodes in a temporal network is the earliest arrival time of the temporal paths connecting the two nodes. In this case, the temporal neighbourhood function is defined as the number of nodes reachable from a given one in a given time interval, and it is also expensive to compute. We introduce probabilistic counting in order to approximate the size of sets, and we show how both plain and temporal neighbourhood functions can be approximated by plugging this technique into a simple dynamic programming algorithm.
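The combination of dynamic programming and probabilistic counting described above can be sketched for the plain (non-temporal) case as follows. The abstract does not say which counter is used, so this sketch swaps in a k-minimum-values (KMV) sketch as a simple stand-in; all function names and the adjacency-dictionary input format are illustrative assumptions. The recurrence is that the ball of radius h around v is the union of the balls of radius h-1 around v and its neighbours, and only a small mergeable summary of each ball is stored.

```python
import hashlib


def node_hash(v):
    """Deterministic hash of a node label, mapped to a float in [0, 1)."""
    digest = hashlib.sha256(str(v).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_union(a, b, k):
    """Union of two KMV sketches: keep the k smallest distinct hash values."""
    return sorted(set(a) | set(b))[:k]


def kmv_estimate(sketch, k):
    """Estimated set size; exact while the sketch holds fewer than k values."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]


def approx_neighbourhood_function(adj, max_h, k=64):
    """Return est, where est[h][v] approximates |{u : dist(v, u) <= h}|.

    adj maps every node to the list of its neighbours (every node must be a key).
    """
    sketches = {v: [node_hash(v)] for v in adj}
    est = [{v: kmv_estimate(s, k) for v, s in sketches.items()}]
    for _ in range(max_h):
        nxt = {}
        for v, nbrs in adj.items():
            s = sketches[v]
            for w in nbrs:
                # Ball of radius h = own ball of radius h-1 plus the neighbours' balls.
                s = kmv_union(s, sketches[w], k)
            nxt[v] = s
        sketches = nxt
        est.append({v: kmv_estimate(s, k) for v, s in sketches.items()})
    return est


# Undirected path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
est = approx_neighbourhood_function(path, max_h=3)
```

Each node stores at most `k` hash values regardless of how large its ball grows, which is where the memory saving over storing explicit sets comes from; on graphs with fewer than `k` nodes the estimates coincide with the exact counts.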

### D2 Seminar Series

(GR) State Space Model to Detect Cycles in Heterogeneous Agents Models (joint work with Filippo Gusella)

(MB) High quality video experience using deep neural networks

### FDS double seminar:

Giorgio Ricchiuti (DISEI) & Marco Bertini (DINFO)

Giorgio Ricchiuti:

We propose an empirical test to detect possible endogenous cycles within Heterogeneous Agent Models (HAMs). We embed a two-type HAM in a standard small-scale dynamic asset pricing framework. On the one hand, fundamentalists base their expectations on the deviation of the fundamental value from the market price, expecting the two to converge. On the other hand, chartists, subject to self-fulfilling moods, consider the level of past prices and relate it to the fundamental value, acting as contrarians. These pricing strategies, by their nature, cannot be directly observed, but they can drive the observed data. For this reason, we consider the agents' beliefs as unobserved state components from which, through a state space model formulation, the heterogeneity of fundamentalist-chartist trader cycles can be mathematically derived and empirically tested. The model is estimated using the S&P500 index for the period 1990-2020 at different time scales, specifically daily, monthly, and quarterly.

Marco Bertini:

Lossy image and video compression algorithms are the enabling technology for a large variety of multimedia applications, reducing the bandwidth required for image transmission and video streaming. However, lossy image and video compression codecs decrease the perceived visual quality, eliminate higher frequency details and in certain cases add noise or small image structures. There are two main drawbacks of this phenomenon. First, images and videos appear much less pleasant to the human eye, reducing the quality of experience. Second, computer vision algorithms such as object detectors may be hindered and their performance reduced. Removing such artefacts means recovering the original image from a perturbed version of it. This means that one ideally should invert the compression process through a complicated non-linear image transformation. In this talk, I’ll present our most recent works based on the GAN framework that allows us to produce images with photorealistic details from highly compressed inputs.

### Mortality and morbidity drivers of the global distribution of health

### Inaki Permanyer (Centre d’Estudis Demogràfics, Barcelona)

Increasing life expectancy (LE) and reducing its variability across countries (the so-called “International Health Inequality”, IHI) are increasingly prominent goals in global development agendas. Yet LE is composed of two components: the number of years individuals are expected to live in “good” and in “less-than-good” health. While the first component (“Health-adjusted life expectancy”, HALE) is normatively desirable, the second one (“Unhealthy life expectancy”, UHLE, or LE minus HALE) is highly controversial because of the high personal, social, and economic costs often associated with the presence of disease or disability, an issue that can muddy the waters when interpreting global health dynamics and that calls for new conceptual approaches. Here we document how the evolution of HALE and UHLE between 1990 and 2019 has shaped (i) the trends and composition of LE at the country, regional and global levels, and (ii) the levels and trends in IHI. Our findings indicate that UHLE has tended to grow at a faster rate than HALE, thus leading to an expansion of morbidity in 75% of the world's countries. IHI increased until the year 2000 and has declined from that year onwards, a trend that is mostly determined by the evolution of HALE across countries. While still a minor component, UHLE is a non-negligible and increasingly relevant factor determining the levels of IHI. These findings and ideas are useful for understanding the role that the healthy and unhealthy components of LE play in contemporary health dynamics and for designing policies aimed at tackling health inequalities both across and within countries.

### Innovation on methods for the evaluation of the short-term health effects of environmental hazards.

### Francesco Sera (DiSIA) and Dominic Royé (University of Santiago de Compostela)

Numerous studies have evaluated the short-term effects on health of environmental exposures (e.g. pollutants, temperature), giving valuable information for preventive public health strategies. In this seminar we’ll discuss some advances in the methodology and in the definition of the exposures and the outcomes that allow a better description of the epidemiological association.

The two-stage design has become a standard tool in environmental epidemiology to model short-term effects with multi-location data. We’ll illustrate multiple design extensions of the classical two-stage method. These are based on improvements of the standard two-stage meta-analytic models along the lines of linear mixed-effects models, allowing location-specific estimates to be pooled through flexible fixed- and random-effects structures. This permits the analysis of associations characterised by combinations of multivariate outcomes, hierarchical geographical structures, repeated measures, and/or longitudinal settings. The design extensions will be illustrated in examples using data collected by the Multi-country Multi-city research network.

In terms of new exposures, we defined new indices that emphasize the importance of hot nights, which may prevent necessary nocturnal rest. Recently we used hot-night duration and excess to predict daily cause-specific mortality in summer, using multiple cities across Southern Europe. We found positive but generally non-linear associations between the relative risk of cause-specific mortality and the duration and excess of hot nights.

Another aspect is that most studies use daily counts of deaths or hospitalisations as health outcomes, even though these sit at the top of the health impact pyramid and reflect only the limited proportion of patients with the most severe conditions. In a recent study we evaluated the relationship between short-term exposure to daily mean temperature and medication prescribed for the respiratory system in five Spanish cities.

The proposed innovations in the methodology and in the definition of the exposures and the outcomes provide new evidence on the short-term health effects of environmental hazards that could improve decision-making for preventive public health strategies.

### D2 Seminar Series

(LB) Recent advances in bibliometric indexes and their implementation

(VB) Fisher’s Noncentral Hypergeometric Distribution for the Size Estimation of Unemployed Graduates in Italy (joint work with Brunero Liseo, University Sapienza, Roma)

### FDS double seminar:

Luigi Brugnano (DIMAI) & Veronica Ballerini (DiSIA)

Luigi Brugnano:

Bibliometric indexes are nowadays very commonly used for assessing scientific production, research groups, journals, etc. It must be stressed that such indexes cannot replace an assessment of the merits of a specific piece of research; nonetheless, they can provide a rough evaluation of its impact on the scientific community. That said, the currently used indexes often have drawbacks and/or vary considerably across subjects of investigation. For this reason, an alternative index was proposed in [1], based on an idea akin to that of Google PageRank. It has recently been implemented in the Scopus database [2]. In this talk, the basic facts and results of this approach will be recalled.

[1] P. Amodio, L. Brugnano. Recent advances in bibliometric indexes and the PaperRank problem. Journal of Computational and Applied Mathematics 267 (2014) 182-194. http://doi.org/10.1016/j.cam.2014.02.018

[2] P. Amodio, L. Brugnano, F. Scarselli. Implementation of the PaperRank and AuthorRank indices in the Scopus database. Journal of Informetrics 15 (2021) 101206. https://doi.org/10.1016/j.joi.2021.101206
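For readers unfamiliar with the PageRank idea that the abstract refers to, the sketch below runs the classic PageRank power iteration on a toy citation graph. It is not the PaperRank or AuthorRank index of [1] and [2], whose definitions are not given here; the function name, the damping value and the dictionary input format are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iters=100):
    """Classic PageRank by power iteration.

    links maps each paper to the list of papers it cites; every paper must be a key.
    Papers that cite nothing (dangling nodes) spread their score uniformly.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                # Each citing paper passes an equal share of its score to what it cites.
                new[w] += damping * rank[v] / len(links[v])
        rank = new
    return rank


# Toy citation graph: p2 and p3 both cite p1, which cites nothing.
scores = pagerank({"p1": [], "p2": ["p1"], "p3": ["p1"]})
```

In this toy graph `p1` ends up with the highest score because both other papers cite it; the score of a paper depends on the scores of the papers citing it rather than on a raw citation count.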

Veronica Ballerini:

Quantifying unemployment among those who have never been employed is often difficult. The lack of an administrative data flow attributable to such individuals makes them an elusive population. Hence, one must rely on surveys. However, individuals’ response rates to questions on their occupation may differ according to their employment status, implying a not-at-random missing data generation mechanism. We exploit the underused Fisher’s noncentral hypergeometric distribution (FNCH) to solve such a biased urn experiment. FNCH has been underemployed in the statistical literature mainly because of the computational burden of its probability mass function. Indeed, as the number of draws and the number of different categories in the population increase, any method involving the evaluation of the likelihood becomes practically infeasible. First, we present a methodology that allows the approximation of the posterior distribution of the population size via MCMC and ABC methods. Then, we apply this methodology to the case of unemployed graduates in Italy, exploiting information from different data sources.
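For the simplest case of two categories, Fisher's noncentral hypergeometric probability mass function can be written down directly, and the sketch below hints at where the cost comes from: the normalising constant is a sum over the whole support. The function and argument names (`fnch_pmf`, `m1`, `m2`, `n`, `odds`) are illustrative assumptions, and this brute-force univariate version is not the methodology of the talk.

```python
from math import comb


def fnch_pmf(x, m1, m2, n, odds):
    """P(X = x) when n items are drawn from m1 'type 1' and m2 'type 2' items.

    odds is the odds ratio favouring type 1; odds == 1 gives the ordinary
    (central) hypergeometric distribution.
    """
    lo, hi = max(0, n - m2), min(n, m1)
    if not lo <= x <= hi:
        return 0.0

    def weight(y):
        return comb(m1, y) * comb(m2, n - y) * odds ** y

    # The normalising constant sums the weights over the entire support.
    return weight(x) / sum(weight(y) for y in range(lo, hi + 1))


p_central = fnch_pmf(1, m1=3, m2=2, n=2, odds=1.0)
p_biased = fnch_pmf(1, m1=3, m2=2, n=2, odds=2.0)
```

With more than two categories the support becomes a set of count vectors whose size grows combinatorially, which is consistent with the abstract's remark that likelihood evaluation quickly becomes infeasible.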

### D2 Seminar Series

(GI) Performance-based research funding: Evidence from the largest natural experiment worldwide

(MF) Consensus-based optimization

### FDS double seminar: Giulia Iori (Department of Economics of the City University of London) & Massimo Fornasier (Department of Mathematics of the Technical University of Munich)

Giulia Iori:

The Research Excellence Framework (REF) is the main UK government policy on public research in the last 30 years. The primary aim of this policy is to promote and reward research excellence through competition for scarce research resources. Surprisingly, and despite the severe criticisms, little has been done to systematically evaluate its effects. In this paper, we evaluate the impact of the REF 2014. We exploit a large database that contains all publications in Economics, Business, Management, and Finance available in Scopus since 2001. We use a synthetic control method to compare the performance of each of the universities from the UK with counter-factual similar units in terms of past research constructed using data for US universities. We find a significant positive increase, relative to the control group, in the number of published papers, and in the proportion of papers published in highly ranked journals within the Economics/Econometrics area and the Business, Management and Finance area. Both Russell and non-Russell Group universities benefited from the REF, with the Russell Group universities experiencing an overall significant increase in the number of publications and number of publications in top journals, and the non-Russell group experiencing a significant increase in the proportion of publications in top journals in all areas. Interestingly, the non-Russell group experienced a comparatively stronger increase in the proportion of top publications in Economics/Econometrics while the Russell Group experienced a comparatively stronger increase in the proportion of top publications in Business, Management and Finance. However, we see an insignificant effect when we focus on per-author output measures indicating that growth in output was mostly achieved by an increase in the number of research active academics rather than an overall increase in research productivity.

Massimo Fornasier:

Consensus-based optimization (CBO) is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. In fact, optimizing agents (particles) move on the optimization domain driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of particle locations, weighted by the cost function according to Laplace’s principle, and it represents an approximation to a global minimizer. The dynamics are further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. Based on an experimentally supported intuition that CBO always performs a gradient descent of the squared Euclidean distance to the global minimizer, we show a novel technique for proving the global convergence to the global minimizer in mean-field law for a rich class of objective functions. We further present formulations of CBO over compact hypersurfaces. We conclude the talk with a few numerical experiments, which show that CBO scales well with the dimension and is extremely versatile.
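The dynamics described above can be sketched in a few lines. The code below computes the consensus point as a Gibbs-weighted convex combination of particle positions and applies one Euler-Maruyama step with isotropic noise scaled by each particle's distance to the consensus point. The discretisation, the isotropic noise, the parameter names (`alpha`, `lam`, `sigma`, `dt`) and their default values are assumptions of this sketch, not details taken from the talk.

```python
import math
import random


def consensus_point(particles, cost, alpha):
    """Convex combination of particle positions with weights exp(-alpha * cost)."""
    costs = [cost(x) for x in particles]
    c_min = min(costs)  # shifting by the minimum avoids underflow and leaves the weights' ratios unchanged
    weights = [math.exp(-alpha * (c - c_min)) for c in costs]
    total = sum(weights)
    dim = len(particles[0])
    return [
        sum(w * x[k] for w, x in zip(weights, particles)) / total
        for k in range(dim)
    ]


def cbo_step(particles, cost, alpha=30.0, lam=1.0, sigma=0.7, dt=0.01, rng=random):
    """One step: drift towards the consensus point plus distance-scaled noise."""
    v = consensus_point(particles, cost, alpha)
    new_particles = []
    for x in particles:
        dist = math.sqrt(sum((xk - vk) ** 2 for xk, vk in zip(x, v)))
        new_particles.append([
            xk - lam * (xk - vk) * dt
            + sigma * dist * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xk, vk in zip(x, v)
        ])
    return new_particles


def sphere(x):
    """Toy objective with global minimiser at the origin."""
    return sum(xk * xk for xk in x)


v_sharp = consensus_point([[0.0, 0.0], [1.0, 1.0]], sphere, alpha=50.0)
v_flat = consensus_point([[0.0, 0.0], [2.0, 2.0]], sphere, alpha=0.0)
moved = cbo_step([[0.0, 0.0], [2.0, 2.0]], lambda x: 0.0, lam=1.0, sigma=0.0, dt=0.5)
```

For large `alpha` the consensus point concentrates on the best particle, in the spirit of the Laplace principle mentioned in the abstract, while `alpha = 0` reduces it to the plain average. Because the noise is proportional to the distance from the consensus point, particles that have already reached it stop exploring.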

### Uncertainty across the “Contact Line”: armed conflict, COVID-19, and perceptions of fertility decline in Eastern Ukraine

### Brienna Perelli-Harris (University of Southampton)

While economic uncertainty has been a key explanation for very low fertility throughout Europe, few studies have conducted in-depth investigations into social and political sources of uncertainty. Here we study Ukraine, which has recently experienced fertility rates around 1.3. In 2014, armed conflict wracked Ukraine’s eastern regions, producing around 1.7 million internally displaced persons (IDPs). To better understand how continuing crises shape Ukrainians’ perceptions of childbearing, we analyse 16 online focus groups conducted in July 2021. We compare IDPs and locals and residents of the Government and non-Government Controlled Areas. The discussions reveal how Covid and political instability have intensified uncertainties, discouraging couples from having more than one child. Participants mentioned poverty, insecurity, and uncertain relationships as reasons to curtail childbearing. Some blamed the government or delved into conspiracy theories. Nonetheless, others were optimistic and planned to have more children, indicating mixed reactions to uncertainty.

### D2 Seminar Series

(LB) Endogenous and Exogenous Volatility in the Foreign Exchange Market (with G. Cifarelli)

(CM) Artificial Intelligence in Neuroimaging

### FDS double seminar: Leonardo Bargigli (DISEI) & Chiara Marzi (IFAC – CNR)

Leonardo Bargigli:

We study two sources of heteroscedasticity in high-frequency financial data. The first, endogenous, source is the behaviour of bounded rational market participants. The second, exogenous, source is the flow of market relevant information. We estimate the impact of the two sources jointly by means of a Markov switching (MS) SVAR model. Following the original intuition of Rigobon (2003), we achieve identification for all coefficients by assuming that the structural errors of the MS-SVAR model follow a GARCH-DCC process. Using transaction data of the EUR/USD interdealer market in 2016, we firstly detect three regimes of endogenous volatility. Then we show that both kinds of volatility matter for the transmission of shocks, and that the exogenous information is channelled to the market mostly through price variations. This suggests that, on the FX market, liquidity providers are better informed than liquidity takers, who act mostly as feedback traders. The latter are able to profit from trade because, unlike noise traders, they respond immediately to the informative price shocks.

Chiara Marzi:

Life sciences data coupled with Artificial Intelligence (AI) techniques can help researchers accurately pinpoint novel biomarkers. AI can propose new indices as potential biomarkers while simultaneously aiding in searching for hidden patterns among “well-established” indices. In this webinar, we will take a brief journey through some applications of Machine Learning in neuroimaging. In the first part of the webinar, we will talk about the not so easy “marriage” between AI and clinical data, focusing on Big Data from Radiology Imaging and related issues. In the second part, we will see how we can transfer mathematical, physical, and statistical ideas to the Neuroimaging domain and how AI can help this transfer. An example is the study of the structural complexity of the brain starting from MRI images, using fractal analysis. The Fractal Dimension (FD) can be considered a measure of morphological changes due to healthy ageing and/or the onset of neurological diseases. The use of ML techniques can promote the candidature of FD as a biomarker for many neurological diseases.

### Business Excellence 5.0

### Gabriele Arcidiacono (DIIE - Università Guglielmo Marconi, Roma)

The webinar offers a privileged vantage point on Business Excellence: all-round excellence, also called Global Excellence, which makes it possible to achieve results and benefits that are sustainable over time. The approach operates on the technical side of processes (Process Excellence), on a sound understanding of behaviours and interpersonal dynamics (Human Excellence), and on the most appropriate use of information and digital technology (Digital Excellence). The synergy among these three areas of action makes it possible to structure a Continuous Improvement path that includes processes, people and digitalisation and leads to increasingly challenging results.

### Ethnicity and Governing Party Support in Africa

### Carlos G. Rivero (Valencia University & Centre for International and Comparative Politics, Stellenbosch University)

Historically, ethnicity has been considered to play a fundamental role in voting behaviour in Africa. However, researchers on the issue have reached contradictory conclusions. The most recent research concludes that the African voter is more rational than expected. Overall, ethnicity seems to be less influential than theory used to suggest. Against this background, this paper analyses the vote for the governing party in Africa and presents evidence that the method and data set used have an important influence on the final result. The research takes the form of a quantitative analysis making extensive use of survey data from 2005 to 2019. Results indicate that ethnicity, although not exclusively, is still an explanatory factor. In short, the African vote is rationally ethnic.

### Incentivizing family savings as a way to fight educational inequality: Preliminary results from an RCT in Italy

### Loris Vergolini (University of Bologna & FBK-IRVAPP)

The paper deals with the programme WILL “Educare al Futuro”, a policy experiment aimed at implementing a Children's Savings Account scheme and at evaluating its effects on a set of school-related outcomes. The programme, launched in 2019 in four Italian cities (Cagliari, Florence, Teramo and Torino), targets 9th grade students from low-income families, offering them the opportunity to regularly save small amounts of money in a bank account that provides a generous multiplier (x4 if the money is spent on proven school expenses). In addition to the savings account, the beneficiaries are entitled to financial education courses, educational support and a guidance programme. The impacts of the programme will be assessed through a randomized controlled trial by 2024. In this paper we present a set of preliminary findings from the first follow-up survey, carried out in Spring 2021. We find that families in the treatment group show an increase in both general savings and savings for school activities, and that this does not come at the expense of other spending. Moreover, the treated families used the savings accounts to buy ICT equipment (a computer, a tablet or an internet connection) to allow their children to attend online learning during the Covid-19 pandemic. Our preliminary results also suggest that WILL has no effect on parental involvement, parental aspirations and expectations, children's school performance or children's socio-emotional well-being. The lack of effects on these dimensions is discussed in terms of theory of change and of implementation issues due to the pandemic.

### Reliability of components and systems (RELIA)

### Marcantonio Catelani (Dipartimento di Ingegneria dell'informazione)

Drawing on examples of "critical issues" and "points of attention" that characterise the topic, the aim of the meeting is to offer some food for thought and discussion, starting from simple basic concepts of reliability, both methodological and experimental.

The proposed seminar, "Reliability of components and systems", is the first in a series of events. Further sessions will follow that, in the speakers' view, will help in understanding the broader context of RAMS (Reliability, Availability, Maintainability and Safety) requirements and performance typical of some industrial settings.

### Design of experiments and robust design (DoE-for-RD)

### Rossella Berni (Dipartimento di Statistica, Informatica, Applicazioni "Giuseppe Parenti" - UniFI)

The aim of the meeting is to offer some food for thought and discussion on robust design, starting from simple basic concepts of statistics and experimental design.

The proposed webinar, "Design of experiments and robust design", is a first event, to be followed by a second, more in-depth session that will clarify which skills and tools are needed for robust process optimisation, which is widely used in technological and industrial settings, though sometimes in ways that are not fully complete or up to date.

08/07/2021 - Link to join the seminar on Meet: meet.google.com/psr-ggem-mhv

### Algorithms for dimensionality reduction and ranking extraction for multidimensional systems of ordinal indicators and partially ordered data

### Marco Fattore (Università degli Studi di Milano-Bicocca)

The problem of building synthetic indices and rankings from multidimensional systems of ordinal indicators is increasingly common, both in socio-economic statistics and in support of evaluation processes and multi-criteria decision/policy making; nevertheless, the methodological apparatus for analysing multidimensional ordinal data is still limited and largely borrowed from the statistical analysis of quantitative variables. The seminar aims to show that it is instead possible to set up an “ordinal data analysis” by importing the appropriate mathematical structures into statistical methodology, starting from the Theory of Order Relations, the branch of discrete mathematics devoted to the properties of partially ordered and quasi-ordered sets. In particular, starting from the problem of measuring multidimensional poverty, well-being or sustainability, the seminar addresses dimensionality reduction and ranking extraction for systems of ordinal and partially ordered data. It introduces the most recent methodological developments, illustrates both the algorithms already available and those under development, and discusses their strengths and weaknesses, with particular attention to computational aspects. The seminar concludes with an overview of the theoretical and applied lines of research currently under development, providing a map of solved and still-open problems in the multidimensional ordinal analysis of socio-economic data.

### D2 Seminar Series

(AG) Circular data, conditional independence & graphical models

(CC) Penalized hyperbolic-polynomial splines

### FDS double seminar: Anna Gottard (DiSIA) & Costanza Conti (DIEF)

Anna Gottard:

Circular variables, arising in several contexts and fields, are characterized by periodicity. Models for studying the dependence/independence structure of circular variables are under-explored. We will discuss three multivariate circular distributions, the von Mises, the Wrapped Normal and the Inverse Stereographic distributions, focusing on their properties concerning conditional independence. For each of these distributions, we examine the main properties related to conditional independence and introduce suitable classes of graphical models. The usefulness of the proposal is shown by modelling the conditional independence among dihedral angles characterizing the three-dimensional structure of some proteins.

Costanza Conti:

The advent of P-splines, first introduced by Eilers and Marx in 2010 (see [4]), has led to important developments in data regression through splines. With the aim of generalizing polynomial P-splines, in [1] we have recently defined a model of penalized regression spline, called HP-spline, in which polynomial B-spline functions are replaced by Hyperbolic-Polynomial bell-shaped basis functions. HP-splines are defined as a solution to a minimum problem characterized by a discrete penalty term. They inherit from P-splines the advantages of the model, like the separation of the data from the spline nodes, so avoiding the problems of overfitting and the consequent oscillations at the edges. HP-splines are particularly interesting in different applications that require analysis and forecasting of data with exponential trends. Indeed, the starting idea is the definition of a polynomial-exponential smoothing spline model to be used in the framework of the Laplace transform inversion. We present some recent results on the existence, uniqueness, and reproduction properties of HP-splines, also with the aim of extending their usage to data analysis.

### D2 Seminar Series

(ER) From algebra to biology: what the math of ensemble averaging methods can tell us

(GC) Big data from space. Recent Advances in Remote Sensing Technologies

### FDS double seminar:

Enrico Ravera & Gherardo Chirici (University of Florence)

Enrico Ravera:

Our work aims at a quantitative comparison of different methods for reconstructing conformational ensembles of biological macromolecules integrating molecular simulations and experimental data. This field has evolved over the years reflecting the evolution of computational power and sampling schemes, and a plethora of different methods have been proposed. These methods can vary extensively in terms of how the prior information from the simulation is used to reproduce the experimental data, but can be coarsely attributed to two categories: Maximum Entropy or Maximum Parsimony. In any case, the problem is severely underdetermined and therefore additional information needs to be provided on the basis of the chemical knowledge about the system under investigation. Maximum entropy looks for the minimal perturbation of the prior distribution, whereas Maximum Parsimony looks for the smallest possible ensemble that can explain in full the experimental data. On these grounds, one can expect radically different solutions in the reconstruction, but surprises are still possible – and can be justified by a rigorous geometrical description of the different methods.

Gherardo Chirici:

Since the 1970s, remote sensing technologies for terrestrial observation have generated a constant flow of data from different platforms, in different formats and with different purposes. From these, through successive steps, spatial information is generated to support the monitoring and planning of Earth resources. This information is indispensable in various sectors: from urban planning to geology, from agriculture to forest monitoring and, more generally, any type of environmental monitoring. For this reason, remotely sensed information is recognized as a typical example of big data ante litteram. Today, new cloud computing technologies (such as Google Earth Engine) make it possible to tackle the complex problem of managing and processing big data from remote sensing with new strategies that have revolutionized how these data sources are used. From experiments on small areas, we have moved to the possibility of operationally processing vast multidimensional and multitemporal datasets on a global scale. The increased availability of information from space is exemplified by the numerous services offered by the European Copernicus program. The presentation, starting from a brief introduction to remote sensing techniques, illustrates some examples of applications developed within the geoLAB – Geomatics Laboratory of the Department of Agriculture, Food, Environment and Forestry (DAGRI) and the UNIFI COPERNICUS Research Unit.


10/06/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

### Same-Sex Marriage, Relationship Dynamics and Well-being among Sexual Minorities in the UK

### Diederik Boertien (Centre d’Estudis Demogràfics, Barcelona)

Changing laws and attitudes have reduced obstacles for sexual minorities to parts of family life including marriage and parenthood. The question arises to what extent these changes affected the relationship dynamics of sexual minorities and which obstacles to achieving family-related outcomes remain. In this presentation, a closer look is taken at the impact of the legalization of same-sex marriage in the UK (2014). To what extent did legalizing same-sex marriage have an impact on the relationships of sexual minorities? Beyond the uptake of marriage, are there any changes in union formation, separation and partnering? What has the impact been on relationship satisfaction among same-sex couples? Data from Understanding Society is used which allows for a longitudinal study of the relationship dynamics of individuals in same-sex couples, but also includes information on sexual identity, which allows studying how the relationship dynamics of single individuals identifying as a sexual minority changed over time.


### D2 Seminar Series

(CB) Scattered data: surface reconstruction and fault detection

(LB) Game-based education promotes sustainable water use

### FDS double seminar:

Cesare Bracco & Leonardo Boncinelli (University of Florence)

Cesare Bracco:

We will consider two aspects concerning scattered data approximation. The first is reconstructing a (parametrized) surface from a set of scattered points: the lack of structure in the data requires approximation methods that automatically adapt to the distribution and shape of the data themselves. We will discuss an effective approach to this issue based on hierarchical spline spaces, which can be locally refined, and therefore naturally lead to adaptive algorithms. The reconstruction problem contains another interesting problem: detecting the discontinuities the surface may have in order to reproduce them. Finding the discontinuity curves, usually called faults (or gradient faults when gradient discontinuities are considered), is actually an important issue in itself, with several applications, for example in image processing and geophysics. I will present a method to determine which points in the scattered data set lie close to a (gradient) fault, based on indicators obtained by using numerical differentiation formulas.

Leonardo Boncinelli:

In this study, we estimate the impact of a game-based educational program aimed at promoting sustainable water usage among 2nd-4th grade students and their families living in the municipality of Lucca, Italy. To this purpose, we exploited unique data from a quasi-experiment involving about two thousand students, one thousand participating (the treatment group), and one thousand not participating (the control group) in the program. Data were collected by means of a survey that we specifically designed and implemented for collecting students’ self-reported behaviours. Our estimates indicate that the program has been successful: the students in the program reported an increase in efficient water usage and an increase in the frequency of discussions with their parents about water usage; moreover, positive effects were still observed after six months. Our findings suggest that game-based educational programs can be an effective instrument to promote sustainable water consumption behaviors in children and their parents.


### D2 Seminar Series

(FM) Assessing causality under interference

(AB) Lifelong Learning at the end of the (new) Early Years

### FDS double seminar:

Fabrizia Mealli & Andrew Bagdanov (University of Florence)

Fabrizia Mealli:

Causal inference from non-experimental data is challenging; it is even more challenging when units are connected through a network. Interference issues may arise, in that potential outcomes of a unit depend on its treatment as well as on the treatments of other units, such as their neighbours in the network. In addition, the typical unconfoundedness assumption must be extended (say, to include the treatment of neighbours, and individual and neighbourhood covariates) to guarantee identification and valid inference. These issues will be discussed, new estimands will be introduced to define treatment and interference effects, and the bias of a naive estimator that wrongly assumes away interference will be shown. A covariate-adjustment method leading to valid estimates of treatment and interference effects in observational studies on networks will be introduced and applied to a problem of assessing the effect of air quality regulations (installation of scrubbers on power plants) on health in the USA.

Andrew Bagdanov:

Lifelong learning, also often referred to as continual or incremental learning, refers to the training of artificially intelligent systems able to continuously learn to address new tasks from new data while preserving knowledge learned from previously learned ones. Lifelong learning is currently enjoying a sort of renaissance due to renewed interest from the Deep Learning community. In this seminar, I will introduce the overall framework of continual learning, discuss the fundamental role played by the stability-plasticity dilemma in understanding catastrophic forgetting in lifelong learning systems, and present a broad panorama of recent results in class-incremental learning. I will conclude the discussion with a look at current trends, open problems, and low-hanging opportunities in this area.


### Modelling international migration flows in Europe by integrating multiple data sources

### Emanuele Del Fava (MPIDR, Germany)

Although understanding the drivers of migration is critical to enacting effective policies, theoretical advances in the study of migration processes have been limited by the lack of data on flows of migrants, or by the fragmented nature of these data. In this work, we build on existing Bayesian modeling strategies to develop a statistical framework for integrating different types of data on migration flows. We offer estimates, as well as associated measures of uncertainty, for inflows and outflows among 31 European countries, by combining administrative and household survey data from 2002 to 2018. Methodologically, we demonstrate the added value of survey data when administrative data are lacking or of poor quality. Substantively, we document the historical impact of the EU enlargement and the free movement of workers in Europe on migration flows.

Video: Recorded seminar


29/04/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

### In-work poverty in Europe. Trends and determinants in longitudinal perspective

### Stefani Scherer (Università degli Studi di Trento)

Employment remains among the most important factors protecting individuals and their families from economic poverty. However, recent years have witnessed an alarming increase in in-work poverty (IWP) in some contexts, that is, in being poor despite being employed. This paper investigates trends and determinants of in-work poverty in Europe, using EU-SILC data for the period 2004-2015. The contribution is threefold. First, we provide an analysis of the risk and the persistence of IWP for different social groups and household employment patterns and discuss potential implications for social stratification dynamics. Second, we look into the (causal) dynamics of poverty and test for the presence of genuine state dependence (GSD) and the role of unobserved heterogeneity in shaping the accumulation of economic disadvantages over time. Third, we adopt a comparative perspective across countries (and time periods), analysing how different institutional features affect exposure to and dynamics of economic disadvantages. We show that one income is often no longer enough to keep families out of poverty and thus confirm the importance of a second earner, and the need to have at least one non-low-pay source of income. This is particularly true for Europe’s South and for the less privileged social groups. We find no evidence for GSD, which has important implications for combating poverty, in terms of activation policies rather than transfers. Finally, notwithstanding different levels of exposure to and “stickiness” of IWP, the drivers turn out to be quite similar across European countries.


01/04/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

### The intersection of partnership and fertility histories among immigrants and their descendants in the United Kingdom: A multistate approach

### Julia Mikolai (University of St. Andrews, UK)

We study the interrelationship between partnership and fertility histories of immigrants and their descendants in the UK using data from the UK Household Longitudinal Study. Previous studies have either focused on immigrants’ fertility or their partnership experiences. However, no studies have analysed the intersection of these two interrelated life domains among immigrants and their descendants. Using multistate event history models, we analyse the outcomes of 1) unpartnered women (cohabitation, marriage, or childbirth), 2) cohabiting women (marriage, separation, or childbirth), and 3) married women (separation or childbirth). Our innovative modelling strategy allows us to jointly analyse repeated partnership and fertility transitions and to incorporate duration in the models. We also study how the interrelationship between these processes has changed across birth cohorts. We find distinct patterns by migrant groups. Among unpartnered native and European/Western women, cohabitation is the most likely outcome followed by marriage and childbirth. Unpartnered women from other origins show different patterns. South Asians tend to marry and have low risks of childbirth and cohabitation. Caribbean women are most likely to have a child, followed by cohabitation and marriage. Among cohabiting native and European/Western women, marriage is the most likely outcome followed by separation and childbirth. We find few significant differences between the outcomes of cohabiting women from all other origin groups. Married women in all migrant origin groups are most likely to have a child and much less likely to separate. Separation risks are the lowest among South Asian and the highest among Caribbean women. We find the largest change across birth cohorts among native and European/Western women whereas the same patterns persist among women from all other migrant groups.


25/03/2021 - Seminar via Webex

### The later life consequences of natural disasters: earthquakes, tsunami, and radioactive fall-out

### Marco Cozzani (European University Institute (EUI))

TBA


24/03/2021 - Seminar via Webex

### The early life consequences of natural disasters: earthquakes and hurricanes

### Marco Cozzani (European University Institute (EUI))

TBA


11/03/2021 - Seminar via Webex

### Air pollution and human health

### Risto Conte Keivabu (European University Institute (EUI))

TBA


10/03/2021 - Seminar via Webex

### Climate change and temperature extremes: why does it matter for humans and especially for the poor?

### Risto Conte Keivabu (European University Institute (EUI))

TBA


### Female-breadwinner families

### Agnese Vitali (Università degli Studi di Trento)

In this seminar I will reflect on the drivers leading couples into female breadwinning, and on (some of) the consequences that female breadwinning can have for couples.

After reviewing theoretical explanations based on the gender revolution, the reversal of the gender gap in education and macro-economic change, I will present results based on data from the Luxembourg Income Study database showing how partners’ relative incomes have changed across four decades for a selection of developed countries.

The seminar will reflect on the importance of distinguishing dual-earner couples with women as main earners from ‘pure’ female-breadwinner couples – as these are characterized by essentially different drivers and socio-economic backgrounds. When moving to study the consequences of female breadwinning, distinguishing between the two subgroups also appears important: the “female-breadwinning” penalty frequently found in previous studies on a series of outcomes such as wellbeing, the risk of union dissolution and time spent on domestic and care work, appears reduced for couples with women as main earners compared to ‘pure’ female-breadwinner couples.

Video: Recorded seminar


29/05/2020 - The seminar takes place online.

### Sustainability, Climate Change and Hazardous events: Statistical Measures, Challenges and Innovations

### Angela Ferruzza (ISTAT, Roma)

The frameworks for the Sustainable Development Goals, climate change and hazardous events will be presented, considering their interactions. Challenges and innovations in building the capacity needed to improve the related statistical measures will be considered.

Contact: Prof. Alessandra Petrucci


05/05/2020 - The seminar takes place online.

### La statistica per lo sviluppo sostenibile

### Gianluigi Bovini (Fondazione Unipolis, ASviS)

TBA


### From logit to linear regression and back

### Giovanni Maria Marchetti (DiSIA)

For binary variables, I show necessary and sufficient conditions for the parameter equivalence of linear- and logit-regression coefficients, discussing consequences for some Ising models.


### On Spatial Lag Models estimated using crowdsourcing, web-scraping or other unconventionally collected Big Data

### Giuseppe Arbia (Catholic University of the Sacred Heart Milan – Rome)

The Big Data revolution is challenging the state-of-the-art statistical and econometric techniques not only for the computational burden connected with the high volume and speed with which data are generated, but even more for the variety of sources through which data are collected (Arbia, 2019). This paper concentrates specifically on this last aspect. Common examples of non-traditional Big Data sources are crowdsourcing (data voluntarily collected by individuals) and web scraping (data extracted from websites and reshaped into structured datasets). A characteristic common to these unconventional data collections is the lack of any precise statistical sample design, a situation described in statistics as “convenience sampling”. As is well known, in these conditions no probabilistic inference is possible. To overcome this problem, Arbia et al. (2018) proposed the use of a special form of post-stratification (termed “post-sampling”), with which data are manipulated prior to their use in an inferential context. In this paper we generalize this approach, using the same idea to estimate a Spatial Lag Econometric Model (SLM). We start by showing, through a Monte Carlo study, that when data are collected without a proper design, parameter estimates can be biased. Secondly, we propose a post-sampling strategy to tackle this problem. We show that the proposed strategy indeed achieves a bias reduction, but at the price of a concomitant increase in the variance of the estimators. We thus suggest an MSE-correction operational strategy. The paper also contains a formal derivation of the increase in variance implied by the post-sampling procedure and concludes with an empirical application of the method to the estimation of a hedonic price model in the city of Milan using web-scraped data.


### Statistical scalability of approximate likelihood inference

### Helen Ogden (Mathematical Sciences - University of Southampton)

In cases where it is not possible to evaluate the likelihood function exactly, an alternative is to find a numerical approximation to the likelihood, then to use this approximate likelihood in place of the true likelihood to do inference about the model parameters. Approximate likelihoods are typically designed to be computationally scalable, but the statistical properties of these methods are often not well understood: fitting the model may be fast, but is the resulting inference any good? I will describe conditions which ensure that the approximate likelihood inference retains good statistical properties, and discuss the statistical scalability of inference with an approximate likelihood, in terms of how the cost of conducting statistically valid inference scales as the amount of data increases. I will demonstrate the implications of these results for a particular family of approximations to the likelihood used for inference on an Ising model, and for Laplace approximations to the likelihood used for inference in mixed-effects models.
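To make the Laplace approximation mentioned above concrete, here is a minimal sketch for a single cluster of a random-intercept logistic model, compared against brute-force quadrature. The model, the data (7 successes out of 10), the parameter values and all function names are assumptions chosen for illustration, not taken from the talk.

```python
import math

def log_integrand(u, y_sum, n, beta, sigma):
    """log of p(y | u) * p(u): n Bernoulli outcomes with logit beta + u
    (y_sum successes, one fixed sequence) and u ~ N(0, sigma^2)."""
    eta = beta + u
    return (y_sum * eta - n * math.log1p(math.exp(eta))
            - 0.5 * u * u / sigma ** 2
            - 0.5 * math.log(2.0 * math.pi * sigma ** 2))

def laplace_likelihood(y_sum, n, beta, sigma, iters=50):
    """Laplace approximation to the integral of exp(log_integrand) over u."""
    u = 0.0
    for _ in range(iters):  # Newton iterations to locate the mode
        p = 1.0 / (1.0 + math.exp(-(beta + u)))
        grad = y_sum - n * p - u / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
        u -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta + u)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
    # Gaussian integral around the mode: exp(h(u_hat)) * sqrt(2*pi / -h''(u_hat))
    return (math.exp(log_integrand(u, y_sum, n, beta, sigma))
            * math.sqrt(2.0 * math.pi / -hess))

def quadrature_likelihood(y_sum, n, beta, sigma, half_width=10.0, steps=20000):
    """Trapezoidal reference value, used only to check the approximation."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(log_integrand(u, y_sum, n, beta, sigma))
    return total * h

approx = laplace_likelihood(7, 10, beta=0.0, sigma=1.0)
reference = quadrature_likelihood(7, 10, beta=0.0, sigma=1.0)
relative_error = abs(approx - reference) / reference
```

The approximation needs only a one-dimensional optimisation per cluster, which is what makes it computationally scalable; the relative error it leaves behind per cluster is the quantity whose effect on inference the abstract is concerned with.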


### Modern Modeling: Guidelines for Best Practice and Useful Innovations

### Todd D. Little (Texas Tech University)

In this talk, I will highlight an approach to modeling interventions that puts a premium on avoiding Type II error. It relies on latent variable structural equation modeling (SEM) using multiple groups and other innovations in SEM. These innovations, along with modern principled approaches to modeling, allow effective evaluation of intervention effects. I will highlight the advantages as well as discuss potential pitfalls.


### Multinomial Probit: Probability Simulators and Identification

### Alessandro Palandri

The paper introduces four unbiased probability-simulators which produce continuous (simulated) log-likelihood functions with almost everywhere continuous derivatives. Identification conditions are derived which show that when an intercept is present in every latent utility equation then every element of the shocks' variance-covariance matrix must be fixed in order to attain exact-identification. Monte Carlo simulations are used to illustrate unbiasedness and relative efficiency of the proposed simulators and to confirm the predicted under- and exact-identification of representative parameterizations. A previous study on the relation between movie attendance and violent crimes is reconsidered in the multinomial probit framework.


### Learning the two parameters of the Poisson Dirichlet distribution with forensic applications

### Giulia Cereda (University of Leiden, Netherlands)

The "rare type match problem" is the situation in which the suspect's DNA profile, matching the DNA profile of the crime stain, is not in the reference database. The evaluation of this match in the light of the two competing hypotheses (the crime stain has been left by the suspect or by another person) is based on the calculation of the likelihood ratio and depends on the population proportions of the DNA profiles, which are unknown. We propose a Bayesian nonparametric method that uses a two-parameter Poisson Dirichlet distribution as a prior over the ranked population proportions and discards the information about the names of the different DNA profiles. The Poisson Dirichlet seems to be appropriate for data coming from European Y-STR DNA profiles, but the likelihood ratio turns out to depend on the posterior of the two parameters, treated as random variables, given the observed data. Inference on the Poisson Dirichlet parameters has not received much attention in the Bayesian nonparametric literature. In previous work, we explored the use of maximum likelihood estimators for the two parameters of the Poisson Dirichlet distribution, plugging them directly into the likelihood ratio (LR) evaluation. Now, we are investigating the approximation of the posterior of the Poisson Dirichlet parameters via MCMC and ABC methods. The three approaches are discussed and compared in controlled situations where the true parameters are known. Applications to real casework are also proposed.
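For readers unfamiliar with the two-parameter Poisson Dirichlet prior, a sample drawn from a population with such proportions induces a random partition that can be simulated with the two-parameter Chinese restaurant process. The sketch below generates a synthetic "database" in this way; the parameter values, the database size and the function name `sample_partition` are illustrative assumptions, not those of the study.

```python
import random

def sample_partition(n, alpha, theta, seed=0):
    """Block sizes of a random partition of n items drawn from the
    two-parameter Chinese restaurant process
    (0 <= alpha < 1, theta > -alpha), sorted in decreasing order."""
    rng = random.Random(seed)
    counts = []  # counts[j] = number of items showing the j-th observed type
    for i in range(n):  # i items have been observed so far
        k = len(counts)
        # A new type appears with probability (theta + k*alpha) / (theta + i).
        if i == 0 or rng.random() * (theta + i) < theta + k * alpha:
            counts.append(1)
            continue
        # Otherwise type j is chosen with probability proportional to counts[j] - alpha.
        r = rng.random() * (i - k * alpha)
        acc = 0.0
        for j, c in enumerate(counts):
            acc += c - alpha
            if r < acc:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return sorted(counts, reverse=True)

# Hypothetical database of 500 profiles.
sizes = sample_partition(500, alpha=0.5, theta=10.0, seed=1)
singletons = sum(1 for c in sizes if c == 1)
```

Only the block sizes are kept, mirroring the choice in the abstract to discard the names of the DNA profiles; the number of singletons gives a feel for how common rare types are under a given pair of parameters.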


### Machine learning methods for estimating the employment status in Italy

### Roberta Varriale (ISTAT)

In recent decades, National Statistical Institutes have focused on producing official statistics by exploiting multiple sources of information (multi-source statistics) rather than a single source, usually a statistical survey. The growing importance of producing multi-source statistics in official statistics has led to increasing investments in research activities in this sector. In this context, one of the research projects addressed by the Italian National Statistical Institute (Istat) concerned the study of methodologies for producing estimates of employment rates in Italy through the use of multiple sources of information, namely survey data and administrative sources. The data come from the Labour Force (LF) survey conducted by Istat and from several administrative sources that Istat regularly acquires from external bodies. The “quantity” of information is very different: the data coming from administrative sources concern about 25 million individuals, while those coming from the LF survey refer to a far smaller number (about 330,000) of individuals. The two measures do not agree on employment status for about 6% of the units from the LF survey. One proposed approach uses a Hidden Markov (HM) model to take into account the deficiencies in the measurement process of both survey and administrative sources. The model describes the measurement process as a function of a time-varying latent state (in this case the employment category), whose dynamics are described by a Markov chain defined over a discrete set of states. At present, the implementation phase for the production process of statistics on employment through the use of HM models is coming to an end at Istat. The present work describes the use of Machine Learning methods to predict the individual employment status. This approach is based on the application of decision tree and random forest models, which are predictive models commonly used to classify instances in large amounts of data. In the work, the results obtained will be described, together with their usefulness in this application context. The models have been applied using the software R.
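The Hidden Markov idea described above, a latent employment state observed through two error-prone sources, can be sketched with a small forward filter. The two-state setup, the transition matrix, the misclassification rates and the function name are all hypothetical values chosen for illustration; they are not Istat's model or estimates, and the sketch is in Python rather than the R used in the work.

```python
def forward_filter(observations, init, trans, err_survey, err_admin):
    """Filtered probabilities P(state_t | data up to t) for a two-state HMM
    (0 = not employed, 1 = employed) measured by two noisy sources.

    observations: list of (survey, admin) pairs, each 0, 1 or None if missing.
    err_survey[s], err_admin[s]: probability that the source misreports
    true state s."""
    def emit(obs, state, err):
        if obs is None:  # a missing measurement carries no information
            return 1.0
        return 1.0 - err[state] if obs == state else err[state]

    filtered = []
    belief = list(init)
    for t, (survey, admin) in enumerate(observations):
        if t > 0:  # propagate the belief through the Markov chain
            belief = [sum(belief[r] * trans[r][s] for r in range(2))
                      for s in range(2)]
        # The sources are assumed conditionally independent given the state.
        belief = [belief[s] * emit(survey, s, err_survey) * emit(admin, s, err_admin)
                  for s in range(2)]
        norm = sum(belief)
        belief = [b / norm for b in belief]
        filtered.append(belief)
    return filtered

# Three periods: sources agree, then disagree, then the survey is missing.
obs = [(1, 1), (1, 0), (None, 0)]
probs = forward_filter(
    obs,
    init=[0.5, 0.5],
    trans=[[0.90, 0.10], [0.05, 0.95]],   # hypothetical transition matrix
    err_survey=[0.03, 0.03],              # hypothetical misclassification rates
    err_admin=[0.05, 0.05],
)
```

When the two sources disagree in the second period, the filtered probability of employment drops but stays high, because the persistence of the latent chain and the lower assumed error rate of the survey both favour the employed state.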


### China's Age of Abundance: An Application of the National Transfer Accounts (NTA) Method

### WANG Feng (University of California, Irvine, USA, and Fudan University, China)

In the past 15 years or so, the National Transfer Accounts (NTA) approach has extended to over 90 countries in the world, as an analytical tool to examine intergenerational economic transfers along with population changes. Using China as an example, this lecture briefly introduces the methodology and data sources required, and examines changes in the following four areas: 1) changing income and consumption age profiles along with rapid economic change; 2) aggregate life-cycle deficits (surpluses) and their projections; 3) inequalities in public transfers over time, and 4) projecting fiscal burdens due to population aging and welfare program expansion. The talk serves as an overview of China's historical transformations as seen from the NTA-based analytical approach, and a plan to explore areas for future studies and global comparisons.


### Composite likelihood inference for simultaneous clustering and dimensionality reduction of mixed-type longitudinal data

### Monia Ranalli (Sapienza Università di Roma)

This talk aims at introducing a multivariate hidden Markov model (HMM) for mixed-type (continuous and ordinal) variables. As some of the considered variables may not contribute to the clustering structure, a hidden Markov-based model is built such that discriminative and noise dimensions can be recognized. The variables are considered to be linear combinations of two independent sets of latent factors where one contains the information about the cluster structure, following an HMM, and the other one contains noise dimensions distributed as a multivariate normal (and it does not change over time). The resulting model is parsimonious, but its computational burden may be cumbersome. To overcome any computational issue, a composite likelihood approach is introduced to estimate model parameters. The model is applied to a real dataset derived from the first five waves of the Chinese Longitudinal Healthy Longevity Survey. The model is able to identify the discriminant variables and capture the cluster structure changing over time parsimoniously.


### Automating the Process of Information Quality Assessment

### Davide Ceolin (CWI and Vrije Universiteit Amsterdam)

Assessing the quality of online information is a challenging yet crucial task to help users deal with information overload. Experts (of diverse backgrounds) are often good examples of reliable assessors, but experts are a costly and limited resource. In this talk, I will present some advancements in scaling up the assessment process, involving the use of crowdsourcing and the definition of trust-related metrics. I will evaluate these methods on a corpus of online documents regarding the vaccination debate.


### Living with Intelligent Machines

### Nello Cristianini (University of Bristol)

Nello Cristianini is Professor of Artificial Intelligence (AI) at the University of Bristol. He is the co-author of two widely known books in machine learning, An Introduction to Support Vector Machines and Kernel Methods for Pattern Analysis, as well as a book in bioinformatics, Introduction to Computational Genomics. He is also a recipient of the Royal Society Wolfson Research Merit Award and a current holder of a European Research Council Advanced Grant. Currently he is working on social and ethical implications of AI.


### Current Challenges in the Analysis of Brain Signals

### Hernando Ombao (KAUST)

Advances in imaging technology have given neuroscientists unprecedented access to examine various facets of how the brain “works”. Brain activity is complex. A full understanding of brain activity requires careful study of its multi-scale spatial-temporal organization (from neurons to regions of interest; and from transient events to long-term temporal dynamics). In this talk, the focus will be on modeling connectivity between many channels of a network of electroencephalogram (EEG) recordings. There are many challenges to analyzing brain connectivity. First, there is no unique measure that can completely characterize dependence between EEG channels in a network. Second, brain data is massive: these are recordings across many locations and over long recording times. Third, it has a complex structure with non-stationary properties that evolve over space and time. Thus, these challenges present big opportunities for data scientists to develop new tools and models for addressing current research in the neuroscience community.


### Gender Bias in Academic Promotions, Myth or Reality? Evidence from a Factorial Survey Experiment

### Nevena Kulic (European University Institute)

Given the relatively equal share of women and men at the PhD level, there is a dramatic decrease in women’s representation at higher stages of the academic career. In this article, we study whether and how demand-side processes contribute to this decrease. We examine, in particular, bias occurring due to differences in the evaluation of network ties of men and women. The following questions are asked: (1) Is there a gender bias in hiring/promotion due to different evaluation of networks of men and women in academia? (2) If yes, what is the role of strength of ties therein? (3) Does the status of a candidate’s tie matter in hiring/promotion and if yes, is the effect different for men and women? A factorial survey experiment on hiring/promotion of full professors was conducted in Italy in three broad disciplines: humanities (area 11), economics and statistics (area 13), social and political sciences (area 14). In this experiment, respondents evaluate a randomly assigned set of candidate profiles, which vary randomly in their gender and type of network ties. We hypothesized that women will be evaluated more positively than men if they rely on inter-connected contacts in closed networks, and borrow social capital from a high-status tie. On the contrary, we expected that men will be better evaluated than women when relying on open external networks, and will be less penalized than women for not relying on high-status contacts. Results indicate that our hypotheses are partly confirmed: gender bias is found in the evaluation of external ties in competence evaluation: women tend to be evaluated less well when their ties are external (weak), and the bias originates in the sample of male respondents. There is, however, no confirmation of the gendered effect of status of ties. Our article has important policy relevance and helps unravel the demand-side mechanisms of gender inequality in academia. A better understanding of these processes would indicate whether and how academic institutions could adjust their procedures in order to bring more parity, and prevent the loss of human capital due to lower presence of highly qualified women.


### Identification of good educational practices in schools with high added value using Data Mining Techniques

### Fernando Martínez Abad (University of Salamanca)

The main objective of this Research Project, funded by the BBVA Foundation, was the identification of factors associated with performance in schools with high added value, in order to produce a catalogue of good educational practices and disseminate it to the educational community. Based on the results of the Spanish sample from the PISA 2015 assessment, this objective was developed in 3 stages: 1) Application of hierarchical linear models for identifying schools with high and low effectiveness: to quantify the effectiveness of every school, we isolated the average performance of schools not attributable to the effect of the contextual variables in order to obtain an average residual. We then selected and characterized schools whose residual average performance is systematically maintained at higher (high effectiveness) and lower levels (low effectiveness); 2) Application of Data Mining techniques for identifying non-contextual factors associated with high and low effectiveness: we applied decision trees, which classify schools based on their scores in the explanatory variables, taking as a reference the given criterion (schools considered to have high and low effectiveness); 3) Design of a catalogue of good educational practices: this stage was developed with the aim of disseminating the results to different groups that may be interested in the project.

### Dimensionality reduction via the identifications of the data intrinsic dimensions

### Antonietta Mira (Università della Svizzera Italiana)

Even if they are defined on a space with a large dimension, data points usually lie on a hypersurface, or manifold, with a much smaller intrinsic dimension (ID). The recent TWO-NN method (Facco et al., 2017, Scientific Reports) allows estimating the ID when all points lie on a single manifold. TWO-NN makes only a fairly weak assumption, namely that the density of points is approximately constant in a small neighborhood around each point. Under this hypothesis, the ratio of the distances from a point to its second and first nearest neighbours follows a Pareto distribution that depends parametrically only on the ID, allowing for an immediate estimation of the latter. We extend the TWO-NN model to the case in which the data lie on several manifolds with different IDs. While the idea behind the extension is simple (the Pareto distribution is just replaced by a mixture of K Pareto distributions), a non-trivial Bayesian scheme is required for correctly estimating the model and assigning each point to the correct manifold. Applying this method, which we dub Hidalgo (heterogeneous intrinsic dimension algorithm), we uncover a surprising ID variability in several real-world datasets such as fMRI, protein folding, financial data, gene expression and basketball data. The Hidalgo model obtains remarkable results, but its limitation is that the number of components in the mixture must be fixed a priori. To adopt a fully Bayesian approach, a possible extension would be the specification of a prior distribution for the parameter K. Instead, with even greater flexibility, we let K go to infinity, using a Bayesian nonparametric approach, and model the data as an infinite mixture of Pareto distributions. This approach, at the same time, takes into account the uncertainty about the number of mixture components.
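
The single-manifold estimation step described above can be illustrated with a small sketch. It assumes the maximum-likelihood estimator implied by the Pareto law (the ID estimate is the number of points divided by the sum of the log distance ratios), which may differ from the fitting procedure used in the cited paper. The function name `two_nn_id` and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np

def two_nn_id(points):
    """Maximum-likelihood intrinsic dimension from second/first neighbour distance ratios."""
    x = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small illustration).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # ignore self-distances
    nearest = np.sort(dist, axis=1)[:, :2]  # first and second neighbour distances
    mu = nearest[:, 1] / nearest[:, 0]      # ratio r2 / r1 >= 1
    # If mu follows a Pareto law with shape d on [1, inf), log(mu) is exponential with rate d.
    return len(mu) / np.log(mu).sum()

rng = np.random.default_rng(0)
# Hypothetical test data: a 2-D plane linearly embedded in 5-D space.
latent = rng.uniform(size=(800, 2))
embedding = rng.normal(size=(2, 5))
data = latent @ embedding
estimate = two_nn_id(data)
print(round(estimate, 2))
```

Although the points live in five dimensions, the estimate should be close to 2, the dimension of the plane they were sampled from. The mixture extension (Hidalgo) is not covered by this sketch.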

### B. Arpino: Challenges of causal inference in demographic observational studies

R. Guetto: The growth of mixed unions in Italy: a marker of immigrant integration and societal openness?

### Welcome seminar: Bruno Arpino & Raffaele Guetto

B. Arpino: I will start the seminar by summarising my research interests and academic trajectory. Then I will present the following work-in-progress study, which combines my main methodological and applied research interests:

Demographers are often confronted with the goal of establishing a causal link between demographic events (e.g., fertility, union formation and dissolution) and socio-economic, health and other types of outcomes. Since experiments are commonly not a feasible strategy, demographic research often relies on observational studies. Not being able to manipulate the treatment assignment, demographers have to deal with several issues, such as omitted variable bias and reverse causality. The aims of this paper are to review the methods commonly used by demographers to estimate causal effects in observational studies and to discuss the strengths and limitations of these methods. Motivated by the estimation of the causal effect of grandparental childcare on health, and using simulations mimicking the Survey of Health, Ageing and Retirement in Europe (SHARE), I will compare propensity score matching, fixed effects and instrumental variables regression. The goal of these simulations is to highlight the consequences of violating the assumptions underlying each method, depending also on the different types of data available to the researcher.

I will conclude the seminar with my plans for future research in the short run.

R. Guetto: I will start the seminar by presenting my academic background, my research interests and plans for future research. I will then present an overview of the results of the research on native-immigrant unions and immigrant socioeconomic integration that I have carried out in recent years.

Mixed unions, i.e. unions between natives and immigrants, which in Italy primarily involve Italian men partnered with foreign women, are commonly understood as the height of immigrant integration and societal openness. However, consistent with the status exchange theory, such unions are more likely when less-educated, older native men marry better educated, younger immigrant women, especially when the latter originate from non-Western countries. I highlight the existence of a multiplicity of factors underlying such mating patterns. From the standpoint of the foreign partner, I discuss the relevance of immigrants’ economic circumstances and provide causal evidence on the role played by the possibility of obtaining Italian/EU citizenship through marriage. The analysis of the Italian partner’s perspective points to the increasing crowding out of low-educated men on the native marriage market. Cultural factors also need to be considered, as foreign women are usually more compliant with traditional gender roles. Overall, the results reject a simplistic interpretation of the growth of mixed unions as an indicator of increased immigrant integration and societal openness.

### A causal inference approach to evaluate the health impacts of air quality regulations: The health benefits of the 1990 Clean Air Act Amendments

### Rachel Nethery (Harvard University)

In evaluating the health effects of previously implemented air quality regulations, the US Environmental Protection Agency first predicts what air pollution levels would have been in a given year under the counterfactual scenario of no regulation and then inserts these predictions into a health impact function to predict the corresponding counterfactual number of various types of health events (e.g., deaths and cardiovascular events). These predictions are then compared to the number of health events predicted for the observed pollutant levels under the regulations. This procedure is generally carried out for each pollutant separately. In this paper, we develop a causal inference framework to estimate the number of health events prevented by air quality regulations via the resulting changes in exposure to multiple pollutants simultaneously. We introduce a causal estimand called the Total Events Avoided (TEA), and we propose both a matching method and a Bayesian machine learning method for estimation. In simulations, we find that both the matching and machine learning methods perform favorably in comparison to standard parametric approaches, and we evaluate the impacts of tuning parameter specifications. To our knowledge, this is the first attempt to perform causal inference in the presence of multiple continuous exposures. We apply these methods to investigate the health impacts of the 1990 Clean Air Act Amendments (CAAA). In particular, we seek to answer the question "How many mortality events, cardiovascular hospitalizations, and dementia-related hospitalizations were avoided in the Medicare population in 2001 thanks to CAAA-attributable changes in pollution exposures in the year 2000?".
For each ZIP code in the US, we have obtained (1) pollutant exposure levels with the CAAA in place in 2000, (2) the observed count of each health event in the Medicare population in 2001, and (3) estimated counterfactual pollutant exposure levels under a no-CAAA scenario in 2000. Without relying on modeling assumptions, our matching and machine learning methods use confounder-adjusted relationships between observed pollution exposures and health outcomes to inform estimation of the number of health events that would have occurred in the same population under the counterfactual, no-CAAA pollution exposure levels. The TEA is computed as the difference between the estimated no-CAAA counterfactual event count and the observed event count. This approach could be used to analyze any regulation, any set of pollutants, and any health outcome for which data are available. This framework improves on the current regulatory evaluation protocol in the following ways: (1) the causal inference approach clarifies the question under study, the statistical quantity being estimated, and the assumptions of the methods; (2) statistical models and resulting estimates are built on real health outcome data; (3) the results do not rely on dubious parametric assumptions; and (4) all pollutants are evaluated concurrently so that any synergistic effects are accounted for.
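
As a reading aid, the verbal definition of the TEA can be written as a formula. The notation below is an assumption made for illustration and is not taken from the paper: $i$ indexes the $N$ ZIP codes, $Y_i^{\text{obs}}$ is the observed 2001 event count, and $\hat{Y}_i(\text{no-CAAA})$ is the estimated count under the counterfactual exposure levels. Summing over ZIP codes is likewise an assumption about how the totals are aggregated.

$$\widehat{\mathrm{TEA}} = \sum_{i=1}^{N} \left[ \hat{Y}_i(\text{no-CAAA}) - Y_i^{\text{obs}} \right]$$

Under this convention, a positive value means that more events would have been expected without the regulation than were actually observed.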

### Sustainable development Goals, Climate Change and Hazardous events: statistical measures, challenges and innovations

### Angela Ferruzza (ISTAT)

The frameworks for the Sustainable Development Goals, Climate Change and Hazardous Events will be presented, considering their interactions. Challenges and innovations in the process of building capacity to increase the available statistical measures will be considered. The 2019 SDGs Istat National Report will be presented.

### Causal inference methods in environmental studies: challenges and opportunities

### Francesca Dominici (Harvard University)

What if I told you I had evidence of a serious threat to American national security – a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. This is the question before us today – but the threat doesn't come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population aged 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation's most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.

### Efficient Data Processing in High Performance Big Data Platforms

### Nicola Tonellotto (ISTI CNR)

Abstract: Web search engines are among the largest and most widely used big data platforms today. With the ever-growing amount of data produced daily, all Web search companies must rely on distributed data storage and processing mechanisms hosted on computer clusters composed of thousands of processors and provided by large data centers. This distributed infrastructure makes it possible to execute a vast amount of complex data processing that provides effective insights for users in near real time, with sub-second response times. Moreover, the most recent scientific advances in big data analytics exploit machine learning and artificial intelligence solutions. These solutions are particularly computationally expensive, and their energy consumption has a great impact on overall energy consumption at a global scale.
In this seminar we will discuss some recent investigations into (i) novel efficient algorithmic solutions to improve the usage of hardware resources (e.g., reducing response times and increasing throughput) when complex machine-learned models are used to process large data collections and (ii) online management of computational load to reduce energy consumption by automatically switching among the available CPU frequencies to adapt to external operational conditions.

Short bio: Dr. Nicola Tonellotto (http://hpc.isti.cnr.it/~khast/) is a researcher in the High Performance Computing Lab at the Information Science and Technologies Institute of the National Research Council of Italy. His main research interests include high performance big data platforms and information retrieval, focusing on efficiency aspects of query processing and resource management. Nicola has co-authored more than 60 papers on these topics in peer-reviewed international journals and conferences. He lectures on Computer Architectures for BSc students and Distributed Enabling Platforms for MSc students at the University of Pisa. He was co-recipient of the ACM’s SIGIR 2015 Best Paper Award for the paper entitled “QuickScorer: a Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees”.

### Alternatives to Aging Alone?: "Kinlessness" and the Potential Importance of Friends

### Christine Mair (University of Maryland)

Increasing numbers of older adults cross-nationally are without children or partners in later life and may have greater reliance on non-kin (e.g., friends), although these patterns likely vary by country context. This paper hypothesizes that those without traditional kin and who live in countries with a stronger emphasis on friendship will have more friends. While these hypothesized patterns are consistent with interdisciplinary literatures, they have not been tested empirically and remain overlooked in current narratives on "aging alone." This study combines individual-level data from the Survey of Health, Ageing, and Retirement in Europe (SHARE, Wave 6) with aggregate nation-level data to estimate multilevel negative binomial models exploring number of friends among those aged 50+ across 17 countries. Those who lack kin report more friends, particularly in countries with a higher percentage of people who believe that friends are "very important" in life. This paper challenges dominating assumptions about "aging alone" that rely on lack of family as an indicator of "alone." Future studies should investigate how friendship is correlated with lack of kin, particularly in wealthier nations. Previous research may have overestimated risk in wealthier nations, but underestimated risk in less wealthy nations and/or more family-centered nations.

### S. Bacci: Developments in the context of Item Response Theory models

A. Magrini: Linear Markovian models with distributed-lags to assess the economic impact of investments

### Welcome seminar: Silvia Bacci & Alessandro Magrini

S. Bacci: Latent variable models are a wide family of statistical models based on the use of unobservable (i.e., latent) variables for multiple aims, such as measuring unobservable traits, accounting for measurement errors, and representing unobserved heterogeneity that arises with complex data structures (e.g., multilevel and longitudinal data). The class of latent variable models includes the Item Response Theory (IRT) models, which are adopted to measure latent traits when individual responses to a set of categorical items are available.
In such a context, the multidimensional Latent Class IRT (LC-IRT) models extend traditional IRT models for multidimensionality (i.e., presence of multiple latent traits) and discreteness of latent traits, which allows us to cluster individuals in unobserved homogeneous groups (latent classes).
The general formulation of the class of models at issue is illustrated with details concerning the model specification, the estimation, and the model selection. Furthermore, some useful extensions to deal with real data are discussed: (i) introduction of individual covariates, (ii) model formulation to account for multilevel data structures, and (iii) model formulation to deal with missing item responses. The estimation of models at issue may be accomplished through two specific R packages. Finally, some applications in the educational setting are illustrated.

A. Magrini: Linear regression with temporally delayed covariates (distributed-lag linear regression) is a standard approach to assess the impact of investments on economic outcomes through time. Typically, constraints on the lag shapes are required to meet domain knowledge. For instance, the effect of an investment may be small at first, then reach a peak before diminishing to zero after some time lags. Polynomial lag shapes with endpoint constraints are typically exploited to represent such a feature. A deeper analysis is directed towards the decomposition of the 'overall' impact into intermediate contributions and considers multiple outcomes. Linear Markovian models are proposed for this task, where a set of distributed-lag linear regressions is recursively defined according to causal assumptions of the problem domain. The theory and the software are presented, several real-world applications are illustrated, and open issues are discussed.
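
To make the idea of an endpoint-constrained polynomial lag shape concrete, here is a minimal sketch of a single distributed-lag regression. It assumes a quadratic shape that vanishes one lag before the first and one lag after the last, so only a scale parameter and an intercept are estimated. The function names and the simulated series are hypothetical; this is not the software mentioned in the abstract, and it does not cover the recursive Linear Markovian structure.

```python
import numpy as np

def quadratic_lag_weights(max_lag):
    """Quadratic lag shape that vanishes at lags -1 and max_lag + 1 (endpoint constraints)."""
    lags = np.arange(max_lag + 1)
    return (lags + 1.0) * (max_lag + 1.0 - lags)

def fit_constrained_lag(x, y, max_lag):
    """Least-squares fit of y_t = a + theta * sum_l w_l * x_{t-l} + error."""
    w = quadratic_lag_weights(max_lag)
    # z_t is the weighted sum of the current and lagged values of x.
    z = np.array([w @ x[t - max_lag:t + 1][::-1] for t in range(max_lag, len(x))])
    design = np.column_stack([np.ones_like(z), z])
    (a, theta), *_ = np.linalg.lstsq(design, y[max_lag:], rcond=None)
    return a, theta * w  # intercept and implied lag coefficients

# Hypothetical noise-free data generated from a known constrained lag shape.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
true_beta = 0.5 * quadratic_lag_weights(4)
y = np.zeros(300)
for t in range(4, 300):
    y[t] = 2.0 + true_beta @ x[t - 4:t + 1][::-1]

a_hat, beta_hat = fit_constrained_lag(x, y, 4)
print(a_hat, beta_hat)
```

With a maximum lag of 4 the shape weights are 5, 8, 9, 8, 5: the effect rises, peaks at the middle lag, and then declines, with the constraint forcing it to zero just outside the lag window.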

### Couples' transition to parenthood in Finland: A tale of two recessions

### Chiara Comolli (University of Lausanne)

The question of how fluctuations in the business cycle and fertility are linked resurfaced in the aftermath of the Great Recession of 2008-09, when birth rates started declining in many countries. Finland, although affected to a much lesser extent than other regions of Europe, is no exception to this decline. However, previous macro-level research on the much stronger recession in Finland in the 1990s shows that, contrary to other developed countries, the typical pro-cyclical behavior of fertility in relation to the business cycle was absent. The objective of this paper is to test how a typical feature of both recessions at the individual level, labor market uncertainty, is linked to childbearing risk in Finland. In particular, I focus on the transition to first birth and on the explicit period comparison between the 1990s and the 2000s. I use Finnish population registers (1988-2013) and adopt a dyadic couple perspective to assess the association between each partner's employment status and the transition to parenthood. Finally, I investigate how this relationship changes, differently in the two periods, depending on aggregate labor market conditions, to test whether there was a change over time from counter- to pro-cyclicality of fertility in Finland.

### Simple Structure Detection Through Bayesian Exploratory Multidimensional IRT Models

### Lara Fontanella (Università di Chieti-Pescara)

In modern validity theory, a major concern is the construct validity of a test, which is commonly assessed through confirmatory or exploratory factor analysis. In the framework of Bayesian exploratory Multidimensional Item Response Theory (MIRT) models, we discuss two methods aimed at investigating the underlying structure of a test, in order to verify if the latent model adheres to a chosen simple factorial structure. This purpose is achieved without imposing hard constraints on the discrimination parameter matrix to address the rotational indeterminacy. The first approach prescribes a 2-step procedure. The parameter estimates are obtained through an unconstrained MCMC sampler. The simple structure is, then, inspected with a post-processing step based on the Consensus Simple Target Rotation technique. In the second approach, both rotational invariance and simple structure retrieval are addressed within the MCMC sampling scheme, by introducing a sparsity-inducing prior on the discrimination parameters. Through simulation as well as real-world studies, we demonstrate that the proposed methods are able to correctly infer the underlying sparse structure and to retrieve interpretable solutions.

### Incorporating information and assumptions to estimate validity of case identification in healthcare databases and to address bias from measurement error in observational studies: a research plan to su

### Rosa Gini (Agenzia Regionale di Sanità della Toscana)

Observational studies based on healthcare databases have become increasingly common. While methods to address confounding have been tailored to the nature of these data, methods to address bias from measurement error are less developed. However, it is acknowledged that study variables are not measured exactly in databases, since data are collected for purposes other than research, and the variables are measured based on recorded information only. Estimating indices of validity of a measurement, such as sensitivity, specificity, and positive and negative predictive value, is commonly recommended, but rarely accomplished. In this talk we introduce a methodology for measuring variables, called component strategy, that has the potential to address this problem when multiple sources of data are available. We illustrate the examples of a chronic disease (type 2 diabetes), an acute disease (acute myocardial infarction) and an infectious disease (pertussis) measured in multiple European databases, and we describe the effect of different measurements. We introduce some formulas that make it possible to obtain some validity indices analytically from other validity indices and from observed frequencies. We sketch a research plan based on this strategy that aims at understanding when partial information on validity can be exploited to provide a full picture of measurement error, at quantifying the dependence on assumptions, at measuring the associated uncertainty, and at addressing bias produced by quantified measurement error. We introduce the ConcePTION project, a 5-year project funded by the Innovative Medicines Initiative, which aims at building an ecosystem for better monitoring of the safety of medicine use in pregnancy and breastfeeding, based on procedures and tools to transform existing data into actionable evidence, resulting in better and more timely information. The aim of the talk is to discuss the research plan and foster a collaboration that may benefit, in particular, the ConcePTION project.
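
The abstract does not state the formulas themselves, so the following sketch only illustrates the general idea with standard probability identities: if the observed frequency of flagged cases, the positive predictive value and the sensitivity are known, the true prevalence, specificity and negative predictive value follow. The function name and the numeric inputs are hypothetical and are not taken from the talk.

```python
def validity_from_ppv_and_sensitivity(observed_prevalence, ppv, sensitivity):
    """Derive true prevalence, specificity and NPV from three known quantities."""
    true_positive = observed_prevalence * ppv            # P(flagged and diseased)
    prevalence = true_positive / sensitivity             # P(diseased)
    false_positive = observed_prevalence * (1.0 - ppv)   # P(flagged and not diseased)
    specificity = 1.0 - false_positive / (1.0 - prevalence)
    npv = (1.0 - prevalence) * specificity / (1.0 - observed_prevalence)
    return {"prevalence": prevalence, "specificity": specificity, "npv": npv}

# Hypothetical inputs: 5% of records are flagged, PPV is 0.8, sensitivity is 0.5.
result = validity_from_ppv_and_sensitivity(
    observed_prevalence=0.05, ppv=0.8, sensitivity=0.5
)
print(result)
```

In this example the algorithm captures only half of the true cases, so the true prevalence (8%) is higher than the observed frequency (5%), even though most flagged records are genuine cases.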

### Statistical Learning with High-dimensional Structurally Dependent Data

### Tapabrata Maiti (Michigan State University)

The rapid development of information technology is making it possible to collect massive amounts of multidimensional, multimodal data with high dimensionality in diverse fields of science and engineering. New statistical learning and data mining methods are being developed accordingly to solve the challenging problems arising from these complex systems. In this talk, we will discuss a specific type of statistical learning, namely the problem of feature selection and classification when the features are high-dimensional and structured and, specifically, spatio-temporal in nature. Various machine learning techniques are suitable for this type of problem, although the underlying statistical theories are not well established. We will discuss some recently developed techniques in the context of specific examples arising in neuroimaging studies.

### Analysis and automatic detection of hate speech: from pre-teen Cyberbullying on WhatsApp to Islamophobic discourse on Twitter

### Rachele Sprugnoli

The widespread use of social media yields a huge number of interactions on the Web. Unfortunately, social media messages are often written to attack specific groups of users based on their religion, ethnicity or social status. Due to the massive rise of hateful, abusive and offensive messages, platforms such as Twitter and Facebook have been searching for solutions to tackle hate speech. As a consequence, the amount of research targeting the detection of hate speech, abusive language and cyberbullying has also increased. In this talk we will present two projects in the field of hate speech analysis and detection: CREEP, which aims at identifying and preventing the possible negative impacts of cyberbullying on young people, and HateMeter, whose goal is to increase the efficiency and effectiveness of NGOs in preventing and tackling Islamophobia at EU level. In particular, we will describe the language resources and technologies under development in the two projects and we will show two demos based on Natural Language Processing tools.

### Using marginal models for structure learning

### Sung-Ho Kim (Dept of Mathematical Sciences, KAIST)

Structure learning for Bayesian networks is usually carried out heuristically in search of an optimal model, in order to avoid an explosive computational burden. A structural error made at one point of structure learning may degrade subsequent learning. In the talk, a remedial approach to this error-for-error process will be introduced, using marginal model structures. The remedy consists in fixing local errors in structure with reference to the marginal structures. In this sense, the remedy is called a marginally corrective procedure. A new score function is also introduced for the procedure; it consists of two components, the likelihood function of a model and a discrepancy measure in marginal structures. The marginally corrective procedure compares favorably with one of the most popular algorithms in experiments with benchmark data sets.

### Evidence of bias in randomized clinical trials of hepatitis C interferon therapies

### Massimo Attanasio (Università degli Studi di Palermo)

Introduction: Bias may occur in randomized clinical trials in favor of the new experimental treatment because of unblinded assessment of subjective endpoints or wish bias. Using results from published trials, we analyzed and compared the treatment effect of hepatitis C antiviral interferon therapies when labeled experimental or control. Methods: Meta-regression of trials enrolling naive hepatitis C virus patients who underwent four therapies including interferon alone or plus ribavirin during past years. The outcome measure was the sustained response evaluated by transaminases and/or hepatitis C virus-RNA serum load. Data on the outcome across therapies were collected according to the assigned arm (experimental or control) and to other trial and patient-level characteristics. Results: The overall difference in efficacy between the same treatment labeled experimental or control had a mean of +11.9% (p < 0.0001). The unadjusted difference favored the experimental therapies of group IFN-1 (+6%) and group IFN-3 (+10%), while there was no difference for group IFN-2 because of success rates from large multinational trials. In a meta-regression model with trial-specific random effects including several trial and patient-level variables, treatment and arm type remained significant (p < 0.0001 and p = 0.0009 respectively) in addition to drug-schedule-related variables. Conclusion: Our study indicates that the same treatment is more effective when labeled “experimental” than when labeled “control” in a setting of trials using an objective endpoint and even after adjusting for patient and study-level characteristics. We discuss several factors related to the design and conduct of hepatitis C trials as potential explanations of the bias toward the experimental treatment.

### A new aspect of Riordan arrays - II

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

Let $(S,\star)$ be a semigroup. A semigroup algebra ${\mathbb K}[S]$ of $S$ over a field ${\mathbb K}$ is the set of all linear combinations of finitely many elements of $S$ with coefficients in ${\mathbb K}$: \begin{eqnarray*} {\mathbb K}[S]=\left\{\sum_{\alpha\in S}c_\alpha \alpha \;\middle|\; c_\alpha\in {\mathbb K}\right\}. \end{eqnarray*} The ring of formal power series ${\mathbb K}[[t]]$ over a field ${\mathbb K}$, together with convolution, is an example of a semigroup. This suggests that Riordan arrays over ${\mathbb K}[[t]]$ can be generalized to a semigroup algebra ${\mathbb K}[S]$. Furthermore, using the fact that a lattice is both a partially ordered set and a semigroup, the notion of *semi-Riordan arrays* over a semigroup algebra will be introduced in connection with lattices and posets. We will then see that a Riordan array is the semi-Riordan array over the semigroup algebra ${\mathbb K}[S]$, where $S=\{0,1,2,\ldots\}$ is a semigroup under the usual addition and is a totally ordered set.

### A new aspect of Riordan arrays - I

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

Let $R[[t]]$ be the ring of formal power series over a ring $R$. A Riordan array $(g,f)$ is an infinite lower triangular matrix constructed out of two functions $g,f\in R[[t]]$ with $f(0) = 0$ in such a way that its $k$th column generating function is $gf^k$ for $k\ge0$. The set of all invertible Riordan arrays forms a group called the *Riordan group*. In many contexts, Riordan arrays are used as a machine to generate new approaches in combinatorics and its applications. Throughout this talk we will look at the Riordan group and Riordan arrays from several different points of view, e.g. group theory, combinatorics, graph theory, matrix theory, topology and Lie theory. In addition, we will see how Riordan arrays have been generalized and where they have been applied.
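
The column-by-column definition can be checked numerically on a well-known case. The sketch below builds the top-left corner of a Riordan array from truncated coefficient lists and uses the pair $g = 1/(1-t)$, $f = t/(1-t)$, which yields Pascal's triangle. The helper names are hypothetical, integer coefficients are assumed, and both series must be supplied with at least `n` coefficients.

```python
from math import comb

def series_mul(a, b, n):
    """Product of two power series, truncated to n coefficients."""
    out = [0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            out[i + j] += ai * bj
    return out

def riordan_array(g, f, n):
    """First n rows and columns of (g, f): column k holds the coefficients of g * f^k."""
    column = list(g[:n])
    columns = []
    for _ in range(n):
        columns.append(column)
        column = series_mul(column, f, n)
    # Transpose so that rows[i][k] is the coefficient of t^i in g * f^k.
    return [[columns[k][i] for k in range(n)] for i in range(n)]

n = 6
g = [1] * n              # 1/(1-t) = 1 + t + t^2 + ...
f = [0] + [1] * (n - 1)  # t/(1-t) = t + t^2 + ...   (note f(0) = 0)
pascal = riordan_array(g, f, n)
for row in pascal:
    print(row)
```

Because $f(0)=0$, each multiplication by $f$ shifts the column down by at least one row, which is why the resulting matrix is lower triangular.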

### Statistics, new empiricism and society in the era of Big Data

### Giuseppe Arbia (Univ. Cattolica Sacro Cuore Roma, Univ. Svizzera Italiana, College of William & Mary, Williamsburg)

Recent decades have seen a formidable explosion in the collection of data and in their dissemination and use in every sector of human society. This phenomenon is due above all to the increased capacity to collect and store information automatically through highly diverse sources such as sensors of various kinds, satellites, mobile phones, the internet, drones and many others. This is the phenomenon known as the "big data revolution".

The aim of the seminar is to describe, in terms accessible even to non-specialists, the big data phenomenon and its possible repercussions on everyday life. The consequences for Statistics, the art of understanding reality and making decisions on the basis of empirical-observational data, will also be discussed. Presentation of the book "Statistica, nuovo empirismo e società nell'era dei Big Data" by Giuseppe Arbia, Edizioni Nuova Cultura (2018).

### Cattuto: High-resolution social networks: measurement, modeling and applications

Paolotti: It takes a village - how collaborations in data science for social good can make a difference

### Double seminar: Ciro Cattuto & Daniela Paolotti (ISI Foundation)

Ciro Cattuto: Digital technologies provide the opportunity to quantify specific human behaviors with unprecedented levels of detail and scale. Personal electronic devices and wearable sensors, in particular, can be used to map the network structure of human close-range interactions in a variety of settings relevant for research in computational social science, epidemiology and public health. This talk will review the experience of the SocioPatterns collaboration (www.sociopatterns.org), an international effort aimed at measuring and studying high-resolution human and animal social networks using wearable proximity sensors. I will discuss technology requirements and measurement experiences in diverse environments such as schools, hospitals and households, including recent work in low-resource rural settings in Africa. I will discuss the complex features found in empirical temporal networks and show how methods from network science and machine learning can be used to detect structures and to understand the role they play for dynamical processes, such as epidemics, occurring over the network. I will close with an overview of future research directions and applications.

Paolotti: The unprecedented opportunities provided by data science in all areas of human knowledge become even more evident when applied to the fields of social innovation, international development and humanitarian aid. Using social media data to study malnutrition and obesity in children in developing countries, using mobile phone digital traces to understand women's mobility for safety and security, harvesting search engine queries to study suicide among young people in India: these are only a few examples of how data science can be exploited to address many social problems and to support global agencies and policymakers in implementing better and more impactful policies and interventions. Nevertheless, data scientists alone cannot be successful in this complex effort. Greater access to data, more collaboration between public and private sector entities, and an increased ability to analyze datasets are needed to tackle society's greatest challenges.
In this talk, we will cover examples of how actors from different entities can join forces around data and knowledge to create public value with an impact on global societal issues and set the path to accelerate the harnessing of data science for social good.


### CONJUGATE BAYES FOR PROBIT REGRESSION VIA UNIFIED SKEW-NORMALS

### Daniele Durante (Department of Decision Sciences, Bocconi University)

Regression models for dichotomous data are ubiquitous in statistics. Besides being useful for inference on binary responses, such methods are also fundamental building-blocks in more complex classification strategies covering, for example, Bayesian additive regression trees (BART). Within the Bayesian framework, inference proceeds by updating the priors for the coefficients, typically set to be Gaussians, with the likelihood induced by probit or logit regressions for the binary responses. In this updating, the apparent absence of a tractable posterior has motivated a variety of computational methods, including Markov Chain Monte Carlo (MCMC) routines and algorithms which approximate the posterior. Despite being routinely implemented, current MCMC strategies face mixing or time-inefficiency issues in large p and small n studies, whereas approximate routines fail to capture the skewness typically observed in the posterior. In this seminar, I will prove that the posterior distribution for the probit coefficients has a unified skew-normal kernel, under Gaussian priors. Such a novel result allows efficient Bayesian inference for a wide class of applications, especially in large p and small-to-moderate n studies where state-of-the-art computational methods face notable issues. These advances are outlined in a genetic study, and further motivate the development of a wider class of conjugate priors for probit models along with methods to obtain independent and identically distributed samples from the unified skew-normal posterior. Finally, these results are also generalized to improve classification via BARTs.
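As a minimal sketch of the updating described above, the unnormalised log-posterior for probit coefficients under independent zero-mean Gaussian priors can be evaluated directly. The function name `probit_log_posterior`, the prior standard deviation `prior_sd` and the toy data are illustrative assumptions, not taken from the talk; the code only evaluates the posterior kernel and does not implement the unified skew-normal result.

```python
import math


def log_std_normal_cdf(z):
    """Log of the standard normal CDF, computed via erfc for tail accuracy."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    # Guard against underflow for extremely negative z (illustrative choice).
    return math.log(max(p, 1e-300))


def probit_log_posterior(beta, X, y, prior_sd=10.0):
    """Unnormalised log-posterior: Gaussian log-prior plus probit log-likelihood."""
    log_prior = -0.5 * sum(b * b for b in beta) / prior_sd ** 2
    log_lik = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor
        sign = 1.0 if y_i == 1 else -1.0  # P(y=1) = Phi(eta), P(y=0) = Phi(-eta)
        log_lik += log_std_normal_cdf(sign * eta)
    return log_prior + log_lik


# Hypothetical toy data: one covariate, four observations.
X = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
y = [1, 1, 0, 0]
```

On this toy data a positive coefficient fits the responses better than a negative one, so `probit_log_posterior([1.0], X, y)` exceeds `probit_log_posterior([-1.0], X, y)`.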


### Heterogeneity in dynamics of risk accumulation: the case of unemployment

### Raffaele Grotti (European University Institute)

The paper contributes to the study of socioeconomic risks. It studies different mechanisms that account for the stickiness of the unemployment condition and, relatedly, for the longitudinal accumulation of unemployment experiences. In particular, the paper disentangles two mechanisms, ‘genuine state dependence’ dynamics and unobserved characteristics at the individual level, and accounts both for their relative weight and for their possible interplay in shaping the accumulation of unemployment risks over time. Dynamics of accumulation are investigated, showing their distribution among different workforce segments defined in terms of both observable and unobservable characteristics. This is done by applying correlated dynamic random-effects probit models and providing statistics for protracted unemployment exposure. The analysis makes use of EU-SILC data from 2003 to 2015 for four European countries (DK, FR, IT and UK). Empirical results indicate that both unobserved heterogeneity and genuine state dependence are relevant and partly independent factors in explaining the reiteration and accumulation of unemployment and long-term unemployment risks over time. The analysis shows how the weight of these two components varies at both the macro and the micro level, according to different labour market and institutional settings and depending on individual endowments. Finally, the paper discusses how the distinction between unobserved heterogeneity and genuine state dependence, and the evaluation of their possible interplay, can provide useful insights with respect to theories of cumulative advantages and to an efficient design of policy measures aimed at counteracting the accumulation of occupational penalties over time.


### Lupparelli: On log-mean linear regression graph models

Bocci: Statistical modelling of spatial data: some results and developments

### Welcome Seminars: Monia Lupparelli & Chiara Bocci

Lupparelli: This talk aims to illustrate the log-mean linear parameterization for multivariate Bernoulli distributions, which represents the counterpart for marginal modelling of the well-established log-linear parameterization. In fact, the log-mean transformation is defined by the same log-linear mapping applied to the space of the mean parameter (the marginal probability vector) rather than to the simplex (the space of the joint probability vector). The class of log-mean linear models, under suitable zero constraints, corresponds to the class of discrete bi-directed graph models, just as the class of log-linear models is used to specify discrete undirected graph models. Moreover, the log-mean linear transformation provides a novel link function in multivariate regression settings with discrete response variables. The resulting class of log-mean linear regression models is used for modelling regression graphs via a sequence of marginal regressions where the coefficients are linear functions of log-relative risk parameters. The class of models will be illustrated in two different contexts: (i) assessing the effect of HIV-infection on multimorbidity and (ii) deriving the relationship between marginal and conditional relative risk parameters in regression settings with multiple intermediate variables.

Bocci: TBA


### Flexible and Sparse Bayesian Model-Based Clustering

### Bettina Grün (Johannes Kepler University, Linz, Austria)

Finite mixtures of multivariate normal distributions constitute a standard tool for clustering multivariate observations. However, selecting a suitable number of clusters, identifying cluster-relevant variables and accounting for non-normal shapes of the clusters are still challenging issues in applications. Within a Bayesian framework we indicate how suitable prior choices can help to solve these issues. We achieve this by considering only prior distributions that are conditionally conjugate or can be reformulated as hierarchical priors, thus allowing for simple estimation using MCMC methods with data augmentation.


### Bayesian Structure Learning in Multi-layered Genomic Networks

### Min Jin Ha (UT MD Anderson Cancer Center)

Integrative network modeling of data arising from multiple genomic platforms provides insight into the holistic picture of the interactive system, as well as the flow of information across many disease domains. The basic data structure consists of a sequence of hierarchically ordered datasets for each individual subject, which facilitates integration of diverse inputs, such as genomic, transcriptomic, and proteomic data. A primary analytical task in such contexts is to model the layered architecture of networks where the vertices can be naturally partitioned into ordered layers, dictated by multiple platforms, and exhibit both undirected and directed relationships. We propose a multi-layered Gaussian graphical model (mlGGM) to investigate conditional independence structures in such multi-level genomic networks. We use a Bayesian node-wise selection (BANS) framework that coherently accounts for the multiple types of dependencies in mlGGM, and using variable selection strategies, allows for flexible modeling, sparsity, and incorporation of edge-specific prior knowledge. Through simulated data generated under various scenarios, we demonstrate that BANS outperforms other existing multivariate regression-based methodologies. We apply our method to estimate integrative genomic networks for key signaling pathways across multiple cancer types, find commonalities and differences in their multi-layered network structures, and show translational utilities of these integrative networks.


### "Exit this way": Persistent Gender & Race Differences in Pathways Out of In-Work Poverty in the US

### Emanuela Struffolino (WZB - Berlin Social Science Center)

We analyze the differences by gender and race in long-term pathways out of in-work poverty. Such differences are understood as "pathway gaps", analogous to the gender and racial income gaps studied in labor-market economics and sociology. We combine data from three high-quality data sources (NLSY79, NLSY97, PSID) and apply sequence analysis multistate models to 1) empirically identify pathways out of in-work poverty, 2) estimate the associations of gender and race with each distinct pathway, and 3) attempt to account for these gender and race differences. We identify five different pathways out of in-work poverty. While men and non-Hispanic whites are most likely to experience successful long-term transitions out of poverty within the labor market, women and African Americans are more likely to only temporarily exit in-work poverty, commonly by exiting the labor market. These "pathway gaps" persist even after controlling for selection into in-work poverty, educational attainment, and family demographic behavior.


### Estimating Causal Effects On Social Networks

### Laura Forastiere (Yale Institute for Network Science - Yale University)

In most real-world systems units are interconnected and can be represented as networks consisting of nodes and edges. For instance, in social systems individuals can have social ties, family or financial relationships. In settings where some units are exposed to a treatment and its effects spill over to connected units, estimating both the direct effect of the treatment and spillover effects presents several challenges. First, assumptions on the way and the extent to which spillover effects occur along the observed network are required. Second, in observational studies, where the treatment assignment is not under the control of the investigator, confounding and homophily are potential threats to the identification and estimation of causal effects on networks. Here, we make two structural assumptions: i) neighborhood interference, which assumes that interference operates only through a function of the immediate neighbors’ treatments, and ii) unconfoundedness of the individual and neighborhood treatment, which rules out the presence of unmeasured confounding variables, including those driving homophily. Under these assumptions we develop a new covariate-adjustment estimator for treatment and spillover effects in observational studies on networks. Estimation is based on a generalized propensity score that balances individual and neighborhood covariates across units under different levels of individual treatment and of exposure to neighbors’ treatment. Adjustment for the propensity score is performed using a penalized spline regression. Inference capitalizes on a three-step Bayesian procedure which allows taking into account the uncertainty in the propensity score estimation and avoiding model feedback. Finally, correlation of interacting units is taken into account using a community detection algorithm and incorporating random effects in the outcome model.
All these sources of variability, including variability of treatment assignment, are accounted for in the posterior distribution of finite-sample causal estimands. This is joint work with Edo Airoldi, Albert Wu and Fabrizia Mealli.
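To make the neighborhood interference assumption concrete, the sketch below computes one possible exposure summary: the fraction of a unit's immediate neighbours that are treated. The choice of this particular function, the name `neighborhood_exposure` and the toy network are assumptions made for illustration; the abstract only requires that interference act through some function of the neighbours' treatments.

```python
def neighborhood_exposure(neighbors, treatment):
    """Fraction of each unit's immediate neighbours that are treated."""
    exposure = {}
    for unit, adjacent in neighbors.items():
        if adjacent:
            exposure[unit] = sum(treatment[v] for v in adjacent) / len(adjacent)
        else:
            exposure[unit] = 0.0  # assumption: isolated units have zero exposure
    return exposure


# Hypothetical undirected network stored as adjacency lists.
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
treatment = {"a": 1, "b": 0, "c": 1, "d": 0}
exposure = neighborhood_exposure(neighbors, treatment)
```

Under this summary, each unit is characterised by the pair (own treatment, neighbourhood exposure), which is the kind of joint treatment a generalized propensity score would need to balance.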


### Time-varying survivor average causal effects with semicompeting risks

### Leah Comment (Department of Biostatistics, Harvard T.H. Chan School of Public Health)

In semicompeting risks problems, non-terminal time-to-event outcomes such as time to hospital readmission are subject to truncation by death. These settings are often modeled with parametric illness-death models, but evaluating causal treatment effects with hazard models is problematic due to the evolution of incompatible risk sets over time. To combat this problem, we introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal stratum causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that is anchored to parameterization of illness-death models for both treatment arms but maintains causal interpretability. We outline a frailty specification that can accommodate within-person correlation between non-terminal and terminal event times, and we discuss potential avenues for adding model flexibility. This research is joint work with Fabrizia Mealli, Corwin Zigler, and Sebastien Haneuse.


### Estimation of Multivariate Factor Stochastic Volatility Models by Efficient Method of Moments

### Christian Muecher (University of Konstanz)

We introduce a frequentist procedure to estimate multivariate factor stochastic volatility models. The estimation is done in two steps. First, the factor loadings, idiosyncratic variances and unconditional factor variances are estimated by approximating the dynamic factor model with a static one. Second, we apply the Efficient Method of Moments with GARCH(1,1) as an auxiliary model to estimate the stochastic volatility parameters governing the dynamic latent factors and idiosyncratic noises. Based on various simulations, we show that our procedure outperforms existing approaches in terms of accuracy and efficiency and it has clear computational advantages over the existing Bayesian methods.
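For readers unfamiliar with the auxiliary model named above, the following is a minimal sketch of the standard GARCH(1,1) conditional variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The function name, the initialisation at the unconditional variance and the parameter values in the usage line are illustrative assumptions; the Efficient Method of Moments step itself is not shown.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model for a series of returns."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    if not returns:
        return []
    # Assumption: start the recursion at the unconditional variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r_prev in returns[:-1]:
        sigma2.append(omega + alpha * r_prev ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical inputs chosen only to exercise the recursion.
variances = garch11_variances([0.0, 1.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
```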


### Graph Algorithms for Data Analysis

### Andrea Marino (Università di Pisa)

Real-world data can very often be modelled with networks that represent relationships among real-world entities. In this talk we will discuss efficient algorithmic tools for the analysis of big real-world networks. We will give an overview of some algorithms for the analysis of huge graphs, focusing on data gathering, degrees of separation computation, centrality measures, community discovery, and novel similarity measures among entities.


### Data-driven transformations in small area estimation: An application with the R-package emdi

### Timo Schmid (Freie Universität Berlin)

Small area models typically depend on the validity of model assumptions. For example, a commonly used version of the Empirical Best Predictor relies on the Gaussian assumptions of the error terms of the linear mixed regression model, a feature rarely observed in applications with real data. The present paper proposes to tackle the potential lack of validity of the model assumptions by using data-driven scaled transformations as opposed to ad-hoc chosen transformations. Different types of transformations are explored, the estimation of the transformation parameters is studied in detail under the linear mixed regression model, and transformations are used in small area prediction of linear and non-linear parameters. Mean squared error estimation that accounts for the uncertainty due to the estimation of the transformation parameters is explored using bootstrap approaches. The proposed methods are illustrated using real survey and census data for estimating income deprivation parameters for municipalities in Mexico with the R-package emdi. The package enables the estimation of regionally disaggregated indicators using small area estimation methods and includes tools for (a) customized parallel computing, (b) model diagnostic analyses, (c) creating high quality maps and (d) exporting the results to Excel and OpenDocument Spreadsheets. Simulation studies and the results from the application show that using carefully selected, data-driven transformations can improve small area estimation.


### A brief history of linear quantile mixed models and recent developments in nonlinear and additive regression

### Marco Geraci (University of South Carolina)

What follows is a story that began about sixteen years ago in Viale Morgagni (with some of the events taking place in a cottage of the Montalve’s estate). In this talk, I will retrace the steps that led me to develop linear quantile mixed models (LQMMs). These models have found application in public health, preventive medicine, virology, genetics, anesthesiology, immunology, ophthalmology, orthodontics, cardiology, pharmacology, biochemistry, biology, marine biology, environmental, climate and marine sciences, psychology, criminology, gerontology, economics and finance, linguistics and lexicography. Supported by a grant from the National Institute of Child Health and Human Development, I recently extended LQMMs to nonlinear and additive regression. I will present models, estimation algorithms and software, along with a few applications.


### Transcompiling and Analysing Firewalls

### Letterio Galletta (IMT Lucca)

Configuring and maintaining a firewall configuration is notoriously hard. On the one hand, network administrators have to know in detail the policy meaning, as well as the internals of the firewall systems and of their languages. On the other hand, policies are written in low-level, platform-specific languages where firewall rules are inspected and enforced along non-trivial control flow paths. Further difficulties arise from Network Address Translation (NAT), an indispensable mechanism in IPv4 networking for performing port redirection and translation of addresses. In this talk, we present a transcompilation pipeline that helps system administrators reason on policies, port a configuration from one system to another and perform refactoring, e.g., removing useless or redundant rules. Our pipeline and its correctness are based on IFCL, a generic configuration language equipped with a formal semantics. Relying on this language we decompile a real firewall configuration into an abstract specification, which exposes the meaning of the configuration and enables us to carry out analysis and recompilation.


### Empirical Bayes Estimation of Species Distribution with Overdispersed Data

### Fabio Divino (Dipartimento di Bioscienze e Territorio, Università del Molise)

The estimation of species distributions is fundamental in the assessment of biodiversity and in the monitoring of environmental conditions. In this work we present preliminary results on Bayesian inference for multivariate discrete distributions with applications in biomonitoring. In particular, we consider the problem of estimating species distributions when the collected data are affected by overdispersion. Ecologists often have to deal with data that exhibit a variability differing from what they expect on the basis of the model assumed to be valid. The phenomenon is known as overdispersion if the observed variability exceeds the expected variability, or underdispersion if it is lower than expected. Such differences between observed and expected variation in the data can be interpreted as failures of some of the basic hypotheses of the model. The problem is very common when dealing with count data, for which the variability is directly connected with the magnitude of the phenomenon. Overdispersion is more common than underdispersion and can arise from several causes. Among them, the most important and relevant in ecology is overdispersion due to heterogeneity of the population with respect to the assumed model. An interesting approach to account for overdispersion due to heterogeneity is the method based on compound models. The idea of compound models originated during the 1920s and concerns the possibility of mixing a model of interest with a mixing probability measure. The mixture resulting from the integration generates a new model that can account for larger variation than that included in the reference model. A typical application of this approach is the well-known Gamma-Poisson compound model, which generates the Negative Binomial model. In this work we present the use of a double compound model, a multivariate Poisson distribution combined with Gamma and Dirichlet models, in order to account for the presence of overdispersion.
Some simulation results will compare the Gamma-Dirichlet-Poisson model with the reference Multinomial model. Further, some applications in biomonitoring of aquatic environments are presented. Acknowledgement: This work is part of a joint research project with Salme Karkkainen, Johanna Arje and Antti Penttinen (University of Jyväskylä) and Kristian Meissner (SYKE, Finland).
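The Gamma-Poisson compound model mentioned above can be illustrated with a small simulation: each count is Poisson with its own rate, and the rates are drawn from a Gamma distribution. This is only a sketch of the univariate case, not the double compound model of the talk; the function names, the seed and the parameter values (shape 2, scale 3) are assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_gamma_poisson(n, shape, scale, seed=0):
    """Sample mean and variance of n counts with Gamma-distributed Poisson rates."""
    rng = random.Random(seed)
    counts = [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var


mean, var = simulate_gamma_poisson(50_000, shape=2.0, scale=3.0)
```

For these parameters the Negative Binomial mixture has theoretical mean shape × scale = 6 and variance shape × scale × (1 + scale) = 24, whereas a plain Poisson model would force the variance to equal the mean. The simulated variance should therefore be several times the simulated mean, which is the overdispersion the compound model is meant to capture.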


### From sentiment to superdiversity on Twitter

### Alina Sirbu (Dip. Informatica, Università di Pisa)

Superdiversity refers to large cultural differences in a population due to immigration. In this talk we introduce a superdiversity index based on Twitter data and lexicon-based sentiment analysis, using ideas from epidemic spreading and opinion dynamics models. We show how our index correlates with official immigration statistics available from the European Commission’s Joint Research Center, and we compare it with various other measures computed from the same Twitter data. We argue that our index has predictive power in regions where exact data on immigration are not available, paving the way for a nowcasting model of immigration.


### Convolution Autoregressive Processes and Excess Volatility

### Umberto Cherubini (University of Bologna)

We discuss the economic model and the econometric properties of the Convolution Autoregressive Process of order 1 (C-AR(1)), with a focus on the simplest Gaussian case. This is a first-order autoregressive process in which the innovations depend on the lagged value of the process. We show that the model may be generated by the presence of extrapolative bias in expectations. Extrapolative expectations bring about excess volatility and excess persistence in the dynamics of the variable. While excess volatility cannot be identified if one only observes the time series of the variable, identification can be achieved if the expectations of the variable are observed in the forward markets. We show that the model is well suited to generate the excess variance of long-maturity prices documented in the Giglio-Kelly variance ratio test. We finally discuss possible extensions of the model beyond the Gaussian case, both by changing the specification of the expectations model and by setting a nonlinear data generating process for the fundamental process.


### Heterogeneous federated data center for research: from design to operation

### Sergio Rabellino (University of Torino)

Bringing a large datacenter from design to operation is complex anywhere in the world. In Italy it is a challenge that starts with a game of snakes and ladders against bureaucracy and colleagues to shape the European tenders, the governing board, the price list, the access policy and eventually all the software needed to operate the datacenter. The talk reviews all these aspects, drawing on the experience of designing the University of Torino Competency Center in Scientific Computing, which serves over 20 departments with its 1M€ OCCAM platform, and of writing the HPC4AI project charter, recently funded with 4.5M€ by the Piedmont Region and serving over 10 departments in two universities (University and Technical University of Turin).


### Designing a heterogeneous federated data center for research

### Marco Aldinucci (Computer Science Department, University of Torino)

Advances in high-speed networks and virtualization techniques make it possible to leverage economies of scale and consolidate all the servers of a large organisation into a few powerful, energy-efficient data centers, which also work in synergy with the public cloud. In this respect, research organisations such as universities exhibit distinguishing features of both a technical and a sociological nature. Firstly, there is hardly any compute workload or resource usage pattern that is dominant across many different disciplines and departments. This makes the design of the datacenter and of the access policy so delicate that it requires addressing open research questions. Secondly, scientists are generally inclined to complain about anything short of total freedom to do what they urgently want to do, including strange, greedy and dangerous behaviours for the data and the systems themselves.


### Job Instability and Fertility during the Economic Recession: EU Countries

### Isabella Giorgetti (Dep. of Economics and Social Sciences, Università Politecnica delle Marche)

The decline in the total fertility rate (TFR) varied widely across EU countries. Exploiting individual data from the longitudinal EU-SILC dataset (2005-2013), this study investigates the cross-country effect of job instability on the couple’s choice of having one (more) child. I build a job instability measure for both partners from the first-order lag of their own economic activity status in the labour market (holding a temporary or permanent contract, or being unemployed). In order to account for unobserved heterogeneity and the potential presence of endogeneity, I estimate, under sequential moment restrictions, a Two-Stage Least Squares (2SLS) model in first differences. I then group European countries according to six different welfare regimes and estimate the heterogeneous effects of labour market instability on childbearing in a comparative framework. The principal result is that the cross-country average effect of job instability on couples’ fertility decisions has no statistical relevance, owing to the large country-specific fixed effects. Distinguishing between welfare regimes, institutional settings and active social policies reveals varying family behaviour in fertility. In low-fertility countries, however, it is confirmed that the impact of parents’ successful labour market integration might be ambiguous, which might be due to the scarcity of child care options and/or cultural norms.


### Data Science and Our Environment

### Francesca Dominici (Harvard T.H. Chan School of Public Health)

What if I told you I had evidence of a serious threat to American national security--a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. This is the question before us today--but the threat doesn’t come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population aged 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation’s most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.


### Introductory seminar: main lines of research and results obtained

### Francesca Giambona (DiSIA)

During the seminar, the main lines of research and the main topics of analysis and research in the speaker's scientific career will be presented. In particular, the main statistical methodologies used to analyse the phenomena under study will be described, and the most important empirical results obtained will be presented. Finally, the research topics currently under study will be briefly introduced.


### Latent variable modelling for dependent observations: some developments

### M. Francesca Marino (DiSIA)

When dealing with dependent data, standard statistical methods cannot be directly used for the analysis as they may produce biased results and lead to misleading inferential conclusions. In this framework, latent variables are frequently used as a tool for capturing dependence and describing association. During the seminar, some developments in the context of latent variable modelling will be presented. Three main areas of research will be covered, namely quantile regression, social network analysis, and small area estimation. The main results will be presented and some hints on current research and future developments will be given.


### SAM Based Analysis of the Impact of VAT Rate Cut on the Economic System of China after the Tax Reform

### Ma Kewei (Shanxi University of Finance and Economics)

In May 2016 China completed its tax reform, moving from a bifurcated system based on business tax (BT) and Value Added Tax (VAT) to an entirely VAT-based system. The effects of this reform are still under analysis, regarding both the alleviation of the tax burden on the industry and service sectors and the economic improvements. Currently, there seems to be common agreement that, except for some minor cases worth further investigation, these effects are positive in both respects. It is interesting to analyze the impact that a reduction of the VAT rate on some key industries would have in a VAT-based regime. To our knowledge, no analysis in this direction has been performed so far. In this paper, we try to fill this gap and analyze the effect of VAT rate cuts in selected industries on the whole economic system, using an impact multiplier model based on a purposely constructed 2015 Social Accounting Matrix (SAM) for China, which also allows a preliminary analysis of the structure of China’s economic system. (Joint work with Guido Ferrari and Zichuan Mi)


### Probabilistic Distance Algorithm: some recent developments in data clustering

### Francesco Palumbo (Università di Napoli Federico II)

Distance-based clustering methods, which are not model-based, optimize a global criterion based on the distance among clusters. The most widely known distance-based method is k-means clustering (MacQueen, 1967), and several extensions of this method have recently been proposed (Vichi and Kiers, 2001; Rocci et al., 2011; Timmerman et al., 2013). These extensions overcome issues arising from the correlation between variables. However, k-means clustering, and extensions thereof, can fail when clusters do not have a spherical shape and/or some extreme points are present, which tend to affect the group means. Iyigun (2007) and Ben-Israel and Iyigun (2008) propose a non-hierarchical distance-based clustering method, called probabilistic distance (PD) clustering, that overcomes these issues. Tortora et al. (2016) propose a factor version of the method to deal with high-dimensional data. In PD-clustering, the number of clusters K is assumed to be known a priori, and a wide review on how to choose K can be found in the literature. Given some random centres, the probability of any point belonging to a cluster is assumed to be inversely proportional to the distance from the centre of that cluster (Iyigun, 2007). The aim of the seminar is to illustrate the PD algorithm and some recent applications in the data clustering framework from both the model-based (Rainey et al., 2017) and the non-model-based perspective.
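The membership rule stated above, with the probability of belonging to a cluster inversely proportional to the distance from its centre, can be sketched for a single point as follows. The function name `pd_memberships`, the use of Euclidean distance and the handling of a point that coincides with a centre are illustrative assumptions; the centre-update step of the full PD algorithm is not shown.

```python
import math


def pd_memberships(point, centers):
    """Cluster membership probabilities inversely proportional to distance."""
    distances = [math.dist(point, c) for c in centers]
    zero = [d == 0.0 for d in distances]
    if any(zero):
        # Assumption: a point lying on a centre belongs to it with certainty
        # (shared equally if several centres coincide with the point).
        share = 1.0 / sum(zero)
        return [share if z else 0.0 for z in zero]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


# Hypothetical example: one point and two centres at distances 1 and 3.
probs = pd_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

In this example the memberships are 0.75 and 0.25, so the product of probability and distance is the same (0.75) for both clusters, which is exactly what inverse proportionality means.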

### Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data

### Davide Vidotto (Tilburg University)

Multiple imputation of multilevel data (i.e., data collected from different groups) requires not only taking correlations among variables into account, but also considering possible dependencies between units coming from the same group. While a number of imputation models have been proposed in the literature for continuous data, existing methods for multilevel categorical data, such as the JOMO imputation method, still have limitations. For instance, JOMO only considers pairwise relationships between variables, and uses default priors that can affect the quality of the imputations in case of small sample sizes. With the present work, we propose using Multilevel Latent Class models to perform multiple imputation of missing multilevel categorical data. The model is flexible enough to retrieve the original (complex) associations among the variables at hand while respecting the data hierarchy. The model is implemented under a Bayesian framework and estimated via Gibbs sampling, a natural choice for multiple imputation applications. After formally introducing the model, we will show the results of a simulation study in which model performance is assessed and compared with the listwise deletion and JOMO methods. Results indicate that the Bayesian Multilevel Latent Class model is able to recover unbiased and efficient parameter estimates of the analysis model considered in our study.

### Resilient and Secure Cyber Physical Systems: Matching the Present and the Future!

### Andrea Bondavalli (DiMAI)

A cyber-physical system (CPS) is a system in which computational elements interact closely with physical entities through sensors and actuators, thereby controlling individual, organizational or mechanical processes by means of information and communication technologies (computers, software and networks). Such systems are typically automated, intelligent and collaborative, and many of them require high levels of resilience and security to ensure that the systems survive random faults, deliberate attacks and, more generally, unforeseen critical events.

The seminar will be organized in two parts:

- The first part will present the main characteristics of cyber-physical systems, focusing in particular on the aspects related to their resilience and security, together with concrete examples of CPS in different application domains, from the Internet of Things (IoT) to Systems of Systems and Industry 4.0.

- The second part will present the new Curriculum in Resilient and Secure Cyber Physical Systems of the Master's degree programme in Computer Science (Corso di Laurea Magistrale in Informatica), clarifying its distinguishing features, learning objectives and career opportunities.

### Stronger Instruments and Refined Covariate Balance in an Observational Study of the Effectiveness of Prompt Admission to the ICU

### Luke Keele (Georgetown University)

Instrumental Variable (IV) methods, subject to appropriate identification assumptions, allow for consistent estimation of causal effects in observational data in the presence of unobserved confounding. Near-far matching has been proposed as one analytic method to improve inference by strengthening the effect of the instrument on the exposure and balancing observable characteristics between groups of subjects with low and high values of the instrument. However, in settings with hierarchical data (e.g. patients nested within hospitals), or where several covariate interactions must be balanced, conventional near-far matching algorithms may fail to achieve the requisite covariate balance. We develop a new matching algorithm that combines near-far matching with refined covariate balance, in order to balance large numbers of nominal covariates while also strengthening the IV. This extension of near-far matching is motivated by a UK case study that aims to identify the causal effect of prompt admission to the Intensive Care Unit on 7-day and 28-day mortality.

### Assessing the Efficacy of Intrapartum Antibiotic Prophylaxis for Prevention of Early-Onset Group B Streptococcal Disease through Propensity Score Design

### Elizabeth R. Zell (Centers for Disease Control and Prevention)

Observational data can assist in answering hard questions about disease prevention. Early-onset neonatal group B streptococcal disease (EOGBS) can be prevented by intrapartum antibiotic prophylaxis (IAP). Clinical trials demonstrated the efficacy of beta-lactam agents for a narrow population of women. Questions remain about the effectiveness of antibiotic durations of less than 4 hours and about agents appropriate for penicillin-allergic women. We applied propensity score design methods to sample survey data on EOGBS cases and over 7000 non-cases from ten US states in 2003-2004, to match infants exposed to IAP with infants who were not exposed. Antibiotic efficacy was estimated for different antibiotic classes and durations before delivery. Our analysis supports the recommendation that beta-lactam intrapartum prophylaxis of at least four hours before delivery remains the primary treatment; prophylaxis of less than four hours and clindamycin prophylaxis are not as effective in preventing EOGBS.

### Spatial Chaining of Price Indexes to Improve International Comparisons of Prices and Real Incomes

### D.S. Prasada Rao (School of Economics, The University of Queensland, Brisbane, Australia)

The International Comparisons Program (ICP) compares the purchasing power of currencies and real income of almost all countries in the world. An ICP multilateral comparison uses as building blocks bilateral comparisons between all possible pairs of countries. These are then combined to obtain the overall global comparison. One problem with this approach is that some of the bilateral comparisons are typically of lower quality, and their inclusion therefore undermines the integrity of the multilateral comparison. Formulating multilateral comparisons as a graph theory problem, we show how quality can be improved by replacing bilateral comparisons with their shortest path spatially chained equivalents. We consider a number of different ways in which this can be done, and illustrate these methods using data from the 2011 round of ICP. We then propose criteria for comparing the performance of competing multilateral methods, and using these criteria demonstrate how spatial chaining improves the quality of the overall global comparison.
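The graph formulation can be sketched as follows. This is a toy illustration and not the authors' procedure: the country labels, the edge weights (a hypothetical measure of how unreliable each bilateral comparison is) and the bilateral index values are all invented, and the graph is assumed to be connected. A direct comparison is replaced by the product of the bilateral indexes along the shortest path.

```python
import heapq


def shortest_path(weights, source, target):
    """Dijkstra's algorithm on an undirected graph stored as nested dicts."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, w in weights[node].items():
            new_cost = cost + w
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def chained_index(bilateral, path):
    """Multiply the bilateral price indexes along the path."""
    value = 1.0
    for a, b in zip(path, path[1:]):
        # Use the stored index, or its reciprocal for the reverse direction.
        value *= bilateral[(a, b)] if (a, b) in bilateral else 1.0 / bilateral[(b, a)]
    return value


# Hypothetical data: the direct A-C comparison is the least reliable link.
weights = {
    "A": {"B": 0.1, "C": 0.5},
    "B": {"A": 0.1, "C": 0.1},
    "C": {"A": 0.5, "B": 0.1},
}
bilateral = {("A", "B"): 2.0, ("B", "C"): 1.5, ("A", "C"): 3.3}

path = shortest_path(weights, "A", "C")
chained = chained_index(bilateral, path)
print(path, chained)
```

Here the path through B has a lower total weight than the direct link, so the chained comparison between A and C is built from the two more reliable bilateral indexes.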

### Netflix and Deep Learning

### P. Crescenzi (Università di Firenze)

An outreach seminar on some "hot" topics in computer science:

- How, among tens of thousands of films, someone can suggest what we should watch, and get it right. In other words, the magic of recommender systems.

- How the functioning of the neurons of the human brain has been studied in order to simulate their behaviour with neural networks, and how these networks, once trained, can behave intelligently. In other words, the magic of artificial intelligence.

### Exact P-values for Network Interference

### Guido Imbens (Stanford)

We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment. (with Susan Athey and Dean Eckles)

### Introducing AWMA and its activities to promote mathematics among African women

### Kifle Yirgalem Tsegaye (Department of Mathematics, Addis Ababa University, Ethiopia)

Archeological findings and their interpretations persuade us to think that Africa is not only the cradle of humankind but, along with ancient Mesopotamia and others, also a birthplace of mathematical and technological ideas. It is natural to ask: what happened to civilization, mathematics and technological sciences in contemporary Africa? A million-euro question, though this talk is about what African women in mathematics are doing to change, or at least to improve, their realities. Just like women around the world, African women have long been denied access to important components of development such as education, in particular mathematics and technological sciences, with impositions and restrictions of the kind "these are strictly meant for men". Many had no choice but to believe in it and to collaborate in making only "good wives" or "attractive women" out of themselves. Things are changing in this regard, but not strongly enough to liberate African women from the stereotypical depiction of themselves and to let them cope in the world of science. It is believed that role models in every profession are important to convince youngsters that it is possible for them to be whoever they want. Having strong nation-wide, continent-wide or world-wide networks of women in the fields of mathematics and technological sciences enables us to destroy such stereotypes and to open the gate wide enough for our girls to dance with all sciences, most of all in the fields of mathematics and technological sciences.

The talk is a brief introduction to:

- the African Women in Mathematics Association (AWMA) and its activities;
- Ethiopian girls in the fields of Science, Mathematics, Engineering and Technology.

### The problem of extreme risks: theoretical models and financial-actuarial tools

### Marcello Galeotti (University of Florence)

1. A dynamic definition of economic risk. Risk measures: VaR and Expected Shortfall. The problem of computing risk measures.
2. Extreme value theory. Light-tailed and heavy-tailed distributions. Hazard rate. Fluctuations of sums and maxima. The mean excess. The generalized Pareto distribution.
3. Innovative financial instruments for managing environmental risks. Project options and catastrophe bonds. Interactive dynamics: the possibility of “virtuous” outcomes and of sub-optimal equilibria. A case study: flood risk from the Arno river in the area and city of Florence.
4. An evolutionary model for health risks. Defensive medicine, health insurance, legal actions. An evolutionary game model. The role of insurance premiums. Asymptotic behaviours depending on the premium calculation principles.
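As a small illustration of the two risk measures named in the first point, the sketch below computes empirical versions from a sample of losses. It rests on assumptions not taken from the seminar: the function name `var_and_es`, the invented loss sample, and one common empirical convention (VaR as the empirical quantile at level alpha, Expected Shortfall as the mean of the losses at or above VaR). Other conventions exist and give slightly different numbers.

```python
import math


def var_and_es(losses, alpha):
    """Empirical Value-at-Risk and Expected Shortfall at level alpha.

    VaR is the smallest loss whose empirical distribution function is at
    least alpha; ES is taken as the mean of the losses that are >= VaR.
    """
    ordered = sorted(losses)
    n = len(ordered)
    k = max(math.ceil(alpha * n) - 1, 0)   # zero-based index of the quantile
    var = ordered[k]
    tail = [x for x in ordered if x >= var]
    es = sum(tail) / len(tail)
    return var, es


# Invented sample with one extreme loss, to mimic a heavy tail.
losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
var_90, es_90 = var_and_es(losses, 0.9)
print(var_90, es_90)
```

With a single extreme observation, VaR at 90% ignores the size of the largest loss while Expected Shortfall reflects it, which is one reason the two measures are discussed together for heavy-tailed risks.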

### Endogenous Significance Levels in Finance and Economics

### Alessandro Palandri (DiSIA - University of Florence)

This paper argues that rational agents who do not know the model's parameters and have to estimate them will treat the significance level as a choice variable. Calculating the costs associated with errors of Types I and II, rational agents will choose significance levels that maximize expected utility. The misalignment between the investigators' standard statistical significance levels and those that are optimal for agents has profound implications for the empirical tests of the model itself. Specifically, empirical studies could reject models on the basis of statistically significant mispricing when in fact the models are true: if the mispricing is not significant for the agents, there is no expected profitable intervention and no force to bring the system back to equilibrium.

### Bayesian methods in Biostatistics

### Francesco Stingo (University of Florence)

In this talk I will review some Bayesian methods for biomedical applications that I have developed in recent years. These methods can be classified into 4 research areas: 1) graphical models for complex biological networks, 2) hierarchical models for data integration (integromics and imaging genetics), 3) power-prior approaches for personalized medicine, 4) change-point models for early cancer detection.

### Validity of case-finding algorithms for diseases in database and multi-database studies

### Rosa Gini (Agenzia Regionale di Sanità della Toscana, Firenze)

In database studies in pharmacoepidemiology and health services research, variables that identify a disease are derived from existing data sources by means of data processing. The ‘true’ variables should be conceptualized as unobserved quantities, and the study variables entering the actual analysis as measurements, resulting from case-finding algorithms (CFA) applied to the original data. The validity of a CFA is the difference between its result and the true variable. The science of estimating the validity of CFAs is still developing. Although it stems from the methodology of validation of diagnostic algorithms, it has specific hurdles and opportunities. In this talk we will introduce the concept of *component CFA*. We will show the results from a large validation study of CFAs for type 2 diabetes, hypertension, and ischaemic heart disease in Italian administrative databases, using primary care medical records as a gold standard, and propose more research to generalize and apply its results. We will then show how the component analysis may support the estimation of the validity of CFAs in multi-database, multi-national studies.

### Introduction to Riordan graphs

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

There are many reasons to define new classes of graphs. More generally, considering the n × n symmetric Riordan matrix modulo 2, we define the Riordan graph RG(n) of order n. In this talk, we study some basic properties of Riordan graphs such as the number of edges, the degree sequence, the matching number, the clique number, the independence number and so on. Moreover, several examples of Riordan graphs are given. They include Pascal graphs, Catalan graphs, Fibonacci graphs and many others.

### Pascal Graphs and related open problems

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

In 1983, Deo and Quinn introduced Pascal graphs in their search for a class of graphs with certain desired properties to be used as computer networks. One of the desired properties is that the design be simple and recursive so that when a new node is added, the entire network does not have to be reconfigured. Another property is that one central vertex be adjacent to all others. The third requirement is that there exist several paths between each pair of vertices (for reliability) and that some of these paths be of short length (to reduce communication delays). Finally, the graphs should have good cohesion and connectivity. A Pascal matrix PM(n) of order n is defined to be an n × n symmetric binary matrix where the main diagonal entries are all 0's and the lower triangular part of the matrix consists of the first n-1 rows of Pascal's triangle modulo 2. The graph corresponding to the adjacency matrix PM(n) is called the Pascal graph of order n.
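The definition of PM(n) translates directly into code. The following is a minimal sketch; the function name `pascal_matrix` is an assumption, and vertices are numbered from 0, so that entry (i, j) below the diagonal is the binomial coefficient C(i-1, j) modulo 2.

```python
from math import comb


def pascal_matrix(n):
    """Adjacency matrix PM(n) of the Pascal graph of order n."""
    pm = [[0] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(i):
            # Row i-1 of Pascal's triangle, reduced modulo 2.
            bit = comb(i - 1, j) % 2
            pm[i][j] = bit
            pm[j][i] = bit   # keep the matrix symmetric
    return pm


pm4 = pascal_matrix(4)
for row in pm4:
    print(row)
```

Because C(i-1, 0) = 1 for every i, the first vertex is adjacent to all others, which is the "central vertex" property mentioned above.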

### Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model-Averaged Causal Effects

### Corwin M. Zigler (Harvard University)

Causal inference with observational data frequently relies on the notion of the propensity score (PS) to adjust treatment comparisons for observed confounding factors. As comparative effectiveness research in the era of "big data" increasingly relies on large and complex collections of administrative resources, researchers are frequently confronted with decisions regarding which of a high-dimensional covariate set to include in the PS model in order to satisfy the assumptions necessary for estimating average causal effects. Typically, simple or ad-hoc methods are employed to arrive at a single PS model, without acknowledging the uncertainty associated with the model selection. We propose Bayesian methods for PS variable selection and model averaging that 1) select relevant variables from a set of candidate variables to include in the PS model and 2) estimate causal treatment effects as weighted averages of estimates under different PS models. The associated weight for each PS model reflects the data-driven support for that model’s ability to adjust for the necessary variables. We illustrate features of our proposed approaches with a simulation study, and ultimately use our methods to compare the effectiveness of treatments for brain tumors among Medicare beneficiaries.

### Bayesian Effect Estimation Accounting for Adjustment Uncertainty

### Corwin M. Zigler (Harvard University)

Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence, BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC with dependence can estimate the exposure effect with smaller bias than traditional BMA, and improved coverage. We compare BAC with other methods, including traditional BMA, in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999–2005. Using each approach, we estimate the short-term effects of PM2.5 on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.

### Statistical evidence in criminal trials

### Benito Vittorio Frosini (Università Cattolica del Sacro Cuore)

This seminar on statistical evidence in criminal trials is an introduction to the problems usually encountered in the treatment (and in the literature) of so-called forensic statistics, with particular regard to criminal trials; in fact, entirely analogous issues arise in civil trials, although they receive less attention in the specialist literature. The rules to follow in evaluating the evidence brought to a trial, and in particular scientific and statistical evidence, are laid down in practically all legal systems; a fairly clear and analytical exposition is contained in the Federal Rules of Evidence of the United States, which rightly make no distinction between civil and criminal trials. When an event E (the trial evidence) can be produced by several causes, to which (prior) probabilities can be attached, the generally recommended approach is to use Bayes' formula, which turns the likelihoods (of the type P(E|C)) into the posterior probabilities of the possible causes (of the type P(C|E)). The correctness of this approach implies an entirely negative assessment of the so-called "fallacy of the transposed conditional", which consists in using probabilities of the type P(E|C), when they are very small, in place of the probabilities P(C|E), as has unfortunately happened in various trials. The seminar also discusses the problem of using so-called "naked statistics", and more generally "naked statistical inference", which consists in applying to the particular case discussed in a trial statistics derived from a more or less broad population that contains the case at issue. Relevant examples in this discussion, considered analytically, are the trials of Alfred Dreyfus (in France), Sacco and Vanzetti (in the United States), Chedzey (in Australia) and Sally Clark (in the United Kingdom).
Some types of trials (civil and criminal) are also examined in which the Bayesian approach is practically absent from the literature, while so-called significance tests (Fisher-Neyman-Pearson) are usually employed.
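A short numerical sketch may help to see why P(E|C) cannot stand in for P(C|E). The numbers below are invented for illustration and do not come from any of the trials mentioned; the function name `posterior` and the two-cause setting are likewise assumptions.

```python
def posterior(priors, likelihoods):
    """Bayes' formula: turn priors P(C) and likelihoods P(E|C) into P(C|E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)            # P(E), by the law of total probability
    return [j / evidence for j in joint]


# Hypothetical case with two possible causes of the evidence E:
# cause 0 is a priori very likely but rarely produces E,
# cause 1 is a priori very unlikely but always produces E.
priors = [0.999, 0.001]
likelihoods = [0.001, 1.0]

post = posterior(priors, likelihoods)
print(post)
```

Although P(E|C) for the first cause is only 0.001, its posterior probability is close to one half, so reading the small likelihood as the probability of the cause would be the transposed-conditional error described above.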

### A Practical Framework for Specification, Analysis and Enforcement of Attribute-based Access Control Policies

### Andrea Margheri (University of Southampton)

Access control systems are a widely used means of protecting computing systems. They may take several forms, use different technologies and involve varying degrees of complexity. In this seminar, we present a fully-implemented Java-based framework for the specification, analysis and enforcement of attribute-based access control policies. The framework rests on FACPL, a formal language with a compact, yet expressive, syntax that permits expressing real-world access control policies. The analysis functionalities exploit a state-of-the-art SMT-based approach and support the automatic verification of many properties of interest. We introduce FACPL and its supporting functionalities by means of a real-world case study from an e-Health application domain.

### Reasoning about the trade-off between security and performance

### Boris Köpf (IMDEA Madrid)

Today's software systems employ a wide variety of techniques for minimizing the use of resources such as time, memory, and energy. While these techniques are indispensable for achieving competitive performance, they can pose a serious threat to security: By reducing the resource consumption on average (but not in the worst case), they introduce variations that can be exploited by adversaries for recovering private information about users, or even cryptographic keys. In this talk I will give examples of attacks against a number of performance-enhancing features of software and hardware, and I will present ongoing work on techniques for quantifying the resulting threat and for choosing the most cost-effective defense.

### Sensitivity Analysis for Differences-in-Differences

### Luke Keele (Dept. of Political Science, Penn State University)

In the estimation of causal effects with observational data, applied analysts often use the differences-in-differences (DID) method. The method is widely used since the required before-and-after comparison of a treated and a control group is a common situation in the social sciences. Researchers use this method since it protects against a specific form of unobserved confounding. Here, we develop a set of tools to allow analysts to better utilize the method of DID. First, we articulate the hypothetical experiment that DID seeks to replicate. Next, we develop a form of matching that allows for covariate adjustment under the DID identification strategy and that is consistent with the hypothetical experiment. We also develop a set of confirmatory tests that should hold if DID is a valid identification strategy. Finally, we adapt a well-known method of sensitivity analysis for hidden confounding to the DID method. We develop these sensitivity analysis methods for both binary and continuous outcomes. We show that DID is in a very real sense more sensitive to hidden confounders than a standard selection on observables identification strategy. We then apply our methods to three different empirical examples from the social sciences.
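For readers unfamiliar with the method, the basic DID contrast can be written in a few lines. This is only the textbook two-group, two-period estimator on invented numbers, not the matching or sensitivity analysis tools developed in the talk; the function name `did_estimate` is an assumption.

```python
from statistics import fmean


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences of group means.

    The change in the control group is used as the counterfactual change
    for the treated group (the parallel trends assumption).
    """
    treated_change = fmean(treated_after) - fmean(treated_before)
    control_change = fmean(control_after) - fmean(control_before)
    return treated_change - control_change


# Invented outcomes: the treated group rises by 5, the control group by 2.
estimate = did_estimate(
    treated_before=[10.0, 12.0],
    treated_after=[15.0, 17.0],
    control_before=[9.0, 11.0],
    control_after=[11.0, 13.0],
)
print(estimate)
```

Any time-invariant difference between the two groups cancels out of this contrast, which is the specific form of unobserved confounding the method protects against.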

### Sharing on recent collaborative research work between MSE of Wuhan University and SEEM of City University of Hong Kong in Remanufacturing System and Healthcare Management

### Xian-Kia Wang & Kwai-Sang Chin (Wuhan University, China & University of Hong Kong)

(a) Competitive Strategy in Remanufacturing and the Effects of Government Subsidy

This presentation reports the research ideas and initial results of a national research project in China on effective remanufacturing systems and their integrated management. We consider a single-period model comprising an original equipment manufacturer (OEM), who produces only new products, and a remanufacturer, who collects used products from consumers and produces remanufactured products. The OEM and the remanufacturer compete in the product market. We examine the effects of government subsidy as a means of promoting remanufacturing activity, in particular subsidies to the remanufacturer and to consumers respectively. We find that government subsidies to the remanufacturer and to consumers both increase remanufacturing activity, while the subsidy to the remanufacturer gives the better result. This is because the subsidy to the remanufacturer results in a lower price for remanufactured products, thus leading to higher consumer surplus and social welfare.

(b) Emergency Healthcare Operations Management

This presentation reports partial outcomes of a research project entitled “Delivering 21st Century Healthcare in Hong Kong: Building a Quality-and-Efficiency Driven System”, particularly in the Emergency Department of a hospital. The Emergency Department (ED) is a pivotal component of the social healthcare system, particularly in a very dense city like Hong Kong. ED management in Hong Kong has long been challenged by high patient demand, manpower shortages, and unbalanced ED staff utilization. These problems, if not properly addressed, will have a negative impact on patient experience as well as staff morale. With the support of the ED of a main public hospital in Hong Kong, we have studied the patient arrival pattern and developed a forecasting and simulation model that lets ED managers evaluate current ED performance and suggests alternative ED operations for the ever-changing ED environment and patient demand.

### Sensitivity analysis with unmeasured confounding

### Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Cornfield’s inequalities and extensions to unmeasured confounding in observational studies

- Part II: Cornfield’s inequalities and Rosenbaum’s design sensitivity in causal inference with observational studies

### Randomization inference for treatment effect variation

### Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Decomposing treatment effect variation in randomized experiments with and without noncompliance

- Part II: Randomization-based tests for idiosyncratic treatment effect variation

### Regional poverty measurement in a border region

### Ralf Münnich (Universität Trier)

Statistical output from National Statistical Institutes is generally produced using the national data available. For regional policies, however, not only information on the subregion itself but also information on its surroundings may be important. In general, this is handled using spatial modelling. For regions at national borders, the situation becomes more difficult since neighbouring information from other countries is, in general, not available. The presentation focuses on regional poverty measurement using the at-risk-of-poverty rate (ARPR) in the federal states of Rhineland-Palatinate and Saarland at the LAU1 level. Based on an area-level small area model, several approaches to handling the border statistics problem will be presented and discussed.

### A multilevel Heckman model to investigate financial assets among old people in Europe

### Omar Paccagnella (Department of Statistical Sciences, University of Padua)

Heckman sample selection models are often applied when the variable of interest is observed or recorded only if a (selection) condition applies. This may occur because of unit or item (survey) non-response, or because a specific product or condition is owned by a subsample of units and the unobservable components affecting inclusion in the subsample are correlated with unobservable factors influencing the variable of interest. The latter case is well represented by the amount invested in financial and/or real assets: it can be observed and studied only if the unit has included that product in its own portfolio. Household portfolios have been widely investigated, particularly among older people. Indeed, as the earnings of older people typically consist of pension income, consumption in later life can be supported by spending down financial or real assets. Therefore, wealth is a key measure of individual socio-economic status in an ageing society. On the one hand, this topic may be addressed by investigating the ownership patterns of financial or real assets and/or the amounts invested in them. On the other hand, the interest may be focused on cross-country comparisons of the ownership and amounts of the assets. This paper aims at shedding more light on the behaviour of households across Europe through a joint study of their ownership patterns and the amounts invested in some financial assets (i.e. savings for short-term or long-term investments) among the older population. To this aim, in order to take into account the hierarchical nature of the data (households nested into countries) and the features of the variable of interest (the amount invested in a financial product is observed only if the household owns that asset), a multilevel Heckman model is the suggested methodological solution for the analysis. This model is applied to data from SHARE (Survey of Health, Ageing and Retirement in Europe), an international survey on ageing that collects detailed information on the socio-economic status of the older European population.

### Women’s mid-life ageing in low and middle income countries

### Tiziana Leone (London School of Economics)

To date there is little evidence on the health needs and the ageing process of women aged 45-65 in Low and Middle Income Countries (LMICs), in particular when compared to other age groups as well as to men. This is particularly significant given that this is the period when menopause occurs and the cumulative effects of multiple births or birth injuries can cause health problems across women’s life course. The aim of the project is to analyse inequalities across socio-economic groups in the ageing process of women between the ages of 45 and 65 and to compare it to that of men within and between countries. Using data from wave 1 of SAGE in 4 countries (Ghana, Mexico, Russia and India) as well as longitudinal data from the Indonesian Longitudinal Family Survey, this study looks at the age pattern of physical and mental decline, considering objective measures of health such as grip strength, cognitive functions and walking speed as well as chronic diseases. Ultimately the study aims to understand what policy implications there are for the pace of ageing in LMICs. Results show a clear pattern of deterioration of health in the middle age group, significantly different from that of men, who show a more linear pattern. This pattern is particularly prominent when we look at physical health rather than mental health. The study highlights the importance of shedding more light on this field, as women have health care needs beyond their reproductive life which are potentially neglected.


### On Latent Change Model Choice in Longitudinal Studies

### Tenko Raykov (Michigan State University)

This talk is concerned with two helpful aids in the process of choosing between models of change in repeated measure investigations in the behavioral and social sciences: (i) interval estimates of proportions of explained variance in longitudinally followed variables, and (ii) individual case residuals associated with these variables. The discussed method allows obtaining confidence intervals for the R-squared indices of repeatedly administered measures, as well as subject-specific discrepancies between model predictions and raw data on the observed variables. In addition to facilitating evaluation of local model fit, the approach is useful for the purpose of differentiating between plausible models stipulating different patterns of change over time. This feature of the described method becomes particularly helpful in empirical situations characterized by (very) large samples and high statistical power, which are becoming increasingly frequent in complex sample design studies in the behavioral, health, and social sciences. The approach is similarly applicable in cross-sectional investigations, as well as with general structural equation models, and extends the set of means available to substantive researchers and methodologists for model fit evaluation beyond the traditionally used overall goodness of fit indexes. The discussed method is illustrated using data from a nationally representative study of older adults.


### Equivalence relations for ordinary differential equations

### Mirco Tribastone (IMT - Lucca)

Ordinary differential equations (ODEs) are the primary means of modeling systems in a wide range of natural and engineering sciences. When the inherent complexity of the system under consideration is high, the number of equations required seriously limits our capability of performing effective analyses. This has motivated a large body of research, across many disciplines, into abstraction techniques that provide smaller ODE systems preserving the original dynamics in some appropriate sense. In this talk I will review some recent results that look at this problem from a computer-science perspective. I will present behavioral equivalences for nonlinear ODEs based on the established notion of bisimulation. These induce a quotienting of the set of variables of the original system such that in a self-consistent reduced ODE system each macro-variable represents the dynamics of an equivalence class. For a rather general class of nonlinear ODEs, computing the largest equivalences can be done using a symbolic partition-refinement algorithm that exploits an encoding into a satisfiability (modulo theories) problem. For ODEs with polynomial derivatives of degree at most two, covering the important cases of affine systems and chemical reaction networks, the partition refinement is based on Paige and Tarjan’s seminal solution to the coarsest refinement problem. This gives an efficient algorithm that runs in O(m n log n) time, where m is the number of monomials and n is the number of ODE variables. I will present numerical evidence using ERODE (http://sysma.imtlucca.it/tools/erode/), an ongoing project that implements such reduction algorithms, showing that our bisimulations can be effective in realistic models from chemistry, systems biology, electrical engineering, and structural mechanics. This is joint work with Luca Cardelli, Max Tschaikowski, and Andrea Vandin.
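
To make the idea of quotienting ODE variables concrete, the following is a minimal sketch for the simplest case only: a linear system x' = A x, where each macro-variable is the sum of the variables in its block. It uses a naive fixed-point refinement, not the Paige-Tarjan or SMT-based algorithms mentioned in the abstract, and it is not ERODE's implementation. The function name `coarsest_forward_lumping` and the example matrices are hypothetical.

```python
def coarsest_forward_lumping(A, initial_partition):
    """Naive partition refinement for the linear ODE system x' = A x.

    A[i][j] is the coefficient of x_j in the equation for x_i.
    Two variables stay in the same block only if they contribute the same
    total amount to every block, so that the sums over blocks obey a
    self-consistent reduced ODE system.
    Illustrative sketch: exact arithmetic (ints or Fractions) is assumed.
    """
    partition = [list(block) for block in initial_partition]
    while True:
        def signature(j):
            # Total contribution of x_j to the derivative of each block sum.
            return tuple(sum(A[i][j] for i in block) for block in partition)

        refined = []
        for block in partition:
            groups = {}
            for j in block:
                groups.setdefault(signature(j), []).append(j)
            refined.extend(groups.values())

        if len(refined) == len(partition):  # no block was split: fixed point
            return refined
        partition = refined


# x0' = -2 x0 + x1 + x2,  x1' = x0 - x1,  x2' = x0 - x2
A = [[-2, 1, 1],
     [1, -1, 0],
     [1, 0, -1]]
print(coarsest_forward_lumping(A, [[0], [1, 2]]))  # [[0], [1, 2]]
```

In this example x1 and x2 can be merged: with y0 = x0 and y1 = x1 + x2 the reduced system is y0' = -2 y0 + y1 and y1' = 2 y0 - y1. If the last equation is changed to x2' = x0 - 3 x2, the block {x1, x2} is split.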


### The role of active monitoring in reducing the execution times of public works. A Regression Discontinuity approach

### Marco Mariani (IRPET)

The economic literature on the procurement of public works usually pays great attention to the stages of auctions and contract formation, much less to the later stages of contract enforcement where public buyers are expected to combat delays and cost escalations by monitoring executors. The opportunity to appraise the role of monitoring is offered to us by an Italian regional case study, where a local law obliges the regional government to assist smaller buyers in performing tighter-than-usual monitoring on projects that surpass a certain threshold in terms of financial size and receive co-financing by the regional government above a certain level. In order to estimate the causal effect of this higher level of monitoring on the time-to-completion of public works we resort to a sharp regression discontinuity approach, made unusual by the presence of multiple forcing variables and transposed into a discrete-time survival analysis setting that allows for non-proportional hazards. Cross-validation procedures for bandwidth selection, as well as the usual robustness and sensitivity checks of RDDs, are extended to a setting with multiple forcing variables. Results of local estimations show that tighter-than-usual monitoring speeds up either those projects that would not have lasted long anyway or, more interestingly, very persistent projects. (Joint work with G.F. Gori and P. Lattarulo, IRPET)


### The role of stage at diagnosis in colorectal cancer racial/ethnic survival disparities: a causal inference perspective

### Linda Valeri (Harvard Medical School)

Disparities in colorectal cancer survival and stage at diagnosis between White and Black patients are widely documented, whereby Black patients are more likely to be diagnosed at an advanced stage and have a poorer prognosis. Interest lies in understanding the importance of stage at diagnosis in explaining survival disparities. To this aim we propose to quantify the extent to which racial/ethnic survival disparities would be reduced if disparities in stage at diagnosis were eliminated. In particular, we develop a causal inference approach to assess the impact of a hypothetical shift in the distribution of stage at diagnosis in the Black population to match that of the White population. We further develop sensitivity analysis techniques to assess the robustness of our results to the violation of the no-unmeasured confounding assumption and to selection bias due to stage at diagnosis missing not-at-random. Our results support the hypothesis that elimination of disparities in stage at diagnosis would contribute to the reduction in racial survival disparities in colorectal cancer. Important heterogeneities across the patients’ characteristics were observed and our approach easily accommodates these features. This work illustrates how a causal inference perspective aids in identifying and formalizing relevant hypotheses in health disparities research that can inform policy decisions.


### Some Perspectives about Generalized Linear Modeling

### Alan Agresti (University of Florida)

This talk discusses several topics pertaining to generalized linear modeling. With focus on categorical data, the topics include (1) bias in using ordinary linear models with ordinal categorical response data, (2) interpreting effects with nonlinear link functions, (3) cautions in using Wald inference (tests and confidence intervals) when effects are large or near the boundary of the parameter space, (4) the behavior and choice of residuals for GLMs, (5) an improved way to use the generalized estimating equations (GEE) method for marginal modeling of a multinomial response, and (6) modeling nonnegative zero-inflated responses. I will present few new research results, but these topics got my attention while I was writing the book "Foundations of Linear and Generalized Linear Models", recently published by Wiley.


### Optimal Tests of Treatment Effects for the Overall Population and Two Subpopulations in Randomized Trials, using Sparse Linear Programming

### Michael Rosenblum (Johns Hopkins Bloomberg School of Public Health)

We propose new, optimal methods for analyzing randomized trials, when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population versus power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.


### Probabilistic Model Checking

### Arpit Sharma (DiSIA)

Model checking is an automated verification method guaranteeing that a mathematical model of a system satisfies a formally described property. It can be used to assess both qualitative and quantitative properties of complex software and hardware systems. This seminar is going to present the basic idea behind the verification of discrete-time Markov chains (DTMCs) against linear temporal logic (LTL) specifications. We are also going to discuss some challenging directions for future research.
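
As a rough illustration of what a quantitative check on a DTMC involves, the sketch below computes the probability of eventually reaching a goal state, which is the simplest LTL-style property ("eventually goal"). General LTL checking builds a product of the chain with an automaton for the formula and then reduces to a reachability computation of this kind; that construction is not shown. The chain, the state numbering, and the function name `reachability_probabilities` are hypothetical, and plain fixed-point iteration is used instead of solving the linear system exactly.

```python
def reachability_probabilities(P, goal, fail, iterations=200):
    """Probability of eventually reaching a goal state from each state.

    P[s][t] is the transition probability from state s to state t.
    `goal` and `fail` are sets of absorbing states; `fail` must contain
    every state from which the goal cannot be reached.
    Illustrative sketch using fixed-point (value) iteration.
    """
    n = len(P)
    x = [1.0 if s in goal else 0.0 for s in range(n)]
    for _ in range(iterations):
        new_x = []
        for s in range(n):
            if s in goal:
                new_x.append(1.0)
            elif s in fail:
                new_x.append(0.0)
            else:
                # One-step expectation of the current estimates.
                new_x.append(sum(P[s][t] * x[t] for t in range(n)))
        x = new_x
    return x


# State 0: start, state 1: goal, state 2: retry, state 3: failure.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
probs = reachability_probabilities(P, goal={1}, fail={3})
print(probs[0])  # close to 2/3
```

The value for state 0 solves x0 = 0.5 + 0.5 x2 with x2 = 0.5 x0, which gives 2/3; a property such as "the goal is reached with probability at least 0.6" would therefore hold in this toy chain.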


### Modelling systemic risk in financial markets

### Andrea Ugolini (DiSIA)

Three recent crises (the dot-com bubble and the subprime and European sovereign debt crises) have revealed the complex dynamics underpinning the global financial system and how rapidly risk is propagated across markets. Investors, regulators and researchers are thus keen to develop accurate measures of risk transmission between assets and markets. From the investors’ perspective of guaranteeing efficient portfolio diversification, quantifying the risk of contagion is essential given their ongoing interest in changes in market linkages. From the regulators’ point of view, assessing risk spillover is important in order to focus attention on the maintenance and development of financial regulatory and institutional rules such as circuit breakers, transaction taxes and short-sale rules. In fact, recognizing important shortcomings in financial supervision, the European Commission and European Central Bank (ECB) created the European Systemic Risk Board (ESRB) at the end of 2010 with the goal of monitoring, at the macro-prudential level, the European financial system and preventing and mitigating any propagation risk within the financial system. The literature contains many definitions of systemic risk. De Bandt and Hartmann (2000) defined it “as the risk of experiencing systemic events in the strong sense” where “strong sense” signifies the spread of news about an institution that has an adverse impact on one or more healthy institutions in a sequential manner. Billio et al. (2012) explained that “systemic risk can be realized as a series of correlated defaults among financial institutions, occurring over a short time span and triggering a withdrawal of liquidity and widespread loss of confidence in the financial system as a whole”. The above brief list of definitions points to the intricacy of the topic and the challenge faced by investors, regulators and researchers in attempting to measure the complexity and dynamics of systemic risk.
Using the conditional Value-at-Risk (CoVaR) systemic risk measure (Adrian and Brunnermeier, 2011; Girardi and Ergün, 2013), we quantified systemic risk as the impact of the risky situation of a particular financial institution, market or system on the value-at-risk (VaR) of other financial institutions, markets or systems. We introduced a novel copula and vine copula approach to computing the CoVaR value, given that copulas are flexible models of joint distributions and are particularly useful for characterizing the tail behaviour that provides such crucial information for the CoVaR.
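
To illustrate how a copula can deliver a CoVaR value, here is a minimal sketch under strong assumptions that are not taken from the abstract: a bivariate Clayton copula with parameter `theta`, a standard normal marginal for the affected return, and CoVaR defined as the value c with P(X_j ≤ c | X_i ≤ VaR_alpha of i) = beta. Under that definition c solves C(F_j(c), alpha) = alpha · beta, which has a closed form for the Clayton family. The function names and parameter values are hypothetical.

```python
from statistics import NormalDist


def clayton(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (lower tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def covar_clayton(alpha, beta, theta, marginal=NormalDist()):
    """CoVaR of X_j given that X_i is at or below its alpha-level VaR.

    Solves C(u, alpha) = alpha * beta for u = F_j(CoVaR), then inverts the
    (assumed) marginal distribution of X_j. Illustrative sketch only.
    """
    u = ((alpha * beta) ** -theta - alpha ** -theta + 1.0) ** (-1.0 / theta)
    return marginal.inv_cdf(u)


alpha, beta, theta = 0.05, 0.05, 2.0
covar = covar_clayton(alpha, beta, theta)
var = NormalDist().inv_cdf(beta)  # unconditional VaR of X_j at level beta
print(covar, var)
```

With positive lower-tail dependence the CoVaR lies below the unconditional VaR, reflecting the extra downside risk for one market when the other is in distress. Other copula families generally require a numerical root search for u.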


### Role of scientists to defend the integrity of science and public health policy

### Kathleen Ruff (International Joint Policy Committee of the Societies of Epidemiology, IJPC-SE)

When the scientific evidence is distorted so as to serve vested interests and to endanger public health, what responsibility do scientists and scientific organisations bear? Those most at risk of harm from hazardous substances are frequently those segments of the population who lack economic and political influence and thus have little ability to demand the necessary health protections. Scientists have the credibility to demand evidence-based public health policy that protects the right to health of all citizens.


### Risk governance and Health Impact Assessment (HIA) procedures in Italy

### Giancarlo Sturloni (International School for Advanced Studies, SISSA, Trieste)

Health Impact Assessment (HIA) is an important tool to defend local communities from health and environmental risks. Nevertheless, to keep its promise HIA should be carried out within an open, well-informed, inclusive and dialogical approach. Risk governance could help HIA by supporting dialogue between stakeholders and open access to scientific information, so as to promote participatory decision-making processes.


### Official statistics, social media and big data

### Stefano Iacus (Università degli Studi di Milano)

In this seminar we will present some recent developments in the analysis of big data from social media. We will also try to show the possible interactions between the analysis of social data and traditional survey techniques, highlighting the limits and promise of some experiments. As examples we will show two recent applications: #iHappy, the Twitter-Happyness index (http://www.blogsvoices.unimi.it), and #WNI, the Wired Next Index (http://index.wired.it), an indicator that tries to capture the country's capacity for recovery. In particular, #WNI combines the results of official statistics on the economy and well-being with those of three "social" indicators of personal, economic and political confidence, and appears to be a good tool for nowcasting what it sets out to measure.


### Quantile regression coefficients modeling

### Paolo Frumento (Karolinska Institutet - Unit of Biostatistics)

Estimating conditional quantiles is of interest in many research areas, and quantile regression is foremost among the utilized methods. The coefficients of a quantile regression model depend on the order of the quantile being estimated. We present an approach to modeling the regression coefficients as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony and efficiency, and may expand the potential of statistical modeling. We describe a possible application of this method and its implementation in the qrcm R package.
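
As a hedged illustration of what "coefficients as parametric functions of the order of the quantile" can look like, consider a single covariate x and coefficients that are linear in the order p. The abstract does not state the parametric form, so the linear choice and the symbols θ below are assumptions made only for this example:

$$
Q(p \mid x) = \beta_0(p) + \beta_1(p)\,x, \qquad \beta_j(p) = \theta_{j0} + \theta_{j1}\,p, \quad j = 0, 1, \quad p \in (0, 1).
$$

Under this assumed form, four parameters (θ00, θ01, θ10, θ11) describe the whole family of conditional quantiles, instead of a separate pair of coefficients for every order p of interest, which is where the gain in parsimony would come from.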


### Theory and Applications of Proper Scoring Rules

### Monica Musio (Università degli Studi di Cagliari)

Suppose you have to quote a probability distribution Q for an unknown quantity X. When the value x of X is eventually observed, you will be penalised by an amount S(x;Q). The function S is a proper scoring rule if, when you believe X has distribution P, your expected penalty S(P;Q) is minimised by the honest quote Q = P. Proper scoring rules can be used, as above, to motivate you to assess your true uncertainty honestly, as well as to measure the quality of your past probability forecasts in the light of the actual outcomes. They also have many other statistical applications. We will discuss some characterisations, properties and specialisations of proper scoring rules, and describe some of their uses, including robust estimation and Bayesian model selection.
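
The definition above can be checked numerically in the simplest setting, a binary X with P(X = 1) = p. The sketch below compares two well-known proper rules (the logarithmic and Brier scores) with an improper one (absolute error) by searching a grid of quotes q for the one that minimises the expected penalty. The function names and the grid search are illustrative choices, not part of the talk.

```python
import math


def expected_score(p, q, score):
    """Expected penalty S(P;Q) for binary X with P(X=1)=p and quote q."""
    return p * score(1, q) + (1 - p) * score(0, q)


def log_score(x, q):
    """Logarithmic score: minus the log of the probability given to x."""
    return -math.log(q if x == 1 else 1 - q)


def brier_score(x, q):
    """Brier (quadratic) score."""
    return (x - q) ** 2


def absolute_score(x, q):
    """Absolute error: an example of a rule that is NOT proper."""
    return abs(x - q)


def best_quote(p, score, grid=999):
    """Quote q on a regular grid in (0, 1) minimising the expected penalty."""
    candidates = [k / (grid + 1) for k in range(1, grid + 1)]
    return min(candidates, key=lambda q: expected_score(p, q, score))


p = 0.7
print(best_quote(p, log_score))       # 0.7: honesty is optimal
print(best_quote(p, brier_score))     # 0.7: honesty is optimal
print(best_quote(p, absolute_score))  # 0.999: pushed to the extreme
```

Under the absolute-error rule the expected penalty is p + q(1 - 2p), which is linear in q, so a forecaster who believes p = 0.7 does best by overstating their confidence; this is exactly the behaviour a proper scoring rule rules out.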


### Methods for Scalable and Robust High Dimensional Graphical Model Selection

### Bala Rajaratnam (Stanford University)

Learning high dimensional graphical models is a topic of contemporary interest. A popular approach is to use L1 regularization methods to induce sparsity in the inverse covariance estimator, leading to sparse partial covariance/correlation graphs. Such approaches can be grouped into two classes: (1) regularized likelihood methods and (2) regularized regression-based, or pseudo-likelihood, methods. Regression based methods have the distinct advantage that they do not explicitly assume Gaussianity. One gap in the area is that none of the popular methods proposed for solving regression based objective functions have provable convergence guarantees. Hence it is not clear if resulting estimators actually yield correct partial correlation/partial covariance graphs. To this end, we propose a new regression based graphical model selection method that is both tractable and has provable convergence guarantees. In addition we also demonstrate that our approach yields estimators that have good large sample properties. The methodology is illustrated on both real and simulated data. We also present a novel unifying framework that places various pseudo-likelihood graphical model selection methods as special cases of a more general formulation, leading to important insights. (Joint work with S. Oh and K. Khare)


### Synergy, suppression and immorality

### Joe Whittaker (Lancaster University)

Give a background on suppression in regression that begins with Horst (1941), and has generated a plethora of different types. Introduce forward differences of the entropy and define synergy in terms of explained information. Characterise and generalise suppression in terms of synergy. Specialise this result to correlation matrices. Relate this to immorality via conditional synergy. Give an empirical example and some small examples from graphical models. Make some concluding remarks.


### Threshold free estimation of functional antibody titers for a group B Streptococcus opsonophagocytic killing assay

### Luca Moraschini (Università Milano Bicocca and Novartis Vaccines Siena)

Opsonophagocytic killing assays (OPKA) are routinely used for the quantification of bactericidal antibodies against Gram-positive bacteria in clinical trial samples. The OPKA readout, the titer, is traditionally estimated using non-linear dose-response regressions as the highest serum dilution yielding a predefined threshold level of bacterial killing. Therefore, these titers depend on a specific killing threshold value and on a specific dose-response model. This talk describes a novel OPKA titer definition, the threshold-free titer, which preserves biological interpretability whilst not depending on any killing threshold. First, a model-free version of this titer is briefly presented and shown to be more precise than the traditional threshold-based titers when using simulated and experimental group B Streptococcus (GBS) OPKA data. Second, a model-based threshold-free titer is introduced to automatically take into account the potential saturation of the OPKA killing curve. The posterior distributions of threshold-based and threshold-free titers are derived for each analysed sample using importance sampling embedded within a Markov chain Monte Carlo sampler of the coefficients of a 4PL logistic dose-response model. The posterior precision of threshold-free titers is again shown to be higher than that of threshold-based titers. The biological interpretability and operational characteristics demonstrated here indicate that threshold-free titers can substantially improve the routine analysis of OPKA experimental and clinical data.


### A partial survey on model selection

### Yosef Rinott (Hebrew University & LUISS)

I will discuss some model selection methods, old and relatively new, such as AIC, BIC, Lasso and more. I will explain their goals and 'philosophy', and some of their properties. Time permitting I will discuss my recent work in this area.


### Nonparametric prediction of internet advertising conversion rates

### Charles Taylor (University of Leeds)

In 2013 Google revenue exceeded 55 billion dollars, most of which came from advertising, and most of this from AdWords. Depending on the search term, companies will bid to have their site placed at the top of the list. Google then receive the bid price if the user clicks on an Ad link, but the company only benefits if they make an (extra) sale. Using data on conversion examples, this talk will explore a nonparametric approach to predicting success, when the data are in the form of short text strings. Different distance measures will be examined, as will a method for selecting a smoothing parameter.


### Validating protein structure using kernel density estimates

### Charles Taylor (University of Leeds)

Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.


### A Spatially Nonstationary Fay-Herriot Model for Small Area Estimation

### Hukum Chandra (Indian Agricultural Statistics Research Institute, New Delhi, India)

Small area estimates based on the widely-used area-level model proposed in Fay and Herriot (1979) assume that the area level direct estimates are spatially uncorrelated. In many cases, however, the underlying individual level data are spatially correlated. We propose an extension to the Fay-Herriot model that accounts for the presence of spatial nonstationarity in the area level data. We refer to the predictor based on this extended model as the nonstationary empirical best linear unbiased predictor (NSEBLUP). We also develop two different estimators for the mean squared error of the NSEBLUP. The first estimator uses approximations similar to those in Opsomer et al. (2008). The second estimator is based on the parametric bootstrapping approach of Gonzalez-Manteiga et al. (2008) and Molina et al. (2009). Results from model-based and design-based simulation studies using spatially nonstationary data indicate that the NSEBLUP compares favourably with alternative area-level predictors that ignore this spatial nonstationarity. In addition, both proposed methods of mean squared error estimation for the NSEBLUP seem to perform adequately.


### Finite sample properties of estimators of dose-response functions based on the generalised propensity score

### Michela Bia (CEPS/INSTEAD, Luxembourg)

In this seminar we propose three semiparametric estimators of the dose-response function based on kernel and spline techniques. In many observational studies treatment may not be binary or categorical. In such cases, one may be interested in estimating the dose-response function in a setting with a continuous treatment. This approach strongly relies on the unconfoundedness assumption, which requires that the potential outcomes are independent of the treatment conditional on a set of covariates. In this context the generalized propensity score can be used to estimate dose-response functions (DRF) and marginal treatment effect functions. We evaluate the performance of the proposed estimators using Monte Carlo simulation methods. We also apply our approach to the problem of evaluating a job training program for disadvantaged youth in the United States (the Job Corps program). In this regard, we provide new evidence on the effectiveness of the intervention by uncovering heterogeneities in the effects of Job Corps training across different lengths of exposure.


### Multidimensional Inequality Measures on Finite Partial Orders

### Marco Fattore (Università di Milano-Bicocca)

Multidimensional inequality indexes with ordinal variables are an open issue in socio-economic statistics. The issue is relevant, particularly in connection with material deprivation, well-being and quality-of-life studies, that often involve ordinal variables. While the axiomatic theory of multidimensional inequality indexes is well-developed for the cardinal case, it is almost unexplored when multidimensional ordinal information is involved. The few attempts that can be found in the literature do not attain fully satisfactory results yet. In this seminar, we outline a completely new approach to inequality measurement, based on partial order theory. We first show how a general theory of inequality measures can be developed for finite partial orders. We then restrict the approach to product orders and show how this leads to a theory of multidimensional ordinal inequality measures. In addition, we discuss how the important problem of attribute decomposition may be addressed drawing on particular decompositions of Hasse diagrams of product orders.


### BIG DATA: economics, professions, visualization methods

### Silvano Cacciari (Università di Firenze)

Big data are simply sets of data collected in such a complex way that they differ greatly from traditional data. The point is that we are not talking only about software and hardware, but about a raw material estimated to be worth 8 per cent of European GDP in the 2020s. It is yet another genuine revolution (in information technology, economics, technology and cognition) whose real consequences still need to be studied. The seminar aims to address three themes whose outlines continually intersect in the big data phenomenon:

- The economics of big data: how business and macroeconomic models are changing.
- The professions of big data: interdisciplinary training and professionalization, affinities and differences.
- Ways of visualizing big data: visual anthropology and decision-making in economics and in training paths.

The outlines of this seminar are preparatory to teaching, research and project design.


### On Enhancing Plausibility of MAR in Incomplete Data Analyses via Evaluating Response-Auxiliary Variable Correlations

### Tenko Raykov (Michigan State University)

A procedure for evaluating candidate auxiliary variable correlations with response variables in incomplete data sets is outlined. The method provides point and interval estimates of the outcome-residual correlations with potentially useful auxiliaries, and of the bivariate correlations of outcome(s) with the latter variables. Auxiliary variables found in this way can enhance considerably the plausibility of the popular missing at random (MAR) assumption if included in ensuing maximum likelihood analyses, or can alternatively be incorporated in imputation models for subsequent multiple imputation analyses. The approach can be particularly helpful in empirical settings where violations of the MAR assumption are suspected, as is the case in many longitudinal studies, and is illustrated with data from cognitive aging research.


### Penalising model component complexity: A principled practical approach to constructing priors

### Håvard Rue (Department of Mathematical Sciences, Norwegian University of Science and Technology)

Selecting appropriate prior distributions for parameters in a statistical model is the Achilles heel of Bayesian statistics. Although the prior distribution should ideally encode the user's prior knowledge about the parameters, this level of knowledge transfer seems to be unattainable in practice; often standard priors are used without much thought and with the implicit hope that the obtained results are not too prior sensitive. Despite the development of so-called objective priors, which are only available (due to mathematical issues) for a few selected and highly restricted model classes, the applied statistician has in practice few guidelines to follow when choosing the priors. An easy way out of this dilemma is to re-use prior choices of others, with an appropriate reference, to avoid further questions about this issue. In a Bayesian software system like R-INLA, where models are built by adding up various model components to construct a linear predictor, we are facing a real and practical challenge. Default priors must be set for the parameters of a large number of model components, and these priors are supposed to work well in a large number of scenarios and situations. None of the (limited) guidelines in the literature seem to be able to approach such a task. In this paper we introduce a new concept for constructing prior distributions where we make use of the natural nested structure inherent to many model components. This nested structure defines a model component as a flexible extension of a base model, which allows us to define proper priors which penalise the complexity induced by deviating from the natural base model. Based on this observation, we can compute the prior distribution after the input of a user-defined (weak) scale-parameter for that model component.
These priors are invariant to reparameterisations, have a natural connection to Jeffreys priors, are designed to support Occam’s razor and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions. We will illustrate our approach on a series of examples using the R-INLA package for doing fast approximate Bayesian inference for the class of latent Gaussian models. The Student-t case will be discussed in detail as it is the simplest non-trivial example. Then we will discuss other cases, like classical unstructured random effect models, spline smoothing and disease mapping. This is joint work with Thiago G. Martins, Daniel P. Simpson, Andrea Riebler (NTNU) and Sigrunn H. Sørbye (Univ. of Tromsø).


### Providers profiling via Bayesian nonparametrics: a model based approach for assessing performances in public health

### Francesca Ieva (Dipartimento di Matematica "Federigo Enriques", Università degli studi)

In this seminar, an application of Bayesian nonparametric modeling of the random effects in a hierarchical generalized linear mixed model will be discussed. The goal of the modeling strategy is twofold: first, it aims at performing a model-based clustering of the grouping factor (e.g., hospital of admission of patients, who are the statistical units for the application of interest); second, the interest lies in outcome prediction at patient level, properly accounting for the grouping factor effect. This approach is very general and may be applied in different contexts like school performance evaluation, among others. In the case of interest, a Bayesian semiparametric mixed effects model is presented for the analysis of strongly unbalanced binary survival data coming from a clinical survey on STEMI (ST segment Elevation Myocardial Infarction), where statistical units (i.e., patients) are grouped by hospital of admission. The idea is to exploit the flexibility and potential of such models for carrying out model-based clustering of the random effects in order to profile hospitals according to their effect on patients' outcomes. In line with the twofold aim mentioned above, we provide methods for clustering providers with a similar effect on patients’ outcomes, enabling the evaluation of providers’ performances with respect to clinical gold standards, and for making predictions on outcomes at patient level. Concerning the first issue, the flexibility of Bayesian nonparametric models (Dirichlet process, Dependent Dirichlet process) suits well the overdispersed nature of the data. The in-built clustering they provide is exploited to point out groups among random effect estimates.
This is achieved by minimizing suitable loss functions on misclassification errors through linear integer programming, assigning different weights to wrong association of hospitals belonging to different clusters as well as to missed association of hospitals whose effects are similar, within the random partition. Concerning the second issue, we propose a new method for classifying patients, starting from Bayesian posterior credibility interval estimates of survival probabilities. In terms of predictive power, it provides better results than other criteria usually adopted in the literature, which are based on pointwise estimates. The methods have been applied to a real dataset from the STEMI Archive, a clinical survey on patients affected by STEMI and admitted to hospitals in Regione Lombardia.


### Amenable mortality in the European Union: toward better indicators for the effectiveness of health systems (AMIEHS)

### Rasmus Hoffmann (Erasmus Medical Center Rotterdam)

Objectives: There is a renewed interest in health system indicators. The indicator “avoidable/amenable mortality” is based on the concept that deaths from certain causes should not occur in the presence of timely and effective healthcare. In the AMIEHS project, we introduce a new approach to the selection of indicators of amenable mortality by analyzing the association between innovations in medical care and mortality trends. Although the contribution of medical care to health has been studied extensively in clinical settings, much less is known about its contribution to population health. We examine how innovations in the management of four circulatory disorders, five cancers, HIV, peptic ulcer and renal failure have influenced trends in cause-specific mortality at the population level. We also created an interactive online mortality atlas presenting trends in mortality for 45 possible amenable causes of death in 30 European countries over the period 2001-2009. Methods: Based on predefined selection criteria and a broad review of the literature on the effectiveness of medical interventions, a first set of 14 potential indicators of amenable mortality (causes of death) was selected. The timing of the introduction of medical innovations was established through reviews and questionnaires sent to national experts from seven participating European countries. The preselected indicators were then validated by a Delphi procedure. We combined data on the timing of these innovations and cause-specific mortality trends (1970–2005) from seven European countries. We used Joinpoint models based on linear spline regression to identify associations between the introduction of innovations and favourable changes in mortality. Results: For most conditions, the Delphi panel could not reach consensus on the role of current mortality levels as measures of effectiveness of healthcare.
For both ischaemic heart disease and cerebrovascular disease, the timing of medical innovations was associated with improved mortality. This suggests that innovation has impacted positively on mortality at the population level. For hypertension and heart failure, such associations could not be identified. For none of the five specific cancers could sufficient evidence for an association be found. The strongest association was found between the introduction of antiretroviral treatment and HIV mortality, and no association could be found for renal failure and peptic ulcer. Conclusions: AMIEHS offers a rigorous new approach to the concept of amenable mortality that includes empirical validation. Only validated indicators can be successfully used to assess the quality of healthcare systems in international comparisons. Although improvements in cause-specific mortality coincide with the introduction of some innovations, this is not invariably true. This is likely to reflect the incremental effects of many interventions, the time taken for them to be adopted fully, the presence of contemporaneous changes in the incidence of disease or risk factors, diffusion and improved quality of interventions. Improvements in healthcare probably lowered mortality from many of the conditions that we studied but occurred in a much more diffuse way than we assumed in the study design. A better quantification of the contribution of healthcare to mortality would require adequate data on timing of innovation and trends in diffusion. Given the gaps in knowledge, between-country differences in levels of mortality from amenable conditions should not be used for routine surveillance of healthcare performance. The timing and pace of mortality decline from amenable conditions may provide better indicators of healthcare performance.
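As a rough illustration of the linear spline regression behind the Joinpoint models mentioned in the Methods, the sketch below fits a single joinpoint by least squares over a grid of candidate years. It is a minimal sketch under stated assumptions, not the AMIEHS analysis: the function names, the grid search and the synthetic mortality series are all hypothetical, and choosing the number of joinpoints is not covered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_one_joinpoint(times, rates, candidate_knots):
    """Fit rate = b0 + b1*t + b2*max(t - knot, 0) for each candidate knot.

    Returns (rss, knot, [b0, b1, b2]) for the knot with the smallest
    residual sum of squares. Candidate knots must lie strictly inside
    the observed time range, otherwise the design matrix is singular.
    """
    best = None
    for knot in candidate_knots:
        design = [[1.0, float(t), max(float(t - knot), 0.0)] for t in times]
        # Normal equations (X'X) beta = X'y
        xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
               for i in range(3)]
        xty = [sum(row[i] * y for row, y in zip(design, rates))
               for i in range(3)]
        coef = solve_linear_system(xtx, xty)
        rss = sum((y - sum(c * v for c, v in zip(coef, row))) ** 2
                  for row, y in zip(design, rates))
        if best is None or rss < best[0]:
            best = (rss, knot, coef)
    return best


# Synthetic series (made up for illustration): years since the start of
# follow-up, with the decline steepening 20 years in.
times = list(range(36))
rates = [100.0 - 0.5 * t - 2.5 * max(t - 20, 0) for t in times]
rss, knot, coef = fit_one_joinpoint(times, rates, range(3, 33))
print(knot, [round(c, 3) for c in coef])
```

The coefficient on the hinge term is the change in slope at the joinpoint, so a negative value that coincides with the year an innovation was introduced is the kind of favourable change in trend the abstract refers to.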


### Local regression for circular data

### Agnese Panzera (DISIA)

Local regression for circular data is a topic that has been developed only recently. We present methodological advances focusing on the kernel method, when the predictor and/or the response are circular. A number of applications are possible, from nonparametric estimation of trend in circular time series, to quantile estimation of circular distributions.
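To make the kernel idea concrete, here is a minimal local-constant sketch for the two basic cases, a circular predictor with a linear response and a linear predictor with a circular response. The von Mises-type weight, the Gaussian weight, the function names and the default smoothing parameters are assumptions chosen for illustration and are not taken from the talk.

```python
import math


def von_mises_weight(theta, center, kappa):
    """Unnormalised von Mises kernel weight on the circle.

    kappa is a concentration parameter: larger values mean less smoothing.
    """
    return math.exp(kappa * (math.cos(theta - center) - 1.0))


def circular_local_mean(theta0, thetas, ys, kappa=5.0):
    """Local-constant estimate of E[Y | Theta = theta0], circular predictor."""
    weights = [von_mises_weight(t, theta0, kappa) for t in thetas]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def circular_local_direction(x0, xs, thetas, bandwidth=0.5):
    """Local-constant estimate of the mean direction of a circular response.

    Weighted sines and cosines are averaged and mapped back to an angle
    with atan2, which avoids averaging angles across the +/- pi cut.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    s = sum(w * math.sin(t) for w, t in zip(weights, thetas))
    c = sum(w * math.cos(t) for w, t in zip(weights, thetas))
    return math.atan2(s, c)


# Circular predictor: angles on a regular grid, response cos(theta).
grid = [2.0 * math.pi * k / 24 for k in range(24)]
response = [math.cos(t) for t in grid]
print(circular_local_mean(0.0, grid, response))

# Circular response: two angles on either side of the +/- pi cut.
print(circular_local_direction(0.0, [0.0, 0.0], [math.pi - 0.1, -math.pi + 0.1]))
```

Because the weights depend on the predictor only through the cosine of an angular difference, the first estimator is periodic by construction, which is the property an ordinary kernel on the real line would lack.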


### An Integrative Bayesian Modeling Approach to Imaging Genetics

### Francesco Stingo (University of Texas, MD Anderson Cancer Center)

In this paper we present a Bayesian hierarchical modeling approach for imaging genetics, where the interest lies in linking brain connectivity across multiple individuals to their genetic information. We have available data from a functional magnetic resonance imaging (fMRI) study on schizophrenia. Our goals are to identify brain regions of interest (ROIs) with discriminating activation patterns between schizophrenic patients and healthy controls, and to relate the ROIs’ activations to available genetic information from single nucleotide polymorphisms (SNPs) on the subjects. For this task we develop a hierarchical mixture model that includes several innovative characteristics: it incorporates the selection of ROIs that discriminate the subjects into separate groups; it allows the mixture components to depend on selected covariates; it includes prior models that capture structural dependencies among the ROIs. Applied to the schizophrenia data set, the model leads to the simultaneous selection of a set of discriminatory ROIs and the relevant SNPs, together with the reconstruction of the correlation structure of the selected regions. To the best of our knowledge, our work represents the first attempt at a rigorous modeling strategy for imaging genetics data that incorporates all such features. This is joint work with Michele Guindani (MD Anderson Cancer Center), Marina Vannucci (Rice University) and Vince D. Calhoun (University of New Mexico).


### Modeling Multivariate, Overdispersed Binomial Data with Additive and Multiplicative Random Effects

### Emanuele Del Fava (I-BioStat, Hasselt University, Diepenbeek, Belgium & DONDENA Centre for Research on Social Dynam)

When modeling multivariate binomial data, it is often necessary to take into consideration both clustering and overdispersion, the former arising from the dependence between data, and the latter from additional variability in the data not prescribed by the distribution. If interest lies in accommodating both phenomena at the same time, we can use separate sets of random effects that capture the within-cluster association and the extra variability. In particular, the random effects for overdispersion can be included in the model either additively or multiplicatively. For this purpose, we propose a series of Bayesian hierarchical models that deal simultaneously with both phenomena. The proposed models are applied to bivariate repeated prevalence data for hepatitis C virus (HCV) and human immunodeficiency virus (HIV) infection in injecting drug users in Italy from 1998 to 2007.
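One common way to write the two options down is sketched here for a single binomial outcome; the notation, the normal and beta distributional choices, and the indices (cluster $i$, occasion $j$) are assumptions made for illustration and may differ from the models presented in the talk.

```latex
% Shared structure: binomial counts with a cluster-level random effect b_i
Y_{ij} \mid \pi_{ij} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij}), \qquad
b_i \sim N(0, \sigma_b^2)

% Additive overdispersion: an extra normal effect on the logit scale
\operatorname{logit}(\pi_{ij}) = x_{ij}^{\top}\beta + b_i + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

% Multiplicative overdispersion: a beta effect scaling the probability
\pi_{ij} = \theta_{ij}\,
  \frac{\exp(x_{ij}^{\top}\beta + b_i)}{1 + \exp(x_{ij}^{\top}\beta + b_i)}, \qquad
\theta_{ij} \sim \mathrm{Beta}(a, c)
```

In both versions $b_i$ captures the within-cluster association, while $\varepsilon_{ij}$ or $\theta_{ij}$ absorbs the variability beyond what the binomial distribution prescribes.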


### Model-free prediction intervals for regression and autoregression

### Dimitris Politis (University of California)

The bootstrap is an invaluable tool for prediction intervals without having to assume normality of the data. However, even when all other model assumptions are correctly specified, bootstrap prediction intervals for regression (and autoregression) are well known to be plagued by undercoverage; this is true even in the simplest case of linear regression. Furthermore, it can be the case that model assumptions are violated, in which case any model-based inference will be invalid. In this talk, the problem of statistical prediction is revisited with a view that goes beyond the typical parametric/nonparametric dilemmas in order to reach a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals. The “Model-Free (MF) Prediction Principle” of Politis (2007) is based on the notion of transforming a given set-up into one that is easier to work with, namely i.i.d. or Gaussian. The two important applications are regression and autoregression, whether an additive parametric/nonparametric model is applicable or not.
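For readers unfamiliar with the baseline being criticised, the sketch below builds a model-based residual-bootstrap prediction interval for simple linear regression. It illustrates the conventional approach only, not the Model-Free Prediction Principle; the function names, the defaults and the synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_prediction_interval(xs, ys, x_new, level=0.9, n_boot=2000, seed=0):
    """Residual-bootstrap prediction interval for a new response at x_new.

    Returns (lower, point_prediction, upper).
    """
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    point = a + b * x_new
    errors = []
    for _ in range(n_boot):
        # Regenerate the sample from the fitted model and refit.
        ys_star = [a + b * x + rng.choice(residuals) for x in xs]
        a_star, b_star = fit_line(xs, ys_star)
        # Simulated future observation minus the bootstrap prediction.
        y_future = point + rng.choice(residuals)
        errors.append(y_future - (a_star + b_star * x_new))
    errors.sort()
    lower = errors[int((1.0 - level) / 2.0 * n_boot)]
    upper = errors[int((1.0 + level) / 2.0 * n_boot) - 1]
    return point + lower, point, point + upper


# Synthetic data, made up for illustration.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + noise.gauss(0.0, 1.0) for x in xs]
print(bootstrap_prediction_interval(xs, ys, x_new=10.0))
```

One well-documented source of undercoverage in this scheme is that fitted residuals tend to be smaller in magnitude than the true errors, so resampling them understates the spread of a future observation.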


### Explorations on Wellbeing from Chinese Background

### Zhanjung Xing (Shandong University)

Since the mid-1980s, researchers in mainland China have explored quality of life (QOL) at the macro level. In the new century, and especially since 2006, QOL research related to public policy in China has entered a new stage. The well-being index has gradually become a focus of attention for academia, the media and policy makers. This talk will give a brief introduction to QOL research related to public policy in mainland China, and a detailed introduction to some studies on wellbeing conducted by the Centre of QOL and Public Policy at Shandong University. Some problems in this research field will also be discussed.


### Dynamic Factorial Analysis

### Isabella Corazziari (ISTAT)

Dynamic Factorial Analysis, first proposed in the ‘70s by Coppi and Zannella, aims to analyse three-way arrays of data in which one of the three ways is intrinsically ordered. For example, the array can be units x variables x times. First, the four models proposed in the article by Coppi and Zannella will be described, followed by some methodological developments (Corazziari 1999, PhD thesis), improved in further works by Coppi, Blanco and Corazziari. Some considerations about the best strategies of analysis to improve the interpretative power of the method will be addressed (Coppi, Blanco, Corazziari). A brief demonstration of the software implemented in xlisp by Corazziari will be given, both in the basic version and in the improved-strategy version.


### A Hierarchical Bayesian Modeling Approach for Inferring and Identifying Relevant Copy Numbers

### Alberto Cassese (Rice University)

Statistical models have been successfully developed in recent years for the analysis of single-platform high-throughput data, but only a few methods integrate data from different platforms. In this paper we present a Bayesian hierarchical modeling approach for genetical genomic data. We look, in particular, at array CGH data, measuring DNA copy number changes, and DNA microarrays, measuring gene expression as mRNA abundance. Our interest lies in finding sets of CGH probes that possibly affect the expression of one or more genes and in inferring their copy number states. Our proposed modeling strategy starts with the formulation of a hierarchical model, integrating gene expression levels with genetic data, that includes measurement errors and mixture priors for variable selection. We couple this model with a hidden Markov model on the genetic covariates. Our approach utilizes prior distributions that cleverly incorporate dependencies among selected covariates. It also incorporates stochastic search variable selection techniques within an inferential scheme that allows us to select associations among genomic and genetic variables while simultaneously inferring the copy number states. We show the performance of our proposed model on simulated data. We also analyze a publicly available data set.


### Gini's Mean Difference offers a response to Leamer's critique

### Shlomo Yitzhaki (Dept of Economics, Hebrew University)

Gini's mean difference has decomposition properties that nest the decomposition of the variance as a special case. By using it one may reveal the implicit assumptions imposed on the data by using the variance. I argue that some of those implicit assumptions can be traced to be the causes of Leamer's critique. By requiring the econometrician to report whether those assumptions are violated by the data, we may be able to offer a response to Leamer's critique. This will reduce the possibility of supplying "empirical proofs" which in turn may increase the trust in econometric research.
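For reference, Gini's mean difference is the average absolute difference between two observations drawn from the same distribution. The sketch below computes the sample version in two equivalent ways and sets it beside the variance, which is half the average squared pairwise difference. It only illustrates the definition and its kinship with the variance, not the decomposition discussed in the talk; the function names and the data are hypothetical.

```python
from itertools import combinations


def gini_mean_difference(xs):
    """Average of |x_i - x_j| over all distinct pairs."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def gini_mean_difference_sorted(xs):
    """Same quantity from order statistics, in O(n log n) time."""
    n = len(xs)
    ordered = sorted(xs)
    total = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return 2.0 * total / (n * (n - 1))


def half_mean_squared_difference(xs):
    """Half the average of (x_i - x_j)^2 over all distinct pairs.

    This equals the unbiased sample variance.
    """
    pairs = list(combinations(xs, 2))
    return 0.5 * sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def sample_variance(xs):
    """Unbiased sample variance with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [1.0, 2.0, 4.0]
print(gini_mean_difference(data), gini_mean_difference_sorted(data))
print(half_mean_squared_difference(data), sample_variance(data))
```

Both measures summarise pairwise differences; the variance squares them, which weights large gaps more heavily, while Gini's mean difference takes absolute values.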
