# DiSIA Seminars

## Abstract

18/04/2018 at 12.00

### TBA

### TBA

TBA

22/02/2018 at 12.00

### Job Instability and Fertility during the Economic Recession: EU Countries

### Isabella Giorgetti (Dep. of Economics and Social Sciences, Università Politecnica delle Marche)

Trends in the decline of the total fertility rate (TFR) have varied widely across EU countries. Exploiting individual data from the longitudinal EU-SILC dataset (2005-2013), this study investigates the cross-country effect of job instability on a couple’s choice to have one (more) child. I build a job instability measure for both partners from the first-order lag of each partner’s own economic activity status in the labour market (holding a temporary contract, holding a permanent contract, or being unemployed). To account for unobserved heterogeneity and the potential presence of endogeneity, I estimate, under sequential moment restrictions, a Two-Stage Least Squares (2SLS) model in first differences. I then group European countries according to six different welfare regimes and estimate the heterogeneous effects of labour market instability on childbearing in a comparative framework. The principal result is that the cross-country average effect of job instability on couples’ fertility decisions is not statistically significant, owing to the large country-specific fixed effects. When welfare regimes are distinguished, institutional settings and active social policies reveal varying family behaviour in fertility. In low-fertility countries, however, it is confirmed that the impact of parents’ successful labour market integration might be ambiguous, possibly because of the scarcity of childcare options and/or cultural norms.

15/02/2018 at 11.30

### Data Science and Our Environment

### Francesca Dominici (Harvard T.H. Chan School of Public Health)

What if I told you I had evidence of a serious threat to American national security: a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. This is the question before us today, but the threat doesn’t come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population aged 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation’s most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.

17/01/2018 at 15.30

### Presentation seminar: main lines of research and results obtained

### Francesca Giambona (DiSIA)

The seminar will present the main lines of research and the main topics analysed and investigated over the speaker's scientific career. In particular, it will describe the main statistical methodologies used to analyse the phenomena under study and present the most important empirical results obtained. Finally, the research topics currently under study will be briefly introduced.

17/01/2018 at 14.30

### Latent variable modelling for dependent observations: some developments

### M. Francesca Marino (DiSIA)

When dealing with dependent data, standard statistical methods cannot be directly used for the analysis as they may produce biased results and lead to misleading inferential conclusions. In this framework, latent variables are frequently used as a tool for capturing dependence and describing association. During the seminar, some developments in the context of latent variable modelling will be presented. Three main areas of research will be covered, namely quantile regression, social network analysis, and small area estimation. The main results will be presented and some hints on current research and future developments will be given.

12/01/2018 at 12.00

### SAM Based Analysis of the Impact of VAT Rate Cut on the Economic System of China after the Tax Reform

### Ma Kewei (Shanxi University of Finance and Economics)

In May 2016 China completed its tax reform, moving from a bifurcated system based on business tax (BT) and value added tax (VAT) to an entirely VAT-based system. The effects of this reform are still under analysis, regarding both the alleviation of the tax burden on the industry and service sectors and the economic improvements. Currently, there seems to be common agreement that, except for some minor cases that merit further investigation, these effects are positive in both respects. It is interesting to analyze the impact that a reduction of the VAT rate on some key industries would have under a VAT-based regime. To our knowledge, no analysis in this direction has been performed so far. In this paper, we try to fill this gap and analyze the effect of VAT rate cuts in selected industries on the whole economic system, using an impact multiplier model based on a purpose-built 2015 Social Accounting Matrix (SAM) for China, which also allows a preliminary analysis of the structure of China’s economic system. (Joint work with Guido Ferrari and Zichuan Mi)

29/11/2017 at 12:00

### Probabilistic Distance Algorithm: some recent developments in data clustering

### Francesco Palumbo (Università di Napoli Federico II)

Distance-based clustering methods, which are not model-based, optimize a global criterion based on the distance among clusters. The most widely known distance-based method is k-means clustering (MacQueen, 1967), and several extensions of this method have recently been proposed (Vichi and Kiers, 2001; Rocci et al., 2011; Timmerman et al., 2013). These extensions overcome issues arising from the correlation between variables. However, k-means clustering, and extensions thereof, can fail when clusters do not have a spherical shape and/or some extreme points are present, which tend to affect the group means. Iyigun (2007) and Ben-Israel and Iyigun (2008) propose a non-hierarchical distance-based clustering method, called probabilistic distance (PD) clustering, that overcomes these issues. Tortora et al. (2016) propose a factor version of the method to deal with high-dimensional data. In PD-clustering, the number of clusters K is assumed to be known a priori; reviews of how to choose K are available in the literature. Given some random centres, the probability of any point belonging to a cluster is assumed to be inversely proportional to the distance from the centre of that cluster (Iyigun, 2007). The aim of the seminar is to illustrate the PD algorithm and some recent applications in the data clustering framework from both the model-based (Rainey et al., 2017) and the non-model-based perspective.
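
The inverse-proportionality assumption can be illustrated with a small sketch. It assumes Euclidean distance and fixed centres, and it computes only the membership probabilities of one point; the function name `pd_membership` and the sample coordinates are illustrative and not taken from the cited papers, and the centre-update step of the full algorithm is not shown.

```python
import math

def pd_membership(point, centres):
    """Cluster membership probabilities inversely proportional to distance."""
    dists = [math.dist(point, c) for c in centres]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A point lying exactly on a centre belongs to that centre's cluster.
        return [z / sum(zeros) for z in zeros]
    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    # Normalise so that the probabilities sum to one.
    return [w / total for w in inverse]

# Hypothetical example: two centres on a line, one point between them.
centres = [(0.0, 0.0), (4.0, 0.0)]
probabilities = pd_membership((1.0, 0.0), centres)
```

Here the point is three times closer to the first centre than to the second, so it is assigned to the first cluster with three times the probability.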

15/11/2017 at 12.00

### Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data

### Davide Vidotto (Tilburg University)

Multiple imputation of multilevel data (i.e., data collected from different groups) requires not only taking correlations among variables into account, but also considering possible dependencies between units coming from the same group. While a number of imputation models have been proposed in the literature for continuous data, existing methods for multilevel categorical data, such as the JOMO imputation method, still have limitations. For instance, JOMO only considers pairwise relationships between variables, and uses default priors that can affect the quality of the imputations in case of small sample sizes. With the present work, we propose using Multilevel Latent Class models to perform multiple imputation of missing multilevel categorical data. The model is flexible enough to retrieve original (complex) associations of the variables at hand while respecting the data hierarchy. The model is implemented under a Bayesian framework and estimated via Gibbs sampling, a natural choice for multiple imputation applications. After formally introducing the model, we will show the results of a simulation study in which model performance is assessed and compared with the listwise deletion and JOMO methods. Results indicate that the Bayesian Multilevel Latent Class model is able to recover unbiased and efficient parameter estimates of the analysis model considered in our study.

11/07/2017 at 15.00 - Room 303, Plesso Didattico Morgagni

### Resilient and Secure Cyber Physical Systems: Matching the Present and the Future!

### Andrea Bondavalli (DiMAI)

A cyber-physical system (CPS) is a system in which computational elements interact closely with physical entities through sensors and actuators, thereby controlling individual, organizational or mechanical processes by means of information and communication technologies (computers, software and networks). Such systems are typically automated, intelligent and collaborative, and many of them require high levels of resilience and security to ensure that they survive random anomalies, deliberate attacks and, more generally, unforeseen critical events.
The seminar will be organized in two parts:
- The first part will present the main characteristics of cyber-physical systems, focusing in particular on aspects related to their resilience and security, and will present concrete examples of CPS in various application domains, from the Internet of Things (IoT) to Systems of Systems and Industry 4.0.
- The second part will present the new curriculum in Resilient and Secure Cyber Physical Systems of the Corso di Laurea Magistrale in Informatica (Master's degree in Computer Science), explaining its distinctive features, learning objectives and career opportunities.

Attendance is free and open to all, but registration is required via the following form:
https://docs.google.com/forms/d/e/1FAIpQLScrONY0ixHxA190hIk04g1VQjd_4btdRbXtY1GkuGQGSWQSdg/viewform

21/06/2017 at 11.30

### Stronger Instruments and Refined Covariate Balance in an Observational Study of the Effectiveness of Prompt Admission to the ICU

### Luke Keele (Georgetown University)

Instrumental Variable (IV) methods, subject to appropriate identification assumptions, allow for consistent estimation of causal effects in observational data in the presence of unobserved confounding. Near-far matching has been proposed as one analytic method to improve inference by strengthening the effect of the instrument on the exposure and balancing observable characteristics between groups of subjects with low and high values of the instrument. However, in settings with hierarchical data (e.g. patients nested within hospitals), or where several covariate interactions must be balanced, conventional near-far matching algorithms may fail to achieve the requisite covariate balance. We develop a new matching algorithm that combines near-far matching with refined covariate balance in order to balance large numbers of nominal covariates while also strengthening the IV. This extension of near-far matching is motivated by a UK case study that aims to identify the causal effect of prompt admission to the Intensive Care Unit on 7-day and 28-day mortality.

16/06/2017 at 11.30

### Assessing the Efficacy of Intrapartum Antibiotic Prophylaxis for Prevention of Early-Onset Group B Streptococcal Disease through Propensity Score Design

### Elizabeth R. Zell (Centers for Disease Control and Prevention)

Observational data can assist in answering hard questions about disease prevention. Early-onset neonatal group B streptococcal disease (EOGBS) can be prevented by intrapartum antibiotic prophylaxis (IAP). Clinical trials demonstrated the efficacy of beta-lactam agents for a narrow population of women. Questions remain about the effectiveness of antibiotic durations of less than 4 hours and about agents appropriate for penicillin-allergic women. We applied propensity score design methods to sample survey data on EOGBS cases and over 7000 non-cases from ten US states in 2003-2004, to match infants exposed to IAP with infants who were not exposed. Antibiotic efficacy was estimated for different antibiotic classes and durations before delivery. Our analysis supports the recommendation that beta-lactam intrapartum prophylaxis of at least four hours before delivery remains the primary treatment; prophylaxis of less than four hours and clindamycin prophylaxis are not as effective in preventing EOGBS.

15/06/2017 at 15.30 - Room 13

### Spatial Chaining of Price Indexes to Improve International Comparisons of Prices and Real Incomes

### D.S. Prasada Rao (School of Economics, The University of Queensland, Brisbane, Australia)

The International Comparisons Program (ICP) compares the purchasing power of currencies and real income of almost all countries in the world. An ICP multilateral comparison uses as building blocks bilateral comparisons between all possible pairs of countries. These are then combined to obtain the overall global comparison. One problem with this approach is that some of the bilateral comparisons are typically of lower quality, and their inclusion therefore undermines the integrity of the multilateral comparison. Formulating multilateral comparisons as a graph theory problem, we show how quality can be improved by replacing bilateral comparisons with their shortest path spatially chained equivalents. We consider a number of different ways in which this can be done, and illustrate these methods using data from the 2011 round of ICP. We then propose criteria for comparing the performance of competing multilateral methods, and using these criteria demonstrate how spatial chaining improves the quality of the overall global comparison.
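
The shortest-path idea can be sketched as follows. This is a minimal illustration, not the speakers' method: it assumes each bilateral comparison carries a non-negative unreliability weight (how such weights are measured is not specified in the abstract) and uses the Floyd-Warshall algorithm to replace each bilateral index with the product of the indexes along the lowest-weight path. All names and numbers are hypothetical.

```python
def chain_indexes(index, weight):
    """Chain bilateral price indexes along minimum-weight paths.

    index[i][j]  : bilateral price index of country j relative to country i
    weight[i][j] : non-negative unreliability of that comparison (0 on the diagonal)
    """
    n = len(index)
    cost = [row[:] for row in weight]
    chained = [row[:] for row in index]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    # Going through country k is more reliable: chain the two links.
                    cost[i][j] = via_k
                    chained[i][j] = chained[i][k] * chained[k][j]
    return chained

# Hypothetical data for three countries A, B, C.
# The direct A-C comparison (3.3) is assumed to be of low quality (weight 5).
index = [
    [1.0, 2.0, 3.3],
    [0.5, 1.0, 1.5],
    [1 / 3.3, 1 / 1.5, 1.0],
]
weight = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
chained = chain_indexes(index, weight)
```

In this toy case the direct A-C index is replaced by the chained value obtained through B (2.0 times 1.5), while the reliable A-B and B-C comparisons are left unchanged.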

15/06/2017 at 14.30 - Room 101

### Netflix and Deep Learning

### P. Crescenzi (Università di Firenze)

A popular-science seminar on some "hot" topics in computer science:
- How, among tens of thousands of films, someone can suggest what we should watch, and get it right. In other words, the magic of recommender systems.
- How the functioning of the neurons of the human brain has been studied in order to simulate their behaviour with neural networks, and how these networks, once trained, can behave intelligently. In other words, the magic of artificial intelligence.

Attendance is free and open to all, but registration is required via the following form: https://docs.google.com/forms/d/e/1FAIpQLSd8O2B_cNeNy3zMCKB6nHUDjxZ9OVx55e0H9VFdpmVi6p9MXA/viewform

09/06/2017 at 11.30

### Exact P-values for Network Interference

### Guido Imbens (Stanford)

We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment. (with Susan Athey and Dean Eckles)

27/04/2017 at 14.30

### Introducing AWMA and its activities to promote mathematics among African women

### Kifle Yirgalem Tsegaye (Department of Mathematics, Addis Ababa University, Ethiopia)

Archeological findings and their interpretations persuade us to think that Africa is not only the cradle of humankind but, along with ancient Mesopotamia and others, also a birthplace of mathematical and technological ideas. It is natural to ask: what happened to civilization, mathematics and technological sciences in contemporary Africa? That is a million-euro question, though this talk is about what African women in mathematics are doing to change, or at least to improve, their realities. Just like women around the world, African women have long been denied access to important components of development such as education, in particular mathematics and technological sciences, through impositions and restrictions of the "these are strictly meant for men" kind. Many had no choice but to believe this and to collaborate in making only "good wives" or "attractive women" out of themselves. Things are changing in this regard, but not strongly enough to liberate African women from the stereotypical depiction of themselves and let them cope in the world of science. It is believed that role models in every profession are important to convince youngsters that it is possible for them to be whoever they want. Having strong nation-wide, continent-wide or world-wide networks of women in the fields of mathematics and technological sciences enables us to destroy such stereotypes and open the gate wide enough for our girls to dance with all sciences, most of all in the fields of mathematics and technological sciences. The talk is a brief introduction to:
- the African Women in Mathematics Association (AWMA) and its activities;
- Ethiopian girls in the fields of Science, Mathematics, Engineering and Technology.

29/03/2017 at 14.30

### Issues in extreme risks: theoretical models and financial-actuarial instruments

### Marcello Galeotti (University of Florence)

1. A dynamic definition of economic risk. Risk measures: VaR and Expected Shortfall. The problem of computing risk measures.
2. Extreme value theory. Light- and heavy-tailed distributions. Hazard rate. Fluctuations of sums and maxima. The mean excess. Generalized Pareto distribution.
3. Innovative financial instruments for managing environmental risks. Project options and catastrophe bonds. Interactive dynamics: the possibility of "virtuous" outcomes and of sub-optimal equilibria. A case study: flood risk from the river Arno in the area and city of Florence.
4. An evolutionary model for health risks. Defensive medicine, health insurance, legal actions. An evolutionary game model. The role of insurance premiums. Asymptotic behaviours depending on the premium calculation principles.
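
As an illustration of the risk measures named in point 1, the sketch below computes an empirical VaR and Expected Shortfall from a sample of losses. It assumes one common convention (VaR as the empirical quantile at level `alpha`, Expected Shortfall as the mean of the losses beyond it); other conventions exist, and the function name and data are hypothetical.

```python
import math

def var_es(losses, alpha):
    """Empirical Value at Risk and Expected Shortfall at level alpha (0 < alpha <= 1)."""
    ordered = sorted(losses)
    n = len(ordered)
    k = max(1, math.ceil(alpha * n))  # number of observations at or below the VaR
    var = ordered[k - 1]              # empirical alpha-quantile of the losses
    tail = ordered[k:]                # the worst n - k losses, beyond the VaR position
    # If no observation lies beyond the VaR, fall back to the VaR itself.
    es = sum(tail) / len(tail) if tail else var
    return var, es

# Hypothetical losses 1, 2, ..., 100 at the 95% level.
var_95, es_95 = var_es(range(1, 101), 0.95)
```

With these toy losses the 95% VaR is the 95th smallest loss and the Expected Shortfall is the average of the five largest, which shows why Expected Shortfall is never smaller than VaR under this convention.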

23/03/2017 at 14.45

### Endogenous Significance Levels in Finance and Economics

### Alessandro Palandri (DiSIA - University of Florence)

This paper argues that rational agents who do not know the model's parameters and have to estimate them will treat the significance level as a choice variable. Calculating the costs associated with errors of Types I and II, rational agents will choose significance levels that maximize expected utility. The misalignment between the investigators' standard statistical significance levels and those that are optimal for agents has profound implications for the empirical tests of the model itself. Specifically, empirical studies could reject models on the basis of statistically significant mispricings when in fact the models are true: the mispricings are not significant for the agents, so there is no expected profitable intervention and no force to bring the system back to equilibrium.

23/03/2017 at 14.00

### Bayesian methods in Biostatistics

### Francesco Stingo (University of Florence)

In this talk I will review some Bayesian methods for biomedical applications that I have developed in recent years. These methods can be classified into four research areas: 1) graphical models for complex biological networks, 2) hierarchical models for data integration (integromics and imaging genetics), 3) power-prior approaches for personalized medicine, 4) change-point models for early cancer detection.

09/03/2017 at 14.30

### Validity of case-finding algorithms for diseases in database and multi-database studies

### Rosa Gini (Agenzia Regionale di Sanità della Toscana, Firenze)

In database studies in pharmacoepidemiology and health services research, variables that identify a disease are derived from existing data sources by means of data processing. The ‘true’ variables should be conceptualized as unobserved quantities, and the study variables entering the actual analysis as measurements, resulting from case-finding algorithms (CFA) applied to the original data. The validity of a CFA is the difference between its result and the true variable. The science of estimating the validity of CFAs is in development. Stemming from the methodology of validation of diagnostic algorithms, it nevertheless has specific hurdles and opportunities. In this talk we will introduce the concept of *component CFA*. We will show the results from a large validation study of CFAs for type 2 diabetes, hypertension, and ischaemic heart disease in Italian administrative databases, using primary care medical records as a gold standard, and propose more research to generalize and apply its results. We will then show how the component analysis may support the estimation of validity of CFAs in multi-database, multi-national studies.

09/02/2017 at 15.15 - (Aula Anfiteatro, viale Morgagni 65)

### Introduction to Riordan graphs

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

There are many reasons to define new classes of graphs. More generally, considering the n × n symmetric Riordan matrix modulo 2, we define the Riordan graph RG(n) of order n. In this talk, we study some basic properties of Riordan graphs such as the number of edges, the degree sequence, the matching number, the clique number, the independence number and so on. Moreover, several examples of Riordan graphs are given. They include Pascal graphs, Catalan graphs, Fibonacci graphs and many others.

09/02/2017 at 14.00 - (Aula Anfiteatro, viale Morgagni 65)

### Pascal Graphs and related open problems

### Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

In 1983, Deo and Quinn introduced Pascal graphs in their search for a class of graphs with certain desired properties to be used as computer networks. One of the desired properties is that the design be simple and recursive so that when a new node is added, the entire network does not have to be reconfigured. Another property is that one central vertex be adjacent to all others. The third requirement is that there exist several paths between each pair of vertices (for reliability) and that some of these paths be of short length (to reduce communication delays). Finally, the graphs should have good cohesion and connectivity. A Pascal matrix PM(n) of order n is defined to be an n × n symmetric binary matrix where the main diagonal entries are all 0's and the lower triangular part of the matrix consists of the first n-1 rows of Pascal's triangle modulo 2. The graph corresponding to the adjacency matrix PM(n) is called the Pascal graph of order n.
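
The definition of PM(n) translates directly into code. The following is a minimal sketch, assuming 0-based indices so that the entry in row i and column j below the diagonal is the binomial coefficient C(i-1, j) modulo 2; the function name `pascal_matrix` is illustrative.

```python
from math import comb

def pascal_matrix(n):
    """Adjacency matrix PM(n) of the Pascal graph of order n."""
    pm = [[0] * n for _ in range(n)]  # the main diagonal stays 0
    for i in range(1, n):
        for j in range(i):
            # Row i below the diagonal holds row i-1 of Pascal's triangle mod 2;
            # mirroring the entry keeps the matrix symmetric.
            pm[i][j] = pm[j][i] = comb(i - 1, j) % 2
    return pm

pm4 = pascal_matrix(4)
```

Since C(i-1, 0) is always 1, the first vertex is adjacent to every other vertex, which matches the central-vertex requirement stated in the abstract.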

26/01/2017 at 14.00

### Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model-Averaged Causal Effects.

### Corwin M. Zigler (Harvard University)

Causal inference with observational data frequently relies on the notion of the propensity score (PS) to adjust treatment comparisons for observed confounding factors. As comparative effectiveness research in the era of "big data" increasingly relies on large and complex collections of administrative resources, researchers are frequently confronted with decisions regarding which of a high-dimensional covariate set to include in the PS model in order to satisfy the assumptions necessary for estimating average causal effects. Typically, simple or ad-hoc methods are employed to arrive at a single PS model, without acknowledging the uncertainty associated with the model selection. We propose Bayesian methods for PS variable selection and model averaging that 1) select relevant variables from a set of candidate variables to include in the PS model and 2) estimate causal treatment effects as weighted averages of estimates under different PS models. The associated weight for each PS model reflects the data-driven support for that model’s ability to adjust for the necessary variables. We illustrate features of our proposed approaches with a simulation study, and ultimately use our methods to compare the effectiveness of treatments for brain tumors among Medicare beneficiaries.

26/01/2017 at 11.00

### Bayesian Effect Estimation Accounting for Adjustment Uncertainty

### Corwin M. Zigler (Harvard University)

Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence, BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC with dependence can estimate the exposure effect with smaller bias than traditional BMA, and improved coverage. We compare BAC with other methods, including traditional BMA, in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999–2005. Using each approach, we estimate the short-term effects of PM2.5 on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.

14/11/2016 at 11.00

### Statistical evidence in criminal trials

### Benito Vittorio Frosini (Università Cattolica del Sacro Cuore)

This seminar on statistical evidence in criminal trials is an introduction to the problems usually encountered in the treatment (and in the literature) of so-called forensic statistics, with particular regard to criminal trials; in fact, entirely analogous issues arise in civil trials as well, although they receive less attention in the specialist literature. The rules to be followed in assessing the evidence brought in a trial, and in particular scientific and statistical evidence, are laid down in practically all legal systems; a fairly clear and analytical exposition is the one contained in the Federal Rules of Evidence of the United States, which rightly make no distinction between civil and criminal trials. The generally recommended approach, in the presence of an event E (trial evidence) that can be produced by several causes, to which (prior) probabilities can be assigned, is to resort to Bayes' formula, which transforms the likelihoods (of the type P(E|C)) into the posterior probabilities of the possible causes (of the type P(C|E)). The correctness of this approach implies an entirely negative assessment of the so-called "fallacy of the transposed conditional", which consists in using probabilities of the type P(E|C), when they are very small, in place of the probabilities P(C|E), as has unfortunately happened in several trials. The seminar also discusses the problem of using so-called "naked statistics", and more generally so-called "naked statistical inference", which consists in using, in the particular case discussed in a trial, statistics derived from a more or less large population that contains the case at hand. Relevant examples in this discussion, considered analytically, are the trials concerning Alfred Dreyfus (in France), Sacco and Vanzetti (in the United States), Chedzey (in Australia) and Sally Clark (in the United Kingdom).
Some types of trials (civil and criminal) are also examined in which the Bayesian approach is practically absent from the literature, while so-called significance tests (Fisher-Neyman-Pearson) are usually employed.
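
A small numerical sketch shows why P(E|C) must not be read as P(C|E). The figures are entirely hypothetical and are not drawn from any of the trials mentioned: the evidence E is assumed to occur with probability 1/1000 for an innocent person and with certainty for the culprit, and the prior probability of guilt is assumed to be 1 in 10001.

```python
from fractions import Fraction

def posterior(prior, likelihood_if_cause, likelihood_if_not_cause):
    """Bayes' formula for a cause C versus its complement, given evidence E."""
    numerator = likelihood_if_cause * prior
    return numerator / (numerator + likelihood_if_not_cause * (1 - prior))

# Hypothetical inputs: P(guilty), P(E | guilty), P(E | innocent).
p_guilty_given_e = posterior(Fraction(1, 10001), Fraction(1), Fraction(1, 1000))
```

Although P(E | innocent) is only 1/1000, the posterior probability of guilt under these assumptions is 1/11, far from the 999/1000 that the transposed conditional would suggest.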

10/10/2016 at 14.00

### A Practical Framework for Specification, Analysis and Enforcement of Attribute-based Access Control Policies

### Andrea Margheri (University of Southampton)

Access control systems are a widely used means of protecting computing systems. They may take several forms, use different technologies and involve varying degrees of complexity. In this seminar, we present a fully-implemented Java-based framework for the specification, analysis and enforcement of attribute-based access control policies. The framework rests on FACPL, a formal language with a compact, yet expressive, syntax that permits expressing real-world access control policies. The analysis functionalities exploit a state-of-the-art SMT-based approach and support the automatic verification of many properties of interest. We introduce FACPL and its supporting functionalities by means of a real-world case study from an e-Health application domain.

29/09/2016 at 14.00

### Reasoning about the trade-off between security and performance

### Boris Köpf (IMDEA Madrid)

Today's software systems employ a wide variety of techniques for minimizing the use of resources such as time, memory, and energy. While these techniques are indispensable for achieving competitive performance, they can pose a serious threat to security: By reducing the resource consumption on average (but not in the worst case), they introduce variations that can be exploited by adversaries for recovering private information about users, or even cryptographic keys. In this talk I will give examples of attacks against a number of performance-enhancing features of software and hardware, and I will present ongoing work on techniques for quantifying the resulting threat and for choosing the most cost-effective defense.

13/07/2016 at 14.30

### Sensitivity Analysis for Differences-in-Differences

### Luke Keele (Dept. of Political Science Penn State University)

In the estimation of causal effects with observational data, applied analysts often use the differences-in-differences (DID) method. The method is widely used since the needed before and after comparison of a treated and control group is a common situation in the social sciences. Researchers use this method since it protects against a specific form of unobserved confounding. Here, we develop a set of tools to allow analysts to better utilize the method of DID. First, we articulate the hypothetical experiment that DID seeks to replicate. Next, we develop a form of matching that allows for covariate adjustment under the DID identification strategy and that is consistent with the hypothetical experiment. We also develop a set of confirmatory tests that should hold if DID is a valid identification strategy. Finally, we adapt a well-known method of sensitivity analysis for hidden confounding to the DID method. We develop these sensitivity analysis methods for both binary and continuous outcomes. We show that DID is in a very real sense more sensitive to hidden confounders than a standard selection on observables identification strategy. We then apply our methods to three different empirical examples from the social sciences.
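
For readers unfamiliar with the basic estimator, the before and after comparison of a treated and a control group can be sketched as below. This shows only the plain two-group, two-period DID contrast of means, not the matching, confirmatory tests or sensitivity analysis developed in the talk; the function name and the outcome values are hypothetical.

```python
from statistics import mean

def did_estimate(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical outcomes for two units per group and period.
effect = did_estimate(
    treated_before=[10, 12],
    treated_after=[15, 17],
    control_before=[8, 10],
    control_after=[10, 12],
)
```

Subtracting the control group's change removes any shift common to both groups, which is the specific form of unobserved confounding the method protects against.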

28/06/2016 ore 11.30

### Sharing on recent collaborative research work between MSE of Wuhan University and SEEM of City University of Hong Kong in Remanufacturing System and Healthcare Management.

### Xian-Kia Wang & Kwai-Sang Chin (Wuhan University, China & University of Hong Kong)

(a) Competitive Strategy in Remanufacturing and the Effects of Government Subsidy. This presentation reports the research ideas and initial results of a national research project in China on effective remanufacturing systems and their integrated management. We consider a single-period model comprising an original equipment manufacturer (OEM) who produces only new products and a remanufacturer who collects used products from consumers and produces remanufactured products. The OEM and the remanufacturer compete in the product market. We examine the effects of government subsidy as a means to promote remanufacturing activity, in particular subsidies to the remanufacturer and to consumers respectively. We find that a government subsidy to the remanufacturer and a subsidy to consumers both increase remanufacturing activity, while the subsidy to the remanufacturer gives the better result. This is because the subsidy to the remanufacturer results in a lower price for remanufactured products, thus leading to higher consumer surplus and social welfare.

(b) Emergency Healthcare Operations Management. This presentation reports partial outcomes of a research project entitled “Delivering 21st Century Healthcare in Hong Kong—Building a Quality-and-Efficiency Driven System”, particularly in the Emergency Department of a hospital. The Emergency Department (ED) is a pivotal component of the social healthcare system, particularly in a highly dense city like Hong Kong. ED management in Hong Kong has long been challenged by high patient demand, manpower shortages, and unbalanced ED staff utilization. These problems, if not properly addressed, will have a negative impact on patient experience as well as staff morale. With the support of the ED of a main public hospital in Hong Kong, we have studied the patient arrival pattern and developed a forecasting and simulation model for ED managers to evaluate the current ED performance and suggest alternative ED operations for the ever-changing ED environment and patient demand.

22/06/2016 ore 11.30

### Sensitivity analysis with unmeasured confounding

### Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Cornfield’s inequalities and extensions to unmeasured confounding in observational studies
- Part II: Cornfield’s inequalities and Rosenbaum’s design sensitivity in causal inference with observational studies

07/06/2016 ore 10.30

### Randomization inference for treatment effect variation

### Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Decomposing treatment effect variation in randomized experiments with and without noncompliance
- Part II: Randomization-based tests for idiosyncratic treatment effect variation

18/05/2016 ore 16.30

### Regional poverty measurement in a border region

### Ralf Münnich (Universität Trier)

Statistical output from National Statistical Institutes is generally produced using the national data available. For regional policies, however, information on the subregion alone may not suffice; information on the surrounding areas can matter as well. In general, this is solved using spatial modelling. For regions at national borders, the situation becomes more difficult, since neighbouring information from other countries is generally not available. The presentation focuses on regional poverty measurement using the at-risk-of-poverty rate (ARPR) in the federal states of Rhineland-Palatinate and Saarland at the LAU1 level. Based on an area-level small area model, several approaches to handle the border statistics problem will be presented and discussed.
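As background, the ARPR (at-risk-of-poverty rate) is commonly defined as the share of persons whose equivalised disposable income falls below 60% of the national median. The sketch below computes a direct, unweighted version of this indicator for one region. It ignores survey weights and is not the area-level small area model discussed in the talk; the function name and the income figures are assumptions for illustration.

```python
from statistics import median


def at_risk_of_poverty_rate(region_incomes, national_incomes, threshold_share=0.6):
    """Share of the region's incomes below threshold_share * national median."""
    poverty_line = threshold_share * median(national_incomes)
    below = sum(1 for income in region_incomes if income < poverty_line)
    return below / len(region_incomes)


# Hypothetical equivalised incomes (in thousands of euros).
national = [8, 10, 14, 20, 30, 40, 50]   # median 20, poverty line 12
region = [8, 10, 15, 30]                 # two of four persons below the line
rate = at_risk_of_poverty_rate(region, national)
```

The poverty line depends on the national income distribution, so the regional rate is not a purely local quantity. With few sampled households per area, direct estimates of this kind become unstable, which is what motivates small area models.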

18/05/2016 ore 12.00

### A multilevel Heckman model to investigate financial assets among old people in Europe

### Omar Paccagnella (Department of Statistical Sciences, University of Padua)

Heckman sample selection models are often applied when the variable of interest is observed or recorded only if a (selection) condition applies. This may occur because of unit or item (survey) non-response, or because a specific product or condition is owned by a subsample of units and unobservable components affecting inclusion in the subsample are correlated with unobservable factors influencing the variable of interest. The latter case is well represented by the amount invested in financial and/or real assets: it can be observed and studied only if the unit has included that product in its own portfolio. Household portfolios have been widely investigated, particularly among older people. Indeed, as the earnings of older people typically consist of pension income, consumption in later life can be supported by spending down financial or real assets. Therefore, wealth is a key measure of individual socio-economic status in an ageing society. On the one hand, this topic may be addressed by investigating the ownership patterns of financial or real assets and/or the amounts invested in them. On the other hand, interest may focus on cross-country comparisons of asset ownership and amounts. This paper aims to shed more light on the behaviour of households across Europe through a joint study of their ownership patterns and invested amounts in some financial assets (i.e. savings for short-term or long-term investments) among the older population. To this end, in order to take into account the hierarchical nature of the data (households nested within countries) and the features of the variable of interest (the amount invested in a financial product is observed only if the household owns that asset), a multilevel Heckman model is the suggested methodological solution for the analysis.
Such a model is applied to data from SHARE (Survey of Health, Ageing and Retirement in Europe), an international survey on ageing that collects detailed information on the socio-economic status of the older European population.

26/04/2016 ore 12.00

### Women’s mid-life ageing in low and middle income countries

### Tiziana Leone (London School of Economics)

To date there is little evidence on the health needs and the ageing process of women aged 45-65 in Low and Middle Income Countries (LMICs), in particular when compared with other age groups and with men. This is particularly significant given that this is the period when menopause occurs and the cumulative effects of multiple births or birth injuries can cause health problems across women’s life course. The aim of the project is to analyse inequalities across socio-economic groups in the ageing process of women between the ages of 45 and 65 and to compare it with that of men within and between countries. Using data from wave 1 of SAGE in 4 countries (Ghana, Mexico, Russia and India) as well as longitudinal data from the Indonesian Longitudinal Family Survey, this study looks at the age pattern of physical and mental decline using objective measures of health such as grip strength, cognitive functions and walking speed, as well as chronic diseases. Ultimately the study aims to understand what policy implications there are for the pace of ageing in LMICs. Results show a clear pattern of deterioration of health in the middle age group, significantly different from that of men, who show a more linear pattern. This pattern is particularly prominent when we look at physical health rather than mental health. The study highlights the importance of shedding more light on this field, as women have health care needs beyond their reproductive life which are potentially neglected.

14/04/2016 ore 14.30

### On Latent Change Model Choice in Longitudinal Studies

### Tenko Raykov (Michigan State University)

This talk is concerned with two helpful aids in the process of choosing between models of change in repeated measure investigations in the behavioral and social sciences: (i) interval estimates of proportions of explained variance in longitudinally followed variables, and (ii) individual case residuals associated with these variables. The discussed method allows obtaining confidence intervals for the R-squared indices of repeatedly administered measures, as well as subject-specific discrepancies between model predictions and raw data on the observed variables. In addition to facilitating evaluation of local model fit, the approach is useful for the purpose of differentiating between plausible models stipulating different patterns of change over time. This feature of the described method becomes particularly helpful in empirical situations characterized by (very) large samples and high statistical power, which are becoming increasingly frequent in complex sample design studies in the behavioral, health, and social sciences. The approach is similarly applicable in cross-sectional investigations, as well as with general structural equation models, and extends the set of means available to substantive researchers and methodologists for model fit evaluation beyond the traditionally used overall goodness of fit indexes. The discussed method is illustrated using data from a nationally representative study of older adults.

06/04/2016 ore 15.30

### Equivalence relations for ordinary differential equations

### Mirco Tribastone (IMT - Lucca)

Ordinary differential equations (ODEs) are the primary means of modeling systems in a wide range of natural and engineering sciences. When the inherent complexity of the system under consideration is high, the number of equations required seriously limits our capability of performing effective analyses. This has motivated a large body of research, across many disciplines, into abstraction techniques that provide smaller ODE systems preserving the original dynamics in some appropriate sense. In this talk I will review some recent results that look at this problem from a computer-science perspective. I will present behavioral equivalences for nonlinear ODEs based on the established notion of bisimulation. These induce a quotienting of the set of variables of the original system such that in a self-consistent reduced ODE system each macro-variable represents the dynamics of an equivalence class. For a rather general class of nonlinear ODEs, computing the largest equivalences can be done using a symbolic partition-refinement algorithm that exploits an encoding into a satisfiability (modulo theories) problem. For ODEs with polynomial derivatives of degree at most two, covering the important cases of affine systems and chemical reaction networks, the partition refinement is based on Paige and Tarjan’s seminal solution to the coarsest refinement problem. This gives an efficient algorithm that runs in O(m n log n) time, where m is the number of monomials and n is the number of ODE variables. I will present numerical evidence using ERODE (http://sysma.imtlucca.it/tools/erode/), an ongoing project that implements such reduction algorithms, showing that our bisimulations can be effective in realistic models from chemistry, systems biology, electrical engineering, and structural mechanics. This is joint work with Luca Cardelli, Max Tschaikowski, and Andrea Vandin.

25/02/2016 ore 14.30

### The role of active monitoring in reducing the execution times of public works. A Regression Discontinuity approach

### Marco Mariani (IRPET)

The economic literature on the procurement of public works usually pays great attention to the stages of auctions and contract formation, and much less to the later stages of contract enforcement, where public buyers are expected to combat delays and cost escalations by monitoring executors. The opportunity to appraise the role of monitoring is offered to us by an Italian regional case study, where a local law obliges the regional government to assist smaller buyers in performing tighter-than-usual monitoring on projects that surpass a certain threshold in terms of financial size and receive co-financing by the regional government above a certain level. In order to estimate the causal effect of this higher level of monitoring on the time-to-completion of public works we resort to a sharp regression discontinuity approach, made unusual by the presence of multiple forcing variables and transposed in a discrete-time survival analysis setting that allows for non-proportional hazards. Cross-validation procedures for bandwidth selection, as well as the usual robustness and sensitivity checks of RDDs, are extended to a setting with multiple forcing variables. Results of local estimations show that tighter-than-usual monitoring speeds up either those projects that would not have lasted long anyway or, more interestingly, very persistent projects. (Joint work with G.F. Gori and P. Lattarulo, IRPET)

16/12/2015 ore 14.00

### The role of stage at diagnosis in colorectal cancer racial/ethnic survival disparities: a causal inference perspective

### Linda Valeri (Harvard Medical School)

Disparities in colorectal cancer survival and stage at diagnosis between White and Black patients are widely documented, whereby Black patients are more likely to be diagnosed at an advanced stage and have a poorer prognosis. Interest lies in understanding the importance of stage at diagnosis in explaining survival disparities. To this aim we propose to quantify the extent to which racial/ethnic survival disparities would be reduced if disparities in stage at diagnosis were eliminated. In particular, we develop a causal inference approach to assess the impact of a hypothetical shift in the distribution of stage at diagnosis in the Black population to match that of the White population. We further develop sensitivity analysis techniques to assess the robustness of our results to the violation of the no-unmeasured confounding assumption and to selection bias due to stage at diagnosis missing not-at-random. Our results support the hypothesis that elimination of disparities in stage at diagnosis would contribute to the reduction in racial survival disparities in colorectal cancer. Important heterogeneities across the patients’ characteristics were observed and our approach easily accommodates these features. This work illustrates how a causal inference perspective aids in identifying and formalizing relevant hypotheses in health disparities research that can inform policy decisions.

07/10/2015 ore 14.30

### Some Perspectives about Generalized Linear Modeling

### Alan Agresti (University of Florida)

This talk discusses several topics pertaining to generalized linear modeling. With focus on categorical data, the topics include (1) bias in using ordinary linear models with ordinal categorical response data, (2) interpreting effects with nonlinear link functions, (3) cautions in using Wald inference (tests and confidence intervals) when effects are large or near the boundary of the parameter space, (4) the behavior and choice of residuals for GLMs, (5) an improved way to use the generalized estimating equations (GEE) method for marginal modeling of a multinomial response, and (6) modeling nonnegative zero-inflated responses. I will present few new research results, but these topics got my attention while I was writing the book “Foundations of Linear and Generalized Linear Models,” recently published by Wiley.

27/07/2015 ore 12.00

### Optimal Tests of Treatment Effects for the Overall Population and Two Subpopulations in Randomized Trials, using Sparse Linear Programming

### Michael Rosenblum (Johns Hopkins Bloomberg School of Public Health)

We propose new, optimal methods for analyzing randomized trials, when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population versus power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.

29/06/2015 ore 12.00

### Probabilistic Model Checking

### Arpit Sharma (DiSIA)

Model checking is an automated verification method guaranteeing that a mathematical model of a system satisfies a formally described property. It can be used to assess both qualitative and quantitative properties of complex software and hardware systems. This seminar is going to present the basic idea behind the verification of discrete-time Markov chains (DTMCs) against linear temporal logic specifications (LTL). We are also going to discuss some challenging directions for future research.
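To give a flavour of a quantitative property on a DTMC, the sketch below computes the probability of eventually reaching a goal state, which is the simplest special case of checking an LTL formula ("eventually goal"). General LTL checking involves further steps that are not shown here. The function name, the fixed iteration count and the four-state chain are assumptions for illustration.

```python
def reachability_probabilities(P, targets, iterations=1000):
    """Approximate, for every state, the probability of eventually reaching a
    target state in the DTMC with transition matrix P (a list of rows)."""
    n = len(P)
    # Start from 1 on target states and 0 elsewhere, then iterate
    # x(s) = sum_t P[s][t] * x(t); this converges to the least fixed point.
    x = [1.0 if s in targets else 0.0 for s in range(n)]
    for _ in range(iterations):
        x = [
            1.0 if s in targets else sum(P[s][t] * x[t] for t in range(n))
            for s in range(n)
        ]
    return x


# Hypothetical chain: state 0 is a failure sink, state 3 is the goal,
# states 1 and 2 move one step left or right with probability 1/2 each.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
probs = reachability_probabilities(P, targets={3})
```

For this chain the exact values are 1/3 from state 1 and 2/3 from state 2, which can be checked by solving the two linear equations by hand.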

28/05/2015 ore 11.00

### Modelling systemic risk in financial markets

### Andrea Ugolini (DiSIA)

Three recent crises (the dot-com bubble, the subprime crisis and the European sovereign debt crisis) have revealed the complex dynamics underpinning the global financial system and how rapidly risk is propagated across markets. Investors, regulators and researchers are thus keen to develop accurate measures of risk transmission between assets and markets. From the investors’ perspective of guaranteeing efficient portfolio diversification, quantifying the risk of contagion is essential given their ongoing interest in changes in market linkages. From the regulators’ point of view, assessing risk spillover is important for focusing attention on the maintenance and development of financial regulatory and institutional rules such as circuit breakers, transaction taxes and short-sale rules. In fact, recognizing important shortcomings in financial supervision, the European Commission and European Central Bank (ECB) created the European Systemic Risk Board (ESRB) at the end of 2010 with the goal of monitoring, at the macro-prudential level, the European financial system and preventing and mitigating any propagation risk within the financial system. The literature contains many definitions of systemic risk. De Bandt and Hartmann (2000) defined it “as the risk of experiencing systemic events in the strong sense” where “strong sense” signifies the spread of news about an institution that has an adverse impact on one or more healthy institutions in a sequential manner. Billio et al. (2012) explained that “systemic risk can be realized as a series of correlated defaults among financial institutions, occurring over a short time span and triggering a withdrawal of liquidity and widespread loss of confidence in the financial system as a whole”. The above brief list of definitions points to the intricacy of the topic and the challenge faced by investors, regulators and researchers in attempting to measure the complexity and dynamics of systemic risk.
Using the conditional Value-at-Risk (CoVaR) systemic risk measure (Adrian and Brunnermeier, 2011; Girardi and Ergün, 2013), we quantified systemic risk as the impact of the risky situation of a particular financial institution, market or system on the value-at-risk (VaR) of other financial institutions, markets or systems. We introduced a novel copula and vine copula approach to computing the CoVaR value, given that copulas are flexible models of joint distributions and are particularly useful for characterizing the tail behaviour that provides such crucial information for the CoVaR.

29/04/2015 ore 15.30

### Role of scientists to defend the integrity of science and public health policy

### Kathleen Ruff (International Joint Policy Committee of the Societies of Epidemiology, IJPC-SE)

When the scientific evidence is distorted so as to serve vested interests and to endanger public health, what responsibility do scientists and scientific organisations bear? Those most at risk of harm from hazardous substances are frequently those segments of the population who lack economic and political influence and thus have little ability to demand the necessary health protections. Scientists have the credibility to demand evidence-based public health policy that protects the right to health of all citizens.

29/04/2015 ore 14.30

### Risk governance and Health Impact Assessment (HIA) procedures in Italy

### Giancarlo Sturloni (International School for Advanced Studies, SISSA, Trieste)

Health Impact Assessment (HIA) is an important tool to defend local communities from health and environmental risks. Nevertheless, to keep its promise HIA should be carried out within an open, well-informed, inclusive and dialogical approach. Risk governance could help HIA in supporting the dialogue between the stakeholders and the open access to scientific information, so as to promote participative decision making processes.

13/02/2015 ore 12.00

### Official statistics, social media and big data

### Stefano Iacus (Università degli Studi di Milano)

In this seminar we will present some recent developments in the analysis of big data from social media. We will also try to show the possible interactions between the analysis of social data and traditional survey techniques, highlighting the limits and promise of some experiments. As examples, we will show two recent applications: #iHappy, the Twitter-Happyness index (http://www.blogsvoices.unimi.it), and #WNI, the Wired Next Index (http://index.wired.it), an indicator that tries to capture the country's capacity for recovery. In particular, #WNI combines the results of official statistics on the economy and well-being with the results of three "social" indicators of personal, economic and political confidence, and appears to be a good tool for nowcasting what it sets out to measure.

11/02/2015 ore 14.30

### Quantile regression coefficients modeling

### Paolo Frumento (Karolinska Institutet - Unit of Biostatistics)

Estimating conditional quantiles is of interest in many research areas, and quantile regression is foremost among the utilized methods. The coefficients of a quantile regression model depend on the order of the quantile being estimated. We present an approach to modeling the regression coefficients as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony and efficiency, and may expand the potential of statistical modeling. We describe a possible application of this method and its implementation in the qrcm R package.
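To make the idea of coefficients as parametric functions of the quantile order concrete, the sketch below evaluates β(p) as a linear combination of two basis functions of p, the constant 1 and the standard normal quantile z_p. This is the form implied by a normal linear model, where the intercept shifts with p and the slope does not. It is an illustration under stated assumptions: it does not use or reproduce the qrcm package, and the function name, basis and parameter values are hypothetical.

```python
from statistics import NormalDist


def quantile_coefficients(p, theta):
    """Return beta(p), where each coefficient is a linear combination of the
    basis functions b(p) = (1, z_p) with weights given by a row of theta."""
    basis = (1.0, NormalDist().inv_cdf(p))
    return [sum(t * b for t, b in zip(row, basis)) for row in theta]


# Hypothetical model y = 2 + 0.5 * x + 3 * eps with eps ~ N(0, 1):
theta = [
    [2.0, 3.0],  # intercept(p) = 2 + 3 * z_p
    [0.5, 0.0],  # slope(p) = 0.5 for every p
]
median_coefs = quantile_coefficients(0.5, theta)
upper_coefs = quantile_coefficients(0.9, theta)
```

Four numbers in `theta` describe the coefficients at every quantile order, which is the parsimony argument made in the abstract: there is no need to fit a separate regression for each p.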

20/01/2015 ore 12.00

### Theory and Applications of Proper Scoring Rules

### Monica Musio (Università degli Studi di Cagliari)

Suppose you have to quote a probability distribution Q for an unknown quantity X. When the value x of X is eventually observed, you will be penalised by an amount S(x;Q). The function S is a proper scoring rule if, when you believe X has distribution P, your expected penalty S(P;Q) is minimised by the honest quote Q = P. Proper scoring rules can be used, as above, to motivate you to assess your true uncertainty honestly, as well as to measure the quality of your past probability forecasts in the light of the actual outcomes. They also have many other statistical applications. We will discuss some characterisations, properties and specialisations of proper scoring rules, and describe some of their uses, including robust estimation and Bayesian model selection.
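The defining property can be checked numerically for a binary X. The sketch below evaluates the expected penalty S(P;Q) over a grid of quotes q for two well-known proper rules (the logarithmic and Brier scores) and, for contrast, for the absolute-error penalty, which is not proper. The function names, the belief p = 0.3 and the grid are assumptions chosen for illustration.

```python
import math


def log_score(x, q):
    """Penalty -log(quoted probability of the outcome that occurred)."""
    return -math.log(q if x == 1 else 1.0 - q)


def brier_score(x, q):
    """Squared distance between the outcome and the quoted probability."""
    return (x - q) ** 2


def absolute_score(x, q):
    """Absolute error; NOT a proper scoring rule."""
    return abs(x - q)


def expected_score(p, q, score):
    """Expected penalty S(P;Q) when X = 1 with probability p and q is quoted."""
    return p * score(1, q) + (1.0 - p) * score(0, q)


p = 0.3
grid = [i / 100 for i in range(1, 100)]  # candidate quotes 0.01, ..., 0.99
best_log = min(grid, key=lambda q: expected_score(p, q, log_score))
best_brier = min(grid, key=lambda q: expected_score(p, q, brier_score))
best_abs = min(grid, key=lambda q: expected_score(p, q, absolute_score))
```

Under the two proper rules the minimising quote coincides with the belief p, whereas the absolute-error penalty rewards pushing the quote to the extreme of the grid, so it gives no incentive for honest reporting.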

Last updated 8 February 2018.