Seminars - Abstracts
Statistics and causality: a case of parallel convergences
Bruno Chiandotto (Università di Firenze)
During the seminar, the development of the two main approaches to statistical inference, the classical and the Bayesian, will be outlined. For both approaches, their logical coherence and their success in applications will be highlighted, albeit very briefly, with attention to some characteristic features that allow them to be interpreted as special cases of statistical decision theory. A similar path will be followed in illustrating the concept of causality and causal inference, with particular attention to the so-called causal decision theory. Finally, an attempt will be made to show how the decision-theoretic approach itself may contain elements capable, on the one hand, of justifying some recent developments in causal analysis and, on the other, of identifying the point of convergence between statistical inference and causal inference. This point is identified in the “Bayesian-subjective causal theory of statistical decisions”.
Economic and societal costs of being NEETs
Massimiliano Mascherini (Eurofound)
The traditional indicators of labour market participation are frequently criticised for their limited relevance to young people, as many of them are students and hence classified as being out of the labour force. Partly for this reason, EU policy makers have recently started to focus their attention on NEETs. The acronym NEET stands for Not in Employment, Education or Training and refers to those, typically aged between 15 and 24, who regardless of their educational level are disengaged from both work and education and are at a higher risk of labour market and social exclusion. Being NEET is a waste of young people’s potential, but it also has adverse consequences for society and the economy. In fact, spending periods of time as NEET may lead to a wide range of social disadvantages, such as disaffection, insecure and underpaid employment, youth offending, and mental and physical problems. Each of these outcomes has a cost attached to it, and as such being NEET is a problem not just for the individual but also for our societies and economies as a whole. The aim of this seminar is to provide an estimate of the economic and societal costs of being NEET and to highlight the importance of strengthening the efforts of governments and social partners to re-engage NEETs in the labour market.
A General Framework for modelling ordinal data: Statistical foundations, inferential issues and extensions
D. Piccolo and M. Iannario (University of Naples Federico II)
This contribution discusses some statistical issues of a mixture distribution introduced for the analysis of ordinal data generated by ratings and evaluations (CUB models). The rationale for this framework stems from the interpretation of the respondent's final choice as the output of a complex psychological process whose main components are the personal feeling towards the item and an inherent uncertainty surrounding the choice. Thus, on the basis of experimental data and statistical motivations, the response distribution is modelled as the convex combination of a shifted Binomial and a discrete Uniform random variable, whose parameters may be consistently estimated and validated by maximum likelihood inference. In addition, subjects' covariates are introduced in order to assess how the characteristics of the respondents may affect the rating. The seminar focuses on several extensions which concern objects' covariates, multi-items, hierarchical structure, shelter effects and overdispersion. The approach has been successfully applied in several fields, such as Sensometrics and Marketing. Finally, some empirical evidence is reported to support the usefulness of this framework.
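The mixture described above can be sketched in a few lines. The parametrisation below is an assumption that follows the form commonly used in the CUB literature: `pi` is the mixing weight on the shifted Binomial, `xi` is its parameter, and `m` is the number of rating categories. The function name `cub_pmf` is hypothetical, and covariates and the listed extensions are not covered.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Return [P(R = 1), ..., P(R = m)] for a CUB-type mixture.

    Assumed form (not taken from the abstract):
    P(R = r) = pi * C(m-1, r-1) * (1 - xi)**(r-1) * xi**(m-r) + (1 - pi) / m
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    if not (0.0 <= pi <= 1.0 and 0.0 <= xi <= 1.0):
        raise ValueError("pi and xi must lie in [0, 1]")
    probs = []
    for r in range(1, m + 1):
        # Shifted Binomial component: the "feeling" part of the response.
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        # Discrete Uniform component: the "uncertainty" part of the response.
        uniform = 1.0 / m
        probs.append(pi * shifted_binomial + (1 - pi) * uniform)
    return probs
```

With `pi = 0` the distribution reduces to the discrete Uniform, and with `pi = 1` to the shifted Binomial, which gives a quick sanity check on any fitted values.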
Statistical methods and applications for seismology
Marta Gallucci (Università di Firenze)
Among the fields in which statistics has so far been applied only to a limited extent but has great potential, we certainly find geophysics and, particularly, seismology. In the study of seismic events there are several sources of uncertainty: first of all, the development of the theory of plate tectonics as a unifying framework for several geological phenomena is quite recent, and some aspects still remain without a commonly accepted explanation; moreover, the localization of earthquakes is obtained by means of an estimation procedure which interpolates data available from the nearby recording stations; finally, the study of the distribution of future events raises the great problem of earthquake prediction. In this talk, we present two different studies: the first one is a real data application of a statistical test for structural change to models for seismic activity; the second one is a simulation study in which the relationships between the ETAS model parameters and the inter-event times are explored by means of an emulator.
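For readers unfamiliar with the ETAS (Epidemic-Type Aftershock Sequence) model, a minimal sketch of its temporal conditional intensity follows. The abstract does not state the parametrisation, so the usual textbook form is assumed here: a background rate `mu` plus, for each past event, a contribution `K * exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p)`. The function name and argument names are hypothetical.

```python
from math import exp


def etas_intensity(t, events, mu, K, alpha, c, p, m0):
    """Conditional intensity of a temporal ETAS model at time t.

    events: iterable of (time, magnitude) pairs; only events strictly
    before t contribute. The parametrisation is an assumption, not
    taken from the abstract.
    """
    rate = mu  # background seismicity rate
    for t_i, m_i in events:
        if t_i < t:
            productivity = K * exp(alpha * (m_i - m0))  # larger shocks trigger more
            decay = (t - t_i + c) ** (-p)               # Omori-type decay in time
            rate += productivity * decay
    return rate
```

Inter-event times depend on all five triggering parameters jointly, which is one reason a simulation study of the kind described can be informative.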
Multivariate Regression Chain Graph Models: an application to the study of determinants of fertility and housing conditions
Ilaria Vannini (Università di Firenze)
Over the last two decades, variables related to housing conditions have been the focus of increased attention as determinants of fertility behaviour. Indeed, a complex structure of association between fertility and housing has been found. This work helps to move beyond existing research findings by using the theory of graphical models, in particular multivariate regression chain graph models. These are marginal models, i.e. they focus on the distribution of clustered joint responses, among which mutual dependence is treated as a nuisance, after controlling for a set of covariates. An ad hoc parametrization based on multivariate logistic regression has been proposed, together with a fitting procedure specified in terms of individual data, which makes it possible to deal with both continuous and categorical explanatory variables. This approach allows the response of each member of a couple to be modelled separately, resulting in important findings about the relationship between fertility and housing from the women's and the men's personal points of view. Moreover, it has been shown that multivariate regression chain graph models are feasible even when data with a complex structure and many variables are involved. We present one of the first examples of a socio-economic application of this kind of model.
Automated interviews on clinical case reports to elicit directed acyclic graphs
Davide Luciani (Istituto di Ricerche Farmacologiche "Mario Negri")
Setting up clinical reports within hospital information systems makes it possible to record a variety of clinical presentations. Directed acyclic graphs (Dags) offer a useful way of representing causal relations in clinical problem domains and are at the core of many probabilistic models described in the medical literature, like Bayesian networks. However, medical practitioners are not usually trained to elicit Dag features. Part of the difficulty lies in the application of the concept of direct causality before selecting all the causal variables of interest for a specific patient. We designed an automated interview to tutor medical doctors in the development of Dags to represent their understanding of clinical reports. Medical notions were analyzed to find patterns in medical reasoning that can be followed by algorithms supporting the elicitation of causal Dags. Clinical relevance was defined to help formulate only relevant questions by driving an expert’s attention towards variables causally related to nodes already inserted in the graph. Key procedural features of the proposed interview are described by four algorithms. The automated interview comprises questions on medical notions, phrased in medical terms. The first elicitation session produces questions concerning the patient’s chief complaints and the outcomes related to diseases serving as diagnostic hypotheses, their observable manifestations and risk factors. The second session focuses on questions that refine the initial causal paths by considering syndromes, dysfunctions, pathogenic anomalies, biases and effect modifiers. Revision and testing of the subjectively elicited Dag is performed by matching the collected answers with the evidence included in accepted sources of biomedical knowledge. (Joint with F. Stefanini)
Constraints on Marginalised DAGs
Robin Evans (University of Cambridge)
Models defined by the global Markov property for directed acyclic graphs (DAGs) possess many nice properties, including a simple factorisation, graphical criteria for determining independences (d-separation/moralisation), and computational tractability. As is well known, however, DAG models are not closed under the operation of marginalisation, so marginalised DAGs (mDAGs) describe a much larger and less well understood class of models. mDAG models can be represented by hyper-graphs with directed and bidirected edges, under a modest extension of Richardson's ADMGs. We describe the natural Markov property for mDAGs, which imposes observable conditional independences, 'dormant' independences (Verma constraints) and inequality constraints, and provide some results towards its characterisation. In particular we describe the Markov equivalence classes of mDAG models over three observed variables. We also give a constructive proof that when two observed variables are not joined by any edge (directed or bidirected), then this always induces some constraint on the observed joint distribution.
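The moralisation criterion mentioned above for fully observed DAGs can be illustrated with a short sketch. It checks whether two sets of nodes are d-separated given a third by restricting to the ancestral set, moralising, removing the conditioning nodes and testing connectivity. The function name and the `parents` dictionary layout are hypothetical, and the sketch does not handle mDAGs or bidirected edges.

```python
from itertools import combinations


def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs in a DAG.

    parents: dict mapping each node to an iterable of its parents
    (nodes without parents may be omitted).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Step 1: smallest ancestral set containing xs, ys and zs.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralise (link co-parents) and drop edge directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for par in pa:
            adjacent[child].add(par)
            adjacent[par].add(child)
        for a, b in combinations(pa, 2):
            adjacent[a].add(b)
            adjacent[b].add(a)

    # Step 3: remove zs and look for any path from xs to ys.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in adjacent[node] if n not in zs and n not in seen)
    return True
```

For the collider a → c ← b, `d_separated({"c": {"a", "b"}}, {"a"}, {"b"}, set())` holds, while conditioning on `c` breaks the separation.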
A Stata package for the application of nonparametric estimators of dose-response functions
Michela Bia (CEPS/INSTEAD)
In this paper we propose three semiparametric estimators of the dose-response function based on kernel and spline techniques. In many observational studies treatment may not be binary or categorical. In such cases, one may be interested in estimating the dose-response function in a setting with a continuous treatment. This approach relies strongly on the unconfoundedness assumption, which requires that the potential outcomes are independent of the treatment conditional on a set of covariates. In this context the generalized propensity score can be used to estimate dose-response functions (DRF) and marginal treatment effect functions. We present a set of Stata programs which estimate the propensity score when the treatment is a continuous variable, test the balancing property of the generalized propensity score, and semiparametrically estimate the dose-response function. We illustrate these programs using a data set collected by Imbens, Rubin and Sacerdote (2001).
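As a rough illustration of what a generalized propensity score is, the sketch below evaluates it under one common working assumption: the continuous treatment is normally distributed given the covariates, with a mean given by a fitted linear predictor and a constant standard deviation. This is an assumption for illustration only; it is written in Python rather than Stata and does not reproduce the programs presented in the talk. All names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_gps(treatment, linear_predictor, sigma):
    """Generalized propensity score under an assumed normal model.

    Returns the conditional density of the observed treatment level,
    given a covariate-based mean (linear_predictor) and a residual
    standard deviation (sigma), both assumed to be estimated elsewhere.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (treatment - linear_predictor) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))
```

The score is highest for units whose observed dose is close to what their covariates predict, which is what the balancing tests then exploit.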
Turing and the unknown
Alberto Gandolfi (Università di Firenze)
While working to break the Enigma code, Turing devised a method for estimating the probability of species not observed in a random sample. The seemingly insurmountable difficulty of describing objects that have never been observed is overcome with a brilliantly simple solution (once you have seen it!). The mathematical details were worked out and published only later by Good, at the time a collaborator of Turing, and they still underlie methods used in many fields, including genetics, linguistics, statistics, computer science and the social sciences. List of seminars on Turing: http://turing.dsi.unifi.it
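A minimal sketch of the best-known form of this estimator follows: the total probability of all unseen species is estimated by the share of the sample made up of species seen exactly once. The function name is hypothetical, and the smoothing refinements later developed by Good are not included.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Estimate the total probability of species absent from the sample.

    Uses the basic Good-Turing estimate N1 / N, where N1 is the number
    of species observed exactly once and N is the sample size.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n
```

For example, in the sample `["a", "a", "b", "c", "c", "c", "d"]` two of the seven observations are singletons, so the estimated probability that the next observation is a new species is 2/7.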
Structural Hierarchical Approach to Longitudinal Modeling of Health Environmental Effects: Methodology and Applications
Michael Friger (Ben-Gurion University of the Negev, Israel)
We discuss a possible approach to modeling health environmental effects based on time-series techniques. In doing so, we try to apply a system analysis of the problem under consideration, in other words, to use a priori knowledge of the subject matter. We take environmental health in a broad sense, e.g. the effects of seasonality, meteorological factors, pollution, the socio-cultural environment and the like. In the presentation, we will show several applications obtained with this approach. In addition, we will discuss the methodology for modeling and estimating the health effects of climate change.
Propensity Score Weighting with Multilevel Data
Fan Li (Duke University)
Propensity score methods are increasingly used as a less parametric alternative to traditional regression to balance observed differences across groups in both descriptive and causal comparisons. Data collected in many disciplines often have an analytically relevant multilevel (clustered) structure. The propensity score, however, has been developed and used primarily with unstructured data. We present and compare several propensity-score-weighted estimators for clustered data, including marginal, cluster-weighted and doubly-robust estimators. Using both analytical derivations and Monte Carlo simulations, we illustrate the bias that arises when the usual assumptions of propensity score analysis do not hold for multilevel data. We show that exploiting the multilevel structure, either parametrically or nonparametrically, in at least one stage of the propensity score analysis can greatly reduce these biases. These methods are applied to a study of racial disparities in breast cancer screening among beneficiaries in Medicare health plans. (This is joint work with Alan Zaslavsky and Mary Beth Landrum.)
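The difference between a marginal and a cluster-weighted estimator can be sketched as follows. This is one plausible reading, not necessarily the authors' exact definitions: both use inverse-probability weights, the marginal version pools all units, and the cluster-weighted version computes a weighted difference within each cluster and then averages across clusters by total weight. Propensity scores are taken as given, all names are hypothetical, and the doubly-robust estimator is not covered.

```python
from collections import defaultdict


def ipw_weight(z, e):
    """Inverse-probability weight for group indicator z and propensity e."""
    return 1.0 / e if z == 1 else 1.0 / (1.0 - e)


def weighted_difference(units):
    """Weighted mean difference (z=1 minus z=0) and total weight.

    units: iterable of (z, y, e). Returns None if a group is empty.
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # per group: [sum w*y, sum w]
    for z, y, e in units:
        w = ipw_weight(z, e)
        sums[z][0] += w * y
        sums[z][1] += w
    if sums[0][1] == 0.0 or sums[1][1] == 0.0:
        return None
    diff = sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]
    return diff, sums[0][1] + sums[1][1]


def marginal_estimate(data):
    """Pool all units, ignoring clusters. data: list of (cluster, z, y, e)."""
    result = weighted_difference((z, y, e) for _, z, y, e in data)
    if result is None:
        raise ValueError("both groups must be present")
    return result[0]


def cluster_weighted_estimate(data):
    """Average within-cluster differences, weighted by cluster total weight."""
    by_cluster = defaultdict(list)
    for cluster, z, y, e in data:
        by_cluster[cluster].append((z, y, e))
    numerator = denominator = 0.0
    for units in by_cluster.values():
        result = weighted_difference(units)
        if result is None:
            continue  # assumption: skip clusters lacking one of the groups
        diff, weight = result
        numerator += weight * diff
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no cluster contains both groups")
    return numerator / denominator
```

The two estimators agree when clusters are balanced and the effect is constant; they can diverge when group composition varies across clusters, which is the situation the talk addresses.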
Last updated 18 March 2013.