**Abstract 2010-2011**

**Scale parameters integration for marginal likelihood estimation in conditionally linear state space models**

**Gabriele Fiorentini (Università di Firenze)**

We show that the calculation of the marginal likelihood in the popular class of conditionally linear state space models can be improved by preliminary integration of the scale parameters. We explain how to integrate scale parameters out of the likelihood function by Kalman filtering and Gaussian quadrature. We find that this preliminary integration improves the accuracy of four marginal likelihood estimators, namely the Laplace method, Chib's estimator, reciprocal importance sampling, and bridge sampling. For some simple but empirically relevant model specifications, such as the local level and the local linear models, the marginal likelihood can be obtained directly without any posterior sampling. Some examples show the details of the practical implementation of our method and its simplicity. Finally, two applications illustrate the gain in accuracy achieved. (Joint work with Christophe Planas and Alessandro Rossi)
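
As a rough illustration of the Kalman-filter building block mentioned above, the sketch below evaluates the Gaussian log-likelihood of a local level model for fixed variances. It does not implement the scale-parameter integration or the Gaussian quadrature step of the talk; the function name, the argument names, and the initial state moments `a1` and `p1` are assumptions made for this example.

```python
import math


def local_level_loglik(y, var_eps, var_eta, a1=0.0, p1=1e7):
    """Log-likelihood of y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    var_eps and var_eta are the observation and state noise variances;
    a1 and p1 are the (assumed) mean and variance of the initial state.
    """
    a, p = a1, p1  # predicted state mean and variance
    loglik = 0.0
    for obs in y:
        v = obs - a        # one-step-ahead prediction error
        f = p + var_eps    # variance of the prediction error
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f          # Kalman gain
        a = a + k * v      # predicted mean of the next state
        p = p * (1.0 - k) + var_eta  # predicted variance of the next state
    return loglik


if __name__ == "__main__":
    # Toy series, purely illustrative.
    print(local_level_loglik([1.0, 1.3, 0.8, 1.1], var_eps=1.0, var_eta=0.5))
```

In a full treatment the variances would be parameters to estimate or integrate over; here they are simply fixed inputs.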

**Merger Simulation in a Two-Sided Market: The Case of the Dutch Daily Newspapers**

**Lapo Filistrucchi (Università di Firenze and TILEC, Tilburg University)**

We develop a structural econometric framework that allows us to simulate the effects of mergers among two-sided platforms selling differentiated products. We apply the proposed methodology to the Dutch newspaper industry. Our structural model encompasses demands for differentiated products on both sides of the market and profit maximization by competing oligopolistic publishers who choose subscription and advertising prices, while taking the interactions between the two sides of the market into account. We measure the sign and size of the indirect network effects between the two sides of the market and simulate the effects of a hypothetical merger on prices and welfare. (Joint paper with Tobias J. Klein and Thomas Michielsen).

**The scientific collaboration network(s) of Italian academic statisticians**

**Susanna Zaccarin (Università di Trieste)**

Scientific collaboration is a complex phenomenon characterized by an intense form of interaction among scientists that improves the diffusion of knowledge in the community. In this contribution the network of scientific collaboration among Italian statisticians will be analyzed along the lines discussed for other disciplines. Relationships among Italian statisticians will be derived from information on their research products (scientific publications). To this aim, several data sources can be considered (from international bibliographic databases such as ISI Web of Science or Scopus to national archives). Each data source has benefits and limitations related to both the coverage of the population of statisticians and the inclusion of different kinds of publications (papers in international journals, books, working papers). The potential of integrated information to study the patterns of interactions in a research community will also be explored.

**Weighting the Elementary Price Relatives in Time Consumer Price Index (CPI) Construction**

**Beatriz Larraz (University of Castilla-La Mancha, Spain)**

The importance of inflation/deflation in current economic systems calls for it to be measured as accurately as possible. The Consumer Price Index (CPI) has been taken as the tool for measuring inflation/deflation. It is a Laspeyres-type index which suffers, inter alia, from the well-known drawbacks of a fixed-base weighting system, and therefore fails to capture the substitution effect and quality changes. Moreover, in actual CPI estimation, the elementary price relatives are not weighted at the first aggregation level, whether computed as an arithmetic or a geometric mean, because the lack of suitable information makes it impossible to obtain a reliable weighting structure. This implies the assumption that quotations of the same item in different outlets have the same importance, which is not the case, since spatial location undoubtedly influences prices. As a consequence, a bias is introduced at the very first step of the inflation/deflation estimation procedure. Since prices are collected at geographical locations, i.e., they are geo-referenced data, the outlets' coordinates can be used to set up a weighting system, provided these coordinates are surveyed along with the prices. This paper suggests a new approach to computing the above arithmetic and geometric means, based on Kriging methodology. In particular, it proposes to weight the elementary price relatives by taking into account the spatial correlation among these prices. It is shown that the resulting weighted geometric and arithmetic mean estimators of the elementary price relatives are better than the simple geometric and arithmetic mean estimators. (Joint work with Guido Ferrari, University of Florence, Italy & Renmin University of China, PR China)
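
To make the first aggregation level concrete, the sketch below computes unweighted and weighted arithmetic and geometric means of elementary price relatives. The price relatives and the weights are hypothetical numbers chosen for illustration; in particular, the weights merely stand in for weights that a Kriging procedure might produce and are not derived from any spatial model.

```python
import math


def arithmetic_mean(relatives, weights=None):
    """Weighted arithmetic mean of price relatives (equal weights if None)."""
    if weights is None:
        weights = [1.0] * len(relatives)
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, relatives)) / total


def geometric_mean(relatives, weights=None):
    """Weighted geometric mean of price relatives (equal weights if None)."""
    if weights is None:
        weights = [1.0] * len(relatives)
    total = sum(weights)
    log_mean = sum(w * math.log(r) for w, r in zip(weights, relatives)) / total
    return math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical price relatives for one item observed in three outlets.
    relatives = [1.02, 1.05, 0.98]
    # Hypothetical weights, standing in for spatially derived ones.
    weights = [0.5, 0.3, 0.2]
    print("simple arithmetic:  ", arithmetic_mean(relatives))
    print("simple geometric:   ", geometric_mean(relatives))
    print("weighted arithmetic:", arithmetic_mean(relatives, weights))
    print("weighted geometric: ", geometric_mean(relatives, weights))
```

With equal weights the two functions reduce to the simple means used when no weighting information is available.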

**The state of the art in indicators construction**

**Filomena Maggino (Università di Firenze)**

Measuring social phenomena requires a multidimensional and integrated approach that describes them through a complex, multifaceted and composite methodology. The aim of the seminar is to unravel some important methodological aspects and issues that should be considered in developing indicators for measuring social phenomena from a quantitative perspective. The purpose is to examine closely (i) the conceptual issues in defining and developing indicators and (ii) the methodological and operational issues in managing the complexity of the observations obtained, integrating different aspects of reality.

**An MCMC data augmentation algorithm for contaminated case-control analyses**

**Fabio Divino (Università del Molise)**

In many applied settings there is great interest in parametric regression models for non-random sampling designs. A very common situation concerns so-called presence-only data, where one wants to study the relationship between a set of informative covariates (X) and a binary response variable (Y) that is known only when Y=1. A sampling scheme often adopted in this setting considers two distinct groups of observations. The first group is a random sample of presences (Y=1), while the second is a random sample drawn from the whole reference population P, for which only the explanatory variables are observed and not the response variable Y. This scheme, widely used in retrospective form, is known in the literature as the contaminated case-control design (Hsieh et al., 1985), because the sample observations for the controls (Y=0) are contaminated by a censoring mechanism. In ecology, for example, it is often difficult to obtain a sample of absences for animal or plant populations; inference on regression models must then rely on only partial information about the response variable. Several solutions to this problem have been presented in the literature. Recently, Ward et al. (2009) proposed a likelihood-based EM-type algorithm for studying presence-only data. In its most efficient formulation, however, this approach requires prior knowledge of the marginal prevalence of Y, an assumption that is very hard to accept in practice. In this work we present an MCMC-type algorithm that can be used, especially in a Bayesian setting, to estimate a linear logistic model.
The main contribution is the introduction of a random approximation of the correction factor of the contaminated case-control model. Through a data-augmentation step, it is then possible to estimate the regression parameters jointly with the marginal prevalence of Y. The results obtained from simulations of several experimental settings are very encouraging, especially for the strong predictive performance on the prevalence parameter. The precision and efficiency of the regression estimates are also good but, as is natural, remain partly dependent on how informative the covariates X are. This is joint work with Natalia Golini (Università di Roma), Giovanna Jona Lasinio (Università di Roma) and Antti Penttinen (University of Jyvaskyla).

**The SAM as a policy analysis tool: an application to the Tuscan case.**

**Stefano Rosignoli (IRPET)**

SAMs (Social Accounting Matrices) are data matrices that represent the flows among the different agents (production sectors and institutional sectors) of an economic system over a period of time (generally one year). They originated within traditional economic theory as an extension of input-output models and are widely used to analyze developing economies; recently they have again attracted interest for the study of developed economies as well, thanks to the greater availability, reliability and standardization of national and regional accounts data. SAMs provide the information base for a wide range of multisectoral models, often developed within alternative theoretical frameworks (linear models, general economic equilibrium models, micro-macro simulation models). The flexibility of their structure allows models to be calibrated to analyze specific parts of the economy while remaining within a complete and consistent macroeconomic framework, for example to study the macroeconomic impact of particular sectoral policies, or to analyze the geographical differentiation of impacts (multi-regional models, rural-urban disaggregation of the economy). Despite the theoretical and practical difficulties of building SAMs at the regional level, this territorial scale is particularly well suited to SAM-based analysis and simulation, all the more so given the growing importance of economic policies in a federalist setting. The seminar aims to illustrate the structure of the SAM of the Tuscan economy, the guidelines followed by IRPET in building it, and its potential for applications. (Joint work with Benedetto Rocchi).

**Measuring the Size and Structure of the World Economy: The International Comparison Program**

**Prasada Rao (School of Economics, University of Queensland, Australia)**

The International Comparison Program (ICP) has become the world’s largest international statistical activity, covering 100 countries in 2005 plus the 46 additional countries in the Eurostat-OECD program. The ICP is a global statistical initiative that supports inter-country comparisons of Gross Domestic Product and its components using Purchasing Power Parities as a currency converter. The foundation of the ICP is the comparison of national prices of a well-defined basket of goods and services under the conceptual framework of the System of National Accounts. While the ICP shares a common language and conceptual framework with national statistical systems for measuring the Consumer Price Index and their national accounts, it faces unique challenges in providing statistical methodology that can be carried out in practice by countries differing in size, culture, the diversity of goods and services available to their population, and statistical capabilities.

**Inequality, Health, and Mortality**

**Prasada Rao (School of Economics, University of Queensland, Australia)**

This paper proposes a new method to measure health inequalities that are caused by conditions amenable to policy intervention. The method is built on a technique that can separate avoidable and unavoidable mortality risks, using world mortality data compiled by the World Health Organization for the year 2000. The new method is applied to data from 191 countries. It is found that controlling for unavoidable mortality risks leads to a lower estimate of health inequality than otherwise, especially for developed countries. Furthermore, although countries with a higher life expectancy at birth tend to have lower health inequality, there are significant variations in health inequalities across countries with the same life expectancy. The results therefore support the WHO's plea for using health inequality as a distinct parameter from the average level of health in assessing the performance of health systems.

**Bayesian wavelet-based curve classification via discriminant analysis with Markov random tree priors**

**Francesco Stingo (Rice University)**

Discriminant analysis is an effective tool for the classification of experimental units into groups. When the number of variables is much larger than the number of observations it is necessary to include a dimension reduction procedure in the inferential process. Here we present a typical example from chemometrics that deals with the classification of different types of food into species via near infrared spectroscopy. We take a nonparametric approach by modeling the functional predictors via wavelet transforms and then apply discriminant analysis in the wavelet domain. We consider a Bayesian conjugate normal discriminant model, either linear or quadratic, that avoids independence assumptions among the wavelet coefficients. We introduce latent binary indicators for the selection of the discriminatory wavelet coefficients and propose prior formulations that use Markov random tree (MRT) priors to map scale-location connections among wavelet coefficients. We conduct posterior inference via MCMC methods, show the performance of the method on our case study on food authenticity, and compare the results with those of several other procedures.

**Recommended tests for association in 2x2 tables**

**Stian Lydersen (Norwegian University of Science and Technology)**

Testing for association in 2x2 contingency tables is one of the most common tasks in applied statistics. A well established approach is to use Pearson’s chi-squared test in large samples and Fisher’s exact test in small samples. Both tests have drawbacks which are not widely known. The true significance level of Pearson’s chi-squared test is often larger than the nominal significance level. The true significance level of Fisher’s exact test is unnecessarily low, with poor power as the result. Better tests are available. Unconditional exact tests produce accurate P-values, have high power, and are available in commercial and free software packages. Another approach, called the mid-p, gives about the same results as an unconditional test. The traditional Fisher’s exact test should practically never be used. (Jointly with Morten Fagerland and Petter Laake)
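
For concreteness, the sketch below computes the two-sided Fisher's exact P-value and a mid-p value for a 2x2 table using only the hypergeometric distribution. It assumes one common definition of the two-sided mid-p (the Fisher P-value minus half the probability of the observed table); other variants exist, and the function names and the example table are chosen here for illustration only. Unconditional exact tests are not covered.

```python
from math import comb


def table_probability(a, row1, row2, col1):
    """Hypergeometric probability of a 2x2 table with top-left cell a
    and fixed margins (row totals row1, row2; first column total col1)."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_and_mid_p(a, b, c, d):
    """Return (two-sided Fisher exact P-value, mid-p) for the table
    [[a, b], [c, d]].

    Assumption: mid-p is taken as the Fisher P-value minus half the
    point probability of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = table_probability(a, row1, row2, col1)
    probs = [table_probability(x, row1, row2, col1)
             for x in range(lowest, highest + 1)]
    # Sum the tables that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_fisher = sum(p for p in probs if p <= p_obs * (1.0 + 1e-12))
    return p_fisher, p_fisher - 0.5 * p_obs


if __name__ == "__main__":
    p_fisher, mid_p = fisher_and_mid_p(3, 1, 1, 3)
    print(f"Fisher exact: {p_fisher:.4f}, mid-p: {mid_p:.4f}")
```

The mid-p value is always smaller than the Fisher P-value, which is how it offsets the conservativeness of Fisher's exact test described above.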

**Point pattern modeling for degraded presence-only data over large regions**

**Alan Gelfand (Duke University)**

Explaining species distribution using local environmental features is a long-standing ecological problem. Often, available data is collected as a set of presence locations only, thus precluding the possibility of a presence-absence analysis. We propose that it is natural to view presence-only data for a region as a point pattern over that region and to use local environmental features to explain the intensity driving this point pattern. This suggests hierarchical modeling, treating the presence data as a realization of a spatial point process whose intensity is governed by environmental covariates. Spatial dependence in the intensity surface is modeled with random effects involving a zero mean Gaussian process. Highly variable and typically sparse sampling effort as well as land transformation degrades the point pattern so we augment the model to capture these effects. The Cape Floristic Region (CFR) in South Africa provides a rich class with such species data. The potential, i.e., nondegraded presence surfaces over the entire area are of interest from a conservation and policy perspective. Our model assumes grid cell homogeneity of the intensity process where the region is divided into 37,000 grid cells. To work with a Gaussian process over a very large number of cells we use predictive process approximation. Bias correction by adding a heteroscedastic error component is implemented. The model was run for a number of different species. Model selection was investigated with regard to choice of environmental covariates. Also, comparison is made with the now popular Maxent approach, though the latter is much more limited with regard to inference. In fact, inference such as investigation of species richness immediately follows from our modeling framework.

**The Quality of Surveys**

**Jelke Bethlehem (University of Amsterdam)**

Survey research is a type of research where data is collected by asking questions of a sample of persons from a population. On the basis of the collected data, conclusions are drawn about the population as a whole. The question is whether this is always a scientifically sound research method. The presentation starts with a historical overview of how surveys became a reliable research instrument. Nowadays there are two important issues that may affect the quality of survey results. One of these is the phenomenon of nonresponse. It often causes estimates to be biased. Many countries suffer from increasing nonresponse rates and this makes the problem more serious. The nonresponse problem is discussed, along with some techniques that may help to reduce it. However, success is not guaranteed. The fast development of the Internet has led to a new form of survey, the web survey. Almost anybody can run a web survey. There are many examples of badly designed web surveys. They suffer from methodological problems like under-coverage, self-selection and measurement problems. The effects of under-coverage and self-selection on the quality of survey outcomes are discussed in more detail.

**Introducing Bayes Factors**

**Leonhard Held (University of Zurich)**

Statistical inference is traditionally taught exclusively from a frequentist perspective. If Bayesian approaches are discussed, then only Bayesian parameter estimation is described, perhaps showing the formal equivalence of a Bayesian reference analysis and the frequentist approach. However, the Bayesian approach to hypothesis testing and model selection is intrinsically different from a classical approach and offers key insights into the nature of statistical evidence. In this talk I will give an elementary introduction to Bayesian model selection with Bayes factors. I will then summarize important results on the relationship between P-values and Bayes factors. A universal finding is that the evidence against a simple null hypothesis is not nearly as strong as the P-value might suggest. If time permits, I will also describe more recent work on Bayesian model selection in generalized additive models using hyper-g priors.
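
One well-known calibration of this kind is the lower bound -e p log(p) on the Bayes factor in favour of the null hypothesis, valid for P-values below 1/e (Sellke, Bayarri and Berger, 2001). The sketch below evaluates that bound and the implied posterior probability of the null; it is offered as an illustration of the general point and is not necessarily the result presented in the talk. The function names and the prior probability of 0.5 are assumptions made for this example.

```python
import math


def min_bayes_factor(p_value):
    """Lower bound -e * p * log(p) on the Bayes factor for the null.

    The bound applies for p < 1/e; for larger P-values it is 1.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    if p_value >= 1.0 / math.e:
        return 1.0
    return -math.e * p_value * math.log(p_value)


def posterior_null_probability(bayes_factor, prior_null=0.5):
    """Posterior probability of the null from a Bayes factor (null versus
    alternative) and an assumed prior probability of the null."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        bf = min_bayes_factor(p)
        post = posterior_null_probability(bf)
        print(f"p = {p}: min BF = {bf:.3f}, min posterior P(H0) = {post:.3f}")
```

Under these assumptions, a P-value of 0.05 corresponds to a posterior probability of the null of at least about 0.29, far from the 0.05 a naive reading might suggest.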

**Markov equivalences for loopless mixed graphs**

**Kayvan Sadeghi (Oxford University)**

In this talk we describe a class of graphs with three types of edges, called loopless mixed graphs (LMGs). The class of LMGs contains almost all known classes of graphs used in the literature of graphical Markov models as its subclasses. We discuss motivations behind using LMGs, and define a unifying interpretation of independence structure for LMGs. We also propose four problems regarding Markov equivalences for LMGs and tackle some of them.

Last updated 21 September 2011.