**Abstract 2003**

**Analyzing a randomized trial on breast self-examination with noncompliance and missing outcomes**

**Fabrizia Mealli, Guido W. Imbens, Salvatore Ferro, Annibale Biggeri**

Recently, instrumental variables methods have been used to address non-compliance in randomized experiments. Complicating such analyses is the frequent presence of missing data. The standard model for missing data, Missing At Random (MAR), has some unattractive features in this context. In this paper we compare MAR-based estimates of the Complier Average Causal Effect (CACE) with an estimator based on an alternative, non-ignorable model for the missing-data process, developed by Frangakis and Rubin (1999). We also introduce a new missing-data model that, like the Frangakis-Rubin model, is specially suited for models with instrumental variables, but makes different substantive assumptions. We analyze these issues in the context of a randomized trial of breast self-examination (BSE). In the study, two methods of teaching BSE, consisting of either mailed information about BSE (standard treatment) or attendance at a course involving theoretical and practical sessions (new treatment), were compared with the aim of assessing whether teaching programs could increase BSE practice and improve examination skills. The study was affected by the two sources of bias mentioned above: only 55% of the women assigned to the new treatment complied with their assignment, and 35% of the women did not respond to the post-test questionnaire. The results suggest that the MAR assumption is less plausible here than some of the alternatives.
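The CACE under noncompliance can be illustrated with the standard instrumental-variables (Wald) estimator. The sketch below uses simulated data with a 55% compliance rate, as in the BSE trial; it deliberately ignores missing outcomes, which is precisely the complication the paper addresses, and all variable names and numbers are hypothetical.

```python
# Wald (IV) estimator of the CACE on simulated data with one-sided noncompliance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, n)              # randomized assignment (the instrument)
complier = rng.random(n) < 0.55        # 55% compliers, as in the BSE trial
d = z * complier                       # treatment received: one-sided noncompliance
y = 2.0 * d + rng.normal(0.0, 1.0, n)  # hypothetical outcome, true CACE = 2.0

itt = y[z == 1].mean() - y[z == 0].mean()          # intention-to-treat effect
first_stage = d[z == 1].mean() - d[z == 0].mean()  # compliance rate among assigned
cace = itt / first_stage                           # Wald (IV) estimator of the CACE
```

Because only compliers are moved by the assignment, the intention-to-treat effect is diluted by the compliance rate, and dividing by the first stage recovers the effect for compliers.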

**The "Virtual Laboratory in Probability and Statistics": a resource for teaching statistics**

**Marco J. Lombardi, Federico M. Stefanini**

The Italian version of the Virtual Laboratory in Probability and Statistics by Kyle Siegrist is introduced. The project is freely available at the URL http://www.ds.unifi.it/VL.

**A Flexible Tool for Model Building: the Relevant Transformation of the Inputs Network Approach (RETINA)**

**Teodosio Perez-Amaral, Giampiero M. Gallo, Hal White**

A new method, called the relevant transformation of the inputs network approach (RETINA), is proposed as a tool for model building and selection. It is designed to improve on some of the shortcomings of neural networks. RETINA has the flexibility of neural network models, the concavity of the likelihood in the weights of the usual linear models, and the ability to identify a parsimonious set of attributes that are likely to be relevant for predicting out-of-sample outcomes. It achieves flexibility by considering transformations of the original inputs; it splits the sample into three disjoint subsamples, sorts the candidate regressors by a saliency feature, chooses the models in subsample 1, uses subsample 2 for parameter estimation, and uses subsample 3 for cross-validation. It is modular, can be used as a data exploratory tool, and is computationally feasible on personal computers. In tests on simulated data, it achieves high rates of success when the sample size or the R² is large enough. As our experiments show, it is superior to alternative procedures such as the non-negative garrote and backward stepwise regression.

Published in *Oxford Bulletin of Economics and Statistics*, Volume 65-s1, pp. 821-838, 2003.
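The sample-splitting scheme described above can be sketched as follows. This is our simplified illustration on simulated data, not the published algorithm: it uses absolute correlation with the outcome as a stand-in for the saliency feature, and a small set of power/product transformations of the inputs.

```python
# Simplified RETINA-style search: transform inputs, rank by saliency on
# subsample 1, estimate on subsample 2, cross-validate on subsample 3.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 900
X = rng.uniform(0.5, 2.0, (n, 3))                          # three raw inputs
y = 1.0 + 2.0 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, n)  # truth uses a product term

# Candidate regressors: transformations x_i^a * x_j^b with a, b in {-1, 0, 1}
cands, names = [], []
for i, j in itertools.combinations_with_replacement(range(3), 2):
    for a, b in itertools.product((-1, 0, 1), repeat=2):
        if (a, b) == (0, 0) or (i == j and a + b == 0):
            continue  # skip constant candidates
        cands.append(X[:, i] ** a * X[:, j] ** b)
        names.append(f"x{i}^{a} * x{j}^{b}")
C = np.column_stack(cands)

s1, s2, s3 = np.split(np.arange(n), 3)  # three disjoint subsamples

def ols(Z, t):
    A = np.column_stack([np.ones(len(t)), Z])
    beta, *_ = np.linalg.lstsq(A, t, rcond=None)
    return beta

def mse(beta, Z, t):
    A = np.column_stack([np.ones(len(t)), Z])
    return float(np.mean((t - A @ beta) ** 2))

# Saliency (stand-in): absolute correlation with y, computed on subsample 1
saliency = [abs(np.corrcoef(C[s1, k], y[s1])[0, 1]) for k in range(C.shape[1])]
order = np.argsort(saliency)[::-1]

# Nested models ordered on subsample 1; parameters estimated on subsample 2;
# model size chosen by out-of-sample error on subsample 3
best_m, best_err = None, np.inf
for m in range(1, 6):
    cols = order[:m]
    beta = ols(C[np.ix_(s2, cols)], y[s2])
    err = mse(beta, C[np.ix_(s3, cols)], y[s3])
    if err < best_err:
        best_m, best_err = m, err
```

Because the true regressor x0*x1 is among the candidate transformations, the cross-validated error of the selected model should be close to the noise variance.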

**Observational data and optimal experimental design discriminating between more than two models**

**Rossella Berni**

In this work we focus on observational data and on the use of large data sets to implement an experimental design without additional runs, exploiting the theory of optimal designs for an efficient use of these data. More specifically, the proposed procedure consists of several steps and extends a previous work. Furthermore, the final optimal design can be obtained by discriminating between two or more models. In the latter case the suggested algorithm must be revised in order to discriminate between several equally close models, assigning a weight to each one. The final goal is an experimental design that can be regarded as optimal from the point of view of sequential optimality.

Published in *Statistica applicata*, Volume 17, Issue 1, pp. 20, 2003.

**Estimates of the short term effects of air pollution in Italy using alternative modelling techniques**

**Annibale Biggeri, Michela Baccini, Gabriele Accetta, Corrado Lagazio, Joel Schwartz**

Recently, serious criticism was raised about the use of standard statistical software to fit Generalized Additive Models (GAM) to epidemiological time series data. Inappropriate settings of the convergence parameters in the backfitting algorithm (as implemented in S-Plus) result in inaccurate inference about the effect of linear covariates. Moreover, standard errors are underestimated, since only the linear component of the smoother(s) is included in the variance-covariance matrix computation. An S-Plus macro for approximate standard errors has recently been proposed. We analysed the association between PM10 and mortality/hospital admissions in the Italian Meta-analysis of Short-term effects of Air pollutants (MISA), using GAM with penalized regression splines fitted by the direct method in the R software (GAM-R), which correctly computes the variance-covariance matrix. A comparison with the default GAM with smoothing splines fitted via backfitting in S-Plus (GAM-S) and with Generalized Linear Models with natural cubic splines (GLM+NS) is provided. GLM+NS and GAM-R give similar results. For total mortality, GLM+NS and GAM-R gave overall percent increases of 0.98 (95 percent confidence interval: 0.35, 1.61; random effects model) and 1.04 (0.41, 1.67), respectively, per 10 µg/m3 of PM10, compared to the GAM-S estimate of 1.24 (0.63, 1.86). The GLM+NS and GAM-R standard errors are consistently higher than the GAM-S ones.
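The percent increases quoted above come from log-linear (Poisson regression) coefficients. A minimal sketch of the conversion, with hypothetical values of the coefficient and its standard error rather than the actual MISA estimates:

```python
# Convert a hypothetical log-relative-risk coefficient (per 1 µg/m3 of PM10)
# into a percent increase per 10 µg/m3, with a Wald 95% confidence interval.
import math

beta, se = 0.00104, 0.00032  # hypothetical coefficient and standard error
delta = 10                   # effects are reported per 10 µg/m3 of PM10

pct = 100 * (math.exp(delta * beta) - 1)               # percent increase
lo = 100 * (math.exp(delta * (beta - 1.96 * se)) - 1)  # lower 95% limit
hi = 100 * (math.exp(delta * (beta + 1.96 * se)) - 1)  # upper 95% limit
```

Underestimated standard errors, as with the default GAM-S fit, would shrink this interval without moving the point estimate, which is why the variance-covariance computation matters here.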

**A Multiple Indicators Model For Volatility Using Intra-Daily Data**

**Robert F. Engle, Giampiero M. Gallo**

Many ways exist to measure and model financial asset volatility. In principle, as the frequency of the data increases, the quality of forecasts should improve. Yet there is no consensus about a "true" or "best" measure of volatility. In this paper we propose to jointly consider absolute daily returns, the daily high-low range and daily realized volatility, and to develop a forecasting model based on their conditional dynamics. Since all are non-negative series, we develop a multiplicative error model whose estimator is consistent and asymptotically normal under a wide range of specifications for the error density function. The estimation results show significant interactions between the indicators. We also show that one-month-ahead forecasts match well (both in and out of sample) the market-based volatility measure provided by an average of implied volatilities of index options, as measured by the VIX.

Published in *Journal of Econometrics*, Volume 131, Issue 1-2, pp. 3-27, 2006.
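The multiplicative error structure can be sketched in its simplest univariate form. This is our illustration with hypothetical parameters, not the paper's multivariate specification: a non-negative series x_t = mu_t * eps_t, where mu_t follows a GARCH-like recursion and eps_t is a positive error with unit mean.

```python
# Simulate a univariate multiplicative error model (MEM).
import numpy as np

rng = np.random.default_rng(2)
omega, alpha, beta = 0.1, 0.2, 0.7  # hypothetical parameters, alpha + beta < 1
T = 5000
x = np.empty(T)
mu = np.empty(T)
mu[0] = omega / (1 - alpha - beta)  # start at the unconditional mean (= 1.0 here)
for t in range(T):
    if t > 0:
        mu[t] = omega + alpha * x[t - 1] + beta * mu[t - 1]  # GARCH-like recursion
    eps = rng.gamma(shape=4.0, scale=0.25)  # positive multiplicative error, unit mean
    x[t] = mu[t] * eps                      # non-negative observable
```

Because the error has unit mean, mu_t is the conditional expectation of x_t, which is what makes the same framework applicable to absolute returns, ranges and realized volatility alike.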

**A way to solve the indeterminate parameters problem**

**Marco Barnabani**

A solution to the indeterminate parameters problem can be obtained by forcing an asymptotic quadratic approximation of the log-likelihood to have a solution in a neighbourhood of the true parameter, through the definition of a modified (penalized) log-likelihood function. The maximizer of this function is consistent and asymptotically normally distributed, with a variance-covariance matrix approximated by the Moore-Penrose pseudoinverse of the information matrix. These properties allow one to construct a naive test in the sense of Durbin: a Wald-type test statistic with a "standard" distribution under both the null and the alternative hypothesis.
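The role of the Moore-Penrose pseudoinverse can be illustrated with a small numerical sketch, assuming a hypothetical singular information matrix; this is not the paper's derivation, and all numbers are invented.

```python
# Wald-type statistic with a singular information matrix, using the
# Moore-Penrose pseudoinverse as the approximate variance-covariance matrix.
import numpy as np

theta_hat = np.array([1.2, 0.8, 2.0])  # hypothetical estimate
theta_0 = np.array([1.0, 1.0, 2.0])    # null value
I_mat = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])    # singular: no information on the third parameter

V = np.linalg.pinv(I_mat)              # variance-covariance approximated by the pseudoinverse
diff = theta_hat - theta_0
W = diff @ np.linalg.pinv(V) @ diff    # Wald-type statistic
df = np.linalg.matrix_rank(I_mat)      # degrees of freedom = rank of the information matrix
```

The statistic only weighs deviations in the directions where the likelihood carries information; the indeterminate direction contributes nothing, which is what gives the test its "standard" chi-squared behaviour on the reduced degrees of freedom.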

**Bayesian Networks for Forensic Identification Via a Data Base Search of DNA Profiles**

**David Cavallini, Fabio Corradi**

In this paper we evaluate the evidence for pairs of competing and exhaustive hypotheses obtained by considering a characteristic observed on a crime sample and on the individuals contained in a database. The subject relates to a recent debate in the literature concerning the appropriateness of different sets of hypotheses. First we formulate the problem via a computationally efficient Bayesian Network (BN), obtained by transforming some recognized context-specific independencies into conditional independencies. The sets of hypotheses proposed in the literature are embedded in the BN, so that their role is better understood. The BN is first proposed for a generic dichotomous characteristic, but we are particularly interested in heritable DNA traits. In this respect we show how to use the BN to evaluate the hypotheses that some individuals who are genetically related to the members of the database are the donors of the crime sample.
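The kind of calculation such a network automates can be illustrated with a plain Bayes computation for the database-search problem, using hypothetical numbers; the paper's BN handles far richer dependence structures, including genetic relatedness.

```python
# After typing a crime-scene profile against N database profiles, exactly one
# individual matches. Compare "the matcher is the donor" (Hp) with "the donor
# is outside the database" (Hd); other database members are excluded by the data.
N = 1000            # database size (hypothetical)
gamma = 1e-6        # random-match probability of the observed profile
prior_in_db = 0.5   # hypothetical prior that the donor is in the database

# Likelihoods of the data "one match, N-1 exclusions":
lik_hp = (1 - gamma) ** (N - 1)          # the matcher is the donor
lik_hd = gamma * (1 - gamma) ** (N - 1)  # the match is coincidental

# Under Hp, each database member is a priori equally likely to be the donor
post = (prior_in_db / N * lik_hp) / (
    prior_in_db / N * lik_hp + (1 - prior_in_db) * lik_hd
)
```

Even with the prior spread over all N members, the tiny random-match probability makes the posterior for the matching individual overwhelming, which is the core of the debate about how the search hypotheses should be framed.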

wp2003_10 (Economic Statistics)

**Structural Economic Characteristics and Performance of Industry Sectors in Italy**

**Laura Grassini, Alessandro Viviani**

The present paper deals with a comparison of Italian industrial sectors in 2000, on the basis of a number of performance indicators. The analysis takes into account an important structural feature: firm size in terms of employment. A performance index is derived from an analysis of efficiency based on Data Envelopment Analysis (DEA), a method for estimating non-parametric, deterministic frontier functions.
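A DEA efficiency score solves one linear program per decision-making unit (DMU). Below is a minimal sketch of the input-oriented, constant-returns (CCR) formulation on toy data, with one input and one output for four hypothetical firms; this is an illustration of the technique, not the paper's analysis.

```python
# Input-oriented CCR DEA: each DMU's score is the smallest input contraction
# theta such that a convex-cone combination of all DMUs still dominates it.
import numpy as np
from scipy.optimize import linprog

X = np.array([2.0, 4.0, 4.0, 5.0])  # input, e.g. employment
Y = np.array([2.0, 4.0, 2.0, 3.0])  # output

def dea_input_efficiency(o):
    """min theta s.t. sum_j lam_j X_j <= theta X_o, sum_j lam_j Y_j >= Y_o."""
    n = len(X)
    c = np.zeros(n + 1)
    c[0] = 1.0                                   # minimize theta
    A_ub = np.vstack([
        np.concatenate(([-X[o]], X)),            # composite input <= theta * X_o
        np.concatenate(([0.0], -Y)),             # composite output >= Y_o
    ])
    b_ub = np.array([0.0, -Y[o]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1))  # theta and lambdas >= 0
    return res.fun

scores = [dea_input_efficiency(o) for o in range(len(X))]
```

Firms on the frontier (here the ones with the best output/input ratio) get a score of 1; the others get the proportional input reduction needed to reach it.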

**Fertility choices between economic constraints and value change: a survey of Florentine mothers**

**Letizia Mencarini, Silvana Salvini, M. Letizia Tanturri**

In this paper we present the main results of a survey carried out in 2002 on the fertility of a group of mothers living in Florence (Italy). The survey was conducted in the framework of a national research project titled "Low fertility between economic constraints and value changes", financed by M.U.R.S.T. in 2000 and aimed at outlining the causes and consequences of low fertility in Italy. It gathered information on mothers aged around 40 and their families. We examined the reproductive behaviour, family choices, values and attitudes, and educational and working careers of these women and their partners, as well as the changes in their economic status, gender roles and time use after a child's birth. The results are compared with those obtained through similar surveys carried out in four other Italian cities: Udine, Padua, Pesaro and Messina. On the one hand, Florence shows some typical characteristics of post-modern European fertility and family structure: the percentage of women in non-marital unions is higher than in the other urban contexts. On the other hand, it is similar to the other Italian cities, e.g. regarding the late transition to adulthood, made up of the typical sequential steps of leaving school and the parental household, entering the labour market, forming a marital union and, finally, assuming the responsibility of motherhood. Most of the women participated in the labour market, but the negative relationship between work and fertility persists. This is probably also due to the *asymmetric role-set* of couples: even when both partners work full-time, the woman experiences the "dual burden" of job and domestic roles, in particular child care.

Last updated 21 July 2010.