Poisson regression: a new method for transforming residuals
Giuseppe Rossi, Laura Marchini, Marco Marchi
Residuals are used for graphical and numerical examination of the adequacy of regression models. In generalized linear models, residuals are transformed into standard normal residuals, with an approximately normal distribution, zero mean and unit variance. For Poisson regression, several functional forms of standard normal residuals are available. In this paper we present a new form of standard normal residual and compare it with those more commonly used. Performance is compared by computer simulations. An application to real data in the field of environmental epidemiology is shown.
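The commonly used forms against which the new residual is compared can be sketched as follows (a minimal illustration of the usual Pearson, deviance and Anscombe residuals; the paper's new transformation is not reproduced here):

```python
import numpy as np

def poisson_residuals(y, mu):
    """Common standardized residual forms for Poisson regression.

    y  : observed counts
    mu : fitted means from the model
    Returns Pearson, deviance and Anscombe residuals, each approximately
    N(0, 1) when the model is adequate and mu is not too small.
    """
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    # Pearson: raw residual scaled by the Poisson standard deviation sqrt(mu)
    pearson = (y - mu) / np.sqrt(mu)
    # Deviance: signed square root of each unit's deviance contribution
    # (the y*log(y/mu) term is taken as 0 when y == 0)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y / mu, 1.0)), 0.0)
    deviance = np.sign(y - mu) * np.sqrt(2.0 * (term - (y - mu)))
    # Anscombe: based on the variance-normalizing transformation t -> t^(2/3)
    anscombe = 1.5 * (y ** (2 / 3) - mu ** (2 / 3)) / mu ** (1 / 6)
    return pearson, deviance, anscombe
```

All three forms agree in sign and vanish when the observation equals the fitted mean; they differ mainly in how fast they approach normality for small counts.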
Small Area Estimation Using Spatial Information. The Rathbun Lake Watershed Case Study
Alessandra Petrucci, Nicola Salvati
The paper describes an application of a modified small area estimator to the data collected in the Rathbun Lake Watershed in Iowa (USA). Opsomer et al. (2003) estimated the average erosion per acre for 61 sub-watersheds within the study region using an empirical best linear unbiased predictor (EBLUP) and a composite estimator. The proposed methodology considers an EBLUP estimator with spatially correlated errors, taking into account the information provided by neighboring areas.
Small Area Estimation by Spatial Models: the Spatial Empirical Best Linear Unbiased Prediction (Spatial EBLUP)
The methods used for Small Area Estimation can be classified by the type of inference into design-based (direct domain estimators), model-assisted (synthetic and composite estimators) and model-based (small area models). This paper deals with small area models and introduces spatial dependence among the random area effects. Indeed, in most applications to environmental data it is more reasonable to assume that the random effects of neighboring areas are correlated, with the correlation decaying to zero as distance increases. To take the spatial dimension of the data into account, a model with spatially autocorrelated errors is implemented, a modified estimator (Spatial EBLUP) is provided and its Mean Square Error (MSE) estimator is obtained. The empirical analysis is carried out on simulated experiments, and the results show an appreciable improvement in the statistical properties of the small area estimators.
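The spatially autocorrelated error structure can be made concrete with a small sketch, assuming a simultaneous autoregressive (SAR) form for the area effects; the weight matrix and the value of rho below are illustrative, not taken from the paper:

```python
import numpy as np

def sar_random_effects_cov(W, rho, sigma2=1.0):
    """Covariance matrix of area effects under a SAR(1) specification,
        u = rho * W @ u + eps,   eps ~ N(0, sigma2 * I),
    so that Var(u) = sigma2 * inv((I - rho*W).T @ (I - rho*W))."""
    n = W.shape[0]
    A = np.eye(n) - rho * W
    return sigma2 * np.linalg.inv(A.T @ A)

# Toy example: four areas on a line with row-standardized contiguity weights
W = np.array([[0., 1., 0., 0.],
              [.5, 0., .5, 0.],
              [0., .5, 0., .5],
              [0., 0., 1., 0.]])
V = sar_random_effects_cov(W, rho=0.4)   # neighboring areas become correlated
V0 = sar_random_effects_cov(W, rho=0.0)  # rho = 0 recovers independent effects
```

With rho = 0 the covariance reduces to sigma2 * I, i.e. the ordinary (non-spatial) EBLUP random effects; a positive rho induces positive covariance between contiguous areas, which is the extra information the Spatial EBLUP exploits.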
OOBN For Forensic Identification via a Search in a Database of DNA profiles
David Cavallini, Fabio Corradi, Giuseppina Guagnano
In this paper we evaluate the evidence for pairs of competing and exhaustive hypotheses derived from a characteristic observed on a crime sample and on individuals contained in a database. The subject takes up a debate that has recently appeared in the literature concerning the appropriateness of different sets of hypotheses. First we illustrate the problem via a computationally efficient Bayesian network, obtained by transforming some recognized context-specific independencies into conditional independencies. Then an Object Oriented Bayesian Network representation is proposed, first for a generic characteristic and then for inheritable DNA traits. In this respect we show how to use the Object Oriented Bayesian Network to evaluate the hypotheses that some individuals genetically related to the database members are the donors of the crime sample.
wp2004_05 (Statistics for experimental and technological research)
On-line Bayesian estimation of AR signals in symmetric alpha-stable noise
Marco J. Lombardi, Simon J. Godsill
In this paper we propose an on-line Bayesian filtering and smoothing method for time series models with heavy-tailed alpha-stable noise, with a particular focus on TVAR models. Alpha-stable processes have been shown in the past to be a good model for many naturally occurring noise sources. We first point out how a filter that fails to take into account the heavy-tailed character of the noise performs poorly, and then examine how an alpha-stable based particle filter can be devised to overcome this problem. The filtering methodology is based on a scale mixture of normals (SMiN) representation of the alpha-stable distribution, which allows an efficient Rao-Blackwellised implementation within a conditionally Gaussian framework and requires no direct evaluation of the alpha-stable density, which is in general unavailable in closed form. The methodology is shown to work well, outperforming traditional Gaussian methods both on simulated data and on real audio data sets.
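The scale-mixture idea behind the conditionally Gaussian framework can be illustrated with the Student-t case, whose mixing law (inverse gamma) is easy to simulate; this is an illustrative stand-in, since the paper's mixing law is positive alpha-stable:

```python
import numpy as np

rng = np.random.default_rng(0)

# A heavy-tailed variate written as a Gaussian with a random variance:
# conditional on the mixing scale lam, everything is Gaussian, which is
# what allows Rao-Blackwellised (conditionally Gaussian) filtering.
# Here the mixing law is inverse gamma, giving a Student-t marginal;
# the paper uses a positive alpha-stable mixing law instead.
nu, n = 10.0, 500_000
lam = nu / rng.chisquare(nu, size=n)   # inverse-gamma mixing variances
x = rng.normal(size=n) * np.sqrt(lam)  # conditionally Gaussian draws

# The mixture reproduces the Student-t variance nu / (nu - 2)
print(x.var())  # close to 1.25 for nu = 10
```

In the particle filter, only the mixing scales need to be sampled; given them, the TVAR state can be integrated out exactly with Kalman recursions.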
wp2004_06 (Economic Statistics)
Forms of data aggregation and data mining
This paper deals with the use of aggregated data in the analysis of very large data sets. This idea apparently contrasts with the proper nature of data mining, which is mainly concerned with modelling and analysing data at the micro level. However, aggregation, smoothing and summarization are data-reduction approaches that are widely used in data mining. After defining aggregation, smoothing and summarization, the paper addresses the problems and consequences connected with the use of aggregate data in two situations: (1) the computation of dependence measures (we consider the correlation index in a simple linear model, and a measure of monotone dependence based on the mean difference); (2) the classification of units through linear discriminant analysis.
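The effect of aggregation on a dependence measure can be sketched with a small simulation (illustrative, not from the paper): averaging within groups removes much of the individual-level noise, so the correlation computed on group means is typically inflated relative to the micro-level one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Individual-level data with a weak linear relation
n_groups, group_size = 50, 200
x = rng.normal(size=n_groups * group_size)
y = 0.3 * x + rng.normal(size=x.size)

# Aggregate into groups of units with similar x (e.g. areas sorted on x):
# within-group noise averages out, between-group signal survives
order = np.argsort(x)
xg = x[order].reshape(n_groups, group_size).mean(axis=1)
yg = y[order].reshape(n_groups, group_size).mean(axis=1)

r_micro = np.corrcoef(x, y)[0, 1]
r_macro = np.corrcoef(xg, yg)[0, 1]
print(r_micro, r_macro)  # the aggregated correlation is markedly larger
```

This is the well-known ecological-correlation effect: the dependence measure computed on aggregates answers a different question from the one computed at the micro level.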
wp2004_07 (Statistics, Econometrics)
Indirect estimation of alpha-stable distributions and processes
Marco J. Lombardi, Giorgio Calzolari
The alpha-stable family of distributions constitutes a generalization of the Gaussian distribution, allowing for asymmetry and thicker tails. Its practical usefulness is coupled with a marked theoretical appeal, as it stems from a generalized version of the central limit theorem in which the assumption of finite variance is replaced by a less restrictive regularity assumption on the behavior of the tails. Estimation difficulties have however hindered its diffusion among practitioners. Since simulated values from alpha-stable distributions can be obtained straightforwardly, the indirect inference approach could prove useful in overcoming these estimation difficulties. In this paper we describe how to implement such a method using a skew-t distribution as the auxiliary model. The indirect inference approach is introduced in the setting of the estimation of the distribution parameters and then extended to linear time series models with alpha-stable disturbances. The performance of this estimation method is then assessed on simulated data. An application to time series models for the inflation rate concludes the paper.
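The straightforward simulation the authors rely on is usually done with the Chambers-Mallows-Stuck method; a minimal sketch for the symmetric case (beta = 0, unit scale) follows:

```python
import numpy as np

def rng_sas(alpha, size, rng):
    """Chambers-Mallows-Stuck simulator for a standard symmetric
    alpha-stable variate (beta = 0), 0 < alpha <= 2."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit exponential
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(1)
x = rng_sas(1.0, 200_000, rng)   # alpha = 1 reduces to the standard Cauchy
# For a standard Cauchy, P(|X| > 1) = 1/2
print(np.mean(np.abs(x) > 1))    # close to 0.5
```

Because draws like these are cheap, the structural (alpha-stable) model can be simulated repeatedly and matched to the auxiliary skew-t estimates, which is the essence of the indirect inference approach described in the paper.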
wp2004_08 (Statistics for experimental and technological research)
Chain Graphs for Multilevel Models
Anna Gottard, Carla Rampichini
The present work proposes a possible solution for extending graphical models to correlated data. In particular, the paper focuses on hierarchical data structures, considering two-level random intercept models. The proposed solution allows the existing chain graph theory to be used in a straightforward way. After a brief introduction to multilevel models and a description of the conditional independencies derived from the model, the paper defines chain graphs for multilevel models. Some examples illustrate the approach, and an application to real data shows the usefulness of the proposed models.
Inference from mitochondrial data in forensic identification
The strength of the evidence against an incriminated individual in cases of forensic identification is presented in the form of a likelihood ratio or its reciprocal (the profile match probability). This paper is concerned with methods for calculating the profile match probability, for personal identification, when the analysis is based on mitochondrial DNA. The current method for estimating the profile match probability is based on the frequency of sequences within databases. Such an estimate is not statistically rigorous, since a complete database has not yet been compiled. The aim of this paper is to develop a method for analysing the data that allows for the effect of the mutation process affecting the evolution of the mitochondrial DNA molecule, and for computing the profile match probability within a fully likelihood-based approach.
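As a point of reference, the database-counting approach criticized here is simple to state; the sketch below uses an add-one adjustment (one common variant, given here as an assumption rather than the exact estimator the paper discusses):

```python
def match_probability(x, n):
    """Naive database-counting estimate of a mitochondrial haplotype's
    match probability: x occurrences among n database profiles.
    The add-one adjustment guards against zero estimates for haplotypes
    not yet observed; estimates of this kind ignore the mutation process,
    which is the limitation the paper's likelihood-based method addresses."""
    return (x + 1) / (n + 1)

# A haplotype seen twice in a database of 1000 profiles
p = match_probability(2, 1000)
lr = 1 / p   # likelihood ratio reported as the reciprocal of the match probability
```

The weakness is visible immediately: the estimate depends entirely on the database at hand and treats sequences one mutation apart as unrelated.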
Small Area Estimation considering spatially correlated errors: the unit level random effects model
Alessandra Petrucci, Nicola Salvati
The paper describes an application of a modified small area estimator to the data collected in the Rathbun Lake Watershed in Iowa (USA). Opsomer et al. (2003) estimated the average erosion per acre for 61 sub-watersheds within the study region using an empirical best linear unbiased predictor (EBLUP). The proposed methodology considers an EBLUP estimator with spatially correlated errors, taking into account the information provided by neighboring areas.
Bayesian inference for alpha-stable distributions: a random walk MCMC approach
Marco J. Lombardi
The alpha-stable family of distributions constitutes a generalization of the Gaussian distribution, allowing for asymmetry and thicker tails. Its practical usefulness is coupled with a marked theoretical appeal, given that it stems from a generalized version of the central limit theorem in which the assumption of finite variance is replaced by a less restrictive regularity assumption on the behavior of the tails. The absence of a closed-form density function and the associated estimation difficulties have however hindered its diffusion among practitioners. In this paper I introduce a novel approach for Bayesian inference in the setting of alpha-stable distributions that resorts to an FFT of the characteristic function in order to approximate the likelihood function; the posterior distributions of the parameters are then produced via a random walk MCMC method. Unlike other MCMC schemes proposed in the literature, the proposed approach does not require auxiliary variables and is therefore less computationally expensive, especially when large sample sizes are involved. A simulation exercise highlights the empirical properties of the sampler; an application on audio noise data demonstrates how this estimation scheme performs in practical applications.
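The inversion idea can be sketched with plain quadrature (the paper's FFT scheme amounts to evaluating the same inversion integral on a whole grid of points at once); in the symmetric, standard-scale case the characteristic function is exp(-|t|^alpha):

```python
import numpy as np

def sas_pdf(x, alpha, t_max=50.0, n=20_000):
    """Density of the standard symmetric alpha-stable law, obtained by
    numerically inverting its characteristic function exp(-|t|**alpha):
        f(x) = (1/pi) * integral_0^inf cos(t*x) * exp(-t**alpha) dt.
    Plain trapezoidal quadrature is used here for clarity."""
    t = np.linspace(0.0, t_max, n)
    phi = np.exp(-t ** alpha)
    x = np.atleast_1d(np.asarray(x, float))
    g = np.cos(np.outer(x, t)) * phi                       # one row per x
    dt = t[1] - t[0]
    return dt * (g.sum(axis=1) - 0.5 * (g[:, 0] + g[:, -1])) / np.pi

# The two closed-form members of the family provide a check:
print(sas_pdf(0.0, 1.0)[0])   # Cauchy: f(0) = 1/pi  ~ 0.3183
print(sas_pdf(0.0, 2.0)[0])   # N(0, 2): f(0) = 1/(2*sqrt(pi)) ~ 0.2821
```

Once the density can be evaluated this way at every data point, the likelihood is available and a random walk Metropolis step over the parameters is standard, with no auxiliary variables needed.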
wp2004_12 (Econometrics, Statistics)
A Comparison of Complementary Automatic Modeling Methods: RETINA and PcGets
Teodosio Perez-Amaral, Giampiero M. Gallo, Hal White
In Perez-Amaral, Gallo, and White (2003), the authors proposed an automatic predictive modelling tool called Relevant Transformation of the Inputs Network Approach (RETINA). It is designed to embody flexibility (using nonlinear transformations of the predictors of interest), selective search within the range of possible models, control of collinearity, out-of-sample forecasting ability, and computational simplicity. In this paper we compare the characteristics of RETINA with PcGets, a well-known automatic modeling method proposed by David Hendry. We point out similarities, differences, and complementarities of the two methods. In an example using US telecommunications demand data we find that RETINA can improve both in- and out-of-sample over the usual linear regression model, and over some models suggested by PcGets. Thus, both methods are useful components of the modern applied econometrician’s automated modelling tool chest.
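The main RETINA ingredients (flexible transformations of the inputs plus out-of-sample validation) can be sketched with a toy example; this is a simplification, not the published algorithm, and the data-generating process and selection rule below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Data in which the true relation involves a product of regressors,
# which a plain linear model cannot capture
n, p = 400, 4
X = rng.normal(size=(n, p))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

# Candidate regressors: raw inputs plus all pairwise products
cands = [X[:, i] for i in range(p)]
cands += [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]

tr, te = slice(0, 200), slice(200, 400)   # training / hold-out split

def ols_mse(Z):
    """Fit OLS on the training half, score on the hold-out half."""
    beta, *_ = np.linalg.lstsq(Z[tr], y[tr], rcond=None)
    return np.mean((y[te] - Z[te] @ beta) ** 2)

# Selective search (simplified): keep the three candidates most
# correlated with y on the training sample
corr = [abs(np.corrcoef(c[tr], y[tr])[0, 1]) for c in cands]
top3 = np.argsort(corr)[-3:]

mse_linear = ols_mse(np.column_stack([np.ones(n), X]))
mse_sel = ols_mse(np.column_stack([np.ones(n)] + [cands[k] for k in top3]))
print(mse_linear, mse_sel)   # the transformed model fits far better out of sample
```

The hold-out comparison is the point: the transformed model is kept only because it improves out-of-sample fit, not merely in-sample fit.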
Published in Econometric Theory, Volume 21, pp. 262-277, 2005.
Handling Manipulated Evidence
Gianluca Baio, Fabio Corradi
Bayesian Networks have been advocated as a useful tool to describe the relations of dependence/independence among random variables and relevant hypotheses in a crime case. Moreover, they have been applied to help the investigator structure the problem and weigh the observed evidence, typically with respect to the hypothesis of guilt of a suspect. In this paper we describe a model to handle the possibility that one or more pieces of evidence have been manipulated in order to mislead the investigation. The method is based on causal inference models, although it is developed in a different, specific framework.
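The role of a manipulation node can be sketched with a minimal three-variable network; all structure and numbers below are hypothetical, not the paper's model:

```python
# H = suspect is the source, M = the trace was planted/manipulated,
# E = a trace matching the suspect is found. Manipulation makes a match
# likely even when H is false, which tempers the evidential weight of E.
pH, pM = 0.5, 0.05          # hypothetical priors
pE = {                      # hypothetical P(E = match | H, M)
    (1, 1): 0.99, (1, 0): 0.95,
    (0, 1): 0.90, (0, 0): 0.01,
}

# Posterior P(H = 1 | E = match) by full enumeration over (H, M)
joint = {(h, m): (pH if h else 1 - pH) * (pM if m else 1 - pM) * pE[(h, m)]
         for h in (0, 1) for m in (0, 1)}
post_H = sum(v for (h, _), v in joint.items() if h) / sum(joint.values())
print(round(post_H, 3))
```

With the manipulation prior set to zero the posterior is about 0.99; allowing even a 5% chance of manipulation tempers it to about 0.95, which is the qualitative effect such a model is meant to capture.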
Last updated 4 June 2010.