Weighted estimation in multilevel ordinal models to allow for informativeness of the sampling design
Leonardo Grilli, Monica Pratesi
Multilevel ordinal models are often fitted to survey data gathered with a complex multistage sampling design. However, if such a design is informative, in the sense that the inclusion probabilities depend on the response variable even after conditioning on the covariates, then standard maximum likelihood estimators are biased. In this paper, following the Pseudo Maximum Likelihood (PML) approach of Skinner (1989), we propose a probability-weighted estimation procedure for multilevel ordinal models which eliminates the bias generated by the informativeness of the design. The reciprocals of the inclusion probabilities at each sampling stage are used to weight the log-likelihood function, and the resulting weighted estimators are evaluated in a simulation study. Variance estimators are obtained by a bootstrap procedure. The weighted log-likelihood is maximized with the SAS procedure NLMIXED, which is based on adaptive Gaussian quadrature, and the bootstrap variance estimation is also implemented in SAS.
Published in Survey Methodology, Volume 30, Issue 1, pp. 93-103, 2004.
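To make the weighting concrete, here is a minimal Python sketch of a probability-weighted two-level log-likelihood for a cumulative probit model with a normal random intercept. It assumes one common placement of the weights: the cluster weight w_j = 1/pi_j multiplies each cluster's log-contribution, while the conditional unit weight w_ij = 1/pi_{i|j} enters as an exponent inside the integral over the random effect. The function names are illustrative, a fixed grid rule stands in for the adaptive Gaussian quadrature of NLMIXED, and weight scaling and the bootstrap are not shown.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_prob(y, eta, cuts):
    """P(Y = y | eta) in a cumulative probit with increasing thresholds `cuts`.

    Categories are coded 0..K-1, where K = len(cuts) + 1.
    """
    upper = norm_cdf(cuts[y] - eta) if y < len(cuts) else 1.0
    lower = norm_cdf(cuts[y - 1] - eta) if y > 0 else 0.0
    return upper - lower


def weighted_loglik(clusters, beta, cuts, sigma_u, n_grid=201, half_width=6.0):
    """Pseudo log-likelihood of a two-level cumulative probit model.

    clusters: list of (w_j, units), with units a list of (w_ij, x_ij, y_ij).
    The random intercept u_j = sigma_u * z, z ~ N(0, 1), is integrated out
    with a fixed grid on [-half_width, half_width] (a simplifying assumption).
    """
    step = 2.0 * half_width / (n_grid - 1)
    total = 0.0
    for w_j, units in clusters:
        integral = 0.0
        for g in range(n_grid):
            z = -half_width + g * step
            density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            # unit-level weights act as exponents on the conditional probabilities
            log_cond = sum(
                w_ij * math.log(category_prob(y, beta * x + sigma_u * z, cuts))
                for w_ij, x, y in units
            )
            integral += math.exp(log_cond) * density * step
        # cluster-level weight multiplies the cluster's log-contribution
        total += w_j * math.log(integral)
    return total
```

With all weights equal to 1 the function reduces to the ordinary unweighted marginal log-likelihood, which is a quick sanity check on any implementation of this kind.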
Grouped continuous and continuation ratio discrete time versions of the proportional hazard model with random effects: are they really equivalent?
When analyzing grouped-time survival data with a hierarchical structure, it is often appropriate to assume a random effects proportional hazards model for the latent continuous time and then derive the corresponding grouped-time model. There are two formally equivalent grouped-time versions of the proportional hazards model, obtained from different perspectives and known as the continuation ratio model (Kalbfleisch and Prentice, 1973) and the grouped continuous model (McCullagh, 1980). However, the two models require distinct estimation procedures and, more importantly, they differ substantially in how easily they extend to time-dependent covariates and/or non-proportional effects. The paper discusses these issues in the context of random effects models, illustrating the main points with an application to a complex data set on job opportunities for a cohort of graduates.
Published in Journal of the Royal Statistical Society, Series A, Volume 168, Issue 1, pp. 83-94, 2005.
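The formal equivalence can be checked numerically. The sketch below assumes time-constant covariates summarized by a single linear predictor xb and a complementary log-log link in both versions: the grouped continuous model sets P(T > k | x) = exp(-exp(theta_k + xb)), while the continuation ratio model uses the discrete hazard h_k = 1 - exp(-exp(alpha_k + xb)). Setting alpha_1 = theta_1 and alpha_k = log(exp(theta_k) - exp(theta_{k-1})) makes the two survival functions coincide. Random effects are left out (for a given cluster they can be thought of as absorbed into xb), and the function names are illustrative.

```python
import math


def survival_grouped_continuous(theta, xb):
    """P(T > k | x), k = 1..K-1, under the grouped continuous (cumulative cloglog) model."""
    return [math.exp(-math.exp(t + xb)) for t in theta]


def theta_to_alpha(theta):
    """Map strictly increasing cut-points theta_k to continuation ratio intercepts alpha_k."""
    alpha, prev = [], 0.0
    for t in theta:
        alpha.append(math.log(math.exp(t) - prev))  # fails if theta is not increasing
        prev = math.exp(t)
    return alpha


def survival_continuation_ratio(alpha, xb):
    """P(T > k | x) obtained by chaining the discrete cloglog hazards h_k."""
    surv, s = [], 1.0
    for a in alpha:
        hazard = 1.0 - math.exp(-math.exp(a + xb))
        s *= 1.0 - hazard
        surv.append(s)
    return surv
```

The mapping works only because the same xb enters every interval. If covariates change across intervals, the continuation ratio survival becomes exp(-sum_l exp(alpha_l + x_l beta)), which in general cannot be written as a single grouped continuous term; this is where the two versions part ways once time-dependent covariates are introduced.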
Analytic Hessian Matrices and the Computation of FIGARCH Estimates
Marco J. Lombardi, Giampiero M. Gallo
Long memory in conditional variance is one of the empirical features of most financial time series. One class of models suggested to capture this behavior is the so-called Fractionally Integrated GARCH (FIGARCH) process (Baillie, Bollerslev and Mikkelsen 1996), in which the ideas of fractional integration originally introduced by Granger (1980) and Hosking (1981) for processes in the mean are applied to a GARCH framework. In this paper we derive analytic expressions for the second-order derivatives of the log-likelihood function of FIGARCH processes, with a view to the advantages that can be gained in computational speed and estimation accuracy. The comparison is computationally intensive given the typical sample size of the time series involved and the way the likelihood function is built. An illustration is provided on exchange rate and stock index data.
Published in Statistical Methods and Applications, Volume 11, Issue 2, pp. 247-264, 2002.
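As background to why the likelihood is expensive to evaluate, the conditional variance of a FIGARCH(1, d, 1) is an ARCH(infinity) sum that must be truncated in practice. The Python sketch below assumes the Baillie, Bollerslev and Mikkelsen parametrization sigma^2_t = omega/(1 - beta) + lambda(L) eps^2_t with lambda(L) = 1 - (1 - phi L)(1 - L)^d / (1 - beta L), computes the truncated weights by recursion and filters the conditional variance. The truncation lag, the zero pre-sample values and the function names are illustrative assumptions; the analytic Hessian derived in the paper is not reproduced here.

```python
def frac_diff_coeffs(d, n):
    """Coefficients pi_0..pi_n of the expansion of (1 - L)^d."""
    pi = [1.0]
    for k in range(1, n + 1):
        pi.append(pi[-1] * (k - 1 - d) / k)
    return pi


def figarch_weights(d, phi, beta, n):
    """Truncated ARCH(inf) weights lambda_1..lambda_n of a FIGARCH(1, d, 1)."""
    pi = frac_diff_coeffs(d, n)
    # delta(L) = (1 - phi L)(1 - L)^d
    delta = [pi[0]] + [pi[k] - phi * pi[k - 1] for k in range(1, n + 1)]
    # c(L) = delta(L) / (1 - beta L), i.e. c_k = delta_k + beta * c_{k-1}
    c = [delta[0]]
    for k in range(1, n + 1):
        c.append(delta[k] + beta * c[-1])
    # lambda(L) = 1 - c(L) with c_0 = 1, so lambda_k = -c_k for k >= 1
    return [-ck for ck in c[1:]]


def figarch_variance(eps2, omega, d, phi, beta, n_lags=1000):
    """Conditional variances sigma^2_t, treating pre-sample squared shocks as zero."""
    lam = figarch_weights(d, phi, beta, n_lags)
    const = omega / (1.0 - beta)
    sigma2 = []
    for t in range(len(eps2)):
        s = const
        for k in range(1, min(t, n_lags) + 1):
            s += lam[k - 1] * eps2[t - k]
        sigma2.append(s)
    return sigma2
```

With d = 0 the weights collapse to the GARCH(1, 1) pattern lambda_k = (phi - beta) beta^(k-1), which is a convenient check; for 0 < d < 1 they decay hyperbolically, so long truncation lags, and hence heavy computation, are needed.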
wp2002_04 (Applied Statistics)
A Space-Time analysis of the relationship between Socioeconomic factors and Mortality for Lung Cancer
The relationship between socioeconomic factors and lung cancer mortality is investigated. A space-time hierarchical Bayesian model with time-dependent covariates is adopted to identify the lag with which socioeconomic factors affect lung cancer mortality. An application to male lung cancer mortality in Tuscany (Italy) over the period 1971-99 is provided. The results confirm an association between lung cancer mortality and socioeconomic factors, with a temporal lag of 10 to 15 years (latency time).
Published in Environmetrics, Volume 14, Issue 5, pp. 511-521, 2003.
Sample spaces and polymorphism in the HV1 and HV2 regions of mitochondrial DNA (Spazi campionari e polimorfismo nelle regioni HV1 e HV2 del DNA mitocondriale)
Federico M. Stefanini
Mitochondrial DNA evidence is becoming increasingly popular in forensic science. The efficient use of molecular data requires statistical models that explicitly acknowledge the hierarchical structure of the population of molecules. In this work a hierarchy of sample spaces is introduced to formalize the relationships among levels of observation in the HV1 and HV2 regions of mtDNA. The formalized hierarchy might be useful in eliciting prior information, in selecting biological assumptions and in developing MCMC code.
GARCH-based Volatility Forecasts for Market Volatility Indices
Massimiliano Cecconi, Giampiero M. Gallo, Marco J. Lombardi
Volatility forecasting is one of the main issues in the financial econometrics literature. Volatility measures may be derived from statistical models for conditional variance or from option prices. In recent times, indices have been suggested which summarize the implied volatility of widely traded market index options. One such index is the so-called VXN, an average of 30-day-ahead implied volatilities of the options written on the NASDAQ-100 Index. In this paper we show how forecasts obtained with traditional GARCH-type models can be used to forecast the volatility index VXN.
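One way to turn a GARCH forecast into a number comparable with VXN is to average the model's multi-step variance forecasts over the index horizon and annualize the result. The sketch below assumes a GARCH(1, 1) fitted to daily NASDAQ-100 returns expressed as decimals, 22 trading days as a stand-in for the 30 calendar days covered by VXN, and 252 trading days per year; these conventions and the function names are assumptions for illustration, not necessarily the mapping used in the paper.

```python
import math


def garch11_forecast_path(omega, alpha, beta, sigma2_next, horizon):
    """E_t[sigma^2_{t+h}] for h = 1..horizon in a covariance-stationary GARCH(1, 1)."""
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("alpha + beta must be below 1 for mean reversion")
    long_run = omega / (1.0 - persistence)
    # forecasts revert geometrically from sigma2_next towards the long-run variance
    return [long_run + persistence ** (h - 1) * (sigma2_next - long_run)
            for h in range(1, horizon + 1)]


def vxn_style_forecast(omega, alpha, beta, sigma2_next, horizon=22, periods_per_year=252):
    """Annualized volatility over the next `horizon` trading days, in percentage points."""
    path = garch11_forecast_path(omega, alpha, beta, sigma2_next, horizon)
    mean_daily_var = sum(path) / horizon
    return 100.0 * math.sqrt(periods_per_year * mean_daily_var)
```

When sigma2_next equals the long-run variance the forecast is flat, so the result is simply the annualized long-run volatility; a shock above that level raises the 22-day average by less than its one-day impact because of mean reversion.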
Graphical Duration Models
Graphical models use graphs to represent conditional independence relationships among random variables. This paper explores the possibility of defining a special type of graph that includes nodes representing point processes. This could be relevant to the analysis of event history data, i.e. data consisting of the times and types of observed events together with explanatory variables observed on a number of individuals; survival analysis and duration models are particular cases. The idea consists of representing a whole process by a single node. The resulting models will be called graphical duration models.
wp2002_08 (Applied Statistics)
Alternative specifications of bivariate multilevel probit ordinal response models
Leonardo Grilli, Carla Rampichini
In recent years several methods for the analysis of multivariate multilevel ordinal data have been proposed (Muthén, 1994; Rabe-Hesketh et al., 2001; Mazzolli, 2001; Lillard and Panis, 2000). The present paper highlights the interpretation of the variance-covariance parameters of the assumed multivariate distribution of the latent variables. Moreover, under the hypothesis of a multivariate Gaussian distribution, the paper illustrates some alternative specifications of the model, proposed in order to use estimation algorithms already implemented in existing statistical software.
Published in Journal of Educational and Behavioral Statistics, Volume 28, Issue 1, pp. 31-44, 2003.
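To illustrate how the variance-covariance parameters can be read, the sketch below assumes one common parametrization of a bivariate two-level probit model: each latent response is the sum of a cluster-level random effect and a unit-level error, the unit-level errors have unit variances (the usual probit normalization) and correlation rho_e, and the two levels are independent. Under these assumptions it returns the intraclass correlations, the cluster-level correlation and the total correlation between the two latent responses. The function name is hypothetical, and the alternative specifications discussed in the paper may normalize differently.

```python
import math


def latent_correlations(var_u1, var_u2, cov_u12, rho_e):
    """Summaries of a bivariate two-level probit with unit-level error variances fixed at 1."""
    return {
        # share of latent variance due to the cluster level, per response
        "icc1": var_u1 / (var_u1 + 1.0),
        "icc2": var_u2 / (var_u2 + 1.0),
        # correlation between the two cluster-level random effects
        "corr_level2": cov_u12 / math.sqrt(var_u1 * var_u2),
        # correlation between the two latent responses of the same unit
        "corr_total": (cov_u12 + rho_e) / math.sqrt((var_u1 + 1.0) * (var_u2 + 1.0)),
    }
```

Because the total correlation mixes both levels, it can be much weaker than the cluster-level correlation when the unit-level errors are only weakly correlated, which is why the components are worth reporting separately.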
wp2002_09 (Applied Statistics)
The temporal dimension in data bases
The advantages brought by the proper management of information in terms of quality, integrity, and robustness in data manipulation are widely known. This is particularly true for statisticians, who must grapple with multidimensional data, complex query semantics, and large quantities of data to be processed. In this context, an appropriate approach to temporal information is especially important: dynamic analyses are involved in many statistical fields, and information systems must be able to cope with the complexity of temporal logic. In particular, statistical sources must be structured so that their information content can be fully exploited and so that they can be linked to other sources while preserving temporal coherence. The purpose of this paper is to reflect upon the benefits that a correct approach to information management can bring to statistical analysis, with particular attention to the problem of storing temporal data.
wp2002_10 (Applied Statistics)
An integrated information system for monitoring working conditions: the case of the 'Firenze-Bologna' high-speed railway section (Un sistema informativo integrato per il monitoraggio delle condizioni di lavoro. Il caso della Tratta Alta Velocità 'Firenze-Bologna')
The aim of this work is to design and build a statistical information system for documenting and monitoring risks and activities during the construction of the new high-speed railway line between Florence and Bologna (TAV). After a first phase in which the whole information system was modelled, the memory system was designed and a statistical database was constructed. Quality-oriented protocols and a statistical workbench integrating the databases with statistical packages have been set up.
Real Exchange Rates: Theory and Empirical Evidence (Tassi di Cambio Reale: Teoria ed Evidenza Empirica)
The main problem with the literature on the economics of real exchange rates is not the absence of persuasive theoretical explanations, but their fragmentation. Through a concise but comprehensive overview of the theoretical and empirical research carried out so far, this paper tries to provide useful hints for a future attempt to develop a general and coherent explanation of the dynamics of real exchange rates.
wp2002_12 (Applied Statistics)
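User satisfaction with the emergency service (118): preliminary analysis and questionnaire development through a pilot survey (La soddisfazione dell'utente del servizio di emergenza (118). Analisi preliminare e implementazione del questionario tramite indagine pilota)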
Rossella Berni, Annibale Biggeri, Gianna Terni, Giovanni Bertini
In this paper we describe the results of a pilot survey investigating customer satisfaction with, and the quality of, an emergency medical service (EMS) system (house calls). More precisely, the aim of this work is to validate the measures obtained through the questionnaire. The statistical methods applied are descriptive indices, to evaluate the association between questions, and factor analysis, to verify the correspondence between the questions and the theoretical dimensions of quality assumed when the questionnaire was designed. Unfortunately, because of the small number of respondents, confirmatory factor analysis could not be performed; nevertheless, the results are good enough to propose a final questionnaire, considerably shorter than the pilot version, above all in the section evaluating service quality.
Wald-based Approach With Singular Information Matrix
When the information matrix is singular, the asymptotic properties of the maximum likelihood estimator are unclear and hypothesis testing based on this estimator is difficult to pursue. We propose an estimator based on the maximization of a modified (penalized) log-likelihood function. We show that this estimator is consistent and asymptotically normally distributed, with a variance-covariance matrix approximated by the Moore-Penrose pseudoinverse of the information matrix. These properties allow one to construct a Wald-type test statistic which has a chi-square distribution under both the null and the alternative hypotheses. Some examples show the relative simplicity of the proposed solution.
Volatility Estimation via Hidden Markov Models
Alessandro Rossi, Giampiero M. Gallo
In this paper we suggest a convenient way to obtain parameter estimates of a discrete-state hidden Markov volatility process within a framework consistent with observed option prices and stochastic volatility. Relative to similar proposals, we simplify model estimation by resorting to a parametric approximation of the model in a maximum likelihood context. We show how correlation between returns and volatility innovations can be easily accommodated within this framework. Empirical applications illustrate model search strategies for the S&P 500 stock index, comparing performance with a standard GARCH model.
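For readers less familiar with the machinery, the likelihood of a discrete-state hidden Markov volatility model is usually evaluated with the forward recursion. The Python sketch below assumes zero-mean Gaussian returns whose variance depends on the hidden state; it omits the return-volatility correlation, the link to option prices and the parametric approximation proposed in the paper, and the function names are illustrative.

```python
import math


def normal_logpdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def hmm_loglik(returns, variances, trans, init):
    """Log-likelihood of zero-mean returns under a discrete-state hidden Markov volatility model.

    variances[s]: return variance in state s.
    trans[r][s]: P(S_t = s | S_{t-1} = r); each row sums to 1.
    init[s]: P(S_1 = s).
    """
    n_states = len(variances)
    # first observation: prior state probabilities times state densities
    alpha = [init[s] * math.exp(normal_logpdf(returns[0], variances[s])) for s in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    for r_t in returns[1:]:
        # predict the state distribution, then update with the new return
        pred = [sum(alpha[r] * trans[r][s] for r in range(n_states)) for s in range(n_states)]
        alpha = [pred[s] * math.exp(normal_logpdf(r_t, variances[s])) for s in range(n_states)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

Maximizing this function over the state variances and transition probabilities, for example with a generic numerical optimizer, gives maximum likelihood estimates; rescaling at each step keeps the recursion numerically stable on long return series.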
wp2002_15 (Applied Statistics)
Centre Sampling for Estimating Elusive Population Size
Monica Pratesi, Emilia Rocco
The estimation of the size of an elusive population is a problem frequently addressed in many fields of application. This paper proposes a sampling strategy for estimating the population size, named centre sampling, and evaluates its properties under a design-based approach. The estimator is admissible and consistent, and the design is measurable. Expressions for the variance of the population size estimator and for its unbiased sample estimator are also given. The strategy is applied both to a simulated population and to data collected with a CATI survey of the users of some libraries of the University of Florence.
wp2002_16 (Economic Statistics)
Evaluating the Efficiency of Human Capital formation in the Italian University: Evidence from Florence
Guido Ferrari, Tiziana Laureti
The purpose of this paper is to apply a generalized version of the Farrell measure of technical efficiency to estimate the output efficiency of human capital formation at the University of Florence, using Data Envelopment Analysis (DEA) on a selected set of inputs and outputs. We also use the Program Evaluation procedure, in an attempt to apportion the variation in efficiency between factors outside the graduates' control and factors under their control.
wp2002_17 (Economic Statistics)
Cost analysis of the services provided by clinical laboratories (Analisi dei costi delle prestazioni dei laboratori di analisi)
Laura Grassini, Maurizio Borsotti
The paper deals with an efficiency analysis of the public clinical laboratories operating in Tuscany. The efficiency of each laboratory is evaluated on the basis of three macro-processes, identified through a cluster analysis that aggregates micro-processes (the single analyte, 'analita', or the specific clinical test) with a similar structure of direct costs. To measure efficiency, we followed the free disposal hull approach, using data on total direct costs (input) and a 'weighted number of clinical exams' (output). Aggregating the individual micro-processes (more than 100) required an aggregation criterion for the output of the resulting macro-processes. Evaluating the inputs of the macro-processes caused fewer problems, since the inputs are expressed in monetary terms.
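With a single cost input and a single output, the free disposal hull score can be computed without linear programming: each laboratory is compared with every observed laboratory producing at least as much output, and its efficiency is the lowest cost among them divided by its own cost. The sketch below assumes input orientation and one output per run (for example, one macro-process at a time); the function and variable names are hypothetical.

```python
def fdh_input_efficiency(costs, outputs):
    """Input-oriented FDH efficiency scores in (0, 1] for one input and one output."""
    scores = []
    for cost_i, out_i in zip(costs, outputs):
        # observed units producing at least as much output as unit i
        # (unit i itself always qualifies, so the list is never empty)
        peer_costs = [cost_j for cost_j, out_j in zip(costs, outputs) if out_j >= out_i]
        scores.append(min(peer_costs) / cost_i)
    return scores
```

For example, with costs [10, 8, 12] and outputs [100, 120, 90] the first laboratory scores 0.8: another observed laboratory produces at least as much output at 80% of its cost. Unlike DEA, FDH does not assume convexity, so each unit is compared only with actually observed peers.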
wp2002_18 (Applied Statistics)
Coalescence times and forensic identification problems
Paola Berchialla, Federico M. Stefanini
The interest in using maternally inherited mitochondrial DNA (mtDNA) in forensic identification problems is due to the possibility of typing sequences from very small or degraded biological samples. In this paper we look at the distribution of pairwise differences to investigate the information carried by coalescence times. The analysis of real data suggests that the distribution of pairwise coalescence times contains information suited to forensic identification purposes. Moreover, such information could be used as a tool to assess the level of human genetic diversity between populations. The method is illustrated using data from the Mitochondrial DNA Concordance database.
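The distribution of pairwise differences (the mismatch distribution) is straightforward to tabulate from aligned sequences. The sketch below assumes sequences of equal length from the HV1 or HV2 region, counts every mismatched site equally and ignores gaps, ambiguity codes and mutation-rate heterogeneity; turning these counts into coalescence times requires a mutation model that is not shown. The function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(sequences):
    """Mismatch distribution: {number of differing sites: number of sequence pairs}."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in combinations(sequences, 2):
        # Hamming distance between the two aligned sequences
        counts[sum(1 for x, y in zip(a, b) if x != y)] += 1
    return dict(sorted(counts.items()))
```

For the three toy sequences "ACGT", "ACGA" and "TCGA", two pairs differ at one site and one pair at two sites.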
Inflation Differentials before and after the EMU
This paper examines regional inflation divergence within the European EMU, aiming to characterize the properties of inflation differentials. The empirical evidence suggests that a process of price level convergence in the EMU is well under way.
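As a simple illustration of the quantities involved, inflation differentials can be measured as each region's deviation from the cross-regional mean in a given year, and their cross-sectional standard deviation tracks dispersion over time. The sketch below uses an unweighted mean, a population standard deviation and hypothetical input names; the paper's own measures and weighting may differ.

```python
import math


def inflation_differentials(rates_by_year):
    """Per-year deviations from the unweighted cross-regional mean and their dispersion.

    rates_by_year: {year: [inflation rate of each region]}.
    Returns {year: (differentials, standard_deviation)}.
    """
    result = {}
    for year, rates in sorted(rates_by_year.items()):
        mean = sum(rates) / len(rates)
        diffs = [r - mean for r in rates]
        sd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
        result[year] = (diffs, sd)
    return result
```

A falling standard deviation over the years is consistent with, though not by itself proof of, convergence.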
Labor-Cost Effects on Relative Prices between Regions of a Monetary Union: Implications for the EMU
Three industrial organization (IO) models suggested by Dornbusch (1987) are adapted here to study the labor-cost effects on relative prices of tradable goods between the regions of a monetary union. The assumption of imperfect and segmented goods and labor markets makes the analysis best suited to a monetary union such as the present European EMU. We find that in this context the type and extent of the effects studied depend on differences in the degree of competition within the same industry across countries, on the absolute or relative number of foreign and domestic firms in each country's market, and on product substitutability. Simple simulations show the potential of IO models in a dynamic framework.
Last updated 29 September 2010.