A misspecification test for finite-mixture logistic models for clustered binary and ordered responses

Bartolucci, Francesco and Bacci, Silvia and Pigini, Claudia (2015): A misspecification test for finite-mixture logistic models for clustered binary and ordered responses.

This is the latest version of this item.

Abstract

An alternative to assuming normally distributed random effects when modeling clustered binary and ordered responses is to use a finite mixture. This approach gives rise to a flexible class of generalized linear mixed models for item responses, multilevel data, and longitudinal data. A misspecification test for these finite-mixture models is proposed, based on comparing the Marginal and the Conditional Maximum Likelihood estimates of the fixed effects, as in the Hausman test. The asymptotic distribution of the test statistic is derived; it is of chi-squared type, with a number of degrees of freedom equal to the number of covariates that vary within the cluster. The test is simple to perform and may also be used to select the number of components of the finite mixture when this number is unknown. The approach is illustrated by a series of simulations and three empirical examples covering the main fields of application.
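The comparison described in the abstract can be sketched as a generic Hausman-type computation. This is an illustration only, not the authors' implementation: the function names, the numerical values, and the choice of the variance of the difference as V_cml - V_mml (the classical Hausman form) are assumptions, and the paper may estimate that variance differently. Only the coefficients of covariates that vary within the cluster enter the comparison, which is why their number gives the degrees of freedom.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("variance matrix of the difference is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then apply Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def hausman_test(beta_cml, beta_mml, var_diff):
    """Return (statistic, df, p_value) for d' W^{-1} d with d = beta_cml - beta_mml.

    var_diff (W) is the estimated variance of d; how to estimate it is an
    assumption left to the caller.
    """
    d = [c - m for c, m in zip(beta_cml, beta_mml)]
    stat = sum(di * xi for di, xi in zip(d, solve(var_diff, d)))
    df = len(d)
    return stat, df, chi2_sf(stat, df)


# Hypothetical estimates for two covariates that vary within the cluster.
beta_cml = [0.52, -0.30]
beta_mml = [0.50, -0.25]
v_cml = [[0.010, 0.0010], [0.0010, 0.012]]
v_mml = [[0.006, 0.0005], [0.0005, 0.008]]
# Assumed classical Hausman variance of the difference: V_cml - V_mml.
var_diff = [[c - m for c, m in zip(rc, rm)] for rc, rm in zip(v_cml, v_mml)]

stat, df, p_value = hausman_test(beta_cml, beta_mml, var_diff)
print(stat, df, p_value)
```

A large p-value gives no evidence against the finite-mixture specification, while a small one suggests misspecification. In finite samples the difference V_cml - V_mml need not be positive definite, which is one reason a different variance estimator may be preferable in practice.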

Item Type:	MPRA Paper
Original Title:	A misspecification test for finite-mixture logistic models for clustered binary and ordered responses
Language:	English
Keywords:	Generalized Linear Mixed Models, Hausman Test, Item Response Theory, Latent Class model, Longitudinal data, Multilevel data
Subjects:	C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General
	C - Mathematical and Quantitative Methods > C2 - Single Equation Models ; Single Variables > C23 - Panel Data Models ; Spatio-temporal Models
	C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection
Item ID:	64787
Depositing User:	Dr Claudia Pigini
Date Deposited:	05 Jun 2015 13:38
Last Modified:	05 Oct 2019 11:50
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/64787

Available Versions of this Item

A misspecification test for finite-mixture logistic models for clustered binary and ordered responses. (deposited 08 May 2015 15:50)
- A misspecification test for finite-mixture logistic models for clustered binary and ordered responses. (deposited 05 Jun 2015 13:38) [Currently Displayed]

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
