Bartolucci, Francesco and Bacci, Silvia and Pigini, Claudia (2015): A misspecification test for finite-mixture logistic models for clustered binary and ordered responses.
This is the latest version of this item.

Abstract
An alternative to using normally distributed random effects in modeling clustered binary and ordered responses is based on using a finite mixture. This approach gives rise to a flexible class of generalized linear mixed models for item responses, multilevel data, and longitudinal data. A misspecification test for these finite-mixture models is proposed, based on comparing the Marginal and the Conditional Maximum Likelihood estimates of the fixed effects, as in Hausman's test. The asymptotic distribution of the test statistic is derived; it is of chi-squared type, with a number of degrees of freedom equal to the number of covariates that vary within the cluster. The test is simple to perform and may also be used to select the number of components of the finite mixture when this number is unknown. The approach is illustrated by a series of simulations and three empirical examples covering the main fields of application.
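The comparison described in the abstract can be sketched as follows. This is a minimal illustration of the classical Hausman form of the statistic, in which the variance of the difference is taken as the CML covariance minus the MML covariance; the variance estimator actually derived in the paper may differ. The function names (`solve`, `chi2_sf`, `hausman_test`) and all numbers are hypothetical, and the inputs are assumed to be restricted to the covariates that vary within the cluster.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("variance difference is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... and apply the incomplete-gamma recurrence up to a = df / 2.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def hausman_test(beta_cml, beta_mml, cov_cml, cov_mml):
    """Classical Hausman statistic d' (V_cml - V_mml)^{-1} d (an assumption,
    not necessarily the estimator used in the paper)."""
    d = [c - m for c, m in zip(beta_cml, beta_mml)]
    v = [[vc - vm for vc, vm in zip(rc, rm)]
         for rc, rm in zip(cov_cml, cov_mml)]
    stat = sum(di * xi for di, xi in zip(d, solve(v, d)))
    df = len(d)  # number of within-cluster-varying covariates
    return stat, df, chi2_sf(stat, df)

# Hypothetical estimates for two within-cluster-varying covariates.
stat, df, p_value = hausman_test(
    beta_cml=[0.50, -0.20],
    beta_mml=[0.45, -0.18],
    cov_cml=[[0.010, 0.0], [0.0, 0.008]],
    cov_mml=[[0.006, 0.0], [0.0, 0.005]],
)
print(f"statistic = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

A small p-value would point to misspecification of the finite-mixture model, since the two estimators should agree asymptotically when the mixture is correctly specified. In finite samples the plain variance difference need not be positive definite, which is one reason an implementation may prefer a different variance estimator.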
Item Type:  MPRA Paper 

Original Title:  A misspecification test for finite-mixture logistic models for clustered binary and ordered responses 
Language:  English 
Keywords:  Generalized Linear Mixed Models, Hausman Test, Item Response Theory, Latent Class model, Longitudinal data, Multilevel data 
Subjects:
C - Mathematical and Quantitative Methods > C1 - Econometric and Statistical Methods and Methodology: General > C12 - Hypothesis Testing: General
C - Mathematical and Quantitative Methods > C2 - Single Equation Models; Single Variables > C23 - Panel Data Models; Spatio-temporal Models
C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection
Item ID:  64787 
Depositing User:  Dr Claudia Pigini 
Date Deposited:  05. Jun 2015 13:38 
Last Modified:  05. Jun 2015 13:41 
URI:  https://mpra.ub.uni-muenchen.de/id/eprint/64787 
Available Versions of this Item

A misspecification test for finite-mixture logistic models for clustered binary and ordered responses. (deposited 08. May 2015 15:50)
A misspecification test for finite-mixture logistic models for clustered binary and ordered responses. (deposited 05. Jun 2015 13:38) [Currently Displayed]