Ando, Tomohiro and Bai, Jushan (2021): Large-scale generalized linear longitudinal data models with grouped patterns of unobserved heterogeneity.
Abstract
This paper provides methods for flexibly capturing unobservable heterogeneity in longitudinal data whose outcomes follow an exponential family of distributions. The group memberships of individual units are left unspecified, and their heterogeneity is driven by group-specific unobservable structures as well as heterogeneous regression coefficients. We discuss a computationally efficient estimation method and derive the corresponding asymptotic theory, which includes the uniform consistency of the estimated group membership. To test for heterogeneity in the regression coefficients within groups, we propose a Swamy-type test that accounts for unobserved heterogeneity. We apply the proposed method to study the market structure of the taxi industry in New York City. Our method reveals important insights from large-scale longitudinal data consisting of over 450 million data points.
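The abstract does not state the form of the proposed Swamy-type test, so it cannot be reproduced here. As a rough illustration of the underlying idea only, the sketch below computes the classical Swamy (1970) dispersion statistic for slope homogeneity in a plain linear panel, without the grouped factor structure or the generalized linear setting of the paper. The function names `swamy_statistic` and `simulate_panel`, and the simulated data, are hypothetical and chosen purely for illustration.

```python
import numpy as np


def swamy_statistic(y_list, X_list):
    """Classical Swamy (1970) dispersion statistic for a linear panel.

    y_list[i] has shape (T_i,) and X_list[i] has shape (T_i, k).
    Returns (statistic, degrees_of_freedom); under slope homogeneity the
    statistic is approximately chi-squared with k * (N - 1) degrees of freedom.
    NOTE: illustrative linear version only, not the test proposed in the paper.
    """
    betas, weights = [], []
    for y, X in zip(y_list, X_list):
        T, k = X.shape
        XtX = X.T @ X
        beta_i = np.linalg.solve(XtX, X.T @ y)  # unit-by-unit OLS
        resid = y - X @ beta_i
        sigma2_i = resid @ resid / (T - k)      # unit-specific error variance
        betas.append(beta_i)
        weights.append(XtX / sigma2_i)          # inverse of Var(beta_i)
    # Precision-weighted pooled estimate of the common slope vector.
    W_sum = sum(weights)
    weighted_sum = sum(W @ b for W, b in zip(weights, betas))
    beta_pooled = np.linalg.solve(W_sum, weighted_sum)
    # Weighted squared deviations of unit estimates from the pooled estimate.
    stat = sum((b - beta_pooled) @ W @ (b - beta_pooled)
               for W, b in zip(weights, betas))
    k = X_list[0].shape[1]
    return float(stat), k * (len(y_list) - 1)


def simulate_panel(n_units, n_periods, coefs, seed=0):
    """Hypothetical data generator: intercept plus one Gaussian regressor."""
    rng = np.random.default_rng(seed)
    y_list, X_list = [], []
    for i in range(n_units):
        X = np.column_stack([np.ones(n_periods), rng.normal(size=n_periods)])
        y = X @ coefs[i] + rng.normal(size=n_periods)
        y_list.append(y)
        X_list.append(X)
    return y_list, X_list


N, T = 50, 100
common = np.tile([1.0, 2.0], (N, 1))  # every unit shares the same coefficients
mixed = common.copy()
mixed[N // 2:, 1] = -2.0              # half of the units get a different slope

stat_h0, df = swamy_statistic(*simulate_panel(N, T, common))
stat_h1, _ = swamy_statistic(*simulate_panel(N, T, mixed))
print(f"df={df}, homogeneous: {stat_h0:.1f}, heterogeneous: {stat_h1:.1f}")
```

When the slopes truly differ across units, the statistic should be far larger than its degrees of freedom, which is the signal a test of this type relies on. In the paper's setting the comparison is made within estimated groups and after accounting for unobserved heterogeneity, neither of which this sketch attempts.
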
Item Type: | MPRA Paper |
---|---|
Original Title: | Large-scale generalized linear longitudinal data models with grouped patterns of unobserved heterogeneity |
Language: | English |
Keywords: | Clustering; Factor analysis; Generalized linear models; Longitudinal data; Unobserved heterogeneity. |
Subjects: | C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C33 - Panel Data Models ; Spatio-temporal Models<br>C - Mathematical and Quantitative Methods > C3 - Multiple or Simultaneous Equation Models ; Multiple Variables > C38 - Classification Methods ; Cluster Analysis ; Principal Components ; Factor Models<br>C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C55 - Large Data Sets: Modeling and Analysis |
Item ID: | 111431 |
Depositing User: | Tomohiro Ando |
Date Deposited: | 13 Jan 2022 08:12 |
Last Modified: | 13 Jan 2022 08:12 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/111431 |