Travaglini, Guido (2011): Principal Components and Factor Analysis. A Comparative Study.

Abstract
A comparison between Principal Component Analysis (PCA) and Factor Analysis (FA) is performed both theoretically and empirically for a random matrix X:(n x p), where n is the number of observations and both dimensions may be very large. The comparison surveys the asymptotic properties of the factor scores, of the singular values and of all other elements involved, as well as the characteristics of the methods used to detect the true dimension of X. In particular, the norms of the FA scores, whatever their number, and the norms of their covariance matrix are shown to be always smaller than their PCA counterparts and to decay faster as n goes to infinity. This causes the FA scores, when used as regressors and/or instruments, to produce more efficient slope estimators in instrumental variable estimation. Moreover, compared with PCA, the FA scores and factors exhibit a higher degree of consistency, because the difference between the estimated values and their true counterparts is smaller, and so is the corresponding variance. Finally, FA usually selects a much less cumbersome number of scores than PCA, greatly facilitating the search for and identification of the common components of X.
Item Type:  MPRA Paper 

Original Title:  Principal Components and Factor Analysis. A Comparative Study. 
Language:  English 
Keywords:  Principal Components, Factor Analysis, Matrix Norm 
Subjects:  C - Mathematical and Quantitative Methods > C5 - Econometric Modeling > C52 - Model Evaluation, Validation, and Selection; C - Mathematical and Quantitative Methods > C0 - General > C02 - Mathematical Methods; C - Mathematical and Quantitative Methods > C0 - General > C01 - Econometrics
Item ID:  35486 
Depositing User:  Guido Travaglini 
Date Deposited:  20. Dec 2011 05:46 
Last Modified:  11. Feb 2013 22:21 
URI:  http://mpra.ub.uni-muenchen.de/id/eprint/35486