Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control

Morozov, Sergei and Mathur, Sudhanshu (2009): Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control.

Preview

PDF
MPRA_paper_30298.pdf
Download (583kB) | Preview

Abstract

The rapid growth in the performance of graphics hardware, coupled with recent improvements in its programmability has lead to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are athirst of computing power to help overcome dual curses of complexity and dimensionality. We investigate if computational economics may benefit from new tools on a case study of imperfect information dynamic programming problem with learning and experimentation trade-off, that is, a choice between controlling the policy target and learning system parameters. Specifically, we use a model of active learning and control of a linear autoregression with the unknown slope that appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massively-parallel computation using graphics processors (GPUs). Our findings are cautiously optimistic in that new tools let us easily achieve a factor of 15 performance gain relative to an implementation targeting single-core processors. Further gains up to a factor of 26 are also achievable but lie behind a learning and experimentation barrier of their own. Drawing upon experience with CUDA programming architecture and GPUs provides general lessons on how to best exploit future trends in parallel computation in economics.

Item Type:	MPRA Paper
Original Title:	Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control
Language:	English
Keywords:	Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation
Subjects:	C - Mathematical and Quantitative Methods > C6 - Mathematical Methods ; Programming Models ; Mathematical and Simulation Modeling > C63 - Computational Techniques ; Simulation Modeling C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology ; Computer Programs > C80 - General
Item ID:	30298
Depositing User:	Sergei Morozov
Date Deposited:	03 May 2011 17:05
Last Modified:	27 Sep 2019 21:41
References:	Abdelkhalek, A., Bilas, A., & Michaelides, A. (2001). Parallelization, optimization and performance analysis of portfolio choice models. In Proceedings of the 2001 International Conference on Parallel Processing (ICPP01). Aldrich, E. M., Fern´andez-Villaverde, J., Gallant, A. R., & Rubio-Ram´ırez, J. F. (2011). Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors. Journal of Economic Dynamics and Control, 35(3), 386–393. Backus, J. (1977). Can programming be liberated from the von Neumann style? Communications of the ACM, 21(8), 613– 641. Bargen, B. & Donnelly, P. (1998). Inside DirectX. Microsoft Programming Series. Redmond, Washington: Microsoft Press. Barros, K. (2009). CUDA tricks and computational physics. Guest lecture, Course 6.963, Massachusetts Institute of Technology. Barros, K., Babich, R., Brower, R., Clark, M. A., & Rebbi, C. (2008). Blasting through lattice calculations using CUDA. Technical report, The XXVI International Symposium on Lattice Field Theory, Williamsburg, Virginia, USA. Beck, G. W. & Wieland, V. (2002). Learning and control in a changing economic environment. Journal of Economic Dynamics and Control, 26(9-10), 1359–1377. Bennemann, C., Beinker, M. W., Eggloff, D., & Guckler, M. (2008). Teraflops for games and derivatives pricing. Wilmott Magazine, 50–54. Bertsekas, D. P. (2001). Dynamic Programming and Optimal Control (2 ed.)., volume 2. Nashua, NH: Athena Scientific. Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control (3 ed.)., volume 1. Nashua, NH: Athena Scientific. Boyd, C. & Schmit, M. (2009). Direct3d 11 compute shader. WinHEC 2008 presentation. Boyer, M., Tarjan, D., Acton, S. T., & Skadron, K. (2009). Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors. In 23rd IEEE International Parallel and Distributed Processing Symposium, Rome, Italy. Brent, R. P. (1973). Algorithms for Minimization without Derivatives. Englewood Cliffs: Prentice-Hall. Brezzia, M. & Lai, T. L. (2002). Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control, 27(1), 87–108. Buck, I. (2005). Stream computing on graphics hardware. Ph.D. Dissertation, Stanford University, Stanford, CA, USA. Chandra, R., Menon, R., Dagum, L., & Kohr, D. (2000). Parallel Programming in OpenMP. Morgan Kaufmann. Chapman, B., Jost, G., &van der Paas, R. (2007). Using OpenMP: Portable SharedMemory Parallel Programming. Scientific and Engineering Computation Series. Cambridge, MA: MIT Press. Chong, Y. Y. & Hendry, D. F. (1986). Econometric evaluation of linear macro-economic models. The Review of Economic Studies, 53(4), 671–690. Coleman,W. J. (1992). Solving nonlinear dynamic models on parallel computers. Discussion Paper 66, Institute for Empirical Macroeconomics, Federal Reserve Bank of Minneapolis. Conn, A. R., Gould, N. I. M., & Toint, P. L. (1997). A globally convergent augmented Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds. Mathematics of Computation, 66(217), 261–288. Creel, M. (2005). User-friendly parallel computations with econometric examples. Computational Economics, 26(2), 107– 128. Creel, M. & Goffe, W. L. (2008). Multi-core CPUs, clusters, and grid computing: A tutorial. Computational Economics, 32(4), 353–382. Doornik, J. A., Hendry, D. F., & Shephard, N. (2002). Computationally-intensive econometrics using a distributed matrixprogramming language. Philosophical Transactions of the Royal Society of London, Series A, 360, 1245–1266. Doornik, J. A., Shephard, N., & Hendry, D. F. (2006). Parallel computation in econometrics: A simplified approach. In E. J. Kontoghiorghes (Ed.), Handbook of Parallel Computing and Statistics, Statistics: a series of TEXTBOOKS and MONOGRAPHS chapter 15, (pp. 449–476). Boca Raton, FL USA: Chapman & Hall/CRC. Easley, D. & Kiefer, N. M. (1988). Controlling a stochastic process with unknown parameters. Econometrica, 56(5), 1045– 1064. Ferrall, C. (2003). Solving finite mixture models in parallel. Computational Economics 0303003, EconWPA. Ghuloum, A., Sprangle, E., Fang, J., Wu, G., & Zhou, X. (2007). Ct: A flexible parallel programming model for Tera-scale architectures. White paper, Intel Corporation. Goffe, W. L., Ferrier, G. D., & Rogers, J. (1994). Global optimization of statistical functions with simulated annealing. Journal of Econometrics, 60(1-2), 65–99. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley. Gropp, W., Lusk, E., Skjellum, A., & Thakur, R. (1999). Using MPI: Portable Parallel programming with Message Passing Interface (2 ed.). Scientific and Engineering Computation Series. Cambridge, MA: MIT Press. Harris, M. (2005). Mapping computational concepts to GPUs. In M. Parr (Ed.), GPU Gems, volume 2 chapter 31, (pp. 493–500). Addison-Wesley. Horst, R. & Paradalos, P. M. (Eds.). (1994). Handbook of Global Optimization, volume 1 of Nonconvex Optimization and Its Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers. Howes, L. & Thomas, D. (2007). Efficient random number generation and application using CUDA. In H. Nguyen (Ed.), GPU Gems, volume 3 chapter 37, (pp. 805–830). Addison-Wesley. Hwu, W. W. (Ed.). (2011). GPU Computing Gems Emerald Edition. Applications of GPU Computing Series. Morgan Kaufmann. Judge, G. G., Lee, T.-C., & Hill, R. C. (1988). Introduction to the Theory and Practice of Econometrics (2 ed.). New York: Wiley. Kendrick, D. A. (1978). Non-convexities from probing an adaptive control problem. Journal of Economic Letters, 1(4), 347–351. Kessenich, J., Baldwin, D., & Rost, R. (2006). The OpenGL Shading Language. Khronos Group. Kirk, D. B. & Hwu,W.W. (2010). Programming Massively Parallel Processors: A Hands-on Approach. Morgan-Kaufmann. Kirkpatrick, S., Gelatt Jr., C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680. Kola, K., Chhabra, A., Thulasiram, R. K., & Thulasiraman, P. (2006). A software architecture framework for on-line option pricing. In Proceedings of the 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA-06), volume 4330 of Lecture Notes in Computer Science, (pp. 747–759)., Sorrento, Italy. Springer-Verlag. Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the Nelder-Meade simplex method in low dimensions. SIAM Journal of Optimization, 9(1), 112–147. Lee, A., Yau, C., Giles, M. B., Doucet, A., & Holmes, C. C. (2010). On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics, 19(4), 769– 789. Lee, V. W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A. D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., & Dubey, P. (2010). Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Computer Architecture News, 38(3), 451–460. Lewis, R. M. & Torczon, V. (2002). A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM Journal on Optimization, 12(4), 1075–1089. Lindoff, B. & Holst, J. (1997). Suboptimal dual control of stochastic systems with time-varying parameters. mimeo, Department of Mathematical Statistics, Lund Institute of Technology. Morozov, S. (2008). Learning and active control of stationary autoregression with unknown slope and persistence. Working paper, Stanford University. Morozov, S. (2009a). Bayesian active learning and control with uncertain two-period impulse response. Working paper, Stanford University. Morozov, S. (2009b). Limits of passive learning in the Bayesian active learning control of drifting coefficient regression. Working paper, Stanford University. Munshi, A., Gaster, B., Mattson, T. G., Fung, J., & Ginsburg, D. (2011). OpenCL Programming Guide. Reading, Massachusetts: Addison-Wesley Professional. Nagurney, A., Takayama, T., & Zhang, D. (1995). Massively parallel computation of spatial price equilibrium problems as dynamical systems. Journal of Economic Dynamics and Control, 19(1-2), 3–37. Nagurney, A. & Zhang, D. (1998). A massively parallel implementation of discrete-time algorithm for the computation of dynamic elastic demand and traffic problems modeled as projected dynamical systems. Journal of Economic Dynamics and Control, 22(8-9), 1467–1485. Nelder, J. A. & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313. NVIDIA (2009a). CUBLAS Library (2.3 ed.). Santa Clara, CA: NVIDIA Corporation. NVIDIA (2009b). CUFFT Library (2.3 ed.). Santa Clara, CA: NVIDIA Corporation. NVIDIA (2009c). NVIDIA CUDA C Programming Best Practices Guide (CUDA Toolkit 2.3 ed.). Santa Clara, CA 95050: NVIDIA Corporation. NVIDIA (2010). CUDA CUSPARSE Library. Santa Clara, CA: NVIDIA Corporation. NVIDIA (2011). NVIDIA CUDA C Programming Guide. (3.2 ed.). Santa Clara, CA: NVIDIA Corporation. Paradalos, P. M. & Romeijn, H. E. (Eds.). (2002). Handbook of Global Optimization, volume 2 of Nonconvex Optimization and Its Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers. Pflug, G. C. & Swietanowski, A. (2000). Selected parallel optimization methods for financial management under uncertainty. Parallel Computing, 26(1), 3–25. Porteus, E. L. & Totten, J. (1978). Accelerated computation of the expected discounted returns in aMarkov chain. Operations Research, 26(2), 350–358. Prescott, E. C. (1972). The multi-period control problem under uncertainty. Econometrica, 40(6), 1043–1058. Rahman, M. R., Thulasiram, R. K., & Thulasiraman, P. (2002). Forecasting stock prices using neural networks on a Beowulf cluster. In Akl, S. & T.Gonzalez (Eds.), Proceedings of the Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002), (pp. 470–475)., Cambridge, MA. MIT Press. Sanders, J. & Kandrot, E. (2010). CUDA by Example: An Introduction to General-Purpose GPU Programming. Upper Saddle River, NJ: Addison-Wesley Professional. Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., & Hanrahan, P. (2008). Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics, 27(3), 15 pages. Sims, C. A., Waggoner, D. F., & Zha, T. (2008). Methods for inference in large-scale multiple equation Markov-switching models. Journal of Econometrics, 142(2), 255–274. Spendley, W., Hext, G. R., & Himsworth, F. R. (1962). Sequential application of simplex designs in optimization and evolutionary design. Technometrics, 4, 441–461. Svensson, L. E. O. (1997). Optimal inflation targets, ’conservative’ central banks, and linear inflation contracts. American Economic Review, 87(1), 96–114. Swann, C. A. (2002). Maximum likelihood estimation using parallel computing: An introduction to MPI. Computational Economics, 19(2), 145–178. The Portland Group (2010a). CUDA Fortran Programming Guide and Reference. The Portland Group. Release 2011. The Portland Group (2010b). PGI Fortran & C Accelerator Compilers and Programming Model. The Portland Group. Version 1.3. Tibbits, M. M., Haran, M., & Liechty, J. C. (2009). Parallel multivariate slice sampling. Working paper, Pennsylvania State University. Tomov, S., Dongarra, J., Volkov, V., & Demmel, J. (2009). MAGMA Library. University of Tennessee, Knoxville, and University of California, Berkeley. Wieland, V. (2000). Learning by doing and the value of optimal experimentation. Journal of Economic Dynamics and Control, 24(4), 501–535. Zabinsky, Z. B. (2005). Stochastic Adaptive Search for Global Optimization. Springer. Zenios, S. A. (1999). High-performance computing in finance: The last 10 years and the next. Parallel Computing, 25(13-14), 2149–2175.
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/30298

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.

View Item