Morozov, Sergei and Mathur, Sudhanshu (2009): Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control.
This is the latest version of this item.

PDF
MPRA_paper_30298.pdf Download (583kB)  Preview 
Abstract
The rapid growth in the performance of graphics hardware, coupled with recent improvements in its programmability has lead to its adoption in many nongraphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are athirst of computing power to help overcome dual curses of complexity and dimensionality. We investigate if computational economics may benefit from new tools on a case study of imperfect information dynamic programming problem with learning and experimentation tradeoff, that is, a choice between controlling the policy target and learning system parameters. Specifically, we use a model of active learning and control of a linear autoregression with the unknown slope that appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massivelyparallel computation using graphics processors (GPUs). Our findings are cautiously optimistic in that new tools let us easily achieve a factor of 15 performance gain relative to an implementation targeting singlecore processors. Further gains up to a factor of 26 are also achievable but lie behind a learning and experimentation barrier of their own. Drawing upon experience with CUDA programming architecture and GPUs provides general lessons on how to best exploit future trends in parallel computation in economics.
Item Type:  MPRA Paper 

Original Title:  Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control 
Language:  English 
Keywords:  Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation 
Subjects:  C  Mathematical and Quantitative Methods > C6  Mathematical Methods ; Programming Models ; Mathematical and Simulation Modeling > C63  Computational Techniques ; Simulation Modeling C  Mathematical and Quantitative Methods > C8  Data Collection and Data Estimation Methodology ; Computer Programs > C80  General 
Item ID:  30298 
Depositing User:  Sergei Morozov 
Date Deposited:  03 May 2011 17:05 
Last Modified:  27 Sep 2019 21:41 
References:  Abdelkhalek, A., Bilas, A., & Michaelides, A. (2001). Parallelization, optimization and performance analysis of portfolio choice models. In Proceedings of the 2001 International Conference on Parallel Processing (ICPP01). Aldrich, E. M., Fern´andezVillaverde, J., Gallant, A. R., & RubioRam´ırez, J. F. (2011). Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors. Journal of Economic Dynamics and Control, 35(3), 386–393. Backus, J. (1977). Can programming be liberated from the von Neumann style? Communications of the ACM, 21(8), 613– 641. Bargen, B. & Donnelly, P. (1998). Inside DirectX. Microsoft Programming Series. Redmond, Washington: Microsoft Press. Barros, K. (2009). CUDA tricks and computational physics. Guest lecture, Course 6.963, Massachusetts Institute of Technology. Barros, K., Babich, R., Brower, R., Clark, M. A., & Rebbi, C. (2008). Blasting through lattice calculations using CUDA. Technical report, The XXVI International Symposium on Lattice Field Theory, Williamsburg, Virginia, USA. Beck, G. W. & Wieland, V. (2002). Learning and control in a changing economic environment. Journal of Economic Dynamics and Control, 26(910), 1359–1377. Bennemann, C., Beinker, M. W., Eggloff, D., & Guckler, M. (2008). Teraflops for games and derivatives pricing. Wilmott Magazine, 50–54. Bertsekas, D. P. (2001). Dynamic Programming and Optimal Control (2 ed.)., volume 2. Nashua, NH: Athena Scientific. Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control (3 ed.)., volume 1. Nashua, NH: Athena Scientific. Boyd, C. & Schmit, M. (2009). Direct3d 11 compute shader. WinHEC 2008 presentation. Boyer, M., Tarjan, D., Acton, S. T., & Skadron, K. (2009). Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors. In 23rd IEEE International Parallel and Distributed Processing Symposium, Rome, Italy. Brent, R. P. (1973). Algorithms for Minimization without Derivatives. Englewood Cliffs: PrenticeHall. Brezzia, M. & Lai, T. L. (2002). Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control, 27(1), 87–108. Buck, I. (2005). Stream computing on graphics hardware. Ph.D. Dissertation, Stanford University, Stanford, CA, USA. Chandra, R., Menon, R., Dagum, L., & Kohr, D. (2000). Parallel Programming in OpenMP. Morgan Kaufmann. Chapman, B., Jost, G., &van der Paas, R. (2007). Using OpenMP: Portable SharedMemory Parallel Programming. Scientific and Engineering Computation Series. Cambridge, MA: MIT Press. Chong, Y. Y. & Hendry, D. F. (1986). Econometric evaluation of linear macroeconomic models. The Review of Economic Studies, 53(4), 671–690. Coleman,W. J. (1992). Solving nonlinear dynamic models on parallel computers. Discussion Paper 66, Institute for Empirical Macroeconomics, Federal Reserve Bank of Minneapolis. Conn, A. R., Gould, N. I. M., & Toint, P. L. (1997). A globally convergent augmented Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds. Mathematics of Computation, 66(217), 261–288. Creel, M. (2005). Userfriendly parallel computations with econometric examples. Computational Economics, 26(2), 107– 128. Creel, M. & Goffe, W. L. (2008). Multicore CPUs, clusters, and grid computing: A tutorial. Computational Economics, 32(4), 353–382. Doornik, J. A., Hendry, D. F., & Shephard, N. (2002). Computationallyintensive econometrics using a distributed matrixprogramming language. Philosophical Transactions of the Royal Society of London, Series A, 360, 1245–1266. Doornik, J. A., Shephard, N., & Hendry, D. F. (2006). Parallel computation in econometrics: A simplified approach. In E. J. Kontoghiorghes (Ed.), Handbook of Parallel Computing and Statistics, Statistics: a series of TEXTBOOKS and MONOGRAPHS chapter 15, (pp. 449–476). Boca Raton, FL USA: Chapman & Hall/CRC. Easley, D. & Kiefer, N. M. (1988). Controlling a stochastic process with unknown parameters. Econometrica, 56(5), 1045– 1064. Ferrall, C. (2003). Solving finite mixture models in parallel. Computational Economics 0303003, EconWPA. Ghuloum, A., Sprangle, E., Fang, J., Wu, G., & Zhou, X. (2007). Ct: A flexible parallel programming model for Terascale architectures. White paper, Intel Corporation. Goffe, W. L., Ferrier, G. D., & Rogers, J. (1994). Global optimization of statistical functions with simulated annealing. Journal of Econometrics, 60(12), 65–99. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: AddisonWesley. Gropp, W., Lusk, E., Skjellum, A., & Thakur, R. (1999). Using MPI: Portable Parallel programming with Message Passing Interface (2 ed.). Scientific and Engineering Computation Series. Cambridge, MA: MIT Press. Harris, M. (2005). Mapping computational concepts to GPUs. In M. Parr (Ed.), GPU Gems, volume 2 chapter 31, (pp. 493–500). AddisonWesley. Horst, R. & Paradalos, P. M. (Eds.). (1994). Handbook of Global Optimization, volume 1 of Nonconvex Optimization and Its Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers. Howes, L. & Thomas, D. (2007). Efficient random number generation and application using CUDA. In H. Nguyen (Ed.), GPU Gems, volume 3 chapter 37, (pp. 805–830). AddisonWesley. Hwu, W. W. (Ed.). (2011). GPU Computing Gems Emerald Edition. Applications of GPU Computing Series. Morgan Kaufmann. Judge, G. G., Lee, T.C., & Hill, R. C. (1988). Introduction to the Theory and Practice of Econometrics (2 ed.). New York: Wiley. Kendrick, D. A. (1978). Nonconvexities from probing an adaptive control problem. Journal of Economic Letters, 1(4), 347–351. Kessenich, J., Baldwin, D., & Rost, R. (2006). The OpenGL Shading Language. Khronos Group. Kirk, D. B. & Hwu,W.W. (2010). Programming Massively Parallel Processors: A Handson Approach. MorganKaufmann. Kirkpatrick, S., Gelatt Jr., C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680. Kola, K., Chhabra, A., Thulasiram, R. K., & Thulasiraman, P. (2006). A software architecture framework for online option pricing. In Proceedings of the 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA06), volume 4330 of Lecture Notes in Computer Science, (pp. 747–759)., Sorrento, Italy. SpringerVerlag. Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the NelderMeade simplex method in low dimensions. SIAM Journal of Optimization, 9(1), 112–147. Lee, A., Yau, C., Giles, M. B., Doucet, A., & Holmes, C. C. (2010). On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics, 19(4), 769– 789. Lee, V. W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A. D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., & Dubey, P. (2010). Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Computer Architecture News, 38(3), 451–460. Lewis, R. M. & Torczon, V. (2002). A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM Journal on Optimization, 12(4), 1075–1089. Lindoff, B. & Holst, J. (1997). Suboptimal dual control of stochastic systems with timevarying parameters. mimeo, Department of Mathematical Statistics, Lund Institute of Technology. Morozov, S. (2008). Learning and active control of stationary autoregression with unknown slope and persistence. Working paper, Stanford University. Morozov, S. (2009a). Bayesian active learning and control with uncertain twoperiod impulse response. Working paper, Stanford University. Morozov, S. (2009b). Limits of passive learning in the Bayesian active learning control of drifting coefficient regression. Working paper, Stanford University. Munshi, A., Gaster, B., Mattson, T. G., Fung, J., & Ginsburg, D. (2011). OpenCL Programming Guide. Reading, Massachusetts: AddisonWesley Professional. Nagurney, A., Takayama, T., & Zhang, D. (1995). Massively parallel computation of spatial price equilibrium problems as dynamical systems. Journal of Economic Dynamics and Control, 19(12), 3–37. Nagurney, A. & Zhang, D. (1998). A massively parallel implementation of discretetime algorithm for the computation of dynamic elastic demand and traffic problems modeled as projected dynamical systems. Journal of Economic Dynamics and Control, 22(89), 1467–1485. Nelder, J. A. & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313. NVIDIA (2009a). CUBLAS Library (2.3 ed.). Santa Clara, CA: NVIDIA Corporation. NVIDIA (2009b). CUFFT Library (2.3 ed.). Santa Clara, CA: NVIDIA Corporation. NVIDIA (2009c). NVIDIA CUDA C Programming Best Practices Guide (CUDA Toolkit 2.3 ed.). Santa Clara, CA 95050: NVIDIA Corporation. NVIDIA (2010). CUDA CUSPARSE Library. Santa Clara, CA: NVIDIA Corporation. NVIDIA (2011). NVIDIA CUDA C Programming Guide. (3.2 ed.). Santa Clara, CA: NVIDIA Corporation. Paradalos, P. M. & Romeijn, H. E. (Eds.). (2002). Handbook of Global Optimization, volume 2 of Nonconvex Optimization and Its Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers. Pflug, G. C. & Swietanowski, A. (2000). Selected parallel optimization methods for financial management under uncertainty. Parallel Computing, 26(1), 3–25. Porteus, E. L. & Totten, J. (1978). Accelerated computation of the expected discounted returns in aMarkov chain. Operations Research, 26(2), 350–358. Prescott, E. C. (1972). The multiperiod control problem under uncertainty. Econometrica, 40(6), 1043–1058. Rahman, M. R., Thulasiram, R. K., & Thulasiraman, P. (2002). Forecasting stock prices using neural networks on a Beowulf cluster. In Akl, S. & T.Gonzalez (Eds.), Proceedings of the Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002), (pp. 470–475)., Cambridge, MA. MIT Press. Sanders, J. & Kandrot, E. (2010). CUDA by Example: An Introduction to GeneralPurpose GPU Programming. Upper Saddle River, NJ: AddisonWesley Professional. Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., & Hanrahan, P. (2008). Larrabee: A manycore x86 architecture for visual computing. ACM Transactions on Graphics, 27(3), 15 pages. Sims, C. A., Waggoner, D. F., & Zha, T. (2008). Methods for inference in largescale multiple equation Markovswitching models. Journal of Econometrics, 142(2), 255–274. Spendley, W., Hext, G. R., & Himsworth, F. R. (1962). Sequential application of simplex designs in optimization and evolutionary design. Technometrics, 4, 441–461. Svensson, L. E. O. (1997). Optimal inflation targets, ’conservative’ central banks, and linear inflation contracts. American Economic Review, 87(1), 96–114. Swann, C. A. (2002). Maximum likelihood estimation using parallel computing: An introduction to MPI. Computational Economics, 19(2), 145–178. The Portland Group (2010a). CUDA Fortran Programming Guide and Reference. The Portland Group. Release 2011. The Portland Group (2010b). PGI Fortran & C Accelerator Compilers and Programming Model. The Portland Group. Version 1.3. Tibbits, M. M., Haran, M., & Liechty, J. C. (2009). Parallel multivariate slice sampling. Working paper, Pennsylvania State University. Tomov, S., Dongarra, J., Volkov, V., & Demmel, J. (2009). MAGMA Library. University of Tennessee, Knoxville, and University of California, Berkeley. Wieland, V. (2000). Learning by doing and the value of optimal experimentation. Journal of Economic Dynamics and Control, 24(4), 501–535. Zabinsky, Z. B. (2005). Stochastic Adaptive Search for Global Optimization. Springer. Zenios, S. A. (1999). Highperformance computing in finance: The last 10 years and the next. Parallel Computing, 25(1314), 2149–2175. 
URI:  https://mpra.ub.unimuenchen.de/id/eprint/30298 
Available Versions of this Item

Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control. (deposited UNSPECIFIED)
 Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control. (deposited 03 May 2011 17:05) [Currently Displayed]