Fischer, Manfred M. and Staufer, Petra (1998): Optimization in an Error Backpropagation Neural Network Environment with a Performance Test on a Pattern Classification Problem. Published in: Geographical Analysis, Vol. 31, No. 3 (1999): pp. 89-108.
MPRA_paper_77810.pdf (646kB)
Abstract
Various techniques for optimizing the multiple-class cross-entropy error function to train single-hidden-layer neural network classifiers with softmax output transfer functions are investigated on a real-world multispectral pixel-by-pixel classification problem that is of fundamental importance in remote sensing. These techniques include epoch-based and batch versions of gradient descent, Polak-Ribière (PR) conjugate gradient, and BFGS quasi-Newton error backpropagation. The method of choice depends upon the nature of the learning task and whether one wants to optimize learning for speed or for generalization performance. Comparatively considered, gradient descent error backpropagation provided the best and most stable out-of-sample performance across both batch and epoch-based modes of operation. If the goal is to maximize learning speed and a sacrifice in generalization is acceptable, then PR-conjugate gradient error backpropagation tends to be superior. If the training set is very large, stochastic epoch-based versions of local optimizers should be chosen, utilizing a larger rather than a smaller epoch size to avoid unacceptable instabilities in the generalization results.
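As a rough illustration of the model class studied in the paper, the sketch below trains a single-hidden-layer network with softmax outputs by minimizing the multiple-class cross-entropy error with batch gradient descent. It is not the authors' implementation; the layer sizes, learning rate, epoch count, and the toy data at the bottom are placeholder assumptions chosen only to make the example self-contained.

```python
# Minimal sketch: single-hidden-layer softmax classifier trained by batch
# gradient descent on the multiple-class cross-entropy error.
# Illustrative only; hyperparameters and data are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y, P):
    # Multiple-class cross-entropy error, averaged over the training patterns.
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

def train_batch_gd(X, Y, n_hidden=8, lr=0.1, n_epochs=500):
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, k)); b2 = np.zeros(k)
    for _ in range(n_epochs):
        # Forward pass: logistic hidden layer, softmax output layer.
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        P = softmax(H @ W2 + b2)
        # Backward pass: gradient of the softmax/cross-entropy combination.
        dZ2 = (P - Y) / n
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dH = (dZ2 @ W2.T) * H * (1.0 - H)
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        # Batch (full-sample) gradient descent update.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

if __name__ == "__main__":
    # Toy usage on random data (purely illustrative, not the paper's multispectral imagery).
    X = rng.normal(size=(200, 6))           # e.g. six spectral bands per pixel
    labels = rng.integers(0, 4, size=200)   # four hypothetical land-cover classes
    Y = np.eye(4)[labels]                   # one-hot class targets
    params = train_batch_gd(X, Y)
```

An epoch-based (stochastic) variant would compute the same gradients over randomly drawn subsets of the training patterns rather than the full sample, while the conjugate gradient and quasi-Newton variants discussed in the abstract would replace the fixed-step update with a line-search-based direction (e.g., via scipy.optimize.minimize with method="CG" or "BFGS" on the flattened weight vector).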
Item Type: | MPRA Paper |
---|---|
Original Title: | Optimization in an Error Backpropagation Neural Network Environment with a Performance Test on a Pattern Classification Problem |
Language: | English |
Keywords: | Feedforward Neural Network Training, Numerical Optimization Techniques, Error Backpropagation, Cross-Entropy Error Function, Multispectral Pixel-by-Pixel Classification. |
Subjects: | C - Mathematical and Quantitative Methods > C4 - Econometric and Statistical Methods: Special Topics > C45 - Neural Networks and Related Topics |
Item ID: | 77810 |
Depositing User: | Dr. Manfred M. Fischer |
Date Deposited: | 06 Apr 2017 13:44 |
Last Modified: | 29 Sep 2019 14:14 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/77810 |