In this paper, we study the convergence properties of online gradient training with the backpropagation algorithm for feedforward neural networks with two hidden layers. We assume that in every training cycle, each training pattern in the training dataset is fed to the feedforward multilayer neural network exactly once, in a stochastic order. We give weak and strong convergence results for these training approaches, showing that the gradient of the error function goes to zero and that the weights converge to a fixed point, respectively. We first give the convergence result for the completely stochastic order approach and then for the special stochastic order approach. The conditions on the activation function of the network and on the training rate that guarantee convergence are relaxed compared with existing results. The convergence properties are established for sigmoidal activation functions; however, the results are also valid for other types of activation functions.
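As an illustration only, the training scheme described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the algorithm as analyzed in the paper: the squared-error function, the absence of bias terms, the scalar output, the constant training rate `eta`, and all function names (`cycle_order`, `online_step`, `train`) are hypothetical choices made here. The stochastic order is taken to be a fresh random permutation of the patterns in every cycle, which is one reading of the completely stochastic order approach.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x):
    """Return the activations of every layer, the input included."""
    acts = [list(x)]
    for W in weights:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, prev))) for row in W])
    return acts


def total_error(weights, patterns):
    """Sum of squared errors over the whole training set (assumed error function)."""
    return sum(0.5 * (forward(weights, x)[-1][0] - t) ** 2 for x, t in patterns)


def cycle_order(rng, n):
    """Random permutation of 0..n-1: every pattern appears exactly once per cycle."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def online_step(weights, x, t, eta):
    """One backpropagation update using the single pattern (x, t)."""
    acts = forward(weights, x)
    y = acts[-1][0]
    deltas = [(y - t) * y * (1.0 - y)]  # output delta for a sigmoid output unit
    for layer in range(len(weights) - 1, -1, -1):
        W, a_in = weights[layer], acts[layer]
        new_deltas = []
        if layer > 0:
            # Propagate the deltas backwards before W is overwritten.
            new_deltas = [
                a_in[j] * (1.0 - a_in[j]) * sum(deltas[i] * W[i][j] for i in range(len(W)))
                for j in range(len(a_in))
            ]
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] -= eta * deltas[i] * a_in[j]
        deltas = new_deltas


def train(patterns, sizes=(3, 4, 4, 1), eta=0.5, cycles=300, seed=0):
    """Online gradient training; sizes = (inputs, hidden 1, hidden 2, output)."""
    rng = random.Random(seed)
    weights = [
        [[rng.uniform(-0.5, 0.5) for _ in range(sizes[k])] for _ in range(sizes[k + 1])]
        for k in range(len(sizes) - 1)
    ]
    history = [total_error(weights, patterns)]
    for _ in range(cycles):
        for idx in cycle_order(rng, len(patterns)):
            x, t = patterns[idx]
            online_step(weights, x, t, eta)  # weights change after every pattern
        history.append(total_error(weights, patterns))
    return weights, history
```

Calling `train` on a small dataset, for example the logical OR patterns with a constant third input standing in for a bias, returns the final weights together with the total error recorded after each cycle, so the decrease of the error across cycles can be inspected directly.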
Marakhimov, Avazjon and Khudaybergenov, Kabul
"Convergence analysis of feedforward neural networks with backpropagation,"
Bulletin of National University of Uzbekistan: Mathematics and Natural Sciences: Vol. 2, Article 1.
Available at: https://uzjournals.edu.uz/mns_nuu/vol2/iss2/1