Some Early References on Learning in Artificial Neural Networks
The references below are taken from Halbert White et al. (1992), Artificial Neural Networks: Approximation and Learning Theory.
Grenander, Ulf (1981) Abstract Inference. Covers the “method of sieves”, which can be used to learn multilayer feedforward networks.
White, Halbert and J.M. Wooldridge (1989) Some results for sieve estimation with dependent observations. In W. Barnett, J. Powell, and G. Tauchen (eds), Nonparametric and Semi-Parametric Methods in Econometrics and Statistics. New York: Cambridge University Press.
Learning-logic: casting the cortex of the human brain in silicon. David B. Parker, 1956-
Ash, T. (1989) Dynamic node creation in backpropagation networks. Poster presentation, International Joint Conference on Neural Networks, Washington, DC.
Domowitz, I and H White (1982) Misspecified models with dependent observations. Journal of Econometrics, 20, 35-50.
Funahashi, K (1989) On the approximate realisation of continuous mappings by neural networks. Neural Networks, 2, 183-92.
Hirose Y, K Yamashita and S Hijiya (1989) Back-propagation algorithm which varies the number of hidden units. Poster presentation, International Joint Conference on Neural Networks, Washington, DC.
Holland J (1975) Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
Kuan, C-M and H White (1989) Recursive M-estimation, nonlinear regression and neural network learning with dependent observations. UCSD Department of Economics discussion paper.
Kullback S and R A Leibler (1951) On information and sufficiency. Annals of Mathematical Statistics, 22, 79-86.
Ljung L (1977) Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22, 551-75.
Rumelhart, D (1988) Parallel distributed processing. Plenary lecture, IEEE International Conference on Neural Networks, San Diego.
Tishby, N, E Levin and S Solla (1989) Consistent inference of probabilities in layered networks: predictions and generalisation. In Proceedings of the International Joint Conference on Neural Networks, Washington, DC. New York: IEEE Press, II, 403-9.
White, H (1988) Multilayer feedforward networks can learn arbitrary mappings: connectionist nonparametric regression with automatic and semi-automatic determination of network complexity. UCSD Department of Economics discussion paper.
Wiener, N (1948) Cybernetics. New York: Wiley.
Wooldridge, J (1989) Some results on specification testing against nonparametric alternatives. MIT Department of Economics working paper.
Ronald J. Williams https://www.ccs.neu.edu/home/rjw/
Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, James L. McClelland and David E. Rumelhart, 1988
Semantic Cognition: A Parallel Distributed Processing Approach, Timothy T. Rogers, James L. McClelland, 2004
Convergence of Real-time tracking
Gerencser, Laszlo (1986): Parameter Tracking of Time-Varying Continuous-Time Linear Stochastic Systems. In C.I. Byrnes and A. Lindquist (eds), Modelling, Identification and Robust Control. New York: Elsevier, 581-594.
Convergence of Learning for RNN
Kushner, Harold J and Dean S Clark (1978): Stochastic Approximation Methods for Constrained and Unconstrained Systems, New York: Springer-Verlag.
Kuan, Chung-Ming, Kurt Hornik and Halbert White (1990): Some Convergence Results for Learning in Recurrent Neural Networks, UCSD Department of Economics Discussion Paper.