title: Interval Methods for Seeking Fixed Points of Recurrent Neural Networks
authors: Kubica, Bartłomiej Jacek; Hoser, Paweł; Wiliński, Artur
date: 2020-05-22
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50420-5_30

The paper describes an application of interval methods to train recurrent neural networks and to investigate their behavior. The HIBA_USNE multithreaded interval solver for nonlinear systems and algorithmic differentiation using ADHC are used. Using interval methods, we can not only train the network, but also precisely localize all of its stationary points. Preliminary numerical results for continuous Hopfield-like networks are presented.

Artificial neural networks (ANNs) have been used in many branches of science and technology, for the purposes of classification, modeling, approximation, etc. Several training algorithms have been proposed for this tool. In particular, several authors have applied interval algorithms for this purpose (cf., e.g., [5, 6, 19]). Most of these efforts (all known to the authors) have been devoted to feedforward neural networks. Nevertheless, in some applications (like prediction of a time series or other issues related to dynamical systems, but also, e.g., in some implementations of the associative memory), we need the neural network to remember its previous states, and this can be achieved by using feedback connections. In this paper, we apply interval methods to train this sort of network.

The output of the network is the vector of responses of all neurons; for the i-th neuron it is

    x_i = σ( Σ_{j=1}^{n} w_ij · x_j ),    (1)

where σ(·) is the activation function, described below. The weights can have both positive and negative values, i.e., neurons can either attract or repel each other. Also, it is typically assumed that w_ii = 0, i.e., neurons do not influence themselves directly, but only by means of influencing other neurons. Unlike most papers on Hopfield networks, we assume that the states of the neurons are not discrete, but continuous: x_i ∈ [−1, 1]. As the activation function, the step function was used originally:

    σ(s) = +1 for s ≥ 0,  σ(s) = −1 for s < 0,    (2)

but sigmoid functions can be used as well, for instance a bipolar sigmoid such as

    σ(s) = 2 / (1 + exp(−β·s)) − 1,    (3)

the hyperbolic tangent, or the arctan. Note that both of the above functions, (2) and (3), range from −1 to 1, and not from 0 to 1 as for some other types of ANNs. In our experiments, we stick to sigmoid activation functions of type (3) with β = 1, but other values β > 0 would make sense as well.

What is the purpose of such a network? It is an associative memory that can store some patterns. These patterns are fixed points of the network: when we feed the network with a vector that is one of the stored patterns, the network returns the same vector on its output. What if we give it another input, not being one of the remembered vectors? Then the network will find the closest of the stored patterns and return it. This may take a few iterations before the output stabilizes. Networks like the one presented in Fig. 1, very popular in previous decades, have become less commonly used in practice nowadays. Nevertheless, they remain an interesting object of investigation, and results obtained for Hopfield-like networks should be easily extendable to other ANNs.
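To make the recall process described above concrete, the following minimal sketch (an illustration only, not code from the paper) iterates a small continuous Hopfield-like network synchronously until its output stabilizes. The hyperbolic tangent is assumed as the activation function (i.e., β = 1), and the 4-neuron weight matrix is arbitrary.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // One synchronous update of the whole network: x_i <- sigma( sum_j w_ij * x_j ).
    std::vector<double> update(const std::vector<std::vector<double>>& w,
                               const std::vector<double>& x) {
        std::vector<double> y(x.size(), 0.0);
        for (std::size_t i = 0; i < x.size(); ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < x.size(); ++j) s += w[i][j] * x[j];  // w[i][i] == 0
            y[i] = std::tanh(s);                     // bipolar sigmoid, values in (-1, 1)
        }
        return y;
    }

    int main() {
        // Illustrative 4-neuron network: zero diagonal, symmetric couplings.
        std::vector<std::vector<double>> w = {
            {0.0, 0.5, 0.5, 0.5},
            {0.5, 0.0, 0.5, 0.5},
            {0.5, 0.5, 0.0, 0.5},
            {0.5, 0.5, 0.5, 0.0}};
        std::vector<double> x = {0.9, -0.2, 0.4, 0.1};   // a perturbed input vector
        for (int it = 0; it < 1000; ++it) {              // iterate until the output stabilizes
            std::vector<double> y = update(w, x);
            double diff = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i)
                diff = std::max(diff, std::fabs(y[i] - x[i]));
            x = y;
            if (diff < 1e-12) break;                     // (approximate) fixed point reached
        }
        for (double xi : x) std::printf("%+.4f ", xi);   // the recalled pattern
        std::printf("\n");
        return 0;
    }

With these (arbitrary) weights, the iteration settles quickly on a stationary point with all four outputs equal; such fixed points of (1) are exactly the objects that the interval methods described later localize with guarantees.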
How to train a Hopfield network? There are various approaches and heuristics. Usually, we assume that the network is supposed to remember a given number of patterns (vectors) that should become its stationary points. An example is the Hebb rule, used when the patterns are vectors of values +1 and −1 only and the discrete activation function (2) is used. Its essence is to use the weights

    w_ij = (1/N) · Σ_{k=1}^{N} x_i^(k) · x_j^(k)  for i ≠ j,  and  w_ii = 0,    (4)

where x^(1), …, x^(N) are the patterns to remember, which results in the weights matrix of the form:

    W = (1/N) · Σ_{k=1}^{N} x^(k) · (x^(k))^T − I.    (5)

Neither the above rule nor most other training heuristics take into account problems that may arise while training the network:
- several "spurious patterns" will, in fact, be stationary points of the network, as well as the actual patterns,
- the capacity of the network is limited, and there may exist no weight matrix responding to all training vectors properly.

Let us try to develop a more general approach. In general, there are two problems we may want to solve with respect to a recurrent ANN, as described in Sect. 2:
1. We know the weights of the network and we want to find all of its stationary points.
2. We know all stationary points the network should have and we want to determine the weights so that this condition is satisfied.

In both cases, the system under consideration is similar:

    x_i = σ( Σ_{j=1}^{n} w_ij · x_j ),  i = 1, …, n,

but different quantities are the unknowns under search or the given parameters. In the first case, we know the matrix of weights W = [w_ij] and we seek the x_i's; in the second case, vice versa. Also, the number of equations differs in the two cases. The first problem is always well-determined: the number of unknowns and the number of equations are both equal to the number of neurons n. The second problem is not necessarily well-determined: we have n · (n − 1) unknowns, and the number of equations is equal to n · N, where N is the number of vectors to remember. To be more explicit, in the first case we obtain the following problem:

    Find x_i, i = 1, …, n, such that:
    x_i = σ( Σ_{j=1}^{n} w_ij · x_j ),  i = 1, …, n.    (6)

In the second case, it is:

    Find w_ij, i, j = 1, …, n (with w_ii = 0), such that:
    x_i^(k) = σ( Σ_{j=1}^{n} w_ij · x_j^(k) ),  i = 1, …, n,  k = 1, …, N.    (7)

But in both cases, it is a system of nonlinear equations. What tools shall we apply to solve it?

Interval analysis is well known to be a tractable approach to finding a solution, or all solutions, of a nonlinear equation system like the above ones. There are several interval solvers of nonlinear systems (GlobSol, Ibex, Realpaver and SONIC are representative examples). In our research, we are using HIBA_USNE [4], developed by the first author. The name HIBA_USNE stands for Heuristical Interval Branch-and-prune Algorithm for Underdetermined and well-determined Systems of Nonlinear Equations, and the solver has been described in a series of papers (including [11, 12, 14-16]; cf. Chap. 5 of [17] and the references therein). As the name states, the solver is based on interval methods (see, e.g., [8, 9, 20]), which operate on intervals instead of real numbers, so that the result of an operation on numbers always belongs to the result of the corresponding operation on intervals containing those numbers. Such methods are robust and guaranteed to enclose all solutions, even though they are computationally intensive and memory demanding. An important advantage is that they allow not only locating the solutions of well-determined and underdetermined systems, but also verifying them, i.e., proving that a given box contains a solution point (or a segment of the solution manifold). Details can be found in several textbooks, i.a., in those quoted above.
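To illustrate the enclosure property that the pruning tests of such a solver rely on, the following minimal sketch (a toy illustration, not the HIBA_USNE/C-XSC code: no outward rounding, and the hyperbolic tangent is assumed as the activation) evaluates the residual of system (6), f_i(x) = x_i − σ(Σ_j w_ij x_j), over a whole box of values. If the resulting enclosure of some f_i does not contain zero, the box cannot contain a solution and may be discarded.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Interval { double lo, hi; };   // a toy interval type (no outward rounding!)

    Interval add(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }
    Interval sub(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }
    Interval scale(double c, Interval a) {             // c * [a.lo, a.hi]
        return c >= 0 ? Interval{c * a.lo, c * a.hi} : Interval{c * a.hi, c * a.lo};
    }
    Interval itanh(Interval a) { return {std::tanh(a.lo), std::tanh(a.hi)}; }  // tanh is increasing
    bool contains_zero(Interval a) { return a.lo <= 0.0 && a.hi >= 0.0; }

    // Enclosure of the residual f_i(x) = x_i - tanh( sum_j w_ij x_j ) over a box.
    // If 0 lies outside the enclosure for some i, the box contains no solution of (6).
    bool may_contain_solution(const std::vector<std::vector<double>>& w,
                              const std::vector<Interval>& box) {
        for (std::size_t i = 0; i < box.size(); ++i) {
            Interval s = {0.0, 0.0};
            for (std::size_t j = 0; j < box.size(); ++j) s = add(s, scale(w[i][j], box[j]));
            Interval fi = sub(box[i], itanh(s));
            if (!contains_zero(fi)) return false;      // rejection test: discard the box
        }
        return true;                                   // keep: subdivide or verify further
    }

    int main() {
        std::vector<std::vector<double>> w = {{0.0, 0.5}, {0.5, 0.0}};   // illustrative weights
        std::vector<Interval> near_origin = {{-0.1, 0.1}, {-0.1, 0.1}};
        std::vector<Interval> far_box     = {{0.8, 0.9}, {-0.9, -0.8}};
        std::printf("box near the origin: %s\n", may_contain_solution(w, near_origin) ? "keep" : "discard");
        std::printf("far-away box:        %s\n", may_contain_solution(w, far_box) ? "keep" : "discard");
        return 0;
    }

A real solver combines this basic rejection test with the Newton and consistency operators listed below, which can also narrow or split a box rather than merely discard it.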
Let us present the main algorithm (the standard interval notation, described in [10], will be used). The solver is based on the branch-and-prune (B&P) schema, which can be expressed by the pseudocode of Algorithm 1 below. There, L denotes the list of boxes remaining to be processed, Lver collects the boxes verified to contain a solution, and Lpos collects the small boxes that may contain one. In each pass of the main loop, the current box x (initially popped from L) is processed with the rejection/reduction tests and then handled as follows.

Algorithm 1:
    loop
        process the box x, using the rejection/reduction tests
        if (x does not contain solutions) then
            discard x
        else if (x is verified to contain a segment of the solution manifold) then
            push (Lver, x)
        else if (the tests resulted in two subboxes of x: x(1) and x(2)) then
            x = x(1)
            push (L, x(2))
            cycle loop
        else if (wid x < ε) then
            push (Lpos, x)         {the box x is too small for bisection}
        if (x was discarded or x was stored) then
            if (L == ∅) then
                return Lver, Lpos  {all boxes have been considered}
            x = pop (L)
        else
            bisect (x), obtaining x(1) and x(2)
            x = x(1)
            push (L, x(2))

The "rejection/reduction tests" mentioned in the algorithm are described in previous papers (cf., e.g., [14-16] and the references therein):
- switching between the componentwise Newton operator (for larger boxes) and the Gauss-Seidel operator with the inverse-midpoint preconditioner (for smaller ones),
- a heuristic to choose whether or not to use the BC3 algorithm,
- a heuristic to choose when to use bound-consistency,
- a heuristic to choose when to use hull-consistency,
- sophisticated heuristics to choose the component to bisect,
- an additional second-order approximation procedure,
- an initial exclusion phase of the algorithm (deleting some regions not containing solutions), based on Sobol sequences.

It is also worth mentioning that both Algorithm 1 itself and some of the tests performed on subsequent boxes are implemented in a multithreaded manner. Papers [11-16] discuss several details of this implementation, and a summary can be found in Chap. 5 of [17].

The HIBA_USNE solver collaborates with a library for algorithmic differentiation, also written by the first author. The library is called ADHC (Algorithmic Differentiation and Hull Consistency enforcing) [3]. Version 1.0 has been used in our experiments. This version has all the necessary operations, including the exp function, used in the sigmoid activation (3), and division (which was not implemented in earlier versions of the package).
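As a rough illustration of what an algorithmic differentiation library supplies to such a solver, the sketch below (a toy example, not the ADHC interface; ADHC additionally works on intervals, provides Hesse matrices and supports hull consistency) uses forward-mode dual numbers to evaluate the residual of system (6) and its first derivatives with respect to one variable, for a 2-neuron network with the hyperbolic tangent as activation.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Dual { double val, der; };   // value and derivative w.r.t. one chosen variable

    Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
    Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
    Dual operator*(double c, Dual a) { return {c * a.val, c * a.der}; }
    Dual dtanh(Dual a) {                // (tanh s)' = 1 - tanh(s)^2, by the chain rule
        double t = std::tanh(a.val);
        return {t, (1.0 - t * t) * a.der};
    }

    int main() {
        // Residual of system (6) for a 2-neuron example: f_i(x) = x_i - tanh( sum_j w_ij x_j ).
        std::vector<std::vector<double>> w = {{0.0, 0.5}, {0.5, 0.0}};
        std::vector<double> x = {0.3, -0.2};
        int k = 0;                                      // differentiate with respect to x_k
        std::vector<Dual> xd(2);
        for (int j = 0; j < 2; ++j)
            xd[j] = Dual{x[j], j == k ? 1.0 : 0.0};     // seed the k-th direction
        for (int i = 0; i < 2; ++i) {
            Dual s = {0.0, 0.0};
            for (int j = 0; j < 2; ++j) s = s + w[i][j] * xd[j];
            Dual fi = xd[i] - dtanh(s);
            std::printf("f_%d = %+.6f   df_%d/dx_%d = %+.6f\n", i, fi.val, i, k, fi.der);
        }
        return 0;
    }

Repeating the computation with each variable seeded in turn yields the full Jacobian row by row, the kind of derivative information used by the interval Newton and Gauss-Seidel operators mentioned above.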
Numerical experiments have been performed on a machine with two Intel Xeon E5-2695 v2 processors (2.4 GHz). Each of them has 12 cores, and on each core two hyper-threads (HT) can run, so 2 × 12 × 2 = 48 HT can be executed in parallel. The processors have non-uniform turbo frequencies, ranging from 2.9 to 3.2 GHz. The machine runs under control of a 64-bit GNU/Linux operating system, with the kernel 3.10.0-123.el7.x86_64 and glibc 2.17. As other users were also performing computations on the machine, we limited ourselves to using 24 threads only. The Intel C++ compiler ICC 15.0.2 has been used. The solver has been written in C++, using the C++11 standard. The C-XSC library (version 2.5.4) [1] was used for interval computations. The parallelization was done with the packaged version of TBB 4.3 [2]. The HIBA_USNE solver has been used in version Beta2.5, and the ADHC library in version 1.0.

We consider networks with n neurons (n = 4 or n = 8), storing 1 or 3 vectors. The first vector to remember is always (1, 1, …, 1). The second one consists of n/2 values +1 and n/2 values −1. The third one consists of n − 2 values +1 and 2 values −1 (Tables 1 and 2). The following notation is used in the tables:
- fun.evals, grad.evals, Hesse evals: the numbers of evaluations of the functions, their gradients and their Hesse matrices (in the interval automatic differentiation arithmetic),
- bisecs: the number of box bisections.

The HIBA_USNE solver can find solutions of Problem (6) quite efficiently, and the solutions are found correctly. For instance, in the case of four neurons and a single stored pattern, three solutions are quickly found, two of which are verified (guaranteed). For problems of small dimensionality, all solutions get found immediately. Unfortunately, the computation time increases quickly with the number of neurons (but not with the number of stored patterns!) in the network. This is partially because Hopfield networks are "dense": each neuron is connected to all other ones. Multilayer networks have a more "sparse" structure, which may improve the scalability of the branch-and-prune method.

For Problem (7) of computing the weights matrix, the HIBA_USNE solver was less successful. This is not surprising: Problem (7) is underdetermined and can have uncountably many solutions. Actually, the solver was mostly successful on (7) when there were no solutions at all: in many cases, such nonexistence can be verified easily. As the sigmoid function (3) does not reach the values ±1 for finite arguments, there are no weights for which sequences of ±1's are stationary points of the network, and the solver verifies this easily. Unfortunately, seeking weights for a feasible problem is not that efficient. For instance, seeking weights for a network with a single stationary point at (0.858, 0.858, 0.858, 0.858) had to be interrupted after three hours, without obtaining results! Possibly, it would make sense to seek solutions of Problem (7) with some additional constraints, but such constraints have not been determined yet. In such a case, it might also be beneficial to transform the equations to another, equivalent form. Also, interval methods can naturally be applied to seek approximate fixed points, instead of precise ones, but such experiments have not been performed yet.
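Returning to the possible reformulation of Problem (7): one candidate form (an illustrative assumption, not necessarily the transformation meant above) exploits the fact that the sigmoid (3) is invertible on (−1, 1) and applies σ^(−1) to both sides of the equations:

    σ^(−1)( x_i^(k) ) = Σ_{j=1}^{n} w_ij · x_j^(k),   i = 1, …, n,  k = 1, …, N.

In this form, the equations for a fixed i involve only the i-th row of the weight matrix and are linear in it, so the weight-seeking problem decomposes into n independent linear subproblems (each with n − 1 unknowns and N equations), which should be considerably easier for a solver, interval or otherwise.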
The paper presents a promising application of interval methods and the HIBA_USNE solver. These tools can be used both to train a recurrent neural network and to investigate its behavior. The interval solver of nonlinear systems can potentially be applied to determining the weights matrix of the network, but, more importantly, to localizing all stationary points of the network. We have considered single-layer continuous Hopfield-like networks, but generalization to Hamming networks (Fig. 2) or convolutional multilayer ANNs (e.g., [7]) seems straightforward. This will be the subject of our further research, as will further studies of Hopfield networks: seeking periodic states, seeking approximate stationary points, and more sophisticated interval algorithms to train the network.

References:
- HIBA_USNE: Heuristical Interval Branch-and-prune Algorithm for Underdetermined and well-determined Systems of Nonlinear Equations, Beta 2.5
- Solving the linear interval tolerance problem for weight initialization of neural networks
- On interval weighted three-layer neural networks
- Deep Learning
- Applied Interval Analysis
- Rigorous Global Search: Continuous Problems
- Standardized notation in interval analysis
- Interval methods for solving underdetermined nonlinear equations systems
- Tuning the multithreaded interval method for solving underdetermined systems of nonlinear equations
- Excluding regions using Sobol sequences in an interval branch-and-prune method for nonlinear systems
- Presentation of a highly tuned multithreaded interval solver for underdetermined and well-determined nonlinear systems
- Parallelization of a bound-consistency enforcing procedure and its application in solving nonlinear systems
- Role of hull-consistency in the HIBA_USNE multithreaded solver for nonlinear systems
- Interval Methods for Solving Nonlinear Constraint Satisfaction, Optimization and Similar Problems (SCI)
- Hopfield-type neural networks
- Numerical methods of interval analysis in learning neural network
- Finite-dimensional interval analysis. Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences