The Eight Epochs of Math as regards past and future Matrix Computation

This paper gives a personal assessment of Epoch making advances in Matrix Computations from antiquity and with an eye towards tomorrow. We trace the development of number systems and elementary algebra, and the uses of Gaussian Elimination methods from around 2000 BC on to current real-time Neural Network computations to solve time-varying linear equations. We include relevant advances from China from the 3rd century AD on, and from India and Persia in the 9th century and discuss the conceptual genesis of vectors and matrices in central Europe and Japan in the 14th through 17th centuries AD. This is followed by the 150 year cul-de-sac of polynomial root finder research for matrix eigenvalues, as well as the superbly useful matrix iterative methods and Francis' eigenvalue Algorithm from last century. Then we explain the recent use of initial value problem solvers to master time-varying linear and nonlinear matrix equations via Neural Networks. We end with a short outlook upon new hardware schemes with multilevel processors that go beyond the 0-1 base 2 framework which all of our past and current electronic computers have been using.


Introduction
In this paper we try to outline the Epoch making achievements and transformations that have occurred over time for computations and more specifically for matrix computations. We will trace how our linear algebraic concepts and matrix computations have progressed from the beginning of recorded time until today and how they will likely progress into the future. We take this limited tack simply because in modern times matrices have become the elemental and universal tools for most any computation. This evolution of our matrix methods will be described in broad strokes. My main emphasis is to trace the mathematical genesis of Matrices and their uses and to learn how the modern matrix concept has evolved in the past and how it is evolving. I am not interested in Matrix Theory by itself, but rather in Matrix Computations, i.e., how matrix concepts and algorithms have been developed from approximately 3000 BC to today, and even tomorrow. This paper describes eight noticeably separate Epochs that are distinguished from each other by the introduction of evolutionary new concepts and subsequent radically new computational methods. Following the historical trail through six historically established Epochs, we will then look into the present and the near future. What drives us to conceptualize and compute differently now, what is leading us into the 7th and possibly 8th Epoch? When and how will we likely compute in the future?
I am not a math historian and I have never taught a class in math history. Instead throughout my academic career I have worked with matrices: in matrix theory, in applications and in numerical analysis and I like to construct efficient new algorithms that solve matrix equations. The idea for this paper is in part due to my listening by chance to a very short English broadcast from Egyptian Radio on short wave some 40 years ago in the 1970s. It described an Egyptian papyrus from around 2000 BC that dealt with solving linear equations by row reduction and zeroing out coefficients in systems of linear equations, i.e., by what we now call "Gaussian Elimination". When I heard this as a young Ph.D., I was fascinated and wrote the station for more information. They never answered and when I was in Cairo many years later, the Egyptian Museum personnel could not help me with locating the source either.
Thus I became aware that Carl Friedrich Gauß did not invent what we now call by his name; but who did? For many decades, this snippet of math history just lingered in my mind until a year ago when I was sent a book on Neural Network (NN) methods for solving time-varying linear and non-linear equations and was asked to review it. The NN methods were, to me and my understanding of numerics then, so other-worldly and brilliant that I began to think of the incredible leaps and 'bounces' that math computations have gone through over the eons, from era to era. I eventually began to detect 7 or 8 computational Sea Changes, what I call "Epochs", in our ability to compute with matrices and that is my topic.
arXiv:2008.01900v1 [math.NA] 5 Aug 2020

A Short History of Matrix Computations
Nobody knows how numbers and number systems came about, just like nobody knows 'who invented the wheel'. I will start with a few historical facts about number systems and how they developed and were used across the globe in antiquity.

Early Number Systems
Humankind's first developments of number systems were very diverse and geographically widely dispersed, yet rather slow. The first circle cypher for zero occurred in Babylonia around 2500 BC, or 4500 years ago. A continent or two removed, the Mayans used the same concept and circle zero symbol from around 40 BC. In India it was recognized during the 7th century. But zero only became recognized as a 'number to compute with' like all the others in the 9th century in central India. Our decimal system builds on the ten numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The decimal positional system came from China via the Indus valley and it started to be used in Persia in the 9th century. It was combined with or derived from a Hindu number system of the same time period. In fact westerners wrongfully call the current decimal number symbols 'Arabic', but most westerners (and I) cannot read the license plates on cars in Egypt since the Arabic world does not use our Persian/Hindu numbers in writing but its own script using Arabic letters to designate numbers. Should we call our 'western' numbers 'Farsi' or 'Hindu' instead?
Various bases have been used for numbering. There have been base 2, base 8, base 10, base 12, base 16, base 60, base 200 number systems and possibly more at some time somewhere throughout human history. Counting and simple computations started with notched sticks for record keeping and with the invention of sand or wax tablets and then the abacus. These simple tools were developed a little bit differently and independently in many parts of the globe.

Antiquity (1st Epoch)
Around 2200 to 1600 BC, Sumerian, Babylonian and Egyptian land survey computations became mathematized in order to mark and allot land after the yearly Euphrates, Tigris and Nile floods. That naturally led to linear equations in 2, 3 or 4 variables and subsequent methods to solve them that amounted to what we now call row reduction or Gaussian elimination. Mathematical computations did not advance much during the Greek times as Greek mathematicians were mainly interested in mathematical theory and in establishing the concept of a formal proof, as well as in elementary number theory, of which the Euclidean algorithm is still used today. Neither did the complicated Roman numerals lend themselves to easy computations and no further computational advances happened there.

Early Mathematical Arts in China, India and the Near East (2nd Epoch)
(Based in part on a lecture at Hong Kong University in 2017, given by Zhou Xiangyu, Chinese Academy of Sciences, for Chinese sources, and on Indian and Arabic sources from elsewhere.) In prehistoric and historic times (1600 BC - 1400 AD), knot and rod calculus were prevalent in China. They were based on a decimal positional system, the so-called "rod numerals". These formed the most advanced number system of the time, which was used for several millennia before being adopted and expanded in Persia and India in the 9th century AD and later on adopted in central Europe. The "Sunzi Suanjing" (the "Mathematical Classic of Sunzi", from the 3rd - 5th century) gives a detailed description of the arithmetic rules for counting rods. In the Indus valley clay tablets covered with sand were used for mathematical computations several millennia ago. Bhaskara (600 - 680 AD) in India was first to write numbers in the Hindu positional decimal system which used the circle for zero. In 629 AD he approximated the sine function by rational expressions while commenting on Aryabhatta's (476 - 550 AD) book "Aryabhatiya" from 499 AD. An Indian contemporary of Bhaskara, Brahmagupta (598 - 665 AD), was the first to establish the rules that govern computing with zero. Brahmagupta's texts were written in Sanskrit verse that used the Sanskrit word for 'eyes' to denote 2 and 'senses' for the number 5 etc. This was common in Indian mathematics and science writings at the time. The earliest record of multiplication and division algorithms using the Hindu numerals 1 through 9 and 0 was in writings by Al Khwarizmi (780 - 850 AD), a Persian mathematician employed in Baghdad. His "Book of Manipulation and Restoration" established the golden rule of Algebra that an equation remains true if one subtracts the same quantity from both sides. He also wrote down multiplication and division rules that are identical to those of the Sunzi Suanjing from 3rd to 5th century China.
To the Sunzi Suanjing we also owe the Chinese Remainder Theorem. Finally the advanced Hindu-Arabic decimal number system was introduced into the West by Leonardo Fibonacci (1175 - 1250 AD) of Pisa in his "Liber Abaci" or 'The Book of Calculations' (1202).
Applied and numerical computations were driving much of Chinese mathematics. Wang Xiaotong (580 - 640 AD), for example, tried to find the roots of cubic polynomials that appeared in civil engineering and water conservation problems. In the Mathematical Treatise in Nine Sections of 1247, Qin Jiushao (1202 - 1261) developed the 'Qin Jiushao method' which is now commonly called the 'Horner-Ruffini scheme' for computing with and finding roots of polynomials iteratively. William George Horner (1819) [16] and Paolo Ruffini (1804, 1807, 1813) unknowingly reinvented the Qin Jiushao method six hundred years later. Chinese rod calculus was the method of choice for computing in China until the abacus took over during the Ming Dynasty (1368 - 1644). Cheng Dawei (1533 - 1606) is the author of the first 'Numerical Analysis' book, titled "The General Source of Computational Methods" and published in 1592. It describes methods to add, subtract, multiply, and divide on an abacus. The abacus itself was invented in various incarnations at various times and in several locations of the globe. It essentially combines several decimal rods on one board with beads on strings. Chinese mathematicians from the 3rd century BC onwards to the 10th century AD brought us the "Nine Chapters on the Mathematical Arts" that uses the numbers 1 through 9. This book was later disseminated further to the west, to India and Persia as described above. In its 7th chapter, determinants first appeared conceptually, while chapter 8 abstracts the concept of linear equations to represent them by matrix-like coefficient tableaux.
These 'matrix equations' were solved in China, again by 'Gaussian elimination', 1500 years before Gauß' birth and 1800 years after the middle-eastern seasonal flood prone countries had first used the Gaussian algorithm around 1800 BC. Gauß himself described the method as the "common method of elimination" in his papers and mathematicians then attached his name to it as an honor.
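The row reduction that recurs from the flood plain surveys to the coefficient tableaux of the Nine Chapters can be sketched in a few lines. The following is a minimal illustration in Python with exact rational arithmetic, not a historical reconstruction; the function name `gaussian_elimination` and the 3 by 3 example tableau are chosen here purely for illustration.

```python
from fractions import Fraction

def gaussian_elimination(tableau):
    """Solve a square linear system given as an augmented tableau [A | b]."""
    n = len(tableau)
    # Work on an exact rational copy so that no rounding enters the illustration.
    rows = [[Fraction(v) for v in row] for row in tableau]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Zero out the coefficients below the pivot by row reduction.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * p for a, p in zip(rows[r], rows[col])]
    # Back substitution on the resulting upper triangular tableau.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - s) / rows[i][i]
    return x

# Illustrative tableau: three equations in three unknowns, last column is b.
tableau = [[3, 2, 1, 39],
           [2, 3, 1, 34],
           [1, 2, 3, 26]]
solution = gaussian_elimination(tableau)
```

Exact fractions are used only to keep the sketch free of rounding questions; practical codes work in floating point and choose pivots by magnitude.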

The Genesis of Vectors and Matrices (3rd Epoch)
To advance matrix computations further there was a need to conceptualize coordinates and vectors in space. In the 14th century AD, Nicole Oresme developed a system of orthogonal coordinates for describing Euclidean space. This idea was taken up by René Descartes in the 17th century and is familiar to all of us now under the concept of Cartesian Coordinates. Thereby the world became ready for matrices and matrix computations in their own right. In 1683 Gottfried Leibnitz in Germany and Seki Kowa in Japan both, unbeknownst to each other, re-invented the concept of a matrix as a rectangular array of coefficients for studying linear equations. Leibnitz also used and suggested row elimination to simplify and find their solutions. These efforts enabled Gauß to repeat what the Egyptians had done four millennia earlier: He was asked to survey the lands of his ruler, the Archduke George Augustus of Hanover, and measure the size of this kingdom inside Germany in the early 1800s. Beginning in 1821 and 1823 Gauß, as Professor of Geodesy (and not of Mathematics) in Göttingen, would measure the angles and distances between many of the highest points there, such as the Brocken, the Inselsberg, 104 km apart, and the hills around Göttingen and later expanded the surveys all the way to the North Sea. He did this multiple times, preferably when the weather was clear. Thereby he set up systems of linear equations with generally more equations than unknowns due to repeated measurements on different days. To solve these overdetermined and naturally 'unsolvable' systems $Ax = b$, Gauß devised the normal equation $A^T A x = A^T b$ and solved it approximately. But the normal equations method eventually turned out to be numerically unsound. It took over a century to find out why, the reason being that condition numbers multiply, see Olga Taussky (1950) [26].
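To make the normal equations concrete, here is a minimal sketch for an overdetermined system with two unknowns. The helper name `normal_equations_2` and the four 'measurements' are hypothetical and only serve the illustration; the 2 by 2 system $A^T A x = A^T b$ is solved by Cramer's rule.

```python
def normal_equations_2(A, b):
    """Least squares solution of A x = b for an m-by-2 matrix A via A^T A x = A^T b."""
    # Entries of the symmetric 2-by-2 matrix A^T A.
    g11 = sum(r[0] * r[0] for r in A)
    g12 = sum(r[0] * r[1] for r in A)
    g22 = sum(r[1] * r[1] for r in A)
    # Entries of the right-hand side A^T b.
    c1 = sum(r[0] * bi for r, bi in zip(A, b))
    c2 = sum(r[1] * bi for r, bi in zip(A, b))
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("A does not have full column rank")
    # Cramer's rule for the 2-by-2 normal equations.
    return [(c1 * g22 - g12 * c2) / det, (g11 * c2 - g12 * c1) / det]

# Hypothetical data: four measurements, two unknowns (intercept and slope).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.1, 2.9, 5.2, 6.8]
x = normal_equations_2(A, b)

# The least squares residual b - A x is orthogonal to the columns of A.
residual = [bi - (r[0] * x[0] + r[1] * x[1]) for r, bi in zip(A, b)]
```

The sketch also hints at the numerical weakness: in the 2-norm the condition number of $A^T A$ is the square of that of A, so forming $A^T A$ explicitly can lose accuracy that orthogonal factorization methods retain.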

Eigenvalues and the Characteristic Polynomial (4th Epoch)
As differential operators and matrices were beginning to be investigated and dealt with by the early 1800s, their connections and similarities were slowly recognized in the mathematics world. The replication of certain functions $f \neq 0$ by a given differential operator A was noticed first and became the subject of studies. What were the functions f for which A f = α f for some scalar α? How could they be found from A, what about α?
In 1829 Augustin Cauchy [5, p. 175] began to view the erstwhile 'eigenvalue equation' A f = α f as a 'null space equation', namely A f − α f = 0 or (A − α id) f = 0 for the identity operator with id f = f for all f. Complete knowledge of the eigenvalues α and eigenfunctions f of a differential operator A allowed for a simple sum representation of the general solution of the linear differential equation described by A. Thus Cauchy's 'null space equation' became essential for determining the behavior of systems governed by linear differential equations. Cauchy's knowledge of and interest in determinants (think Cauchy-Binet Theorem) then led him to define the 'characteristic polynomial' of a square matrix A in 1839 [6, p. 827] as $f_A(\alpha) = \det(A - \alpha I)$ and thereby he initiated renewed studies in polynomial root finding algorithms in the hope of obtaining analogous diagonalization results for linear matrix times vector products. And the search for polynomial rootfinders was on. By modern day hindsight, reducing the eigenvalue problem from an $n^2$ data problem of the entries of a matrix $A_{n,n}$ to one of the n + 1 coefficients of its characteristic polynomial is data compression and therefore it was doomed to fail. But that remained unrealized by the mathematics community for more than 100 years. James Sylvester finally gave the tableau concept of matrices its name 'matrix' in 1848 or 1850. And after roughly 2 decades 1829 -> 1839 -> 1848/50, the first century of Matrix Theory or theoretical Linear Algebra had begun.
Back to matrix computations: Cauchy's idea led mathematicians to try and compute the characteristic polynomials of matrices and find their roots in order to understand the eigen-behavior of matrices. We still teach many concepts and lessons today that are based on the 'characteristic polynomial' $f_A(x)$ of a matrix A. Why, we should ask ourselves. Because unfortunately studying 'characteristic polynomials' in place of matrices has turned out to be a costly dead end for computational and applied mathematics: In the century and a half that followed Cauchy's work, more than 4,000 papers on computing the roots of polynomials were published, together with 200 to 300 books on the subject, bringing us many algorithms, all of which failed more often than not. Many illustrious careers and schools of mathematics were founded based on this unfortunate and ever elusive goal. During the same period, 2-D hand crank computing machines were invented and built to effect long number multiplications and divisions, first by Charles Babbage, then as commercial geared adding machines that were still being used in office work well into the 1960s. These worked as two-dimensional abaci of sorts. But eventually digital (at first punch-card fed) computers became the tools of our computational trade in science, in engineering, in business, in GPS, in Google, in social media, large data, automation, etc, etc.
But how could we or would we find matrix eigenvalues accurately? A turn-around, a new method, a new computational Epoch was needed. From where, by whom, and how ...?

Iterative Matrix Algorithms (5th Epoch)
To move us forward, it appears that matrix methods themselves might have to be developed that would solve the matrix-intrinsic eigenvalue problem by themselves. But before that was possible there were further unfortunate 'detours'.
Carl Friedrich Gauß, in his doctoral thesis of 1799 [12], had disproved all earlier attempts to establish the Fundamental Theorem of Algebra, i.e., that all polynomials over the real numbers can be factored into as many real or complex conjugate factors as their degree says. His thesis then included the first complete and correct proof of the Fundamental Theorem of Algebra. In 1824 Niels Abel [1] showed that the roots of some 5th degree polynomials cannot be found by using radical expressions of their coefficients; Gauß never opened or read the submitted paper and thus in fact rejected it knowingly on the 'grounds' that God would not have complicated the World thus ... for us. Abel published his result privately, a broken man. Évariste Galois [9,10] extended Abel's result in 1830 by giving group theoretic conditions for polynomials to be solvable by radicals; the extended paper (introducing Galois Theory) was originally rejected and appeared only posthumously in 1846 [11]. Cautioned by these 'rejected' inconvenient results, the polynomial approach to matrix eigenvalue computations could have been shunned by clearer heads early on; but the 'dead end' determinants and characteristic polynomial roots road was taken instead for more than a century. Note that Cauchy's matrix result and most other fundamental matrix results from the 19th century were formulated in terms of determinants and only in the mid 20th century did the term 'matrix' appear in matrix theoretical article and book titles. A matrix based approach to the eigenvalue problem nowadays starts from the simple fact that for any n by n matrix A and any n vector b, the sequence of vector iterates $b, Ab, A^2 b, \ldots, A^k b, \ldots, A^n b$ contains n + 1 vectors in n-space which makes these vectors linearly dependent. Their linear dependency then leads to a polynomial $p_b$ of degree at most n for which $p_b(A)$ sends b to zero.
The vanishing polynomial for any b turns out to always be a factor of the characteristic polynomial of A and it can be found by Gaussian elimination rather than using determinants. The same idea shows that vector iteration converges for every starting vector $b \neq 0$ and any given square matrix A and this has led engineers in the early 20th century to construct iterative matrix algorithms that could solve linear equations and the matrix eigenvalue problem. Iterative matrix algorithms actually do go back further, such as to the Jacobi (1839) [17], Gauß-Seidel (1874), and various SOR methods that are designed to solve linear systems iteratively. These use matrix splittings of A rather than vector iteration. Alexei Krylov (1931) [19] introduced and studied the vector iteration subspaces $\mathrm{span}\{b, Ab, \ldots, A^k b\}$ in their own right. Following his ideas, large sparse matrix systems are nowadays treated iteratively in so called Krylov-based methods, both to solve linear equations and to find matrix eigenvalues. Standard widely used Krylov type iterative matrix algorithms carry the names of Steepest Descent, Conjugate Gradient, Arnoldi (1951) [2], Lanczos (1950) [20]. Others are called GMRES, BICGSTABLE, QMR, ADI etc. Most Krylov type methods are matrix and problem specific and they are mostly used for huge sparse and structured matrices where direct or semi-direct methods cannot be employed due to their high computational and storage costs. Krylov methods generally rely on preconditioners M for linear systems Ax = b that shift the spectrum of $M^{-1}A$ for faster convergence and they thrive on incomplete matrix splittings etc. Typically they give only partial results. Who needs to know all the million eigenvalues of a million by million matrix model? Krylov methods can be tuned to give information where it is needed for the underlying system.
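The vector iterates $b, Ab, A^2 b, \ldots$ that underlie these methods are easy to experiment with. The sketch below normalizes each iterate and estimates an eigenvalue by a Rayleigh quotient; it is the textbook power iteration, shown here under the assumption that A has a single dominant eigenvalue and that b has a component in the corresponding eigendirection. The function name and the 2 by 2 test matrix are illustrative choices.

```python
def power_iteration(A, b, steps=50):
    """Normalized vector iteration b, Ab, A^2 b, ... with a Rayleigh quotient estimate."""
    n = len(A)
    x = list(b)
    for _ in range(steps):
        # One step of vector iteration: y = A x.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            raise ValueError("iterate became the zero vector")
        # Rescale to unit length so the iterates neither overflow nor underflow.
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient x^T A x for the unit vector x.
    eigenvalue = sum(xi * axi for xi, axi in zip(x, Ax))
    return eigenvalue, x

# Illustrative symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, vector = power_iteration(A, [1.0, 0.0])
```

Krylov methods keep the whole sequence of iterates and project onto their span instead of discarding all but the last one, which is what lets them extract more than the dominant eigenpair.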

Francis Algorithm and Matrix Eigenvalues (6th Epoch)
The Second World War and post Second World War periods were filled with innovations: The atomic era had begun, as well as rocket science; commercial air flight became popular; digital computers were being developed, first as valve machines and later transistorized. Supersonic speeds were realized, Computer Science was developed, etc. But there were many crashes and disasters with the new technologies: Commercial aircraft (Super-Constellation, Convair, ...) and military ones (Starfighter ...) would crash weekly around the globe; and newly built suspension bridges would collapse in strong winds. The crux of the matter was that while matrix models of the underlying mechanical systems could readily be made using the laws of physics and mechanics, no one could reliably compute their eigenvalues. Engineers could not test their designs for eigen-modes in the right half plane! And Krylov methods were unfortunately not sufficient for testing for eigenvalues in a half plane. If a matrix model of a mechanical or electrical or ... structure has right-half plane eigenvalues λ then, upon proper excitation, there would be an eigencomponent of the ever increasing form $e^{\lambda t} \to \infty$ as $t \to \infty$ that resonates and self-amplifies inside the structure itself. This then leads to ever increasing destructive vibrations and ultimate failure. The aircraft 'flutter problem' was discovered during World War 2. In England during WW2, Gershgorin circles that contain all of a system's eigenvalues were drawn out in the complex plane by rather primitive valve computers and checked to ascertain system stability. The general matrix eigenvalue problem was finally solved independently and similarly by John Francis in London and by Vera Kublanovskaya in Russia nearly simultaneously around 1960. Francis' (or the QR) algorithm [7,8] is based on Alston Householder's idea to try and solve matrix problems by matrix factorizations.
Francis' method is an orthogonal subspace projection method and it works differently than the Krylov based methods which solve a given matrix eigenvalue problem by projecting onto a Krylov subspace that is derived from and suitable for A. A 'divide and conquer' matrix factorization strategy was first employed by Heinz Rutishauser (1955, 1958) [24,25] in his LR matrix eigenvalue algorithm: If one can factor $A = LR$ into the product of a lower and an upper triangular matrix L and R and if L is invertible, then for the reverse order product $A_1 = RL$ we have $A_1 = L^{-1} A L$ since $R = L^{-1} A$. If $A_1$ again allows an LR factorization $A_1 = L_1 R_1$ with $L_1$ nonsingular, then by reverse order multiplying we obtain $A_2 = R_1 L_1 = L_1^{-1} A_1 L_1$, which is again similar to A. (... [15], he did not know who that might have been.) Francis was aware through contacts with Jim Wilkinson of the backward stability of algorithms that involve orthogonal matrices Q. So rather than using Gaussian elimination matrices L, Francis experimented with orthogonal A = QR factorizations. At roughly the same time Vera Kublanovskaya worked on an LQ factorization of A as A = LQ and subsequent reverse order multiplications [18] in Leningrad, Russia, that would also compute the eigenvalues of A. Rutishauser had observed convergence speed-up for his LR method when replacing A by A − αI, i.e., shifting. Hence Francis experimented with shifts for QR and then established the 'Implicit Q Theorem' [8] in order to circumvent computing eigenvalues of real matrices over the complex numbers. Implicit shifts also avoid rounding errors that would be introduced by explicit shifts. Francis' second paper (1962) [8] also contains a fully computed flutter matrix problem of size 24 by 24. The eigenvalues of such 'large' problems had never before been computed successfully.
Francis' Implicit Q Theorem then allowed Gene Golub and Velvel Kahan (1965) [14] to compute singular values of large matrices for the first time and this application later spawned the original Google search engine and brought us, in a way, into the internet age. In 2002 the Multishift QR Algorithm was developed by Karen Braman, Ralph Byers, and Roy Mathias [3,4]. It relies on subspace iteration and extends Francis' QR and combines it with Krylov like methods. This extension allows us today to compute the complete eigen and singular value structure of dense matrices of sizes up to 10,000 by 10,000 economically on laptops.
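The 'factor, then multiply in reverse order' idea behind the LR and QR algorithms can be tried out on a small example. The sketch below is the plain unshifted, explicit QR iteration $A_k = Q_k R_k$, $A_{k+1} = R_k Q_k$ with a Gram-Schmidt factorization; it is deliberately not Francis' implicit shifted algorithm, and it is only meant for a small symmetric matrix with eigenvalues of distinct magnitude. All function names and the 3 by 3 test matrix are illustrative assumptions.

```python
def qr_gram_schmidt(A):
    """Factor a square matrix A = Q R by modified Gram-Schmidt on its columns."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the previously computed orthonormal columns.
        for k, q in enumerate(q_cols):
            R[k][j] = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - R[k][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = sum(vi * vi for vi in v) ** 0.5
        if R[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([vi / R[j][j] for vi in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def unshifted_qr_iteration(A, steps=100):
    """Repeat A_k = Q_k R_k, A_{k+1} = R_k Q_k; each A_{k+1} is similar to A."""
    Ak = [row[:] for row in A]
    for _ in range(steps):
        Q, R = qr_gram_schmidt(Ak)
        Ak = matmul(R, Q)
    return Ak

# Illustrative symmetric tridiagonal matrix with eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3).
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
Ak = unshifted_qr_iteration(A)
diagonal = [Ak[i][i] for i in range(3)]
```

For this matrix the iterates approach a diagonal matrix whose entries are the eigenvalues. Shifts, Hessenberg reduction and the implicit formulation are what turn this slow basic scheme into a practical algorithm.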
What is being missed today computationally? What Epoch(s) might come next? Why and how? Our current best numerical codes can solve static problems very well; that is what they are designed for. As we begin to rely more and more on time-dependent sensing and on robotic manufacture, we need to learn how to solve our erstwhile static equations, but now in real time and with time-varying coefficients, and preferably accurately as well. It seems quite alluring to try and solve a time-varying problem by using the static time-dependent inputs at each instance statically. But such a naive solution cannot suffice since at the next time step, whose 'solution' we have just computed 'statically', the problem parameters have already changed and thus our 'static' solution solves a completely different problem, which, unfortunately, has little value, if any at all.

Computer Hardware (Epoch 8)
Since the earliest electronic computing devices of the 1940s, all our computers have worked as giant and embellished Turing machines with logic gates, switches and memory that rely on only two numerical states: 0 and 1 or on and off. Hence all our computer data is stored and manipulated as sequences of 0 and 1. Lately our computing ability has come up against the limits of storing and working with data and processors that can only deal with zeros and ones. Our processing speeds have not advanced significantly over the last couple of years; we are still stuck with 3 - 4 GHz processors. To alleviate this bottleneck, chip makers have created multi-processor chips and software firms have introduced better and quicker software and operating systems, but the basic processor speed has not budged much. At this time computer scientists and manufacturers are trying to overcome this 0 - 1 bottleneck by replacing our 0 - 1 processors, chips, memories and transistors by improved transistors and chips that can store and process multi-states, such as 0-1-2-3-4 or 0-1-2-3-4-5-6-7-8 or even higher numbered data representations. This could lead us to another computing sea change bringing us into a new computational Epoch via hardware. And further out on the horizon lies the possibility of computers based on infinitely many quantum states.

On Neural Network Methods (Epoch 7 already under way)
The last century brought us valuable tools to solve most static problems that involve matrices.
Our current numerical matrix tools can solve static linear equations, matrix equations such as Sylvester or Lyapunov equations, eigenvalue problems and generalizations of these, both of the dense or structured and of the solvable or unsolvable kind; likewise we can solve static optimization problems of all sizes and for nearly all structured matrices and for most static applications. But what to do with such problems when their coefficients are time-varying or time dependent? In numerical computations, there has always been a see-saw between models that resulted in derivative inspired differential equations and in linear algebra based matrix equations. Their respective computational advantages differed from problem to problem. Neural networks (NN) are an amalgam of matrix methods and differential methods using a mixture of both. NN methods are designed to solve time-varying dynamical systems. Numerical methods for time-varying dynamical systems first came about in the 1950s and subsequently have gained strength in the 1980s and 1990s and beyond, see the introduction in Getz and Marsden [13] for example. To solve a time varying equation $f(X(t)) = G(t)$, these studies start from the error equation $E(t) = f(X(t)) - G(t)$ and stipulate exponential decay of the error function E(t) by trying to solve
$$\dot{E}(t) = -\lambda E(t) \qquad (1)$$
for a positive decay constant λ. There are essentially three ways to go about solving the error function differential equation (1): homotopy methods, gradient methods and ZNN Neural Network methods.

A Neural Network approach to solve time-varying linear equations A(t)x(t) = b(t)
Here A(t) is a nonsingular time-varying n by n matrix and $b(t) \in \mathbb{R}^n$ is a time-varying vector. Clearly the unknown solution x(t) of the associated linear equation A(t)x(t) = b(t) will be time-dependent as well.
The first paper on Zhang Neural Networks (ZNN) was written by Yunong Zhang et al. in 2002, see [27]. Today there are over 250 papers, mostly in engineering journals, that deal with time-varying applications of the ZNN method, either in hardware chip design for specialized computational tasks as part of a plant or machine or for time-varying simulation problems in computer algorithms and codes. Unfortunately, the ZNN method and the ideas behind ZNN are hardly known today among numerical analysis experts. The method itself starts with using the Sunzi Suanjing's and Al Khwarizmi's rule for reducing equations that first appeared 1 1/2 millennia ago. This simple rule was also employed by Cauchy in 1829 to transform the static matrix eigenvalue problem from Ax = λx to Ax − λx = 0 and finally to det(A − λI) = 0. For the time-varying linear equations problem Zhang's Neural Network method starts with $A(t)x(t) = b(t)$. ZNN then looks at the error function $E(t) = A(t)x(t) - b(t)$. Note that standard static methods would look at an error norm $\|E\|$ for the error function. Neural Networks do not, they study the error function E(t) instead. And they start with an implicit 'ideal wish': What could or should we wish for E? Time-varying computations would be near ideal if their error functions were decaying exponentially fast. This is impossible to achieve (or even ask for) with our best static equations and problem solvers of the 21st century. For static numerical matrix methods backward stability is considered best. In all Zhang NN methods we stipulate that the error function E(t) decreases exponentially fast over time to the zero function. This means that $\dot{E}(t) = -\lambda E(t)$ for some chosen number $\lambda > 0$, the decay constant.
Note that for the time-varying linear equations problem we have
$$\dot{E}(t) = \dot{A}(t)x(t) + A(t)\dot{x}(t) - \dot{b}(t) = -\lambda \, (A(t)x(t) - b(t)).$$
This leads to the following differential equation for x(t):
$$A(t)\dot{x}(t) = -\dot{A}(t)x(t) + \dot{b}(t) - \lambda \, (A(t)x(t) - b(t)). \qquad (2)$$
And we have transformed the time-varying linear equations problem into an initial value differential equations problem that needs to be solved for t > 0. This is where the different dynamical systems methods split their ways. In Zhang Neural Networks, the continuous time differential equation (2) is then discretized for $0 < t_j < t_{end}$ and the ensuing derivatives are approximated by high order difference quotients, with the one for the unknown $\dot{x}(t_j)$ being 1-step ahead and proven convergent, while the others such as for $\dot{A}(t)$ and $\dot{b}(t)$ can be backward difference formulas. How to proceed from (2) with solving A(t)x(t) = b(t) via ZNN methods is still an open problem, especially for large scale sparse or structured time-varying linear equations since the matrix $A(t_j)$ encumbers the unknown $\dot{x}(t_j)$ on the left hand side and there is no known 1-step ahead differentiation formula that can be used here. The general idea that underlies ZNN methods for time-varying problems is to replace repeated matrix computations by solving linear differential equations and associated initial value problems for discrete instances $0 < t_j < t_{end}$ instead.
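A small experiment may help to see the exponential error decay at work. The sketch below assumes the differential equation in the form $A(t)\dot{x}(t) = -\dot{A}(t)x(t) + \dot{b}(t) - \lambda (A(t)x(t) - b(t))$, which follows from $\dot{E}(t) = -\lambda E(t)$ for $E(t) = A(t)x(t) - b(t)$. It is not a ZNN discretization proper: forward Euler stands in for the high order 1-step ahead formulas, backward differences approximate $\dot{A}$ and $\dot{b}$, and the 2 by 2 system with $A(t_j)$ is solved directly at every step, which is affordable only because the example is tiny. The data functions `A_of` and `b_of` and all parameter values are hypothetical.

```python
import math

def A_of(t):
    # Hypothetical time-varying 2-by-2 matrix; its determinant stays above 0.75.
    return [[2.0 + math.sin(t), 0.5], [0.5, 2.0 + math.cos(t)]]

def b_of(t):
    # Hypothetical time-varying right-hand side.
    return [math.cos(t), math.sin(t)]

def solve2(M, r):
    """Solve the 2-by-2 system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def znn_euler(t_end=5.0, tau=1e-3, lam=20.0):
    """Integrate A x' = -A' x + b' - lam (A x - b) by forward Euler steps."""
    x = [0.0, 0.0]  # deliberately wrong starting value
    steps = int(round(t_end / tau))
    A_prev, b_prev = A_of(0.0), b_of(0.0)
    for j in range(steps):
        t = j * tau
        A, b = A_of(t), b_of(t)
        # Backward difference quotients for the derivatives of A and b.
        dA = [[(A[i][k] - A_prev[i][k]) / tau for k in range(2)] for i in range(2)]
        db = [(b[i] - b_prev[i]) / tau for i in range(2)]
        Ax, dAx = matvec(A, x), matvec(dA, x)
        rhs = [-dAx[i] + db[i] - lam * (Ax[i] - b[i]) for i in range(2)]
        xdot = solve2(A, rhs)
        # Forward Euler step, a simple stand-in for a 1-step ahead formula.
        x = [x[i] + tau * xdot[i] for i in range(2)]
        A_prev, b_prev = A, b
    t = steps * tau
    error = [ri - bi for ri, bi in zip(matvec(A_of(t), x), b_of(t))]
    return x, max(abs(e) for e in error)

x, residual = znn_euler()
```

Starting from a wrong initial vector, the residual $A(t)x(t) - b(t)$ decays at a rate governed by λ and then stays at the level of the discretization error, while each new x is available one step ahead of the data it is compared with.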

A Zhang Neural Network approach to find time-varying generalized matrix inverses $Y(t)$ for time-varying full rank matrices $B(t)$ with $B(t)_{m,n} Y(t)_{n,m} = I_m$
This section is based on joint work with Jian Li, Mingzhi Mao, and Yunong Zhang, see [22].

Continuous Problem Formulation :
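For an $m$ by $n$ real time-varying matrix $B(t)$ of full rank $m$ with $m \le n$ we form the matrix-valued error function
$$E(t) = B(t) - Y^+(t) \in \mathbb{R}^{m \times n} \qquad (3)$$
where the upper $+$ sign always means 'generalized inverse'. Then we use the design formula
$$\dot E(t) = -\lambda E(t) \qquad (4)$$
with design parameter $\lambda > 0$. Based on [23, Lemma 3] we have for the time derivative $\dot Y^+(t)$ of $Y^+(t)$
$$\dot Y^+(t) = -Y^+(t)\,\dot Y(t)\,Y^+(t). \qquad (5)$$
And from (3) and (5) we obtain
$$\dot E(t) = \dot B(t) - \dot Y^+(t) = \dot B(t) + Y^+(t)\,\dot Y(t)\,Y^+(t). \qquad (6)$$
Combining (4) and (6), we then get
$$\dot B(t) + Y^+(t)\,\dot Y(t)\,Y^+(t) = -\lambda\,(B(t) - Y^+(t)). \qquad (7)$$
And by right multiplying (7) with $Y(t)$ we have
$$\big(\dot B(t) + Y^+(t)\,\dot Y(t)\,Y^+(t)\big)\,Y(t) = -\lambda\,\big(B(t) - Y^+(t)\big)\,Y(t). \qquad (8)$$
With $m \le n$, we have $Y^+_{m \times n}(t)\,Y_{n \times m}(t) = I_{m \times m}$ and thus
$$\dot B(t)Y(t) + Y^+(t)\,\dot Y(t) = -\lambda\,(B(t)Y(t) - I). \qquad (9)$$
The solution of a generalized matrix inverse problem is not unique when $m < n$ and we only need to find a solution that satisfies (9). Consequently the continuous model can be represented as
$$\dot Y(t) = -Y(t)\dot B(t)Y(t) - \lambda\,Y(t)\,(B(t)Y(t) - I) \qquad (10)$$
which agrees completely with [13, formula (15), p. 317]. Substituting (10) into (9), we have
$$\dot B(t)Y(t) + Y^+(t)\big(-Y(t)\dot B(t)Y(t) - \lambda\,Y(t)(B(t)Y(t) - I)\big) = -\lambda\,(B(t)Y(t) - I) \qquad (11)$$
which we rewrite as
$$\dot B(t)Y(t) - Y^+(t)Y(t)\dot B(t)Y(t) - \lambda\,Y^+(t)Y(t)(B(t)Y(t) - I) = -\lambda\,(B(t)Y(t) - I). \qquad (12)$$
With $Y^+_{m \times n}(t)\,Y_{n \times m}(t) = I_{m \times m}$, we have
$$\dot B(t)Y(t) - \dot B(t)Y(t) - \lambda\,(B(t)Y(t) - I) = -\lambda\,(B(t)Y(t) - I). \qquad (13)$$
Thus model (10) satisfies (9) and its solution solves the time-varying generalized matrix inverse problem.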

Zhang Neural Network Discretization :
Given a sequence of rectangular matrices $B_j$ at time instances $t_j \le t_k$ we want to find the discrete time-varying generalized matrix inverse $Y_{k+1}$ of $B_{k+1}$ on each computational time interval $[k\tau, (k+1)\tau) \subseteq [0, t_{end}]$ so that
$$B_{k+1} - Y^+_{k+1} = 0 \in \mathbb{R}^{m \times n}. \qquad (14)$$
Here $B_{k+1} = B(t_{k+1}) = B((k+1)\tau) \in \mathbb{R}^{m \times n}$ is a time-varying sequence of full rank matrices that is sampled at equidistant time instances, $m \le n$, and $Y_{k+1} \in \mathbb{R}^{n \times m}$ is unknown. $Y_{k+1}$ needs to be computed in real-time for each time interval $[k\tau, (k+1)\tau) \subseteq [0, t_{end}]$.
Here the matrix operator $+$ denotes the generalized inverse of a matrix and $0 \in \mathbb{R}^{m \times n}$ is the zero matrix. Besides, $k = 0, 1, \dots$ denotes the updating index, $t_{end}$ denotes the task duration and $\tau$ denotes the constant sampling gap of the time-varying matrix sequence $B_{k+1}$. For $m > n$, the procedure is similar. Note that we must obtain each $Y_{k+1}$ at or before time $t_{k+1}$ for real-time calculations while the actual value of $B_{k+1}$ is unknown before $t_{k+1}$. Thus we cannot obtain the solution by calculating $Y_{k+1} = B^+_{k+1}$. To obtain $Y_{k+1}$ in real-time, we must develop a model based on the available information from before $t_{k+1}$ such as that in $B_j$, $Y_j$ and $Y_{j-1}$ for $j \le k$ instead of unknown information such as $B_{k+1}$.
To obtain a discrete-time model that solves the original discrete time-varying generalized matrix inverse problem (14), we need to discretize the continuous model (10). First we use the conventional 1-step forward Euler formula
$$\dot x_k = \frac{x_{k+1} - x_k}{\tau} + O(\tau) \qquad (15)$$
with truncation error of order $O(\tau)$. Based on formula (15) we approximate
$$\dot Y_k = \frac{Y_{k+1} - Y_k}{\tau} + O(\tau) \qquad (16)$$
and use this equation to discretize the continuous model (10) as follows
$$Y_{k+1} = -\tau\,Y_k \dot B_k Y_k - h\,Y_k (B_k Y_k - I) + Y_k + O(\tau^2). \qquad (17)$$
Here $h = \tau\lambda$. In most real-world applications, information of the first-order time derivatives, i.e., the value of $\dot B_k$, may not be explicitly known for the discrete time-varying generalized matrix inverse problem (14). If this is so, the value of $\dot B_k$ can be approximated by a backward finite difference formula. To assure the accuracy and simplicity of the discretized model, the truncation error of the backward finite difference formula for $\dot B_k$ should match that of the 1-step-ahead finite difference formula that approximates $\dot Y_k$. Thus $\dot B_k$ in (17) should best be approximated by Euler's backward finite difference formula
$$\dot x_k = \frac{x_k - x_{k-1}}{\tau} + O(\tau) \qquad (18)$$
because the truncation error order $O(\tau)$ of formula (18) equals that of formula (15). Thus we approximately have
$$\dot B_k = \frac{B_k - B_{k-1}}{\tau} + O(\tau). \qquad (19)$$
Then we combine equation (17) with equation (19) and the Euler discrete model becomes
$$Y_{k+1} = -Y_k (B_k - B_{k-1}) Y_k - h\,Y_k (B_k Y_k - I) + Y_k + O(\tau^2). \qquad (20)$$
Note that the truncation error of the discrete model (20) is of order $O(\tau^2)$ where the symbol $O(\tau^2)$ denotes a matrix in which each entry is of order $O(\tau^2)$. This model uses only present or past information of $B_k$, $B_{k-1}$ and $Y_k$ and solves for $Y_{k+1}$. Thus $Y_{k+1}$ can be calculated during the time interval $[t_k, t_{k+1})$ and if $Y_{k+1}$ can be computed quickly enough in real-time it will be ready when time instant $t_{k+1}$ arrives. Higher accuracy 1-step ahead formulas exist for discrete models, namely
$$\dot x_k = \frac{2x_{k+1} - 3x_k + 2x_{k-1} - x_{k-2}}{2\tau} + O(\tau^2) \qquad (21)$$
and a second formula (22) of the same kind. Both have truncation errors of order $O(\tau^2)$. For simplicity we only consider formula (21) and call it the 4-IFD formula because 4 instants in time are used to approximate the first-order derivative of $x(t_k)$.
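To make the Euler-type discrete model concrete, here is a minimal Python sketch. It assumes that the model reads $Y_{k+1} = -Y_k (B_k - B_{k-1}) Y_k - h\,Y_k (B_k Y_k - I) + Y_k$ with $h = \tau\lambda$. The 2 by 3 matrix flow `B(t)`, the values `tau = 1e-3` and `h = 0.1`, and the scaled start value are hypothetical choices for illustration and are not taken from the paper or from [22].

```python
import numpy as np

def B(t):
    # Hypothetical 2x3 matrix flow of full row rank (illustration only).
    return np.array([[2.0 + np.sin(t), np.cos(t), 0.5],
                     [-np.cos(t), 2.0 + np.sin(t), 0.5 * np.sin(t)]])

def euler_znn_pinv(tau=1e-3, h=0.1, t_end=5.0):
    """Euler-type discrete ZNN model for a right inverse Y of B:
    Y_{k+1} = Y_k - Y_k (B_k - B_{k-1}) Y_k - h Y_k (B_k Y_k - I)."""
    I = np.eye(2)
    Y = 0.8 * np.linalg.pinv(B(0.0))   # deliberately inexact start value
    residuals = []
    for k in range(int(round(t_end / tau))):
        Bk, Bkm1 = B(k * tau), B((k - 1) * tau)   # present and past data only
        Y = Y - Y @ (Bk - Bkm1) @ Y - h * Y @ (Bk @ Y - I)
        # Y was built without B_{k+1}; now test it against B_{k+1}.
        residuals.append(np.linalg.norm(B((k + 1) * tau) @ Y - I))
    return np.array(residuals)

res = euler_znn_pinv()
print("first residual:", res[0], " last residual:", res[-1])
```

Each step costs only a few small matrix products and no matrix factorization, which is what makes the model attractive for real-time use.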
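When we employ the 4-IFD formula (21) inside our continuous model (10) we obtain
$$Y_{k+1} = -\tau\,Y_k \dot B_k Y_k - h\,Y_k (B_k Y_k - I) + \tfrac{3}{2} Y_k - Y_{k-1} + \tfrac{1}{2} Y_{k-2} + O(\tau^3). \qquad (23)$$
Next we use the three-instant backward finite difference formula
$$\dot x_k = \frac{3x_k - 4x_{k-1} + x_{k-2}}{2\tau} + O(\tau^2) \qquad (24)$$
with error order $O(\tau^2)$ to approximate the value of $\dot B_k$ in equation (23). Then the 4-IFD-type discretized model becomes
$$Y_{k+1} = -Y_k \big(\tfrac{3}{2} B_k - 2 B_{k-1} + \tfrac{1}{2} B_{k-2}\big) Y_k - h\,Y_k (B_k Y_k - I) + \tfrac{3}{2} Y_k - Y_{k-1} + \tfrac{1}{2} Y_{k-2} + O(\tau^3). \qquad (25)$$
Its truncation error is of order $O(\tau^3)$. Similar to the Euler based discrete model (20), the 4-IFD-type discrete model uses only present and past information such as $B_k$, $B_{k-1}$, $B_{k-2}$, $Y_k$, $Y_{k-1}$ and $Y_{k-2}$ to solve for $Y_{k+1}$. Thus it also satisfies the requirements for real-time computation.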
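A 4-IFD-type model can be sketched in the same way. The code below assumes the look-ahead formula $\dot x_k \approx (2x_{k+1} - 3x_k + 2x_{k-1} - x_{k-2})/(2\tau)$ and the three-instant backward difference $\dot B_k \approx (3B_k - 4B_{k-1} + B_{k-2})/(2\tau)$. The matrix flow `B(t)`, the parameters and the two Euler-type start-up steps are hypothetical choices for illustration only.

```python
import numpy as np

def B(t):
    # Hypothetical 2x3 matrix flow of full row rank (illustration only).
    return np.array([[2.0 + np.sin(t), np.cos(t), 0.5],
                     [-np.cos(t), 2.0 + np.sin(t), 0.5 * np.sin(t)]])

def four_ifd_znn_pinv(tau=1e-3, h=0.1, t_end=5.0):
    """4-IFD-type discrete ZNN model (assumed form):
    Y_{k+1} = -Y_k (3/2 B_k - 2 B_{k-1} + 1/2 B_{k-2}) Y_k
              - h Y_k (B_k Y_k - I) + 3/2 Y_k - Y_{k-1} + 1/2 Y_{k-2}."""
    I = np.eye(2)
    Ys = [0.9 * np.linalg.pinv(B(0.0))]          # inexact start value Y_0
    for k in range(2):                           # Euler-type start-up: Y_1, Y_2
        Y, Bk, Bkm1 = Ys[-1], B(k * tau), B((k - 1) * tau)
        Ys.append(Y - Y @ (Bk - Bkm1) @ Y - h * Y @ (Bk @ Y - I))
    residuals = []
    for k in range(2, int(round(t_end / tau))):
        Ykm2, Ykm1, Yk = Ys
        Bk, Bkm1, Bkm2 = B(k * tau), B((k - 1) * tau), B((k - 2) * tau)
        dB = 1.5 * Bk - 2.0 * Bkm1 + 0.5 * Bkm2  # tau times backward difference
        Ynew = (-Yk @ dB @ Yk - h * Yk @ (Bk @ Yk - I)
                + 1.5 * Yk - Ykm1 + 0.5 * Ykm2)
        Ys = [Ykm1, Yk, Ynew]
        residuals.append(np.linalg.norm(B((k + 1) * tau) @ Ynew - I))
    return np.array(residuals)

res = four_ifd_znn_pinv()
print("last residual of the 4-IFD-type model:", res[-1])
```

The extra cost over the Euler-type model is the storage of two earlier iterates and two earlier data matrices; the work per step stays the same in order of magnitude.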
A 5 instants finite difference formula : Any usable finite difference formula for discretizing the continuous model (10) must satisfy several restrictions. It must be 1-step ahead for $\dot x$, i.e., approximate $\dot x(t_k)$ by using $x_{k+1}$, $x_k$, $x_{k-1}$ and earlier data, and it must be 0-stable and convergent. However, 1-step ahead finite difference formulas do not necessarily generate stable and convergent discrete models, see e.g., [29,28].
Here is a new 1-step ahead finite difference formula with higher accuracy than the Euler and 4-IFD formulas. It will be used to generate a stable and convergent discrete model that finds time-varying generalized matrix inverses more accurately in real-time.
The proof relies on four Taylor expansions that use $x(t_{k+1})$ and $x(t_{k-1})$ through $x(t_{k-3})$ around $x(t_k)$ and clever linear combinations thereof.
The new 1-step ahead discretization formula (26) then leads to the 5 instants discrete model which has a truncation error of order $O(\tau^4)$.
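The two restrictions named above, a sufficiently high truncation order and 0-stability, can be checked mechanically for any candidate formula. The Python sketch below does this by Taylor moments and by the roots of the characteristic polynomial. The coefficients $(8, 1, -6, -5, 2)/(18\tau)$ for the five instants $x_{k+1}, \dots, x_{k-3}$ are an assumption, since formula (26) cannot be recovered from the text here; they are one candidate that meets the stated requirements. The helper names `taylor_order` and `zero_stable` are hypothetical.

```python
import numpy as np
from math import factorial

def taylor_order(coeffs, offsets, denom, max_q=12):
    """Order p with sum_j c_j x(t + j tau) / (denom tau) = x'(t) + O(tau^p).
    A return value below 1 means the formula is not consistent."""
    c = np.asarray(coeffs, dtype=float)
    j = np.asarray(offsets, dtype=float)
    for q in range(max_q):
        # moment q multiplies tau^q times the q-th derivative of x at t
        moment = np.sum(c * j**q) / factorial(q)
        target = float(denom) if q == 1 else 0.0
        if abs(moment - target) > 1e-12:
            return q - 1
    return max_q - 1

def zero_stable(coeffs, tol=1e-8):
    """Sufficient root check: all roots in the closed unit disk and
    exactly one root (the consistency root 1) on the unit circle."""
    r = np.roots(coeffs)                 # coefficients ordered from x_{k+1} down
    on_circle = r[np.abs(np.abs(r) - 1.0) < tol]
    return bool(np.all(np.abs(r) < 1.0 + tol)) and len(on_circle) == 1

candidates = {
    "Euler":          ([1, -1],           [1, 0],             1),
    "4-IFD":          ([2, -3, 2, -1],    [1, 0, -1, -2],     2),
    "5-instant (assumed)": ([8, 1, -6, -5, 2], [1, 0, -1, -2, -3], 18),
}
for name, (c, off, d) in candidates.items():
    print(name, taylor_order(c, off, d), zero_stable(c))
```

For these three inputs the order conditions give 1, 2 and 3, respectively, so a model built from the assumed 5-instant formula would carry a truncation error of order $O(\tau^4)$, in line with the statement above.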

Numerical Examples :
Example 1. Consider the discrete time-varying generalized matrix inverse problem

Example 2.
Here we consider the discrete time-varying matrix inversion problem

A Zhang Neural Network approach for solving nonlinear convex optimization problems under time-varying linear constraints
This section is based on joint work with Jian Li, Mingzhi Mao, and Yunong Zhang, see [21].
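Problem formulation : Building a continuous time model, i.e., formulating an equation for the problem : For a convex objective function $f(x(t),t)$ that is to be minimized under the time-varying linear constraints $A(t)x(t) = b(t)$, the Zhang Neural Network approach can be built on the Lagrange function
$$L(x(t), l(t), t) = f(x(t),t) + l^T(t)\,(A(t)x(t) - b(t))$$
where $l(t) = [l_1(t), \dots, l_m(t)]^T \in \mathbb{R}^m$ is the Lagrange-multiplier vector and $..^T$ denotes the transpose. Note that there will be no need to solve for the Lagrange multipliers $l(t)$ separately here. Set $y(t) = [x^T(t), l^T(t)]^T$ and
$$h(y(t),t) = \begin{bmatrix} \partial L / \partial x \\ \partial L / \partial l \end{bmatrix} = \begin{bmatrix} \partial f(x(t),t) / \partial x + A^T(t)\,l(t) \\ A(t)x(t) - b(t) \end{bmatrix}.$$
We transform the multiplier problem into an initial value DE problem instead. By stipulating exponential decay for $h(t)$ we obtain the model equation
$$\dot y(t) = -H^{-1}(y(t),t)\,\big(\lambda\,h(y(t),t) + \dot h_t(y(t),t)\big)$$
for the Jacobian matrix
$$H(y(t),t) = \frac{\partial h(y(t),t)}{\partial y} = \begin{bmatrix} \partial^2 f / \partial x\,\partial x^T & A^T(t) \\ A(t) & 0 \end{bmatrix}$$
and the partial time derivative $\dot h_t = \partial h / \partial t$.
Discretizing the model and choosing suitable high order finite difference formulas : To discretize the continuous model for $\dot y$ we can use the forward Euler difference formula with truncation error order $O(\tau)$ or the four-instant forward difference formula (4-IFD) with truncation error order $O(\tau^2)$. The Euler formula yields the discretized model
$$y_{k+1} = -H^{-1}(y_k,t_k)\,\big(\kappa\,h(y_k,t_k) + \tau\,\dot h_t(y_k,t_k)\big) + y_k$$
with $\kappa = \tau\lambda$ while the 4-IFD formula results in
$$y_{k+1} = -H^{-1}(y_k,t_k)\,\big(\kappa\,h(y_k,t_k) + \tau\,\dot h_t(y_k,t_k)\big) + \tfrac{3}{2} y_k - y_{k-1} + \tfrac{1}{2} y_{k-2}.$$
Both discretization formulas are consistent and convergent. This can be proved via the roots of the associated characteristic polynomial. Its roots must lie in the closed complex unit disk and cannot be repeated on its boundary. Since the value of $\dot h_t(y_k,t_k)$ may not be known explicitly we may replace it by
$$\dot h_t(y_k,t_k) \approx \frac{3h(y_k,t_k) - 4h(y_k,t_{k-1}) + h(y_k,t_{k-2})}{2\tau}$$
which uses the three-point backward finite difference formula
$$\dot x_k \approx \frac{3x_k - 4x_{k-1} + x_{k-2}}{2\tau}.$$
Then the discretized FIFD formula becomes more complicated but easier to implement:
$$y_{k+1} = -H^{-1}(y_k,t_k)\,\big((\kappa + \tfrac{3}{2})\,h(y_k,t_k) - 2h(y_k,t_{k-1}) + \tfrac{1}{2} h(y_k,t_{k-2})\big) + \tfrac{3}{2} y_k - y_{k-1} + \tfrac{1}{2} y_{k-2}.$$
To implement this formula, the inverse of the Jacobian matrix $H$ can be computed at each time $t_k$ in a fraction of the available real-time interval $[t_k,t_{k+1})$ by using the real-time inverse finding ZNN method from the previous subsection 2.2.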
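As a minimal sketch of the Euler-type model, the Python code below applies it to the test problem of this section's numerical example: $f(x,t) = (\cos(0.1t) + 2)(x_1^2 + x_2^2) + 2\sin(t)x_1x_2 + \sin(t)x_1 + \cos(t)x_2$ under the constraint $\sin(0.2t)x_1 + \cos(0.2t)x_2 = \cos(t)$. It assumes $h(y,t) = [\partial f/\partial x + A^T l;\ Ax - b]$ with $y = [x; l]$ and replaces $\tau\,\dot h_t(y_k,t_k)$ by the backward difference $h(y_k,t_k) - h(y_k,t_{k-1})$. The function names, `tau = 1e-3`, `kappa = 0.1` and the zero start vector are hypothetical choices. The sketch also calls a linear solver with $H$ at each step rather than the real-time ZNN inverse finder mentioned above.

```python
import numpy as np

def h_and_H(y, t):
    """Gradient h of the Lagrange function and its Jacobian H for the test problem."""
    g, s, c = np.cos(0.1 * t) + 2.0, np.sin(t), np.cos(t)
    a = np.array([np.sin(0.2 * t), np.cos(0.2 * t)])   # constraint row A(t)
    x, l = y[:2], y[2]
    grad_f = np.array([2 * g * x[0] + 2 * s * x[1] + s,
                       2 * g * x[1] + 2 * s * x[0] + c])
    h = np.append(grad_f + a * l, a @ x - c)           # b(t) = cos(t)
    H = np.array([[2 * g, 2 * s, a[0]],
                  [2 * s, 2 * g, a[1]],
                  [a[0], a[1], 0.0]])
    return h, H

def euler_znn_opt(tau=1e-3, kappa=0.1, t_end=10.0):
    """y_{k+1} = y_k - H^{-1} (kappa h(y_k,t_k) + h(y_k,t_k) - h(y_k,t_{k-1}))."""
    y = np.zeros(3)                                    # arbitrary start value
    errors = []
    for k in range(int(round(t_end / tau))):
        t = k * tau
        h_now, H = h_and_H(y, t)
        h_old, _ = h_and_H(y, t - tau)                 # same y, earlier time
        y = y - np.linalg.solve(H, kappa * h_now + (h_now - h_old))
        h_next, _ = h_and_H(y, t + tau)                # optimality check at t_{k+1}
        errors.append(np.linalg.norm(h_next))
    return y, np.array(errors)

y_end, err = euler_znn_opt()
print("x at t_end:", y_end[:2], " final optimality residual:", err[-1])
```

The last component of `h` is the constraint residual, so a small value of `err[-1]` indicates both near stationarity of the Lagrange function and near feasibility at the final time.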

Numerical example and results :
As an example we solve the following convex nonlinear optimization problem with known theoretical solution numerically by using our ZNN method, for further details and applications see [21]: Find
$$\min\ (\cos(0.1t_{k+1}) + 2)\,x_1^2 + (\cos(0.1t_{k+1}) + 2)\,x_2^2 + 2\sin(t_{k+1})\,x_1x_2 + \sin(t_{k+1})\,x_1 + \cos(t_{k+1})\,x_2$$
so that
$$\sin(0.2t_{k+1})\,x_1 + \cos(0.2t_{k+1})\,x_2 = \cos(t_{k+1}).$$
The FIFD formula is a 4-instant formula, while the Euler formula needs only two. Both discretization models work in real-time and both typically create the optimal solution in a fraction of a second with differing degrees of accuracy according to their orders. The example below runs for 10 sec. The time-varying values for $f(x(t),t)$, $A(t)$ and $b(t)$ are given as functions and evaluated from their function formulations. In real world applications these values might be supplied by sensors during each time interval $[t_i, t_{i+1})$ and the empirical values would be inserted into the difference formulas as they are evaluated by sensors in real time.

3 On Quantum and Multi-state Computing (Epoch 8 yet to start and come)
Quantum Computing and multi-state memory and computers with multi-state processors will change the way we compute once they become available. They will require new operating systems and new software with new and yet to be discovered algorithms. What will this new era entail? Nobody knows or can reliably predict. I asked an 'expert' on quantum computing three years ago as to when he expected to have a quantum computer at his disposal or on his desk. The answer was : "Not in my lifetime, not in 20 years". Currently about a dozen or more research centers in Europe and South-East Asia are trying to build quantum computers based on the quantum superposition principle and quantum entanglement of elementary particles. They do so in a multitude of different ways.
The envisioned benefit of these efforts would be to be able to compute super fast in parallel and in simulations, to solve huge data problems quicker than ever before and to solve problems that are unassailable now with our current best supercomputer networks. All of the proposed quantum science techniques make use of superconducting circuits and particles. The aim is to build quantum computers in one or two decades with around 100 entangled quantum bits. Such a quantum computer would be bulky. It would need much supplementary equipment for cooling and so forth and could easily take up a whole floor of a building, just as the first German and British valve computers did in the 1940s. But it would surpass the computing capacity of all current supercomputers, desktops and laptops on Earth combined. Currently the largest working entangled quantum array contains fewer than 10 quantum bits. Access to a 100 bit quantum computer would probably be via the Cloud and there would be no quantum computer laptops. Quantum computers may take another 10, 20 or 30 years to materialize.
How will they come about? Which yet unknown algorithms will they use? Who will invent them? Who will code them? If history can be a guide, John Francis and Vera Kublanovskaya were both working independently on circuit diagrams and logic gate designs for valve computers in England and in Russia at the time when they discovered QR (or LQ) in the late 1950s. So we possibly are looking for quantum computer hardware and software designers who know numerical analysis and algorithm development in or about the year 2040. In a similar fashion, Leibniz and Seki formalized our now ubiquitous matrix concept independently but simultaneously in 1683, in Germany and in Japan.
Maybe it will take two again?
The references given below only go back to the year 1799.