This survey paper gives a personal assessment of epoch-making advances in matrix computations, from antiquity and with an eye toward tomorrow. It traces the development of number systems and elementary algebra and the uses of Gaussian elimination methods from around 2000 BC on to current real-time neural network computations to solve time-varying matrix equations. The paper includes relevant advances from China from the third century AD on and from India and Persia in the ninth and later centuries. Then it discusses the conceptual genesis of vectors and matrices in Central Europe and in Japan in the fourteenth through seventeenth centuries AD, followed by the 150 year cul-de-sac of polynomial root finder research for matrix eigenvalues, as well as the superbly useful matrix iterative methods and Francis’ matrix eigenvalue algorithm from the last century. Finally, we explain the recent use of initial value problem solvers and high-order 1-step ahead discretization formulas to master time-varying linear and nonlinear matrix equations via Zhang neural networks. This paper ends with a short outlook upon new hardware schemes with multilevel processors that go beyond the 0–1 base 2 framework which all of our past and current electronic computers have been using.
- Math subject classifications
In this paper we try to outline the epoch making achievements and transformations that have occurred over time for computations and more specifically for matrix computations. We will trace how our linear algebraic concepts and matrix computations have progressed from the beginning of recorded time until today and how they will likely progress into the future. We take this limited tack simply because in modern times, matrices have become the elemental and universal tools for most any computation.
This evolution of our matrix methods will be described in broad strokes. My main emphasis is to trace the mathematical genesis of matrices and their uses and to learn how the modern matrix concept has evolved in the past and how it is evolving. I am not interested in matrix theory by itself, but rather in matrix computations, i.e., how matrix concepts and algorithms have been developed from approximately 3000 BC to today, and even tomorrow.
This paper describes eight noticeably separate epochs that are distinguished from each other by the introduction of evolutionary new concepts and subsequent radically new computational methods. Following the historical trail through six historically established epochs, we will then look into the present and the near future.
What drives us to conceptualize and compute differently now, and what is leading us into the seventh and possibly eighth epoch? When and how will we likely compute in the future?
I am not a math historian, I have never taught a class in math history. Instead, throughout my academic career, I have worked with matrices: in matrix theory, in applications, and in numerical analysis. I like to construct efficient new algorithms that solve matrix equations. The idea for this paper is in part due to my listening by chance to a very short English broadcast from Egyptian radio on short wave some 40 years ago in the 1970s. It described an Egyptian papyrus from around 2000 BC that dealt with solving linear equations by row reduction and zeroing out coefficients in systems of linear equations, i.e., by what we now call “Gaussian elimination.” When I heard this as a young PhD, I was fascinated and wrote the station for more information. They never answered, and when I was in Cairo many years later, the Egyptian Museum personnel could not help me either with locating the source.
Thus, I became aware that Carl Friedrich Gauß did not invent what we now call by his name, but who did?
For many decades, this snippet of math history just lingered in my mind until a year ago when I was sent a book on Zhang neural network (ZNN) methods for solving time-varying linear and nonlinear equations and was asked to review it. The ZNN methods were—to me and my understandings of numerics then—so other-worldly and brilliant that I began to think of the incredible leaps and “bounces” that math computations have gone through over the eons, from era to era. I eventually began to detect seven or eight computational sea changes, what I call “epochs,” in our ability to compute with matrices, and that is my topic.
2. A short history of matrix computations
Nobody knows how numbers and number systems came about, just like nobody knows “who invented the wheel.” I will start with a few historical facts about number systems and how they developed and were used across the globe in antiquity.
2.1. Early number systems
Humankind’s first developments of number systems were very diverse and geographically widely dispersed, yet rather slow. The first circle cipher for zero occurred in Babylonia around 2500 BC or 4500 years ago. A continent or two removed, the Mayans used the same concept and circle zero symbol from around 40 BC. In India, it was recognized during the seventh century. But zero only became recognized as a “number to compute with” like all the others in the 9th century in Central India. Our decimal system builds on the ten numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The decimal positional system came from China via the Indus valley, and it started to be used in Persia in the ninth century. It was combined with or derived from a Hindu number system of the same time period.
In fact Westerners call the current decimal number symbols wrongfully “Arabic,” but most Westerners (and I) cannot read the license plates on cars in Egypt since the Arabic world does not use our Persian/Hindu numbers in writing but its own script using Arabic letters to designate numbers. Should we call our “western” numbers “Farsi” or “Hindu” instead?
Various bases have been used for numbering. There have been base 2, base 8, base 10, base 12, base 16, base 60, and base 200 number systems and possibly more at some time somewhere throughout human history. Counting and simple computations started with notched sticks for record keeping and with the invention of sand or wax tablets and then the abacus. These simple tools were developed a little bit differently and independently in many parts of the globe.
2.2. Antiquity: first epoch
Around 2200 to 1600 BC, Sumerian, Babylonian, and Egyptian land survey computations became mathematized in order to mark and allot land after the yearly Euphrates, Tigris, and Nile floods. That lead to linear equations in 2, 3, or 4 variables and subsequent methods to solve them that amounted to what we now call row reduction or Gaussian elimination.
Mathematical computations did not advance much during the Greek times as Greek mathematicians were mainly interested in mathematical theory and in establishing the concept of a formal proof, as well as elementary number theory of which the Euclidean algorithm is still used today.
Neither did the complicated Roman numerals lend themselves to easy computations, and no further computational advances happened there.
2.3. Early mathematical arts in China, India, and the Near East: second epoch
(Based in part on a lecture at Hong Kong University in 2017, given by Xiangyu Zhou, Chinese Academy of Sciences, for Chinese sources, and on Indian and Arabic sources from elsewhere).
In prehistoric and historic times (1600 BC–1400 AD), knot and rod calculus were prevalent in China. They were based on a decimal positional system, so-called rod numerals. These comprised the most advanced number system of the time, and it was used for several millennia before being adopted and expanded in Persia and India in the ninth century AD and later on adopted in Central Europe.
The Mathematical Classic of Sunzi by Suanjing (from the third to the fifth century) gives a detailed description of the arithmetic rules for counting rods. In the Indus valley, clay tablets covered with sand were used for mathematical computations several millennia ago. Bhaskara (600–680 AD) in India was the first one to write numbers in the Hindu positional decimal system which used the circle for zero. In 629 AD he approximated the sine function by rational expressions while commenting on Aryabhatta’s (476–550 AD) book Aryabhatiyabhisya from 499 AD. An Indian contemporary of Bhaskara, Brahmagupta (598–665 AD), was the first one to establish the rules that govern computing with zero. Brahmagupta texts were written in Sanskrit verse that used the Sanskrit word for “eyes” to denote 2, “senses” for the number 5, etc. This was common in Indian mathematics and science writings at the time. The earliest record of multiplication and division algorithms using the Hindu numerals 1 through 9 and 0 was in writings by Al Khwarizmi 780–850 AD, a Persian mathematician employed in Bagdad. His The Book of Manipulation and Restoration established the golden rule of Algebra that an equation remains true if one subtracts the same quantity from both sides. He also wrote down multiplication and division rules that are identical to those of Suanjing from the third to fifth century in China. To Suanjing we also owe the Chinese remainder theorem. Finally, the advanced Hindu-Arabic decimal number system was introduced into the west by Leonardo Fibonacci (1175–1250 AD) of Pisa in his Liber Abaci or The Book of Calculations (1202),
Applied and numerical computations were driving much of Chinese mathematics. Wang Xiaotang (580–640 AD), for example, tried to find the roots of cubic polynomials that appeared in civil engineering and water conservation problems. In the Mathematical Treatise in Nine Sections of 1247, Qin Jiushao (1202–1261) developed the “Qin Jiushao method” which is now commonly called the “Horner-Ruffini scheme” for computing with and finding roots of polynomials iteratively. William George Horner  and Paolo Ruffini (1804–1807–1813) reinvented the Qin Jiushao method unknowingly 600 years later.
Chinese rod calculus was the method of choice for computing in China until the abacus took over during the Ming dynasty (1388–1644). Cheng Dawei (1355–1606) is the author of the first “numerical analysis” book titled The General Source of Computational Methods published in 1592. It describes methods to add, subtract, multiply, and divide on an abacus. The abacus itself was invented in various incarnations at various times and in several locations of the globe. It essentially combines several decimal rods on one board with beads on strings.
Chinese mathematicians from the third century BC onward to the tenth century AD brought us the The Nine Chapters on the Mathematical Arts that uses the numbers 1 through 9. This book was later disseminated further to the west and to India and Persia as described above. In Chapter 7, determinants first appeared conceptually, while Chapter 8 abstracts the concept of linear equations to represent them by matrix-like coefficient tableaux. These “matrix equations” were solved in China, again by “Gaussian elimination,” 1500 years before Gauß’ birth and 1800 years after the middle-eastern seasonal flood prone countries had first used the Gaussian algorithm around 1800 BC. Gauß himself described the method as the “common method of elimination” in his papers, and mathematicians then attached his name to it as an honor.
2.4. The genesis of vectors and matrices: third epoch
To advance matrix computations further, there was a need to conceptualize coordinates and vectors in space.
In the fourteenth century AD, Nicole Oresme developed a system of orthogonal coordinates for describing Euclidean space. This idea was taken up by René Descartes in the seventeenth century and is familiar to all of us now under the concept of Cartesian coordinates. Thereby, the world became ready for matrices and matrix computations in their own right.
In 1683 Gottfried Leibnitz in Germany and Seki Kowa in Japan both unbeknownst to each other reinvented the concept of a matrix as a rectangular array of coefficients for studying linear equations. Leibnitz also used and suggested row elimination to simplify and find their solutions. These efforts enabled Gauß to repeat what the Egyptians had done four millennia earlier: he was asked to survey the lands of his ruler, the Archduke George Augustus of Hanover, and measure the size of this kingdom inside Germany in the early 1800s. Beginning in the 1820s, Gauß, as Professor of Geodesy (and not of Mathematics) in Göttingen, would measure the angles and distances between many of the highest points there, such as the Brocken; the Inselsberg, 104 km apart; and the hills around Göttingen, and later he expanded the surveys all the way to the North Sea coast. He and his assistants did this multiple times, preferably when the weather was clear. Thereby, they set up systems of linear equations with generally more equations than unknown due to repeated measurements on different days.
To solve these overdetermined and naturally “unsolvable” systems , Gauß devised the normal equation (1823)  and solved them approximately. But the normal equations method eventually turned out to be numerically unsound. It took over a century to find out why, the reason being that condition numbers multiply (see Olga Taussky ).
2.5. Eigenvalues and the characteristic polynomial: fourth epoch
As differential operators and matrices were beginning to be investigated and dealt with by the early 1800s, their connections and similarities were slowly recognized in the mathematics world.
The replication of certain functions by a given differential operator was noticed first and became the subject of studies. What were the functions for which for some scalar ? How could they be found from , and what about ?
In 1829, Augustin Cauchy [4, p. 175] began to view the erstwhile “eigenvalue equation” as a “null space equation,” namely, or for the identity operator with for all . Complete knowledge of the eigenvalues and eigenfunctions of a differential operator allowed for a simple sum representations of the general solution of the linear differential equation described by . Thus, Cauchy’s “null space equation” became essential for determining the behavior of systems governed by linear differential equations.
Cauchy’s knowledge of and interest in determinants (think of the Cauchy-Binet theorem) then led him to define the “characteristic polynomial” of a square matrix in 1839 [5, p. 827] as , and thereby he initiated renewed studies in polynomial root-finding algorithms in the hope of obtaining analogous diagonalization results for linear matrix times vector products. And the search for polynomial root finders was on. By modern-day hindsight, reducing the eigenvalue problem from an data problem of the entries of a matrix to one of the coefficients of its characteristic polynomial is data compression, and therefore it was doomed to fail. But that remained unrealized by the mathematics community for more than 100 years.
James Sylvester finally gave the tableau concept of matrices its name “matrix” in 1848 or 1850. And after roughly two decades 1829 → 1839 → 1848/50, the first century of matrix theory or theoretical linear algebra had begun.
2.6. Back to matrix computations
Cauchy’s idea led mathematicians to try and compute the characteristic polynomials of matrices and find their roots in order to understand the eigen-behavior of matrices. We still teach many concepts and lessons today that are based on the “characteristic polynomial” of a matrix . Why, we should ask ourselves. Because unfortunately studying “characteristic polynomials” in place of matrices has turned out be a costly dead end for computational and applied mathematics: in the century and a half that followed Cauchy’s work, more than 4000 papers on computing the roots of polynomials were published, together with 200 to 300 books on the subject, bringing us many algorithms, all of which failed more often than not. Many illustrious careers and schools of mathematics were founded based on this unfortunate and ever elusive goal.
During the same period, two-dimensional (2D) hand-cranked computing machines were invented and built to effect long number multiplications and divisions. First by Charles Babbage, then as commercial geared adding machines that were still being used in office work well into the 1960s. These worked as two-dimensional abaci of sorts. But eventually digital (at first punch card fed) computers became the tools of our computational trade in science, in engineering, in business, in GPS, in Google, in social media, in large data, in automation, etc.
But how could we or would we find matrix eigenvalues accurately? A turnaround, a new method, a new computational epoch was needed. From where, by whom, and how?
2.7. Iterative matrix algorithms: fifth epoch
To move us forward, it appears that matrix methods themselves might have to be developed that would solve the matrix-intrinsic eigenvalue problem by themselves. But before that was possible, there were further unfortunate “detours.”
Carl Friedrich Gauß—in his doctoral thesis in 1799 —had disproved all earlier attempts to establish the fundamental theorem of algebra, i.e., that all polynomials over the real numbers can be factored into as many real or complex conjugate factors as their degree says. His thesis then included the first complete and correct proof of the fundamental theorem of algebra.
In 1824 Niels Abel  showed that the roots of some fifth-degree polynomials cannot be found by using radical expressions of their coefficients; Gauß never opened or read the submitted paper and thus in fact rejected it knowingly on the grounds that God would not have complicated the World thus … for us. Abel published his result privately, a broken man. Évariste Galois [8, 9] extended Abel’s result in 1830 by giving group theoretic conditions for polynomials to be solvable by radicals; the extended paper (introducing Galois theory) was originally rejected and appeared only posthumously in 1846 .
Cautioned by these “rejected” inconvenient results, the polynomial approach to matrix eigenvalue computations could have been shunned by clearer heads early on, but the “dead end” determinants and characteristic polynomial roots road was taken instead for more than a century. Note that Cauchy’s matrix result and most other fundamental matrix results from the nineteenth century were formulated in terms of determinants and only in the mid-twentieth century did the term “matrix” appear in matrix theoretical article and book titles.
A matrix-based approach to the eigenvalue problem nowadays starts from the simple fact that for any by matrix and any vector , the sequence of vector iterates contains vectors in -space which makes these vectors linearly dependent. Their linear dependency then leads to an th-degree polynomial that sends to zero. The vanishing polynomial for any turns out to always be a factor of the characteristic polynomial of and it can be found by Gaussian elimination rather than using determinants.
The same idea shows that vector iteration converges for every starting vector and any given square matrix and this has led engineers in the early twentieth century to construct iterative matrix algorithms that could solve linear equations and the matrix eigenvalue problem. Iterative matrix algorithms actually do go back further to the Jacobi method (1839, 1845) [11, 12], the so-called Gauß-Seidel method, invented by Seidel alone (1874) , and various SOR methods that are designed to solve linear systems iteratively. The latter generally use matrix splittings of rather than vector iteration. For further thoughts on early iterative matrix methods, refer to Michele Benzi .
Alexei Krylov  introduced and studied the vector iteration subspaces span in their own right. Following his ideas, large sparse matrix systems are nowadays treated iteratively in so-called Krylov-based methods, both to solve linear equations and to find matrix eigenvalues. Standard widely used Krylov-type iterative matrix algorithms carry the names of steepest descent and conjugate gradient by Hestenes and Stiefel , Arnoldi , Lanczos . Others are called GMRES, BICGSTABLE, QMR, ADI, etc. Most Krylov-type methods are matrix and problem specific, and they are now mostly used for huge sparse and structured matrices where direct or semi-direct methods cannot be employed due to their high computational and storage costs. Krylov methods generally rely on preconditioner for a linear system that shifts the spectrum of for faster convergence, and they thrive on incomplete matrix splittings, etc. Typically, they give only partial results. Who would need or want to know all million eigenvalues of a million by million matrix model. Krylov methods can be tuned to give information where it is needed for the underlying system.
2.8. Francis algorithm and matrix eigenvalues: sixth epoch
The Second World War (WW2) and post-Second World War periods were filled with innovations.
The atomic era had begun, as well as rocket science; commercial air flight became popular; and digital computers were being developed, first as valve machines and later transistorized. Supersonic speeds were realized, computer science was developed, etc. But there were many crashes and disasters with the new technologies: commercial aircraft (Super-Constellation, Convair, etc.) and military ones (Starfighter, etc.) would crash weekly around the globe; and newly built suspension bridges would collapse in strong winds.
The crux of the matter was that while matrix models of the underlying mechanical systems could readily be made using the laws of physics and mechanics, no one could reliably compute their eigenvalues. Engineers could not test their designs for eigenmodes in the right half plane! And Krylov methods were unfortunately not sufficient for testing for eigenvalues in a half plane.
If a matrix model of a mechanical or electrical or other structure, circuit, et cetera has right half-plane eigenvalues , then—upon proper excitation—there would be an eigen-component of the ever-increasing form as that resonates and self-amplifies inside the structure itself. This then leads to ever-increasing destructive vibrations and ultimate failure. The aircraft “flutter problem” was discovered during the Second World War. In England during WW2, Gershgorin circles that contain all of a system’s eigenvalues were drawn out in the complex plane by rather primitive valve computers and checked to ascertain system stability.
The general matrix eigenvalue problem was finally solved independently and similarly by John Francis in London and by Vera Kublanovskaya in Russia nearly simultaneously around 1960. Francis’ (or the QR) algorithm [19, 20] is based on Alston Householder’s idea to try and solve matrix problems by matrix factorizations. Francis’ method is an orthogonal subspace projection method and it works differently than the Krylov-based methods which solve a given matrix eigenvalue problem by projecting onto a Krylov subspace that is derived from and suitable for .
A “divide and conquer” matrix factorization strategy was first employed by Heinz Rutishauser (1955, 1958) [21, 22] in his LR matrix eigenvalue algorithm: if one can factor into the product of a lower and an upper triangular matrix and as and if is invertible, then for the reverse order product we have since . If again allows an LR factorization with nonsingular, then by reverse order multiplication we obtain
and so for the sequence of likewise constructed matrices for if the respective LR factorizations are possible at each stage . In this case the iterates clearly remain similar to the original matrix , and thus the iterates all have the same eigenvalues as . Note, however, that if, for example, the (1,1) entry is zero in , then there exists no un-pivoted LR factorization for A, and the method breaks down since pivoting is a one-sided matrix process and not a similarity. Therefore, Rutishauser’s method is only applicable to a limited set of matrices for which every LR iterate allows an un-pivoted LR factorization. Rutishauser had noted that if LR factorizations are possible for all iterates , and the becomes nearly upper triangular for large with ’s eigenvalues on the diagonal. (Very loosely said.)
John Francis was very interested in the flutter problem at the time when, by chance, someone dropped Rutishauser’s 1958 LR paper  on his desk at the CRDC in London. (In my interview with John Francis in 2009 , he did not know who that might have been.) Francis was aware through contacts with Jim Wilkinson of the backward stability of algorithms that involve orthogonal matrices Q. So rather than using Gaussian elimination matrices , Francis experimented with orthogonal factorizations. At roughly the same time, Vera Kublanovskaya worked on an LQ factorization of as and subsequent reverse order multiplications  in Leningrad, Russia. Her LQ algorithm would also compute the eigenvalues of matrices reliably.
Rutishauser had observed convergence speedup for his LR method when replacing by , i.e., shifting. Hence Francis experimented with shifts for QR and then established the “implicit Q theorem”  in order to circumvent computing eigenvalues of real matrices over the complex numbers. Implicit shifts also avoid rounding errors that would be introduced by explicit shifts. Francis’ second paper (1962)  also contains a fully computed flutter matrix problem of size 24 by 24. The eigenvalues of such “large” problems had never before been computed successfully.
Francis’ implicit Q theorem then allowed Gene Golub and Velvel Kahan  to compute singular values of large matrices for the first time, and this application later spawned the original Google search engine and brought us—in a way—into the Internet age.
In 2002 the multishift QR algorithm was developed by Karen Braman et al. [26, 27]. It relies on subspace iteration, extends Francis’ QR, and combines it with Krylov like methods. This extension allows us today to compute the complete eigenvalue and singular-value structure of dense matrices of sizes up to 10,000 by 10,000 economically on laptops.
What is being missed today computationally? What epoch(s) might come next? Why and how?
2.9. Two new epochs ahead: the seventh and eighth epochs (yet to come)
Two new epoch generating impulses have become visible on the matrix computational horizon of today:
One expands our computational abilities from static problem-solving algorithms to real-time methods for time-varying problems.
The other involves computer hardware advances.
2.9.1. Time-varying problems and real-time solvers: seventh epoch
Our current best numerical codes can solve static problem very well; that is what they are designed for.
As we begin to rely more and more on time-dependent sensoring and on robotic manufacture, we need to learn how to solve our erstwhile static equations but now in real time and with time-varying coefficients and preferably accurately as well. It seems quite alluring to try and solve a time-varying problem by using the static time-dependent inputs at each instance statically. But such a naive solution cannot suffice since at the next time step, whose “solution” has just been computed “statically”, the problem parameters have already changed and thus our “static” solution solves a completely different problem, which—unfortunately—has little value. If any at all.
2.9.2. Computer hardware: eighth epoch
Since the earliest electronic computing devices of the 1940s, all our computers have worked as giant and embellished Turing machines with logic gates, switches, and memory that rely only on two numerical states: 0 and 1 or on or off. Hence, all our computer data is stored and manipulated as sequences of 0 and 1.
Lately our computing ability has come up against the limits of storing and working with data and processors that can only deal with zeros and ones. Our processing speeds have not advanced significantly over the last couple of years; we are still stuck with 3–4 GHz processors. To alleviate this bottleneck, chip makers have created multiprocessor chips, and software firms have introduced better and quicker software and operating systems, but the basic processor speed has not budged much.
At this time computer scientist and manufacturers are trying to overcome this 0–1 bottleneck by replacing our 0–1 processors, chips, memories, and transistors by improved transistors and chips that can store and process multistates, such as 0-1-2-3-4 or 0-1-2-3-4-5-6-7-8 or even higher-numbered data representations. This could lead us to another computing sea change bringing us into a new computational epoch via hardware. And, further out on the horizon lies the possibility of having infinitely many quantum states based computers.
3. On neural network methods: seventh epoch (already under way)
The last century brought us valuable tools to solve most static problems that involve matrices.
Our current numerical matrix tools can solve static linear equations and matrix equations such as Sylvester or Lyapunov equations, as well as eigenvalue problems, and generalizations of all of these, both of the dense or structured and of the solvable or unsolvable kinds. Likewise, we can solve static optimization problems of all sizes and for nearly all structured matrices and thereby solve most if not all static applications.
But what can we do with such problems when the entries are time-varying and the problem parameters change over time?
In numerical computations, there has always been a see-saw between models that resulted in derivative-inspired differential equations and in linear algebra based matrix equations. Their respective computational advantages differ from problem to problem. Neural networks (NN) are an amalgam of matrix methods and differential methods and use a mixture of both. NN methods are designed to solve time-varying dynamical systems. Numerical methods for time-varying dynamical systems first came about in the 1950s and subsequently have gained strength in the 1980s and 1990s and beyond (see the introduction in Getz and Marsden , for example). There are essentially three ways to go about solving the dynamical systems via differential equations: homotopy methods, gradient methods, and ZNN neural network methods introduced by Yunong Zhang et al. . To solve a time-varying equation , the ZNN method starts from the error equation and stipulates exponential decay of the error function to zero by trying to solve the differential equation:
for a positive decay constant . Ideally, the error differential equation (1) of a given problem can be discretized using high-order and convergent 1-step ahead difference formulas for the unknown. Their aim is to predict or simulate the solution at time in real time from earlier event instants with with a high degree of accuracy and low storage cost. The ZNN method will be explained below for three standard time-varying problems.
3.1. A neural network approach to solve time-varying linear equations:
Here is a nonsingular time-varying by matrix and is a time-varying vector, respectively. Clearly, the unknown solution of the associated linear equation will be time-dependent as well.
The first paper on Zhang neural networks (ZNN) was written by Yunong Zhang et al. . Today, there are well over 300 papers, mostly in engineering journals that deal with time-varying applications of the ZNN method, either in hardware chip design for specialized computational tasks as part of a plant or machine or for time-varying simulation problems in computer algorithms and codes. Unfortunately, the ZNN method and the ideas behind ZNN are hardly known today among numerical analysis experts. The method itself starts with using Suanjing’s and Al Khwarizmi’s ancient rule for reducing equations which first appeared 1 1/2 millennia ago. Recall that this simple rule was also employed by Cauchy  to transform the static matrix eigenvalue problem from to and finally to det . For the time-varying linear equation problem, Zhang’s neural network method starts with
and then works on the error function
and its time derivative . Note that standard static methods would likely look at the error norm instead of the error function. Neural networks do not; they study the error function instead. And they start with an implicit “ideal wish”: What could or should we wish for ? Time-varying computations would be near ideal if their error functions were decaying exponentially fast as functions of . This is impossible to achieve (or even ask for) with our best static equations and problem solvers of the twenty-first century. For static numerical matrix methods, backward stability is considered most desirable.
In Zhang NN methods stipulating that the error function decreases exponentially fast over time to the zero function means that
Note that for the time-varying linear equation problem, we have
This leads to the following differential equation for the time-varying solution :
And thus, we have transformed the time-varying linear equations problem into an initial value differential equation problem that needs to be solved for . This is where the different dynamical system methods split their ways. In Zhang neural networks, the continuous time differential equation (2) is then discretized for and the ensuing derivatives are approximated by high-order difference quotients, with the one for the unknown being 1-step ahead and proven convergent, while the others such as for and in equation (2) above can be backward difference formulas. This process would then yield a way to generate from earlier known data such as and and for indices . How to proceed in this problem from equation (2) with solving via ZNN methods is still an open problem, especially for large-scale sparse or structured time-varying linear equations since the matrix encumbers the unknown on the left-hand side of equation (2) and there is no known 1-step ahead differentiation formula that can be used here.
The general idea that underlies ZNN methods for time-varying problems is to replace repeated matrix computations by solving linear differential equations and associated initial value problems for discrete instances instead.
3.2. A Zhang neural network approach to find time-varying generalized matrix inverses for time-varying full rank matrices so that
This section is based on joint work with Jian Li et al. .
3.2.1. Continuous problem formulation
For an by real time-varying matrix of full rank with , we form the matrix-valued error function:
where the upper + sign always means “generalized inverse.” Then, we use the Zhang design formula:
with design parameter . Based on [31, Lemma 3], we have
And, by right multiplying equation (7) with , we have
With , we have , and thus
The solution of a generalized matrix inverse problem is not unique when , and we only need to find a solution that satisfies equation (9). Consequently, the continuous model can be represented as
which we rewrite as
With , we have
Thus model (10) satisfies model (9), and its solution solves the time-varying generalized matrix inverse problem.
3.2.2. Zhang neural network discretization
Given a sequence of rectangular matrices at time instances , we want to find the discrete time-varying generalized matrix inverse of on each computational time interval so that
Here is a time-varying full rank equidistant matrix sequence, , and is unknown. needs to be computed in real time for each time interval . Here the matrix operator denotes the generalized inverse of a matrix and is the zero matrix. Besides, denotes the updating index, denotes the task duration, and denotes the constant sampling gap of the time-varying matrix sequence . For , the procedure is similar.
Note that we must obtain each at or before time for real-time calculations, while the actual value of is unknown before . Thus we cannot obtain the solution by calculating . To obtain in real time, we must develop a model based on the available information before such as that in , , and for instead of unknown information such as .
To obtain a discrete time model that solves the original discrete time-varying generalized matrix inverse problem (14), we need to discretize the continuous model (10). First, we use the conventional 1-step forward Euler formula:
with truncation error of order . Based on formula (15), we approximate
and use this equation to discretize the continuous model (10) as follows:
Here . In most real-world applications, information of the first-order time derivatives, i.e., the value of , may not be explicitly known for the discrete time-varying generalized matrix inverse problem (14). If this is so, the value of can be approximated by a backward finite difference formula. To assure the accuracy and simplicity of the discretized model, the truncation error of the backward finite difference formula for should be near equal to that of the 1-step ahead finite difference formula that approximates . Thus, in equation (17) should best be approximated by Euler’s backward finite difference formula:
Note that the truncation error of the discrete model (20) is of order where the symbol denotes a matrix in which each entry is of order . This model uses only present or past information of , , and and solves for . Thus can be calculated during the time interval and if can be computed quickly enough in real time it will be ready when time instant arrives.
Higher-accuracy 1-step ahead formulas exist for discrete models, namely,
Both have truncation errors of order . For simplicity we only consider formula (21) and call it the 4-IFD formula because four instances in time are used to approximate the first-order derivative of . When we employ the 4-IFD formula (21) inside our continuous model (10), we obtain
Next, we use the three-instant backward finite difference formula:
with error order to approximate the value of in equation (23). Then the 4-IFD-type discretized model becomes
Its truncation error is of order . Similar to the Euler based discrete model (20), the 4-IFD-type discrete model uses only present and past information such as , , , , , and to solve for . Thus it also satisfies the requirements for real-time computation.
3.2.3. A five-instant finite difference formula
Any usable finite difference formula for discretizing the continuous model (10) must satisfy several restrictions. It must be one step ahead for , i.e., approximate by using only and possibly earlier data, and it must be -stable and convergent. However, 1-step ahead finite difference formulas do not necessarily generate stable and convergent discrete models (see, e.g., [32, 33]).
Here is a new 1-step ahead finite difference formula with higher accuracy than the Euler and 4-IFD formulas. It will be used to generate a stable and convergent discrete model that finds time-varying generalized matrix inverses more accurately in real time.
Theorem 1 The 5-IFD formula
has truncation error order .
The proof relies on four Taylor expansions that use and through around and clever linear combinations thereof.
The new 1-step ahead discretization formula (26) then leads to the five-instant discrete model:
which has a truncation error of order .
Theorem 2 The five-instant discrete model (27) is 0-stable.
The multistep formula of the five-instant discrete model time-varying generalized matrix inverses has the characteristic polynomial:
3.2.4. Numerical examples
Example 1 Consider the discrete time-varying generalized matrix inverse problem:
Example 2 Here we consider the discrete time-varying matrix inversion problem:
3.3. A Zhang neural network approach for solving nonlinear convex optimization problems under time-varying linear constraints
This section is based on joint work with Jian Li et al. .
Building a continuous time model for the problem:
The Zhang neural network approach can be built on the Lagrange function:
where is the Lagrange multiplier vector and denotes the transpose. Note that there will be no need to solve for the Lagrange functions here. Set
We transform the multiplier problem into an initial value DE problem instead. By stipulating exponential decay for , we obtain the model equation
for the Jacobian matrix
3.3.1. Discretizing the model and choosing suitable high-order finite difference formulas
To discretize the continuous model
we can use the forward Euler difference formula with truncation error order :
or the four-instant forward difference formula (4-IFD):
with truncation error order . The Euler formula yields the discretized model:
while the 4-IFD formula results in
Both discretization formulas are consistent and convergent. This can be proven via the roots of the associated characteristic polynomial. Its roots must lie in the complex unit circle and cannot be repeated on its boundary.
Since the value of may not be known explicitly, we may replace it by
which uses the three-point backward finite difference formula:
of order .
Then the discretized 4-IFD formula becomes more complicated but easier to implement:
To implement this formula, the inverse of the Jacobian matrix can be computed at each time in a fraction of the available real-time interval by using the real-time inverse finding ZNN method from the previous subsection (Section 3.2).
3.3.2. Numerical example and results:
As an example we solve the following convex nonlinear optimization problem with known theoretical solution numerically by using our ZNN method; for further details and applications see :
The 4-IFD formula is a four-instant formula, while the Euler formula needs only two. Both discretization models work in real time, and both typically create the optimal solution in a fraction of a second with differing degrees of accuracy according to their orders.
The example below runs for 10 sec. The time-varying values for , , and are given as functions and evaluated from their function formulations. In real-world applications, these values might be supplied by sensors during each time interval , and the empirical values would be inserted into the difference formulas as they are evaluated by sensors in real time (Figure 3).
4. On quantum and multistate computing: eight epoch (yet to start and come)
Quantum computing and multistate memory and computers with multistate processors will change the way we compute once they become available. They will require new operating systems and new software with new and yet-to-be-discovered algorithms. What will this new era entail? Nobody knows or can reliably predict.
I asked an “expert” on quantum computing 3 years ago as to when he expected to have a quantum computer at his disposal or on his desk. The answer was “Not in my lifetime, not in 20 years.”
Currently, about a dozen or more research centers in Europe and South-East Asia are trying to build quantum computers based on the quantum superposition principle and quantum entanglement of elementary particles. They do so in a multitude of different ways. The envisioned benefit of these efforts would be to be able to compute superfast in parallel and in simulations to solve huge data problems quicker than ever before and to solve problems that are unassailable now with our current best supercomputer networks. All of the proposed quantum science techniques make use of superconducting circuits and particles. The aim is to build quantum computers in one or two decades with around 100 entangled quantum bits. Such a quantum computer would be bulky; it would need much supplementary equipment for cooling and so forth and could easily take up a whole floor of a building, just as the first German and British valve computers did in the 1940s. But it would surpass the computing capacity of all current supercomputers and desk and laptops on Earth combined. Currently, the largest working entangled quantum array contains fewer than 10 quantum bits. Access of a 100 bit quantum computer would probably be via the cloud and there would be no quantum computer laptops. Quantum computers may take another 10, 20, or 30 years to materialize.
How will they come about? Which yet unknown algorithms will they use? Who will invent them? Who code them?
If history can be a guide, John Francis and Vera Kublanovskaya were both working independently on circuit diagrams and logic gate designs for valve computers in England and in Russia at the time when they discovered QR (or LQ) in the late 1950s.
So, we possibly are looking for quantum computer hardware and software designers who know numerical analysis and algorithm development in or about the year 2040. In a similar fashion, Leibnitz and Seki formalized our now ubiquitous matrix concept independently but simultaneously in 1683, in Germany and in Japan.
Maybe it will take two again?
The references given below only go back to the year 1799.