Being a practicing programmer, you are very unlikely to implement yet another linear system solver all by yourself, so this guide doesn't cover the implementation details of such algorithms. You will, however, probably face problems that can be stated as linear systems, often more elegantly than by any other means. This tutorial is all about the concepts that should help you recognize these problems and find the best way to solve them.
But even if you are not interested in linear algebra at all, you might still find this tutorial interesting. Things such as convergence, computational error, and algorithmic complexity are all easy to show on a task as intuitive and graphic as solving a simple linear system.
A linear system is a system of linear equations. A linear equation is a sum of weighted variables that equals a constant. In two-dimensional space, it is also the equation of an actual line. In 3 dimensions it's a plane, in 4 a hyperplane, and so on.
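Written out in symbols, a linear equation in n variables has the form

\[
a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b
\]

where the a's are the weights and b is the constant.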
The system may be represented as an array of equations.
\[
\begin{cases}
a_{11} x_1 + a_{12} x_2 = b_1 \\
a_{21} x_1 + a_{22} x_2 = b_2
\end{cases}
\]
The very same array of equations constitutes a matrix equation.
\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\]
And in the simplest case, it may also have a graphic form.
You can drag the lines and see how the equations from before change.
The graphic form is a useful mental model because it makes the system visual. Its solution is merely an intersection of lines.
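If you want the intersection computed rather than eyeballed, any general-purpose solver will do it. Here is a minimal sketch using NumPy; the coefficients are made up for illustration:

import numpy as np

# Two lines:  x1 + 2*x2 = 3   and   3*x1 - x2 = 2
A = np.array([[1.0,  2.0],
              [3.0, -1.0]])    # coefficient matrix
b = np.array([3.0, 2.0])       # constants

point = np.linalg.solve(A, b)  # the intersection of the two lines
print(point)                   # [1. 1.]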
But even a well-specified system may lack a solution (an intersection).
And sometimes it may have not one solution but an infinite number of them.
This may be a problem for numeric methods aimed at finding a single solution. They may report an error or, even worse, supply you with whatever solution they happen to find first. In this case, you should check your equations for linear dependency.
You see, when you multiply every coefficient of an equation, and the constant, by the same number, all the facts about the points it describes remain intact. It's still the same equation; it's still the same line.
\[
\begin{cases}
x_1 + 2x_2 = 3 \\
2x_1 + 4x_2 = 6
\end{cases}
\]
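In code, one way to check for linear dependency is to compare the rank of the coefficient matrix to the number of equations, or, for a square system, to test whether the determinant is near zero. A minimal sketch with NumPy, using the dependent system above:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # the second equation is just the first one times 2

# rank 1 with 2 equations means the equations are linearly dependent
print(np.linalg.matrix_rank(A))       # 1

# for a square matrix, a (near-)zero determinant tells the same story
print(abs(np.linalg.det(A)) < 1e-12)  # True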
Sometimes, systems have more equations than variables. These systems are called overspecified.
In the general case, these systems don't have a solution, but in some rare cases, they do.
Sometimes, systems have more variables than equations. These systems are called underspecified.
Of course, in the 2-dimensional case they have only one equation and an infinite number of solutions: every point that lies on the equation's line is a solution. But in higher-dimensional cases, underspecified systems might not have any solutions at all.
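In practice, both kinds are usually handed to a least-squares solver: for an overspecified system it returns the point that minimizes the total squared error, and for an underspecified one it picks a particular solution, the one with the smallest norm. A sketch with NumPy; the numbers are made up:

import numpy as np

# Overspecified: 3 equations, 2 unknowns. The lines don't all meet in one point,
# so lstsq returns the point that minimizes the total squared error.
A_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
b_over = np.array([1.0, 1.0, 1.0])
x_over, *_ = np.linalg.lstsq(A_over, b_over, rcond=None)
print(x_over)   # roughly [0.667  0.667]

# Underspecified: 1 equation, 2 unknowns, infinitely many solutions.
# lstsq picks the one with the smallest norm.
A_under = np.array([[1.0, 1.0]])
b_under = np.array([2.0])
x_under, *_ = np.linalg.lstsq(A_under, b_under, rcond=None)
print(x_under)  # [1. 1.]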
There are two major classes of linear system solvers: iterative and direct.
Iterative algorithms are very simple conceptually. They all can be boiled down to these three steps:

1. Pick some starting point as the current candidate for a solution.
2. Do something that brings the candidate closer to the actual solution.
3. Repeat until the candidate is close enough.
Let's make a linear solver out of that. The part of the algorithm that brings us closer to the solution will be a simple projection. If you project an arbitrary point onto a line, the projection is closer to any point of that line, including, of course, the desired solution. Therefore, by projecting a point from one line to another, we get closer and closer to the solution.
If, of course, there is one.
As for exit criteria, we can simply measure how far we have to travel to make an iteration — a projective “leap” from one line to another. Presumably, as we're getting closer, the leap distance should shorten, so at some point, we might consider it small enough to stop the operation.
And we can leave the question “where to start” unanswered. Of course, it would be fantastic to start right at the solution point, but generally, our algorithm should work for any starting point we choose.
Here is the interactive illustration of it.
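For those reading without the interactive part, here is a minimal non-interactive sketch of the same idea for the 2D case. Each line is given by its normal (a1, a2) and constant b; the function names and the exit tolerance are my own choices:

import math

def project_onto_line(p, a, b):
    ''' Orthogonally project point p onto the line a[0]*x + a[1]*y = b. '''
    t = (a[0] * p[0] + a[1] * p[1] - b) / (a[0] ** 2 + a[1] ** 2)
    return (p[0] - t * a[0], p[1] - t * a[1])

def iterative_solve(lines, start=(0.0, 0.0), tolerance=1e-9, max_leaps=10000):
    ''' Leap from line to line with orthogonal projections until the leaps get tiny. '''
    p = start
    for leap in range(max_leaps):
        a, b = lines[leap % len(lines)]
        new_p = project_onto_line(p, a, b)
        leap_distance = math.hypot(new_p[0] - p[0], new_p[1] - p[1])
        p = new_p
        if leap_distance < tolerance:   # the exit criterion: the leap got small enough
            break
    return p

# x1 + 2*x2 = 3 and 3*x1 - x2 = 2 intersect at (1, 1)
print(iterative_solve([((1.0, 2.0), 3.0), ((3.0, -1.0), 2.0)]))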
When an algorithm like this gets closer to the solution with every iteration, it is said to converge. When it gets further away, it diverges. Convergence is the second most important property of an iterative algorithm. It is basically its speed.
And the first most important property of an algorithm is its stability. Usually, the more mathematical operations you do, the more error you gather. But a stable algorithm has the property of reducing the error automatically. It still gathers some error on every iteration, sure, but it doesn't accumulate it into anything ugly.
This particular algorithm is stable and geometrically incapable of divergence. Moreover, its convergence is directly linked to the angle between the lines.
Unfortunately, stability and convergence alone don't make it any good in practice. It is vulnerable to the near-parallel corner case and the choice of the exit condition.
Unlike iterative algorithms, direct algorithms don't have convergence. They always get the job done in some predetermined number of operations. In some conditions, however, iterative algorithms may behave just like direct ones.
Being able to always solve a system in just a few leaps is neat. It's like the ultimate convergence, a metaphorical speed of light in computations. Fortunately, we can turn our iterative solver into a direct one by adding one small detail.
Previously, we did every leap as an orthogonal projection onto the respective line. But what if we make each leap parallel to the other line instead?
Since the algorithm is now direct, it doesn't need an exit criterion. Also, since it has “an ultimate convergence”, the choice of a starting point doesn't matter. That's brilliant! Why do we need iterative algorithms at all?
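Here is the same 2D sketch turned direct: two leaps, each made parallel to the other line, land exactly on the intersection. The representation and the function names are mine:

def project_along(p, direction, a, b):
    ''' Move p along `direction` until it lands on the line a[0]*x + a[1]*y = b. '''
    t = (b - a[0] * p[0] - a[1] * p[1]) / (a[0] * direction[0] + a[1] * direction[1])
    return (p[0] + t * direction[0], p[1] + t * direction[1])

def direct_solve_2d(line1, line2, start=(0.0, 0.0)):
    ''' Two leaps, each parallel to the other line, reach the solution exactly. '''
    (a1, b1), (a2, b2) = line1, line2
    along_line1 = (-a1[1], a1[0])   # a direction vector of line 1 (orthogonal to its normal)
    along_line2 = (-a2[1], a2[0])   # a direction vector of line 2
    p = project_along(start, along_line2, a1, b1)  # leap onto line 1, parallel to line 2
    p = project_along(p, along_line1, a2, b2)      # leap onto line 2, parallel to line 1
    return p

# The same system as before: x1 + 2*x2 = 3 and 3*x1 - x2 = 2, solution (1, 1)
print(direct_solve_2d(((1.0, 2.0), 3.0), ((3.0, -1.0), 2.0)))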
There is, of course, a catch. While the iterative algorithm's speed depends on convergence, the direct algorithm's speed depends on its complexity. And this particular algorithm is rather complex.
Algorithm complexity itself is not a quantity. It's not just a constant number of operations to get things done. Just like with convergence, this number depends on things. In our case, it depends on the number of equations. This makes it a function, and we usually use big-O notation to characterize it. Big-O sets the form of a limiting function — a function our complexity has no right to overtake for any number of equations.
If you have something simple, like a single for loop, its complexity is linear: O(n). If you have a loop inside a loop, then it's O(n^2). This kind of complexity is called polynomial, and it's quite common in the wild. For instance, our projective solver has polynomial complexity.
def cross_of(A):
    ''' n-dimensional cross product of the (n-1) vectors in list A '''
    D = len(A[0])       # dimension of the space
    N = len(A)          # number of vectors, expected to be D - 1
    v_res = [0.] * D
    for i in range(0, D):
        for jk in range(0, D ** N):
            # decode jk into N indices, each in [0, D), and prepend i
            v_ijk = [i] + [(jk // (D ** (N - j - 1))) % D for j in range(0, N)]
            # v__E is the Levi-Civita symbol of the index tuple: +1, -1, or 0
            t_res = v__E(v_ijk)
            if t_res != 0:
                for k in range(0, N):
                    t_res *= A[k][v_ijk[k + 1]]
                v_res[i] += t_res
    return v_res

def solution_for(A, B):
    p = [0. for each in B]          # start at the origin
    for i in range(len(A)):
        plane_n = A[i]              # the i-th plane's normal
        plane_d = -B[i]             # so the plane is: plane_n . x + plane_d = 0
        other_planes_ns = A[:i] + A[i+1:]
        # the leap direction is parallel to all the other planes
        projection_vector = cross_of(other_planes_ns)
        # project_by_vector moves p along projection_vector until it hits the i-th plane
        p = project_by_vector(p, projection_vector, plane_n, plane_d)
    return p
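The code relies on two helpers not shown here: v__E, the Levi-Civita symbol of an index tuple, and project_by_vector, which moves a point along a vector until it lands on a plane. A minimal sketch of how they might look (my reconstruction, not necessarily the original code):

def v__E(indices):
    ''' Levi-Civita symbol: +1 for an even permutation, -1 for an odd one, 0 on repeats '''
    sign = 1
    for i in range(len(indices)):
        for j in range(i + 1, len(indices)):
            if indices[i] == indices[j]:
                return 0
            if indices[i] > indices[j]:
                sign = -sign
    return sign

def project_by_vector(p, v, plane_n, plane_d):
    ''' Move p along v until it lands on the plane {x : plane_n . x + plane_d = 0} '''
    t = -(sum(n_i * p_i for n_i, p_i in zip(plane_n, p)) + plane_d) \
        / sum(n_i * v_i for n_i, v_i in zip(plane_n, v))
    return [p_i + t * v_i for p_i, v_i in zip(p, v)]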
You might imagine that with polynomial complexity, using a direct algorithm on massive systems is not really plausible. Solving a 1000-equation system with our algorithm requires a trillion operations. In this regard, iterative algorithms are a worthy alternative. Not that they are generally faster; it's all about balancing convergence against complexity.
Our algorithm may be graphic, but it is too heavy to be considered practically useful. There are better options out there. For instance, Gaussian elimination has a complexity of O(n^3), and the theoretical best for LU decomposition is O(n^2.376).
They both share the same idea: we can change the system however we want as long as it still holds true. We already saw that we can multiply both sides of an equation by the same number and still retain the same line in the graphic form. We can also pick any equation and add it to any other equation from the same system. This will change the line but retain the solution.
By doing these operations alone we can turn almost any system into the trivial one.
\[
\begin{cases}
x_1 = b'_1 \\
x_2 = b'_2
\end{cases}
\]

where b'_1 and b'_2 are whatever the constants become after the transformations.
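A minimal sketch of this idea as code: plain Gauss-Jordan elimination, written without pivoting, so it will fail on systems that need row swaps:

def gauss_jordan_solve(A, B):
    ''' Turn the system into the trivial one using the two allowed operations:
        scaling an equation and adding one equation to another. '''
    n = len(A)
    A = [row[:] for row in A]   # work on copies so the caller's data stays intact
    B = B[:]
    for i in range(n):
        # scale equation i so its i-th coefficient becomes 1
        scale = 1.0 / A[i][i]
        A[i] = [a * scale for a in A[i]]
        B[i] *= scale
        # subtract a multiple of equation i from every other equation
        # to zero out their i-th coefficients
        for j in range(n):
            if j != i:
                factor = A[j][i]
                A[j] = [a_j - factor * a_i for a_j, a_i in zip(A[j], A[i])]
                B[j] -= factor * B[i]
    return B   # the system is now trivial: x_i = B[i]

# x1 + 2*x2 = 3 and 3*x1 - x2 = 2, solution [1, 1]
print(gauss_jordan_solve([[1.0, 2.0], [3.0, -1.0]], [3.0, 2.0]))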
That's all fine, but how do you choose a method to solve your particular case? Here are a few hints.