
Programmer's guide to linear equations

Being a practicing programmer, you are very unlikely to implement yet another linear-system-solving algorithm all by yourself, so this guide doesn't cover the implementation details of such algorithms. You will, however, probably face problems that can be solved in the form of linear systems, and often more elegantly than by any other means. This tutorial is all about the concepts that should help you recognize these problems and find the best solution for them.

But even if you are not interested in linear algebra at all, you might still find this tutorial interesting. Things such as convergence, computational error, and algorithmic complexity are all easy to show on a task as intuitive and graphic as solving a simple linear system.

A linear system is a system of linear equations. A linear equation is a sum of weighted variables that equals a constant. In two-dimensional space, it is also an equation of an actual line. In three dimensions, it describes a plane; in four, a hyperplane; and so on.
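In its general form, a linear equation in n variables is a weighted sum with constant weights a_i and a constant right-hand side b:

\[ a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b \]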

The system may be represented as an array of equations.

\[
\begin{cases}
a_{11} x_1 + a_{12} x_2 = b_1 \\
a_{21} x_1 + a_{22} x_2 = b_2
\end{cases}
\]

The very same array of equations constitutes a matrix equation.

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\]

And in the simplest case, it may also have a graphic form.

You can drag the lines and see how the equations from before change.

The graphic form is a useful mental model because it makes the system visual. Its solution is merely an intersection of lines.

Quest 1. Using the plot above, please make a system that has its solution at x1 = 0, x2 = 0. It doesn't have to be exact; that would be cumbersome to achieve on a small plot. Just make it somewhere near (0, 0). Then press the “Check!” button.

But even a well-specified system may lack a solution (an intersection).

Quest 2. Using the same plot above, please make a system with no solution.

And sometimes it may have not a single solution but an infinite number of them.

Quest 3. Using the same plot above, please make a system with infinite solutions.

This may be a problem for numeric methods that aim to find a single solution. They may report an error or, even worse, supply you with the first solution they happen to find. In this case, you should check your equations for linear dependency.

You see, when you multiply every coefficient of an equation, and its constant, by the same number, all the facts about the points it describes remain intact. It's still the same equation; it's still the same line.

\[
\begin{cases}
x_1 + 2 x_2 = 3 \\
2 x_1 + 4 x_2 = 6
\end{cases}
\]
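If you want to detect this programmatically, a zero determinant of the coefficient matrix tells you the rows are linearly dependent; whether the lines then coincide or run parallel depends on the constants. Here is a minimal sketch for the 2x2 case (the function name and the tolerance are illustrative choices):

def rows_are_dependent(a, tolerance=1e-12):
  ''' Rows of a 2x2 matrix are linearly dependent iff its determinant is ~0 '''
  return abs(a[0][0] * a[1][1] - a[0][1] * a[1][0]) < tolerance

print(rows_are_dependent([[1., 2.], [2., 4.]]))  # True for the system above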

Sometimes, systems have more equations than variables. These systems are called overspecified.

In the general case, these systems don't have a solution, but in some rare cases, they do.

Quest 4. Using the 3-lines plot, please make a system with a single solution.

Sometimes, systems have more variables than equations. These systems are called underspecified.

Of course, in the 2-dimensional case, they only have one equation and an infinite number of solutions: every point that lies on the equation's line is a solution. But in higher-dimensional cases, underspecified systems might not have any solutions at all. In 3D, for instance, two parallel planes never intersect, so a system of two such equations has no solution.

Quest 5. Using the matrix form below, please make an underspecified system of 2 equations that don't have a solution.
\[
\begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} \cdot \\ \cdot \end{bmatrix}
\]

(The dots stand for the coefficients and constants you get to fill in.)

There are two major classes of linear system solvers: iterative and direct.

Iterative algorithms are very simple conceptually. They can all be boiled down to these three steps:

  1. Start somewhere.
  2. See if you got what you were looking for.
    If so — great, you are done!
  3. If not, do something that brings you closer to the goal.
    Go to step 2.

Let's make a linear solver out of that. The part of the algorithm that brings us closer to the solution will be a simple projection. If you project an arbitrary point onto a line, the projection ends up closer to every point of that line, including, of course, the desired solution. Therefore, by projecting a point from one line to another, we get closer and closer to the solution.

If, of course, there is one.

As for the exit criterion, we can simply measure how far we have to travel to make an iteration: a projective “leap” from one line to another. Presumably, as we're getting closer, the leaps get shorter, so at some point we may consider a leap small enough to stop the operation.

And we can leave the question “where to start” unanswered. Of course, it would be fantastic to start right at the solution point, but generally, our algorithm should work for any starting point we choose.
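Here is what this recipe might look like in code: a minimal sketch of the orthogonal-projection solver for two lines (the function names and the sample system are illustrative, not part of the original demo):

import math

def project(p, a, b):
  ''' Orthogonally project point p onto the line a[0]*x1 + a[1]*x2 = b '''
  t = (b - (a[0] * p[0] + a[1] * p[1])) / (a[0] ** 2 + a[1] ** 2)
  return (p[0] + t * a[0], p[1] + t * a[1])

def solve_by_leaps(A, B, start=(0., 0.), eps=1e-9, max_sweeps=100000):
  ''' Project onto each line in turn until a full sweep barely moves the point '''
  p = start
  for sweep in range(max_sweeps):
    longest_leap = 0.
    for a, b in zip(A, B):
      q = project(p, a, b)
      longest_leap = max(longest_leap, math.hypot(q[0] - p[0], q[1] - p[1]))
      p = q
    if longest_leap < eps:
      break
  return p

# x1 + 2*x2 = 3 and 3*x1 - x2 = 2 intersect at (1, 1)
print(solve_by_leaps([(1., 2.), (3., -1.)], [3., 2.]))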

Here is the interactive illustration of it.


When an algorithm like this gets closer to the solution with every iteration, it is said to converge. When it gets further away, it diverges. Convergence is the second most important property of an iterative algorithm. It is, basically, its speed.

And the first most important property of an algorithm is its stability. Usually, the more mathematical operations you do, the more error you gather. But stable algorithms have the property of reducing the error automatically. A stable algorithm gathers some error on every iteration, sure, but it doesn't accumulate it into anything ugly.

This particular algorithm is stable and geometrically incapable of divergence. Moreover, its convergence is directly linked to the angle between the lines.
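One way to quantify this link: if the current point already lies on one of the lines at distance d_k from the solution, the next orthogonal leap lands on the other line at distance

\[ d_{k+1} = d_k \cos\theta \]

where θ is the angle between the lines. Perpendicular lines give cos θ = 0 and the solution in a single leap; near-parallel lines give cos θ close to 1 and a painfully slow crawl.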

Quest 6. Using the plot from above, please make a system and set up the conditions to solve it in exactly 58 iterations.

Unfortunately, stability and convergence alone don't make it any good in practice. It is vulnerable both to the near-parallel corner case and to a poorly chosen exit condition.

Quest 7. Using the plot from above, please make a system and set up the conditions so the solution will have an error of over 6.0.

Unlike iterative algorithms, direct algorithms don't have convergence. They always get the job done in some predetermined number of operations. Under some conditions, however, iterative algorithms may behave just like direct ones.

Quest 8. Using the plot from above, please make a system that is always solvable in no more than 3 iterations.

Being able to always solve a system in just a few leaps is neat. It's like the ultimate convergence: a metaphorical speed of light in computations. Fortunately, we can turn our iterative solver into a direct one by adding one small detail.

Previously, we did every leap as an orthogonal projection onto the respective line. But what if we do each projection parallel to the other line instead?
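Here is the same two-line sketch from before, reworked for this idea (again, the names are illustrative; parallel lines would make the leap impossible, which shows up as a division by zero):

def project_along(p, v, a, b):
  ''' Project point p along direction v onto the line a[0]*x1 + a[1]*x2 = b '''
  t = (b - (a[0] * p[0] + a[1] * p[1])) / (a[0] * v[0] + a[1] * v[1])
  return (p[0] + t * v[0], p[1] + t * v[1])

def solve_direct(A, B, start=(0., 0.)):
  ''' Two leaps, each parallel to the other line, land on the intersection '''
  p = start
  for i in (0, 1):
    other_n = A[1 - i]
    direction = (-other_n[1], other_n[0])  # a vector parallel to the other line
    p = project_along(p, direction, A[i], B[i])
  return p

print(solve_direct([(1., 2.), (3., -1.)], [3., 2.]))  # ~(1.0, 1.0)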

Since the algorithm is now direct, it doesn't need an exit criterion. Also, since it has “the ultimate convergence”, the choice of the starting point doesn't matter. That's brilliant! Why do we need iterative algorithms at all?

There is, of course, a catch. While an iterative algorithm's speed depends on its convergence, a direct algorithm's speed depends on its complexity. And this particular algorithm is rather complex.

Algorithmic complexity itself is not a quantity. It's not just a constant number of operations to get things done. Just like convergence, this number depends on things; in our case, it depends on the number of equations. This makes it a function, and we usually use big-O notation to characterize it. Big-O sets the form of a limiting function: a function our complexity has no right to overtake for any number of equations.

If you have something simple, like a single for loop, its complexity is linear: O(n). If you have a loop inside a loop, then it's O(n^2). This kind of complexity is called polynomial, and it's quite common in the wild. For instance, our projective solver has polynomial complexity.

def v__E(indices):
  ''' Levi-Civita symbol: +1 for an even permutation of (0..n-1),
      -1 for an odd one, 0 if any index repeats. (The page's original
      helper isn't shown here; this body is reconstructed from usage.) '''
  sign = 1
  for i in range(len(indices)):
    for j in range(i + 1, len(indices)):
      if indices[i] == indices[j]:
        return 0
      if indices[i] > indices[j]:
        sign = -sign
  return sign


def cross_of(A):
  ''' n-dimensional cross product on (n-1) vectors in list A '''
  D = len(A[0])  # dimension of the space
  N = len(A)     # number of vectors, expected to be D - 1

  v_res = [0.] * D
  for i in range(0, D):
    for jk in range(0, D ** N):
      # unpack jk into N indices, one per vector in A
      v_ijk = [i] + [(jk // (D ** (N - j - 1))) % D for j in range(0, N)]
      t_res = v__E(v_ijk)
      if t_res != 0:
        for k in range(0, N):
          t_res *= A[k][v_ijk[k + 1]]
        v_res[i] += t_res
  return v_res


def project_by_vector(p, v, plane_n, plane_d):
  ''' Project point p along direction v onto the plane plane_n . x + plane_d = 0
      (also reconstructed: it solves plane_n . (p + t*v) + plane_d = 0 for t) '''
  n_dot_p = sum(n * p_i for n, p_i in zip(plane_n, p))
  n_dot_v = sum(n * v_i for n, v_i in zip(plane_n, v))
  t = -(n_dot_p + plane_d) / n_dot_v
  return [p_i + t * v_i for p_i, v_i in zip(p, v)]


def solution_for(A, B):
  p = [0.] * len(B)  # the starting point; any will do
  for i in range(len(A)):
    plane_n = A[i]
    plane_d = -B[i]
    # leap onto plane i parallel to all the other planes
    other_planes_ns = A[:i] + A[i + 1:]
    projection_vector = cross_of(other_planes_ns)
    p = project_by_vector(p, projection_vector, plane_n, plane_d)
  return p
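For example, applied to the two-line system from the earlier sketches, it lands right on the intersection:

print(solution_for([[1., 2.], [3., -1.]], [3., 2.]))  # ~[1.0, 1.0]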
	
Quest 9. Please estimate the complexity of the algorithm above in big-O.




You might imagine that, with polynomial complexity, using a direct algorithm on massive systems is not really plausible. Solving a 1000-equation system with our algorithm requires a trillion operations. In this regard, iterative algorithms are a worthy alternative. Not that they are generally faster; it's all about balancing convergence against complexity.

Our algorithm may be graphic, but it is too heavy to be considered practically useful. There are better options out there. For instance, Gaussian elimination has a complexity of O(n^3), and the theoretical best for LU decomposition is O(n^2.376).

They both share the same idea: we can change the system however we want, as long as it still holds true. We already saw that we can multiply both sides of an equation by the same number and still retain the same line in the graphic form. We can also pick any equation and add it to any other equation from the same system. This will change the line but retain the solution.

By doing these operations alone we can turn almost any system into the trivial one.

\[
\begin{cases}
a_{11} x_1 + a_{12} x_2 = b_1 \\
a_{21} x_1 + a_{22} x_2 = b_2
\end{cases}
\]
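To make the idea concrete, here is a compact sketch of Gaussian elimination (an illustrative implementation, with partial pivoting thrown in to keep the divisions numerically stable):

def gaussian_elimination(A, B):
  ''' Solve A x = B by row reduction followed by back-substitution '''
  n = len(A)
  A = [row[:] for row in A]  # work on copies, keep the caller's data intact
  B = B[:]
  for i in range(n):
    # partial pivoting: put the largest remaining coefficient on the diagonal
    pivot = max(range(i, n), key=lambda r: abs(A[r][i]))
    A[i], A[pivot] = A[pivot], A[i]
    B[i], B[pivot] = B[pivot], B[i]
    # subtract multiples of row i to zero out column i below the diagonal
    for j in range(i + 1, n):
      f = A[j][i] / A[i][i]
      for k in range(i, n):
        A[j][k] -= f * A[i][k]
      B[j] -= f * B[i]
  # the system is now triangular; solve it from the bottom up
  x = [0.] * n
  for i in reversed(range(n)):
    x[i] = (B[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
  return x

print(gaussian_elimination([[1., 2.], [3., -1.]], [3., 2.]))  # ~[1.0, 1.0]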

P. S.

That's all fine, but how do you choose a method to solve your particular case? Here are a few hints.

  1. If your system is sparse, meaning most of the coefficients are zeroes, you can exploit this with any decomposition (factorization) or elimination-based method that works on sparse structures.
  2. If your matrix is diagonally dominant, meaning that the largest elements in every equation tend to land on the diagonal, then iterative methods such as Jacobi or Gauss-Seidel should work best for you.
  3. And if your system is very small, like 2 to 4 equations, you're better off without any solver at all. Just solve it symbolically with SymPy and copy-paste the symbolic solution back into your code, as in the sketch below. Yes, this will outperform any possible external solver. And no, it's not against the law to do so.
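Here is what that workflow might look like with SymPy (a sketch; the symbol names are arbitrary):

from sympy import symbols, linsolve, Eq

a11, a12, a21, a22, b1, b2 = symbols('a11 a12 a21 a22 b1 b2')
x1, x2 = symbols('x1 x2')

# solve the general 2x2 system once, symbolically
print(linsolve([Eq(a11*x1 + a12*x2, b1), Eq(a21*x1 + a22*x2, b2)], x1, x2))

# The printed set holds the closed-form solution, essentially Cramer's rule:
#   x1 = (a22*b1 - a12*b2) / (a11*a22 - a12*a21)
#   x2 = (a11*b2 - a21*b1) / (a11*a22 - a12*a21)
# Copy these expressions into your code as plain arithmetic and you're done.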