Why would you care about homogeneous coordinates, whatever they are? Well, if you work with geometry (3D graphics, image processing, physical simulation), the answer is obvious: knowing the mathematics behind your framework of choice lets you write more efficient code.

But even if you don’t work with geometry at all, you still might benefit from learning the concept of homogeneous coordinates. This is one of those rare examples of mathematical magic where a small complication results in an enormous simplification. One little obscurity pays off in unification and homogenization.

I think learning this particular piece of geometry is a valuable experience in its own right. And you know how it works: more experience, higher level, better loot.

In our usual Cartesian coordinate system, a point on a plane is defined by a pair of numbers (x_{c}, y_{c}).

In homogeneous coordinates, a point on a plane is defined by a triple of numbers (x_{h}, y_{h}, w_{h}).

That's a bit unusual, but if a point comes from Cartesian space, you can turn a homogeneous triple into a Cartesian pair as simply as this:

x_{c} = x_{h} / w_{h}

y_{c} = y_{h} / w_{h}

However, there is no single way to transform a point from Cartesian to homogeneous coordinates. You can pick almost any value for w_{h}, set x_{h} = x_{c} · w_{h} and y_{h} = y_{c} · w_{h}, and it will work.
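Here is a minimal Python sketch of both conversions, assuming points are plain tuples; the function names `to_homogeneous` and `to_cartesian` are made up for illustration. Going to homogeneous coordinates multiplies by the chosen w_{h}, going back divides by it, and w_{h} = 0 is refused, since that is exactly the one exception.

```python
def to_homogeneous(x_c, y_c, w_h=1):
    """Lift a Cartesian point (x_c, y_c) to homogeneous coordinates.

    Any nonzero w_h works: the triple is just the Cartesian pair scaled by w_h.
    """
    if w_h == 0:
        raise ValueError("w_h = 0 does not describe a point of the Cartesian plane")
    return (x_c * w_h, y_c * w_h, w_h)


def to_cartesian(x_h, y_h, w_h):
    """Bring a homogeneous triple back to the plane by dividing by w_h."""
    if w_h == 0:
        raise ValueError("w_h = 0: the point has no Cartesian counterpart")
    return (x_h / w_h, y_h / w_h)


print(to_homogeneous(2, 3, 5))                   # (10, 15, 5)
print(to_cartesian(10, 15, 5))                   # (2.0, 3.0)
print(to_cartesian(*to_homogeneous(2, 3, -0.5)))  # (2.0, 3.0)
```

Whatever nonzero w_{h} you choose, the round trip lands on the same Cartesian point.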


But it doesn't work for every possible number. There is one exception.

Usually, Cartesian coordinates are just the first two homogeneous coordinates divided by the third. So if the third one is 1, homogeneous coordinates are basically the same thing as Cartesian ones. But the smaller w_{h} gets, the farther the corresponding Cartesian point travels from the origin.

That’s all rather simple, up to a point. What if the third coordinate, w_{h}, is 0?

Intuition suggests that such a point should be farther from the origin than any other point. Any other point in Euclidean space, that is. Indeed, homogeneous coordinates denote points not only in Euclidean or, more generally, affine space, but in projective space, which includes and extends the affine one.

Points of projective space may lie in Euclidean space or be infinitely far from every point of it. If w_{h} is 0, it's the latter. If not, then whatever w_{h} is, it's the former.

From a pragmatic point of view, this lets us, for instance, compose a 3D scene so that every object that can be reached fits in affine space with coordinates (x, y, z, 1), while all the objects that can never be reached belong to its projective extension (x, y, z, 0).

So if you work with 3D graphics, you might have noticed that it is quite common to write 3D points as tuples of 4 numbers. The usual question here is: “what does that fourth coordinate stand for?” And the usual answer is: “Just set it to 1 and hope you won’t screw anything up!” Well, now you know what the fourth coordinate actually stands for.

It is also common to refer to points of the projective extension as general directions rather than specific points in Euclidean space: like a ray that starts at the origin, such a point has no end, only a direction. You might even have heard them called vectors, as opposed to points. This is not technically correct since, of course, any point is a vector in the corresponding vector space. But it can still be a useful naming convention if you are not into vector algebra.
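One way to see why w = 0 behaves like a direction: a translation in homogeneous coordinates adds the offset multiplied by w. Here is a Python sketch, assuming 3D points are 4-tuples (x, y, z, w); `translate` is a hypothetical helper that does to a single point what a 4×4 translation matrix does.

```python
def translate(p, tx, ty, tz):
    """Translate a homogeneous 3D point p = (x, y, z, w) by (tx, ty, tz).

    Same result as multiplying by the 4x4 translation matrix:
    the offset is scaled by w, so a point (w = 1) moves,
    while a direction (w = 0) stays exactly as it was.
    """
    x, y, z, w = p
    return (x + tx * w, y + ty * w, z + tz * w, w)


point = (1, 2, 3, 1)      # a place in the scene
direction = (1, 2, 3, 0)  # a direction, e.g. towards something infinitely far

print(translate(point, 10, 0, 0))      # (11, 2, 3, 1)
print(translate(direction, 10, 0, 0))  # (1, 2, 3, 0)
```

Moving around doesn't change where an infinitely distant object appears to be, which is why these points work so well as directions.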

Living in a projective space gives you more options. You can denote points that are unheard of in the Euclidean one. But that’s not all it is good for. In fact, we are only starting to get into the benefits.

There are two kinds of projection in Euclidean space: central and parallel. Central projection produces perspective, so closer things look bigger, and that’s what we use in video games to render a 3D scene into a flat picture on a screen. Parallel projection preserves proportions, so that’s what we usually use in CAD systems to show bolts and nuts on drawings.

In projective space, they are the same thing. In affine space, you can place the center of a central projection very, very far away from the scene you want to render, which makes the disproportion very small. But in projective space, you can hurl the center infinitely far, farther away than any point of affine space at all, and the disproportion disappears completely.

So bear in mind: if you want to make a game about zombies who happen to be CAD engineers, you don’t have to implement both kinds of projection. Just set the center of projection to (x, y, z, 0), and it automatically turns into a parallel one.
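A Python sketch of that trick, assuming the textbook projective-geometry formula for projecting from a center C onto a plane π, M = (π · C) I - C πᵀ, where the plane is given by its four homogeneous coefficients. The helper names `projection_matrix`, `apply`, and `to_cartesian` are hypothetical. Moving C from (0, 0, 2, 1) to (0, 0, 1, 0) switches the very same code from perspective to parallel projection onto the screen plane z = 0.

```python
def projection_matrix(center, plane):
    """4x4 matrix projecting points from `center` onto `plane`.

    center: homogeneous point (x, y, z, w); w = 0 means "infinitely far".
    plane:  coefficients (p0, p1, p2, p3) of p0*x + p1*y + p2*z + p3*w = 0.
    Builds M = (plane . center) * I - center * plane^T.
    The center must not lie on the plane, otherwise plane . center = 0.
    """
    d = sum(p * c for p, c in zip(plane, center))
    return [[(d if i == j else 0) - center[i] * plane[j] for j in range(4)]
            for i in range(4)]


def apply(m, p):
    """Multiply a 4x4 matrix by a homogeneous point."""
    return tuple(sum(m[i][k] * p[k] for k in range(4)) for i in range(4))


def to_cartesian(p):
    """Divide by w to get back to Cartesian (x, y, z)."""
    x, y, z, w = p
    return (x / w, y / w, z / w)


screen = (0, 0, 1, 0)  # the plane z = 0

perspective = projection_matrix((0, 0, 2, 1), screen)  # center at z = 2
parallel = projection_matrix((0, 0, 1, 0), screen)     # center infinitely far along z

print(to_cartesian(apply(perspective, (1, 1, 1, 1))))  # (2.0, 2.0, 0.0)
print(to_cartesian(apply(parallel, (1, 1, 1, 1))))     # (1.0, 1.0, 0.0)
print(to_cartesian(apply(parallel, (1, 1, 5, 1))))     # (1.0, 1.0, 0.0)
```

With the center at (0, 0, 2, 1), the point (1, 1, 1, 1), which is closer to the center, projects to (2, 2, 0), twice as big. With the center at infinity, both (1, 1, 1, 1) and (1, 1, 5, 1) land on (1, 1, 0), with no disproportion at all.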

I remember that in my first year of college we were studying quadric surfaces, and one of the exercises, allegedly designed to help us learn their classification, was to make an album. It was 17 sheets of paper with different drawings and formulas sewn together, only to be briefly examined by the professor and thrown away a day later. As you might imagine, we were not fond of this activity.

Now in projective space, this exercise would have been much more environmentally friendly. That’s because in homogeneous coordinates, the equations of all algebraic surfaces are homogeneous too. This means every term of the polynomial that defines a surface has the same degree. A term may contain different variables with different degrees of their own, but they always add up to the very same degree for every term of the sum.
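To see what homogeneous means here, a small Python sketch, assuming a polynomial in x and y is stored as a dict from exponent pairs (i, j) to coefficients; `homogenize` and `evaluate` are hypothetical helpers. Each term is multiplied by the power of w that lifts it to the full degree, so the unit circle x² + y² - 1 becomes x² + y² - w², and the parabola y - x² becomes yw - x².

```python
def homogenize(poly):
    """Turn a polynomial in x, y into a homogeneous polynomial in x, y, w.

    `poly` maps exponent pairs (i, j) to the coefficient of x**i * y**j.
    Every term gets the power of w that lifts it to the polynomial's full degree.
    """
    degree = max(i + j for i, j in poly)
    return {(i, j, degree - i - j): coef for (i, j), coef in poly.items()}


def evaluate(hpoly, x, y, w):
    """Evaluate a homogeneous polynomial at the point (x, y, w)."""
    return sum(coef * x**i * y**j * w**k for (i, j, k), coef in hpoly.items())


circle = homogenize({(2, 0): 1, (0, 2): 1, (0, 0): -1})  # x^2 + y^2 - 1
parabola = homogenize({(0, 1): 1, (2, 0): -1})           # y - x^2

print(circle)    # {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): -1}  i.e. x^2 + y^2 - w^2
print(parabola)  # {(0, 1, 1): 1, (2, 0, 0): -1}  i.e. yw - x^2

# (2, 4) lies on the parabola; so does any nonzero multiple of (2, 4, 1).
print(evaluate(parabola, 2, 4, 1), evaluate(parabola, 6, 12, 3))  # 0 0
```

Because every term has the same degree, scaling (x, y, w) by any nonzero factor keeps a point on the curve. Note also that evaluate(parabola, 0, 1, 0) is 0: the parabola reaches the point at infinity in the direction of the y axis.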

And this means only a handful of drawings to be drawn and thrown away instead of seventeen. That should add up to a couple of trees saved over the years.

Geometric transformations are something that happens to a point. They are basically functions (x', y') = f(x, y). If you want to apply the transformation to some object, most of the time you would have to represent it with points and then apply the transformation to each of them.

Sometimes this gets computationally heavy. For instance, transforming a 3 000 × 4 000 pixel image requires 12 000 000 transformations. So looking for the fastest way to apply a transformation makes pragmatic sense.

Some of the most common transformations are translation, rotation, and scale.

These can be generalized by the affine transformation, which can do a translation, a rotation, and a scale simultaneously.

The affine transformation is quite powerful, but it has to preserve parallelism, and this limits it. If you want something like perspective or projection, you have to go for the projective transformation.

The formula for projective transformations in Cartesian coordinates looks like this:

x' = (Ax + By + C) / (ax + by + c)

y' = (Dx + Ey + F) / (ax + by + c)

It is a simple geometric transformation, just like all the ones we have seen before, and it also works on one point at a time. It preserves the degree of curves and surfaces, so every straight line is transformed into a straight line, and every plane into a plane. It also maps 2nd-degree surfaces to 2nd-degree surfaces, but it doesn't preserve their affine class, so an ellipsoid may become a paraboloid or a two-sheet hyperboloid.
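A quick numeric check of the line-preserving property, as a Python sketch; the coefficients below are arbitrary values picked for illustration, and `projective` and `collinear` are hypothetical helpers. Three equally spaced points on the line y = 2x + 1 stay on one line after the transformation, even though their spacing changes.

```python
def projective(coeffs, x, y):
    """x' = (Ax + By + C) / (ax + by + c), y' = (Dx + Ey + F) / (ax + by + c)."""
    A, B, C, D, E, F, a, b, c = coeffs
    den = a * x + b * y + c  # must be nonzero for the image to be a finite point
    return ((A * x + B * y + C) / den, (D * x + E * y + F) / den)


def collinear(p, q, r, eps=1e-9):
    """True if p, q, r lie on one line, i.e. the triangle they span has zero area."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) < eps


# Arbitrary illustrative coefficients: a and b are nonzero, so this is not affine.
coeffs = (1, 0, 0, 0, 1, 0, 0.5, 0.25, 1)

# Three points on the line y = 2x + 1, equally spaced.
images = [projective(coeffs, x, 2 * x + 1) for x in (0, 1, 2)]
print(collinear(*images))  # True: the line is still a line
```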

It also generalizes the affine transformation, which has a simpler formula:

x' = Ax + By + C

y' = Dx + Ey + F

As you can see, it's just a special case of projective transformation when a = 0, b = 0, and c = 1.

And the affine transformation in its turn generalizes translation, rotation and scale. Translation is:

x' = x + C (A = 1, B = 0)

y' = y + F (D = 0, E = 1)

Rotation:

x' = cos(r) x - sin(r) y (A = cos(r), B = -sin(r), C = 0)

y' = sin(r) x + cos(r) y (D = sin(r), E = cos(r), F = 0)

And scale:

x' = Ax (B = 0, C = 0)

y' = Ey (D = 0, F = 0)

They are all just special cases of projective transformations.
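The same point in code: one general function and three coefficient presets. A Python sketch with hypothetical helper names; the coefficient order (A, B, C, D, E, F, a, b, c) follows the formulas above.

```python
import math


def projective(coeffs, x, y):
    """General projective transformation with coefficients (A, B, C, D, E, F, a, b, c)."""
    A, B, C, D, E, F, a, b, c = coeffs
    den = a * x + b * y + c
    return ((A * x + B * y + C) / den, (D * x + E * y + F) / den)


# Each familiar transformation is just a preset; all of them are affine (a = b = 0, c = 1).
def translation(tx, ty):
    return (1, 0, tx, 0, 1, ty, 0, 0, 1)


def rotation(r):
    # Counterclockwise rotation by r radians around the origin.
    return (math.cos(r), -math.sin(r), 0, math.sin(r), math.cos(r), 0, 0, 0, 1)


def scale(sx, sy):
    return (sx, 0, 0, 0, sy, 0, 0, 0, 1)


print(projective(translation(3, -2), 1, 1))     # (4.0, -1.0)
print(projective(scale(2, 3), 1, 1))            # (2.0, 3.0)
print(projective(rotation(math.pi / 2), 1, 0))  # approximately (0.0, 1.0)
```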

Now please bear with me, we are entering the matrix territory.

What happens if we multiply a square matrix by a point in homogeneous coordinates?

$$
\begin{bmatrix} A & B & C \\ D & E & F \\ a & b & c \end{bmatrix}
\begin{bmatrix} x \\ y \\ w \end{bmatrix}
=
\begin{bmatrix} Ax + By + Cw \\ Dx + Ey + Fw \\ ax + by + cw \end{bmatrix}
$$

Let's pretend the point we took came from Cartesian coordinates so w_{h} = 1. Now we see that:

x' = Ax + By + C

y' = Dx + Ey + F

w' = ax + by + c

Now let's say we want to get back to Cartesian coordinates, so we need w' = 1. We can get it by dividing everything by the current w'.

x' = (Ax + By + C) / (ax + by + c)

y' = (Dx + Ey + F) / (ax + by + c)

w' = 1

Doesn't it look familiar? Well, of course, it does! It's the projective transformation. Or the affine. Or it could be the translation, or rotation, or scale. They all can be written as a single matrix multiplication.
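A Python sketch of that equivalence, assuming the matrix is stored as a list of rows [[A, B, C], [D, E, F], [a, b, c]]; `mat_vec` and `transform` are hypothetical names. It lifts the point with w = 1, multiplies, and divides by the resulting w', which reproduces the projective formula term by term.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a homogeneous point (x, y, w)."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def transform(m, x, y):
    """Lift (x, y) with w = 1, multiply by m, then divide by the resulting w'."""
    xh, yh, wh = mat_vec(m, (x, y, 1))
    return (xh / wh, yh / wh)


# Rows are [A, B, C], [D, E, F], [a, b, c]; the values are arbitrary.
m = [[1, 2, 3],
     [4, 5, 6],
     [0.5, 0.25, 1]]

x, y = 2, 4
via_matrix = transform(m, x, y)
via_formula = ((1 * x + 2 * y + 3) / (0.5 * x + 0.25 * y + 1),
               (4 * x + 5 * y + 6) / (0.5 * x + 0.25 * y + 1))
print(via_matrix == via_formula)  # True
```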

But that's not all. Matrices are composable. You can compose your own translation, rotation, another translation, scale, and projection, and it will all still fit into a single matrix!
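For instance, a rotation about an arbitrary point is a translation, a rotation about the origin, and another translation, and the three collapse into one matrix. A Python sketch with hypothetical helper names; note that in a product like T1 · R · T2, the rightmost matrix is applied to the point first.

```python
import math


def mat_mul(m, n):
    """3x3 matrix product: applying m @ n to a point = applying n, then m."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotation(r):
    c, s = math.cos(r), math.sin(r)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def transform(m, x, y):
    """Apply a 3x3 matrix to the Cartesian point (x, y) via homogeneous coordinates."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    wh = m[2][0] * x + m[2][1] * y + m[2][2]
    return (xh / wh, yh / wh)


# Rotate by 90 degrees around the point (1, 1):
# move (1, 1) to the origin, rotate, move it back. One matrix does all three.
combined = mat_mul(translation(1, 1),
                   mat_mul(rotation(math.pi / 2), translation(-1, -1)))

print(transform(combined, 2, 1))  # approximately (1.0, 2.0)
print(transform(combined, 1, 1))  # approximately (1.0, 1.0): the pivot stays put
```

Any further matrix, a scale or a projective one, can be multiplied in the same way, and each point still goes through a single 3×3 multiplication.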


As you can see, you can save quite a lot of time processing millions of points with a single matrix instead of applying all the transformations separately. You can also save lines of code by unifying all the transformations. But this is not the whole point. Usually, we lose performance not because of some small computational inefficiency, but because of needless layering. And the layering occurs because people don’t understand, and don’t trust, the beauty of plain mathematics.

I hope this page reveals a bit of it. I hope it makes it a little more trustworthy.