Why would you care about homogeneous coordinates, whatever they are? Well, if you work with geometry: 3D graphics , image processing, physical simulation, — the answer is obvious. Knowing the mathematics behind your framework helps you write more efficient code.
But even if you don’t work with geometry at all, you still might enjoy learning about homogeneous coordinates, projective space, and their link to linear algebra. This topic is a great example of mathematical investment in programming: you pay with a small complication, and you gain an enormous simplification in return.
I think, learning this particular piece of mathematics is a valuable experience in its own right. And you know how the game works: more experience, higher level, better loot.
That’s the complication. That’s the investment. The third number feels unusual and seems excessive. Especially, since every Cartesian point can be obtained from the homogeneous tuple just like this:
To translate a point from Cartesian to homogeneous coordinates, you can simply say:
Here is a coordinate translator. It will translate the point you chose on a plot from above into homogeneous coordinates for (almost) every wh you propose.
But it wouldn't work for all the possible numbers. There is one and only one exception.
Usually, Cartesian coordinates are just the first two homogeneous coordinates divided by the third. So when the third one is 1, homogeneous coordinates are the same as Cartesian.
That third coordinate still seems rather unnecessary until one moment. What if the third coordinate is 0 after all?
Intuition tells us that the point with its wh = 0 should be further from the beginning of the coordinates than every other point with wh ≠ 0 . All the points in Euclidean space have wh ≠ 0 , so this point should be somewhere other than in Euclidean space.
And that's when it gets fascinating. Homogeneous coordinates denote points not only in Euclidean (or, more generally, affine space ) but in projective space that includes and expands the affine one. There is more geometry that fits in our cozy Cartesian system. There is the Euclidean space, and there is also an infinite number of points that are infinitely far from it.
If you have trouble imagining infinitely far points, you can also imagine a triplet (x, y, 0) from this projective extension as a direction and not a specific point in space. Not a point per se, but a ray that starts at null and has no length, no end, only the direction.
This representation is often used in 3D graphics . With homogeneous coordinates, we can compose a 3D scene so that every object that can be possibly reached, like a house, a tree, or a cat, remains in the affine space with coordinates like (x, y, z, 1) . All the objects that can not be reached by design, like the Sun, the Moon, or financial stability by the age of thirty, go to the projective extension with coordinates like (x, y, z, 0) . Both types of objects share the same space.
Living in a projective space gives you the benefit of being unreachable if you desire to. But that’s not all it is good for. In fact, we are only starting to enjoy our returns of investment.
ROI 1. Central and parallel projections are the same
There are two kinds of projections in Euclidean space: central and parallel. The central projection creates a perspective. It makes things closer to the viewer seem larger. We use central projection in video games to render a graphic 3D scene into a flat picture on a screen.
The parallel projection preserves proportions. Things of the same size look the same size however far from or close to the viewer they are. That’s what we usually use in CAD systems to show bolts and nuts on technical drawings so an engineer would see that an M8 bolt is indeed an M8 bolt and not an M32 far away.
In projective space, these two projections are the same.
In affine space, you can set the central point for a central projection very far away from the scene you want to render. This will make the discrepancy between the two kinds of projections minuscule. But in projective space, you can yeet the projection central point infinitely far — further away than any point in the affine space at all, and the discrepancy will disappear completely.
So bear in mind, if you want to make a game about zombies who happen to be CAD engineers, you don’t have to implement both kinds of projections. A central projection should be enough. Just set the central point to (x, y, z, 0) , and this will automatically turn a central projection into a parallel projection with no additional programming.
ROI 2. All the quadric surfaces are the same
I remember my first year in college. We were studying quadric surfaces and one of the exercises was to make an album with all of them. 17 sheets of paper with different graphics and formulas all drawn by hand. The main purpose of this album was to be briefly examined by the professor and thrown away a day later. What a waste!
Now in projective space, this exercise would have been much more environmentally friendly. In homogeneous coordinates, all the algebraic surfaces are homogeneous too. This means that every piece of a polynomial that defines the surface has the same degree. It may contain different variables with different degrees of their own, but they all magically add up to the very same degree for every element in the sum.
And this means only one drawing with one formula to be drawn and thrown away instead of 17. That should sum up to a couple of dead trees over the years.
ROI 3. All the projective transformations are matrices
Geometric transformations are something that happens to points. They are functions (x', y') = f(x, y) . If you want to apply a transformation to some object, most of the time you would have to represent it with points and then apply a transformation to each and every one of them.
This may get computationally heavy. For instance, transforming a picture 3 000 × 4 000 pixels requires 12,000,000 transformations. And transforming a 1024 × 1024 × 1024 3D image requires exactly 1,073,741,824 . That's a billion with a “B”. If we want to see holographic television anytime soon, we should learn to do these transformations really-really fast.
Some of the most common transformations are:
1. translation ;
2. rotation ;
3. scaling .
They are generalized by the affine transformation that can do translations, and rotations, and scales simultaneously:
The affine transformation is quite powerful but it has its noticeable constraint. It preserves parallelism, which is in a way limiting. If you want to go beyond this limitation, you should do a projective transformation that, for example, may look like this:
Hidden fact #2
Ah, yes! This transformation is also limited. For instance, you can't transform a square into a concave quadrilateral. Kudos for finding this out by yourself!
Hide the hidden fact .
Formulas in Cartesian coordinates
The formula for projective transformations in Cartesian coordinates is:
This is a geometric transformation just like rotation, translation, or scaling. This transformation, however, doesn’t preserve the shape of the input entity, like translation does, it only preserves the degree of curves and surfaces so every straight line gets transformed into a straight line, and in 3D, each plane into a plane. Since all the second-degree surfaces are the same surface, it also preserves the degree but not the classification from the affine space. An ellipsoid may become a paraboloid or a hyperboloid.
Projective transformation also generalizes affine transformation which has a simpler formula:
x' = A x + B y + C
y' = D x + E y + F
So affine transformation is a special case of projective transformation when a = 0 , b = 0 , and c = 1 .
Affine transformation, in turn, generalizes translation, rotation, and scaling. A translation is:
x' = x + C (A = 1, B = 0)
y' = y + F (D = 0, E = 1)
A rotation is:
x' = sin(r) x + cos(r) y (A = sin(r), B = cos(r), C = 0)
y' = cos(r) x - sin(r) y (D = cos(r), E = -sin(r), F = 0)
And a scale is:
x' = A x (B = 0, C = 0)
y' = E y (D = 0, F = 0)
Of course, since translation, rotation, and scaling are special cases of affine transformation, they are all special cases of projective transformation too.
The matrix multiplication in homogeneous coordinates
Let's multiply a square matrix by a point in homogeneous coordinates.
If our point comes from the Cartesian coordinates then wh = 1 . Now we see that:
x' = A x + B y + C
y' = D x + E y + F
w' = a x + b y + c
To get back to the Cartesian coordinates, let's make our w' = 1 . We can do this by dividing everything by w' .
x' = (A x + B y + C) / (a x + b y + c)
y' = (D x + E y + F) / (a x + b y + c)
w' = 1
Doesn't it look familiar? Well, of course, it does! It's a projective transformation. Or, with the specific coefficients, it could even be an affine one. Or it could be a translation, or a rotation, or a scale. Every one of these transformations can be conducted by mere matrix multiplication.
It gets better. Matrices are composable. You can compose your own transformation like this: translation, and then rotation, and another translation, and then scale, and projection, — and it will all fit into a single matrix multiplication!
Hidden fact #4
Forgot to tell you! These transformations are not necessarily revertable. For instance, when you scale your square into a segment by making one of the scales 0, you can't “unscale” it back anymore. Please use the revert button instead ↓
Also, kudos for finding this out by yourself!
Hide the hidden fact .
Revert to E
This is particularly important because whatever you do: animation, image processing, physics simulation, you always want to do as little computation as possible. Composability allows you to squeeze a series of transformations into a single matrix multiplication which is in turn very super-scalar friendly. With matrices, not only do you do fewer calculations, you also benefit from vectorization both on CPU and GPU so you do them faster. Ultra-fast transformations everywhere!
Conclusion
Pragmatically, you can save a lot of processor time transforming points with one single matrix multiplication instead of applying all the transformations separately. And you can also write fewer lines of code by exploiting the common nature of all the transformations. But this is not the whole point yet.
Usually, we lose performance not because of some small computational inefficiencies but because of all the code that shouldn't be there in the first place. One time I made a piece of code run 200 times faster by simply replacing a transformation provided by the framework with a simple matrix multiplication on the spot. The original transformation was designed like this:
Drawing.Drawing2D.Matrix.TransformPoints:
┠Drawing.SafeNativeMethods.Gdip.ConvertPointToMemory,
┗Drawing.SafeNativeMethods.Gdip.ConvertGPPOINTFArrayF:
┗Drawing.UnsafeNativeMethods.PtrToStructure:
┠Drawing.Internal.GPPOINTF..ctor,
┗RuntimeType.CreateInstanceSlow:
┗Runtime.InteropServices.Marshal.PtrToStructure.
Conversions between identical structures, copying data with no good reason, constructors that do no essential work, — all of this costs time but gives nothing of value in return. The good news is, that all of this is also avoidable. It only occurs when programmers don’t understand and don’t trust the beauty of plain mathematics.
And I hope this page reveals some of that beauty. I hope it makes plain mathematics a little more trustworthy. Thank you!