Polynomial approximation and interpolation

The word approximation comes from Latin and means approximately “bringing something close enough”. We use this word when we want to substitute something complex with something simple. It is a broad term, you can technically approximate apples with oranges. But since it's usually used by boring people, it usually concerns numbers, functions, points, and vector fields.

The polynomial approximation is about composing a polynomial that could substitute another function or a set of points good enough. With a set of points, there is a nice matrix equation that does all the work for us. Here, try it out.

The matrix comes from the orthogonality principle. In brief, we want the squared difference between the polynomial and the points to be as little as possible, so this is a minimization problem. But according to the orthogonality principle, the minimum for the sum of squared residuals is the same point where all the residuals, put in a single vector, are orthogonal to the vector of polynomial values.

There is a single solution, and there is no need to run numeric methods to find it. That makes the polynomial approximation so appealing.

You might have heard term linear regression. It is a special case of polynomial approximation: an approximation of some data with a line, plane or hyperplane. So now you know how that works too.

There are three more facts I want to share, but I think it would be more fun if you reveal them yourself with a living plot from above. Every time you find something interesting, a new fact will appear below. If you don't feel like playing games, just press the “show everything” button.

Fact 1. When the number of points equals the degree of polynomial plus one, the approximation fits all the points perfectly. Another words, approximation becomes interpolation.

Fact 2. When polynomial degree is 0, the function is technically not a polynomial anymore, but a constant. Still, approximation method works even for a constant.

Fact 3. When you put several points in one place, that place attracts polynomial more than other points. This makes sense, since the polynomial minimizes the squared residuals sum. If some point occurs more than once, then it influences the sum more. That's how you do weighted approximation. You can add weights to your points by multiplying them.

Interpolation

The word interpolation also comes from Latin and it roughly translates as “smoothing things in between”. When you have a set of points, you use interpolation to find values between these points.

With polynomial interpolation, it is about finding a polynomial that runs exactly through the points we want. Here, try it out. You can add and remove points, but you can't select a polynomial degree now since it depends on the number of points.

The matrix is called Vandermonde matrix and it comes from the simple fact that if a polynomial runs through a point (x_i, y_i) then P(x_i) = y_i. Just write this down for every point, put it into the matrix form and here you go.

One widely known fact is you can't have your polynomial degree less than the number of points minus one. Another less known fact is that you can have it greater. It's very easy to see. Just imagine adding an extra point to the data set, the one you don't really need. The polynomial is still interpolating regarding the initial set, but by moving that extra point you now gain extra control over that polynomial.

Why not have both?

One of the frequently asked questions is “can we have a polynomial that runs through some chosen points and approximates the rest of the data set as well?”

The answer is: yes, but... We don't have a nice simple formula to do that so we have to either go numeric or use a hack.

The numeric approach is not so hard. Remember that the interpolating polynomial degree may be higher than the number of points in a set? We can exploit that. We can compose a Vandermonde matrix out of all interpolating points we got, and add a few fake ones to make the matrix square. Then we can use the fake ones as variables for any numeric optimizer and it will find the best fit for us.

When the optimizer finishes, we will have our polynomial both interpolating and approximating.

The downside of this approach is that the optimizer may finish its work anytime between 10 nanoseconds and 6 billion years. This depends on the number of variables and the polynomial degree.

The other approach is a hack. The thing is, in practice, we never get our polynomial exactly interpolating due to the computational error. If we're ok with this, we can choose some tolerable interpolation error beforehand and use the same orthogonal principle matrix. This will approximate our points. Then we'll duplicate our interpolating points and recalculate the approximating polynomial. The error for these points will go down. We'll keep duplicating them until the approximating polynomial fits them within the tolerable interpolation error.

This works because what the approximation does, it minimizes the sum of squared residuals. If you put a point twice, its residual will be twice as important to the optimizer as any other point. If you add it 128 times, it will attract the polynomial 128 times stronger than its neighbors.

The brutal simplicity of this approach is also its strength. You only have one variable to alter, so the optimization process is trivial.

Conclusion

Both polynomial approximation and interpolation are widely known and widely implemented. It's most unlikely that you will have to reimplement them at some point in your career. However, knowing how they work may help you come with a cunning plan when what you need is not exactly what the library provides.

Polynomial approximation and interpolation

Approximation

Interpolation

Why not have both?

Conclusion