This is Words and Buttons Online — a collection of interactive #tutorials, #demos, and #quizzes about #mathematics, #algorithms and #programming.

Challenge your performance intuition with C++ operators

I made a mistake one day. More of a typo really. I used “bitwise and” instead of “logical and” in the code. A colleague of mine noticed that during the review and said something like this. “You shouldn't use bitwise operators instead of logical ones, the latter do short-circuiting.”

Yes. Both parts of that sentence are valid. You shouldn't use bitwise operators instead of logical ones. Logical operators in C++ perform short-circuit evaluation. That's true. But the invisible implication between the two parts is wrong.

Short-circuiting may be or may be not beneficial for the performance. It may be or may be not performed on bitwise operators too. And sometimes you're better off not doing logic the most logical way to begin with.

I want to propose a game. Let's measure things and see if we can predict the measurements before they come. We did a similar thing last year, and that was fun, so why not?

So here is the benchmark example.

#include <chrono>
#include <iostream>
#include <random>
#include <array>

int main() {
  using TheType = int;
  constexpr auto TheSize = 16 * 10000000;
  std::mt19937 rng(0);
  std::uniform_int_distribution<TheType> distribution(0, 1);
  std::vector<TheType> xs(TheSize);
  for (auto& digit : xs) {
    digit = distribution(rng);
  }

  volatile auto four_1_in_a_row = 0u;
  auto start = std::chrono::system_clock::now();
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1 && xs[i+1] == 1 && xs[i+2] == 1 && xs[i+3] == 1)
      ++four_1_in_a_row;
  auto end = std::chrono::system_clock::now();

  std::cout << "time: " << (end-start).count() * 1e-9
    << "  1111s: " << four_1_in_a_row << "\n";
}

I've made several variants of this code, compiled it with clang 3.8.0-2ubuntu4 -std=c++14 -O2, and measured their run time on Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz.

Using your intuition and best judgment, please estimate their relative performance. Please use the slider below the code samples.

Round 1. && vs &

The main question. Is && faster than &?

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
     & xs[i+1] == 1
     & xs[i+2] == 1
     & xs[i+3] == 1)
      ++four_1_in_a_row;
    

They are almost the same.

Round 2. & vs *

“Logical and” can also be substituted with a multiplication operator. It has the same truth table on 0 and 1.

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
     & xs[i+1] == 1
     & xs[i+2] == 1
     & xs[i+3] == 1)
      ++four_1_in_a_row;
    
  for (auto i = 0u; i < TheSize - 3; ++i)
    if((xs[i] == 1)
     * (xs[i+1] == 1)
     * (xs[i+2] == 1)
     * (xs[i+3] == 1))
      ++four_1_in_a_row;
    

They are almost the same.

Round 3. ==, && vs *, +, -

In this simple example, we can substitute almost all the logic with computation. Instead of comparing each value separately we can calculate the squared error for the whole 4-tuple.

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    
...
inline int sq(int x) {
  return x*x;
}
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;
    

They are almost the same.

Round 4. * vs abs

We don't really need multiplications. We don't really need squared error. Why don't we calculate an absolute error instead?

...
inline int sq(int x) {
  return x*x;
}
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;
    
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(std::abs(xs[i] - 1)
     + std::abs(xs[i+1] - 1)
     + std::abs(xs[i+2] - 1)
     + std::abs(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;
    

They are almost the same.

Round 5. int vs double

The question is, will there be any gain if we simply translate our code to work with doubles?

...
  using TheType = int;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    
...
  using TheType = double;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    

They are almost the same.

Round 6. *, +, - int vs double

But if we change the code to avoid logic, will there be any gain?

...
inline int sq(int x) {
  return x*x;
}
...
  using TheType = int;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;
    
...
inline double sq(int x) {
  return x*x;
}
...
  using TheType = double;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;
    

They are almost the same.

Round 7. double vs float

That sound a bit like cheating because with single precision types we would have to read two times less from the memory but let's give it a go.

...
  using TheType = double;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    
...
  using TheType = float;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    

They are almost the same.

Round 8. 1 vs 0

This is the bonus round. There is a notion, right along with the short-circuiting being good for performance, that the comparison with zero is cheaper than any other comparison. Let's try it out.

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
    
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 0
    && xs[i+1] == 0
    && xs[i+2] == 0
    && xs[i+3] == 0)
      four_0_in_a_row++;
    

They are almost the same.

Links

A follow-up: Using logical operators for logical operations is good

Challenge your performance intuition with C++ magic squares

Challenge your performance intuition with nanosecond sorting

Challenge your performance intuition with C++ sine

Further reading: Learning the value of good benchmarking technique with C++ magic squares.

Github with all the experiments.