This is Words and Buttons Online — a collection of interactive #tutorials, #demos, and #quizzes about #mathematics, #algorithms and #programming.

Using logical operators for logical operations is good

"Challenge your performance intuition with C++ operators" was about how context matters more than tricks. Unfortunately, I didn't make that point clearly enough, for which I'm truly sorry.

It might look like I advocate using math operators instead of logical ones for the performance gain. Well, I do, but only if the context is well known and is not going to change. This is the whole point: the context matters.

I find trickery appropriate when you know your hardware and your compiler, and you are ready to redo your code from scratch when something changes. This might be the case if you run some computationally heavy algorithm in the cloud. Your environment is predetermined, and you pay per minute, so it makes sense to squeeze every penny out of what you've got.

But in the general case, you should use logical operations to do logic. Not because of short-circuiting, which is also a context-dependent trick, but because in the general case compilers do the trickery better than we humans do.

Let's redo a few rounds. The benchmark is the same, the questions are the same. The compiler is the same. The only thing that changes is the platform. This is now CHIP with ARMv7.

This is the original benchmark. I only reduced the number of operations tenfold because the machine itself is much slower.

#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main() {
  using TheType = int;
  constexpr auto TheSize = 16 * 1000000;
  std::mt19937 rng(0);
  std::uniform_int_distribution<TheType> distribution(0, 1);
  std::vector<TheType> xs(TheSize);
  for (auto& digit : xs) {
    digit = distribution(rng);
  }

  volatile auto four_1_in_a_row = 0u;
  auto start = std::chrono::system_clock::now();
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1 && xs[i+1] == 1 && xs[i+2] == 1 && xs[i+3] == 1)
      ++four_1_in_a_row;
  auto end = std::chrono::system_clock::now();

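  // Convert explicitly to seconds; the clock's tick period is implementation-defined.
  std::cout << "time: " << std::chrono::duration<double>(end - start).count()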
    << "  1111s: " << four_1_in_a_row << "\n";
}

Just like last time, using your intuition and best judgment, please estimate the relative performance of the code snippets below.

Round 1. && vs &

The same question. Is && faster than &?

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
     & xs[i+1] == 1
     & xs[i+2] == 1
     & xs[i+3] == 1)
      ++four_1_in_a_row;

They are almost the same.
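
As a reminder of what actually differs between the two operators semantically, here is a minimal sketch. The helper `is_one` and the two `calls_with_*` functions are hypothetical names made up for this illustration; they are not part of the benchmark. With `&&` the right-hand side is skipped once the left-hand side is false, while with `&` both sides are always evaluated.

```cpp
// Hypothetical helper for illustration: compares with 1 and counts its calls.
inline bool is_one(int x, int& calls) {
  ++calls;
  return x == 1;
}

// Returns how many comparisons were evaluated when using logical AND.
inline int calls_with_logical_and(int a, int b) {
  int calls = 0;
  const bool result = is_one(a, calls) && is_one(b, calls);
  (void)result;  // only the call count matters here
  return calls;
}

// Returns how many comparisons were evaluated when using bitwise AND.
inline int calls_with_bitwise_and(int a, int b) {
  int calls = 0;
  const bool result = is_one(a, calls) & is_one(b, calls);
  (void)result;  // only the call count matters here
  return calls;
}
```

For the inputs 0 and 1, the logical version evaluates one comparison and the bitwise version evaluates two. In the benchmark the comparisons have no side effects, so the compiler is free to pick whichever form it considers cheaper.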

Round 2. ==, && vs *, +, -

On Intel, substituting logic with arithmetic gave a noticeable gain. Will the trick work on ARMv7?

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
...
inline int sq(int x) {
  return x*x;
}
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;

They are almost the same.

Round 3. * vs abs

On Intel, switching from multiplication to absolute value resulted in a noticeable loss. How will ARM do?

...
inline int sq(int x) {
  return x*x;
}
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(sq(xs[i] - 1)
     + sq(xs[i+1] - 1)
     + sq(xs[i+2] - 1)
     + sq(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;

  for (auto i = 0u; i < TheSize - 3; ++i)
    if(std::abs(xs[i] - 1)
     + std::abs(xs[i+1] - 1)
     + std::abs(xs[i+2] - 1)
     + std::abs(xs[i+3] - 1) == 0)
      ++four_1_in_a_row;

They are almost the same.
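
The arithmetic variants work because both a square and an absolute value are never negative, so the sum is zero only when every term is zero, that is, when every element equals 1. The sketch below checks that all four forms agree on every combination of four values drawn from 0 and 1, which is the only kind of input the benchmark produces. The `all_ones_*` names and `predicates_agree_on_bits` are hypothetical, and the check says nothing about inputs large enough to overflow `int`.

```cpp
#include <cstdlib>  // std::abs for int

inline int sq(int x) {
  return x * x;
}

// The four ways of asking "are all four values equal to 1?" used in the rounds.
inline bool all_ones_logical(int a, int b, int c, int d) {
  return a == 1 && b == 1 && c == 1 && d == 1;
}

inline bool all_ones_bitwise(int a, int b, int c, int d) {
  return (a == 1) & (b == 1) & (c == 1) & (d == 1);
}

inline bool all_ones_squares(int a, int b, int c, int d) {
  return sq(a - 1) + sq(b - 1) + sq(c - 1) + sq(d - 1) == 0;
}

inline bool all_ones_abs(int a, int b, int c, int d) {
  return std::abs(a - 1) + std::abs(b - 1) + std::abs(c - 1) + std::abs(d - 1) == 0;
}

// Checks that all variants agree on all 16 combinations of 0 and 1.
inline bool predicates_agree_on_bits() {
  for (int bits = 0; bits < 16; ++bits) {
    const int a = bits & 1;
    const int b = (bits >> 1) & 1;
    const int c = (bits >> 2) & 1;
    const int d = (bits >> 3) & 1;
    const bool expected = all_ones_logical(a, b, c, d);
    if (all_ones_bitwise(a, b, c, d) != expected) return false;
    if (all_ones_squares(a, b, c, d) != expected) return false;
    if (all_ones_abs(a, b, c, d) != expected) return false;
  }
  return true;
}
```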

Round 4. int vs float

On Intel, the double and int versions work almost the same. Since ARMv7 is a 32-bit architecture, it would be fair to compare the int version with the float one. Let's do that.

...
  using TheType = int;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;
...
  using TheType = float;
...
  for (auto i = 0u; i < TheSize - 3; ++i)
    if(xs[i] == 1
    && xs[i+1] == 1
    && xs[i+2] == 1
    && xs[i+3] == 1)
      ++four_1_in_a_row;

They are almost the same.
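
One portability caveat if you want to repeat this round: the C++ standard defines `std::uniform_int_distribution` only for integer types, so instantiating it with `float` is not guaranteed to compile or behave. A minimal sketch of a workaround, assuming you are free to change how the input is filled, is to draw an `int` and convert it. The helper name `make_random_bits` is hypothetical and not part of the original benchmark.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector of T with random zeros and ones.
// The distribution is always over int; each value is then converted to T.
template <typename T>
std::vector<T> make_random_bits(std::size_t size, unsigned seed = 0) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> distribution(0, 1);
  std::vector<T> xs(size);
  for (auto& x : xs) {
    x = static_cast<T>(distribution(rng));
  }
  return xs;
}
```

With this, `make_random_bits<int>(n)` and `make_random_bits<float>(n)` produce the same sequence of zeros and ones for the same seed, so both versions of the loop count the same number of runs.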

Conclusion

The context matters. Unless you are willing to optimize for a specific platform, paying dearly in maintenance and portability, writing simple code is the best strategy.