You don't have to learn assembly to read disassembly

Reading disassembly is more like reading tracks than reading a book. To read a book you have to know the language. Reading tracks gets better with skill and experience, but mostly it requires attentiveness and logical thinking.

Most of the time we read disassembly only to answer one simple question: does the compiler do what we expect it to do? In 3 simple exercises, I’ll show you that you can often answer this question even if you have no previous knowledge of assembly. I’ll use C++ as the source language, but what I’m trying to show is more or less universal, so it doesn’t matter if you write in C or Java, C# or Rust. If you compile to any kind of machine code, you can benefit from understanding your compiler.

1. Compile time computation

Any decent compiler tries to make your binary code not only correct but also efficient. This means doing as little work at run time as possible. Sometimes it can even carry out the whole computation at compile time, so your machine code only has to contain the answer.

This source code defines the number of bits in a byte and returns the size of int in bits.

static int BITS_IN_BYTE = 8;

int main(){
    return sizeof(int)*BITS_IN_BYTE;
}
    

The compiler knows the size of an int. Let's say it is 4 bytes for the target platform. We also set the number of bits in a byte explicitly. BITS_IN_BYTE is not const, but it is static and never written to, so the compiler can prove that its value is always 8. Since all we want is a simple multiplication, and both numbers are known at compile time, the compiler can simply compute the resulting number itself instead of generating code that computes the same number every time it is executed.

This is not guaranteed by the standard, though. A compiler may or may not perform this optimization.

Now look at two possible disassemblies for this source code and decide which variant does the compile time computation and which doesn’t.

  push    rbp
  mov     rbp, rsp
  mov     dword ptr [rbp - 4], 0
  movsxd  rax, dword ptr [BITS_IN_BYTE]
  shl     rax, 2
  mov     ecx, eax
  mov     eax, ecx
  pop     rbp
  ret

  mov     eax, 32
  ret
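
If you would rather require the compile time computation than hope for it, C++11 and later let you say so in the source. The following is a minimal sketch of that idea, not a replacement for the exercise above: the names INT_BITS and int_size_in_bits are hypothetical, and BITS_IN_BYTE is changed from static to constexpr for the purpose of the illustration.

```cpp
// A minimal sketch, assuming a C++11 (or later) compiler.
// INT_BITS and int_size_in_bits are hypothetical names used for illustration.
constexpr int BITS_IN_BYTE = 8;

// A constexpr variable must be initialized by a constant expression,
// so this multiplication has to be evaluated by the compiler.
constexpr int INT_BITS = static_cast<int>(sizeof(int)) * BITS_IN_BYTE;

// static_assert is checked during compilation: if this line compiles,
// the value of INT_BITS was already known at that point.
static_assert(INT_BITS % BITS_IN_BYTE == 0,
              "INT_BITS must be a whole number of bytes");

// Returns the size of int in bits without doing any arithmetic at run time.
int int_size_in_bits() {
    return INT_BITS;
}
```

With a definition like this, the disassembly of int_size_in_bits should contain just the constant, regardless of the optimization level. It is still worth checking with your own compiler.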

2. Function inlining

Calling a function involves some overhead: preparing the input data in a particular order, then starting execution from another piece of memory, then preparing the output data, and then returning.

Not that it is all that slow, but if you only call a function once, you don’t have to actually call it. It makes sense to copy, or “inline”, the function's body into the place it is called from and skip all the formalities. Compilers can often do this for you, so you don't even have to bother.

If the compiler makes such an optimization, this code:

inline double twice(double x){
    return x + x;
}

double triple(double x){
    return twice(x) + x;
}
    

Effectively becomes this:

// not real source code, just explaining the idea
double triple(double x){
    return x + x + x; // twice gets inlined here
}
    

But the standard does not promise that every function marked as inline will actually get inlined. It's more a suggestion than a directive.

Now look at these two disassembly variants below and choose the one where the function gets inlined after all.

  movapd  xmm1, xmm0
  addsd   xmm1, xmm1
  addsd   xmm1, xmm0
  movapd  xmm0, xmm1
  ret

  push    rax
  movsd   qword ptr [rsp], xmm0
  call    twice(double)
  addsd   xmm0, qword ptr [rsp]
  pop     rax
  ret
.................................
  addsd   xmm0, xmm0
  ret
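
To experiment with both outcomes yourself, you can push the compiler in either direction with vendor-specific attributes. This is a sketch under assumptions: the macro and function names are hypothetical, the attributes are GCC and Clang extensions (with the MSVC spellings in the other branch), and none of this is standard C++.

```cpp
// A minimal sketch; NEVER_INLINE, ALWAYS_INLINE and the function names
// below are hypothetical. The attributes are compiler extensions.
#if defined(_MSC_VER)
#define NEVER_INLINE __declspec(noinline)
#define ALWAYS_INLINE __forceinline
#else
#define NEVER_INLINE __attribute__((noinline))
#define ALWAYS_INLINE inline __attribute__((always_inline))
#endif

// Asks the compiler to keep this as a real, separate function.
NEVER_INLINE double twice_called(double x) {
    return x + x;
}

// Asks the compiler to paste this body into every caller.
ALWAYS_INLINE double twice_inlined(double x) {
    return x + x;
}

// Expected to contain a call instruction in its disassembly.
double triple_with_call(double x) {
    return twice_called(x) + x;
}

// Expected to contain only additions, with no call.
double triple_without_call(double x) {
    return twice_inlined(x) + x;
}
```

Both versions return the same result. Only the generated machine code should differ, and comparing the two disassemblies is a quick way to learn what a call looks like on your platform.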

3. Loop unrolling

Just like calling functions, looping involves some overhead. You have to increment the counter, then compare it against some number, then jump back to the beginning of the loop.

Compilers know that in some contexts it is more efficient to unroll the loop. This means that a piece of code is actually repeated several times in a row instead of messing with the counter comparison and jumping here and there.

Let's say we have this piece of code:

float power_of_4(float x){
    float result = 1;
    for(int i = 0; i < 4; ++i)
        result *= x;
    return result;
}
    

The compiler has every reason to unroll such a simple loop, but it may just as well choose not to.

Which disassembly has the unrolled loop?

  movaps  xmm1, xmm0
  movss   xmm0, dword ptr [rip + .LCPI0_0]
  mov     eax, 4
.LBB0_1:
  mulss   xmm0, xmm1
  add     eax, -1
  jne     .LBB0_1
  ret

  movaps  xmm1, xmm0
  mulss   xmm1, xmm1
  mulss   xmm1, xmm0
  mulss   xmm1, xmm0
  movaps  xmm0, xmm1
  ret
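
Expressed in source form, unrolling amounts to writing the loop body out once per iteration. The sketch below puts the loop version from above next to a hand-unrolled one. The name power_of_4_unrolled is hypothetical, and the code illustrates the transformation rather than what any particular compiler emits.

```cpp
// The loop version from the text: a counter, a comparison and a jump
// accompany every multiplication.
float power_of_4(float x) {
    float result = 1;
    for (int i = 0; i < 4; ++i)
        result *= x;
    return result;
}

// Hand-unrolled equivalent (hypothetical name): the same four
// multiplications in the same order, with no counter and no jump.
float power_of_4_unrolled(float x) {
    float result = 1;
    result *= x;
    result *= x;
    result *= x;
    result *= x;
    return result;
}
```

Because the multiplications happen in the same order, both functions should produce identical results. You rarely need to unroll by hand, but knowing what the unrolled form looks like makes it easier to recognize in a disassembly.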

Conclusion

You can argue that these examples were deliberately simplified. That is only half true. I did refine them to be more demonstrative, but conceptually they are all taken from my own practice.

Using static dispatch instead of dynamic dispatch made my image processing pipeline up to 5 times faster. Repairing broken inlining helped win back 50% of the performance of an edge-to-edge distance function. And changing a counter type to enable loop unrolling won me about a 10% performance gain on matrix transformations, which is not much, but since all it took was changing short int to size_t in one place, I think of it as a good return on investment.

Apparently, old versions of MSVC fail to unroll loops with counters of non-native type. Who would have thought? Well, the truth is, even if you know this particular quirk, you can't possibly know every other aspect of every compiler's behavior out there, so looking at disassembly once in a while might be good for you.

And you don't even have to spend years learning every assembly dialect. Reading disassembly is often easier than it looks. Try it.

P. S.

I would like to thank the people of Reddit for their constructive criticism, and especially IJzerbaard for his splendid mini-intro to disassembly. I liked it so much that I had to incorporate some of it into my expandable texts.

It's a pleasure to see that the “no you can't” comments are well balanced with “that's how I started”. It's satisfying that it is the boldness of the claim that is being criticized the most, and not the claim itself. I'm sorry, but “you can read disassembly without learning assembly, but only once in a while, and it's not super effective, and of course it does help to learn assembly after all: the more you know, the better you are at reading disassembly” just doesn't fit in a title. Although, I have to agree, this is much closer to the truth.