Not long ago I read a note in a blog comment. It said that experts above a certain level should write their for-loops like this:

for (int i = 0; i < ..; ++i)

The important part is the ++i. Why would this be better? The vast majority of developers use i++, and I have no idea why. I use i++ too, even though I used ++i for years back when I was a C++ developer. I had a reason for it, which was the following:

i++ is not the optimal choice in a situation like a for-loop, because the value of the expression is the original value of i. To produce that value, the compiler must generate code that saves the original value of i before it performs the increment or runs any code behind the ++ operator.

This is mainly a concern for complex types, but avoiding unnecessary copies is a good habit for primitive types too.
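The argument above comes from C++, where it is easy to demonstrate. Below is a minimal C++ sketch (not C#; the `Counter` type and the `copies_for_loop` helper are hypothetical, made up for illustration) of a class type with hand-written prefix and postfix operator++. The postfix version has to copy the old state in order to return it, even when the loop throws the result away.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter type: it counts how many times it has been copied,
// so the cost of postfix ++ becomes visible.
struct Counter {
    static std::size_t copies;  // total copy-constructions so far
    int value = 0;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Prefix ++: increment in place and return a reference, no copy needed.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix ++: the result must be the ORIGINAL value, so the old state
    // has to be copied before incrementing.
    Counter operator++(int) {
        Counter old(*this);  // the copy the blog comment warns about
        ++value;
        return old;          // this return may be elided, the copy above cannot
    }
};

std::size_t Counter::copies = 0;

// Runs a loop of n iterations with either c++ or ++c as the loop-expression
// and returns how many copies were made along the way.
std::size_t copies_for_loop(int n, bool use_postfix) {
    assert(n >= 0);
    Counter::copies = 0;
    if (use_postfix) {
        for (Counter c; c.value < n; c++) {}  // result of c++ is discarded
    } else {
        for (Counter c; c.value < n; ++c) {}
    }
    return Counter::copies;
}
```

With this type, copies_for_loop(100, false) reports no copies, while copies_for_loop(100, true) reports at least 100 (exactly 100 if the compiler elides the copy in the return statement, which it may but is not required to do). Because this copy constructor has an observable side effect, the compiler cannot drop the copy; copying a plain int has no such side effect, which is what allows a compiler to remove it.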

The flaw in this explanation is that compilers recognize the unnecessary copy for primitive types and generate code that does not retain the original value of i. This is what we are going to check below.

Let’s try this code:

using System;

class Program
{
  static void Main(string[] args)
  {
    for (int i = 0; i < 100; i++)
    {
      Console.WriteLine(i);
    }
  }
}

And now the IL code generated by the C# compiler:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       20 (0x14)
  .maxstack  2
  .locals init ([0] int32 i)
 
  // i = 0 (for-loop initializer section)
  IL_0000:  ldc.i4.0  // push the 32-bit value 0 onto the evaluation stack
  IL_0001:  stloc.0   // pop the stack-top into slot 0 of the Local
                      // Variable Array (LVA), i.e. variable i

  // next step is to execute the loop-condition, the 
  // generated code is at the end of this method, jump there
  IL_0002:  br.s       IL_000e   // (continue at IL_000e)
 
  // Console::WriteLine(i)
  IL_0004:  ldloc.0   // push LVA slot 0 (variable i) onto the evaluation stack
  IL_0005:  call       void [mscorlib]System.Console::WriteLine(int32)
 
  // i++
  IL_000a:  ldloc.0   // push LVA slot 0 (variable i) onto the evaluation stack
  IL_000b:  ldc.i4.1  // push the 32-bit value 1 onto the evaluation stack
  IL_000c:  add       // pop the two top values and push their sum
  IL_000d:  stloc.0   // pop the sum and store it in variable i (LVA slot 0)
  // i < 100
  IL_000e:  ldloc.0            // push LVA slot 0 (variable i) onto the evaluation stack
  IL_000f:  ldc.i4.s   100     // push the 32-bit value 100 onto the evaluation stack
  IL_0011:  blt.s      IL_0004 // pop both values and jump to IL_0004 if the
                               // first one pushed (i) is less than the
                               // second one (100)
  IL_0013:  ret
}

The important lines are the ones under the // i++ comment (IL_000a to IL_000d). In theory, we should find code here that saves the original value of i before the increment: something that copies LVA slot 0 into temporary storage, such as an LVA slot 1, and then leaves it there unused. But there is nothing like that; the .locals directive declares only i, so there is not even a slot for such a copy. This code is exactly the same as the code that would be generated for ++i.

So the C# compiler has already optimized the copy away. But that is not the end of the story: the jitter can do more. Look at the machine code that actually executes on the processor:

00000000 push        ebp              // save the state of ebp (caller needs this)
00000001 mov         ebp,esp          // our base is the top of the stack
 
// Main() has one parameter of reference type, and the LVA has a single
// element (local variable i). That is 8 bytes in total. One may think that
// the parameter of Main() is already on the stack, since that is where
// parameters are usually passed. But the CLR passes the first two parameters
// in registers (ecx and edx). In our case, ecx holds the "args" parameter of
// Main(). The method will use ecx later, so it needs a place to save "args".
// Anyway, the point is that we need 8 bytes on the stack: "args" and "i".
00000003 sub         esp,8            
 
// Save "args" parameter from ecx to the stack:
00000006 mov         dword ptr [ebp-4],ecx 

// I was not able to figure out what this call is. It only appears
// when a debugger is attached. Judging from the SSCLI implementation,
// the jitter emits a "Just My Code" callback here, but I do not know
// exactly how it works. Anyway, it is not important for us now.
00000009 cmp         dword ptr ds:[00252E28h],0 
00000010 je          00000017 
00000012 call        65C1B701 

// The "Main" method has the flag switched on in its metadata (the "init"
// in ".locals init" in the IL) that tells the CLR to zero out every local
// variable. The only local variable is "i", which is at ebp-8.
00000017 xor         edx,edx                // sets edx to zero
00000019 mov         dword ptr [ebp-8],edx  // store zero in "i"
 
// The code of the for-loop starts here. Its structure is the
// same as in the case of the IL code.
 
// No, the jitter isn't that lame. This is the initializer section of the
// for-loop, i = 0, and the generated code is exactly the same as the
// previous two lines. With no debugger attached, these two lines are not
// repeated; here they are probably emitted to support source-level debugging.
0000001c xor         edx,edx 
0000001e mov         dword ptr [ebp-8],edx 
 
00000021 nop              
 
// As in the IL code, the loop condition is placed at the end of the
// loop, so jump there first.
00000022 jmp         00000030 

 
// This is the code of Console.WriteLine(i). As usual, the CLR passes
// the first parameter in ecx. WriteLine is a static method, so there is no
// "this" parameter to pass, and ecx holds the 32-bit value of "i".
// Note the strange call through an indirect reference; the explanation
// follows the listing.
00000024 mov         ecx,dword ptr [ebp-8] 
00000027 call        dword ptr ds:[0084E080h] 

// The implementation of i++. It does not retain the original value;
// it increments "i" directly in its own memory slot. The only more
// optimized code would keep "i" in a register, and with no debugger
// attached, the jitter generates the code that way.
0000002d inc         dword ptr [ebp-8] 
 
// If i < 100 (64h), jump back to 00000024
00000030 cmp         dword ptr [ebp-8],64h 
00000034 jl          00000024 
 
// Restore stack, return to caller
00000036 nop              
00000037 mov         esp,ebp 
00000039 pop         ebp  
0000003a ret              

The way the code calls Console.WriteLine() is worth a closer look. It looks strange because this is the very first call site of Console.WriteLine(). When the jitter compiles a method, it checks whether the methods it calls have already been compiled. If a callee is already compiled, its address is known, and the call instruction can use that address directly. If the callee hasn't been compiled yet, it obviously has no address, yet the jitter must still generate something that calls the not-yet-existing code. It will not compile the callee just to obtain an address; following that logic, the jitter would end up compiling the whole program at once.

Instead, it allocates a slot in memory where the address of the callee can be stored later. That alone is not enough, since the call must do something useful right away. So the slot initially refers to a tiny piece of code that triggers the jitter to compile the callee; once the callee is compiled, its address is written back into the slot, replacing the reference to the trigger code. From then on, the call finds the compiled code, with a small performance hit due to the indirection. 0084E080 is the address of this slot: first it holds the trigger code for the jitter, and later the address of the compiled Console.WriteLine().
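The slot-and-trigger mechanism described above can be sketched in a few lines of C++. This only illustrates the idea under simplifying assumptions and is not how the CLR actually implements it: `call_slot` plays the role of the memory cell at 0084E080, `jit_stub` stands in for the jitter trigger code, and `write_line_impl` for the compiled Console.WriteLine(). All of these names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

using WriteLineFn = void (*)(int);

void jit_stub(int value);  // the trigger code, defined below

// The slot every call site goes through (compare [0084E080h]).
// Initially it refers to the stub, because the target has no code yet.
WriteLineFn call_slot = &jit_stub;

int stub_invocations = 0;  // how many times the "jitter" was triggered

// Stands in for the compiled Console.WriteLine(int).
void write_line_impl(int value) {
    std::printf("%d\n", value);
}

// Trigger code: "compile" the target (here we just pick the existing
// function), write its address back into the slot, then forward the call.
void jit_stub(int value) {
    assert(call_slot == &jit_stub);  // only runs while the slot still refers to it
    ++stub_invocations;
    call_slot = &write_line_impl;    // back-patch the slot
    call_slot(value);
}

// A call site as the jitter emitted it: always an indirect call through
// the slot, whatever the slot currently holds.
void call_site(int value) {
    call_slot(value);
}

// Hits the call site n times and reports how often the stub ran.
int run_calls(int n) {
    for (int i = 0; i < n; ++i) {
        call_site(i);
    }
    return stub_invocations;
}
```

After run_calls(3), the stub has run only once and call_slot holds the address of write_line_impl, so later calls go to the "compiled" code, still through one level of indirection.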

What about the concept of ++i?

The comment mentioned above also argued that ++i is not only about performance: it also expresses the developer’s intention better. In a for-loop, ++i or i++ is the loop-expression. The promise is that this expression is evaluated after every execution of the loop body, before the conditional-expression is evaluated again. The value of the loop-expression is not important. Nothing uses it, unlike the value of the conditional-expression, which determines whether the loop continues.
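To make the evaluation order concrete, here is a small tracing sketch. It is in C++ rather than C#, but the for-loop has the same shape and evaluation order in both languages; `trace_for_loop` is a hypothetical helper. Each part of the loop appends a letter to a trace: I for the initializer, C for the condition, B for the body and X for the loop-expression.

```cpp
#include <cassert>
#include <string>

// Records the order in which the four parts of a for-loop run.
std::string trace_for_loop(int n) {
    assert(n >= 0);
    std::string trace;
    int i;
    for (trace += "I", i = 0;        // initializer: runs once
         (trace += "C", i < n);      // condition: checked before every iteration
         trace += "X", ++i) {        // loop-expression: runs after every body
        trace += "B";                // body
    }
    return trace;
}
```

For n = 2 the trace is ICBXCBXC: the loop-expression runs after each body and right before the condition is checked again, and its value is never looked at.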

If the value of the loop-expression is not important, then what is? Its side effect! Therefore ++i and i++ are conceptually equivalent as a loop-expression, because both have the same side effect: increasing the value of “i”.
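The following C++ sketch checks this claim from both sides (the helper names `visited`, `value_of_postfix` and `value_of_prefix` are hypothetical). With an int counter, the loops using i++ and ++i visit exactly the same values; the two operators differ only in the value they yield, and that matters only where the value is actually used.

```cpp
#include <cassert>
#include <vector>

// Collects the values of i seen by the loop body, with either i++
// (use_postfix == true) or ++i as the loop-expression.
std::vector<int> visited(int n, bool use_postfix) {
    assert(n >= 0);
    std::vector<int> seen;
    if (use_postfix) {
        for (int i = 0; i < n; i++) {
            seen.push_back(i);
        }
    } else {
        for (int i = 0; i < n; ++i) {
            seen.push_back(i);
        }
    }
    return seen;
}

// Outside a loop-expression the VALUE does matter: postfix yields the old
// value, prefix yields the new one. The side effect on i is the same.
int value_of_postfix(int i) { return i++; }  // returns the original i
int value_of_prefix(int i) { return ++i; }   // returns i + 1
```

visited(5, true) and visited(5, false) both return 0, 1, 2, 3, 4, whereas value_of_postfix(7) yields 7 and value_of_prefix(7) yields 8.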

The conclusion is that, in a loop like this one, whether we use ++i or i++ from now on makes no difference.

Original article in Hungarian: A profik “++i”-t használnak? (“Do the pros use ++i?”)