Do experts use ++i?


Not long ago I read a note in a blog comment. It said that developers above a certain level of expertise write their for-loops like this:

for (int i = 0; i < ..; ++i)

The important part is ++i. Why would that be better? The vast majority of developers write i++, and I have no idea why. I use i++ too, even though I used ++i for years when I was a C++ developer. Back then I had a reason, which went like this:

i++ is not optimal in places like a for-loop, because the value of the expression is the original value of i. To deliver that value, the compiler must generate code that saves the original value of i before it performs the increment (or runs whatever code is behind the ++ operator).

This mainly matters for complex types, but avoiding unnecessary copies is a good habit for primitive types, too.
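The argument can be made concrete with a small C# sketch. The helpers Increment.Post and Increment.Pre are hypothetical names that only model what the two operators promise; they are not the code the compiler emits. The postfix model needs a temporary holding the original value, the prefix model does not. The sketch assumes C# 9 or later (top-level statements).

```csharp
using System;

// Conceptual model only (hypothetical helpers, not compiler output).
int a = 5;
int oldValue = Increment.Post(ref a);   // behaves like a++
Console.WriteLine($"a++ yielded {oldValue}, a is now {a}");   // prints "a++ yielded 5, a is now 6"

int b = 5;
int newValue = Increment.Pre(ref b);    // behaves like ++b
Console.WriteLine($"++b yielded {newValue}, b is now {b}");   // prints "++b yielded 6, b is now 6"

static class Increment
{
    // Postfix: the result is the ORIGINAL value, so a temporary copy is required.
    public static int Post(ref int x)
    {
        int original = x;   // the extra copy the "prefer ++i" argument is about
        x = x + 1;
        return original;
    }

    // Prefix: the result is the NEW value, so no temporary is needed.
    public static int Pre(ref int x)
    {
        x = x + 1;
        return x;
    }
}
```

When the result is thrown away, as in a for-loop's increment section, the `original` temporary in the postfix model is dead code, which is exactly what a compiler can eliminate.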

The flaw in this reasoning is that compilers recognize when the copy of a primitive type is unnecessary, and generate code that does not retain the original value of i. That is what we are going to check below.

Let’s try this code:

using System;

class Program
{
  static void Main(string[] args)
  {
    for (int i = 0; i < 100; i++)
    {
      Console.WriteLine(i);
    }
  }
}

And now the IL code generated by the C# compiler:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       20 (0x14)
  .maxstack  2
  .locals init ([0] int32 i)

  // i = 0 (for-loop initializer section)
  IL_0000:  ldc.i4.0  // push the 32-bit value 0 onto the evaluation stack
  IL_0001:  stloc.0   // pop it into slot 0 of the Local Variable Array (LVA),
                      // i.e. variable i

  // the next step is to evaluate the loop condition; its code is
  // generated at the end of this method, so jump there
  IL_0002:  br.s       IL_000e   // continue at IL_000e

  // Console::WriteLine(i)
  IL_0004:  ldloc.0   // push LVA slot 0 (variable i) onto the evaluation stack
  IL_0005:  call       void [mscorlib]System.Console::WriteLine(int32)

  // i++
  IL_000a:  ldloc.0   // push LVA slot 0 (variable i) onto the evaluation stack
  IL_000b:  ldc.i4.1  // push the 32-bit value 1 onto the evaluation stack
  IL_000c:  add       // pop the two top values, push their sum
  IL_000d:  stloc.0   // pop the sum and store it in variable i (LVA slot 0)

  // i < 100
  IL_000e:  ldloc.0            // push LVA slot 0 (variable i) onto the evaluation stack
  IL_000f:  ldc.i4.s   100     // push the 32-bit value 100 onto the evaluation stack
  IL_0011:  blt.s      IL_0004 // pop both values; jump back to IL_0004 if i
                               // (the value below the top) is less than 100
                               // (the value on top)
  IL_0013:  ret
}

The important lines are the i++ section (IL_000a to IL_000d). In theory, we should find code here that saves the original value of i before the increment: something that copies LVA slot 0 into temporary storage, such as LVA slot 1, and then leaves it unused. But there is nothing like that. This code is exactly what would be generated for ++i.

So the C# compiler has already optimized the code above. But that is not the end of the story: the jitter can do more. Look at the x86 code that actually executes on the processor:

00000000 push        ebp              // save the state of ebp (caller needs this)
00000001 mov         ebp,esp          // our base is the top of the stack
 
// Main() has one parameter of reference type, and the LVA has a single
// element (local variable i). On 32-bit x86 that is 8 bytes so far. One might
// think the parameter of Main() is already on the stack, since that is where
// parameters are passed. But the CLR passes the first few parameters in
// registers; here ecx holds the "args" parameter of Main(). The method will use
// ecx later, so "args" needs a place to be saved to.
// Either way, we need 8 bytes on the stack: "args" and "i".
00000003 sub         esp,8            
 
// Save "args" parameter from ecx to the stack:
00000006 mov         dword ptr [ebp-4],ecx 

// I was not able to figure out what this call is. It only appears
// when a debugger is attached. Judging from the SSCLI implementation,
// the jitter emits a "Just My Code" callback here, but I do not know
// exactly how it works. Anyway, it is not important for us now.
00000009 cmp         dword ptr ds:[00252E28h],0 
00000010 je          00000017 
00000012 call        65C1B701 

// The "init" in ".locals init" sets a flag in the header of Main() that
// tells the CLR to zero out every local variable. The only local variable
// is "i", which is at ebp-8.
00000017 xor         edx,edx                // set edx to zero
00000019 mov         dword ptr [ebp-8],edx  // store zero in "i"
 
// The code of the for-loop starts here. Its structure is the
// same as in the case of the IL code.
 
// No, the jitter is not that lame. This is the initializer section of the
// for-loop (i = 0), and the generated code happens to be identical to the
// previous two lines. When no debugger is attached, these two lines are not
// repeated; here they are probably emitted to support source-level debugging.
0000001c xor         edx,edx 
0000001e mov         dword ptr [ebp-8],edx 
 
00000021 nop              
 
// The loop condition is generated to the end of the method as in case
// of the IL code.
00000022 jmp         00000030 

 
// This is the code of Console.WriteLine(i). As usual, the CLR passes the
// first parameter in ecx. This is a static method, so no "this" parameter
// is passed, and ecx holds the 32-bit value of "i".
// Note the strange indirect call; it is explained after the listing.
00000024 mov         ecx,dword ptr [ebp-8] 
00000027 call        dword ptr ds:[0084E080h] 

// The implementation of i++. It does not retain the original value;
// instead, it increments "i" directly in its own memory slot.
// The only further optimization would be to keep "i" in a register,
// which is what the jitter does when no debugger is attached.
0000002d inc         dword ptr [ebp-8] 
 
// If i < 100 (64h = 100), jump back to 00000024
00000030 cmp         dword ptr [ebp-8],64h 
00000034 jl          00000024 
 
// Restore stack, return to caller
00000036 nop              
00000037 mov         esp,ebp 
00000039 pop         ebp  
0000003a ret              

The strange way the code calls Console.WriteLine() is worth a closer look. It happens because this is the very first call site of Console.WriteLine(). When the jitter compiles a method, it checks whether the methods it calls have already been compiled. If a callee is already compiled, its address is known, and the "call" instruction can use it directly. If the callee has not been compiled yet, it obviously has no address, but the jitter still has to generate something that calls the not-yet-existing code. It will not compile the callee just to obtain an address: following that logic, the jitter would end up compiling the whole program at once. Instead, it allocates a slot in memory where the address of the callee can be stored later. That alone is not enough, because the call has to do something useful right away. That useful something is a tiny piece of code that triggers the jitter to compile the referenced method; once the method is compiled, its address is written back into the slot, which until then pointed at the trigger code. From then on, the call reaches the compiled code, with a small performance cost due to the indirection. 0084E080 is the address of that slot: it first holds the address of the jitter trigger code, and later the address of the compiled Console.WriteLine().
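The slot mechanism can be modeled in a few lines of C#. This is only a toy, assuming a delegate field can stand in for the memory slot and a factory delegate can stand in for the jitter; MethodSlot and its members are hypothetical names, not CLR APIs, and the real mechanism works on machine-code addresses rather than delegates.

```csharp
using System;

// Toy model only: a delegate field plays the memory slot, a factory delegate
// plays the jitter. This is NOT how the CLR is actually implemented.
var slot = new MethodSlot(() => (int v) => Console.WriteLine(v));

slot.Invoke(1);   // goes through the stub, which "compiles" and patches the slot
slot.Invoke(2);   // calls the "compiled" target through the patched slot
Console.WriteLine($"compilations: {slot.CompileCount}");   // prints "compilations: 1"

sealed class MethodSlot
{
    private readonly Func<Action<int>> _compile;   // stands in for the jitter
    private Action<int> _target;                   // stands in for the memory slot
    public int CompileCount { get; private set; }

    public MethodSlot(Func<Action<int>> compile)
    {
        _compile = compile;
        _target = Stub;   // initially the slot points at the trigger stub
    }

    // Every call goes through the slot, like "call dword ptr ds:[slot]".
    public void Invoke(int arg) => _target(arg);

    // The trigger stub: compile on first use, patch the slot, finish the call.
    private void Stub(int arg)
    {
        Action<int> compiled = _compile();
        CompileCount++;
        _target = compiled;   // write the real "address" back into the slot
        compiled(arg);        // complete the call that triggered compilation
    }
}
```

The design choice mirrors the text: the caller never changes, only the contents of the slot do, so every call pays one indirection but compilation happens exactly once, on first use.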

What about the concept of ++i?

The comment mentioned above also argued that ++i is not only about performance: it also expresses the developer’s intention better. In a for-loop, ++i or i++ is the loop-expression. The promise is that this expression is evaluated after every pass through the loop body, before the conditional-expression is evaluated again. The value of the loop-expression does not matter. Nothing uses it, unlike the value of the conditional-expression, which determines whether the loop continues.

If the value of the loop-expression does not matter, what does? Its side effect! Therefore ++i and i++ are conceptually equivalent as a loop-expression, because both have the same side effect: incrementing “i”.
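One way to see that only the side effect matters is to rewrite the for-loop as a while-loop (equivalent here because the body contains no continue statement). The loop-expression becomes a standalone statement whose value is discarded. Loops.WithPostfix and Loops.WithPrefix are hypothetical helpers that record the values of i each variant visits; the sketch assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// for (int i = 0; i < n; i++) { body }  rewritten as a while-loop.
Console.WriteLine(string.Join(",", Loops.WithPostfix(5)));   // prints "0,1,2,3,4"
Console.WriteLine(string.Join(",", Loops.WithPrefix(5)));    // prints "0,1,2,3,4"

static class Loops
{
    public static List<int> WithPostfix(int n)
    {
        var seen = new List<int>();
        int i = 0;              // initializer
        while (i < n)           // conditional-expression: its value IS used
        {
            seen.Add(i);        // loop body
            i++;                // loop-expression: its value is discarded
        }
        return seen;
    }

    public static List<int> WithPrefix(int n)
    {
        var seen = new List<int>();
        int i = 0;
        while (i < n)
        {
            seen.Add(i);
            ++i;                // same side effect, value likewise discarded
        }
        return seen;
    }
}
```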

The conclusion is that, in a for-loop, it makes no difference whether we use ++i or i++ from now on.

Original article in Hungarian: A profik “++i”-t használnak? (“Do experts use ++i?”)

  1. #1 by Baris Caglar on March 10, 2015 - 8:43 am

    If the ++ operator is overloaded, then the compiler will not do the optimization. If there is even one reason to use ++i rather than i++, and no reason to prefer i++ over ++i, then using ++i everywhere is just easier, in my opinion.

    • #2 by Peller Viktor on March 11, 2015 - 3:00 am

      Good point, thank you! However, when ++ is overloaded in C# and we don’t use the result of the operation, the compiler generates the same code for both i++ and ++i: it calls operator++() and stores the result, that’s all. I created a simple struct with an overloaded operator++() and used both ++a and a++:

      TestType a = new TestType();
      a++;
      Console.WriteLine(a);
      ++a;
      Console.WriteLine(a);
      ———
      ldloca.s a
      initobj TestType

      ldloc.0 // “a” to stack
      call TestType::op_Increment()
      stloc.0 // store the result in “a”
      ldloc.0 // write “a” to console
      box TestType
      call Console::WriteLine(object)

      ldloc.0 // “a” to stack
      call TestType::op_Increment()
      stloc.0 // store the result in “a”
      ldloc.0 // write “a” to console
      box TestType
      call Console::WriteLine(object)
      ———

      Now, if I use the result of the ++, then we can see the difference as expected:

      TestType a = new TestType();
      TestType b;

      b = a++;
      Console.WriteLine(a);
      b = ++a;
      Console.WriteLine(a);
      ———
      ldloca.s a
      initobj TestType

      ldloc.0 // “a” to stack
      dup // copy “a” on stack (two original values)
      call TestType::op_Increment()
      stloc.0 // store the incremented result in “a”
      stloc.1 // store the original in “b”

      ldloc.0 // write “a” to console
      box TestType
      call Console::WriteLine(object)

      ldloc.0 // “a” to stack
      call TestType::op_Increment()
      dup // copy the result on stack (two incremented values)
      stloc.0 // result to “a”
      stloc.1 // result to “b”

      ldloc.0 // write “a” to console
      box TestType
      call Console::WriteLine(object)

      On the other hand, I don’t remember ever overloading operator++(), and I’m not sure I’ve used anything other than the built-in versions, but that’s just me. In C++ I used it a lot for iterators. But C# has foreach/IEnumerable, so I think there is no need to overload operator++() for that purpose.

      Anyway, thank you for your thoughts, if you have any tricky case to share, let me know!
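The TestType definition behind the reply above was not published, so the following is a hypothetical reconstruction: a struct with a single Value field (an assumption) and one user-defined operator ++. In C# you declare only one ++ operator, compiled to op_Increment; the compiler builds both the prefix and the postfix form from it, and only the postfix form, when its result is used, has to keep a copy of the original operand (the dup in the second IL listing). The sketch assumes C# 9 or later.

```csharp
using System;

// Hypothetical reconstruction of TestType; the Value field is an assumption.
var a = new TestType();
TestType b = a++;   // b gets the ORIGINAL a; a gets op_Increment's result
Console.WriteLine($"after b = a++: a={a.Value}, b={b.Value}");   // prints "after b = a++: a=1, b=0"

b = ++a;            // a and b both get op_Increment's result
Console.WriteLine($"after b = ++a: a={a.Value}, b={b.Value}");   // prints "after b = ++a: a=2, b=2"

a++;                // result unused: just call op_Increment and store into a
Console.WriteLine($"after a++; a={a.Value}");                    // prints "after a++; a=3"

struct TestType
{
    public int Value;

    // A single "operator ++" declaration; the compiler derives both the
    // prefix and the postfix form from it.
    public static TestType operator ++(TestType x)
    {
        x.Value++;   // x is a by-value copy, so the caller's variable is untouched here
        return x;
    }
}
```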
