Do experts use ++i?

Not so long ago I read a note in a blog comment. It claimed that experts above a certain level should write their for-loops like this:

for (int i = 0; i < ..; ++i)

The important part is the ++i. Why would this be better? The vast majority of developers use i++, and I have no idea why. I use i++ too, even though I used ++i for years back when I was a C++ developer. I used it for a reason, which was the following:

i++ is not optimal in situations like a for-loop, because the result of the expression is the original value of i. To achieve that, the compiler must generate code which retains the original value of i before it executes the increment or any other code behind the ++ operator.

It is mainly a concern for complex types, but it is a good habit to avoid unnecessary copies of primitive types, too.
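
To make the semantic difference concrete, a minimal sketch: only the value of the expression differs, the variable is incremented either way.

int i = 5;
Console.WriteLine(i++);  // prints 5: the value of the expression is the original i
Console.WriteLine(i);    // prints 6: i was incremented as a side effect

int j = 5;
Console.WriteLine(++j);  // prints 6: the value of the expression is the incremented j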

The flaw in this explanation is that compilers recognize the unnecessary copy for primitive types and generate code which does not retain the original value of i. This is what we are going to verify below.

Let’s try this code:

static void Main(string[] args)
{
  for (int i = 0; i < 100; i++)
  {
    Console.WriteLine(i);
  }
}

And now the IL code generated by the C# compiler:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       20 (0x14)
  .maxstack  2
  .locals init ([0] int32 i)
 
  // i = 0 (for-loop initializer section)
  IL_0000:  ldc.i4.0  // 32 bit zero value to the evaluation stack
  IL_0001:  stloc.0   // move stack-top into Local Variable Array (LVA) 
                      // 0th slot (variable i)

  // next step is to execute the loop-condition, the 
  // generated code is at the end of this method, jump there
  IL_0002:  br.s       IL_000e   // (continue at IL_000e, the loop condition)
 
  // Console::WriteLine(i)
  IL_0004:  ldloc.0   // LVA 0. (variable i) to the evaluation stack
  IL_0005:  call       void [mscorlib]System.Console::WriteLine(int32)
 
  // i++
  IL_000a:  ldloc.0   // LVA 0. (variable i) to the evaluation stack
  IL_000b:  ldc.i4.1  // 32 bit 1 value to the evaluation stack
  IL_000c:  add       // add the two values on top of the stack, push the result
  IL_000d:  stloc.0   // pop the result from the stack and store it
                      // in variable i (LVA 0. slot)
  // i < 100
  IL_000e:  ldloc.0            // LVA 0. (variable i) to the evaluation stack
  IL_000f:  ldc.i4.s   100     // 32 bit 100 value to the evaluation stack
  IL_0011:  blt.s      IL_0004 // jump to IL_0004 (the WriteLine call) if i,
                               // pushed first, is smaller than 100, pushed
                               // second (both values are removed from the stack)
  IL_0013:  ret
}

The important lines are the ones under the i++ comment, IL_000a to IL_000d. In theory, we should find a code snippet here which saves the original value of i before the increment: something that copies the value of the LVA 0th slot into a temporary place, like the LVA 1st slot, and then leaves it alone. But there is nothing of the sort. This code is exactly the same as the one that would be generated for ++i.

So the code above is already optimized by the C# compiler. But this is not the end of the story: the jitter can do more. Look at the code which actually executes on the processor:

00000000 push        ebp              // save the state of ebp (caller needs this)
00000001 mov         ebp,esp          // our base is the top of the stack
 
// Main() method has a parameter with reference type, and the LVA has a single
// element (local variable i). That gives 8 bytes so far. One may think that
// the parameter of Main() is already on the stack, as this is the place for
// passing parameters. But the CLR passes the first few parameters in registers.
// In our case, ecx holds the "args" parameter of Main(). The method will use
// ecx later, therefore a place is needed to save "args" into.
// Anyway, the point is, we need 8 bytes on the stack - "args" and "i"
00000003 sub         esp,8            
 
// Save "args" parameter from ecx to the stack:
00000006 mov         dword ptr [ebp-4],ecx 

// I was not able to figure out what this call is. It only appears
// when a debugger is attached. It seems from the SSCLI implementation
// that the jitter emits a "Just My Code" callback, but I do not know
// exactly how it works. Anyway, it is not important for us now.
00000009 cmp         dword ptr ds:[00252E28h],0 
00000010 je          00000017 
00000012 call        65C1B701 

// "Main" method has the flag in its metadata switched on, which tells
// the CLR to zero out every local variable. The only local variable
// is "i" which is at ebp-8
00000017 xor         edx,edx                // this puts zero in edx
00000019 mov         dword ptr [ebp-8],edx  // store zero in "i"
 
// The code of the for-loop starts here. Its structure is the
// same as in the case of the IL code.
 
// No, the jitter isn't that lame. This is the initializer section of the
// for-loop (i = 0), and the generated code happens to be exactly the same as
// the previous two lines. If no debugger is attached, these two lines are not
// repeated; they are probably inserted here to support source-level debugging.
0000001c xor         edx,edx 
0000001e mov         dword ptr [ebp-8],edx 
 
00000021 nop              
 
// The loop condition is generated at the end of the method, just as in the
// IL code.
00000022 jmp         00000030 

 
// This is the code of Console.WriteLine(i). As usual, the CLR passes
// the first parameter in ecx. This is a static method, so no "this"
// parameter needs to be passed; therefore ecx holds the 32 bit value of "i".
// Note the strange call operation with the indirect reference. Explanation
// after the code.
00000024 mov         ecx,dword ptr [ebp-8] 
00000027 call        dword ptr ds:[0084E080h] 

// The implementation of i++. As can be seen, it does not retain the original
// value. Instead, it increments "i" directly in its own memory slot.
// The only more optimized code would keep "i" in a register - if
// no debugger is attached, the jitter generates the code that way.
0000002d inc         dword ptr [ebp-8] 
 
// If i < 100, jump back to 00000024
00000030 cmp         dword ptr [ebp-8],64h 
00000034 jl          00000024 
 
// Restore stack, return to caller
00000036 nop              
00000037 mov         esp,ebp 
00000039 pop         ebp  
0000003a ret              

It is worth looking at the strange way the code calls Console.WriteLine(). This happens because it is the very first call site of Console.WriteLine(). When the jitter compiles a method, it checks whether the referenced methods have already been compiled. If a callee is already compiled, its address is available, and the call operation can use that address directly. If the callee hasn’t been compiled yet, it obviously has no address, yet the jitter must still generate something that calls the not-yet-existing code. It will not compile the callee just to obtain an address – following that logic, the jitter would end up compiling the whole program at once. What it does instead is allocate a slot in memory where the address of the not-yet-compiled callee can be stored later. Of course, this alone is not enough, as the call operation must do something useful. That useful something is a tiny stub which triggers the jitter to compile the referenced method; once it is compiled, its address is written back into the slot which until then pointed to the stub. After that, the call finds the compiled code – with a small performance hit due to the indirect reference. 0084E080 is the address of the slot which first contains the trigger stub for the jitter and later the address of the compiled Console.WriteLine().
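
A rough sketch of this "patch the slot on first call" idea can be written even in C# – purely as an analogy (the LazySlot class is made up for illustration), not as the CLR's actual mechanism:

static class LazySlot
{
    // The "slot": initially it points to a stub that performs the expensive
    // preparation (the analogue of jitting) and then overwrites the slot itself.
    private static Action<int> slot = CompileStub;

    private static void CompileStub(int value)
    {
        // "Compile" the real target (here we simply create the delegate)...
        Action<int> compiled = x => Console.WriteLine(x);

        // ...write it back into the slot...
        slot = compiled;

        // ...and run it, so the very first call does useful work, too.
        compiled(value);
    }

    // Every call site goes through the slot - an indirect call.
    public static void Call(int value) => slot(value);
}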

What about the concept of ++i?

The aforementioned comment also argued that ++i is not only about performance – it also expresses the developer’s intention better. Now, ++i or i++ serves here as the loop-expression. The promise is that this expression is evaluated on every iteration, before the conditional-expression is evaluated. The value of the loop-expression is not important: it is not used by the runtime, unlike the value of the conditional-expression, which determines whether the loop continues or not.

If the value of the loop-expression is not important, then what is? Its side effect! Therefore, using ++i or i++ as the loop-expression is conceptually equivalent, because both have the same side effect – increasing the value of “i”.
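
As a quick sketch of that equivalence, both loops below produce exactly the same output:

for (int i = 0; i < 3; i++)
{
    Console.Write(i);      // prints 012
}
Console.WriteLine();

for (int i = 0; i < 3; ++i)
{
    Console.Write(i);      // prints 012 - the same iterations, the same side effect
}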

The conclusion is that it makes no difference whether we use ++i in the future or not.

Original article in Hungarian: A profik “++i”-t használnak?

What does Object.GetHashCode() return?

There are some misconceptions about Object.GetHashCode(). Two of them are quite prevalent:

  • The implementation of Object.GetHashCode() derives the hash value from the reference (as a memory pointer value) of the instance
  • The return value of Object.GetHashCode() is unique to the instance. No other instances will have the same hash value.

These two claims may seem plausible at first sight. Let’s examine them more closely.

Reference as hash value

This appears to be a good idea. Only one instance can exist at a given memory address, so different instances would get different hash codes. What makes this idea inappropriate in .NET is the garbage collector. It compacts the heap by moving instances around – making our imaginary hash value volatile.
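
A small experiment supports this: the instance may or may not actually be moved by the forced collections below, but the hash code is required to stay stable either way – so it cannot simply be the current address:

var o = new object();
int before = o.GetHashCode();

// Force garbage collections; a compacting GC is free to relocate the instance.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Console.WriteLine(before == o.GetHashCode());  // True - the hash code did not change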

GetHashCode() as unique identifier

Why can’t everybody have a different birthday? Because there are only 365 (366) days in a year, but more people live on Earth. GetHashCode() returns a 32 bit value, which means about 4 billion different hash codes. That is a big number, but software can easily create more instances. This means that 32 bits are not enough to ensure uniqueness. But uniqueness is not even required: hash-based algorithms are designed to handle hash collisions – or at least they should be. For example, the algorithm behind Dictionary or HashSet calls the Equals() method in case of a collision to distinguish the instances.
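
As a sketch of that collision handling, the deliberately colliding type below (AlwaysColliding is made up for illustration) still works as a Dictionary key, because Equals() tells the colliding keys apart – at the price of slower lookups:

class AlwaysColliding
{
    public int Id;
    public override int GetHashCode() => 42;   // every instance collides on purpose
    public override bool Equals(object obj) =>
        obj is AlwaysColliding other && other.Id == Id;
}

var map = new Dictionary<AlwaysColliding, string>();
map[new AlwaysColliding { Id = 1 }] = "one";
map[new AlwaysColliding { Id = 2 }] = "two";

Console.WriteLine(map[new AlwaysColliding { Id = 2 }]);  // prints "two"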

The Truth

So what does the default implementation of GetHashCode() return? The short answer is: a pseudo-random number generated by a simple function. When Object.GetHashCode() is called on an instance, this function generates a number. The number is then stored in a special location associated with that instance, which ensures that every subsequent call on the same instance returns the same number.
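
This is easy to observe: repeated calls on the same instance return the stored number, while two different instances almost always get different numbers.

var a = new object();
var b = new object();

Console.WriteLine(a.GetHashCode() == a.GetHashCode());  // True: the value is stored per instance
Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // almost certainly False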

The Proof – predict next hash code

To show that this is not just another myth, we are going to write a small program that predicts the result of the next Object.GetHashCode() call. To achieve this, we must implement the same pseudo-random algorithm that the .NET Framework has built in.

Breaking the Random Generator

Pseudo-random generators are targets of cryptographic attacks because ephemeral keys are generated with such functions. If someone knows the state and the algorithm of a pseudo-random function, they can calculate the key material of a TLS session, making it possible to decipher the traffic in real time.

The importance of this field drove the development of sophisticated methods for reverse engineering a random function together with its state. We could use such a method to recover the random function behind GetHashCode(), too.

Before anyone gets excited, I have to admit that we will not play Neo. We can simply check the SSCLI source code for the implementation. The important parts are around line 1752 of sscli20\clr\src\vm\threads.cpp and line 942 of threads.h.

The following code implements the same algorithm:

 
using System;
using System.Threading;

public static class HashCodeOracle
{
    // More independent generators work in the code; every thread
    // has its own. The following number is the common seed value:
    private const uint coreSeed = 123456789;

    // When we start using the random generator of GetHashCode(),
    // we do not know whether the CLR has already used it or not.
    // Therefore our solution must be synchronized. First, get a
    // random value from GetHashCode(), then try to find that value
    // by iterating our random generator. The following value gives a
    // limit to the iteration to avoid an infinite loop:
    private const uint maxSyncTrials = 1000;

    // CLR stores hash values in a special area along with other values.
    // These other values require six bits
    private const int syncFlagCount = 6;

    // Every thread has its own state for random generation, so
    // thread local store must be used. ThreadLocal<T> calls
    // the provided method to calculate initial random state. 
    private static ThreadLocal<uint> state = new ThreadLocal<uint>(
                                                    () => GetInitStateForThread());
    private static uint GetInitStateForThread()
    {
        var initState = CalculateThreadSeed();

  
        // Synchronize with the state of the current thread.
        // The following "algorithm" is not adequate in real
        // environment but ok for our test.
        var syncObject = new Object();
        var syncValue = syncObject.GetHashCode();

        var syncIterations = 0;
        while (++syncIterations < maxSyncTrials)
        {
            initState = IterateState(initState);

            // We've got the state!!
            if ((initState >> syncFlagCount) == syncValue)
            {
                return initState;
            } // if
        } // while

        throw new NotSupportedException("Could not sync on current thread");        
    } // GetInitStateForThread()

    private static uint CalculateThreadSeed()
    {
        // Every thread starts from the same value
        var threadSeed = coreSeed;

        // The following loop works a bit differently from the
        // original in the CLR, so it may occasionally compute a wrong seed.
        var managedThreadId = Thread.CurrentThread.ManagedThreadId;

        for (int i = 0; i < managedThreadId; i++)
        {
            threadSeed = threadSeed * 1566083941 + 1;
        } // for i

        return threadSeed;
    } // CalculateThreadSeed()

    // The pseudo random generator function.
    private static uint IterateState(uint currentState)
    {
        var managedThreadId = (uint)Thread.CurrentThread.ManagedThreadId;
        return currentState * (managedThreadId * 4u + 5u) + 1u;
    } // IterateState()

    // This gives the next hash code. It calls the
    // random function to produce a new state, then
    // shifts right by syncFlagCount bits so the result fits into 26 bits.
    public static int Next()
    {
        int result = 0;

        do
        {
            var currentState = state.Value;
            var newState = IterateState(currentState);
            state.Value = newState;

            result = (int)(newState >> syncFlagCount);
        } 
        while (result == 0);

        return result;
    } // Next()
} // class HashCodeOracle

The code above uses a thread-dependent generator, so the implementation uses a thread-local variable to calculate and maintain the state. The class can be tried out with the following test code:

static void Main(string[] args)
{
    for (var i = 0; i < 1000; i++)
    {
        Console.WriteLine("pred: " + HashCodeOracle.Next());
        var o = new object();
        Console.WriteLine("fact:   " + o.GetHashCode());
    } // for i
} // Main()

A snippet from the result:

pred: 3863873
fact: 3863873
pred: 34774863
fact: 34774863
pred: 44538317
fact: 44538317
pred: 65300541
fact: 65300541
pred: 50833958
fact: 50833958
pred: 54852443
fact: 54852443

So the hash value can be predicted by the random function.

How unique is the hash code?

We’ve mentioned that the result of a GetHashCode() call is 32 bits wide. In fact, it is only 26 bits – the few bits at the high end are reserved for other purposes. This means that 2^26 = 67 million different values are possible. That’s a lot, isn’t it? Not really.

Birthday Attack

Some algorithms based on weak hash functions can be forged by applying the “Birthday Problem”. To understand the Birthday Problem, answer the following question: how many people are needed in a room to reach a 50% probability that two of them have the same birthday (year ignored)? Most people – including me – would say a big number, maybe around 300 or more, as there are 366 possible birthdays.

The correct answer, however, is 23. It is a big surprise, and it is also interesting that 57 people are enough to reach 99%. So if there are 55-60 people in a room, it is almost certain that two of them share a birthday.

Back to our problem: how many hash codes are needed to find two identical ones with 50% probability? Applying the formula from Wikipedia to a 26-bit hash code, about 9600 hash codes are sufficient for a 50% chance of a collision. If 25000 hash codes are generated, the probability of a collision is 99%. This is not a big number; a small application can reach it easily.
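
For reference, the quoted numbers can be reproduced with the approximation n ≈ sqrt(2d·ln(1/(1-p))) from the Wikipedia article; the following sketch simply spells out that arithmetic:

double d = Math.Pow(2, 26);                                      // 26 bit hash: ~67 million values
double n50 = Math.Sqrt(2.0 * d * Math.Log(1.0 / (1.0 - 0.50)));  // hash codes needed for a 50% collision chance
double n99 = Math.Sqrt(2.0 * d * Math.Log(1.0 / (1.0 - 0.99)));

Console.WriteLine(n50);  // about 9 645  - the ~9600 quoted above
Console.WriteLine(n99);  // about 24 861 - the ~25000 quoted above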

Hash Collision in practice

It is easy to check the magnitude of the calculations above. Let’s write a small program which generates hash values until a collision happens.

var hashCodes = new HashSet<int>();

int i = 0;
while (hashCodes.Add(HashCodeOracle.Next()))
{
  i++;
} // while

Console.WriteLine(i);  

The result of the program (which depends on many factors) on my computer is 5310.

Where does hash value get stored?

After Object.GetHashCode() has generated a random value, this value must be stored and associated with the instance; otherwise every GetHashCode() call would return a different value for the same instance. So where does this value get stored? In a hidden field of Object? Well, the answer is yes and no.

SyncBlockIndex aka sink

When we create an instance and memory is reserved to hold the instance fields, some additional space also gets reserved. Many people have already heard about the first field of the instance footprint. Sometimes it is called the type handle, sometimes the method table pointer, and sometimes the type object pointer – in my opinion the latter is a bit misleading. This field is the modern version of the C++ virtual table pointer. It helps to resolve virtual and interface method addresses, to find static member addresses, and it has other purposes as well. The method table referenced by the type handle is type specific – not instance specific – so generated hash values cannot be stored in the method table.

Everyone who has read the excellent book “CLR via C#” may remember that there is another hidden field reserved in memory beside the instance fields – the SyncBlockIndex. Where is this field? Savvy readers and those with a C++ background may know that the memory pointed to by a reference starts with a 4-byte (or 8-byte on 64-bit systems) type handle, followed by the instance fields. There is no other field there. In fact, the SyncBlockIndex sits before the address the reference points to, at offset -4 (or -8). To help visualize it, look at the picture below:

[Figure: the SyncBlockIndex stored at a negative offset, just before the type handle]

The funny thing about the SyncBlockIndex is that the SSCLI source code refers to it as a sink – many unrelated things are piled onto it. What can be found here? Its original purpose is to support object locking – using the lock keyword or the Monitor class on an instance. What else? If the instance is a String, some additional information is kept here that the CLR can use for optimization: all uppercase, all ASCII, things like that, helping string comparisons and conversions. The Garbage Collector also has a bit here – this is how it knows whether an instance has already been visited, so it avoids walking in circles.

The main purpose of the SyncBlockIndex is to support locking. Most of the instances will never be locked, however, so the majority of the bits in SyncBlockIndex will never be used. Apart from the special flags, it means 26 rarely used bits. Does this number look familiar? Yes, this is the effective bit length of the hash value returned by GetHashCode() – and this is not a coincidence.

Let’s check whether the hash code is really being stored in SyncBlockIndex:

The figure is a bit complex at first sight. First, there is a simple piece of program code that creates a System.Object instance and then obtains a hash code for it (1). At the bottom of the screen, in the black console window, the hash code is printed in hexadecimal representation.

Around (2), in the Immediate window, the SOS extension has already been loaded. We need it to trace local variables, which helps us find the Object instance. !CLRStack -l shows the locals of Main(); one of them is the reference to our Object instance. The address of this instance in memory is 0x0278b2ec. We mentioned that the SyncBlockIndex is at offset -4, so to dump its content we need to use address 0x0278b2e8 (3). The first 4 bytes of the memory dump are 98 80 bf 0e. That does not look like the hash value. What is misleading here is that the program runs on a processor using little-endian representation, so the bytes of the 32-bit value are reversed. Now we are almost there; the only subtle difference is in the first byte, which is 0x0e instead of 0x02. This is because the upper 6 bits are flags. The upper 4 bits happen to be zeros, but the next 2 bits are switched on – this gives the 0x0e value. If we ignore these two bits, we get the expected 0x02.

Now we know that the hash value is stored in the SyncBlockIndex. But what about the original purpose of the SyncBlockIndex? What happens if we lock on that Object instance?

The C# lock keyword is built on the Monitor class, which is built on a Critical Section object. The Windows operating system has its own Critical Section construct, but the .NET CLR uses its own implementation. A Critical Section is a quite simple structure describing things like the lock owner thread id, flags, and so on. When an object gets locked in .NET, the CLR creates a new Critical Section structure. These structures are arranged in an array, and the association between a Critical Section and the locked instance is expressed by an array index held in the SyncBlockIndex. In other words, when an instance is used in a lock, its SyncBlockIndex receives an index into the Critical Section array.
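
For reference, the lock statement expands to roughly the following Monitor pattern (a simplified sketch of what newer C# compilers emit):

object gate = new object();

bool lockTaken = false;
try
{
    // lock (gate) { ... } becomes roughly this:
    Monitor.Enter(gate, ref lockTaken);

    // ...critical section...
}
finally
{
    if (lockTaken) Monitor.Exit(gate);
}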

But what happens to the stored hash value in this case? Will it be lost? No, of course not. It is saved in a different location. The array mentioned above is not a simple Critical Section array: it is an array of broader structures called SyncBlocks. The name SyncBlock is not really descriptive, because it is not only about synchronization. The structure has several slots: one for the Critical Section structure and another one for the hash value.

[Figure: the SyncBlock structure, with separate slots for the Critical Section and the hash value]

Depending on whether an object has been used in a lock or not, the CLR stores the hash either in the SyncBlock structure or in the SyncBlockIndex. Look at the following picture:

At first this picture is even fuzzier than the previous one, but following the numbers it can be understood. The program asks for a hash code; we saw previously that this number gets stored in the SyncBlockIndex. Then the program locks on the Object instance (1), which means the SyncBlockIndex must now be used as a real index. With the help of SOS (2) we get the address of the Object instance in memory. Looking at the SyncBlockIndex area at offset -4, we can see that its structure is different now: apart from the flags in the most significant byte, it contains the value one (3). So the SyncBlockIndex now holds index 1. If we list the SyncBlock array with SOS (4), we can find the memory area where the SyncBlock structure is stored, and checking that memory area we can see the hash value (little-endian encoded).

Conclusion

A developer does not need to know the internal mechanism of the default GetHashCode() implementation. False assumptions can cause problems, however. If a developer assumes that the default hash code is a good id for his instances, or that the default implementation returns the same value for the same instance state (because it is calculated from the instance fields), then his software will be incorrect. (OK, the default hash code calculation for value types does use the field contents rather than a random number, but that is a different story.)
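
To see why such an assumption hurts, consider this small sketch (the Point class is made up for illustration): two instances with identical state get unrelated default hash codes, so the default hash value is useless as a state-based id.

class Point
{
    public int X;
    public int Y;
}

var p1 = new Point { X = 1, Y = 2 };
var p2 = new Point { X = 1, Y = 2 };

// Same state, but the default hash codes are independent random values.
Console.WriteLine(p1.GetHashCode() == p2.GetHashCode());  // almost certainly False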

Original article in Hungarian: Mit ad vissza az Object.GetHashCode()?