Expression Tree Based Activators

In the previous part, in the convention based factory, we used the Activator class CreateInstance() method to create a parser dynamically using a type infomation:

activator = 
  stream => Activator.CreateInstance(
              parserType,
              BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
              null,
              new object[] { stream },
              CultureInfo.CurrentCulture) as IEnumerator<Person>;

This solution works well in our example, but there is a faster way to create class instances than using Activator.

Lambda Expressions and Anonymous Methods

If we know the type of the class we want to create, it is simple to build activators:

public class XmlParser : IEnumerator<Person>
{
    public XmlParser(Stream stream) ...
}

public class BinParser : IEnumerator<Person>
{
    public BinParser(Stream stream) ...
}

...

Func<Stream, IEnumerator<Person>> activator_1 = s => new XmlParser(s);
Func<Stream, IEnumerator<Person>> activator_2 = s => new BinParser(s);

...

var parser = activator_1(stream);

Now we can pass around activator_1 or activator_2, the caller of the activator doesn’t need to know how and what specific type will be created.

In the example above the activator is implemented using a Lambda Expression. This expression will be transformed by the compiler into a method with a generated name. The result will be more or less the following – given that the program lines were written in a method called “Main”:

static IEnumerator<Person> <Main>b_0(Stream s)
{
    return new XmlParser(s);
}

...

Func<Stream, IEnumerator<Person>> activator_1 = 
  new Func<Stream, IEnumerator<Person>>(<Main>b_0);

...

var parser = activator_1(stream);

Now if we want to use lambda expressions in the ParserFactoryConfig to create activators like above, we will have a problem, because we do not know the concrete types in advance, so we do not know what to write after the ‘new’ keyword. The best would be if we could put types after the ‘new’ keyword runtime.

Expression Trees

Fortunately, this all is possible using Expression Trees. What is an Expression Tree? First, let’s notice that although I almost always use “var” for variable declaration, I didn’t do that for the activators. Would the following line compile?

var activator_1 = s => new XmlParser(s);

It would not, for a reason. The right side is a lambda expression, the compiler had to figure out the type of the variable on the left side from that. Why can’t it do this? Because lambda expressions don’t have a type. They can be converted to a compatible delegate type, like ‘Func<Stream, IEnumerator>’ or – and this is the interesting part for us – to an expression tree. It cannot decide which one we want, unless we tell it, specifying the type of the left side:

Func<Stream, IEnumerator<Person>> activator_1 = s => new XmlParser(s);

or

Expression<Func<Stream, IEnumerator<Person>>> activator_1 = s => new XmlParser(s);

We have already seen how the first version will be compiled. That will result an ordinary method with a slightly strange name like b_0. The second version is much more interesting. That will be an IL code which does something like the following:

ParameterExpression parameter = Expression.Parameter(typeof(Stream), "s");
ConstructorInfo construtorInfo = typeof(XmlParser).GetConstructor(...)
 
Expression<Func<Stream, IEnumerator<Person>>> activator_1 = 
    Expression.Lambda<Func<Stream, IEnumerator<Person>>>(
        Expression.New(
            construtorInfo , 
            new Expression[] { parameter }), 
        new ParameterExpression[] { parameter  });

What do we see? The compiler generated a code, which builds a data structure based on the Expression class. This data structure represents the lambda expression we wrote in C#. Because we wrote a lambda expression, the outmost call is an Expression.Lambda(). Our original lamda expression has a parameter and a body. The parameter became the second argument of the Expression.Lambda() call, and the body is the first argument. The body is created with the Expression.New() call, which gets the type information by a ConstructorInfo class, and the parameter for the constructor. A complex case would embed more calls into each other, and if we drew the diagram of the call sequence, it would result a tree like structure, hence the name of the expression tree. Our simple case is not really a tree, but the data structure created by the calls above looks a bit similar:

ExprTree

Benefits of Expression Trees

We can win two things using Expression Trees. First, we can dynamically put them together, using any ConstructorInfo with Expression.New(). Second, Expression Trees can be compiled runtime, and the compiled code will be really fast. We can write the following to get the compiled version of the activator:

Func<Stream, IEnumerator<Person>> activatorAsDelegate = 
  activator_1.Compile() as Func<Stream, IEnumerator<Person>>;

Now it can be used to create a parser:

var parser = activatorAsDelegate(stream);

Using this trick, we can make our Factory faster, if we modify the activator creation in the ParserFactoryConfig class TryCreateActivator method.

Tech Interview – Refactoring V – Convention Based Factory

In the previous part we fixed the code which reads the Person instances from the datafiles. A part of this code was a Factory class based on a switch statement. This solution is ok, but when we add a new parser, we need to remember to add a new case block to the switch statement. It is not really an issue in our example, probably it wouldn’t be a weekly event to add new and new parsers, but just for fun, we can try to make this process automatic.

Convention over Configuration

Around a decade ago a design paradigm started getting popular, called “Convention over Configuration” or sometimes they call it “Coding by Convention”, which name is a better fit to our case.

Probably most developers have edited xml files or even sourcecode to add mappings which described which objects should work together or which property should be used in which case. That is quite boring, easy to forget or mistype, and many times these mappings are obvious, they can be matched by names for example. Instead of the manual configuration, using naming rules, or conventions, we can let the program code make the pairing. This is what we will be doing today.

If we want a convention based factory, we need to come up with a convention first. We have these types with these parsers:

  • “xml” – XmlParser
  • “bin” – BinaryParser

The easiest way to find a good rule, if we rename the binaryParser to binParser, because in this case the sourceType + the “Parser” word gives the name of the class we need to use for a given SourceType.

Finding the parsers

Our application is only a single assembly, so we don’t need complex logic for finding parsers. We just take the executing assembly and check all classes for name matching.

...
static readonly string pattern = "^(?<format>.*)Parser$";
static readonly Regex convention = new Regex(pattern, RegexOptions.Compiled);
...
 
// take all types from the currently running assembly
foreach (var type in Assembly.GetExecutingAssembly().GetTypes())
{
    // check for name convention
    var match = convention.Match(type.Name);
 
    if (match.Success)
    {
        // found a parser that supports a given source type
        var sourceType = match.Groups[1].Value;
        ...
    }
}

The parsers are built the way that they need a Stream for their constructors, so we are interested only those classes:

var construtorInfo = 
        type.GetConstructor(
                BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance, 
                Type.DefaultBinder, 
                new Type[] { typeof(Stream) }, 
                null);
 
if (construtorInfo != null)
{
  ...
}

Using the constructorInfo, it is possible to create an appropriate parser, but we don’t need it right now. We need it later when we have a Stream to parse. So let’s store the constructorInfo for later use:

static Dictionary<string,Type> typeMap = 
          new Dictionary<string,Type>(StringComparer.OrdinalIgnoreCase);
...
 
if (construtorInfo != null)
{
    typeMap[sourceType] = type;
}

Now the Create method of the Factory class can be implemented the following way:

internal static IEnumerator<Person> Create(string formatType, Stream source)
{
    var instance = Activator.CreateInstance(
                        typeMap[formatType], 
                        BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance, 
                        null,
                        new object[] {source},
                        CultureInfo.CurrentCulture);
 
    return instance as IEnumerator<Person>;
}

Who will run the configuration?

Now we have a method which searches for parsers, and we modified the Create() method of the factory. But who will run the search? We wanted to have the factory search for parsers automatically. We know that a static constructor of a class runs before we use the class, so it is a good place to put the search there. A first version of the new factory:

internal static class EnumeratorFactory
{
    static readonly string pattern = "^(?<format>.*)Enumerator$";
    static readonly Regex convention = new Regex(pattern, RegexOptions.Compiled);
 
    static Dictionary<string,Type> typeMap = 
             new Dictionary<string,Type>(StringComparer.OrdinalIgnoreCase);
 
    static EnumeratorFactory()
    {
        foreach (var type in Assembly.GetExecutingAssembly().GetTypes())
        {
            var match = convention.Match(type.Name);
 
            if (match.Success)
            {
                var sourceType = match.Groups[1].Value;
 
                var construtorInfo =
                        type.GetConstructor(
                            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
                            Type.DefaultBinder,
                            new Type[] { typeof(Stream) },
                            null);
 
                if (construtorInfo != null)
                {
                    typeMap[sourceType] = type;
                }
            }
        }
    }
 
    internal static IEnumerator<Person> Create(string formatType, Stream source)
    {
        var instance = 
                Activator.CreateInstance(
                    typeMap[formatType],
                    BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
                    null,
                    new object[] { source },
                    CultureInfo.CurrentCulture);
 
        return instance as IEnumerator<Person>;
    }
}

This code works, but can be improved in several ways:

Single Responsibility

The new factory class violates the Single Responsibility Principle. It searches for parsers and create parsers, these are two different things, and should be implemented in two different classes.

Complex method

A static constructor of the new factory is quite complex. It is rarely not a bad sign if a loop has several lines within it, in a loop we usually work on a different abstraction level which should be in its own method. In the code above the loop has an if statement which also has an if statement in it, so we jump through several levels and concepts within a single method.

Implementation details in Create()

The Create() method knows too much low level details about how to create a parser. Yes, this is the job of the create method, to create parser instances. But only in that sense that it should know which type needs which parser. It should not know about ConstructorInfos and Activator class. Its only responsibility should be the following:

internal static class ParserFactory
{
    static readonly Dictionary<string, Func<Stream, IEnumerator<Person>>> activators = 
          ParserFactoryConfig.GetActivatorMap();

    internal static IEnumerator<Person> Create(Stream source, string sourceType)
    {
        Func<Stream, IEnumerator<Person>> activator;

        if (activators.TryGetValue(sourceType, out activator))
        {
            return activator(source);
        }
        else
        {
            throw new NotSupportedException(
                          string.Format("Not supported format: {0}", sourceType));
        }
    }        
}

The video version of this article explains how to build the ParserFactoryConfig, the final code looks like the following:

internal static class ParserFactoryConfig
{
    static readonly string pattern = @"^(?<format>.+)Parser$";
    static readonly Regex convention = new Regex(pattern, RegexOptions.Compiled);

    internal static Dictionary<string, Func<Stream, IEnumerator<Person>>> GetActivatorMap()
    {
        var result = new Dictionary<string, Func<Stream, IEnumerator<Person>>>(StringComparer.OrdinalIgnoreCase);

        foreach (var type in GetCandidateTypes())
        {
            var formatType = default(string);
            var activator = default(Func<Stream, IEnumerator<Person>>);

            if (TryCreateActivatorByConvention(type, out formatType, out activator))
            {
                result[formatType] = activator;
            }
        }

        return result;
    }

    private static bool TryCreateActivatorByConvention(Type type, out string formatType, out Func<Stream, IEnumerator<Person>> activator)
    {
        formatType = default(string);
        activator = default(Func<Stream, IEnumerator<Person>>);
        if (TryMatchConvention(type, out formatType))
        {
            if (TryCreateActivator(type, out activator))
            {
                return true;
            }
        }

        return false;
    }

    private static bool TryCreateActivator(Type type, out Func<Stream, IEnumerator<Person>> activator)
    {
        activator = null;

        var constructorInfo = default(ConstructorInfo);
        if (TryGetConstructorInfo(type, out constructorInfo))
        {
            var parserType = type;
            activator = stream => Activator.CreateInstance(
                                                parserType,
                                                BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
                                                null,
                                                new object[] { stream },
                                                CultureInfo.CurrentCulture
                                    ) as IEnumerator<Person>;
        }

        return activator != null;
    }

    private static bool TryMatchConvention(Type type, out string formatType)
    {
        var match = convention.Match(type.Name);

        if (match.Success)
        {
            formatType = match.Groups[1].Value;
        }
        else
        {
            formatType = null;
        }

        return match.Success;
    }

    private static bool TryGetConstructorInfo(Type type, out ConstructorInfo constructorInfo)
    {
        constructorInfo = type.GetConstructor(
                                    BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
                                    Type.DefaultBinder,
                                    new Type[] { typeof(Stream) },
                                    null);

        return constructorInfo != null;
    }

    private static IEnumerable<Type> GetCandidateTypes()
    {
        return Assembly.GetExecutingAssembly().GetTypes().Where(t => typeof(IEnumerator<Person>).IsAssignableFrom(t));
    }
}