Tech Interview – Refactoring III

Error

This video doesn’t exist

In the previous part we looked over the MainWindow class and found some improvement points. Those were common mistakes with simple fix. We skipped a big switch block, because it needed more work, and today we are going to refactor that part. We will not refactor it to its final form, though, first, we just make it better. In a second blog post, we will improve the code more.
The code to be improved is the following:

switch (sourceType)
{
    case "bin":
        {
            var parser = new BinaryParser();
            parser.Open(file);

            while (!parser.HasReachedEnd)
            {
                persons.Add(parser.GetPerson());
            } 

            parser.Close();
        } 
        break;

    case "xml":
        {
            var parser = new XmlParser(file);
            parser.StartParse();

            Person parsedPerson;
            while ((parsedPerson = parser.GetNextPerson()) != null)
            {
                persons.Add(parsedPerson);
            }

            parser.FinishParse();
        }
        break;
}

When we look at this code, what do we notice? We have two ‘case’-s with very similar structure, basically they do the same:

create an appropriate parser
tell the parsers to prepare for parsing
read the data until it is possible
tell the parsers that they are not needed anymore.

What we see is that we have two classes with very similar functionality but with different interfaces. It is a code smell from the “Refactoring” book, called “Alternative Classes with Different Interfaces”. We have the ugly switch because the parsers have different interfaces, so we need two different code to use them. If they had the same interface, it wouldn’t be necessary to duplicate the code. Fortunately, we have the source code of the parsers, so we can refactor them to have a common interface.

The common interface

To create a common interface, we need to decide what operations we need. The first two steps in both cases are creating the parser then calling a method which moves the parser into a working state. These are the Open() and the StartParse() methods. It’s a little bit like a two phase initialization, even if the method name is not Init(), so we can ask that why we need two phase initialization? Why can’t we set up everything in the constructor, so by the time it returns, the parser is in a working state?

We can tell a few examples when we may need two phase initialization, for instance if the object needs a lot of resource, like memory, or it needs a network connection, then it makes sense to move the object into a working state just before we want to use it. Or maybe the object has several configuration settings, so first we create it, then we set it up, and then we call the open or connect or other method, which moves the object into a working state. LikeStat2PhaseInit

The XmlParser and BinaryParser don’t have settings, but they use files, so we could say that they use heavy resource. On the other hand, the typical use case of these classes is that we create them, immediately parse the whole data file with them, then we discard them. We don’t plan to create a parser upfront and use it later, so it doesn’t hurt if we set the parser into a working state in the constructor, even if it involves opening a file. So for now, we get rid of the Open()and StartParse() methods, making our interface simpler.

Besides the Open()-like methods, both parsers have Close() like methods, a Close() and a FinishParse(). It makes sense, because the parsers work with files, and files must be closed when we don’t need them anymore, so we need something. But the .NET framework has a way to express if we don’t want to use an object anymore. And this is the Dispose() method of the IDisposable interface. Some classes wrap Dispose() in a method with a different name, because of other conventions, for example we have been closing files and network connections for decades, so we expect that a File class has a close() method even in .Net. In other cases we have some sort of symmetry, like when we open something, then we want to close it, or if we connect, then we will disconnect.

In case of our parsers, we’ve just gotten rid of the open(), so we don’t need close() just for the symmetry, also it is not customary to close() or stop() or disconnect() a parser, so we are happy with Dispose(). Also, probably we will use these classes with the .net “using” keyword, which calls the dispose() for us, so we don’t really need anything else.

The third part we need is to be able to read the data. Both classes have a GetPerson()-like method, although they work in a different way. The XmlParser returns a null if there is no more Person to be read, with the BinaryParser we need to ask it whether it has more data or not. So they need different type of loop to read all the data, and we need to decide which one will we use.

Let’s say that we choose to use a HasReachedEnd or HasNext property. In this case we still need to handle when no more data is available and the GetNextPerson() method gets called. We could throw an exception, I think this would be the correct implementation, that we force the caller to check whether there is more data or not, and we don’t allow to call GetNextPerson(), when it doesn’t make sense.

The other option is that we return null when GetNextPerson() is called and no more data is available. Returning a null is a little bit ugly though, the user of our class needs to check this case, and if they don’t do it, usually the application doesn’t blow up immediately, it carries around the null reference and will throw an exception later. So usually it is not a good practice to return null, this is why many would like to see (Non) Nullable reference types in C#.

On the other hand its customary to return null when we cannot return real data. Many developers expect this behavior, it is a very common practice that we put a read() or next() in a ‘while’ loop, and check for null value there, so the null value will break the loop.

We are going to choose the way developers like to use classes like ours, even if it is less elegant, so we are going to return null if there is no more data. In this case we don’t need HasReacedEnd or HasNext property, those would just double the functionality of detecting the end of the data.

The interface we need becomes very simple. We agreed that we fully initialize the object in the constructor, so we don’t need init() or open(). We also agreed that we release the resources, like the file we read, using the IDisposable interface, so we don’t need close() or similar. The only thing we need is a GetNext() method.

interface IParser : IDisposable
{
    Person GetNext();
}

One may call the GetNext() method GetNextPerson() or NextPerson(), but it is not needed to use the “Person” word in the method name. The only thing we can return is a Person instance, so it is obvious that the Next thing we are requesting is a Person. GetNext() or Next() is perfectly fine.

The IParser name maybe too general, so it wouldn’t be good in a big application, but in our small program it works well, it is clear what parsers we are talking about.

Refactoring the parsers

The refactored parsers look like the following:

internal class XmlParser : IParser
{
    private static readonly string ItemTitle = "Person";
 
    private string path;
    private XmlReader xmlReader;
    private XmlSerializer xmlSerializer;
 
    public XmlParser(string path)
    {
        this.path = path;
 
        this.xmlReader = XmlReader.Create(this.path);
        this.xmlReader.MoveToContent();
        this.xmlReader.ReadToDescendant(XmlParser.ItemTitle);
 
        this.xmlSerializer = new XmlSerializer(typeof(Person));
    }
 
    public void Dispose()
    {
        this.xmlReader.Close();
    }
 
    public Person GetNext()
    {
        return this.xmlSerializer.Deserialize(this.xmlReader) as Person;
    }
}

The binary parser is a bit more complex:

internal class BinaryParser : IParser
{
    private FileStream fileStream;
    private BinaryReader binaryReader;
 
    internal BinaryParser(string path)
    {
        this.fileStream = File.OpenRead(path);
        this.binaryReader = new BinaryReader(fileStream);
    }
 
    public void Dispose()
    {
        this.binaryReader.Close();
    }
 
    public Person GetNext()
    {
        if (this.HasReachedEnd)
        {
            return null;
        }
 
        return GetPerson();
    }
 
    private Person GetPerson()
    {
        var result = new Person
        {
            LikedPersons = new List<string>()
        };
 
        result.Name = this.binaryReader.ReadString();
 
        var contactCount = this.binaryReader.ReadInt32();
        for (var i = 0; i < contactCount; i++)
        {
            result.LikedPersons.Add(
                this.binaryReader.ReadString());
        }
 
        return result;
    }
 
    private bool HasReachedEnd
    {
        get
        {
            return this.binaryReader.PeekChar() < 0;
        }
    }
}

The two parsers are not complete yet, for example if we try to read a disposed instance, we will get error from the lower layer.

Refactoring the switch

As a first step, we can transform the switch to the following form:

var filesToProcess = this.GetFileList();
var sourceType = this.GetSelectedSourceType();
 
var persons = new List<Person>();
 
foreach (var file in filesToProcess)
{
    IParser parser;
 
    switch (sourceType)
    {
        case "bin": parser = new BinaryParser(file); break;
        case "xml": parser = new XmlParser(file); break;
 
        default:
            throw new NotSupportedException(
                string.Format("Not supported format: {0}", sourceType));
    }
 
    using (parser)
    {
        Person person;
 
        while ((person = parser.GetNext()) != null)
        {
            persons.Add(person);
        }
    }
}

This code looks much better than the original, but we still have that switch. The role of the switch is clear however, its only purpose is that according to a type information, it creates an appropriate parser. It may sound familiar, the Parameterized Factory Method, which is a variant of the Factory Method Design Pattern, does the same. If we have a factory method, we can write the following:

foreach (var file in filesToProcess)
{
    using (var parser = ParserFactory.Create(file, sourceType))
    {
        Person person;
 
        while ((person = parser.GetNext()) != null)
        {
            persons.Add(person);
        }
    }
}

The switch seems to be disappeared, but actually that switch just moved to the implementation of the factory method:

internal static class ParserFactory
{
    internal static IParser Create(string path, string sourceType)
    {
        switch (sourceType)
        {
            case "bin": return new BinaryParser(path);
            case "xml": return new XmlParser(path);
 
            default:
                throw new NotSupportedException(
                    string.Format("Not supported format: {0}", sourceType));
        }
    }
}

In the next part…

The code after the refactor looks quite good, but it can be even better. Its hard to notice the issue in this solution because it is so ubiquitous, that we have gotten used to it and it looks normal. But the parsers violate the Single Responsibility design principle, because they do two things. They open the data source, and process the data. These two things can change independently. Now the datasource is always file, later it could be a network source.
Today we have two parsers, soon we are going to create a third one, and if we need to switch from file to network, we need to make the same modification at three different places. This shows that something is not right with the responsibilities. So in the next post, we are going to improve the code further.