LINQ: Deferred Execution

This is part of a short series on the basics of LINQ:

Part One – Clarity, complexity, and understanding
Part Two – Deferred Execution
Part Three – Understanding Your Query Chain
Part Four – Notation, Syntax, and Snags

In the first ~~rant~~ post of this short series on LINQ, I explained the motivation behind writing this series in the first place, which can be summarised as:

People don't know LINQ and that impacts my ability to make use of it; I should try to fix that.

To start, I'm going to explain what I believe is the most important concept in LINQ; deferred execution.

So, what is deferred execution?

Deferred execution code is not executed until the result is needed; the execution is put off (deferred) until later. By doing this we can combine a series of actions without actually executing any of them, then execute them at the time we need a result. This allows us to limit the execution of computationally expensive operations until we absolutely need them.

That's my description, here is one from an MSDN tutorial on LINQ-to-XML that perhaps puts it more clearly:

Deferred execution means that the evaluation of an expression is delayed until its realized value is actually required. Deferred execution can greatly improve performance when you have to manipulate large data collections, especially in programs that contain a series of chained queries or manipulations. In the best case, deferred execution enables only a single iteration through the source collection.

Some may be surprised to know that deferred execution was not new when LINQ arrived, it had already been around for quite some time in the form of iterator methods. In fact, it is iterator methods that give LINQ its deferred execution. Before we look at an iterator method, let's look at an example of immediate execution. For this example, we will give ourselves the task of taking a collection of people and outputting a collection of unique last names for all those born before 1980.

public struct Person
{
    public Person(string first, string last, DateTimeOffset dateOfBirth) : this()
    {
        FirstName = first;
        LastName = last;
        DateOfBirth = dateOfBirth;
    }
	
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public DateTimeOffset DateOfBirth { get; private set; }
}

public static class Data
{
    public static IEnumerable<Person> People = new[] {
        new Person("John", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 34 )),
        new Person("Bill", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 20 )),
        new Person("Sarah", "Allans", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 19 )),
        new Person("John", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 44 )),
        new Person("James", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 78 )),
        new Person("Alex", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 30 )),
        new Person("Mabel", "Thomas", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 29 )),
        new Person("Sarah", "Brown", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 23 )),
        new Person("Gareth", "Smythe", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 100 )),
        new Person("Gregory", "Drake", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 90 )),
        new Person("Michael", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 56 )),
        new Person("Alex", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 22 )),
        new Person("William", "Pickwick", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 17 )),
        new Person("Marcy", "Dickens", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 18 )),
        new Person("Erica", "Waters", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 26 ))
    };
}

IEnumerable<string> GetLastNamesImmediate()
{
    var distinctNamesToReturn = new List<string>();
    foreach (var person in Data.People)
    {
        if (person.DateOfBirth.Year < 1980 && !distinctNamesToReturn.Contains(person.LastName))
        {
            distinctNamesToReturn.Add(person.LastName);
        }
    }
    return distinctNamesToReturn;
}

When this method is called, it iterates over the entire collection and then returns its result, which also can then be iterated. If the collection of people were huge and we only cared about the first five names, this would be incredibly slow. To turn this into deferred execution, we can write it like this:

IEnumerable<string> GetLastNamesLazyDeferredWithoutLINQ()
{
    var lastNamesWeHaveSeen = new List<string>();
    foreach (var person in Data.People)
    {
        if (person.DateOfBirth.Year < 1980 && !lastNamesWeHaveSeen.Contains(person.LastName))
        {
            lastNamesWeHaveSeen.Add(person.LastName);
            yield return person.LastName;
        }
    }
}

Now, none of the code in this method is executed until the first time something calls `MoveNext()` on the returned enumerable. This means we could take the first five names without processing the entire collection of people, giving us a potentially enormous performance gain. If each item in the generated collection were computationally expensive to produce, without lazy evaluation, that expense would be multiplied by the total number of items in the collection on every call to the generator method; however, with lazy evaluation, the consumer of the collection gets to decide how many items are computed and therefore, how much work gets done. This ability to defer and control computationally expensive operations is the power of deferred execution.

However, not every deferred action necessarily has low overhead. Deferred execution actually comes in two flavours; eager evaluation and lazy evaluation (the example above is an example of lazy evaluation)¹. Every action in a deferred execution chain uses either lazy or eager evaluation. Though lazy evaluation is preferred, sometimes it is not possible to evaluate one item at a time, such as when sorting. Eagerly evaluated deferred execution allows us to at least defer the effort until we want it done.

An eagerly evaluated version of the iterator method we have been looking at might look like this:

IEnumerable<string> GetLastNamesEagerDeferredWithoutLINQ()
{
    var distinctNamesToReturn = new List<string>();
    foreach (var person in Data.People)
    {
        if (person.DateOfBirth.Year < 1980 && !distinctNamesToReturn.Contains(person.LastName))
        {
            distinctNamesToReturn.Add(person.LastName);
        }
    }
	
    foreach (var name in distinctNamesToReturn)
    {
        yield return name;
    }
}

In this example, because it is still an iterator method (it returns its results using `yield return`), none of the code is executed until the very first time the `MoveNext()` method is called on the returned enumerable, and therefore, the execution is deferred. When `MoveNext()` gets called for the first time, the entire collection of data is processed at once and then the results are output one by one as needed. The difference between this and the immediate execution equivalent we first looked at is that in this version, no work is done until a result is demanded.

Allowing the consumer of a collection to control how much work is done rather than work being dictated by the collection generator allows us to manage data more efficiently by building chains of operations and then processing the result in one go when needed. Lazy evaluation gives us the additional ability to spread the effort across each call to `MoveNext()`. The key to writing good LINQ is understanding which actions are immediate, which are deferred and lazily evaluated, which are deferred and eagerly evaluated, and why it matters. We will take a look at that next time.

Quite often, people use 'deferred execution' and 'lazy evaluation' interchangeably, but they are not actually synonymous, nor is 'immediate execution' synonymous with 'eager evaluation'. [↩]

7 thoughts on “LINQ: Deferred Execution”

Steve says:

June 3, 2015 at 09:51
I just read a blog post about HashSet and this seemed a nice fit to make your eager deferred execution method lazy. Nice post.
```
public IEnumerable GetLastNamesLazyDeferred()
{
    var hash = new HashSet();
    foreach (var person in _people)
    {
        if (person.DateOfBirth.Year < 1980)
        {
            if (hash.Add(person.LastName))
                yield return person.LastName;
        }
     }
}
```
1. Jeff says:
  
  June 3, 2015 at 10:02
  
  FTFY, sorry for the annoying code formatting in comments. I'll look at improving that somehow. In the meantime, you can use <pre> for multiline code examples instead of <code> and it works.
  
  Thanks for your comment. You are correct, this can be written as a lazily evaluated method; my example was merely to demonstrate eager evaluation in deferred execution (there's also a lazy evaluation implementation in the post). As for being able to convert eager evaluation into lazy evaluation, that would only be possible in some situations, not all. For example, sorting cannot be implemented with lazy evaluation.
  
Pingback: Automate the Planet
Pingback: LINQ: Understanding Your Query Chain | somewhat abstract
Pingback: LINQ: Clarity, complexity, and understanding | somewhat abstract
Pingback: LINQ: Notation, Syntax, and Snags | somewhat abstract
David Mercer says:

July 14, 2015 at 23:18

Great explanation of the different evaluation types of deferred execution: lazy vs. eager. Thanks!

Related Posts

7 thoughts on “LINQ: Deferred Execution”

Leave a Reply Cancel reply