LINQ: Understanding Your Query Chain

This is part of a short series on the basics of LINQ:

This is the third part in my small series on LINQ and it covers what I feel is the most important thing to understand when using LINQ, query chains. We are going to build on the deferred execution concepts discussed in the last entry and look at why it is important to know your query operations.

Each method in a LINQ query is either immediately executed or deferred. When deferred, a method is either lazily evaluated one element at a time or eagerly evaluated as the entire collection. Usually, you can determine which is which from the documentation or, if that fails, a little experimentation. Why does it matter? This question from StackOverflow provides us with an example:

For those that did not read it or do not understand the problem, let me summarize. The original poster had a problem where values they had obtained from a LINQ query result, when passed into the `Except()` method on that same query, did not actually exclude anything. It was as if they had taken the sequence `1,2,3,4`, called `Exclude(2)`, and that had returned `1,2,3,4` instead of the expected `1,3,4`. On the surface, the code looked like it should work, so what was going on? To explain, we need a little more detail.

The example code has a class that described a user. An XML file contained user details and this is loaded into a sequence of `User` instances using LINQ-to-XML.

IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
        (u => new User
            { ID = (int)u.Attribute("id")
              Name = (string)u.Attribute("name")
            }
        ).OfType<IUser>();       //still a query, objects have not been materialized

As noted in the commentary, the poster understood that at this point, the query is not yet evaluated. With their query ready to be iterated, they use it to determine which users should be excluded using a different query.

public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users)
{
     IEnumerable<IUser> localList = new List<User>
     {
        new User{ ID=4, Name="James"},
        new User{ ID=5, Name="Tom"}
     }.OfType<IUser>();
     var matches = from u in users
                   join lu in localList
                       on u.ID equals lu.ID
                   select u;
     return matches;
}

And then using those results, the originally loaded list of users is filtered.

var excludes = users.Except(matches);    // excludes should contain 6 users but here it contains 8 users

Now, it is clear this code is not perfect and we could rewrite it to function without so many LINQ queries (I will give an example of that later in this post), but we do not care about the elegance of the solution; we are using this code as an example of why it is important to understand what a LINQ query is doing.

As noted in the commentary, when the line declaring the initial `users` query is executed, the query it defines has not. The query does not actually become a real list of users until it gets consumed1. Where does that happen? Go on, guess.

If you guessed `GetMatchingUsers`, you are wrong. All that method does is build an additional level of querying off the initial query and return that new query. If you guessed the `Except()` method, that's wrong too, because `Except()` is also deferred. In fact, the example only implies that something eventually looks at the results of `Except()` and as such, the query is never evaluated. So, for us to continue, let's assume that after the `excludes` variable (containing yet another unexecuted query), we have some code like this to consume the results of the query:

foreach (var user in excludes)
{
   Console.WriteLine(user.ID);
}

By iterating over `excludes`,  the query is executed and gives us some results. Now that we are looking at the query results, what happens?

First, the `Except()` method takes the very first element from the `users` query, which in turn, takes the very first `User` element from the XML document and turns it into a `User` instance. This instance is then cast to `IUser` using `OfType`2.

Next, the `Except()` method takes each of the elements in the `matches` query result and compares it to the item just retrieved from the `users` collection. This means the entire `matches` query is turned into a concrete list. This causes the `users` query to be reevaluated to extract the matched users. The instances of `User` created from the `matches` query are compared with each instance from the `users` query and the ones that do not match are returned for us to output to the console.

It seems like it should work, but it does not, and the key to why is in how queries and, more importantly, deferred execution work.

Each evaluation of a deferred query is unique. It is very important to remember this when using LINQ. If there is one thing to take away from reading my blog today, it is this. In fact, it's so important, I'll repeat it:

Each evaluation of a deferred query is unique.

It is important because it means that each evaluation of a deferred query (in most cases) results in an entirely new sequence with entirely new items. Consider the following iterator method:

IEnumerable<object> GetObject()
{
    yield return new Object();
}

It returns an enumerable that upon being consumed will produce a single `Object` instance. If we had a variable of the result of `GetObject()`, such as `var obj  = GetObject()` and then iterated `obj` several times, each time would give us a different `Object` instance. They would not match because on each iteration, the deferred execution is reevaluated.

If we go back to the question from StackOverflow armed with this knowledge, we can identify that `users` is evaluated twice by the `Except()` call. One time to get the list of exceptions out of the `matches` query and another to process the list that is being filtered. It is the equivalent of this:

IEnumerable<object> GetObjects()
{
    return new[]
    {
        new Object(),
        new Object(),
        new Object(),
        new Object(),
        new Object()
    };
}

void main()
{
   var objects = GetObjects().Except(GetObjects());
   foreach (var o in objects)
   {
        Console.WriteLine(o);
   }
}

From this code, we would never expect `objects` to contain nothing since the two calls to the immediately executed `GetObjects` would return collections of completely different instances. When execution is deferred, we get the same effect; each evaluation of a query is as if it were a separate method call.

To fix this problem, we need to make sure our query is executed once to make the results "concrete", then use those concrete results to do the rest of the work. This is not only important to ensure that the objects being manipulated are the same in all uses of the queried data, but also to ensure that we don't do work more than once3. To make the query concrete, we call an immediately executed method such as `ToList()`, evaluating the query and capturing its results in a collection.

This is our solution, as the original poster of our StackOverflow question indicated. If we change the original `users` query to be evaluated and stored, everything works as it should. With the help of some investigation and knowledge of how LINQ works, we now also know why.

Now that we understand a little more about LINQ we can consider how we might rewrite the original poster's example code. For example, we really should not to iterate the `users` list twice at all; we should see the failure of `Except()` as a code smell that we are iterating the collection too often. Though making it concrete with `ToList()` fixes the bug, it does not fix this inefficiency.

To do that, we can rewrite it to something like this:

public static bool IsMatchingUser(User user)
{
     var localList = new List<User> {
        new User{ ID=4, Name="James"},
        new User{ ID=5, Name="Tom"} }
      .Cast<IUser>()
      .AsReadOnly();

    return localList.Any(u=>u.ID == user.ID && u.Name == user.Name);
}

var users = doc
    .Element("Users")
    .Elements("User")
    .Select(u => new User {
        ID = (int)u.Attribute("id")
        Name = (string)u.Attribute("name") })
    .Where(u => IsMatchingUser(u));

This update only iterates over each user once, resulting in a collection that excludes the users we don't want4.

In conclusion…

My intention here was to show why it is fundamental to know which methods are immediately executed, which ones are deferred, and whether those deferred methods are lazily or eagerly evaluated. At the end of this post are some examples of each kind of LINQ method, but a good rule of thumb is that if the method returns a type other than `IEnumerable` or `IQueryable` (e.g. `int` or `List`), it is immediately executed; all other cases are probably using deferred execution. If a method does use deferred execution, it is also helpful to know which ones iterate the entire collection every time and which ones stop iterating when they have their answer, but for this you will need to consult documentation and possibly experiment with some code.

Just knowing these different types of methods can be a part of your query will often be enough to help you write better LINQ and debug queries faster.  By knowing your LINQ methods, you can improve performance and reduce memory overhead, especially when working with large data sets and slow network resources. Without this knowledge, you are likely to evaluate queries and iterate sequences too often, and instantiate objects too many times.

Hopefully you were able to follow this post and it has helped you get a better grasp on LINQ. In the final post of this series, I will ease up on the deep code analysis, and look at query syntax versus dot notation (aka fluent notation). In the meantime, if you have any comments, I'd love to read them.

Examples of LINQ Method Varieties

Immediate Execution

`Count()`, `Last()`, `ToList()`, `ToDictionary()`, `Max()`, `Aggregate()`
Immediate iterate the entire collection every time

`Any()`, `All()`, `First()`, `Single()`
Iterate the collection until a condition is or is not met

Deferred Execution, Eager Evaluation

`Distinct()`, `OrderBy()`, `GroupBy()`
Iterate the entire collection but only when the query is evaluated

Deferred Execution, Lazy Evaluation

`Select()`,`Where()`
Iterate the entire collection

`Take()`,`Skip()`
Iterate until the specified count is reached

  1. "consumed" is often used as an alternative to "iterated" []
  2. `Cast()` should have been used here since all the objects loaded are of the same type []
  3. This is something that becomes very important when working with large queries that can be time or resource consuming to run []
  4. with more effort, I am certain there are more things that could be done to improve this code, but we're not here for that, so I'll leave is as an exercise for you; I'm generous like that []

Testing AngularJS: Directives

So far in this series of AngularJS-related posts, we have looked at some utility factories for tracking web requests and preventing navigation requests, whether in-app or not, and how to write tests for those factories using Jasmine. In this post, we will look at how I generally test directives and how that impacts their structure.

Directives

Directives are the things we write in AngularJS that allow us to extend and modify the HTML language. We provide a template, some code-behind, and data binding, and AngularJS performs its magic to create a new behavior or control that is inserted via tag, attribute or CSS class. However, directives can at first seem difficult to test.

Let's look at a convoluted example.

sa.directive('saEditableField', function () {
    return {
        restrict: 'E',
		require: 'ngModel',
        template: '<label>{{fieldName}}</label><input type="text" ng-model="fieldValue" /><button ng-click="showField()">Alert!</button>',
        scope: {
            fieldName: '@saFieldName',
            fieldValue: '=ngModel'
        },
        controller: function($scope) {
            $scope.showField = function() {
                window.alert("Your " + $scope.fieldName + " is " + $scope.fieldValue);
            };
        }
    };
});

In this example we have a very simple directive. It specifies a template, some scope, and a controller where the business logic lives. It is used like this:

<sa-editable-field sa-field-name="First Name" ng-model="fieldValue"></sa-editable-field>

As it stands, there are a one or two things that either make this difficult to test or (I feel) could be clearer1. Let's refactor this directive with testing in mind.

Tests

When I have a working directive that I want to test, I tend to start writing tests. I realise this seems obvious, but it is worth noting as I do not tend to do much refactoring before writing tests and I don't tend to write tests before I have something to test. I find it much more useful to get a concept working and then think about what tests should be there and how I can implement those tests. When I find friction in authoring those tests, I identify my refactoring priorities.

Considering the directive presented above, what tests do we need? Here is the list of things that I think should be tested:

  • That the directive exists
  • That the directive compiles
  • That the alert shows when the button is clicked
  • That the alert mentions the field name
  • That the alert mentions the field value

Testing that the directive exists is easy (note that we have to add Directive to the end for the directive to be injected).

describe 'exists', ->
  When => inject (saEditableFieldDirective) =>
    @saEditableField = saEditableFieldDirective
  Then => expect(@saEditableField).toBeDefined()

Checking it compiles is a little more convoluted, but not difficult.

describe 'compiles', ->
  Given => inject ($rootScope) =>
    @scope = $rootScope.$new()
    @scope.field = 'test'
    @htmlFixture = angular.element('<sa-editable-field sa-field-name="Test" ng-model="field"></sa-editable-field>');
  When => inject ($compile) =>
    $compile(@htmlFixture)(@scope)
    @scope.$apply()
  Then => @htmlFixture.find('label').length == 1
  And => @htmlFixture.find('label').text() == 'Test'
  And => @htmlFixture.find('input').length == 1
  And => @htmlFixture.find('button').length == 1

The linking function that $compile returns is called because  $compile can succeed even if our fixture has a typo.

Using knowledge of our directive template, we can verify that the linking worked without going so far as to validate every aspect of the magic Angular does for us. In fact, since this is really intended to test the code that gets called inside the directive when it is linked to a scope, the template for the directive could be as simple as <div></div> (we will take a closer look in a future post at how we can make the directive template replaceable).

Testing the Alert

Testing that the directive exists, compiles and links is relatively straightforward when compared with testing that the alert is shown and shows the right thing. In order to test the alert, we need to be able to invoke showField, the method that shows it. We also need to check that showField actually shows the alert as we would like.

To invoke showField, we might try using the scope that is passed to the link method returned from $compile. Since the method is added to the scope in the controller, this should work, right? Of course not; this directive has an isolate scope and as such the method has been added to its own copy of the passed scope. We can get to that isolate scope from the compiled element using the isolateScope() method;

describe '#showField', ->
  Given => inject ($rootScope) =>
    @scope = $rootScope.$new()
    @scope.field = 'test'
    @htmlFixture = angular.element('<sa-editable-field sa-field-name="Test" ng-model="field"></sa-editable-field>');
  When => inject ($compile) =>
    $compile(@htmlFixture)(@scope)
    @scope.$apply()
  Then => expect(@htmlFixture.isolateScope().showFi

…but I feel like such a test involves too much setup ceremony just to get to the test2 and it relies on too many things outside of just the bit we want to validate. Instead, what if we could get at the controller independently of the directive?

sa.directive('saEditableField', function () {
    return {
        restrict: 'E',
		require: 'ngModel',
        template: '<label>{{fieldName}}</label><input type="text" ng-model="fieldValue" /><button ng-click="showField()">Alert!</button>',
        scope: {
            fieldName: '@saFieldName',
            fieldValue: '=ngModel'
        },
        controller: 'saFieldController'
    };
});

sa.controller('saFieldController', ['$scope', function($scope) {
    $scope.showField = function() {
        window.alert("Your " + $scope.fieldName + " is " + $scope.fieldValue);
    };
}]);

Now our controller is completely separate from the directive, which refers to the controller by name. This means we can write what I feel are clearer directive tests. Using a fake for the controller, we isolate the directive from the real thing and any side-effects it may have;

describe 'saEditableField', ->
  Given -> module ($provide) ->
    fakeFieldController = jasmine.createSpyObj 'saFieldController', ['showField']
    $provide.value 'saFieldController', fakeFieldController;return
  
  describe 'exists', ->
    When => inject (saEditableFieldDirective) =>
      @saEditableField = saEditableFieldDirective
    Then => expect(@saEditableField).toBeDefined()
      
  describe 'compiles', ->
    Given => inject ($rootScope) =>
      @scope = $rootScope.$new()
      @scope.field = 'test'
      @htmlFixture = angular.element('<sa-editable-field sa-field-name="Test" ng-model="field"></sa-editable-field>');
    When => inject ($compile) =>
      $compile(@htmlFixture)(@scope)
      @scope.$apply()
    Then => @htmlFixture.find('label').length == 1
    And => @htmlFixture.find('label').text() == 'Test'
    And => @htmlFixture.find('input').length == 1
    And => @htmlFixture.find('button').length == 1

…and now the controller can have its own tests too;

describe 'saFieldController', ->
  describe 'exists', ->
    Given => inject ($rootScope) =>
      @scope = $rootScope.$new()
    When => inject ($controller) =>
      @saFieldController = $controller 'saFieldController', { $scope: @scope }
    Then => expect(@saFieldController).toBeDefined()
      
  describe '#showField', ->
    describe 'exists', ->
      Given => inject ($rootScope) =>
        @scope = $rootScope.$new()
      When => inject ($controller) =>
        @saFieldController = $controller 'saFieldController', { $scope: @scope }
      Then => expect(@scope.showField).toEqual jasmine.any(Function)

$window

However, we are not quite finished. How do we know that the alert is shown when our showField method is called? We could spy on the global window object, but that's not really very Angular-y and our alert will get shown during testing (goodness knows what other side-effects we might face by using the global window object). What we need is an injectable version of window that we can replace with a fake during our tests. Not unsurprisingly, AngularJS has us covered with the $window service.

sa.controller('saFieldController', ['$scope', '$window', function($scope, $window) {
    $scope.showField = function() {
        $window.alert("Your " + $scope.fieldName + " is " + $scope.fieldValue);
    };
}]);

Now with our controller rewritten to use $window, we can tidy up and complete its test cases.

describe 'saFieldController', ->
  Given -> module ($provide) ->
    $provide.value '$window', jasmine.createSpyObj('$window', ['alert']);return
      
  describe 'exists', ->
    Given => inject ($rootScope) =>
      @scope = $rootScope.$new()
    When => inject ($controller) =>
      @saFieldController = $controller 'saFieldController', { $scope: @scope }
    Then => expect(@saFieldController).toBeDefined()
      
  describe '#showField', ->
    describe 'exists', ->
      Given => inject ($rootScope) =>
        @scope = $rootScope.$new()
      When => inject ($controller) =>
        @saFieldController = $controller 'saFieldController', { $scope: @scope }
      Then => expect(@scope.showField).toEqual jasmine.any(Function)
        
    describe 'calls $window.alert', ->
      Given => inject ($rootScope, $controller, $window) =>
        @scope = $rootScope.$new()
        @windowService = $window
        @saFieldController = $controller 'saFieldController', { $scope: @scope }
      When => @scope.showField()
      Then => expect(@windowService.alert).toHaveBeenCalled()
      
    describe 'alert includes $scope.fieldName', ->
      Given => inject ($rootScope, $controller, $window) =>
        @scope = $rootScope.$new()
        @scope.fieldName = "Test Name"
        @windowService = $window
        @saFieldController = $controller 'saFieldController', { $scope: @scope }
      When => @scope.showField()
      Then => expect(@windowService.alert.calls.mostRecent().args[0]).toMatch /.*Test Name.*/
        
    describe 'alert includes $scope.fieldValue', ->
      Given => inject ($rootScope, $controller, $window) =>
        @scope = $rootScope.$new()
        @scope.fieldValue = "Test Value"
        @windowService = $window
        @saFieldController = $controller 'saFieldController', { $scope: @scope }
      When => @scope.showField()
      Then => expect(@windowService.alert.calls.mostRecent().args[0]).toMatch /.*Test Value.*/

And there you have it. Some simple steps that make unit testing directives a little easier.

Finally…

In this post, we have taken a very brief look at how to structure a directive to simplify unit testing by separating the directive declaration from its controller and taking advantage of Angular services such as $window.

Although we have not covered some of the more complex directive concepts such as the link function and DOM manipulation3, these simple steps should take you a long way towards providing better test coverage of your AngularJS widgets.

Until next time, take care and don't forget to leave a comment.

  1. Not unexpected since I wrote it to discuss these things []
  2. we have to compile a fixture with a scope and then get the isolate scope from the element []
  3. I may cover that in a future post if there is interest or the whim takes me []