LINQ: Clarity, complexity, and understanding

This is part of a short series on the basics of LINQ.

At CareEvolution, we tend to develop using JavaScript on the front end, and C# on the back end (with some Python, PowerShell, CoffeeScript, R, SQL, and other languages thrown in when appropriate or technical debt dictates). We have hackathons every eight weeks where we get to be creative without the constraints of day-to-day work. We have a brown bag lunch talk every Wednesday. We work hard at embracing change, exploring new ways of doing things, and sharing what we have learned with each other. Quite often, leading figures in a particular technology emerge within our organisation: Brian knows JavaScript, Chris knows CSS, Brad knows SQL. While I doubt I know even half of the things about LINQ and its various implementations for database, Web API, or file interaction, I know enough to make it useful in my day-to-day work and I seem to be the one who employs it most in my code. I know LINQ.

I am certain this is going to sound familiar to many, but while my colleagues and I embrace all things as a collective, quite often a specific technology or its use will be avoided, derided, and hated by some. Whether driven by ignorance, a particularly terrible experience, or prejudice[1], these deep-seated feelings can create conflict and occasionally hinder progress. For me, my use of LINQ has been a cause for contention during code reviews. I have faced comments like "LINQ is too hard to understand", "loops are clearer", "it's too easy to get burned using LINQ", and "I don't know how to use it so I'd prefer not to see it". And that's all true; LINQ can be confusing, it can be complicated, it can be a debugging nightmare. LINQ can suck. Whether you use the C# language keywords or the dot notation (a debate almost as passionate as tabs versus spaces), LINQ can tie you up in knots and leave you wondering what you did to deserve this fresh hell. Yet any technology could be described the same way when one doesn't know anything about it or when early mistakes have left a bitter aftertaste.
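For anyone who has not met the two styles mentioned above, here is a minimal sketch of one query written both ways. The class name, method names, and sample words are invented for illustration; the compiler translates the keyword form into the same method calls as the dot notation form, so both produce the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxComparison
{
    // Query syntax: the C# language keywords.
    public static List<string> WithKeywords(IEnumerable<string> words)
    {
        var query = from w in words
                    where w.Length > 3
                    orderby w
                    select w.ToUpperInvariant();
        return query.ToList();
    }

    // Method syntax: the "dot notation".
    public static List<string> WithDotNotation(IEnumerable<string> words)
    {
        return words
            .Where(w => w.Length > 3)
            .OrderBy(w => w)
            .Select(w => w.ToUpperInvariant())
            .ToList();
    }

    public static void Main()
    {
        var words = new[] { "tabs", "or", "spaces", "yes" };
        Console.WriteLine(string.Join(", ", WithKeywords(words)));    // prints "SPACES, TABS"
        Console.WriteLine(string.Join(", ", WithDotNotation(words))); // prints "SPACES, TABS"
    }
}
```

Which form reads better tends to depend on the query; that choice is a matter of taste and team convention rather than capability.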

Tabs vs Spaces

In response to these dissenting voices, I usually point to the years of academic learning and professional experience it takes us to learn how to code at all. None of it is particularly easy or straightforward without some education. Don't believe me? Go stick your mum or dad in front of Visual Studio and, assuming they have never learned anything about C# or programming, see how far they get on writing Hello World without your help. Without instruction, we would not know any of it, and LINQ is no different. When review comments inevitably request that I change my code to use less LINQ, none at all, or more widely understood language features like `foreach` and `while` loops, it frustrates me. It frustrates me because I usually feel that LINQ was the right choice for the job. I feel like I am being told, "use something I already know so I don't have to learn."

Of course, this interpretation is hyperbole. In actuality, when presented with opposing views to our own, it is easy to commit the black-or-white fallacy and assume one must be right and the other wrong, when really we should accept that we both may have a point (or neither) and learn more about the opposing view. Since I find that, when used appropriately, LINQ can provide the best, most sublime, most elegant solution to problems that require the manipulation of collections in C#, I desperately want others to see that. It is as much on me as anyone else to try to correct the disparity between what I see and what others see when I write LINQ. So, with my next post we will begin a journey into the basics of LINQ, when to use it[2], when to use dot notation over language keywords (or vice versa), and how to avoid some of the more common traps. We will begin with the cause of many confusing experiences: deferred execution.
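As a small taste of why deferred execution catches people out, here is a minimal sketch (the class and method names are made up for this example). The query is only a description until something enumerates it, so enumerating it a second time after the source list changes gives a different answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many items the same query produced on each enumeration.
    public static (int First, int Second) CountTwice()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Nothing is filtered yet; this line only describes the query.
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        int first = evens.Count();   // runs the query over 1, 2, 3

        numbers.Add(4);

        int second = evens.Count();  // runs it again over 1, 2, 3, 4

        return (first, second);
    }

    public static void Main()
    {
        var (first, second) = CountTwice();
        Console.WriteLine($"{first} then {second}"); // prints "1 then 2"
    }
}
```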

  1. we all know someone in the "That's new, I hate it" crowd
  2. even I recognize LINQ is not a golden hammer; it's more of a chainsaw that kicks a little

Debugging in LINQPad

If you have been reading my blog over the last few months you will no doubt be aware that I am a regular user of LINQPad. I have no commercial involvement with LINQPad or its creators; I just really like it. Recently, I decided to try out the latest release, which adds integrated debugging to the already feature-rich tool. This powerful new feature is yet another reason why this application should be in every developer's arsenal, regardless of experience and ability (it is a great learning tool for students). Here is a brief overview of this new feature, which is available with the premium license (on sale for $85 at the time of writing; that may no longer be the case as you read this).

When running the latest LINQPad, the debugging feature adds some new buttons to the familiar toolbar. All the debugging features are available for both statement and program-based queries in C#, VB, and F# (not expressions or SQL languages). The first new button is the `Pause` button, also known as `Break`. This works as you might expect, pausing the current code execution. The other two specify how exceptions should be handled, telling the debugger whether to break on unhandled exceptions and whether to break when exceptions are thrown. Breakpoints can be added by clicking in the margin to the left of the code or pressing `F9` when the caret is on the desired line. When a breakpoint is active on a line, it is indicated by a large red circle. For those who regularly use Visual Studio, the breakpoint and general debugger experience will be familiar.

Pressing `F5` will run the query (or selected lines) as usual, but now any breakpoints set on executing lines will cause the code to break. At this point, LINQPad will reveal some familiar and not-so-familiar tools for debugging the code. General status information is displayed at the bottom of the LINQPad window, showing things like whether the code is executing or paused, whether the debugger is attached or not, and the process ID.

The next code statement to execute is highlighted in the code with a yellow arrow in the margin (in this case, overlaid on the breakpoint circle), and the code highlighted in yellow. In the lower left portion of the screen, we can see local variables and executing threads. We can also set up our own watches as necessary. Any objects in the `Locals` and `Watch` tabs can be expanded using the `+` glyph to reveal their constituent values. As in Visual Studio, these tabs allow the expansion of just-in-time LINQ queries so you can delve into the deep dark secrets of your code. However, you can also take advantage of LINQPad's fantastic dump feature and dump any value out to the `Results` tab on the right. If you want to control how far down the object graph a dump will go, you can modify the `Dump Depth` using the `+` and `-` controls in the column header.

The `Dump` output for the `range` variable
Specifying the depth of the dump

For more information on LINQPad and its many features, check out the LINQPad website (http://linqpad.net). In my opinion, whether you use the free version or one of the paid upgrades, you will have one of the best coding utilities available for .NET.

Caching with LINQPad.Extensions.Cache

One of the tools that I absolutely adore during my day-to-day development is LINQPad. If you are not familiar with this tool and you are a .NET developer, you should go to www.linqpad.net right now and install it. The basic version is free and feature-packed, though I recommend upgrading to the professional version. Not only is it inexpensive, but it also adds some great features like IntelliSense[1] and NuGet package support.

I generally use LINQPad as a simple coding environment for poking around my data sources, crafting quick coding experiments, and debugging. Because LINQPad does not have the overhead of a solution or project that a development-oriented tool such as Visual Studio has, it is easy to get stuck into a task. I no longer write throwaway console or WinForms apps; instead I just throw together a quick LINQPad query. I could go on about the virtues of this tool[2], but I would like to touch on one of its utility features.

As part of LINQPad, you get some useful methods and types for extending LINQPad, dumping information to LINQPad's output window, and more. Two of these methods are LINQPad.Extensions.Cache and Util.Cache. Using either Cache method, you can execute some code and cache the result locally, then use the cached value for all subsequent runs of that query. This is incredibly useful for caching the results of an expensive database query or computationally intensive calculation. To cache an IEnumerable<T> or IObservable<T> you can do something like this:

var thingThatTakesALongTime = from x in myDB.Thingymabobs
                              where x.Whatsit == "thingy"
                              select x;
var myThing = LINQPad.Extensions.Cache(thingThatTakesALongTime);

Or, since it's an extension method,

var myThing = thingThatTakesALongTime.Cache();

For other types, Util.Cache will cache the result of an expression.

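// SomethingExpensive() is a placeholder for your own costly computation;
// the lambda must return the value you want cached.
var x = Util.Cache(() => SomethingExpensive());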

The first time I run my LINQPad code, my lazily evaluated query or the expression is executed by the Cache method and the result is cached. From then on, each subsequent run of the code uses the cached value. Both Cache methods also take an optional name for the cached item, in case you want to differentiate items that might otherwise be indistinguishable (such as caching a loop computation).

This is, as I alluded to earlier, one of many utilities provided within LINQPad that make it a joy to use. What tools do you find invaluable? Do you already use LINQPad? What makes it a must-have tool for you? I would love to hear your responses in the comments.

Updated to correct casing of LINQPad, draw attention to Cache being an extension method for some uses, and add a note about Util.Cache[3].

  1. including for imported data types from your data sources
  2. such as its support for F#, C#, SQL, etc. or its built-in IL disassembly
  3. because, apparently, I am not observant to this stuff the first time around. SMH

Deserializing from a sequence of bytes

I was working on some file-based persistence today and found myself needing to load a string from an array of bytes that represented that string's characters. There is, of course, more than one way to skin this particular cat, but I took it as an opportunity to play around with extension methods.

The first way I found to do this was to have a method with an iterator block that took each pair of bytes and used the BitConverter.ToChar() method to yield a character.

public static IEnumerable<char> ToChars(this IEnumerable<byte> sequence)
{
    int counter = 0;
    byte[] bytes = new byte[2];

    foreach (var b in sequence)
    {
        bytes[counter++ % 2] = b;
        if (counter % 2 == 0)
        {
            yield return BitConverter.ToChar(bytes, 0);
        }
    }
}

Turning the results of this method into an array and using the appropriate string constructor meant job done. I realise that a little more work is needed to check we have a non-null sequence with an even number of bytes, but this is just illustrative. However, what if I had wanted a sequence of integers or doubles or some other type?
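To make that last step concrete, here is a small round-trip sketch. It repeats the ToChars method from above so that it compiles on its own; the `RoundTripDemo` class and its `RoundTrip` method are invented for illustration, and the sketch assumes the bytes were written as two-byte characters in the machine's own byte order, which is what BitConverter.ToChar expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ByteSequenceExtensions
{
    // Same approach as the ToChars method above: combine each pair of bytes into a char.
    public static IEnumerable<char> ToChars(this IEnumerable<byte> sequence)
    {
        int counter = 0;
        byte[] bytes = new byte[2];

        foreach (var b in sequence)
        {
            bytes[counter++ % 2] = b;
            if (counter % 2 == 0)
            {
                yield return BitConverter.ToChar(bytes, 0);
            }
        }
    }
}

public static class RoundTripDemo
{
    public static string RoundTrip(string text)
    {
        // Simulate the persisted data: two bytes per character,
        // in the byte order BitConverter uses on this machine.
        byte[] stored = text.SelectMany(c => BitConverter.GetBytes(c)).ToArray();

        // Turn the characters into an array and use the string(char[]) constructor.
        return new string(stored.ToChars().ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("Hello, LINQ")); // prints "Hello, LINQ"
    }
}
```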

A more elegant solution would be to create a partitioning method that took the sequence of bytes and returned a sequence of smaller sequences.

public static IEnumerable<byte[]> Partition(this IEnumerable<byte> sequence, int partitionSize)
{
    bool any = false;

    int partitionIndex = 0;
    byte[] partition = new byte[partitionSize];
    foreach (var b in sequence)
    {
        any = true;
        partition[partitionIndex++] = b;

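        if (partitionIndex >= partitionSize)
        {
            yield return partition;

            // Start a fresh buffer so callers that hold on to the yielded
            // array (for example via ToList()) do not see it overwritten.
            partition = new byte[partitionSize];
            partitionIndex = 0;
        }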
    }

    // We have a partial partition to yield.
    if (any && (partitionIndex != 0))
    {
        yield return partition;
    }
}

This time, we've got ourselves a sequence of byte arrays. To get the characters we would've got from the previous example, we have to perform a quick Select() call on the sequence.

var sequenceOfChar = sequenceOfBytes
    .Partition(sizeof(char))
    .Select(x => BitConverter.ToChar(x, 0));

Of course, if we wanted a sequence of integers, the call would be a little different.

var sequenceOfInt32 = sequenceOfBytes
    .Partition(sizeof(int))
    .Select(x => BitConverter.ToInt32(x, 0));

There's a little more polish required to cope with null sequences and there's no guarantee that the last array in the partitioned sequence will have enough bytes for a full partition. Finally, in my examples this relies on the data being persisted in the order expected by the BitConverter calls, but you could manage that yourself depending on your own circumstances.
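One possible shape for that polish is sketched below. This is only an illustration under my own assumptions: the names `PartitionStrict` and `ToInt32LittleEndian` are invented, the sketch chooses to throw on a short final partition rather than yield it, and it assumes the integers were persisted in little-endian order. The argument checks live in a non-iterator wrapper because code inside an iterator block does not run until enumeration begins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SafePartitionExtensions
{
    // Validates eagerly, then hands off to the iterator. Without this split,
    // the checks would not run until the caller started enumerating.
    public static IEnumerable<byte[]> PartitionStrict(this IEnumerable<byte> sequence, int partitionSize)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (partitionSize <= 0) throw new ArgumentOutOfRangeException(nameof(partitionSize));
        return PartitionIterator(sequence, partitionSize);
    }

    private static IEnumerable<byte[]> PartitionIterator(IEnumerable<byte> sequence, int partitionSize)
    {
        int index = 0;
        var partition = new byte[partitionSize];
        foreach (var b in sequence)
        {
            partition[index++] = b;
            if (index == partitionSize)
            {
                yield return partition;
                partition = new byte[partitionSize]; // fresh buffer per partition
                index = 0;
            }
        }

        // Refuse to guess what a short final partition should mean.
        if (index != 0)
        {
            throw new InvalidOperationException(
                $"Sequence ended with {index} leftover byte(s); expected a multiple of {partitionSize}.");
        }
    }

    // Decodes 32-bit integers persisted in little-endian order,
    // regardless of the byte order of the machine doing the reading.
    public static IEnumerable<int> ToInt32LittleEndian(this IEnumerable<byte> sequence)
    {
        return sequence
            .PartitionStrict(sizeof(int))
            .Select(chunk =>
            {
                if (!BitConverter.IsLittleEndian) Array.Reverse(chunk);
                return BitConverter.ToInt32(chunk, 0);
            });
    }

    public static void Main()
    {
        byte[] stored = { 1, 0, 0, 0, 0, 1, 0, 0 };
        Console.WriteLine(string.Join(", ", stored.ToInt32LittleEndian())); // prints "1, 256"
    }
}
```

Throwing on leftover bytes is just one policy; padding the final partition or silently dropping it may suit other data better.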

Is this useful to anyone else? Is there a better way to achieve the same goals?