Compiler – somewhat abstract

Patterns and Collection Initializers

Some of the cool parts of C# are pattern-based, rather than type-based as one might expect. For example, `foreach` does not need the enumerated type to implement `IEnumerable` in order to work; it just requires that the type has a suitable `GetEnumerator()` method. Collection initializers are another place where pattern-based compilation occurs, and they illustrate how useful this approach can be:

var list = new List<int> { 1, 2, 3, 4, 5, 6 };
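Returning to `foreach` for a moment, the pattern-based lookup can be sketched with a type that deliberately does not implement `IEnumerable`. The `Countdown`, `CountdownEnumerator`, and `CountdownDemo` names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: it does NOT implement IEnumerable.
public class Countdown
{
    private readonly int _start;

    public Countdown(int start) { _start = start; }

    // foreach only needs this method to exist and to return a suitable enumerator.
    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(_start);
    }
}

// The enumerator follows a pattern too: a MoveNext() method and a Current property.
public class CountdownEnumerator
{
    private int _next;

    public CountdownEnumerator(int start) { _next = start + 1; }

    public int Current { get { return _next; } }

    public bool MoveNext()
    {
        if (_next <= 1) return false; // stop once 1 has been produced
        _next--;
        return true;
    }
}

public static class CountdownDemo
{
    // Enumerates the countdown with foreach and collects what it saw.
    public static List<int> Collect()
    {
        var seen = new List<int>();
        foreach (int value in new Countdown(3))
        {
            seen.Add(value);
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect())); // prints "3, 2, 1"
    }
}
```

Neither class mentions `IEnumerable` or `IEnumerator`, yet the `foreach` compiles because the compiler finds the members it needs by name and shape.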

When this gets compiled, the C# compiler¹ looks, for each value in the initializer, for an `Add()` method on the collection type that takes an appropriate number of arguments of the appropriate types, and then calls it for that value. The benefit of using a pattern-based approach is that the compiler does not need to know about every possible compatible type up front or what `Add()` methods it might support. It only enforces that the type implements `IEnumerable` and that it has an `Add()` method that matches the initializer values. This allows us to create collection types that support a variety of different ways to add values without the compiler needing to know that our type will ever exist. For example, we could create a collection of names with `Add()` methods that take one or two strings and then initialize elements with either just the surname or the first name and surname².

var names = new NamesCollection
{
   "Jones",
   { "David", "Smith" },
   { "Daniel", "Smith" },
   "Smith"
};
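The post does not show `NamesCollection` itself, so here is a minimal sketch of what such a type might look like. Everything in it is an assumption made for illustration, including the backing list, the way the two strings are joined, and the `NamesDemo` helper.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical implementation; the real type could store names any way it likes.
public class NamesCollection : IEnumerable<string>
{
    private readonly List<string> _names = new List<string>();

    // Matches initializer elements such as "Jones".
    public void Add(string surname)
    {
        _names.Add(surname);
    }

    // Matches initializer elements such as { "David", "Smith" }.
    public void Add(string firstName, string surname)
    {
        _names.Add(firstName + " " + surname);
    }

    // Implementing IEnumerable is what allows the collection initializer syntax.
    public IEnumerator<string> GetEnumerator() { return _names.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class NamesDemo
{
    public static NamesCollection Build()
    {
        // Each element is translated into a call to the matching Add() overload.
        return new NamesCollection
        {
            "Jones",
            { "David", "Smith" },
            { "Daniel", "Smith" },
            "Smith"
        };
    }

    public static void Main()
    {
        foreach (string name in Build())
        {
            Console.WriteLine(name);
        }
    }
}
```

The compiler picks the one-argument or two-argument `Add()` per element, based purely on the shape of each entry in the initializer.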

Collection initializers in C#6

In C#6, a new collection initializer syntax has been added and the way the compiler interprets the existing syntax has been modified. Before we look at the newly added syntax, let us look at how the compilation of the existing syntax has changed. To do so, consider a collection of `DateTimeOffset` values where we want to simplify adding dates and times from parsable string values. To support this, we could implement an entirely new type with the appropriate calls, or we could derive from an existing collection type, such as `List<DateTimeOffset>`, and then implement a new `Add()` method to support `string`.

public class DateTimeOffsetList : List<DateTimeOffset>
{
    public void Add(string value)
    {
        base.Add(DateTimeOffset.Parse(value));
    }
}

Of course, not all collections are open for extension, and creating new types for this is cumbersome: we want a list of `DateTimeOffset`; we just happen to want to initialize it from another type. To get around sealed types and the need to implement wrapper types or derivations, VB.NET has supported using extension methods to expand the `Add()` options on a type. I like this idea since, in the previous example, our list is really still a list of `DateTimeOffset` and we want others to see it that way. We just happen to support adding `string` values, so why should we be forced to use a different type for that? Alas (cue Top Gear voice), this feature was not included in C#…until now. As of C#6, this disparity between VB.NET and C# is no more; the compiler will use a matching `Add()` extension method when the type itself has no appropriate `Add()` method.

public static class Extensions
{
    public static void Add(this List<DateTimeOffset> list, string value)
    {
        list.Add(DateTimeOffset.Parse(value));
    }
}
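To see such an extension method being picked up by a collection initializer, here is a small sketch that needs a C#6 or later compiler. The class names, the sample date strings, and the use of `CultureInfo.InvariantCulture` (added so that parsing does not depend on the machine's culture) are my own assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DateTimeOffsetListExtensions
{
    // Extension Add() that lets a plain List<DateTimeOffset> accept strings.
    public static void Add(this List<DateTimeOffset> list, string value)
    {
        // The argument is a DateTimeOffset, so this resolves to the list's own Add().
        list.Add(DateTimeOffset.Parse(value, CultureInfo.InvariantCulture));
    }
}

public static class ExtensionAddDemo
{
    public static List<DateTimeOffset> Build()
    {
        // String elements use the extension method; DateTimeOffset elements
        // still use List<DateTimeOffset>.Add().
        return new List<DateTimeOffset>
        {
            "2015-07-20T09:30:00+01:00",
            new DateTimeOffset(2015, 1, 1, 0, 0, 0, TimeSpan.Zero),
            "2015-12-25T00:00:00+00:00"
        };
    }

    public static void Main()
    {
        foreach (DateTimeOffset value in Build())
        {
            Console.WriteLine(value.ToString("o", CultureInfo.InvariantCulture));
        }
    }
}
```

The variable is still a plain `List<DateTimeOffset>`, so callers never see a special type; only the initialization code benefits from the extra `Add()` overload.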

Interestingly, this change to how C# resolves `Add()` methods for collection initializers is very specific: in C#6, it does not extend to the `GetEnumerator()` lookup performed for `foreach`. I am not certain why this is so, since I can imagine some cases where enumerating an existing non-enumerable type might be quite nice³, though I expect it is because it would not be clear what was going to get enumerated and, therefore, the code would be ambiguous and hard to follow⁴. The `Add()` method usage in an initializer does not have this ambiguity, as the compiler makes it clear whether it found a suitable `Add()` method that matches both the collection type and the type of the element being added.

Index Initializers

The other change to collection initializers in C#6 is the introduction of index initializer syntax. This new syntax is similar to the existing collection initializer syntax we have discussed, except that instead of using `Add()` methods, it uses indexers. With index-based collection initialization, we can specify values for specific indices in a collection. This works for any accessible indexer that a collection implements. Traditionally, we might initialize a `Dictionary<string,string>` using the `Add()` method pattern like this:

var pairs = new Dictionary<string, string>
{
    {"address","12 This Way, Anywhere"},
    {"name", "Bob Foo"},
    {"title", "Master of the universe"}
};

But with the index initializer syntax, we can make it clear that one string indexes the other to make this much more readable as:

var pairs = new Dictionary<string, string>
{
    ["address"] = "12 This Way, Anywhere",
    ["name"] = "Bob Foo",
    ["title"] = "Master of the universe"
};

I cannot speak for anyone else, but I think this really makes the code easier to read. Note, however, that this new index syntax cannot be mixed with traditional initializer syntax; for example, the following is invalid:

var pairs = new Dictionary<string, string>
{
    {"address","12 This Way, Anywhere"},
    ["name"] = "Bob Foo",
    ["title"] = "Master of the universe"
};

I think it is okay that they cannot be mixed. One way is using `Add()` method overload resolution to set values and the other is using indexers; these use different semantics and often have different implementations and connotations. By mixing them, the code becomes muddled and loses meaning; are we specifying records in a collection or are we mapping specific indexes to their records?
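One concrete way the two forms differ shows up with repeated keys. The sketch below, with hypothetical class and method names, relies on documented `Dictionary<TKey, TValue>` behavior: `Add()` throws an `ArgumentException` for a key that already exists, whereas the indexer setter overwrites the existing value.

```csharp
using System;
using System.Collections.Generic;

public static class InitializerSemanticsDemo
{
    // Index initializers become indexer assignments, so a repeated key overwrites.
    public static Dictionary<string, string> WithIndexers()
    {
        return new Dictionary<string, string>
        {
            ["name"] = "Bob Foo",
            ["name"] = "Bob Bar"
        };
    }

    // Add()-style initializers become Add() calls, so a repeated key throws.
    public static Dictionary<string, string> WithAdd()
    {
        return new Dictionary<string, string>
        {
            { "name", "Bob Foo" },
            { "name", "Bob Bar" }
        };
    }

    // Returns true when the Add()-style initializer rejects the duplicate key.
    public static bool AddRejectsDuplicate()
    {
        try
        {
            WithAdd();
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WithIndexers()["name"]); // prints "Bob Bar"
        Console.WriteLine(AddRejectsDuplicate());  // prints "True"
    }
}
```

Both initializers compile, but only the index-based one completes at run time, which is a practical reason to treat the two syntaxes as saying different things.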

In Conclusion

Both of these changes to collection initialization are reasonably subtle. Of all the features C#6 brings us, these are perhaps going to be used the least. In fact, when I started writing this post I was unsure of their value. However, as I wrote and thought of usage examples, I came to the realisation that although they cater to perhaps infrequent scenarios, these changes to collection initializers each provide nice additions to the C# language. Index initializers remove a little ambiguity from the initialization of indexed collections, such as dictionaries, whereas the expansion of `Add()` method overload resolution to include extension methods reduces the number of frivolous types we have to create. In short, they allow us to write simpler, clearer code, and that is a beautiful thing.

¹ Pre-C#6.
² A contrived example to be sure, but illustrative nonetheless.
³ Such as enumerating the lines from a file stream.
⁴ Much clearer to write a `LineEnumerator` wrapper for `FileStream` and use it explicitly.

Magic

Of course, to end the story there without discussing the compiler magic would do a disservice to this new feature. In the example above, the result of the interpolation was stored in a variable declared with `var`. The compiler therefore infers the type as `string` and turns our interpolated string into a call to `string.Format()`. This means that we don't have to do anything else to use this feature and get formatted strings. However, we can make the compiler do something different by rewriting the line like this²:

FormattableString formatString = $"The string, {aString}, has a {aString.Length} characters.";

We have now specified that we are using a variable of type `FormattableString`. With this declaration, the compiler changes its behavior and we get a `FormattableString` object that represents the interpolated string. From this object, we can get the `Format` string, which could be passed to a call that takes a format string, such as `string.Format()` (there are several others in types like `Console`, `StringBuilder`, and `TextWriter`). We can also retrieve the number of arguments³ in the string using `ArgumentCount`, and use `GetArgument()` and `GetArguments()` to retrieve the values of those arguments. Using a combination of `Format` and `GetArguments()`, we can pass this information to a different call that might reuse or extend it to produce a different message. Finally, we can use the `ToString()` overload that takes an `IFormatProvider`, allowing us to format the string according to a specific culture.
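The members described above can be exercised with a short sketch. It assumes a C#6 compiler and a framework that provides `System.FormattableString`; the `FormattableStringDemo` class and the `"Hello"` input are hypothetical, while the interpolated string is the one from the example.

```csharp
using System;
using System.Globalization;

public static class FormattableStringDemo
{
    // Returning FormattableString makes the compiler keep the format and
    // the arguments separate instead of producing a finished string.
    public static FormattableString Build(string aString)
    {
        return $"The string, {aString}, has a {aString.Length} characters.";
    }

    public static void Main()
    {
        FormattableString message = Build("Hello");

        // The composite format string, with numbered placeholders.
        Console.WriteLine(message.Format);

        // One argument per inserted value, already evaluated.
        Console.WriteLine(message.ArgumentCount);
        foreach (object argument in message.GetArguments())
        {
            Console.WriteLine(argument);
        }

        // Reuse the pieces with any API that accepts a format string.
        Console.WriteLine(string.Format(message.Format, message.GetArguments()));

        // Format with an explicit IFormatProvider.
        Console.WriteLine(message.ToString(CultureInfo.InvariantCulture));
    }
}
```

For this input, `Format` is `The string, {0}, has a {1} characters.` and `ArgumentCount` is 2, with the arguments holding the string and its length.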

By telling the compiler that we want a `FormattableString` we get all this extra information to use as we see fit. If you look at the arguments using either of the `Get..` methods, you will see that the values have already been evaluated, so you can be assured that they won't change as you process the string. I'm sure there are situations where you might find this additional access to the formatting invaluable, such as when creating compound error messages, or perhaps doing some automatic language translation.

In conclusion…

There's not much else for me to say about C#6's string interpolation except to highlight one gotcha that I have hit a couple of times. The next two examples should illustrate appropriately:

Console.WriteLine($"{DateTime.Now}: I'm writing DateTime.Now to the console");

Console.WriteLine("{DateTime.Now}: I'm writing DateTime.Now to the console");

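Here is what these two examples will output: the first writes the current date and time followed by the message, whereas the second writes the literal text `{DateTime.Now}` followed by the message.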

It's hard to argue with either of them; after all, they both wrote an interpretation of `DateTime.Now` to the console, but the first one is perhaps the more useful output⁴.

So why did the second example not work? You may have already spotted the answer to that question, especially if you're a VB programmer; it's the `$` at the start of the first example's string. This `$` tells the compiler that we are providing a string for interpolation. It's an easy thing to miss and if you forget it (or perhaps, in rare cases, add it erroneously) you'll likely only spot the mistake through thorough testing⁵ or customer diligence⁶. As always, learn the failure points and work to mitigate them with code reviews and tests. I suspect the easiest mitigation may be to always use the interpolation style strings unless a situation demands otherwise.

And that's it for this week. What do you think of the new string interpolation support? Will you start using it? If not, why not? Do you have any cool ideas for leveraging the additional information provided by `FormattableString`? Please share in the comments.

¹ The `+` operator can be used in conjunction with `ToString()`, but it can get messy to read and is very hard to localize.

² We could also cast the interpolated string to `FormattableString` and leave the variable as `var`.

³ Each inserted value is an argument.

⁴ Except when providing examples in a blog.

⁵ Unit tests or otherwise.

⁶ Write automated tests and test manually; let's not use customers as QA.


C#6: String Interpolation
