C#7: Throw Expressions and More Expression-bodied Members

In this installment of my look at C#7, we will explore some nice syntactic enhancements, including the first-ever community contribution to the C# language implementation. Before we get started, here is a summary of what I am covering in this series on C#7.

Throw Expressions

We have all written code like this[1]:

using System;

public class MyApiType
{
    private object _loadedResource;
    private object _someProperty;

    public MyApiType()
    {
        _loadedResource = LoadResource();
        if (_loadedResource == null) throw new InvalidOperationException();
    }

    public object SomeProperty
    {
        get
        {
            return _someProperty;
        }

        set
        {
            if (value == null) throw new ArgumentNullException();
            _someProperty = value;
        }
    }

    // Hypothetical stand-in for whatever actually loads the resource.
    private static object LoadResource() => new object();
}

I have omitted the exception arguments for brevity, but you should hopefully recognise the sort of sanity checking I am referring to: the if statements that throw in the constructor and in the set accessor.

With throw expressions, we can now combine assignment, the null-coalescing operator[2], and throw to create succinct validation code. This means that the example above no longer needs a constructor at all.

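using System;

public class MyApiType
{
    // Field initializers cannot call instance members, so LoadResource must be static.
    private object _loadedResource = LoadResource() ?? throw new InvalidOperationException();
    private object _someProperty;

    public object SomeProperty
    {
        get
        {
            return _someProperty;
        }

        set
        {
            _someProperty = value ?? throw new ArgumentNullException();
        }
    }

    // Hypothetical stand-in for whatever actually loads the resource.
    private static object LoadResource() => new object();
}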

The field initializer and the set accessor are equivalent to the checks we had earlier, but now throw is part of the expression itself. Throw expressions also let us throw exceptions from conditional (?:) expressions and from the bodies of expression-bodied lambdas and methods, none of which was previously possible. Not only that, but when combined with expression-bodied members, we can write some very expressive yet terse code.
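To make the other cases a little more concrete, here is a minimal sketch of my own (the ThrowExpressionDemo class and its members are hypothetical, purely for illustration) showing a throw expression in one branch of a conditional operator, as the entire body of a lambda, and inside an expression-bodied method:

```csharp
using System;

public static class ThrowExpressionDemo
{
    // Conditional operator: one branch yields a value, the other throws.
    public static int Half(int value) =>
        value % 2 == 0
            ? value / 2
            : throw new ArgumentException("Value must be even.", nameof(value));

    // A lambda whose entire body is a throw expression (a handy placeholder).
    public static readonly Func<string> NotYetWritten =
        () => throw new NotImplementedException();

    // Null-coalescing operator inside an expression-bodied method.
    public static string Describe(string name) =>
        "Hello, " + (name ?? throw new ArgumentNullException(nameof(name)));
}
```

Calling Half(3) throws an ArgumentException and invoking NotYetWritten() throws a NotImplementedException; until one of those paths is taken, nothing is thrown at all.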

Expression-bodied Members

With C#6 we got expression-bodied members, which allowed us to express simple methods using lambda-like syntax. However, this syntax was limited to methods, operators, and read-only properties and indexers. Via the first-ever community contribution to C#[3], C#7 expands it to cover constructors, finalizers, and property and indexer accessors.

If we take the property example from before, with the throw expression in its set accessor, we can now write it as:

public object SomeProperty
{
    get => _someProperty;
    set => _someProperty = value ?? throw new ArgumentNullException();
}

I won't bother with examples for constructors or finalizers; the main documentation is pretty clear on those, and I am not convinced the syntax will be used very often in those cases. Constructors are rarely so simple that the expression-bodied syntax makes sense, and finalizers are so rarely needed[4] that most of us will not get an opportunity to write one at all, expression-bodied or otherwise.
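That said, the single-assignment constructor does crop up now and again, typically when a constructor simply validates and stores one argument. For the curious, here is a minimal sketch of that case, using a hypothetical Customer type and the same throw-expression validation as before:

```csharp
using System;

public class Customer
{
    private readonly string _name;

    // Expression-bodied constructor (new in C#7): a single assignment,
    // validated with a throw expression.
    public Customer(string name) => _name = name ?? throw new ArgumentNullException(nameof(name));

    // Expression-bodied read-only property (available since C#6).
    public string Name => _name;
}
```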

In Conclusion

These simple additions to the C# syntax enable us to write terse code without losing clarity, which is always a good thing. Not only that, but we have reached a landmark event: community contributions to C#. This contribution may be a little tame when compared with some of the other features coming in C#7, but it bodes well for the future of the language in its new, open source home.

Next time, we will take a look at the highly anticipated pattern matching. Until then, feel free to leave a comment, or read more about C#7 on my blog and on the official documentation.

  1. Let's ignore the nastiness of throwing exceptions during construction
  2. You remember Elvis, right??
  3. Source: https://docs.microsoft.com/en-us/dotnet/articles/csharp/csharp-7#more-expression-bodied-members
  4. If you find yourself writing a finalizer, I recommend you make sure you really need it; there is probably a better way

C#7: Out Variables

Last time, we started to look at the new features introduced in C#7. Here is a quick refresher of just what those features are:

In this post, we will look at one of the simplest additions to the C# language: out variables.

int dummy

How often have you written code like this?

int dummy;

if (int.TryParse(someString, out dummy) && dummy > 0)
{
    // Do something
}

Or this?

double dummy;

if (myDictionary.TryGetValue(key, out dummy))
{
    // Do something
}

Sometimes you use the out value retrieved and sometimes you do not; often you only use it within the scope of the condition. In any case, the variable declaration always hangs awkwardly on its own line, looking more important than it really is and leaving room for it to be given a placeholder value and used before the call has actually set it. Thankfully, C#7 helps us tidy things up by allowing us to combine the variable declaration with the argument.

Using the out variable syntax, we can write this:

if (int.TryParse(someString, out int dummy) && dummy > 0)
{
    //Do something
}

In fact, we do not even need to declare the type of the variable explicitly. While often we want to be explicit to make it clear that the type matters (and to get some compile-time checking of our assumptions), we can use an implicitly typed variable like this:

if (myDictionary.TryGetValue(someKey, out var dummy))
{
    //Do something
}
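Putting those together, here is a small, self-contained sketch (the OutVariableDemo class and its method names are hypothetical) that uses both forms. It also shows one detail worth knowing: in the released C#7, an out variable declared in an if condition is scoped to the enclosing block rather than just the if statement, so it remains usable after the if.

```csharp
using System;
using System.Collections.Generic;

public static class OutVariableDemo
{
    // Returns the parsed number when it is positive, otherwise null.
    public static int? ParsePositive(string someString)
    {
        if (int.TryParse(someString, out int dummy) && dummy > 0)
        {
            return dummy;
        }
        return null;
    }

    // 'out var' lets the compiler infer double from TryGetValue's signature.
    public static double? Find(Dictionary<string, double> myDictionary, string key)
    {
        if (myDictionary.TryGetValue(key, out var dummy))
        {
            return dummy;
        }
        return null;
    }

    // 'result' is still in scope after the if statement. TryParse always
    // assigns it (0 when parsing fails), so it is definitely assigned here.
    public static int ParseOrDefault(string someString)
    {
        if (!int.TryParse(someString, out int result))
        {
            Console.WriteLine($"Could not parse '{someString}'");
        }
        return result;
    }
}
```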

In Conclusion

out variables are part of a wider set of features for reducing repetition (in written code and in run-time execution) and saying more with less (i.e. making it easier for us to infer intent from the code without additional commentary). This is a very simple addition to C# syntax, yet a useful one. Not only does it reduce what we need to type, it also improves code clarity (in my opinion) and reduces the possibility of silly errors, like reading a placeholder value before the call has set it, or worse, thinking that the variable being uninitialized was a mistake and hiding a bug by initializing it.

Until next time, if you would like to tinker with any of the C#7 features I have been covering, I recommend getting the latest LINQPad beta or Visual Studio 2017 RC.

 

C#7: Binary Literals and Numeric Literal Digit Separators

Happy New Year, y'all! I thought I would kick off 2017 with a look at C#7. The next release of Visual Studio will soon be upon us and with it a new version of C#. As with its predecessor, C#6, C#7 brings a variety of syntactical and compiler magic allowing us to do more work with less code. Just as the new features of C#6 enabled us to make code more readable by reducing ceremony and making intent clearer[1], so too do the new features of C#7.

Before we take a closer look, here is an overview of the goodies in C#7:

It is a shorter list than the new features for C#6, but there is still a lot of goodness crammed in there. Over the next few posts, I want to delve into these features just a literal to familiarize myself (and you) with them and how they may impact the code we write. So, without further ado, let's take a look at the first two items on the list: binary literals and numeric literal digit separators.

Binary Literals

Numeric literals are not a new concept in C#. We have been able to define integer values in base-10 and base-16 since C# was first released. Common use cases for base-16 (also known as hexadecimal) literals include defining flags and bit masks in enumerations and constants. Since each digit in a base-16 number is 4 bits wide, the bits within a single digit have the values 1, 2, 4, and 8.

[Flags]
public enum Option
{
    None    = 0x00,
    Option1 = 0x01,
    Option2 = 0x02,
    Option3 = 0x04,
    Option4 = 0x08,
    Option5 = 0x10,
    Option6 = 0x20,
    Option7 = 0x40,
    Option8 = 0x80,
    All     = 0xFF
}

While this is familiar to most, using C#7 we can now express such things explicitly in base-2, more commonly referred to as binary. While hexadecimal literals are prefixed with 0x, binary literals are prefixed with 0b.

[Flags]
public enum Option
{
    None    = 0b00000000,
    Option1 = 0b00000001,
    Option2 = 0b00000010,
    Option3 = 0b00000100,
    Option4 = 0b00001000,
    Option5 = 0b00010000,
    Option6 = 0b00100000,
    Option7 = 0b01000000,
    Option8 = 0b10000000,
    All     = 0b11111111
}

Although I am used to using base-16 numbers for this, I can see value in being explicit by using binary literals. The strength comes when more than one bit is set. When using base-16, it is easy to make a mistake, and it is not immediately obvious which bits are set by a specific value[2]. With binary literals, it is immediately obvious without additional, potentially erroneous side calculations.
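As a quick illustration, here is a sketch that reuses the binary Option enumeration from above together with a hypothetical FlagDemo class. Combining Option2, Option4, and Option5 gives a value whose set bits can be read straight off the binary literal, whereas its hexadecimal equivalent, 0x1A, needs a moment of mental arithmetic:

```csharp
using System;

[Flags]
public enum Option
{
    None    = 0b00000000,
    Option1 = 0b00000001,
    Option2 = 0b00000010,
    Option3 = 0b00000100,
    Option4 = 0b00001000,
    Option5 = 0b00010000,
    Option6 = 0b00100000,
    Option7 = 0b01000000,
    Option8 = 0b10000000,
    All     = 0b11111111
}

public static class FlagDemo
{
    // Option2 | Option4 | Option5 sets bits 1, 3 and 4 (values 2, 8 and 16).
    public const Option Selected = Option.Option2 | Option.Option4 | Option.Option5;

    // The same mask written both ways; the binary form shows the bits directly.
    public const int SelectedAsBinary = 0b00011010;
    public const int SelectedAsHex    = 0x1A;
}
```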

Digit Separators

Of course, binary values can get big fast and keeping track of which things line up with which can be fraught with problems. Sure, we can try to line up the values, but what if the indentation gets one space off? Will we really notice during that code review?

To help with readability like this and to assist in avoiding silly off-by-one issues that can arise due to misaligned values, C#7 introduces _ as a digit separator for all numeric literals. This separator is stripped out by the compiler; it is just syntactical candy to aid readability and serves no purpose within the compiled code. For example, our enumeration above that uses binary literals can be rewritten as follows:

[Flags]
public enum Option
{
    None    = 0b0000_0000,
    Option1 = 0b0000_0001,
    Option2 = 0b0000_0010,
    Option3 = 0b0000_0100,
    Option4 = 0b0000_1000,
    Option5 = 0b0001_0000,
    Option6 = 0b0010_0000,
    Option7 = 0b0100_0000,
    Option8 = 0b1000_0000,
    All     = 0b1111_1111
}

I think this really does help with readability, although I was disappointed to find that I could not use this separator directly after the base modifier (a restriction that C# 7.2 later lifted). I do not know about anyone else, but it seems more readable to separate the modifier from the actual value. Thankfully, we can pad the left of our number with zeroes as long as the value we define fits into the type we are assigning.

byte a = 0b_0000_0001;  //INVALID in C# 7.0: Digit separator cannot be at the start or end of the value
byte b = 0b1_0000_0001; //INVALID: 257 doesn't fit in a byte
byte c = 0b0_0000_0001; //VALID: 1 fits into a byte just fine

I suspect we may start seeing code that uses this "padding plus separator" approach once C#7 gets wider acceptance as I think it really improves readability; 0b0_0001_0000 is clearer to me than 0b0001_0000.

In addition, the digit separator is not limited to just binary numeric literals; it can be used in any numeric literal. For example, use it to separate 32-bit parts of a large hexadecimal number, or as a thousands separator in a floating point value; anywhere that it improves readability.
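Here is a brief sketch of those other uses, in a hypothetical SeparatorDemo class; each constant compiles to exactly the same value as its separator-free equivalent:

```csharp
using System;

public static class SeparatorDemo
{
    // Separating the two 32-bit halves of a 64-bit hexadecimal value.
    public const long Big = 0x01234567_89ABCDEF;

    // Thousands separators in integer and floating point literals.
    public const int Million = 1_000_000;
    public const double Price = 1_234_567.89;

    // The "padding plus separator" style from above: still just 16.
    public const byte Padded = 0b0_0001_0000;
}
```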

In Conclusion

The new binary literal syntax and digit separator should help to make intent clearer and code easier to read when used appropriately. As with any language feature, we must always use our best judgement to ensure it is being used appropriately. For more information on the features covered in this post, see the official documentation where you can also discover other C#7 magic that I will be covering in my upcoming posts.

  1. Things like read-only auto-properties, expression-bodied member functions, exception filters, null-conditional operators, and the nameof operator to name a few
  2. I know some can see in hex, and that's great, but not everyone is so adept

LINQ: Notation, Syntax, and Snags

Welcome to the final post in my four-part series on LINQ. So far, we've talked about:

For our last look into LINQ (at least for this mini-series), I want to tackle the mini-war of "dot notation" versus "query syntax", and look at some of the pitfalls that can be avoided by using LINQ responsibly.

Let Battle Commence…

For anyone who has written LINQ using C# (or VB.NET), you are probably aware that there is more than one way to express a query (two of which sane people might use):

  1. Old school static method calls
  2. Method syntax
  3. Query syntax

No one in their right mind should be using the first of these options; extension methods were invented to alleviate the pain that would be caused by writing LINQ this way[1]. Extension methods, static methods that can be called as though they were instance methods, are the reason why we have the second option, method syntax (more commonly known as dot notation or fluent notation). The final option, query syntax, is often described as "syntactic sugar": a set of language keywords that make coding easier. These keywords map to concepts found in the LINQ methods, and query syntax is what gives LINQ its name: Language INtegrated Query[2].

They all map to the same thing: a sequence of method calls that are either executed directly or translated into an expression tree that a LINQ provider evaluates and executes. Anything written using one of these approaches can be written using the others. There is often contention over whether to use dot notation or query syntax, as if one were inherently better than the other, but as we all know, only the Sith deal in absolutes[3]. Hopefully, by the end of these examples you will see how each has its merits.

Why are LINQ queries not always called like regular methods?

Because sometimes, such as in LINQ to SQL or LINQ to Entities (Entity Framework), the method calls need to be translated into SQL or some other query language, allowing queries to take advantage of server-side query optimizations. For a more in-depth look at all things LINQ, including the way the language keywords map to the method calls, I recommend looking at Jon Skeet's Edulinq series, which is available as a handy e-book.

Before we begin, here is a quick summary of the C# keywords that we have for writing queries in query syntax: `from`, `group`, `orderby`, `let`, `join`, `where` and `select`. There are also supporting keywords that are used in conjunction with one or two of the main keywords: `in`, `into`, `ascending`, `descending`, `by`, `on` and `equals`. Each of these keywords has a corresponding method or methods in LINQ, although the mapping can sometimes be a little more complicated, as we shall see.
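To give a feel for that mapping, here is a small sketch over a hypothetical array of first names. The second method is roughly what the compiler produces for the first; note how `let` becomes an extra `Select()` that carries the original item and the new value along together in an anonymous type, which is one of those "a little more complicated" cases:

```csharp
using System;
using System.Linq;

public static class QueryTranslationDemo
{
    static readonly string[] Names = { "Sarah", "John", "Gareth", "Mabel", "Alex" };

    // Query syntax using from, let, where, orderby and select.
    public static string[] WithQuerySyntax() =>
        (from name in Names
         let length = name.Length
         where length > 4
         orderby name
         select name.ToUpperInvariant()).ToArray();

    // Roughly the compiler's translation: 'let' introduces an anonymous type
    // holding both the original name and the computed length.
    public static string[] WithDotNotation() =>
        Names
            .Select(name => new { name, length = name.Length })
            .Where(x => x.length > 4)
            .OrderBy(x => x.name)
            .Select(x => x.name.ToUpperInvariant())
            .ToArray();
}
```

Both methods return GARETH, MABEL and SARAH, in that order.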

So, let us look at an example and see how it can be expressed using dot notation and query syntax[4]. We will start with a simple projection of people to their last names.

public struct Person
{
    public Person(string first, string last, DateTimeOffset dateOfBirth) : this()
    {
        FirstName = first;
        LastName = last;
        DateOfBirth = dateOfBirth;
    }
    
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public DateTimeOffset DateOfBirth { get; private set; }
}

public static class Data
{
    public static IEnumerable<Person> People = new[] {
        new Person("John", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 34 )),
        new Person("Bill", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 20 )),
        new Person("Sarah", "Allans", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 19 )),
        new Person("John", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 44 )),
        new Person("James", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 78 )),
        new Person("Alex", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 30 )),
        new Person("Mabel", "Thomas", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 29 )),
        new Person("Sarah", "Brown", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 23 )),
        new Person("Gareth", "Smythe", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 100 )),
        new Person("Gregory", "Drake", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 90 )),
        new Person("Michael", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 56 )),
        new Person("Alex", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 22 )),
        new Person("William", "Pickwick", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 17 )),
        new Person("Marcy", "Dickens", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 18 )),
        new Person("Erica", "Waters", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 26 ))
    };
}

// Dot notation
var lastNames = Data.People.Select(p => p.LastName);

// Query syntax
var lastNames = from person in Data.People
                select person.LastName;

These two queries do the exact same thing, but I find that the dot notation wins out because it takes less typing and it looks clearer. However, if we decide we want to only get the ones that were born before 1980, things look a little more even.

// Dot notation
var lastNames = Data.People
    .Where(p => p.DateOfBirth.Year < 1980)
    .Select(p => p.LastName);

// Query syntax
var lastNames = from person in Data.People
                where person.DateOfBirth.Year < 1980
                select person.LastName;

Here, there is not much difference between them, so I'd probably leave this to personal preference[5]. However, as soon as we want a distinct list, dot notation starts to win out again, because C# does not have a `distinct` keyword (though VB.NET does).

Mixing dot notation and query syntax in a single query can look messy, as shown here:

var lastNames = (from person in Data.People
                 where person.DateOfBirth.Year < 1980
                 select person.LastName).Distinct();

So, I prefer to settle on just one style of LINQ declaration for any particular query, or to use intermediate variables and separate the query into parts (this is especially useful on complex queries as it also provides clarity; being terse is cool, but it is unnecessary, and a great way to get people to hate you and your code).

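// Dot notation
var distinctLastNames = Data.People
    .Where(p => p.DateOfBirth.Year < 1980)
    .Select(p => p.LastName)
    .Distinct();

// Query syntax with an intermediate variable
var lastNames = from person in Data.People
                where person.DateOfBirth.Year < 1980
                select person.LastName;
var distinctLastNames = lastNames.Distinct();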

The `Distinct()` method is not the only LINQ method that has no query syntax alternative; there are plenty of others, like `Aggregate()`, `Except()`, or `Range()`. This often means that dot notation wins out, or is at least part of a query written in query syntax. So, thus far, dot notation seems to have the advantage in the battle against query syntax. It is starting to look like some of my colleagues are right: query syntax sucks. Even if we use ordering or grouping, dot notation seems to be our friend, or at least is no more painful than query syntax:

// Dot notation
var byBirthYear = Data.People
    .OrderBy(p => p.LastName)
    .ThenBy(p => p.FirstName)
    .GroupBy(p => p.DateOfBirth.Year);

// Query syntax
var byBirthYear = from person in Data.People
                  orderby person.LastName,
                          person.FirstName
                  group person by person.DateOfBirth.Year;
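To show the mixed style mentioned earlier in practice, here is a minimal sketch (the class name, method name and numbers are made up for illustration) in which `Enumerable.Range()` supplies the source of a query written in query syntax, and `Except()` and `Aggregate()`, which have no keywords, are applied in dot notation to an intermediate variable:

```csharp
using System;
using System.Linq;

public static class NoKeywordDemo
{
    public static int SumOfEvenSquaresExceptFour()
    {
        // Range() has no query keyword, so it is simply called to provide the source.
        var evenSquares = from n in Enumerable.Range(1, 10)
                          where n % 2 == 0
                          select n * n;

        // Except() and Aggregate() have no keywords either, so the query
        // continues in dot notation on the intermediate variable.
        return evenSquares
            .Except(new[] { 4 })
            .Aggregate(0, (total, square) => total + square);
    }
}
```

Here the squares 16, 36, 64 and 100 survive the `Except()` call, so the method returns 216.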

However, it is not always so easy. What if we want to introduce variables, group something other than the original object, or use more than one source collection? It is in these scenarios that query syntax irons out a lot more of the complexity. Let's assume we have another collection containing newsletters that we need to send out to all our people. To generate the individual mailings, we would need to combine these two collections[6].

// Dot notation
var mailings = Data.People.SelectMany(
    person => newsletters,
    (person, newsletter) => new { person, newsletter });

// Query syntax
var mailings = from person in Data.People
               from newsletter in newsletters
               select new { person, newsletter };

I know which one is clearer to read and easier to remember when I need to write a similar query. The dot notation example makes me think for a minute what it is doing; projecting each person to the newsletters collection and, using `SelectMany()`, flattening the list then selecting one result per person/newsletter combination. Our query syntax example is doing the same thing, but I don't need to think too hard to see that. Query syntax is starting to look useful.
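If you want to convince yourself that the two forms really are the same, here is a small, self-contained sketch that uses plain strings as hypothetical stand-ins for `Data.People` and the newsletters collection, and builds the mailings both ways:

```csharp
using System;
using System.Linq;

public static class MailingDemo
{
    // Hypothetical stand-ins for Data.People and the newsletters collection.
    static readonly string[] People = { "John Smith", "Sarah Allans" };
    static readonly string[] Newsletters = { "Weekly", "Monthly", "Annual" };

    public static string[] WithQuerySyntax() =>
        (from person in People
         from newsletter in Newsletters
         select person + ": " + newsletter).ToArray();

    // The two-argument SelectMany overload: the first lambda picks the inner
    // collection for each person, the second combines each pair into a result.
    public static string[] WithDotNotation() =>
        People.SelectMany(
                person => Newsletters,
                (person, newsletter) => person + ": " + newsletter)
            .ToArray();
}
```

Both methods return the same six mailings in the same order: every newsletter for the first person, then every newsletter for the second.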

If we were to throw in some mid-query variables (useful to avoid calculating something multiple times or to improve clarity), or join collections, query syntax becomes really useful. What if each newsletter is on a different topic and we only want to send newsletters to people who are interested in that topic?

// Dot notation
var mailings = Data.People.SelectMany(
        person => person.Topics,
        (person, topic) => new { person, topic })
    .Join(
        newsletters,
        t => t.topic,
        newsletter => newsletter.Topic,
        (t, newsletter) => new { t.person, newsletter });

// Query syntax
var mailings = from person in Data.People
               from topic in person.Topics
               join newsletter in newsletters on topic equals newsletter.Topic
               select new { person, newsletter };

I know for sure I would need to go and look up how to do that in dot notation[7]. Query syntax is an easier way to write more complex queries like this, and provided that you understand your query chain, you can declare clear, performant queries.
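For anyone who wants to experiment, here is a self-contained sketch of the topic-based mailing. The Subscriber and Newsletter types and their data are hypothetical stand-ins (the Person struct above has no topics), and the second method is the dot notation version from above, adapted to those types:

```csharp
using System;
using System.Linq;

// Hypothetical shapes, for this sketch only.
public class Subscriber
{
    public string Name { get; set; }
    public string[] Topics { get; set; }
}

public class Newsletter
{
    public string Title { get; set; }
    public string Topic { get; set; }
}

public static class TopicMailingDemo
{
    static readonly Subscriber[] People =
    {
        new Subscriber { Name = "John", Topics = new[] { "cars", "films" } },
        new Subscriber { Name = "Sarah", Topics = new[] { "books" } }
    };

    static readonly Newsletter[] Newsletters =
    {
        new Newsletter { Title = "Motoring Monthly", Topic = "cars" },
        new Newsletter { Title = "Page Turner", Topic = "books" },
        new Newsletter { Title = "Garden Notes", Topic = "gardening" }
    };

    public static string[] WithQuerySyntax() =>
        (from person in People
         from topic in person.Topics
         join newsletter in Newsletters on topic equals newsletter.Topic
         select person.Name + " <- " + newsletter.Title).ToArray();

    public static string[] WithDotNotation() =>
        People
            .SelectMany(person => person.Topics, (person, topic) => new { person, topic })
            .Join(Newsletters,
                  t => t.topic,
                  newsletter => newsletter.Topic,
                  (t, newsletter) => t.person.Name + " <- " + newsletter.Title)
            .ToArray();
}
```

Both methods should yield the same two mailings; John's interest in films and the gardening newsletter simply find no match in the join.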

 

In conclusion…

In this post I have attempted to show how both dot notation (aka fluent notation) and query syntax have their vices and their virtues, and in turn, to arm you with the knowledge to choose wisely.

So, think about whether someone can read and maintain what you have written. Break down complex queries into parts. Consider moving some things to lazily evaluated methods. Understand what you are writing; if you look at it and have to think about why it works, it probably needs reworking. Always favour clarity and simplicity over dogma and cleverness; to draw inspiration from Jurassic Park, even though you could, stop to think whether you should.

LINQ is a complex feature of C# and .NET (and all the other .NET languages) and there are many things I have not covered. So, if you have any questions, please leave a comment. If I can't answer it, I will hopefully be able to direct you to someone who can. Alternatively, check out Edulinq by the inimitable Jon Skeet, head over to StackOverflow where there is an Internet of people waiting to help (including Jon Skeet), or get binging (googling, yahooing, altavistaring, whatever…)[8].

And that brings us to the end of this series on LINQ. From deferred execution and the query chain to dot notation versus query syntax, I hope that I have managed to paint a favourable picture of LINQ, and helped to clear up some of the prejudices and confusions that surround it. LINQ is a powerful weapon in the arsenal of a .NET programmer; not to use it would be a waste.

  1. Just the thought of the nested method calls or high number of placeholder variables makes me shudder
  2. I guess LIQ was too suggestive for Microsoft
  3. That statement is an absolute, Obi Sith Kenobi
  4. I am definitely leaving the nested static methods approach to you as an exercise (in futility)
  5. Though if you changed the `person` variable to `p`, there is less typing in the query syntax, if that is a metric you are concerned with
  6. Yes, a nested `foreach` can achieve this simple example, but this is just illustrative, and I'd argue cleaner than a `foreach` approach
  7. That's why I cheated and wrote it in query syntax, then used Resharper to change it to dot notation for me
  8. Back in my day, it was called searching…grumble grumble