Reputation Is No Substitute For Knowledge

Last week, I regrettably ventured back to answering questions on StackOverflow. The question that lured me back was this one:

Due to the general confusion over this operator, my answer, though correct, was down-voted and derided as entirely wrong. Worst of all, one of the main detractors had over 300k in reputation and, rather than try what I had suggested, spent their time telling me I was wrong as their own incorrect answer received all the up-votes. In the spirit of StackOverflow as I once knew it, I edited my answer and answered the comments, trying to clear up the confusion and get the question answered adequately. As my answer got down-voted, more incorrect answers got up-voted. However, eventually, I was able to convince my main detractor that my answer was correct. So they promptly deleted all evidence that they ever thought otherwise and, without attribution, edited their once incorrect, top-voted answer to be correct.

Though it stings a little1, I do not mind that my answer did not get accepted nor that it did not get the most votes; the question was answered correctly and that's the point of the site. What I find most disagreeable is the unsporting behavior that undermined the sense of community that once pervaded StackOverflow. I left the whole experience feeling like an outsider. In the past, those with wrong answers would delete theirs in favor of the right one, or they would edit theirs, but give credit to the right one. People would (in the most part) treat each other with respect and see reputation as a sign of being a good citizen, not necessarily a knowledgeable one. Not anymore.

I wish I could show the comments I received when answering this question, but they were deleted2. However, the general pattern of this and other experiences appears to be that someone with a high reputation score down-votes and derides other answers, then once the correct answer is clear, takes everything from the correct answers posted to edit into their own, which then earns all the reputation. It is an embittering experience that I know others have shared.

In the beginning, earning reputation and badges encouraged people to get things right and to help each other out. Now the site has matured, the easy questions are answered, and the gap between the newcomers and those with the highest reputation is huge. Newcomers languish in poverty with little opportunity, if any, to reach the top, while those at the top benefit from a bias toward answers and opinions that come from those with large reputation scores. What once incentivised good behavior and engagement, seems to have led to bullying and dishonesty. I am not saying that all people with high reputation engage in unscrupulous practices on StackOverflow —there are many generous and humble members of the community —unfortunately, bad experiences outweigh good experiences 5-1 (or as high as 12-1), so the actions of a few can poison the well.

The root of the problem as I see it3 is that reputation has become (or perhaps always was) over-valued, and in its pursuit, some have lost sight of what StackOverflow was trying to achieve; community. The community that made it special, that made me feel like I belonged, is gone, and reputation is no substitute for knowledge. What was once an all-for-one, one-for-all environment has, in the competition for reputation, turned toxic4.

I have no doubt that many reading this will think I am misrepresenting the situation, overreacting, or just plain wrong, and that is OK; I hope that those people are right, that this is not a trend, and that the overall community remains friendly and constructive. Personally, I will think twice before involving myself in answering (or even asking) a question on StackOverflow again.

Ultimately, StackOverflow works as long as the right answers get provided; but if those with the knowledge to answer get disillusioned and leave, from where will those right answers come?

Today's featured image is "Façade of the Celsus library, in Ephesus, near Selçuk, west Turkey" by Benh LIEU SONG. It is licensed under CC BY-SA 3.0. Other than being resized, the image has not been modified.

  1. we all like recognition for being right []
  2. I also deleted mine, since they were without context []
  3. if it is agreed that there is one []
  4. The fact that I even felt wronged may well be an indicator of that toxicity and my own part in its creation []

LINQ: Notation, Syntax, and Snags

Welcome to the final post in my four part series on LINQ. So far, we've talked about:

For our last look into LINQ (at least for this mini-series), I want to tackle the mini-war of "dot notation" versus "query syntax", and look at some of the pitfalls that can be avoided by using LINQ responsibly.

Let Battle Commence…

For anyone who has written LINQ using C# (or VB.NET), you are probably aware that there is more than one way to express the query (two of which, sane people might use):

  1. Old school static method calls
  2. Method syntax
  3. Query syntax

No one in their right mind should be using the first of these options; extension methods were invented to alleviate the pain that would be caused by writing LINQ this way1. Extension methods, static methods that can be called as though member methods, are the reason why we have the second option of method syntax (more commonly known as dot notation or fluent notation). The final option, query syntax, is also known as "syntactical sugar", some language keywords that can make coding easier. These keywords map to concepts found in LINQ methods and query syntax is what gives LINQ it's name; Language INtegrated Query2.

They all map to the same thing, a sequence of methods that can be executed, or translated into an expression tree, evaluated by a LINQ provider, and executed. Anything written in one of these approaches can be written using the others. There is often contention on whether to use dot notation or query syntax, as if one is inherently better than the other, but as we all know, only the Sith deal in absolutes3.  Hopefully, by the end of these examples you will see how each has its merits.

Why are LINQ queries not always called like regular methods?

Because sometimes, such as in LINQ-to-SQL or LINQ-to-Entity Framework, the method calls need to be translated into SQL or some other querying syntax, allowing queries to take advantage of server-side querying optimizations. For a more in-depth look at all things LINQ, including the way the language keywords map to the method calls, I recommend looking at Jon Skeet's Edulinq series, which is available as a handy e-book.

Before we begin, here is a quick summary of the C# keywords that we have for writing queries in query syntax: `from`, `group`, `orderby`, `let`, `join`, `where` and `select`.  There are also contextual keywords to be used in conjunction with one or two of the main keywords:`in`, `into`, `ascending`, `descending`, `by`, `on` and `equals`. Each of these keywords has a corresponding equivalent method or methods in LINQ although it can sometimes be a little more complicated as we shall see.

So, let us look at an example and see how it can be expressed using dot notation and query syntax4). For an example, let us look at a simple projection of people to their last names.

public struct Person
{
    public Person(string first, string last, DateTimeOffset dateOfBirth) : this()
    {
        FirstName = first;
        LastName = last;
        DateOfBirth = dateOfBirth;
    }
    
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public DateTimeOffset DateOfBirth { get; private set; }
}

public static class Data
{
    public static IEnumerable<Person> People = new[] {
        new Person("John", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 34 )),
        new Person("Bill", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 20 )),
        new Person("Sarah", "Allans", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 19 )),
        new Person("John", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 44 )),
        new Person("James", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 78 )),
        new Person("Alex", "Jones", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 30 )),
        new Person("Mabel", "Thomas", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 29 )),
        new Person("Sarah", "Brown", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 23 )),
        new Person("Gareth", "Smythe", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 100 )),
        new Person("Gregory", "Drake", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 90 )),
        new Person("Michael", "Johnson", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 56 )),
        new Person("Alex", "Smith", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 22 )),
        new Person("William", "Pickwick", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 17 )),
        new Person("Marcy", "Dickens", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 18 )),
        new Person("Erica", "Waters", DateTimeOffset.Now - TimeSpan.FromDays( 365 * 26 ))
    };
}
Data.People.Select(p => p.LastName);
from person in Data.People
select person.LastName;

These two queries do the exact same thing, but I find that the dot notation wins out because it takes less typing and it looks clearer. However, if we decide we want to only get the ones that were born before 1980, things look a little more even.

Data.People
    .Where(p => p.DateOfBirth.Year < 1980)
    .Select(p => p.LastName)
from person in Data.People
where person.DateOfBirth.Year < 1980
select person.LastName;

Here, there is not much difference between them, so I'd probably leave this to personal preference5. However, as soon as we want a distinct list, the dot notation starts to win out again because C# does not contain a `distinct` keyword (though VB.NET does).

Mixing dot notation and query syntax in a single query can look messy, as shown here:

(from person in Data.People
where person.DateOfBirth.Year < 1980
select person.LastName).Distinct()

So, I prefer to settle on just one style of LINQ declaration for any particular query, or to use intermediate variables and separate the query into parts (this is especially useful on complex queries as it also provides clarity; being terse is cool, but it is unnecessary, and a great way to get people to hate you and your code).

Data.People
    .Where(p => p.DateOfBirth.Year < 1980)
    .Select(p => p.LastName)
    .Distinct();
var lastNames = from person in Data.People
                where person.DateOfBirth.Year < 1980
                select person.LastName;
lastNames.Distinct();

The `Distinct()` method is not the only LINQ method that has no query syntax alternative, there are plenty of others like `Aggregate()`, `Except()`, or `Range()`. This often means dot notation wins out or is at least part of a query written in query syntax. So, thus far, dot notation seems to have the advantage in the battle against query syntax. It is starting to look like some of my colleagues are right, query syntax sucks. Even if we use ordering or grouping, dot notation seems to be our friend or at least is no more painful than query syntax:

Data.People
    .OrderBy (p => p.LastName)
    .ThenBy (p => p.FirstName)
    .GroupBy(p=>p.DateOfBirth.Year);
from person in Data.People
orderby person.LastName,
        person.FirstName
group person by person.DateOfBirth.Year;

However, it is not always so easy. What if we want to introduce variables, group something other than the original object, or use more than one source collection? It is in these scenarios where query syntax irons a lot more of the complexity. Let's assume we have another collection containing newsletters that we need to send out to all our people. To generate the individual mailings, we would need to combine these two collections6.

Data.People.SelectMany(
    person => newsletters,
    (person, newsletter) => new {person,newsletter} );
from person in Data.People
from newsletter in newsletters
select new {person, newsletter};

I know which one is clearer to read and easier to remember when I need to write a similar query. The dot notation example makes me think for a minute what it is doing; projecting each person to the newsletters collection and, using `SelectMany()`, flattening the list then selecting one result per person/newsletter combination. Our query syntax example is doing the same thing, but I don't need to think too hard to see that. Query syntax is starting to look useful.

If we were to throw in some mid-query variables (useful to avoid calculating something multiple times or to improve clarity), or join collections, query syntax becomes really useful. What if each newsletter is on a different topic and we only want to send newsletters to people who are interested in that topic?

people.SelectMany(
    person => person.Value,
    ( person, topic ) => new
    {
        person,
        topic
    } ).Join(
        newsletters,
        t => t.topic,
        newsletter => newsletter.Value,
        ( t, newsletter ) => new
        {
            t.person,
            newsletter
        } );
from person in Data.People
from topic in person.Topics
join newsletter in newsletters on topic equals newsletter.Topic
select new {person, newsletter};

I know for sure I would need to go look up how to do that in dot notation7. Query syntax is an easier way to write more complex queries like this and provided that you understand your query chain, you can declare clear, performant queries.

 

In conclusion…

In this post I have attempted to show how both dot notation and query syntax (aka fluent notation) have their vices and their virtues, and in turn, armed you with the knowledge to choose wisely.

So, think about whether someone can read and maintain what you have written. Break down complex queries into parts. Consider moving some things to lazily evaluated methods. Understand what you are writing; if you look at it and have to think about why it works, it probably needs reworking. Always favour clarity and simplicity over dogma and cleverness; to draw inspiration from Jurassic Park, even though you could, stop to think whether you should.

LINQ is a complex feature of C# and .NET (and all the other .NET languages) and there are many things I have not covered. So, if you have any questions, please leave a comment. If I can't answer it, I will hopefully be able to direct you to someone who can. Alternatively, check out Edulinq by the inimitable Jon Skeet, head over to StackOverflow where there is an Internet of people waiting to help (including Jon Skeet), or get binging (googling, yahooing, altavistaring, whatever…)8.

And that brings us to the end of this series on LINQ. From deferred execution and the query chain to dot notation versus query syntax, I hope that I have managed to paint a favourable picture of LINQ, and helped to clear up some of the prejudices and confusions that surround it. LINQ is a powerful weapon in the arsenal of a .NET programmer; to not use it, would be a waste.

  1. Just the thought of the nested method calls or high number of placeholder variables makes me shudder []
  2. I guess LIQ was too suggestive for Microsoft []
  3. That statement is an absolute, Obi Sith Kenobi []
  4. I am definitely leaving the nested static methods approach to you as an exercise (in futility []
  5. Though if you changed the `person` variable to `p`, there is less typing in the query syntax , if that is a metric you are concerned with []
  6. Yes, a nested `foreach` can achieve this simple example, but this is just illustrative, and I'd argue cleaner than a `foreach` approach []
  7. That's why I cheated and wrote it in query syntax, then used Resharper to change it to dot notation for me []
  8. Back in my day, it was called searching…grumble grumble []

LINQ: Understanding Your Query Chain

This is part of a short series on the basics of LINQ:

This is the third part in my small series on LINQ and it covers what I feel is the most important thing to understand when using LINQ, query chains. We are going to build on the deferred execution concepts discussed in the last entry and look at why it is important to know your query operations.

Each method in a LINQ query is either immediately executed or deferred. When deferred, a method is either lazily evaluated one element at a time or eagerly evaluated as the entire collection. Usually, you can determine which is which from the documentation or, if that fails, a little experimentation. Why does it matter? This question from StackOverflow provides us with an example:

For those that did not read it or do not understand the problem, let me summarize. The original poster had a problem where values they had obtained from a LINQ query result, when passed into the `Except()` method on that same query, did not actually exclude anything. It was as if they had taken the sequence `1,2,3,4`, called `Exclude(2)`, and that had returned `1,2,3,4` instead of the expected `1,3,4`. On the surface, the code looked like it should work, so what was going on? To explain, we need a little more detail.

The example code has a class that described a user. An XML file contained user details and this is loaded into a sequence of `User` instances using LINQ-to-XML.

IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
        (u => new User
            { ID = (int)u.Attribute("id")
              Name = (string)u.Attribute("name")
            }
        ).OfType<IUser>();       //still a query, objects have not been materialized

As noted in the commentary, the poster understood that at this point, the query is not yet evaluated. With their query ready to be iterated, they use it to determine which users should be excluded using a different query.

public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users)
{
     IEnumerable<IUser> localList = new List<User>
     {
        new User{ ID=4, Name="James"},
        new User{ ID=5, Name="Tom"}
     }.OfType<IUser>();
     var matches = from u in users
                   join lu in localList
                       on u.ID equals lu.ID
                   select u;
     return matches;
}

And then using those results, the originally loaded list of users is filtered.

var excludes = users.Except(matches);    // excludes should contain 6 users but here it contains 8 users

Now, it is clear this code is not perfect and we could rewrite it to function without so many LINQ queries (I will give an example of that later in this post), but we do not care about the elegance of the solution; we are using this code as an example of why it is important to understand what a LINQ query is doing.

As noted in the commentary, when the line declaring the initial `users` query is executed, the query it defines has not. The query does not actually become a real list of users until it gets consumed1. Where does that happen? Go on, guess.

If you guessed `GetMatchingUsers`, you are wrong. All that method does is build an additional level of querying off the initial query and return that new query. If you guessed the `Except()` method, that's wrong too, because `Except()` is also deferred. In fact, the example only implies that something eventually looks at the results of `Except()` and as such, the query is never evaluated. So, for us to continue, let's assume that after the `excludes` variable (containing yet another unexecuted query), we have some code like this to consume the results of the query:

foreach (var user in excludes)
{
   Console.WriteLine(user.ID);
}

By iterating over `excludes`,  the query is executed and gives us some results. Now that we are looking at the query results, what happens?

First, the `Except()` method takes the very first element from the `users` query, which in turn, takes the very first `User` element from the XML document and turns it into a `User` instance. This instance is then cast to `IUser` using `OfType`2.

Next, the `Except()` method takes each of the elements in the `matches` query result and compares it to the item just retrieved from the `users` collection. This means the entire `matches` query is turned into a concrete list. This causes the `users` query to be reevaluated to extract the matched users. The instances of `User` created from the `matches` query are compared with each instance from the `users` query and the ones that do not match are returned for us to output to the console.

It seems like it should work, but it does not, and the key to why is in how queries and, more importantly, deferred execution work.

Each evaluation of a deferred query is unique. It is very important to remember this when using LINQ. If there is one thing to take away from reading my blog today, it is this. In fact, it's so important, I'll repeat it:

Each evaluation of a deferred query is unique.

It is important because it means that each evaluation of a deferred query (in most cases) results in an entirely new sequence with entirely new items. Consider the following iterator method:

IEnumerable<object> GetObject()
{
    yield return new Object();
}

It returns an enumerable that upon being consumed will produce a single `Object` instance. If we had a variable of the result of `GetObject()`, such as `var obj  = GetObject()` and then iterated `obj` several times, each time would give us a different `Object` instance. They would not match because on each iteration, the deferred execution is reevaluated.

If we go back to the question from StackOverflow armed with this knowledge, we can identify that `users` is evaluated twice by the `Except()` call. One time to get the list of exceptions out of the `matches` query and another to process the list that is being filtered. It is the equivalent of this:

IEnumerable<object> GetObjects()
{
    return new[]
    {
        new Object(),
        new Object(),
        new Object(),
        new Object(),
        new Object()
    };
}

void main()
{
   var objects = GetObjects().Except(GetObjects());
   foreach (var o in objects)
   {
        Console.WriteLine(o);
   }
}

From this code, we would never expect `objects` to contain nothing since the two calls to the immediately executed `GetObjects` would return collections of completely different instances. When execution is deferred, we get the same effect; each evaluation of a query is as if it were a separate method call.

To fix this problem, we need to make sure our query is executed once to make the results "concrete", then use those concrete results to do the rest of the work. This is not only important to ensure that the objects being manipulated are the same in all uses of the queried data, but also to ensure that we don't do work more than once3. To make the query concrete, we call an immediately executed method such as `ToList()`, evaluating the query and capturing its results in a collection.

This is our solution, as the original poster of our StackOverflow question indicated. If we change the original `users` query to be evaluated and stored, everything works as it should. With the help of some investigation and knowledge of how LINQ works, we now also know why.

Now that we understand a little more about LINQ we can consider how we might rewrite the original poster's example code. For example, we really should not to iterate the `users` list twice at all; we should see the failure of `Except()` as a code smell that we are iterating the collection too often. Though making it concrete with `ToList()` fixes the bug, it does not fix this inefficiency.

To do that, we can rewrite it to something like this:

public static bool IsMatchingUser(User user)
{
     var localList = new List<User> {
        new User{ ID=4, Name="James"},
        new User{ ID=5, Name="Tom"} }
      .Cast<IUser>()
      .AsReadOnly();

    return localList.Any(u=>u.ID == user.ID && u.Name == user.Name);
}

var users = doc
    .Element("Users")
    .Elements("User")
    .Select(u => new User {
        ID = (int)u.Attribute("id")
        Name = (string)u.Attribute("name") })
    .Where(u => IsMatchingUser(u));

This update only iterates over each user once, resulting in a collection that excludes the users we don't want4.

In conclusion…

My intention here was to show why it is fundamental to know which methods are immediately executed, which ones are deferred, and whether those deferred methods are lazily or eagerly evaluated. At the end of this post are some examples of each kind of LINQ method, but a good rule of thumb is that if the method returns a type other than `IEnumerable` or `IQueryable` (e.g. `int` or `List`), it is immediately executed; all other cases are probably using deferred execution. If a method does use deferred execution, it is also helpful to know which ones iterate the entire collection every time and which ones stop iterating when they have their answer, but for this you will need to consult documentation and possibly experiment with some code.

Just knowing these different types of methods can be a part of your query will often be enough to help you write better LINQ and debug queries faster.  By knowing your LINQ methods, you can improve performance and reduce memory overhead, especially when working with large data sets and slow network resources. Without this knowledge, you are likely to evaluate queries and iterate sequences too often, and instantiate objects too many times.

Hopefully you were able to follow this post and it has helped you get a better grasp on LINQ. In the final post of this series, I will ease up on the deep code analysis, and look at query syntax versus dot notation (aka fluent notation). In the meantime, if you have any comments, I'd love to read them.

Examples of LINQ Method Varieties

Immediate Execution

`Count()`, `Last()`, `ToList()`, `ToDictionary()`, `Max()`, `Aggregate()`
Immediate iterate the entire collection every time

`Any()`, `All()`, `First()`, `Single()`
Iterate the collection until a condition is or is not met

Deferred Execution, Eager Evaluation

`Distinct()`, `OrderBy()`, `GroupBy()`
Iterate the entire collection but only when the query is evaluated

Deferred Execution, Lazy Evaluation

`Select()`,`Where()`
Iterate the entire collection

`Take()`,`Skip()`
Iterate until the specified count is reached

  1. "consumed" is often used as an alternative to "iterated" []
  2. `Cast()` should have been used here since all the objects loaded are of the same type []
  3. This is something that becomes very important when working with large queries that can be time or resource consuming to run []
  4. with more effort, I am certain there are more things that could be done to improve this code, but we're not here for that, so I'll leave is as an exercise for you; I'm generous like that []

Unit testing attribute driven late-binding

I've been working on a RESTful API using ASP WebAPI. It has been a great experience so far. Behind the API is a custom framework that involves some late-binding. I decorate certain types with an attribute that associates the decorated type with another type1. The class orchestrating the late-binding takes a collection of IDecorated instances. It uses reflection to look at their attributes to determine the type they are decorated with and then instantiates that type.

It's not terribly complicated. At least it wasn't until I tried to test it. As part of my development I have been using TDD, so I wanted unit tests for my late-binding code, but I soon hit a hurdle. In mocking IDecorated, how do I make sure the mocked concrete type has the appropriate attribute?

var mockedObject = new Mock();

// TODO: Add attribute

binder.DoSpecialThing( mockedObject.Object ).Should().BeAwesome();

I am using Moq for my mocking framework accompanied by FluentAssertions for my asserts2. Up until this point, Moq seemed to have everything covered, yet try as I might I couldn't resolve this problem of decorating the generated type. After some searching around I eventually found a helpful Stack Overflow question and answer that directed me to TypeDescriptor.AddAttribute, a .NET-framework method that provides one with the means to add attributes at run-time!

var mockedObject = new Mock();

TypeDescriptor.AddAttribute(
    mockedObject.Object.GetType(),
    new MyDecoratorAttribute( typeof(SuperCoolThing) );

binder.DoSpecialThing( new [] { mockedObject.Object } )
    .Should()
    .BeAwesome();

Yes! Runtime modification of type decoration. Brilliant.

So, why didn't it work?

My binding class that I was testing looked a little like this:

public IEnumerable<Blah> DoSpecialThing( IEnumerable<IDecorated> decoratedThings )
{
    return from thing in decoratedThings
           let converter = GetBlahConverter( d.GetType() )
           where d != null
           select converter.Convert( d );
}

private IConverter GetBlahConverter( Type type )
{
    var blahConverterAttribute = Attribute
        .GetCustomAttributes( type, true )
        .Cast<BlahConverterAttribute>()
        .FirstOrDefault();

    if ( blahConverterAttribute != null )
    {
        return blahConverterAttribute.ConverterType;
    }

    return null;
}

Looks fine, right? Yet when I ran it in the debugger and took a look, the result of GetCustomAttributes was an empty array. I was stumped.

After more time trying different things that didn't work than I'd care to admit, I returned to the StackOverflow question and started reading the comments; why was the answer accepted answer when it clearly didn't work? Lurking in the comments was the missing detail; if you use TypeDescriptor.AddAttributes to modify the attributes then you have to use TypeDescriptor.GetAttributes to retrieve them.

I promptly refactored my code with this detail in mind.

public IEnumerable<Blah> DoSpecialThing( IEnumerable<IDecorated> decoratedThings )
{
    return from thing in decoratedThings
           let converter = GetBlahConverter( d.GetType() )
           where d != null
           select converter.Convert( d );
}

private IConverter GetBlahConverter( Type type )
{
    var blahConverterAttribute = TypeDescriptor
        .GetAttributes( type )
        .OfType<BlahConverterAttribute>()
        .FirstOrDefault();

    if ( blahConverterAttribute != null )
    {
        return blahConverterAttribute.ConverterType;
    }

    return null;
}

Voila! My test passed and my code worked. This was one of those things that had me stumped for longer than it should have. I am sharing it in the hopes of making sure there are more hits when someone else goes Internet fishing for help. Now, I'm off to update Stack Overflow so that this is clearer there too.

  1. Something similar to TypeConverterAttribute usage in the BCL []
  2. Though I totally made up the BeAwesome() assertion in this blog post []

A little help with GetHashCode

Implementing the GetHashCode override has proven to be a confusing issue as there are a lot of intricacies to following the rules. There have been plenty of blog posts and other discussions regarding when and how to follow the rules; I don't intend to cover that ground again. Instead, please take a look at some of these posts and discussions:

However, I thought I'd share something that I've found the following useful when implementing GetHashCode for my different types. The following static class encapsulates the algorithm recommended by Jon Skeet in this StackOverflow answer (though it can be found in other places too – it is also the algorithm used by the C# compiler when implementing GetHashCode for anonymous types, for example). This algorithm uses two prime numbers; one for the initial hash and the other as a multiplier when combining a hash code. These prime numbers are used to reduce hash code collisions, something that a basic XOR hash does not do.

My HashCode class extension methods use generics to avoid boxing for value types and also support combining the hash codes of collection elements, if this is needed.

//*****************************************************************************
//
//  Copyright © 2012 Jeff Yates
//
//*****************************************************************************
using System;
using System.Collections.Generic;

namespace SomewhatAbstract
{
    /// <summary>
    /// Provides extensions for producing hash codes to support overrides
    /// of <see cref="System.Object.GetHashCode"/>.
    /// </summary>
    public static class HashCode
    {
        /// <summary>
        /// Provides the initial value for a hash code.
        /// </summary>
        public static readonly int InitialHash = 17;
        private const int Multiplier = 37;

        /// <summary>
        /// Combines the hash code for the given value to an existing hash code
        /// and returns the hash code.
        /// </summary>
        /// <param name="code">The previous hash value.</param>
        /// <param name="value">The value to hash.</param>
        /// <returns>The new hash code.</returns>
        private static int Hash<T>(int code, T value)
        {
            int hash = 0;
            if (value != null)
            {
                hash = value.GetHashCode();
            }

            unchecked
            {
                code = (code * HashCode.Multiplier) + hash;
            }
            return code;
        }

        /// <summary>
        /// Combines the hash codes for the elements in the given sequence with an
        /// existing hash code and returns the hash code.
        /// </summary>
        /// <param name="code">The code.</param>
        /// <param name="sequence">The sequence.</param>
        /// <returns>The new hash code.</returns>
        public static int HashWithContentsOf<T>(this int code, IEnumerable<T> sequence)
        {
            foreach (T element in sequence)
            {
                code = code.HashWith(element);
            }
            return code;
        }

        /// <summary>
        /// Combines the given value's hash code with an existing hash code value.
        /// </summary>
        /// <typeparam name="T">The type of the value being hashed.</typeparam>
        /// <param name="code">The previous hash code.</param>
        /// <param name="value">The value to hash.</param>
        /// <returns>The new hash code.</returns>
        public static int HashWith<T>(this int code, T value)
        {
            return Hash(code, value);
        }
    }
}

Here is an example of how you might use these extension methods to implement an override of GetHashCode. It doesn't necessarily save any typing over just hand-implementing the algorithm, but I find it helps avoid inconsistencies and forgetfulness while giving a little clarity to the code.

public struct MyCustomType
{
    private readonly ReadOnlyCollection myListOfThings;
    private readonly bool aFlagOfSomething;
    private readonly object aThing;

    public override int GetHashCode()
    {
        return HashCode.InitialHash
            .HashWith(this.aFlagOfSomething)
            .HashWith(this.aThing)
            .HashWithContentsOf(this.myListOfThings);
    }
}

Let me know if you find this useful. Perhaps you have a different approach when implementing GetHashCode. If you do, I'd love to hear about it.