Breaking The Code

Last week saw the return of Learn Something, a monthly software hacking event held at the offices of Fanzoo Technology in Ann Arbor. Each month, attendees can choose to hack on their own projects or take part in the monthly Learn Something challenge. Teams and pairing are encouraged, providing opportunities to work with new people, tools and techniques.

The message we had to decode

This month, the challenge was to decipher a message encoded using a simple substitution cipher. Participants started with a copy of the encoded message (known as the ciphertext) and a text file of English words. The message (shown above) is written in 'alienese', a cipher that appeared in several episodes of Futurama. Though fans of the show have already solved the cipher and we could have searched the Internet for the solution, our goal at Learn Something was to create software that could solve it (or at least narrow down the possibilities [1]).

What is a simple substitution cipher? A simple substitution cipher is a way of encoding a message by replacing (substituting) each letter of the alphabet with a different letter or symbol. For example, if we had the substitutions t to a, h to g, and e to o, the word the would become ago.
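As a minimal sketch of that example (in Python purely for illustration; the three-letter key is the one from the paragraph above, and the function names are hypothetical), encoding and decoding are just lookups in a table and its inverse:

```python
# Partial key from the example above; a real key would cover the whole alphabet.
key = {"t": "a", "h": "g", "e": "o"}


def encode(plaintext, key):
    """Replace each letter using the key; leave unmapped characters alone."""
    return "".join(key.get(ch, ch) for ch in plaintext)


def decode(ciphertext, key):
    """Invert the key and apply it to recover the plaintext."""
    inverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)


print(encode("the", key))  # ago
print(decode("ago", key))  # the
```

Breaking the cipher means recovering the inverse table without ever being given the key.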

Alex Zolynsky, one of the organizers of Learn Something, provided a clearer printout of the message with punctuation symbols written in red (see below). The intention was to ignore the punctuation and focus on the actual words, so before we started, one of the participants assigned a letter to each symbol so that we had the message written in a more familiar alphabet (and one that was easier to type on our laptops).

The annotated message

The resulting message read:

abbc bdefg hgij kble cmna ompf mlc pangaebc jpkgai nb qgo emq cmllgf

Clearly, our work was far from done, but before I continue with this blog, I need to come clean.

At Learn Something last week, I was buried in some work of my own, and other than the occasional interjection, I was not involved in the challenge. However, at the end of the night, I took the cipher home with me and the following evening, I got stuck in. It took me about three hours to crack the message. How did I do it?

Well, there are many ways to tackle breaking a substitution cipher. For example, if the ciphertext is long, we could use frequency analysis to gain some hints at the substitutions [2]. However, this ciphertext was short and, as such, frequency analysis was not particularly helpful [3]. Instead, I decided to follow the same track as the participants at Learn Something: take the longest word in the phrase, find English words with the same pattern, and use those to start reducing the possibilities. So, I opened up LINQPad (my tool of choice for hacking around) and got started.
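For anyone curious what the frequency-analysis attempt looks like, here is a rough sketch in Python (not the LINQPad code used on the night; the function name is made up for illustration). It simply counts how often each letter appears in the ciphertext:

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Count how often each letter appears, ignoring spaces and punctuation."""
    return Counter(ch for ch in ciphertext if ch.isalpha())


ciphertext = "abbc bdefg hgij kble cmna ompf mlc pangaebc jpkgai nb qgo emq cmllgf"
counts = letter_frequencies(ciphertext)

print(sum(counts.values()))  # 56 letters in total

# The most common ciphertext letters are the usual first guesses for
# common plaintext letters such as e, t, and a.
for letter, count in counts.most_common(5):
    print(letter, count)
```

With only 56 letters to count, the differences between the tallies are too small to separate one common letter from another with any confidence, which is why a pattern-based approach is more promising here.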

The longest word in the ciphertext is pangaebc. To match words that follow the same pattern, I treated the word as its own ciphertext and made another substitution, just as the symbols had been substituted for letters earlier: the first distinct letter becomes a, the next becomes b, and so on. This allowed me to normalize different words to see if they matched the same pattern; only words that normalize to the same string share a pattern. So, for pangaebc, the normalized version is abcdbefg. Examples of English words that match this pattern are airfield, windiest, and the gross pustular; each of these normalizes to abcdbefg. In fact, the dictionary I used contained over 500 matches, which meant over 500 possibilities for the word in our decoded message.

Each of the possible matches for the longest word provided a possible part of the substitution cipher. The next step was to take another word from the ciphertext and find English words that matched both the pattern of that ciphertext word and the substitutions found from the first word, pangaebc. This started to build up a tree of candidate words for the decoded message. Each node in the tree contained words that matched the substitutions required by its parent node and also matched the pattern of a word in the ciphertext. By repeating this approach recursively for each word in the ciphertext sentence, a tree of possible plaintext [4] sentences could be generated.
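The normalization step can be sketched roughly as follows. This is Python for illustration rather than the original LINQPad query, and the tiny word list is a stand-in for a real dictionary file:

```python
def normalize(word):
    """Rewrite a word so its first distinct letter is 'a', the next 'b', and so on."""
    mapping = {}
    for ch in word:
        if ch not in mapping:
            mapping[ch] = chr(ord("a") + len(mapping))
    return "".join(mapping[ch] for ch in word)


def matching_words(cipher_word, dictionary):
    """Return the dictionary words that share the cipher word's letter pattern."""
    target = normalize(cipher_word)
    return [word for word in dictionary if normalize(word) == target]


# Stand-in word list (an assumption); a real run would load a full dictionary.
words = ["airfield", "windiest", "pustular", "mountain", "cipher"]

print(normalize("pangaebc"))              # abcdbefg
print(matching_words("pangaebc", words))  # ['airfield', 'windiest', 'pustular']
```

Each match implies a partial key (for airfield: p to a, a to i, n to r, and so on), and that partial key is what constrains the candidates for the next word.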

Once all words in the ciphertext had been processed, I could take the branches of the resulting tree that included the same number of nodes as words in the ciphertext and determine the possible substitutions that might give the right plaintext solution. This produced 15 possible sentences. It was then up to me to read each one and pick the one that made the most sense. Of course, I'm not going to spoil it by telling you the solutions I found. Instead, I encourage you to give the challenge a go for yourselves and see what you come up with (we both know you could cheat by searching the Internet for an answer, but you're better than that, right?).
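To avoid spoilers, here is a sketch of the search on a toy ciphertext instead of the real one. It is written in Python rather than as a LINQPad query, and the function names, toy dictionary, and toy message are all made up for illustration. Instead of building the tree explicitly, it walks it depth-first, extending the key one word at a time and abandoning a branch as soon as a candidate word conflicts with the key so far:

```python
def extend_key(key, cipher_word, plain_word):
    """Return a copy of key extended so cipher_word decodes to plain_word, or None on conflict."""
    new_key = dict(key)
    used = set(new_key.values())
    for c, p in zip(cipher_word, plain_word):
        if c in new_key:
            if new_key[c] != p:
                return None  # this cipher letter already decodes to something else
        elif p in used:
            return None      # two cipher letters cannot share one plaintext letter
        else:
            new_key[c] = p
            used.add(p)
    return new_key


def solve(cipher_words, dictionary, key=None):
    """Yield every list of dictionary words consistent with a single key."""
    key = {} if key is None else key
    if not cipher_words:
        yield []  # a branch that reached full depth
        return
    first, rest = cipher_words[0], cipher_words[1:]
    for candidate in dictionary:
        if len(candidate) != len(first):
            continue
        new_key = extend_key(key, first, candidate)
        if new_key is None:
            continue
        for tail in solve(rest, dictionary, new_key):
            yield [candidate] + tail


def decipher(ciphertext, dictionary):
    """Try the longest words first, then put each guess back in sentence order."""
    words = ciphertext.split()
    order = sorted(range(len(words)), key=lambda i: -len(words[i]))
    for guess in solve([words[i] for i in order], dictionary):
        sentence = [None] * len(words)
        for position, word in zip(order, guess):
            sentence[position] = word
        yield " ".join(sentence)


toy_dictionary = ["the", "hat", "tee", "eat", "ate", "tea"]
print(list(decipher("oxa ago gxa", toy_dictionary)))  # ['eat the hat']
```

The key check in `extend_key` also covers the pattern test: a candidate can only fit if repeated cipher letters line up with repeated plaintext letters. On a real dictionary, the generator yields every sentence that survives to full depth, and a human still has to pick the one that reads as English.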

I really enjoyed tackling this problem. Not only was it a fun distraction, but now I have a LINQPad query that can solve any substitution cipher as long as I know the language in which the message is written. I am definitely looking forward to the next time I attend Learn Something. Hopefully, your interest is piqued and I will see you there. In the meantime, if you give this challenge a go, I would love to hear how you tackled it, what you did differently to me, and what you learned. Until next time, thanks for reading.

Featured image: "Confederate cipher disk" by RadioFan. Licensed under CC BY-SA 3.0 via Wikipedia.


  1. to fully solve it programmatically would have needed our software solutions to have understanding of English sentence structure, which was thankfully outside the scope of the challenge 

  2. assuming we know the language in which it is written 

  3. I did try it just to see, but frequency analysis needs a longer ciphertext than we had for this challenge 

  4. decoded text 

Some of my favourite tools

Update: This post has been updated to recognise that CodeLineage is now maintained by Hippo Camp Software and not Red Gate Software as was originally stated.

If you know me, you might well suspect this post is about some of the idiots I know, but it is not; this is entirely about some of the tools I use in day-to-day development. This is by no means an exhaustive list, nor is it presented in any particular order. However, assuming you are even a little bit like me as a developer, you will see a whole bunch of things you already use, but hopefully there is at least one item that is new to you. If you do find something new and useful here, or you have some suggestions of your own, please feel free to post a comment.

OzCode

OzCode is an add-in for Visual Studio that provides some debugging super powers like collection searching, adding computed properties to objects, pinning properties so that you don't have to go hunting in the object tree, simpler tracepoint creation, and a bunch more. I first tried this during beta and was quickly sold on its value. Give the 30-day trial a chance and see if it works for you.

Resharper

This seems to be a staple for most C# developers. I was a late-comer to using this tool and I am not sure I like it for the same reasons as everyone else. I actually love Resharper for its test runner, which is a more performant alternative to Visual Studio's built-in Test Explorer, and the ability to quickly change file names to match the type they contain. However, it has a lot of features, so while this is not free, give the trial a chance and see if it fits.

Web Essentials

Another staple for many Visual Studio developers, Web Essentials provides lots of support for web-related development including enhanced support for JavaScript, CSS, CoffeeScript, LESS, SASS, MarkDown, and much more. If you do any kind of web development, this is essential [1].

LinqPad

I was late to the LinqPad party, but gave it a shot during Ann Arbor Give Camp 2013 and within my first hour or two of using it, dropped some cash on the premium version (it is very inexpensive for what you get). Since then, whether it is hacking code or hacking databases, I have been using LinqPad as my standard tool for hacking.

For code, it avoids the overhead of creating the projects and the command-line, WinForms, or WPF wrapper tools that you would need in Visual Studio. For databases, LinqPad gives you the freedom to use SQL, C#, F# or VB for querying and manipulating your database, as well as support for many data sources beyond just SQL Server, providing an excellent alternative to SQL Management Studio.

LinqPad is free, but you get some cool features if you go premium, and considering the sub-$100 price, it is totally worth it.

JustDecompile

When Red Gate stopped providing Reflector for free, JetBrains and Telerik stepped up with their own free decompilers for poking around inside .NET code. These are often invaluable when tracking down obscure bugs or wanting to learn more about the code that is running when you did not write it. While JetBrains' dotPeek is useful, I have found that JustDecompile from Telerik has a better feature set (including showing MSIL, which I could not find in dotPeek).

Chutzpah

Chutzpah is a test runner for JavaScript unit tests and is available as a Nuget package. It supports tests written for Jasmine, Mocha, and QUnit, as well as a variety of languages including CoffeeScript and TypeScript. There are also two Visual Studio extensions to provide Test Explorer integration and a handy context menu. I find the context menu most useful out of these.

Chutzpah is a great option when you cannot leverage a NodeJS-based tool-chain like Grunt or Gulp, or some other non-Visual Studio build process.

CodeLineage

CodeLineage is a free Visual Studio extension from Hippo Camp Software [2]. Regardless of your source control provider, CodeLineage provides you with a simple interface for comparing different points in the history of a given file. The simple interface makes it easy to select which versions to compare. I do not use this tool often, but when I need it, it is fantastic.

FileNesting

This Visual Studio extension from the developer of Web Essentials makes nesting files under one another a breeze. You can set up automated nesting rules or perform nesting manually.

I like to keep types separated by file when developing in C#. Files are cheap and it helps discovery when navigating code. However, this sometimes means using partial classes to keep nested types separate, so to keep my solution explorer tidy, I edit the project files and nest source code files. I also find this useful for Angular directives, allowing me to apply the familiar pattern of organizing code-behind under presentation by nesting JavaScript files under the template HTML.

Whether you have your own nesting guidelines or want to ensure generated code is nested under its corresponding definition (such as JavaScript generated from CoffeeScript), this extension is brilliant.

Switch Startup Project

Ever hit F5 to debug, only to find that you tried to start a non-executable project and had to hunt for the right project in the Solution Explorer? This used to happen to me a lot, but not since I installed this handy extension, which adds a drop-down to the toolbar where I can select the project I want to be my startup project. A valuable time saver.

MultiEditing

Multi-line editing has been a valuable improvement in recent releases of Visual Studio, but it has a limitation in that you can only edit contiguous lines at the same column location. Sometimes, you want to edit multiple lines in a variety of locations and with this handy extension, you can. Just hold ALT and click the locations you want to multi-edit, then type away.

Productivity Power Tools

Productivity Power Tools for Visual Studio have been a staple extension since at least Visual Studio 2008. Often the test bed of features that eventually appear as first class citizens in the Visual Studio suite, Productivity Power Tools enhances the overall Visual Studio experience.

The current version for Visual Studio 2013 provides support for colour printing, custom document tabs, copying as HTML, error visualization in the Solution Explorer, time stamps in the debug output margin, double-click to maximize and dock windows, and much more. This is a must-have for any Visual Studio user.


  1. yes, I went there 

  2. though it was maintained by Red Gate when I first started using it 

Caching with LINQPad.Extensions.Cache

One of the tools that I absolutely adore during my day-to-day development is LINQPad. If you are not familiar with this tool and you are a .NET developer, you should go to www.linqpad.net right now and install it. The basic version is free and feature-packed, though I recommend upgrading to the professional version. Not only is it inexpensive, but it also adds some great features like Intellisense [1] and Nuget package support.

I generally use LINQPad as a simple coding environment for poking around my data sources, crafting quick coding experiments, and debugging. Because LINQPad does not have the overhead of a solution or project, like a development-oriented tool such as Visual Studio, it is easy to get stuck into a task. I no longer write throwaway console or WinForms apps; instead, I just throw together a quick LINQPad query. I could continue on the virtues of this tool [2], but I would like to touch on one of its utility features.

As part of LINQPad, you get some useful methods and types for extending LINQPad, dumping information to LINQPad's output window, and more. Two of these methods are LINQPad.Extensions.Cache and Util.Cache. Using either Cache method, you can execute some code and cache the result locally, then use the cached value for all subsequent runs of that query. This is incredibly useful for caching the results of an expensive database query or a computationally intensive calculation. To cache an IEnumerable<T> or IObservable<T>, pass it to LINQPad.Extensions.Cache or, since that is an extension method, call Cache() directly on the sequence.

For other types, Util.Cache will cache the result of an expression.

The first time I run my LINQPad code, my lazily evaluated query or the expression is executed by the Cache method and the result is cached. From then on, each subsequent run of the code uses the cached value. Both Cache methods also take an optional name for the cached item, in case you want to differentiate items that might otherwise be indistinguishable (such as caching a loop computation).
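LINQPad's own Cache methods are .NET APIs, so the following is not LINQPad code. It is a small Python sketch of the behaviour just described (compute once, reuse afterwards, optionally under a name), with every name in it made up for illustration:

```python
_cache = {}  # lives for as long as the process does


def cache(compute, key=None):
    """Run compute() once per key and return the stored result afterwards."""
    # Falling back to the function name is an assumption of this sketch,
    # not a description of how LINQPad identifies cached items.
    key = key if key is not None else compute.__name__
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]


calls = []


def expensive_query():
    calls.append(1)  # record each real execution
    return [n * n for n in range(5)]


first = cache(expensive_query)
second = cache(expensive_query)     # served from the cache, not recomputed
print(first == second, len(calls))  # True 1
```

Passing an explicit key plays the same role as the optional name mentioned above: it keeps two results apart when they would otherwise be stored under the same identity.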

This is, as I alluded to earlier, one of many utilities provided within LINQPad that make it a joy to use. What tools do you find invaluable? Do you already use LINQPad? What makes it a must-have tool for you? I would love to hear your responses in the comments.

Updated to correct casing of LINQPad, draw attention to Cache being an extension method for some uses, and add a note about Util.Cache [3].


  1. including for imported data types from your data sources 

  2. such as its support for F#, C#, SQL, etc. or its built-in IL disassembly 

  3. because, apparently, I am not observant to this stuff the first time around. SMH