Worked example: Roman numerals

More pattern matching in practice

Last time we looked at parsing a command line. This time we'll we'll look at another pattern matching example, this time using Roman numerals.

As before, we will try to have a "pure" internal model with separate stages to convert the input to the internal model, and then another separate stage to convert from the internal model to the output.

Requirements

Let's start with the requirements:

First version

As before we'll start by first creating the internal model, and then look at how we can parse the input into the internal model.

Here's a first stab at the model. We'll treat a RomanNumeral as a list of RomanDigits.

No, stop right there! A RomanDigit is not just any digit, it has to be taken from a limited set.

Also RomanNumeral should not just be a type alias for a list of digits. It would be better if it was its own special type as well. We can do this by creating a single case union type.

Here's a much better version:

Output: Converting a numeral to an int

Now let's do the output logic, converting a Roman numeral to an int.

The digit conversion is easy:

Note that we're using the function keyword instead of the match..with expression.

To convert a list of digits, we'll use a recursive loop again. There is a special case when we need to look ahead to the next digit, and if it is larger than the current one, then use their difference.

Note that the "less than" operation did not have to be defined. The types are automatically sorted by their order of declaration.

Finally, we can convert the RomanNumeral type itself by unpacking the contents into a list and calling digitsToInt.

That takes care of the output.

Input: Converting a string to an Roman Numeral

Now let's do the input logic, converting a string to our internal model.

First, let's handle a single character conversion. It seems straightforward.

The compiler doesn't like that! What happens if we get some other character?

This is a great example of how exhaustive pattern matching can force you to think about missing requirements.

So, what should be done for bad input. How about printing an error message?

Let's try again and add a case to handle all other characters:

The compiler doesn't like that either! The normal cases return a valid RomanDigit but the error case returns unit. As we saw in the earlier post, every branch must return the same type.

How can we fix this? We could throw an exception, but that seems a bit excessive. If we think about it some more, there's no way that charToRomanDigit can always return a valid RomanDigit. Sometimes it can, and sometimes it can't. In other words, we need to use something like an option type here.

But on further consideration, we might also want the caller to know what the bad char was. So we need to create our own little variant on the option type to hold both cases.

Here's the fixed up version:

Note that I have removed the error message. Since the bad char is being returned, the caller can print its own message for the BadChar case.

Next, we should check the function signature to make sure it is what we expect:

That looks good.

Now, how can we convert a string into these digits? We convert the string to a char array, convert that into a list, and then do a final conversion using charToRomanDigit.

But the compiler complains again with "FS0072: Lookup on object of indeterminate type",

That typically happens when you use a method rather than a function. Any object could implement .ToCharArray() so the type inference cannot tell what type is meant.

In this case, the solution is just to use an explicit type annotation on the parameter -- our first so far!

But look at the signature:

It still has the pesky ParsedChar in it rather than RomanDigits. How do we want to proceed? Answer, let's pass the buck again and let someone else deal with it!

"Passing the buck" in this case is actually a good design principle. This function doesn't know what its clients might want to do -- some might want to ignore errors, while others might want to fail fast. So just pass back the information and let them decide.

In this case, the client is the top level function that creates a RomanNumeral type. Here's our first attempt:

The compiler is not happy -- the RomanNumeral constructor requires a list of RomanDigits, but the toRomanDigitList is giving us a list of ParsedChars instead.

Now we finally do have to commit to an error handling policy. Let's choose to ignore bad chars, but print out errors when they occur. We'll use the List.choose function for this. It's similar to List.map, but in addition has a filter built into it. Elements that are valid (Some something) are returned, but elements that are None are filtered out.

Our choose function should thus do the following:

  • For valid digits return Some digit

  • For the invalid BadChars, print the error message and return None.

If we do this, the output of List.choose will be a list of RomanDigits, exactly as needed as the input to the RomanNumeral constructor.

Here is everything put together:

Let's test!

Ok, everything is good so far. Let's move on to validation.

Validation rules

The validation rules were not listed in the requirements, so let's put down our best guess based on what we know about Roman numerals:

  • Five in a row of any digit is not allowed

  • Some digits are allowed in runs of up to 4. They are I,X,C, and M. The others (V,L,D) can only appear singly.

  • Some lower digits can come before a higher digit, but only if they appear singly. E.g. "IX" is ok but "IIIX" is not.

  • But this is only for pairs of digits. Three ascending numbers in a row is invalid. E.g. "IX" is ok but "IXC" is not.

  • A single digit with no runs is always allowed

We can convert these requirements into a pattern matching function as follows:

Again, note that "equality" and "less than" did not need to be defined.

And let's test the validation:

Finally, we add a top level function to test validity of the RomanNumeral type itself.

The entire code for the first version

Here's all the code in one module:

Second version

The code works, but there is something that's bugging me about it. The validation logic seems very complicated. Surely the Romans didn't have to think about all of this?

And also, I can think of examples that should fail validation, but pass, such as "VIV":

We could try to tighten up our validation rules, but let's try another tack. Complicated logic is often a sign that you don't quite understand the domain properly.

In other words -- could we change the internal model to make everything simpler?

What about if we stopped trying to map letters to digits, and created a domain that mapped how the Romans thought it. In this model "I", "II", "III", "IV" and so on would each be a separate digit.

Let's run with it and see what happens.

Here's the new types for the domain. I now have a digit type for every possible digit. The RomanNumeral type stays the same.

Output: second version

Next, converting a single RomanDigit to an integer is the same as before, but with more cases:

Calculating the sum of the digits is now trivial. No special cases needed:

Finally, the top level function is identical:

Input: second version

For the input parsing, we'll keep the ParsedChar type. But this time, we have to match 1,2,3, or 4 chars at a time. That means we can't just pull off one character like we did in the first version -- we have to match in the main loop. This means the loop now has to be recursive.

Also, we want to convert IIII into a single IIII digit rather than 4 separate I digits, so we put the longest matches at the front.

Well, this is much longer than the first version, but otherwise basically the same.

The top level functions are unchanged.

Validation: second version

Finally, let's see how the new domain model affects the validation rules. Now, the rules are much simpler. In fact, there is only one.

  • Each digit must be smaller than the preceding digit

Alas, after all that, we still didn't fix the bad case that triggered the rewrite!

There is a not-too-complicated fix for this, but I think it's time to leave it alone now!

The entire code for the second version

Here's all the code in one module for the second version:

Comparing the two versions

Which version did you like better? The second one is more longwinded because it has many more cases, but on the other hand, the actual logic is the same or simpler in all areas, with no special cases. And as a result, the total number of lines of code is about the same for both versions.

Overall, I prefer the second implementation because of the lack of special cases.

As a fun experiment, try writing the same code in C# or your favorite imperative language!

Making it object-oriented

Finally, let's see how we might make this object oriented. We don't care about the helper functions, so we probably just need three methods:

  • A static constructor

  • A method to convert to a int

  • A method to convert to a string

And here they are:

Note: you can ignore the compiler warning about deprecated overrides.

Let's use this in an object oriented way now:

Summary

In this post we've seen lots and lots of pattern matching!

But again, as with the last post, what's equally important is that we've seen how easy it is to create a properly designed internal model for very trivial domains. And again, our internal model used no primitive types -- there is no excuse not to create lots of little types in order to represent the domain better. For example, the ParsedChar type -- would you have bothered to create that in C#?

And as should be clear, the choice of an internal model can make a lot of difference to the complexity of the design. But if and when we do refactor, the compiler will almost always warn us if we have forgotten something.

Last updated

Was this helpful?