Catamorphism examples
Applying the rules to other domains
Last updated
Was this helpful?
Applying the rules to other domains
Last updated
Was this helpful?
This post is the second in a series.
In the , I introduced "catamorphisms", a way of creating functions for recursive types, and listed some rules which can be used to implement them mechanically. In this post, we'll use these rules to implement catamorphisms for some other domains.
Here's the contents of this series:
Part 1: Introduction to recursive types and catamorphisms
Part 2: Catamorphism examples
Part 3: Introducing folds
Part 4: Understanding folds
Part 5: Generic recursive types
Part 6: Trees in the real world
We saw in the previous post that creating a catamorphism is a mechanical process, and the rules were:
Create a function parameter to handle each case in the structure.
For non-recursive cases, pass the function parameter all the data associated with that case.
For recursive cases, perform two steps:
First, call the catamorphism recursively on the nested value.
Then pass the handler all the data associated with that case, but with the result of the catamorphism replacing the original nested value.
Let's now see if we can apply these rules to create catamorphisms in other domains.
Let's start with a very crude model of a file system:
Each file has a name and a size.
Each directory has a name and a size and a list of subitems.
Here's how I might model that:
I admit it's a pretty bad model, but it's just good enough for this example!
Ok, here are some sample files and directories:
Time to create the catamorphism!
Let's start by looking at the signatures to figure out what we need.
The File
constructor takes a File
and returns a FileSystemItem
. Using the guidelines above, the handler for the File
case needs to have the signature File -> 'r
.
That's simple enough. Let's put together an initial skeleton of cataFS
, as I'll call it:
The Directory
case is more complicated. If we naively applied the guidelines above, the handler for the Directory
case would have the signature Directory -> 'r
, but that would be incorrect, because the Directory
record itself contains a FileSystemItem
that needs to be replaced with an 'r
too. How can we do this?
One way is to "explode" the Directory
record into a tuple of (string,int,FileSystemItem list)
, and then replace the FileSystemItem
with 'r
in there too.
In other words, we have this sequence of transformations:
Another issue is that the data associated with the Directory case is a list of FileSystemItem
s. How can we convert that into a list of 'r
s?
Well, the recurse
helper turns a FileSystemItem
into an 'r
, so we can just use List.map
passing in recurse
as the mapping function, and that will give us the list of 'r
s we need!
Putting it all together, we get this implementation:
and if we look at the type signature, we can see that it is just what we want:
So we're done. It's a bit complicated to set up, but once built, we have a nice reusable function that can be the basis for many others.
totalSize
exampleAlrighty then, let's use it in practice.
To start with, we can easily define a totalSize
function that returns the total size of an item and all its subitems:
And here are the results:
largestFile
exampleHow about a more complicated function, such as "what is the largest file in the tree?"
Before we start this one, let's think about what it should return. That is, what is the 'r
?
You might think that it's just a File
. But what if the subdirectory is empty and there are no files?
So let's make 'r
a File option
instead.
The function for the File
case should return Some file
then:
The function for the Directory
case needs more thought:
If list of subfiles is empty, then return None
If list of subfiles is non-empty, then return the largest one
But remember that 'r
is not a File
but a File option
. So that means that subfiles
is not a list of files, but a list of File option
.
Now, how can we find the largest one of these? We probably want to use List.maxBy
and pass in the size. But what is the size of a File option
?
Let's write a helper function to provide the size of a File option
, using this logic:
If the File option
is None
, return 0
Else return the size of the file inside the option
Here's the code:
Putting it all together then, we have our largestFile
function:
If we test it, we get the results we expect:
Again, a little bit tricky to set up, but no more than if we had to write it from scratch without using a catamorphism at all.
Let's work with a slightly more complicated domain. This time, imagine that we make and sell products of some kind:
Some products are bought, with an optional vendor.
Some products are made on our premises, built from subcomponents,
where a subcomponent is some quantity of another product.
Here's the domain modelled as types:
Note that the types are mutally recursive. Product
references MadeProduct
which references Component
which in turn references Product
again.
Here are some example products:
Now to design the catamorphism, we need to do is replace the Product
type with 'r
in all the constructors.
Just as with the previous example, the Bought
case is easy:
The Made
case is trickier. We need to expand the MadeProduct
into a tuple. But that tuple contains a Component
, so we need to expand that as well. Finally we get to the inner Product
, and we can then mechanically replace that with 'r
.
Here's the sequence of transformations:
When implementing the cataProduct
function we need to the same kind of mapping as before, turning a list of Component
into a list of (int,'r)
.
We'll need a helper for that:
You can see that this uses the recurse
function to turn the inner product (comp.product
) into an 'r
and then make a tuple int * 'r
.
With convertComponentToTuple
available, we can convert all the components to tuples using List.map
:
componentTuples
is a list of (int * 'r)
, which is just what we need for the fMade
function.
The complete implementation of cataProduct
looks like this:
productWeight
exampleWe can now use cataProduct
to calculate the weight, say.
Let's test it interactively to make sure it works:
That's as we expect.
Try implementing productWeight
from scratch, without using a helper function like cataProduct
. Again, it's do-able, but you'll probably waste quite bit of time getting the recursion logic right.
mostUsedVendor
exampleLet's do a more complex function now. What is the most used vendor?
The logic is simple: each time a product references a vendor, we'll give that vendor one point, and the vendor with the highest score wins.
Again, let's think about what it should return. That is, what is the 'r
?
You might think that it's just a score of some kind, but we also need to know the vendor name. Ok, a tuple then. But what if there are no vendors?
So let's make 'r
a VendorScore option
, where we are going to create a little type VendorScore
, rather than using a tuple.
We'll also define some helpers to get data from a VendorScore
easily:
Now, you can't determine the most used vendor over until you have results from the entire tree, so both the Bought
case and the Made
case need to return a list which can added to as we recurse up the tree. And then, after getting all the scores, we'll sort descending to find the vendor with the highest one.
So we have to make 'r
a VendorScore list
, not just an option!
The logic for the Bought
case is then:
If the vendor is present, return a VendorScore
with score = 1, but as a one-element list rather than as a single item.
If the vendor is missing, return an empty list.
The function for the Made
case is more complicated.
If list of subscores is empty, then return an empty list.
If list of subscores is non-empty, we sum them by vendor and then return the new list.
But the list of subresults passed into the fMade
function will not be a list of subscores, it will be a list of tuples, qty * 'r
where 'r
is VendorScore list
. Complicated!
What we need to do then is:
Turn qty * 'r
into just 'r
because we don't care about the qty in this case. We now have a list of VendorScore list
. We can use List.map snd
to do this.
But now we would have a list of VendorScore list
. We can flatten a list of lists into a simple list using List.collect
. And in fact, using List.collect snd
can do both steps in one go.
Group this list by vendor so that we have a list of key=vendor; values=VendorScore list
tuples.
Sum up the scores for each vendor (values=VendorScore list
) into a single value, so that we have a list of key=vendor; values=VendorScore
tuples.
At this point the cata
function will return a VendorScore list
. To get the highest score, use List.sortByDescending
then List.tryHead
. Note that maxBy
won't work because the list could be empty.
Here's the complete mostUsedVendor
function:
Now let's test it:
This isn't the only possible implementation of fMade
, of course. I could have used List.fold
and done the whole thing in one pass, but this version seems like the most obvious and readable implementation.
It's also true that I could have avoided using cataProduct
altogether and written mostUsedVendor
from scratch. If performance is an issue, then that might be a better approach, because the generic catamorphism creates intermediate values (such as the list of qty * VendorScore option
) which are over general and potentially wasteful.
On other hand, by using the catamorphism, I could focus on the counting logic only and ignore the recursion logic.
So as always, you should consider the pros and cons of reuse vs. creating from scratch; the benefits of writing common code once and using it in a standardized way, versus the performance but extra effort (and potential bugginess) of custom code.
We've seen in this post how to define a recursive type, and been introduced to catamorphisms.
And we have also seen some uses for catamorphisms:
Any function that "collapses" a recursive type, such as Gift -> 'r
, can be written in terms of the catamorphism for that type.
Catamorphisms can be used to hide the internal structure of the type.
Catamorphisms can be used to create mappings from one type to another by tweaking the functions that handle each case.
Catamorphisms can be used to create a clone of the original value by passing in the type's case constructors.
But all is not perfect in the land of catamorphisms. In fact, all the catamorphism implementations on this page have a potentially serious flaw.
See you then!
UPDATE: Fixed logic error in mostUsedVendor
as pointed out by Paul Schnapp in comments. Thanks, Paul!
In the we'll see what can go wrong with them, how to fix them, and in the process look at the various kinds of "fold".
The source code for this post is available at .