Refactoring to remove cyclic dependencies
Cyclic dependencies: Part 2
In the previous post, we looked at the concept of dependency cycles, and why they are bad.
In this post, we'll look at some techniques for eliminating them from your code. Having to do this may seem annoying at first, but in the long run you'll come to appreciate that "it's not a bug, it's a feature!"
Classifying some common cyclic dependencies
Let's classify the kinds of dependencies you're likely to run into. I'll look at three common situations, and for each one, demonstrate some techniques for dealing with them.
First, there is what I will call a "method dependency".
Type A stores a value of type B in a property
Type B references type A in a method signature, but doesn't store a value of type A
Second, there is what I will call a "structural dependency".
Type A stores a value of type B in a property
Type B stores a value of type A in a property
Finally, there is what I will call an "inheritance dependency".
Type A stores a value of type B in a property
Type B inherits from type A
There are, of course, other variants. But if you know how to deal with these, you can use the same techniques to deal with the others as well.
Three tips on dealing with dependencies in F#
Before we get started, here are three useful tips which apply generally when trying to untangle dependencies.
Tip 1: Treat F# like F#.
Tip 2: Separate types from behavior.
Tip 3: Parameterize, parameterize, parameterize.
Dependencies can only happen when a specific type is referenced. If you use generic types, you cannot have a dependency!
And rather than hard coding behavior for a type, why not parameterize it by passing in functions instead? The List module is a great example of this approach, and I'll show some examples below as well.
Dealing with a "method dependency"
We'll start with the simplest kind of dependency -- what I will call a "method dependency".
Here is an example.
The Customer class has a property/field of type CustomerObserver, but the CustomerObserver class has a method which takes a Customer as a parameter, causing a mutual dependency.
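As a minimal sketch of this situation (an illustrative reconstruction, not necessarily the post's original code), the two classes can be compiled together with "and". The Notified list on the observer is a hypothetical addition, there only so the notification can be checked.

```fsharp
// A "method dependency": Customer stores a CustomerObserver, and
// CustomerObserver has a method that takes a Customer.
// The second type is introduced with a bare "and", not "and type".
type Customer(name: string, observer: CustomerObserver) =
    let mutable name = name
    member this.Name
        with get () = name
        and set (value: string) =
            name <- value
            observer.OnNameChanged(this)

and CustomerObserver() =
    // Hypothetical addition: remember each notification so it can be checked
    let notified = ResizeArray<string>()
    member this.Notified = List.ofSeq notified
    member this.OnNameChanged(c: Customer) =
        notified.Add(c.Name)
        printfn "Customer name changed to '%s'" c.Name

let observer = CustomerObserver()
let customer = Customer("Alice", observer)
customer.Name <- "Bob"
```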
Using the "and" keyword
One straightforward way to get the types to compile is to use the "and" keyword, as I did above.
The "and" keyword is designed for just this situation: it allows you to have two or more types that refer to each other.
To use it, just replace the second "type" keyword with "and". Note that writing "and type" is incorrect. Just the single "and" is all you need.
But "and" has a number of problems, and using it is generally discouraged except as a last resort.
First, it only works for types declared in the same module. You can't use it across module boundaries.
Second, it should really only be used for tiny types. If you have 500 lines of code between the "type" and the "and", then you are doing something very wrong.
The code snippet shown above is an example of how not to do it.
In other words, don't treat "and" as a panacea. Overusing it is a symptom that you have not refactored your code properly.
Introducing parameterization
So, instead of using "and", let's see what we can do using parameterization, as mentioned in the third tip.
If we think about the example code, do we really need a special CustomerObserver class? Why have we restricted it to Customer only? Can't we have a more generic observer class?
So why don't we create an INameObserver<'T> interface instead, with the same OnNameChanged method, but with the method (and interface) parameterized to accept any class?
Here's what I mean:
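A minimal sketch of that idea follows, assuming Customer keeps the same settable Name property as before. The Notified list is a hypothetical addition used only to check the result.

```fsharp
// The observer interface is generic: it knows nothing about Customer
type INameObserver<'T> =
    abstract OnNameChanged: 'T -> unit

// Customer depends only on the interface, so no "and" is needed
type Customer(name: string, observer: INameObserver<Customer>) =
    let mutable name = name
    member this.Name
        with get () = name
        and set (value: string) =
            name <- value
            observer.OnNameChanged(this)

// CustomerObserver implements the interface for Customer
type CustomerObserver() =
    // Hypothetical addition: remember each notification so it can be checked
    let notified = ResizeArray<string>()
    member this.Notified = List.ofSeq notified
    interface INameObserver<Customer> with
        member this.OnNameChanged(c: Customer) =
            notified.Add(c.Name)
            printfn "Customer name changed to '%s'" c.Name

let observer = CustomerObserver()
let customer = Customer("Alice", observer :> INameObserver<Customer>)
customer.Name <- "Bob"
```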
In this revised version, the dependency has been broken! No "and" is needed at all. In fact, you could even put the types in different projects or assemblies now!
The code is almost identical to the first version, except that the Customer constructor accepts an INameObserver interface, and CustomerObserver now implements that interface. In fact, I would argue that introducing the interface has actually made the code better than before.
Here is the same code again, but this time the CustomerObserver class has been eliminated completely and the INameObserver is created directly.
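Here is one way that might look, sketched with an F# object expression that implements the interface inline. The notified log is a hypothetical addition used only to make the effect visible.

```fsharp
type INameObserver<'T> =
    abstract OnNameChanged: 'T -> unit

type Customer(name: string, observer: INameObserver<Customer>) =
    let mutable name = name
    member this.Name
        with get () = name
        and set (value: string) =
            name <- value
            observer.OnNameChanged(this)

// Hypothetical log, used only to make the notification observable
let notified = ResizeArray<string>()

// No CustomerObserver class: an object expression implements the interface directly
let observer =
    { new INameObserver<Customer> with
        member this.OnNameChanged(c: Customer) =
            notified.Add(c.Name)
            printfn "Customer name changed to '%s'" c.Name }

let customer = Customer("Alice", observer)
customer.Name <- "Bob"
```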
This technique will obviously work for more complex interfaces as well, such as one with two methods:
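As a sketch, assuming a hypothetical ICustomerObserver<'T> interface with OnNameChanged and OnEmailChanged methods (the interface name and the Email property are illustrative):

```fsharp
// A generic observer interface with two methods
type ICustomerObserver<'T> =
    abstract OnNameChanged: 'T -> unit
    abstract OnEmailChanged: 'T -> unit

type Customer(name: string, email: string, observer: ICustomerObserver<Customer>) =
    let mutable name = name
    let mutable email = email
    member this.Name
        with get () = name
        and set (value: string) =
            name <- value
            observer.OnNameChanged(this)
    member this.Email
        with get () = email
        and set (value: string) =
            email <- value
            observer.OnEmailChanged(this)

// Hypothetical log of notifications, so the calls can be checked
let events = ResizeArray<string>()

let observer =
    { new ICustomerObserver<Customer> with
        member this.OnNameChanged(c: Customer) = events.Add("name:" + c.Name)
        member this.OnEmailChanged(c: Customer) = events.Add("email:" + c.Email) }

let customer = Customer("Alice", "alice@example.com", observer)
customer.Name <- "Bob"
customer.Email <- "bob@example.com"
```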
Using functions instead of parameterization
In many cases, we can go even further and eliminate the interface as well. Why not just pass in a simple function that is called when the name changes, like this:
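A minimal sketch, assuming the observer has the type Customer -> unit. The notified log is hypothetical and only there to make the effect checkable.

```fsharp
// The observer is now just a function: no interface, no observer class
type Customer(name: string, observer: Customer -> unit) =
    let mutable name = name
    member this.Name
        with get () = name
        and set (value: string) =
            name <- value
            observer this

// Hypothetical log, used only to make the notification observable
let notified = ResizeArray<string>()

// The observer is defined inline as a lambda
let customer = Customer("Alice", fun (c: Customer) -> notified.Add(c.Name))
customer.Name <- "Bob"
```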
I think you'll agree that this snippet is "lower ceremony" than either of the previous versions. The observer is now defined inline as needed, very simply:
True, it only works when the interface being replaced is simple, but even so, this approach can be used more often than you might think.
A more functional approach: separating types from functions
As I mentioned above, a more "functional design" would be to separate the types themselves from the functions that act on those types. Let's see how this might be done in this case.
Here is a first pass:
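The sketch below is one plausible shape for this first pass: a DomainTypes module where the Customer record and the CustomerObserver class are still tied together by "and", a Customer module with a changeName function that returns a new record, and a small Example module. The Notified member is a hypothetical addition.

```fsharp
module DomainTypes =
    // Compact type definitions, but still mutually dependent via "and"
    type Customer = { name: string; observer: CustomerObserver }

    and CustomerObserver() =
        // Hypothetical addition: remember each notification so it can be checked
        let notified = ResizeArray<string>()
        member this.Notified = List.ofSeq notified
        member this.OnNameChanged(c: Customer) =
            notified.Add(c.name)
            printfn "Customer name changed to '%s'" c.name

module Customer =
    open DomainTypes

    // Returns a new customer instead of mutating the old one
    let changeName (customer: Customer) newName =
        let newCustomer = { customer with name = newName }
        customer.observer.OnNameChanged(newCustomer)
        newCustomer

module Example =
    open DomainTypes

    let observer = CustomerObserver()
    let alice = { name = "Alice"; observer = observer }
    let bob = Customer.changeName alice "Bob"
```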
In the example above, we now have three modules: one for the types, and one each for the functions. Obviously, in a real application, there will be a lot more Customer-related functions in the Customer module than just this one!
In this code, though, we still have the mutual dependency between Customer and CustomerObserver. The type definitions are more compact, so it is not such a problem, but even so, can we eliminate the "and"?
Yes, of course. We can use the same trick as in the previous approach, eliminating the observer type and embedding a function directly in the Customer data structure, like this:
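As a sketch, the Customer record can hold the observer as a plain Customer -> unit function, which removes the need for "and". The notified log in the Example module is hypothetical.

```fsharp
module DomainTypes =
    // The observer is a function stored in the record, so no "and" is needed
    type Customer = { name: string; observer: Customer -> unit }

module Customer =
    open DomainTypes

    let changeName (customer: Customer) newName =
        let newCustomer = { customer with name = newName }
        customer.observer newCustomer
        newCustomer

module Example =
    open DomainTypes

    // Hypothetical log, used only to make the notification observable
    let notified = ResizeArray<string>()

    let observer (c: Customer) =
        notified.Add(c.name)
        printfn "Customer name changed to '%s'" c.name

    let alice = { name = "Alice"; observer = observer }
    let bob = Customer.changeName alice "Bob"
```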
Making types dumber
The Customer type still has some behavior embedded in it. In many cases, there is no need for this. A more functional approach would be to pass a function only when you need it.
So let's remove the observer from the customer type, and pass it as an extra parameter to the changeName function, like this:
Here's the complete code:
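One possible version of the complete code, sketched under the assumption that Customer is now a record with just a name field. The notified log is hypothetical.

```fsharp
module DomainTypes =
    // A "dumb" data type with no behavior embedded in it
    type Customer = { name: string }

module Customer =
    open DomainTypes

    // The observer is passed in by the caller, only when it is needed
    let changeName observer (customer: Customer) newName =
        let newCustomer = { customer with name = newName }
        observer newCustomer
        newCustomer

module Example =
    open DomainTypes

    // Hypothetical log, used only to make the notification observable
    let notified = ResizeArray<string>()

    let observer (c: Customer) =
        notified.Add(c.name)
        printfn "Customer name changed to '%s'" c.name

    let alice = { name = "Alice" }
    // The observer has to be supplied at every call site
    let bob = Customer.changeName observer alice "Bob"
```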
You might be thinking that I have made things more complicated now: I have to specify the observer function everywhere I call changeName in my code. Surely this is worse than before? At least in the OO version, the observer was part of the customer object and I didn't have to keep passing it in.
But wait... there's more!
Let's look at the changeName function again:
It has the following steps:
do something to make a result value
call the observer with the result value
return the result value
This is completely generic logic; it has nothing to do with customers at all. So we can rewrite it as a completely generic library function. Our new function will allow any observer function to "hook into" the result of any other function, so let's call it hook for now.
Actually, I called it hook2 because the function f being "hooked into" has two parameters. I could make another version for functions that have one parameter, like this:
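A sketch of both functions as generic F# functions. The usage at the end is hypothetical and deliberately has nothing to do with customers: it hooks a recorder into plain integer arithmetic.

```fsharp
// hook2: run a two-parameter function f, pass the result to observer, return the result
let hook2 observer f param1 param2 =
    let y = f param1 param2 // do something to make a result value
    observer y              // call the observer with the result value
    y                       // return the result value

// hook: the same idea for a one-parameter function
let hook observer f param1 =
    let y = f param1
    observer y
    y

// Hypothetical usage with ordinary arithmetic functions
let seen = ResizeArray<int>()
let remember (x: int) = seen.Add(x)

let add x y = x + y
let timesTwo x = x * 2

let hookedAdd = hook2 remember add
let hookedTimesTwo = hook remember timesTwo

let sum = hookedAdd 2 3           // 5, and 5 is recorded
let doubled = hookedTimesTwo 21   // 42, and 42 is recorded
```

Because hook2 only ever sees f and observer as values, the same function works for any pair of types.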
Ok, back to the code: how do we use this generic hook function?
Customer.changeName is the function being hooked into, and it has two parameters, so we use hook2. The observer function is just as before.
So, again, we create a partially applied changeName function, but this time we create it by passing the observer and the hooked function to hook2, like this:
Note that the resulting changeName has exactly the same signature as the original Customer.changeName function, so it can be used interchangeably with it anywhere.
Here's the complete code:
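One possible version of the complete code is sketched below. It assumes a name-only Customer record and a Customer.changeName that simply returns an updated copy; hook2 is repeated so the block stands on its own, and the notified log is hypothetical.

```fsharp
module DomainTypes =
    type Customer = { name: string }

module Customer =
    open DomainTypes

    // The domain function no longer knows anything about observers
    let changeName (customer: Customer) newName = { customer with name = newName }

// The generic library function described in the text
let hook2 observer f param1 param2 =
    let y = f param1 param2
    observer y
    y

module Example =
    open DomainTypes

    // Hypothetical log, used only to make the notification observable
    let notified = ResizeArray<string>()

    let observer (c: Customer) =
        notified.Add(c.name)
        printfn "Customer name changed to '%s'" c.name

    // Same signature as Customer.changeName: Customer -> string -> Customer
    let changeName = hook2 observer Customer.changeName

    let alice = { name = "Alice" }
    let bob = changeName alice "Bob"
```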
Creating a hook function like this might seem to add extra complication initially, but it has eliminated yet more code from the main application, and once you have built up a library of functions like this, you will find uses for them everywhere.
By the way, if it helps you to use OO design terminology, you can think of this approach as a "Decorator" or "Proxy" pattern.
Dealing with a "structural dependency"
The second of our classifications is what I am calling a "structural dependency", where each type stores a value of the other type.
Type A stores a value of type B in a property
Type B stores a value of type A in a property
For this set of examples, consider an Employee who works at a Location. The Employee contains the Location they work at, and the Location stores a list of Employees who work there.
Voila -- mutual dependency!
Here is the example in code:
Before we get on to refactoring, let's consider how awkward this design is. How can we initialize an Employee value without having a Location value, and vice versa?
Here's one attempt. We create a location with an empty list of employees, and then create other employees using that location:
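Here is a minimal sketch of the mutually dependent classes together with that attempt. The location name is made up, and the final count shows the problem.

```fsharp
// A "structural dependency": each type stores a value of the other type
type Employee(name: string, location: Location) =
    member this.Name = name
    member this.Location = location

and Location(name: string, employees: Employee list) =
    member this.Name = name
    member this.Employees = employees

// The location has to be created first, with an empty list...
let location = Location("Seattle", [])

// ...so that the employees can refer to it
let alice = Employee("Alice", location)
let bob = Employee("Bob", location)

// But the location still has no employees
let count = List.length location.Employees   // 0
```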
But this code doesn't work as we want. We have to set the list of employees for location as empty, because we can't forward-reference the alice and bob values.
F# will sometimes allow you to use the "and" keyword in these situations too, for recursive "lets". Just as with "type", the "and" keyword replaces the "let" keyword. Unlike "type", the first "let" has to be marked as recursive with "let rec".
Let's try it. We will give location a list of alice and bob even though they are not declared yet.
But no, the compiler is not happy about the infinite recursion that we have created. In some cases, "and" does indeed work for "let" definitions, but this is not one of them! And anyway, just as for types, having to use "and" for "let" definitions is a clue that you might need to refactor.
So, really, the only sensible solution is to use mutable structures, and to fix up the location object after the individual employees have been created, like this:
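A sketch of that fix-up, assuming Location exposes a settable Employees property (the property design and the location name are illustrative):

```fsharp
type Employee(name: string, location: Location) =
    member this.Name = name
    member this.Location = location

and Location(name: string) =
    // Mutable so the list can be fixed up after the employees exist
    let mutable employees: Employee list = []
    member this.Name = name
    member this.Employees
        with get () = employees
        and set (value: Employee list) = employees <- value

let location = Location("Seattle")
let alice = Employee("Alice", location)
let bob = Employee("Bob", location)

// Fix up the location now that the employees have been created
location.Employees <- [ alice; bob ]
```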
So, a lot of trouble just to create some values. This is another reason why mutual dependencies are a bad idea!
Parameterizing again
To break the dependency, we can use the parameterization trick again. We can just create a parameterized version of Employee.
Note that we also create a type alias called Employee for the parameterized type. One nice thing about creating an alias like that is that the original code for creating employees will continue to work unchanged.
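Here is a sketch of the parameterized version, with the alias declared after Location. The settable Employees property and the location name are assumptions for illustration.

```fsharp
// Employee is parameterized over the location type, so it no longer mentions Location
type ParameterizedEmployee<'Location>(name: string, location: 'Location) =
    member this.Name = name
    member this.Location = location

// Location refers to the parameterized type; no "and" is needed
type Location(name: string) =
    let mutable employees: ParameterizedEmployee<Location> list = []
    member this.Name = name
    member this.Employees
        with get () = employees
        and set (value: ParameterizedEmployee<Location> list) = employees <- value

// The alias keeps the old name, so code that creates employees is unchanged
type Employee = ParameterizedEmployee<Location>

let location = Location("Seattle")
let alice = Employee("Alice", location)
let bob = Employee("Bob", location)
location.Employees <- [ alice; bob ]
```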
Parameterizing with behavior dependencies
The code above assumes that the particular class being parameterized over is not important. But what if there are dependencies on particular properties of the type?
For example, let's say that the Employee class expects its location to have a Name property, and the Location class expects its employees to have an Age property, like this:
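A sketch of the two classes with these behavioral expectations, still joined by "and" (the ages and the location name are made-up sample values):

```fsharp
type Employee(name: string, age: float, location: Location) =
    member this.Name = name
    member this.Age = age
    member this.Location = location
    // Depends on Location having a Name property
    member this.LocationName = location.Name

and Location(name: string) =
    let mutable employees: Employee list = []
    member this.Name: string = name
    member this.Employees
        with get () = employees
        and set (value: Employee list) = employees <- value
    // Depends on Employee having an Age property
    member this.AverageAge =
        employees |> List.averageBy (fun e -> e.Age)

let location = Location("Seattle")
let alice = Employee("Alice", 30.0, location)
let bob = Employee("Bob", 40.0, location)
location.Employees <- [ alice; bob ]
```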
How can we possibly parameterize this?
Well, let's try using the same approach as before:
The Location is happy with ParameterizedEmployee.Age, but location.Name fails to compile, obviously, because the type parameter is too generic.
One way to fix this would be to create interfaces such as ILocation and IEmployee, and that might often be the most sensible approach.
But another way is to let the Location parameter be generic and pass in an additional function that knows how to handle it, in this case a getLocationName function.
One way of thinking about this is that we are providing the behavior externally, rather than as part of the type.
To use this then, we need to pass in a function along with the type parameter. This would be annoying to do all the time, so naturally we will wrap it in a function, like this:
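A sketch of that approach: ParameterizedEmployee takes a getLocationName function alongside the generic location, and a wrapper function named Employee supplies it. The settable Employees property and the sample values are assumptions.

```fsharp
// 'Location is fully generic; getting its name is delegated to a function
type ParameterizedEmployee<'Location>(name: string, age: float, location: 'Location, getLocationName: 'Location -> string) =
    member this.Name = name
    member this.Age = age
    member this.Location = location
    member this.LocationName = getLocationName location

type Location(name: string) =
    let mutable employees: ParameterizedEmployee<Location> list = []
    member this.Name = name
    member this.Employees
        with get () = employees
        and set (value: ParameterizedEmployee<Location> list) = employees <- value
    member this.AverageAge =
        employees |> List.averageBy (fun e -> e.Age)

// Wrapper that supplies getLocationName, so callers don't pass it every time
let Employee (name: string, age: float, location: Location) =
    let getLocationName (l: Location) = l.Name
    ParameterizedEmployee<Location>(name, age, location, getLocationName)

let location = Location("Seattle")
let alice = Employee("Alice", 30.0, location)
let bob = Employee("Bob", 40.0, location)
location.Employees <- [ alice; bob ]
```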
With this in place, the original test code continues to work, almost unchanged (we have to change new Employee to just Employee).
The functional approach: separating types from functions again
Now let's apply the functional design approach to this problem, just as we did before.
Again, we'll separate the types themselves from the functions that act on those types.
Before we go any further, let's remove some unneeded code. One nice thing about using a record type is that you don't need to define "getters", so the only functions you need in the modules are functions that manipulate the data, such as AverageAge.
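A sketch of the record-based version, where the only function left in the Location module is AverageAge. The sample data is made up, and the type annotations on the let bindings are there because both records have a name field.

```fsharp
module DomainTypes =
    // Records need no getters: the fields are directly accessible
    type Employee = { name: string; age: float; location: Location }
    and Location = { name: string; mutable employees: Employee list }

module Location =
    open DomainTypes

    // The only function left is one that actually computes something
    let AverageAge (location: Location) =
        location.employees |> List.averageBy (fun e -> e.age)

open DomainTypes

let location: Location = { name = "Seattle"; employees = [] }
let alice: Employee = { name = "Alice"; age = 30.0; location = location }
let bob: Employee = { name = "Bob"; age = 40.0; location = location }
location.employees <- [ alice; bob ]

let averageAge = Location.AverageAge location   // 35.0
```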
Parameterizing again
Once again, we can remove the dependency by creating a parameterized version of the types.
Let's step back and think about the "location" concept. Why does a location have to only contain Employees? If we make it a bit more generic, we could consider a location as being a "place" plus "a list of things at that place".
For example, if the things are products, then a place full of products might be a warehouse. If the things are books, then a place full of books might be a library.
Here are these concepts expressed in code:
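A sketch of these concepts. The Product and Book fields and the sample values are hypothetical.

```fsharp
// A generic "place plus a list of things at that place"
type Location<'Thing> = { name: string; mutable things: 'Thing list }

type Product = { sku: string; price: float }
type Warehouse = Location<Product>

type Book = { title: string; author: string }
type Library = Location<Book>

// Hypothetical sample data
let warehouse: Warehouse =
    { name = "Main warehouse"; things = [ { sku = "SKU-001"; price = 9.99 } ] }

let library: Library =
    { name = "City library"; things = [ { title = "Example Title"; author = "Example Author" } ] }
```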
Of course, these locations are not exactly the same, but there might be something in common that you can extract into a generic design, especially as there is no behavior requirement attached to the things they contain.
So, using the "location of things" design, here is our dependency rewritten to use parameterized types.
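A sketch of the rewritten dependency using the generic Location<'Thing> record (sample values are made up):

```fsharp
module DomainTypes =
    type Location<'Thing> = { name: string; mutable things: 'Thing list }
    // Employee refers to Location<Employee>, which is only a self-reference:
    // Location itself knows nothing about employees
    type Employee = { name: string; age: float; location: Location<Employee> }

open DomainTypes

let location: Location<Employee> = { name = "Seattle"; things = [] }
let alice: Employee = { name = "Alice"; age = 30.0; location = location }
let bob: Employee = { name = "Bob"; age = 40.0; location = location }
location.things <- [ alice; bob ]

// No special AverageAge function: the calculation is simple enough to do inline
let averageAge = location.things |> List.averageBy (fun e -> e.age)   // 35.0
```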
In this revised design you will see that the AverageAge function has been completely removed from the Location module. There is really no need for it, because we can do these kinds of calculations quite well "inline" without needing the overhead of special functions.
And if you think about it, if we did need to have such a function predefined, it would probably be more appropriate to put it in the Employee module rather than the Location module. After all, the functionality is much more related to how employees work than how locations work.
Here's what I mean:
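A sketch that keeps the function in the Employee module. The function name AverageAgeAtLocation and the sample values are illustrative assumptions.

```fsharp
module DomainTypes =
    type Location<'Thing> = { name: string; mutable things: 'Thing list }
    type Employee = { name: string; age: float; location: Location<Employee> }

// The function lives with the other employee functions,
// even though its parameter is a location
module Employee =
    open DomainTypes

    let AverageAgeAtLocation (location: Location<Employee>) =
        location.things |> List.averageBy (fun e -> e.age)

open DomainTypes

let location: Location<Employee> = { name = "Seattle"; things = [] }
let alice: Employee = { name = "Alice"; age = 30.0; location = location }
let bob: Employee = { name = "Bob"; age = 40.0; location = location }
location.things <- [ alice; bob ]

let averageAge = Employee.AverageAgeAtLocation location   // 35.0
```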
This is one advantage of modules over classes; you can mix and match functions with different types, as long as they are all related to the underlying use cases.
Moving relationships into distinct types
In the examples so far, the "list of things" field in location has had to be mutable. How can we work with immutable types and still support relationships?
Well, one way not to do it is to have the kind of mutual dependency we have seen. In that design, synchronization (or the lack of it) is a terrible problem.
For example, I could change Alice's location without telling the location she points to, resulting in an inconsistency. But if I tried to change the contents of the location as well, then I would also need to update the value of Bob. And so on, ad infinitum. A nightmare, basically.
The correct way to do this with immutable data is to take a leaf from database design and extract the relationship into a separate "table" (or type, in our case). The current relationships are held in a single master list, and so when changes are made, no synchronization is needed.
Here is a very crude example, using a simple list of Relationships.
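A crude sketch along those lines. Here Relationship is a hypothetical generic record with left and right fields, and getEmployeesAtLocation is an illustrative helper; the final step moves Bob by changing only the relationship list.

```fsharp
// Plain immutable types that do not refer to each other
type Location = { name: string }
type Employee = { name: string; age: float }

// The relationship is its own type, like a join table in a database
type Relationship<'Left, 'Right> = { left: 'Left; right: 'Right }

// Hypothetical sample data
let seattle: Location = { name = "Seattle" }
let portland: Location = { name = "Portland" }
let alice: Employee = { name = "Alice"; age = 30.0 }
let bob: Employee = { name = "Bob"; age = 40.0 }

// The single master list of current relationships
let relationships =
    [ { left = alice; right = seattle }
      { left = bob; right = seattle } ]

let getEmployeesAtLocation (location: Location) (relations: Relationship<Employee, Location> list) =
    relations
    |> List.filter (fun r -> r.right = location)
    |> List.map (fun r -> r.left)

// Moving Bob only produces a new relationship list:
// neither the Employee nor the Location values need to be synchronized
let afterMove =
    relationships
    |> List.map (fun r -> if r.left = bob then { r with right = portland } else r)
```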
Of course, a more efficient design would use dictionaries/maps, or special in-memory structures designed for this kind of thing.
Inheritance dependencies
Finally, let's look at an "inheritance dependency".
Type A stores a value of type B in a property
Type B inherits from type A
We'll consider a UI control hierarchy, where every control belongs to a top-level "Form", and the Form itself is a Control.
Here's a first pass at an implementation:
The thing to note here is that the Form passes itself in as the form value for the Control constructor.
This code will compile, but will cause a NullReferenceException error at runtime. This kind of technique will work in C#, but not in F#, because the class initialization logic is done differently.
Anyway, this is a terrible design. The form shouldn't have to pass itself in to a constructor.
A better design, which also fixes the constructor error, is to make Control an abstract class instead, and distinguish between non-form child classes (which do take a form in their constructor) and the Form class itself, which doesn't.
Here's some sample code:
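A sketch of that design, with a Button as the example of a non-form child control (Button and the sample names are illustrative):

```fsharp
[<AbstractClass>]
type Control(name: string) =
    member this.Name = name
    // Each concrete subclass decides where its form comes from
    abstract Form: Form

and Form(name: string) =
    inherit Control(name)
    // A form is its own top-level form
    override this.Form = this

and Button(name: string, form: Form) =
    inherit Control(name)
    // A child control is given its form in the constructor
    override this.Form = form

let mainForm = Form("Main")
let okButton = Button("OK", mainForm)
```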
Our old friend parameterization again
To remove the circular dependency, we can parameterize the classes in the usual way, as shown below.
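A sketch of the parameterized version, using a hypothetical Button as the child control. Control now takes the form type as a type parameter, so Form and Button no longer need "and".

```fsharp
// Control no longer mentions Form: the form type is a type parameter
[<AbstractClass>]
type Control<'Form>(name: string) =
    member this.Name = name
    abstract Form: 'Form

// Form and Button are ordinary, non-recursive type definitions
type Form(name: string) =
    inherit Control<Form>(name)
    override this.Form = this

type Button(name: string, form: Form) =
    inherit Control<Form>(name)
    override this.Form = form

let mainForm = Form("Main")
let okButton = Button("OK", mainForm)
```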
A functional version
I will leave a functional design as an exercise for you to do yourself.
If we were going for truly functional design, we probably would not be using inheritance at all. Instead, we would use composition in conjunction with parameterization.
But that's a big topic, so I'll save it for another day.
Summary
In the next post in this series, I'll look at dependency cycles "in the wild", by comparing some real world C# and F# projects.