Choosing properties for property-based testing
Or, I want to use FsCheck and Quickcheck, but I can never think of any properties to use
Last updated
Was this helpful?
Or, I want to use FsCheck and Quickcheck, but I can never think of any properties to use
Last updated
Was this helpful?
UPDATE: I did a talk on property-based testing based on these posts.
In , I described the basics of property-based testing, and showed how it could save a lot of time by generating random tests.
But here's a common problem. Everyone who sees a property-based testing tool like FsCheck or QuickCheck thinks that it is amazing... but when it times come to start creating your own properties, the universal complaint is: "what properties should I use? I can't think of any!"
The goal of this post is to show some common patterns that can help you discover the properties that are applicable to your code.
In my experience, many properties can be discovered by using one of the seven approaches listed below.
This is by no means a comprehensive list, just the ones that have been most useful to me. For a different perspective, check out that the PEX team at Microsoft have compiled.
These kinds of properties are based on combining operations in different orders, but getting the same result. For example, in the diagram below, doing X
then Y
gives the same result as doing Y
followed by X
.
The commutative property of addition is an obvious example of this pattern. For example, the result of add 1
then add 2
is the same as the result of add 2
followed by add 1
.
This pattern, generalized, can produce a wide range of useful properties. We'll see some more uses of this pattern later in this post.
These kinds of properties are based on combining an operation with its inverse, ending up with the same value you started with.
In the diagram below, doing X
serializes ABC
to some kind of binary format, and the inverse of X
is some sort of deserialization that returns the same ABC
value again.
In addition to serialization/deserialization, other pairs of operations can be checked this way: addition
/subtraction
, write
/read
, setProperty
/getProperty
, and so on.
Other pair of functions fit this pattern too, even though they are not strict inverses, pairs such as insert
/contains
, create
/exists
, etc.
These kinds of properties are based on an invariant that is preserved after some transformation.
In the diagram below, the transform changes the order of the items, but the same four items are still present afterwards.
Common invariants include size of a collection (for map
say), the contents of a collection (for sort
say), the height or depth of something in proportion to size (e.g. balanced trees).
These kinds of properties are based on "idempotence" -- that is, doing an operation twice is the same as doing it once.
In the diagram below, using distinct
to filter the set returns two items, but doing distinct
twice returns the same set again.
Idempotence properties are very useful, and can be extended to things like database updates and message processing.
These kinds of properties are based on "structural induction" -- that is, if a large thing can be broken into smaller parts, and some property is true for these smaller parts, then you can often prove that the property is true for a large thing as well.
In the diagram below, we can see that the four-item list can be partitioned into an item plus a three-item list, which in turn can be partitioned into an item plus a two-item list. If we can prove the property holds for two-item list, then we can infer that it holds for the three-item list, and for the four-item list as well.
Induction properties are often naturally applicable to recursive structures such as lists and trees.
Often an algorithm to find a result can be complicated, but verifying the answer is easy.
In the diagram below, we can see that finding a route through a maze is hard, but checking that it works is trivial!
Many famous problems are of this sort, such as prime number factorization. But this approach can be used for even simple problems.
For example, you might check that a string tokenizer works by just concatenating all the tokens again. The resulting string should be the same as what you started with.
In many situations you often have an alternate version of an algorithm or process (a "test oracle") that you can use to check your results.
For example, you might have a high-performance algorithm with optimization tweaks that you want to test. In this case, you might compare it with a brute force algorithm that is much slower but is also much easier to write correctly.
Similarly, you might compare the result of a parallel or concurrent algorithm with the result of a linear, single thread version.
In this section, we'll apply these categories to see if we can come up with properties for some simple functions such as "sort a list" and "reverse a list".
Let's start with "different paths, same destination" and apply it to a "list sort" function.
Can we think of any way of combining an operation before List.sort
, and another operation after List.sort
, so that you should end up with the same result? That is, so that "going up then across the top" is the same as "going across the bottom then up".
How about this?
Path 1: We add one to each element of the list, then sort.
Path 2: We sort, then add one to each element of the list.
Both lists should be equal.
Here's some code that implements that property:
Well, that works, but it also would work for a lot of other transformations too. For example, if we implemented List.sort
as just the identity, then this property would be satisfied equally well! You can test this for yourself:
The problem with this property is that it is not exploiting any of the "sortedness". We know that a sort will probably reorder a list, and certainly, the smallest element should be first.
How about adding an item that we know will come at the front of the list after sorting?
Path 1: We append Int32.MinValue
to the end of the list, then sort.
Path 2: We sort, then prepend Int32.MinValue
to the front of the list.
Both lists should be equal.
Here's the code:
The bad implementation fails now!
In other words, the bad sort of [0; minValue]
is not the same as [minValue; 0]
.
So that's good!
In other words, the EDFH can identify which path we are on and have special cases for each one:
And when we check it...
We could fix this by (a) picking a random number smaller than any number in the list and (b) inserting it at a random location rather than always appending it. But rather than getting too complicated, let's stop and reconsider.
An alternative approach which also exploits the "sortedness" is to first negate all the values, then on the path that negates after the sort, add an extra reverse as well.
This property is harder for the EDFH to beat because there are no magic numbers to help identify which path you are on:
You might argue that we are only testing sorting for lists of integers. But the List.sort
function is generic and knows nothing about integers per se, so I have high confidence that this property does test the core sorting logic.
Ok, enough of List.sort
. What about applying the same ideas to the list reversal function?
We can do the same append/prepend trick:
Here's the code for the property:
Here are the test results for the correct function and for two incorrect functions:
You might notice something interesting here. I never specified the type of the list. The property works with any list.
In cases like these, FsCheck will generate random lists of bools, strings, ints, etc.
In both failing cases, the anyValue
is a bool. So FsCheck is using lists of bools to start with.
Here's an exercise for you: Is this property good enough? Is there some way that the EDFH can create an implementation that will pass?
Sometimes the multi-path style properties are not available or too complicated, so let's look at some other approaches.
We'll start with properties involving inverses.
Let's start with list sorting again. Is there an inverse to sorting? Hmmm, not really. So we'll skip sorting for now.
What about list reversal? Well, as it happens, reversal is its own inverse!
Let's turn that into a property:
And it passes:
Unfortunately, a bad implementation satisfies the property too!
Nevertheless, the use of properties involving inverses can be very useful to verify that your inverse function (such as deserialization) does indeed "undo" the primary function (such as serialization).
We'll see some real examples of using this in the next post.
So far we've been testing properties without actually caring about the end result of an operation.
But of course in practice, we do care about the end result!
Now we normally can't really tell if the result is right without duplicating the function under test. But often we can tell that the result is wrong quite easily. In the maze diagram from above, we can easily check whether the path works or not.
If we are looking for the shortest path, we might not be able to check it, but at least we know that we have some valid path.
This principle can be applied quite generally.
For example, let's say that we want to check whether a string split
function is working. We don't have to write a tokenizer -- all we have to do is ensure that the tokens, when concatenated, give us back the original string!
Here's the core code from that property:
But how can we create an original string? The random strings generated by FsCheck are unlikely to contain many commas!
There are ways that you can control exactly how FsCheck generates random data, which we'll look at later.
For now though, we'll use a trick. The trick is to let FsCheck generate a list of random strings, and then we'll build an originalString
from them by concatting them together.
So here's the complete code for the property:
When we test this we are happy:
So how can we apply this principle to a sorted list? What property is easy to verify?
The first thing that pops into my mind is that for each pair of elements in the list, the first one will be smaller than the second.
So let's make that into a property:
But something funny happens when we try to check it. We get an error!
What does System.Exception: type not handled System.IComparable
mean? It means that FsCheck is trying to generate a random list, but all it knows is that the elements must be IComparable
. But IComparable
is not a type than can be instantiated, so FsCheck throws an error.
How can we prevent this from happening? The solution is to specify a particular type for the property, such as int list
, like this:
This code works now.
Note that even though the property has been constrained, the property is still a very general one. We could have used string list
instead, for example, and it would work just the same.
TIP: If FsCheck throws "type not handled", add explicit type constraints to your property
Are we done now? No! One problem with this property is that it doesn't catch malicious implementations by the EDFH.
Is it a surprise to you that a silly implementation also works?
Hmmm. That tells us that there must be some property other than pairwise ordering associated with sorting that we've overlooked. What are we missing here?
This is a good example of how doing property-based testing can lead to insights about design. We thought we knew what sorting meant, but we're being forced to be a bit stricter in our definition.
As it happens, we'll fix this particular problem by using the next principle!
A useful kind of property is based on an invariant that is preserved after some transformation, such as preserving length or contents.
They are not normally sufficient in themselves to ensure a correct implementation, but they do often act as a counter-check to more general properties.
And in the "list sort" example above, we could satisfy the "pairwise ordered" property with a function that just returned an empty list! How could we fix that?
Our first attempt might be to check the length of the sorted list. If the lengths are different, then the sort function obviously cheated!
We check it and it works:
And yes, the bad implementation fails:
Unfortunately, the BDFH is not defeated and can come up with another compliant implementation! Just repeat the first element N times!
Now when we test this, it passes:
What's more, it also satisfies the pairwise property too!
So now we have to try again. What is the difference between the real result [1;2;3]
and the fake result [1;1;1]
?
Answer: the fake result is throwing away data. The real result always contains the same contents as the original list, but just in a different order.
That leads us to a new property: a sorted list is always a permutation of the original list. Aha! Let's write the property in terms of permutations now:
Great, now all we need is a permutation function.
Some quick interactive tests confirm that it works as expected:
Excellent! Now let's run FsCheck:
Hmmm. That's funny, nothing seems to be happening. And my CPU is maxing out for some reason. What's going on?
What's going on is that you are going to be sitting there for a long time! If you are following along at home, I suggest you right-click and cancel the interactive session now.
The innocent looking permutations
is really really slow for any normal sized list. For example, a list of just 10 items has 3,628,800 permutations. While with 20 items, you are getting to astronomical numbers.
And of course, FsCheck will be doing hundreds of these tests! So this leads to an important tip:
TIP: Make sure your property checks are very fast. You will be running them a LOT!
We've already seen that even in the best case, FsCheck will evaluate the property 100 times. And if shrinking is needed, even more. So make sure your tests are fast to run!
But what happens if you are dealing with real systems such as databases, networks, or other slow dependencies?
Of course, killing real nodes thousands of times was too slow, so they extracted the core logic into a virtual model, and tested that instead. As a result, the code was later refactored to make this kind of testing easier. In other words, property-based testing influenced the design of the code, just as TDD would.
Ok, so we can't use permutations by just looping through them. So let's use the same idea but write a function that is specific for this case, a isPermutationOf
function.
Here's the code for isPermutationOf
and its associated helper functions:
Let's try the test again. And yes, this time it completes before the heat death of the universe.
What's also great is that the malicious implementation now fails to satisfy this property!
In fact, these two properties, adjacent pairs from a list should be ordered
and a sorted list has same contents as the original list
should indeed ensure that any implementation is correct.
Just above, we noted that there were two properties needed to define the "is sorted" property. It would be nice if we could combine them into one property is sorted
so that we can have a single test.
Well, of course we can always merge the two sets of code into one function, but it's preferable to keep functions as small as possible. Furthermore, a property like has same contents
might be reusable in other contexts as well.
What we want then, is an equivalent to AND
and OR
that is designed to work with properties.
FsCheck to the rescue! There are built in operators to combine properties: .&.
for AND
and .|.
for OR
.
Here is an example of them in use:
When we test the combined property with a good implementation of sort
, everything works as expected.
And if we test a bad implementation, the combined property fails as well.
But there's a problem now. Which of the two properties failed?
What we would like to do is add a "label" to each property so that we can tell them apart. In FsCheck, this is done with the |@
operator:
And now, when we test with the bad sort, we get a message Label of failing property: a sorted list has same contents as the original list
:
And now, back to the property-divising strategies.
Sometimes you have a recursive data structure or a recursive problem. In these cases, you can often find a property that is true of a smaller part.
For example, for a sort, we could say something like:
Here is that logic expressed in code:
This property is satisfied by the real sort function:
But unfortunately, just like previous examples, the malicious implementations also pass.
So as before, we'll need another property (such as the has same contents
invariant) to ensure that the code is correct.
If you do have a recursive data structure, then try looking for recursive properties. They are pretty obvious and low hanging, when you get the hang of it.
In the last few examples, I've noted that trivial but wrong implementations often satisfy the properties as well as good implementations.
But should we really spend time worrying about this? I mean, if we ever really released a sort algorithm that just duplicated the first element it would be obvious immediately, surely?
So yes, it's true that truly malicious implementations are unlikely to be a problem. On the other hand, you should think of property-based testing not as a testing process, but as a design process -- a technique that helps you clarify what your system is really trying to do. And if a key aspect of your design is satisfied with just a simple implementation, then perhaps there is something you have overlooked -- something that, when you discover it, will make your design both clearer and more robust.
Our next type of property is "idempotence". Idempotence simply means that doing something twice is the same as doing it once. If I tell you to "sit down" and then tell you to "sit down" again, the second command has no effect.
If you are designing these kinds of real-world systems it is well worth ensuring that your requests and processes are idempotent.
I won't go too much into this right now, but let's look at two simple examples.
First, our old friend sort
is idempotent (ignoring stability) while reverse
is not, obviously.
In the real world, this may not be the case. A simple query on a datastore run at different times may give different results.
Here's a quick demonstration.
First we'll create a NonIdempotentService
that gives different results on each query.
But if we test it now, we find that it does not satisfy the required idempotence property:
As an alternative, we can create a (crude) IdempotentService
that requires a timestamp for each transaction. In this design, multiple GETs using the same timestamp will always retrieve the same data.
And this one works:
So, if you are building a REST GET handler or a database query service, and you want idempotence, you should consider using techniques such as etags, "as-of" times, date ranges, etc.
And finally, last but not least, we come to the "test oracle". A test oracle is simply an alternative implementation that gives the right answer, and that you can check your results against.
So for "list sort", there are many simple but slow implementations around. For example, here's a quick implementation of insertion sort:
With this in place, we can write a property that tests the result against insertion sort.
When we test the good sort, it works. Good!
And when we test a bad sort, it doesn't. Even better!
We can also use the test oracle approach to cross-check two different implementations when you're not sure that either implementation is right!
The first algorithm was based on understanding that Roman numerals were based on tallying, leading to this simple code:
Another way to think about Roman numerals is to imagine an abacus. Each wire has four "unit" beads and one "five" bead.
This leads to the so-called "bi-quinary" approach:
We now have two completely different algorithms, and we can cross-check them with each other to see if they give the same result.
But if we try running this code, we get a ArgumentException: The input must be non-negative
due to the String.replicate
call.
So we need to only include inputs that are positive. We also need to exclude numbers that are greater than 4000, say, since the algorithms break down there too.
How can we implement this filter?
We saw in the previous post that we could use preconditions. But for this example, we'll try something different and change the generator.
First we'll define a new arbitrary integer called arabicNumber
which is filtered as we want (an "arbitrary" is a combination of a generator algorithm and a shrinker algorithm, as described in the previous post).
Next, we create a new property which is constrained to only use "arabicNumber" by using the Prop.forAll
helper.
We'll give the property the rather clever name of "for all values of arabicNumber, biquinary should give same result as tallying".
Now finally, we can do the cross-check test:
And we're good! Both algorithms work correctly, it seems.
"Model-based" testing, which we will discuss in more detail in a later post, is a variant on having a test oracle.
The way it works is that, in parallel with your (complex) system under test, you create a simplified model.
Then, when you do something to the system under test, you do the same (but simplified) thing to your model.
At the end, you compare your model's state with the state of the system under test. If they are the same, you're done. If not, either your SUT is buggy or your model is wrong and you have to start over!
With that, we have come to the end of the various property categories. We'll go over them one more time in a minute -- but first, an interlude.
If you sometimes feel that trying to find properties is a mental challenge, you're not alone. Would it help to pretend that it is a game?
As it happens, there is a game based on property-based testing.
The other players then have to guess what the rule (property) is, based on what they can see.
Here's a picture of a Zendo game in progress:
The white stones mean the property has been satisfied, while black stones mean failure. Can you guess the rule here? I'm going to guess that it's something like "a set must have a yellow pyramid that's not touching the ground".
With all these categories in hand, let's look at one more example problem, and see if we can find properties for it.
This sample is based on the well-known Dollar
example described in Kent Beck's "TDD By Example" book.
In it, he expressed some frustration about property-based testing being useful in practice. So let's revisit the example he referenced and see what we can do with it.
So what do we have?
A Dollar
class that stores an Amount
.
Methods Add
and Times
that transform the amount in the obvious way.
So, first let's try it out interactively to make sure it works as expected:
But that's just playing around, not real testing. So what kind of properties can we think of?
Let's run through them all again:
Different paths to same result
Inverses
Invariants
Idempotence
Structural induction
Easy to verify
Test oracle
Let's skip the "different paths" one for now. What about inverses? Are there any inverses we can use?
Yes, the setter and getter form an inverse that we can create a property from:
Idempotence is relevant too. For example, doing two sets in a row should be the same as doing just one. Here's a property for that:
Any "structural induction" properties? No, not relevant to this case.
Any "easy to verify" properties? Not anything obvious.
Finally, is there a test oracle? No. Again not relevant, although if we really were designing a complex currency management system, it might be very useful to cross-check our results with a third party system.
A confession! I cheated a bit in the code above and created a mutable class, which is how most OO objects are designed.
But in "TDD by Example" , Kent quickly realizes the problems with that and changes it to an immutable class, so let me do the same.
Here's the immutable version:
What's nice about immutable code is that we can eliminate the need for testing of setters, so the two properties we just created have now become irrelevant!
To tell the truth they were pretty trivial anyway, so it's no great loss.
So then, what new properties can we devise now?
Let's look at the Times
method. How can we test that? Which one of the strategies can we use?
I think the "different paths to same result" is very applicable. We can do the same thing we did with "sort" and do a times operation both "inside" and "outside" and see if they give the same result.
Here's that property expressed in code:
Great! Let's see if it works!
Oops -- it doesn't work!
Why not? Because we forgot that Dollar
is a reference type and doesn't compare equal by default!
As a result of this mistake, we have discovered a property that we might have overlooked! Let's encode that before we forget.
So now we need to fix this by adding support for IEquatable
and so on.
You can do that if you like -- I'm going to switch to F# record types and get equality for free!
Here's the Dollar
rewritten again:
And now our two properties are satisfied:
We can extend this approach for different paths. For example, we can extract the amount and compare it directly, like this:
The code looks like this:
And we can also include Add
in the mix as well.
For example, we can do a Times
followed by an Add
via two different paths, like this:
And here's the code:
So this "different paths, same result" approach is very fruitful, and we can generate lots of paths this way.
Shall we call it done then? I would say not!
We are beginning to get a whiff of a code smell. All this (start * multiplier) + adder
code seems like a bit of duplicated logic, and could end up being brittle.
Can we abstract out some commonality that is present all these cases?
If we think about it, our logic is really just this:
Transform the amount on the "inside" in some way.
Transform the amount on the "outside" in the same way.
Make sure that the results are the same.
But to test this, the Dollar
class is going to have to support an arbitrary transform! Let's call it Map
!
Now all our tests can be reduced to this one property:
Let's add a Map
method to Dollar
. And we can also rewrite Times
and Add
in terms of Map
:
Now the code for our property looks like this:
But how can we test it now? What functions should we pass in?
Don't worry! FsCheck has you covered! In cases like this, FsCheck will actually generate random functions for you too!
Try it -- it just works!
Our new "map" property is much more general than the original property using "times", so we can eliminate the latter safely.
There's a little problem with the property as it stands. If you want to see what the function is that FsCheck is generating, then Verbose mode is not helpful.
Gives the output:
We can't tell what the function values actually were.
However, you can tell FsCheck to show more useful information by wrapping your function in a special F
case, like this:
And now when you use Verbose mode...
... you get a detailed log of each function that was used:
Each { 2->-2 }
, { 10->28 }
, etc., represents the function that was used for that iteration.
How does property-based testing (PBT) fit in with TDD? This is a common question, so let me quickly give you my take on it.
First off, TDD works with specific examples, while PBT works with universal properties.
As I said in the previous post, I think examples are useful as a way into a design, and can be a form of documentation. But in my opinion, relying only on example-based tests would be a mistake.
Property-based approaches have a number of advantages over example-based tests:
Property-based tests are more general, and thus are less brittle.
Property-based tests provide a better and more concise description of requirements than a bunch of examples.
As a consequence, one property-based test can replace many, many, example-based tests.
By generating random input, property-based tests often reveal issues that you have overlooked, such as dealing with nulls, missing data, divide by zero, negative numbers, etc.
Property-based tests force you to think.
Property-based tests force you to have a clean design.
These last two points are the most important for me. Programming is not a matter of writing lines of code, it is about creating a design that meets the requirements.
So, anything that helps you think deeply about the requirements and what can go wrong should be a key tool in your personal toolbox!
For example, in the Roman Numeral section, we saw that accepting int
was a bad idea (the code broke!). We had a quick fix, but really we should model the concept of a PositiveInteger
in our domain, and then change our code to use that type rather than just an int
. This demonstrates how using PBT can actually improve your domain model, not just find bugs.
Similarly, introducing a Map
method in the Dollar scenario not only made testing easier, but actually improved the usefulness of the Dollar "api".
Stepping back to look at the big picture, though, TDD and property-based testing are not at all in conflict. They share the same goal of building correct programs, and both are really more about design than coding (think "Test-driven design" rather than "Test-driven development").
So that brings us to the end of another long post on property-based testing!
I hope that you now have some useful approaches that you can take away and apply to your own code base.
Next time, we'll look at some real-world examples, and how you can create custom generators that match your domain.
But... we've got some hard coded things in there that the Enterprise Developer From Hell () could take advantage of! The EDFH will exploit the fact that we always use Int32.MinValue
and that we always prepend or append it to the test list.
For example, in , we created commutative and associative properties for addition, but then noticed that simply having an implementation that returned zero would satisfy them just as well! It was only when we added x + 0 = x
as a property that we could eliminate that particular malicious implementation.
Let's head over to StackOverflow and steal . Here it is:
In his (highly recommended) , John Hughes tells of when his team was trying to detect flaws in a distributed data store that could be caused by network partitions and node failures.
For more on these operators, .
Idempotence is and is and message-based architectures.
In general, any kind of query should be idempotent, or to put it another way: .
If you need tips on how to do this, searching for will turn up some good results.
Often the test oracle implementation is not suitable for production -- it's too slow, or it doesn't parallelize, or it's , etc., but that doesn't stop it being very useful for testing.
For example, in my post I came up with two completely different algorithms for generating Roman Numerals. Can we compare them to each other and test them both in one fell swoop?
It's called and it involves placing sets of objects (such as plastic pyramids) on a table, such that each layout conforms to a pattern -- a rule -- or as we would now say, a property!.
Alright, I suppose Zendo wasn't really inspired by property-based testing, but it is a fun game, and it has even been known to make an appearance at .
If you want to learn more about Zendo, .
Nat Pryce, of fame, wrote a blog post about property-based testing a while ago ().
We're not going to attempt to critique the design itself and make it more type-driven -- . Instead, we'll take the design as given and see what properties we can come up with.
The code samples used in this post are .