Cycles and modularity in the wild

Comparing some real-world metrics of C# and F# projects

(Updated 2013-06-15. See comments at the end of the post)

(Updated 2014-04-12. A follow up post that applies the same analysis to Roslyn)

(Updated 2015-01-23. A much clearer version of this analysis has been done by Evelina Gabasova. She knows what she is talking about, so I highly recommend you read her post first!)

This is a follow-up to two earlier posts on module organization and cyclic dependencies.

I thought it would be interesting to look at some real projects written in C# and F#, and see how they compare in modularity and number of cyclic dependencies.

The plan

My plan was to take ten or so projects written in C# and ten or so projects written in F#, and somehow compare them.

I didn't want to spend too much time on this, and so rather than trying to analyze the source files, I thought I would cheat a little and analyze the compiled assemblies, using the Mono.Cecil library.

This also meant that I could get the binaries directly, using NuGet.

The projects I picked were:

C# projects

Mono.Cecil, which inspects programs and libraries in the ECMA CIL format.
NUnit
SignalR for real-time web functionality.
NancyFx, a web framework
YamlDotNet, for parsing and emitting YAML.
SpecFlow, a BDD tool.
Json.NET.
Entity Framework.
ELMAH, a logging framework for ASP.NET.
NuGet itself.
Moq, a mocking framework.
NDepend, a code analysis tool.
And, to show I'm being fair, a business application that I wrote in C#.

F# projects

Unfortunately, there is not yet a wide variety of F# projects to choose from. I picked the following:

FSharp.Core, the core F# library.
FSPowerPack.
FsUnit, extensions for NUnit.
Canopy, a wrapper around the Selenium test automation tool.
FsSql, a nice little ADO.NET wrapper.
WebSharper, the web framework.
TickSpec, a BDD tool.
FSharpx, an F# library.
FParsec, a parser library.
FsYaml, a YAML library built on FParsec.
Storm, a tool for testing web services.
Foq, a mocking framework.
Another business application that I wrote, this time in F#.

I did choose SpecFlow and TickSpec as being directly comparable, and also Moq and Foq.

But as you can see, most of the F# projects are not directly comparable to the C# ones. For example, there is no direct F# equivalent to Nancy, or Entity Framework.

Nevertheless, I was hoping that I might observe some sort of pattern by comparing the projects. And I was right. Read on for the results!

What metrics to use?

I wanted to examine two things: "modularity" and "cyclic dependencies".

First, what should be the unit of "modularity"?

From a coding point of view, we generally work with files (Smalltalk being a notable exception), and so it makes sense to think of the file as the unit of modularity. A file is used to group related items together, and if two chunks of code are in different files, they are somehow not as "related" as if they were in the same file.

In C#, the best practice is to have one class per file. So 20 files means 20 classes. Sometimes classes have nested classes, but with rare exceptions, the nested class is in the same file as the parent class. This means that we can ignore them and just use top-level classes as our unit of modularity, as a proxy for files.

In F#, the best practice is to have one module per file (or sometimes more). So 20 files means 20 modules. Behind the scenes, modules are turned into static classes, and any classes defined within the module are turned into nested classes. So again, this means that we can ignore nested classes and just use top-level classes as our unit of modularity.

The C# and F# compilers generate many "hidden" types, for things such as LINQ, lambdas, etc. In some cases, I wanted to exclude these, and only include "authored" types, which have been coded for explicitly. I also excluded the case classes generated for F# discriminated unions from the "authored" types. That means that a union type with three cases will be counted as one authored type rather than four.

So my definition of a top-level type is: a type that is not nested and which is not compiler generated.

The metrics I chose for modularity were:

The number of top-level types as defined above.
The number of authored types as defined above.
The number of all types. This number would include the compiler generated types as well. Comparing this number to the top-level types gives us some idea of how representative the top-level types are.
The size of the project. Obviously, there will be more types in a larger project, so we need to make adjustments based on the size of the project. The size metric I picked was the number of instructions, rather than the physical size of the file. This eliminates issues with embedded resources, etc.
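Here is a minimal sketch of these three type counts in Python. It is an illustration only, not the F# script used for the analysis: the `TypeInfo` record and its flags are hypothetical stand-ins for whatever the assembly reader reports about each type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    # Hypothetical record; the flag names are assumptions for this sketch.
    name: str
    is_nested: bool = False
    is_compiler_generated: bool = False
    is_union_case: bool = False  # case class emitted for an F# discriminated union


def count_types(types):
    """Return (top_level, authored, all_types) counts for one assembly."""
    # Top-level: not nested and not compiler generated.
    top_level = sum(1 for t in types
                    if not t.is_nested and not t.is_compiler_generated)
    # Authored: nested types count, but hidden types and union cases do not.
    authored = sum(1 for t in types
                   if not t.is_compiler_generated and not t.is_union_case)
    return top_level, authored, len(types)


# One module holding a union with three cases, plus one generated closure class.
sample = [
    TypeInfo("Shapes"),
    TypeInfo("Shapes/Shape", is_nested=True),
    TypeInfo("Shapes/Shape/Circle", is_nested=True, is_union_case=True),
    TypeInfo("Shapes/Shape/Square", is_nested=True, is_union_case=True),
    TypeInfo("Shapes/Shape/Triangle", is_nested=True, is_union_case=True),
    TypeInfo("Shapes/area@12", is_nested=True, is_compiler_generated=True),
]
counts = count_types(sample)
print(counts)  # (1, 2, 6)
```

The union contributes one authored type rather than four, and the module is the single unit of modularity.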

Dependencies

Once we have our units of modularity, we can look at dependencies between modules.

For this analysis, I only want to include dependencies between types in the same assembly. In other words, dependencies on system types such as String or List do not count as a dependency.

Let's say we have a top-level type A and another top-level type B. Then I say that a dependency exists from A to B if:

Type A or any of its nested types inherits from (or implements) type B or any of its nested types.
Type A or any of its nested types has a field, property or method that references type B or any of its nested types as a parameter or return value. This includes private members as well -- after all, it is still a dependency.
Type A or any of its nested types has a method implementation that references type B or any of its nested types.

This might not be a perfect definition. But it is good enough for my purposes.
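To make the roll-up concrete, here is a small Python sketch. It assumes that the low-level references have already been extracted as (source type, target type) pairs, and that nested type names use "/" as a separator; both the input shape and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def top_level_of(type_name):
    """Map 'A/Inner/Deeper' to its enclosing top-level type 'A'."""
    return type_name.split("/", 1)[0]


def top_level_dependencies(references, assembly_types):
    """Collapse type-to-type references into top-level dependencies."""
    deps = defaultdict(set)
    for source, target in references:
        if target not in assembly_types:
            continue  # external types such as String do not count
        a, b = top_level_of(source), top_level_of(target)
        if a != b:    # references inside one top-level type are ignored
            deps[a].add(b)
    return dict(deps)


assembly_types = {"A", "A/Helper", "B", "B/Node", "C"}
references = [
    ("A/Helper", "B/Node"),   # nested to nested: counts as A -> B
    ("A", "A/Helper"),        # same top-level type: ignored
    ("B", "System.String"),   # system type: ignored
    ("B/Node", "C"),          # counts as B -> C
    ("A", "B"),               # duplicate of A -> B
]
result = top_level_dependencies(references, assembly_types)
print(result)  # {'A': {'B'}, 'B': {'C'}}
```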

In addition to all dependencies, I thought it might be useful to look at "public" or "published" dependencies. A public dependency from A to B exists if:

Type A or any of its nested types inherits from (or implements) type B or any of its nested types.
Type A or any of its nested types has a public field, property or method that references type B or any of its nested types as a parameter or return value.
Finally, a public dependency is only counted if the source type itself is public.
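The public rule can be sketched as a simple filter. In this Python illustration the `Reference` record, its `kind` values and the `member_is_public` flag are hypothetical names chosen for the example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    # Hypothetical record describing one reference between top-level types.
    source: str                    # type that owns the member
    target: str                    # type being referenced
    kind: str                      # "inherits", "signature" or "body"
    member_is_public: bool = False


def is_public_dependency(ref, public_types):
    """Apply the three rules for a public dependency."""
    if ref.source not in public_types:
        return False               # the source type itself must be public
    if ref.kind == "inherits":
        return True                # inheritance or interface implementation
    # Only public fields, properties and method signatures count;
    # references inside method bodies never do.
    return ref.kind == "signature" and ref.member_is_public


refs = [
    Reference("Parser", "Token", "signature", member_is_public=True),
    Reference("Parser", "Cache", "signature", member_is_public=False),
    Reference("Parser", "Lexer", "body"),
    Reference("Helper", "Token", "inherits"),
]
public_types = {"Parser", "Token", "Lexer"}  # Helper and Cache are internal
public = [(r.source, r.target) for r in refs
          if is_public_dependency(r, public_types)]
print(public)  # [('Parser', 'Token')]
```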

The metrics I chose for dependencies were:

The total number of dependencies. This is simply the sum of all dependencies of all types. Again, there will be more dependencies in a larger project, but we will also take the size of the project into account.
The number of types that have more than X dependencies. This gives us an idea of how many types are "too" complex.
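As a rough sketch, both metrics can be computed from a map of top-level dependencies like this. The function name and the sample data are made up for the example, and the thresholds follow the "X or more" columns used in the dependency table.

```python
def dependency_metrics(deps, top_level_types, thresholds=(1, 3, 5, 10)):
    """Return (total, average per type, share of types with >= X dependencies)."""
    # Types with no outgoing dependencies still count in the denominator.
    counts = [len(deps.get(t, ())) for t in top_level_types]
    total = sum(counts)
    share = {x: sum(1 for c in counts if c >= x) / len(counts)
             for x in thresholds}
    return total, total / len(counts), share


# Hypothetical assembly with four top-level types.
types = ["A", "B", "C", "D"]
deps = {"A": {"B", "C", "D"}, "B": {"C"}}
total, average, share = dependency_metrics(deps, types)
print(total, average)
for x, fraction in share.items():
    print(f"{x} or more dependencies: {fraction:.0%}")
```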

Cyclic dependencies

Given this definition of dependency, then, a cyclic dependency occurs when two different top-level types depend on each other.

Note what is not included in this definition. If a nested type in a module depends on another nested type in the same module, then that is not a cyclic dependency.

If there is a cyclic dependency, then there is a set of modules that are all linked together. For example, if A depends on B, B depends on C, and then say, C depends on A, then A, B and C are linked together. In graph theory, this is called a strongly connected component.

The metrics I chose for cyclic dependencies were:

The number of cycles. That is, the number of strongly connected components which had more than one module in them.
The size of the largest component. This gives us an idea of how complex the dependencies are.

I analyzed cyclic dependencies for all dependencies and also for public dependencies only.
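Here is a self-contained Python sketch of both cycle metrics. It uses a hand-written version of Tarjan's algorithm to find the strongly connected components, which is a stand-in chosen for illustration rather than the library used in the actual analysis. The sample graph is made up.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to an iterable of successors."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components


def cycle_metrics(graph):
    """Return (number of cycles, size of the largest component)."""
    cycles = [c for c in strongly_connected_components(graph) if len(c) > 1]
    return len(cycles), max((len(c) for c in cycles), default=0)


# A -> B -> C -> A form one cycle, E <-> F another; D is not part of any cycle.
deps = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}, "E": {"F"}, "F": {"E"}}
print(cycle_metrics(deps))  # (2, 3)
```

The recursive form is fine for a sketch, but a very large dependency graph could hit Python's recursion limit, in which case an iterative version would be needed.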

Doing the experiment

First, I downloaded each of the project binaries using NuGet. Then I wrote a little F# script that did the following steps for each assembly:

Analyzed the assembly using Mono.Cecil and extracted all the types, including the nested types
For each type, extracted the public and implementation references to other types, divided into internal (same assembly) and external (different assembly).
Created a list of the "top level" types.
Created a dependency list from each top level type to other top level types, based on the lower level dependencies.

This dependency list was then used to extract various statistics, shown below. I also rendered the dependency graphs to SVG format (using Graphviz).

For cycle detection, I used the QuickGraph library to extract the strongly connected components, and then did some more processing and rendering.

If you want the gory details, here is a link to the script that I used, and here is the raw data.

It is important to recognize that this is not a proper statistical study, just a quick analysis. However, the results are quite interesting, as we shall see.

Modularity

Let's look at the modularity first.

Here are the modularity-related results for the C# projects:

Project        | Code size | Top-level types | Authored types | All types | Code/Top | Code/Auth | Code/All | Auth/Top | All/Top
               | 269521    | 514             | 565            | 876       | 524      | 477       | 308      | 1.1      | 1.7
jsonDotNet     | 148829    | 215             | 232            | 283       | 692      | 642       | 526      | 1.1      | 1.3
nancy          | 143445    | 339             | 366            | 560       | 423      | 392       | 256      | 1.1      | 1.7
cecil          | 101121    | 240             | 245            | 247       | 421      | 413       | 409      | 1.0      | 1.0
nuget          | 114856    | 216             | 237            | 381       | 532      | 485       | 301      | 1.1      | 1.8
signalR        | 65513     | 192             | 229            | 311       | 341      | 286       | 211      | 1.2      | 1.6
nunit          | 45023     | 173             | 195            | 197       | 260      | 231       | 229      | 1.1      | 1.1
specFlow       | 46065     | 242             | 287            | 331       | 190      | 161       | 139      | 1.2      | 1.4
elmah          | 43855     | 116             | 140            | 141       | 378      | 313       | 311      | 1.2      | 1.2
yamlDotNet     | 23499     |                 |                |           | 336      | 322       |          | 1.0      |
fparsecCS      | 57474     |                 |                |           | 1402     | 625       | 618      | 2.2      | 2.3
moq            | 133189    | 397             | 420            | 533       | 335      | 317       | 250      | 1.1      | 1.3
ndepend        | 478508    | 734             | 828            | 843       | 652      | 578       | 568      | 1.1      | 1.1
ndependPlat    | 151625    | 185             | 205            |           | 820      | 740       |          | 1.1      |
personalCS     | 422147    | 195             | 278            | 346       | 2165     | 1519      | 1220     | 1.4      | 1.8
TOTAL          | 2244670   | 3869            | 4392           | 5420      | 580      | 511       | 414      | 1.1      | 1.4

And here are the results for the F# projects:

Project        | Code size | Top-level types | Authored types | All types | Code/Top | Code/Auth | Code/All | Auth/Top | All/Top
fsxCore        | 339596    | 173             | 328            | 2024      | 1963     | 1035      | 168      | 1.9      | 11.7
fsCore         | 226830    | 154             | 313            | 1186      | 1473     | 725       | 191      | 2.0      | 7.7
fsPowerPack    | 117581    |                 | 150            | 410       | 1264     | 784       | 287      | 1.6      | 4.4
storm          | 73595     | 67              |                | 405       | 1098     | 1051      | 182      | 1.0      | 6.0
fParsec        | 67252     |                 |                | 245       | 8407     | 2802      | 274      | 3.0      | 30.6
websharper     | 47391     |                 | 128            | 285       | 911      | 370       | 166      | 2.5      | 5.5
tickSpec       | 30797     |                 |                | 170       | 906      | 629       | 181      | 1.4      | 5.0
websharperHtml | 14787     |                 |                |           | 822      | 528       | 205      | 1.6      | 4.0
canopy         | 15105     |                 |                | 103       | 2518     | 944       | 147      | 2.7      | 17.2
fsYaml         | 15191     |                 |                | 160       | 2170     | 1381      |          | 1.6      | 22.9
fsSql          | 15434     |                 |                | 162       | 1187     | 857       |          | 1.4      | 12.5
fsUnit         | 1848      |                 |                |           | 924      | 616       | 264      | 1.5      | 3.5
foq            | 26957     |                 |                | 103       | 770      | 562       | 262      | 1.4      | 2.9
personalFS     | 118893    |                 | 146            | 655       | 3963     | 814       | 182      | 4.9      | 21.8
TOTAL          | 1111257   | 692             | 1332           | 5987      | 1606     | 834       | 186      | 1.9      | 8.7

The columns are:

Code size is the number of CIL instructions from all methods, as reported by Cecil.
Top-level types is the total number of top-level types in the assembly, using the definition above.
Authored types is the total number of types in the assembly, including nested types, enums, and so on, but excluding compiler generated types.
All types is the total number of types in the assembly, including compiler generated types.

I have extended these core metrics with some extra calculated columns:

Code/Top is the number of CIL instructions per top level type / module. This is a measure of how much code is associated with each unit of modularity. Generally, more is better, because you don't want to have to deal with multiple files if you don't have to. On the other hand, there is a trade-off. Too many lines of code in a file makes reading the code impossible. In both C# and F#, good practice is not to have more than 500-1000 lines of code per file, and with a few exceptions, that seems to be the case in the source code that I looked at.
Code/Auth is the number of CIL instructions per authored type. This is a measure of how "big" each authored type is.
Code/All is the number of CIL instructions per type. This is a measure of how "big" each type is.
Auth/Top is the ratio of all authored types to the top-level-types. It is a rough measure of how many authored types are in each unit of modularity.
All/Top is the ratio of all types to the top-level-types. It is a rough measure of how many types are in each unit of modularity.
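For example, the calculated columns for the jsonDotNet row can be reproduced from its four core metrics. This is a small Python sketch; the function name is made up, and the rounding (whole instructions, one decimal place for ratios) is an assumption that happens to match the published figures.

```python
def derived_columns(code_size, top_level, authored, all_types):
    """Compute the five calculated columns from the four core metrics."""
    return {
        "Code/Top": round(code_size / top_level),
        "Code/Auth": round(code_size / authored),
        "Code/All": round(code_size / all_types),
        "Auth/Top": round(authored / top_level, 1),
        "All/Top": round(all_types / top_level, 1),
    }


# jsonDotNet: 148829 instructions, 215 top-level, 232 authored, 283 types in all.
row = derived_columns(148829, 215, 232, 283)
for column, value in row.items():
    print(f"{column}: {value}")
```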

Analysis

The first thing I noticed is that, with a few exceptions, the code size is bigger for the C# projects than for the F# projects. Partly that is because I picked bigger projects, of course. But even for a somewhat comparable project like SpecFlow vs. TickSpec, the SpecFlow code size is bigger. It may well be that SpecFlow does a lot more than TickSpec, of course, but it also may be a result of using more generic code in F#. There is not enough information to know either way right now -- it would be interesting to do a true side by side comparison.

Next, the number of top-level types. I said earlier that this should correspond to the number of files in a project. Does it?

I didn't get all the sources for all the projects to do a thorough check, but I did a couple of spot checks. For example, for Nancy, there are 339 top level classes, which implies that there should be about 339 files. In fact, there are actually 322 .cs files, so not a bad estimate.

On the other hand, for SpecFlow there are 242 top level types, but only 171 .cs files, so a bit of an overestimate there. And for Cecil, the same thing: 240 top level classes but only 128 .cs files.

For the FSharpX project, there are 173 top level classes, which implies there should be about 173 files. In fact, there are only 78 .fs files, so it is a serious over-estimate by a factor of more than 2. And if we look at Storm, there are 67 top level classes but only 35 .fs files, so again it is an over-estimate by a factor of about 2.

So it looks like the number of top level classes is always an over-estimate of the number of files, but much more so for F# than for C#. It would be worth doing some more detailed analysis in this area.
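To put numbers on these spot checks, here is a quick Python calculation of top-level types per source file, using only the counts quoted above. The dictionary layout is just for illustration.

```python
# (top-level types, source files) for each project that was spot checked
spot_checks = {
    "Nancy": (339, 322),
    "SpecFlow": (242, 171),
    "Cecil": (240, 128),
    "FSharpx": (173, 78),
    "Storm": (67, 35),
}

ratios = {name: top / files for name, (top, files) in spot_checks.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} top-level types per source file")
```

A ratio of 1.0 would mean that top-level types are a perfect proxy for files. Every project here comes out above 1, with the two F# projects at roughly 2.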

Ratio of code size to number of top-level types

The "Code/Top" ratio is consistently bigger for F# code than for C# code. Overall, the average top-level type in C# is converted into 580 instructions. But for F# that number is 1606 instructions, about three times as many.

I expect that this is because F# code is more concise than C# code. I would guess that 500 lines of F# code in a single module would create many more CIL instructions than 500 lines of C# code in a class.

If we visually plot "Code size" vs. "Top-level types", we get this chart:

What's surprising to me is how distinct the F# and C# projects are in this chart. The C# projects seem to have a consistent ratio of about 1-2 top-level types per 1000 instructions, even across different project sizes. And the F# projects are consistent too, having a ratio of about 0.6 top-level types per 1000 instructions.
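These densities can be checked against the TOTAL rows of the two modularity tables. The Python snippet below is only a sanity check on the aggregate figures, and the function name is made up for the example.

```python
# (code size in CIL instructions, top-level types) from the TOTAL rows
totals = {"C#": (2244670, 3869), "F#": (1111257, 692)}


def types_per_thousand(code_size, top_level_types):
    """Top-level types per 1000 CIL instructions."""
    return 1000 * top_level_types / code_size


density = {lang: types_per_thousand(*row) for lang, row in totals.items()}
for lang, value in density.items():
    print(f"{lang}: {value:.2f} top-level types per 1000 instructions")
```

Overall this gives about 1.7 for C# and about 0.6 for F#, in line with the ranges read off the chart.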

In fact, the number of top level types in F# projects seems to taper off as projects get larger, rather than increasing linearly like the C# projects.

The message I get from this chart is that, for a given size of project, an F# implementation will have fewer modules, and presumably less complexity as a result.

You probably noticed that there are two anomalies. Two C# projects are out of place -- the one at the 50K mark is FParsecCS and the one at the 425K mark is my business application.

I am fairly certain that this is because both these implementations have some rather large C# classes in them, which helps the code ratio. Probably a necessary evil for a parser, but in the case of my business application, I know that it is due to cruft accumulating over the years, and there are some massive classes that ought to be refactored into smaller ones. So a metric like this is probably a bad sign for a C# code base.

Ratio of code size to number of all types

On the other hand, if we compare the ratio of code to all types, including compiler generated ones, we get a very different result.

Here's the corresponding chart of "Code size" vs. "All types":

This is surprisingly linear for F#. The total number of types (including compiler generated ones) seems to depend closely on the size of the project. On the other hand, the number of types for C# seems to vary a lot.

The average "size" of a type is considerably smaller for F# code than for C# code. The average type in C# is converted into about 400 instructions. But for F# that number is about 180 instructions, less than half.

I'm not sure why this is. Is it because the F# types are more fine-grained, or could it be because the F# compiler generates many more little types than the C# compiler? Without doing a more subtle analysis, I can't tell.

Ratio of top-level types to authored types

Having compared the type counts to the code size, let's now compare them to each other:

Again, there is a significant difference. For each unit of modularity in C# there are an average of 1.1 authored types. But in F# the average is 1.9, and for some projects a lot more than that.

Of course, creating nested types is trivial in F#, and quite uncommon in C#, so you could argue that this is not a fair comparison. But surely the ability to create a dozen types in as many lines of F# has some effect on the quality of the design? This is harder to do in C#, but there is nothing to stop you. So might this not mean that there is a temptation in C# to not be as fine-grained as you could potentially be?

The project with the highest ratio (4.9) is my F# business application. I believe that this is because it is the only F# project in this list which is designed around a specific business domain: I created many "little" types to model the domain accurately, using the concepts described here. For other projects created using DDD principles, I would expect to see this same high number.

Dependencies

Now let's look at the dependency relationships between the top level classes.

Here are the results for the C# projects:

Project | Top Level Types | Total Dep. Count | Dep/Top | One or more dep. | Three or more dep. | Five or more dep. | Ten or more dep. | Diagram
        | 514             | 2354             | 4.6     | 76%              | 51%                | 32%               | 13%              |
