Using F# for database related tasks

Twenty-six low-risk ways to use F# at work (part 4)

This post is a continuation of the previous series on low-risk and incremental ways to use F# at work.

In this one, we'll see how F# can be unexpectedly helpful when it comes to database related tasks.

Series contents

Before moving on to the content of the post, here's the full list of the twenty-six ways:

Part 1 - Using F# to explore and develop interactively

1. Use F# to explore the .NET framework interactively
2. Use F# to test your own code interactively
3. Use F# to play with webservices interactively
4. Use F# to play with UIs interactively

Part 2 - Using F# for development and devops scripts

5. Use FAKE for build and CI scripts
6. An F# script to check that a website is responding
7. An F# script to convert an RSS feed into CSV
8. An F# script that uses WMI to check the stats of a process
9. Use F# for configuring and managing the cloud

Part 3 - Using F# for testing

10. Use F# to write unit tests with readable names
11. Use F# to run unit tests programmatically
12. Use F# to learn to write unit tests in other ways
13. Use FsCheck to write better unit tests
14. Use FsCheck to create random dummy data
15. Use F# to create mocks
16. Use F# to do automated browser testing
17. Use F# for Behaviour Driven Development

Part 4 - Using F# for database related tasks

18. Use F# to replace LINQPad
19. Use F# to unit test stored procedures
20. Use FsCheck to generate random database records
21. Use F# to do simple ETL
22. Use F# to generate SQL Agent scripts

Part 5 - Other interesting ways of using F#

23. Use F# for parsing
24. Use F# for diagramming and visualization
25. Use F# for accessing web-based data stores
26. Use F# for data science and machine learning
(BONUS) 27. Balance the generation schedule for the UK power station fleet

This next group of suggestions is all about working with databases, and MS SQL Server in particular.

Relational databases are a critical part of most applications, but most teams do not manage them with the same discipline they apply to other development tasks.

For example, how many teams do you know that unit test their stored procedures?

Or their ETL jobs?

Or generate T-SQL admin scripts and other boilerplate using a non-SQL scripting language that's stored in source control?

Here's where F# can shine over other scripting languages, and even over T-SQL itself.

  • The database type providers in F# give you the power to create simple, short scripts for testing and admin, with the bonus that...

  • The scripts are type-checked and will fail at compile time if the database schema changes, which means that...

  • The whole process works really well with builds and continuous integration processes, which in turn means that...

  • You have really high confidence in your database related code!

We'll look at a few examples to demonstrate what I'm talking about:

  • Unit testing stored procedures

  • Using FsCheck to generate random records

  • Doing simple ETL with F#

  • Generating SQL Agent scripts

Getting set up

The code for this section is available on github. It includes SQL scripts to create the sample database, tables and stored procs that I'll use in these examples.

To run the examples, you'll need SQL Express or SQL Server running locally or somewhere accessible, with the setup scripts already run against it.

Which type provider?

There are a number of SQL Type Providers for F# -- see the fsharp.org Data Access page. For these examples, I'm going to use the SqlDataConnection type provider, which is part of the FSharp.Data.TypeProviders DLL. It uses SqlMetal behind the scenes and so only works with SQL Server databases.

The SQLProvider project is another good choice: it supports MySQL, SQLite and other non-Microsoft databases.

18. Use F# to replace LINQPad

The code for this section is available on github.

LINQPad is a great tool for doing queries against databases, and is also a general scratchpad for C#/VB/F# code.

You can use F# interactive to do many of the same things -- you get queries, autocompletion, etc., just like LINQPad.

For example, here's a query that counts customers with a certain email domain.
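The original listing isn't reproduced here, but the shape of such a query can be sketched without a database. This version is an assumption-laden stand-in: the `Customer` record and the sample rows are hypothetical, and the query runs over an in-memory list. Against a type-provider table (an `IQueryable`), the same `query { ... }` expression would be translated to SQL instead of running in memory.

```fsharp
// Hypothetical row type standing in for a type-provider table such as db.Customer
type Customer = { Id: int; Name: string; Email: string }

// Made-up sample data for illustration
let customers =
    [ { Id = 1; Name = "Alice"; Email = "alice@gmail.com" }
      { Id = 2; Name = "Bob"; Email = "bob@example.com" }
      { Id = 3; Name = "Carol"; Email = "carol@gmail.com" } ]

// Count customers whose email address is at the given domain.
// Over an in-memory list this runs as LINQ to Objects; over an
// IQueryable source the same expression would become a SQL COUNT.
let countByDomain (domain: string) =
    query {
        for c in customers do
        where (c.Email.EndsWith("@" + domain))
        count
    }

printfn "gmail.com customers: %d" (countByDomain "gmail.com")
```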

If you want to see what SQL code is generated, you can turn logging on, of course:

The logged output for this query is:

You can also do more complicated things, such as using subqueries. Here's an example from MSDN:

Note that, as befits a functional approach, queries are nice and composable.
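To make the composability point concrete, here is a small sketch (not the MSDN example) in which one query is reused as the source of another, and a correlated subquery with `exists` filters the result. The `Customer` and `Order` records and the data are hypothetical, and everything runs in memory; with an `IQueryable` source the composed query would be sent to the server as a single statement.

```fsharp
// Hypothetical row types and data, for illustration only
type Customer = { Id: int; Name: string; Email: string }
type Order = { OrderId: int; CustomerId: int; Amount: decimal }

let customers =
    [ { Id = 1; Name = "Alice"; Email = "alice@gmail.com" }
      { Id = 2; Name = "Bob"; Email = "bob@example.com" }
      { Id = 3; Name = "Carol"; Email = "carol@gmail.com" } ]

let orders =
    [ { OrderId = 100; CustomerId = 1; Amount = 25.0M }
      { OrderId = 101; CustomerId = 2; Amount = 40.0M } ]

// A reusable base query...
let gmailCustomers =
    query {
        for c in customers do
        where (c.Email.EndsWith("@gmail.com"))
        select c
    }

// ...used as the source of a second query, with a correlated subquery
let gmailCustomersWithOrders =
    query {
        for c in gmailCustomers do
        where (query { for o in orders do exists (o.CustomerId = c.Id) })
        select c.Name
    }
    |> Seq.toList

printfn "%A" gmailCustomersWithOrders
```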

And if the SQL engine doesn't support certain functions such as regexes, and assuming the size of the data is not too large, you can just stream the data out and do the processing in F#.

As you can see from the code above, the nice thing about doing the processing in F# is that you can define helper functions separately and connect them together easily.
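As a rough illustration of that style, the sketch below defines a regex check and a domain extractor as standalone helpers, then pipes rows through them. The rows are a hypothetical in-memory stand-in for data streamed out of a table, and the email pattern is deliberately simplistic; it is not meant as a complete email validator.

```fsharp
open System.Text.RegularExpressions

// Hypothetical row type; in practice the rows would be streamed from db.Customer
type Customer = { Id: int; Name: string; Email: string }

let customers =
    [ { Id = 1; Name = "Alice"; Email = "alice@gmail.com" }
      { Id = 2; Name = "Bob"; Email = "bob@example.com" }
      { Id = 3; Name = "Dan"; Email = "not-an-email" }
      { Id = 4; Name = "Carol"; Email = "carol@gmail.com" } ]

// Helper functions defined separately...
let emailPattern = Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$")
let isValidEmail (email: string) = emailPattern.IsMatch(email)
let domainOf (email: string) = email.Substring(email.IndexOf('@') + 1)

// ...then connected in a pipeline that runs in F#, not in the SQL engine
let validEmailsPerDomain (rows: seq<Customer>) =
    rows
    |> Seq.map (fun c -> c.Email)
    |> Seq.filter isValidEmail
    |> Seq.countBy domainOf
    |> Map.ofSeq

printfn "%A" (validEmailsPerDomain customers)
```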

19. Use F# to unit test stored procedures

The code for this section is available on github.

Now let's look at how we can use the type provider to make creating unit tests for stored procs really easy.

First, I create a helper module (which I'll call DbLib) to set up the connection and to provide shared utility functions such as resetDatabase, which will be called before each test.

Now I can write a unit test, using NUnit, say, just like any other unit test.

Assume that we have a Customer table, and a sproc called up_Customer_Upsert that either inserts a new customer or updates an existing one, depending on whether the passed-in customer id is null or not.
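Before testing the real sproc, it can help to pin down the expected behaviour. The sketch below is a hypothetical in-memory model of the upsert rule just described (null id inserts with a new id, non-null id updates), which could serve as a test oracle. It does not touch the database, and the choice to fail when updating an unknown id is an assumption, not something the sproc is known to do.

```fsharp
open System

type Customer = { Id: int; Name: string; Email: string }

// Hypothetical in-memory model of up_Customer_Upsert:
// null id -> insert with a new id; non-null id -> update that row.
// Assumption: updating an id that doesn't exist is an error.
let upsert (table: Map<int, Customer>) (customerId: Nullable<int>) (name: string) (email: string) =
    if customerId.HasValue then
        let id = customerId.Value
        if not (table.ContainsKey id) then
            failwithf "No customer with id %d" id
        table.Add(id, { Id = id; Name = name; Email = email }), id
    else
        let newId = (if table.IsEmpty then 0 else table |> Map.toSeq |> Seq.map fst |> Seq.max) + 1
        table.Add(newId, { Id = newId; Name = name; Email = email }), newId

// The kinds of checks a test of the real sproc would make
let t1, aliceId = upsert Map.empty (Nullable<int>()) "Alice" "alice@example.com"
let t2, updatedId = upsert t1 (Nullable<int>(aliceId)) "Alice Smith" "alice@example.com"

printfn "%d %d %s" aliceId updatedId t2.[aliceId].Name
```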

Here's what a test looks like:

Note that, because the setup is expensive, I do multiple asserts in the test. This could be refactored if you find this too ugly!

Here's one that tests that updates work:

And one more, that checks for exceptions:

As you can see, the whole process is very straightforward.

These tests can be compiled and run as part of the continuous integration scripts. And what is great is that, if the database schema gets out of sync with the code, then the tests will fail to even compile!

20. Use FsCheck to generate random database records

The code for this section is available on github.

As I showed in an earlier example, you can use FsCheck to generate random data. In this case we'll use it to generate random records in the database.

Let's say we have a CustomerImport table, defined as below. (We'll use this table in the next section on ETL.)

Using the same code as before, we can then generate random instances of CustomerImport.

So far so good.

Now we get to the age column, which is nullable. This means we can't generate random ints, but instead we have to generate random Nullable<int>s. This is where type checking is really useful -- the compiler has forced us to take that into account. So to make sure we cover all the bases, we'll generate a null value one time out of twenty.
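Here is a minimal sketch of the one-in-twenty rule. To keep it free of package dependencies it swaps FsCheck for `System.Random`, and the age range 18 to 89 is an arbitrary assumption. In FsCheck itself, the same weighting could be expressed with `Gen.frequency`.

```fsharp
open System

// Assumption: System.Random stands in for FsCheck here.
let genAge (rng: Random) : Nullable<int> =
    // rng.Next(20) yields 0..19, so this branch is taken one time in twenty on average
    if rng.Next(20) = 0 then Nullable<int>()
    else Nullable<int>(rng.Next(18, 90))   // upper bound is exclusive: ages 18..89

let rng = Random(42)
let ages = List.init 10000 (fun _ -> genAge rng)
let nullCount = ages |> List.filter (fun a -> not a.HasValue) |> List.length
printfn "nulls: %d of %d" nullCount ages.Length
```

With 10,000 samples you would expect roughly 500 nulls.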

Putting it all together...

Once we have a random generator, we can generate as many records as we like, and insert them using the type provider.

In the code below, we'll generate 10,000 records, hitting the database in batches of 1,000 records.
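The batching itself can be sketched independently of the database. In this version the `CustomerImport` record shape is hypothetical, and the insert is a fake that collects rows in memory. Against the real database the insert would presumably be something like `InsertAllOnSubmit` on the table followed by `SubmitChanges()` on the data context, but that part is an assumption and is not exercised here.

```fsharp
open System
open System.Diagnostics

// Hypothetical shape of a CustomerImport row (the real type comes from the type provider)
type CustomerImport =
    { CustomerId: int; FirstName: string; LastName: string
      EmailAddress: string; Age: Nullable<int> }

let makeRandomCustomer (rng: Random) (i: int) =
    { CustomerId = i
      FirstName = sprintf "First%d" i
      LastName = sprintf "Last%d" i
      EmailAddress = sprintf "user%d@example.com" i
      Age = if rng.Next(20) = 0 then Nullable<int>() else Nullable<int>(rng.Next(18, 90)) }

// Generate `total` records lazily and hand them to `insertBatch` in chunks.
// Returns the number of batches sent.
let generateInBatches (total: int) (batchSize: int) (insertBatch: CustomerImport[] -> unit) =
    let rng = Random(42)
    Seq.initInfinite (makeRandomCustomer rng)
    |> Seq.truncate total
    |> Seq.chunkBySize batchSize
    |> Seq.fold (fun batches batch -> insertBatch batch; batches + 1) 0

// Fake insert that just collects the rows, timed with a Stopwatch
let inserted = ResizeArray<CustomerImport>()
let sw = Stopwatch.StartNew()
let batches = generateInBatches 10000 1000 (fun batch -> inserted.AddRange batch)
sw.Stop()
printfn "%d records in %d batches, %dms" inserted.Count batches sw.ElapsedMilliseconds
```

In F# Interactive, the `#time` directive is an alternative to the explicit `Stopwatch`.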

Finally, let's do it and time it.

It's not as fast as using BCP, but it is more than adequate for testing. For example, it only takes a few seconds to create the 10,000 records above.

I want to stress that this is a single standalone script, not a heavy binary, so it is really easy to tweak and run on demand.

And of course you get all the goodness of a scripted approach, such as being able to store it in source control, track changes, etc.

21. Use F# to do simple ETL

The code for this section is available on github.

Say that you need to transfer data from one table to another, but it is not a totally straightforward copy, as you need to do some mapping and transformation.

This is a classic ETL (Extract/Transform/Load) situation, and most people will reach for SSIS.

But for some situations, such as one-off imports, and where the volumes are not large, you could use F# instead. Let's have a look.

Say that we are importing data into a master table that looks like this:

But the system we're importing from has a different format, like this:

As part of this import then, we're going to have to:

  • Concatenate the FirstName and LastName columns into one Name column

  • Map the EmailAddress column to the Email column

  • Calculate a Birthdate given an Age

  • I'm going to skip the CustomerId for now -- hopefully we aren't using IDENTITY columns in practice.
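The mapping rules above could be sketched as a pure function like the one below. The record types are hypothetical stand-ins for the type-provider rows, and the birthdate calculation is an approximation that simply subtracts the age in years from a reference date (passed in explicitly so the function is deterministic and easy to test).

```fsharp
open System

// Hypothetical row types; the real ones come from the type provider
type CustomerImport =
    { CustomerId: int; FirstName: string; LastName: string
      EmailAddress: string; Age: Nullable<int> }

type Customer =
    { Name: string; Email: string; Birthdate: Nullable<DateTime> }

// Approximation: assumes the birthday falls on the reference date
let birthdateFromAge (today: DateTime) (age: Nullable<int>) =
    if age.HasValue then Nullable<DateTime>(today.AddYears(-age.Value))
    else Nullable<DateTime>()

// Concatenate the names, map EmailAddress to Email, derive Birthdate; CustomerId is skipped
let makeTargetCustomer (today: DateTime) (source: CustomerImport) : Customer =
    { Name = source.FirstName + " " + source.LastName
      Email = source.EmailAddress
      Birthdate = birthdateFromAge today source.Age }

let today = DateTime(2024, 6, 1)
let src =
    { CustomerId = 1; FirstName = "Jane"; LastName = "Doe"
      EmailAddress = "jane@example.com"; Age = Nullable<int>(30) }
printfn "%A" (makeTargetCustomer today src)
```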

The first step is to define a function that maps source records to target records. In this case, we'll call it makeTargetCustomer.

Here's some code for this:

With this transform in place, the rest of the code is easy: we just read from the source and write to the target.

Because these are sequence operations, only one record at a time is in memory (excepting the LINQ submit buffer), so even large data sets can be processed.
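A sketch of that read, transform, write pipeline is shown below, with hypothetical row types, a lazily generated source standing in for the real table, and a fake writer that only records batch sizes. `Seq.chunkBySize` materializes one batch at a time, so everything upstream is pulled from the source row by row.

```fsharp
// Hypothetical source and target row types (the real ones come from the type provider)
type SourceRow = { FirstName: string; LastName: string; EmailAddress: string }
type TargetRow = { Name: string; Email: string }

let transform (s: SourceRow) : TargetRow =
    { Name = s.FirstName + " " + s.LastName; Email = s.EmailAddress }

// Stand-in for reading the source table: rows are produced lazily, on demand
let readSource (n: int) : seq<SourceRow> =
    seq {
        for i in 1 .. n ->
            { FirstName = sprintf "First%d" i
              LastName = sprintf "Last%d" i
              EmailAddress = sprintf "user%d@example.com" i }
    }

// Read -> transform -> write. Only the current batch is held as an array.
let transfer (source: seq<SourceRow>) (batchSize: int) (writeBatch: TargetRow[] -> unit) =
    source
    |> Seq.map transform
    |> Seq.chunkBySize batchSize
    |> Seq.iter writeBatch

// Usage with a fake writer that records the size of each batch
let batchSizes = ResizeArray<int>()
transfer (readSource 2500) 1000 (fun batch -> batchSizes.Add(batch.Length))
printfn "%A" (List.ofSeq batchSizes)
```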

To see it in use, first insert a number of records using the dummy data script just discussed, and then run the transfer as follows:

Again, it only takes a few seconds to transfer 10,000 records.

And again, this is a single standalone script -- it's a very lightweight way to create simple ETL jobs.

22. Use F# to generate SQL Agent scripts

For the last database related suggestion, let me suggest the idea of generating SQL Agent scripts from code.

In any decent sized shop you may have hundreds or thousands of SQL Agent jobs. In my opinion, these should all be stored as script files, and loaded into the database when provisioning/building the system.

Alas, there are often subtle differences between dev, test and production environments: connection strings, authorization, alerts, log configuration, etc.

That naturally leads to the problem of trying to keep three different copies of a script around, which in turn makes you think: why not have one script and parameterize it for the environment?

But now you are dealing with lots of ugly SQL code! The scripts that create SQL Agent jobs are typically hundreds of lines long and were not really designed to be maintained by hand.

F# to the rescue!

In F#, it's really easy to create some simple record types that store all the data you need to generate and configure a job.

For example, in the script below:

  • I created a union type called Step that could store a Package, Executable, Powershell and so on.

  • Each of these step types in turn has its own specific properties, so that a Package has a name and variables, and so on.

  • A JobInfo consists of a name plus a list of Steps.

  • An agent script is generated from a JobInfo plus a set of global properties associated with an environment, such as the databases, shared folder locations, etc.
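Since the original F# isn't available, here is a hypothetical sketch of how such types might look, with a small generator that emits calls to the msdb procedures `sp_add_job`, `sp_add_jobstep` and `sp_add_jobserver`. The field names, the `EnvConfig` record and the dtexec-style `/FILE` and `/SET` command text for packages are all assumptions for illustration. Note that every step except the last sets `@on_success_action = 3` (go to the next step), because the default of 1 would end the job after the first step.

```fsharp
// Hypothetical model of the job types described above
type Step =
    | Package of name: string * variables: (string * string) list
    | Executable of path: string * args: string
    | Powershell of script: string

type JobInfo = { JobName: string; Steps: Step list }

// Per-environment settings (hypothetical fields)
type EnvConfig = { EnvName: string; ServerName: string; SharedFolder: string }

// Quote a value as a T-SQL Unicode literal, doubling embedded single quotes
let sqlString (s: string) = "N'" + s.Replace("'", "''") + "'"

// Map each step to a SQL Agent subsystem and its command text
let stepCommand (env: EnvConfig) (step: Step) =
    match step with
    | Package (name, variables) ->
        let sets =
            variables
            |> List.map (fun (k, v) -> sprintf " /SET \\Package.Variables[User::%s].Properties[Value];%s" k v)
            |> String.concat ""
        "SSIS", sprintf "/FILE \"%s\\%s.dtsx\"%s" env.SharedFolder name sets
    | Executable (path, args) -> "CmdExec", sprintf "\"%s\" %s" path args
    | Powershell script -> "PowerShell", script

let generateScript (env: EnvConfig) (job: JobInfo) =
    let lastIndex = job.Steps.Length - 1
    let steps =
        job.Steps
        |> List.mapi (fun i step ->
            let subsystem, command = stepCommand env step
            // 3 = go to the next step, 1 = quit reporting success
            let onSuccess = if i < lastIndex then 3 else 1
            sprintf "EXEC dbo.sp_add_jobstep @job_name = %s, @step_name = %s,\n    @subsystem = N'%s', @command = %s,\n    @on_success_action = %d;"
                (sqlString job.JobName) (sqlString (sprintf "Step %d" (i + 1))) subsystem (sqlString command) onSuccess)
    [ yield sprintf "-- Generated for environment: %s" env.EnvName
      yield "USE msdb;"
      yield "GO"
      yield sprintf "EXEC dbo.sp_add_job @job_name = %s;" (sqlString job.JobName)
      yield! steps
      yield sprintf "EXEC dbo.sp_add_jobserver @job_name = %s, @server_name = %s;" (sqlString job.JobName) (sqlString env.ServerName)
      yield "GO" ]
    |> String.concat "\n"

// Usage: one job definition, one generated script per environment (all names made up)
let job =
    { JobName = "Nightly customer import"
      Steps = [ Package ("ImportCustomers", [ "BatchSize", "1000" ])
                Powershell "Write-Output 'done'" ] }

let environments =
    [ { EnvName = "dev"; ServerName = "DEVSQL01"; SharedFolder = @"\\devshare\packages" }
      { EnvName = "prod"; ServerName = "PRODSQL01"; SharedFolder = @"\\prodshare\packages" } ]

let scripts = [ for env in environments -> env.EnvName, generateScript env job ]
let devScript = scripts |> List.find (fun (name, _) -> name = "dev") |> snd
printfn "%s" devScript
```

Keeping the environment differences in a record like `EnvConfig` is what lets a single job definition produce dev, test and production scripts.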

I can't share the actual F# code, but I think you get the idea. It's quite simple to create.

Once we have these .FSX files, we can generate the real SQL Agent scripts en masse and then deploy them to the appropriate servers.

Below is an example of a SQL Agent script that might be generated automatically from the .FSX file.

As you can see, it is a nicely laid out and formatted T-SQL script. The idea is that a DBA can review it and be confident that no magic is happening, and thus be willing to accept it as input.

On the other hand, it would be risky to maintain scripts like this by hand, editing the SQL code directly. Better to use type-checked (and more concise) F# code than untyped T-SQL!

Summary

I hope that this set of suggestions has thrown a new light on what F# can be used for.

In my opinion, the combination of concise syntax, lightweight scripting (no binaries) and SQL type providers makes F# incredibly useful for database related tasks.

Please leave a comment and let me know what you think.
