Visual Studio Database Projects and the Problem With Reference Data (Part One)

I’ll be honest. I wasn’t a fan of Visual Studio Database Projects. I still believe that designing databases using a data modeling tool like ER/Studio or ERwin is the very best way to build a database. But after taking a hard look at the Database Project project template I’ve changed my tune. There are some good things going on in Visual Studio Database Projects. It allows a database schema and scripts to be put into source control. For the DBAs reading this, if your database isn’t in source control it should be. The compare and update feature makes it easy to compare and update databases and the database project. The publish feature, although terrifying in a production environment, provides a mechanism to automatically deploy changes to a database without writing scripts or opening an instance of SSMS.

All of these things is good news for developers. As developers we should be using a local copy of the database for development and the Database Project makes it easy for developers to merge changes from other developers into a local copy. It’s still a bit manual since you have to hit the publish button to move changes, but this method is a whole lot better than using a central “dev” copy of the database. I’ve seen entire development teams halted for days because an update to the central development database broke a current code branch.

One of the challenges that I’ve seen when working with Visual Studio Database Projects is how to keep reference data consistent. Reference data is data that is known and remains fairly consistent. Think data in a type table. I’ve seen blog posts that suggest that reference data can be added with INSERT in a post-deployment script. But that’s not a great solution since once the INSERT has been published SQL Server will fail the statement when the publish is run again. A better way would be to use the MERGE statement in the post-deployment script. The MERGE statement allows data to be inserted, updated, or deleted all in one statement.

There are some challenges with the MERGE statement solution. First of all, it’s complex. We’re smashing three different SQL statements into one so it’s going to be a bit messy. Secondly, we’re mingling data with the implementation. Because were using the VALUES argument the data is intertwined with the MERGE statement. It would be much better if the data was in a JSON object or XML file. This way the data would live on its own, independent of the SQL statement. Finally, we’re asking developers to write and maintain data in SQL. This isn’t horrible, but most devs that I’ve know don’t like SQL. It’s sad, I know. Look at the popularity of ORMs like Hibernate or Entity Framework, or the rise of tools like LINQ that can write SQL syntax; they point to devs writing less SQL not more.

What we need is a solution for reference data that:

  1. Keeps the developers from writing SQL.
  2. Has the data separated from the SQL implementation.
  3. Updates the data in source control automatically from the database.
  4. Is automatically deployed via the Database Project “Publish”.

That’s a tall order. But I have a solution that solves this problem…but you’re going to have to wait until the next blog post for that.