DRY and the Deleted Region

Here's a bit of a trick that I don't know if I've described before, so I figured some folks might like to hear about it here... This is yet another great reason to always, always, always use foreign key constraints (except when you shouldn't). ;)

When should you not use foreign key constraints? My one exception in almost all cases is for historical data - a historical table should never be constrained to only what exists in the live data tables. For example if you delete a member from your database, two things should happen. One it should be unquestionable that the member is gone from your data set and two, historical tables containing records of the user's activity, such as log tables showing the modification of content or even forum postings (yes even those), should remain.

The remaining records created by that member should either display the last name given for the member or in failing that, should display "anonymous" or some other label to indicate that the member has been subsequently removed. In a forum thread, you don't want to remove a deleted member's messages because that's going to make reading the forum rather confusing and possibly remove important information from the conversation.

This leads some engineers to resort to "soft deletes" in which records are never actually removed from the database, but instead flagged in some way as having been "deleted". This concept sounds great. In practice it's a nightmare. Firstly it totally destroys the concept of Object-Orientation (OO) and Don't Repeat Yourself (DRY) because now you can never query from that table without first considering the ramifications of "deleted" information.

If it were just the one table that might be okay, but it invariably is NOT just the one table, because you create queries that join other tables and then those queries have to consider the ramifications of "deleted" information in each joined table. You suddenly no longer have a tight collection of little black boxes that each know their own responsibilities and do their jobs very well independently. Instead everyone sees everything about everyone else and nobody can do anything without considering everyone else's domain. George in Accounting can't move a finger unless Scruffy the Janitor knows what he's doing.

Secondly the "soft delete" also leaves all the data in your tables, which means that any time a user creates a new record, realizes they made a mistake and removes it, your table is slightly larger... And over time, all the extra records from users having made simple clerical errors becomes weight on your database server that you don't want. This is a much smaller issue, but it's still an issue.

So those are my reasons for not liking "soft deletes", but how do you make sure you remove all the right data and none of the data you need to keep? In most cases you can still have a foreign key constraint and simply make the foreign key column nullable (not required) in the alternate table. This way when DataFaucet deletes the record it will set those columns to null before performing the delete and everything moves along swimmingly.

Occasionally however that may not be quite what you want. In my member-management and security plugin for the onTap framework I have a couple of tables for handling region information as a nested set. So regions start at the top with countries and then within each country you can add states or provinces and cities. But in my case the region table goes even further than that, because I anticipated the desire particularly with corporate clients to want to inject their own regions like counties in the US or perhaps even more importantly internal corporate regional designations.

So the clients at Telecom Company X might have regions within the "United States" region for Southwest, Midwest, Northwest, Southeast and Northeast ... or they might have regions divided by timezone or be more or less granular. The idea here is to overcome the problems of handing them a static set of choices by providing them with a system by which any company can assign the regions in a way that makes sense within their organization. And these companies are likely to then include states within their regions and may or may not also include cities and/or counties within those. It's up to the company to decide what they need.

But the company's needs may also change over time, so how do you allow them to easily stretch their existing system to allow for expansion or even cutbacks where regions might be added or removed? Lets say as an example that you want to know the regions where your employees work, so the employees are related to a region in the regions table. Well first if you delete a region, you don't want to delete the employees! So you make the regionid column in the employee table nullable (not required). Then you run into your next snag.

Hypothetical. The company at one time included individual office territories within cities in their region data, but have decided to move that information somewhere else or remove it from their data all-together. So they're going to delete all the little office regions (neighborhoods) within the cities... If the system allows DataFaucet to simply delete those regions the way it does by default, all the employees in those regions will be suddenly dissociated from the regions where they worked. Instead of knowing that Bob Slydell worked at Telecom Company X - US / Southwest / Texas / Dallas / North Dallas office, you'll know that Bob Slydell worked at "Telecom Company X".

WAIT!? We only deleted "North Dallas"... US / Southwest / Texas / Dallas is still in the database. If we still have the data for Dallas, why don't we know that Bob works in Dallas? Because DataFaucet doesn't know what you're doing with that data -- it can't. It only knows you wanted to delete the North Dallas region, so it did what it was asked.

There are a couple of ways to tackle this issue, one is to write a lot of convoluted OO code to make sure that any other objects have an opportunity to move their records before the delete. So for example, your application might have an EmployeeManager that would then be able to reassign all the employees in the North Dallas region to the Dallas region before the office is deleted.

  • The plus here is that those objects have more control over how the change is made and whatever other cleanup might be necessary.
  • The down side is that it's not very likely that either that level of control over the change will be needed or that there will be any cleanup necessary for most applications, which means more work even though you don't yet have a need.

An alternative solution is to use the foreign key constraint information to automatically reassign any related records in any table to the parent region. Now while my plugin also allows you to write extra code if necessary for reassigning records that need special handling, it doesn't force you to do that. If you just create a simple foreign key constraint in your table, the RegionManager will automatically handle the task of reassigning your records, whether they're employee records, client records, service-orders or something else. No work required on your part. :) So this is the best of both worlds - it gives you the benefit I described, plus it eliminates the drawback.

I'll be releasing a new version of the onTap framework and the existing plugins hopefully over the weekend. I was just working out a few minor issues with the region manager and came across the function that handles those foreign keys and thought it might be a good subject for a blog. :)

Comments
BlogCFC was created by Raymond Camden. This blog is running version 5.5.006.