Ben Nadel posted this blog early this morning with a bunch of query optimization hints from Rick Osborne. Thanks Ben and thanks Rick!
I don't have a DB2 database handy, which is part of the reason why there's not currently a DB2 sql-agent in DataFaucet. There is however no reason why anyone couldn't simply create a sql-agent for DB2. These comments about query optimization however lead to some interesting thoughts about potential improvements for the sql-agents in general in terms of making the agent perform more efficiently for its target platform.
As an example:
Rick says: "Yes, put as much of the filtering as you can in ON clauses. Not only does it put the conditions where they are most relevant, but in some engines you'll get orders of magnitude better performance. The DB2/400 optimizer is so dumb (how dumb is it?) that if you put the conditions in the WHERE instead of the ON it will do the joins first, no matter how big the tables, and then only apply the conditions at the end. For extremely large tables, this is a nightmare."
By default, DataFaucet's query-builder automatically puts all the join conditions in the ON clauses and when performing a left join it properly places any filters on the joined table inside the ON clause as well, so that you can filter on a left join without turning it into an inner join in disguise.
But what struck me about Rick's comment here was that it would be pretty easy to write the sql agent so that it places those filters first before the join condition to improve the query performance just on DB2. For that matter Rick mentions a number of potential optimizations that could be similarly handled inside the abstraction. DataFaucet doesn't currently handle anything in that kind of detail, for example, reordering tables based on size, etc. but in the future it could. It's at least an idea worth keeping in mind for now. :)
Something else Rick said: "And no, the dream of having one query work perfectly on multiple engines is really just a dream."
If you're talking about flat out queries, yes, that may be true. Part of the reason why I started working on DataFaucet in the first place however, way back in the CF5 days, was to produce platform agnostic SQL. So while it may not be possible within an individual query, it might certainly still be possible within the abstraction. ;)
There are two other comments from Rick that I'd like to highlight here.
Rick said: "In most modern DBMSes, you almost don't need to index as long as you have Primary *and* Foreign keys set up. Joins are where indexes really shine, so proper keying will get you 90% of the way there."
Wouldn't you know, I've been trying to convince people to use foreign key constraints for years. ;) DataFaucet makes really good use of them and also makes them really easy to build. If you're using the built-in DDL features that allow the objects to automatically install tables, making a foreign key constraint is as easy as declaring a join (or easier I think). Here's an example from a previous article.
<cfcomponent output="false" extends="datafaucet.system.activerecord">
<cfproperty name="productid" type="uuid" required="true" key="1" />
<cfproperty name="productname" type="string" required="true" length="100*" />
<cfproperty name="productdescription" type="string" required="false" length="long*" />
<cfproperty name="productprice" type="numeric" required="true" length="real" />
<!--- create a foreign key constraint to ensure this product is placed in a category --->
<cfproperty name="categoryid" type="uuid" required="true" references="tblProductCategory.CategoryID" />
<cfset setTable("tblProduct") />
</cfcomponent>
And lastly I'll just encourage you to go read the article on Ben's blog, because Rick made a really clear analogy that helps to explain "selectivity" of an index, which you may have also heard described as the "cardinality" of the data. It's a good help to understanding how indexes work to improve the performance of your queries.