Documentation is Included in the Download

A couple of people have asked about documentation in the DataFaucet zip archive.

It's in the "samples" directory.

Sorry for the confusion.

Seeking a New Home for the DataFaucet framework website

Hi guys.

A while ago, Brian Meloche turned me on to the folks at HostMySite.com (now Hosting.com) because they were helping out in the community with some free accounts for hosting open-source projects and they agreed to set me up a free account for the onTap framework. I then put the DataFaucet site on that same account... apparently something happened (I'm not real sure what) and the account was closed a few months ago. I suspect it was a clerical issue, because I never got a notification about it being closed (although I did get a bill that I was told by their support staff to ignore).

This is by no means a complaint about HostMySite.com - I was actually very happy with them in general and would definitely recommend them in the future.

This does however change things for me a bit and I'd like to see if I can find a place to home those open-source project sites where I'd have a more one-on-one kind of relationship with whoever is hosting them. So if any of you have some extra space where I could set up these sites (and the AutLabs project site -- creating jobs for people with Autism and Asperger Syndrome), I'd be mighty grateful. :D

All three of them are low-traffic sites, so they shouldn't have much impact on your server.

Here are the RIAForge Projects for them:

And at some point I may also want to get domains and create official project sites for CacheBox and FreeAgent, but probably not right away.

Thanks! :D

CacheBox Presentation Recording

We had a great time with today's presentation of the new CacheBox framework!

We had an even dozen show up and a pleasant surprise at the end. There's a poll at the end of these ColdFusionMeetup.com presentations so the people who attended can anonymously rate the presentation. It asks "was this information useful?" and the options range from 1 star (it was a waste of time) to 5 stars (yes, very much). In this case an even half of the guys who came gave me 5 stars. So I feel really good about the presentation. :) There is certainly still room for improvement in my presentation skills, but I think this is great progress.

I've also used some of the feedback from this presentation and updated the intro to the documentation for the CacheBox project to clarify the benefits of using it.

Here's the presentation recording for anyone who missed it: https://admin.na3.acrobat.com/_a204547676/p26212200/

CacheBox Presentation

This latest version of DataFaucet includes integration of the new CacheBox project, the hot-swappable caching framework for ColdFusion. That's a great thing for DataFaucet because it means the DF cache is now much more configurable and you've got a handy management application that lets you look right into it, which wasn't there before.

Honestly the framework is still incomplete, but it is in a working state. We could use input into the management application, features (did we cover everything?), and in particular I'm really hoping to get some folks to provide some additional insight to help me finish writing the intelligence portion -- the methods that will allow CacheBox to auto-configure the cache based on usage patterns.

I hope you will all come join us for the discussion tomorrow at 6pm EST at the Online ColdFusion Meetup group.

DataFaucet and ColdFusion 9

Thanks to Ryan McIlmoyl (I hope I spelled that correctly), for a heads up about an issue with ColdFusion 9. I was unaware that CF9 added a "local" scope and many of the functions in the DataFaucet ORM framework used a var'ed structure to store temp variables, which obviously conflicted with the new scope. And not just DataFaucet, but there were literally 3.5-thousand references to "local" in the onTap framework as well (including plenty of false-positives).

I'd like to think that every upgrade will be as seamless as the upgrade from CF7 to CF8, where I didn't have to do anything (that I remember), but that's not the case. This isn't the first time I've had this issue. I remember around the time that Macromedia released ColdFusion MX I had to do a lot of work for a similar issue. I think it was the "this" scope, where not just myself, but a number of other developers also had structures named "this" in their code for CF5 and previous versions.

So a day and a half later I'm polishing off the update for the onTap framework and now getting around to uploading new copies of all the archives. Ryan tells me that his tests run well with the new code on CF9. I haven't installed CF9 to test anything with it yet, so I'm testing it all in CF8 and taking his word for it. :) Anyway if you've installed CF9 or are planning to install CF9 there's a new zip archive on RIAForge.

Enjoy!

Who's Using DataFaucet

A lot of open source projects have a "who's using" page on their website and I think this helps keep the projects active. So knowing that there have been quite a few downloads for DataFaucet in the past year or two, I'd like to get your help creating a Who's Using DataFaucet page for our site. You can reply on the blog here, post a note to the google group, or if you'd prefer to remain anonymous you can email me at info@datafaucet.com.

Thanks!

New 1.1 Release Candidate

I just put up a new version of DataFaucet: 1.1, which is now a release candidate. This new version has a couple of new features.

1. Integrates with the new CacheBox caching framework that lets you hot-swap the caching engine and gives you a nice management application for all your applications (ColdBox, Wheels, onTap, DataFaucet, FuseBox, etc).

2. A new revision of the Persistence Service that I had started working on last year. This feature was inspired by discussion on two of Joe Rinehart's blog entries last year, titled "Does ColdFusion Have NO Real ORM Frameworks?" and "What Makes a Framework an ORM?" Out of those discussions I created this new system in DataFaucet that I think will appeal to developers who enjoy Dependency Injection (DI) frameworks like ColdSpring or Lightwire, and would like an ORM that will more or less "stay out of your way". It uses your existing DI Factory as a source for objects and will even generate all the database tables necessary to persist those objects.

Here's the documentation for the new Persistence Service:

http://www.datafaucet.com/persistenceservice.cfm

Enjoy!

Pros and Cons of an ORM

I just came across this blog thanks to a post on Hal Helms' Facebook page. Every design pattern we use has some advantages and then some drawbacks. It's just the nature of the beast. We're essentially purchasing the advantages we want by paying with the drawbacks, so the question becomes what is the advantage worth? In the case of a traditional ORM for example, is it worth a certain loss of direct control over our persistence layer (database) in order to theoretically simplify our day-to-day coding tasks?

I think Bill Karwin made some good points in his blog and it's interesting because we tend not to talk about or hear much about the drawbacks of the design patterns we like to use. This is actually just a basic fact of human nature. For example, if you happen to be a big fan of Linux, you're liable to conveniently forget anything about Linux that's challenging or frustrating. And of course the same thing happens with people who are fans of Microsoft. But it's good to remind ourselves from time to time that you can only see polarization from the outside. By this I mean, a fan boy usually doesn't think he's a fan boy. ;)

This blog entry points out the effect of polarization so that you can see the obvious biases between Glenn Block who is an ORM enthusiast and Karwin who is obviously not. In many cases they even disagree about whether something belongs in the pro column or the con column. So looking at the same feature, they have exactly opposite reactions to it. It doesn't get much simpler than this example from Karwin's blog where he gives his opinion of something Block describes as an advantage of ORM systems:

Block: 4. Rich query capability.

Karwin: Absolutely wrong.

I'm always curious to know what other people really think about my software. DataFaucet seems to have been rather well received. But DataFaucet may also be difficult to categorize in particular as an ORM or simply as a Data Access Layer (DAL). Steve Bryant describes his DataMgr tool as a DAL - it's much smaller and much simpler than Reactor, Transfer or DataFaucet. But this sort of thing really depends on who you ask, because according to Joe Rinehart, none of the tools billed as ORM for ColdFusion actually fit the definition of ORM. Joe's blog encouraged some additions to DataFaucet that are in source control but not yet officially released. But even with added features that make DataFaucet more like traditional ORM tools, does that make it an ORM? Well, if you asked Bill Karwin, who inspired this article, I think he would say no. And here's why I think Karwin would say that DataFaucet is not an ORM:

Block: 2. Huge reduction in code.

Karwin: Depends. When executing simple CRUD operations against a single table, yes. When executing complex queries, most ORM implementations fail spectacularly compared to the simplicity of using SQL queries.

... Block: 4. Rich query capability.

Karwin: Absolutely wrong.

Block: 5. You can navigate object relationships transparently.

Karwin: This is definitely a negative rather than a positive. When you want a result set to include rows from dependent tables, do a JOIN. Doing the "lazy-load" approach, executing additional SQL queries internally when you reference columns of related tables, is usually less efficient. Leaving it up to the ORM internals deprives you of the opportunity to decide which solution is better.

Block: 6. Data loads are completely configurable ...

Karwin: This is not a benefit of an ORM. It is actually easier to achieve this using plain SQL.

My impression is that, although it might be a little slower, Karwin would not have these same objections to DataFaucet, which started its life as an attempt to abstract the SQL language in a way that Ben Forta declared impossible. The key word was "portability" at the time, but in the process I managed to find ways to not only make querying the database portable, but easier as well. The ability to specify and/or keywords in search queries (think Google) is a prime example of a use case in which standard SQL is a big challenge, but DataFaucet is dead easy. The reason is that for all the ORM features in DataFaucet, it started as a "language". As far as I know, none of the other ORM systems for ColdFusion have approached this particular task.

That said, I think Joe Rinehart might have been wrong when he said ColdFusion doesn't have any real ORM systems. If what you want really is a traditional ORM system, I believe there actually is one for ColdFusion. It's a built-in part of the FarCry framework called FourQ. It wasn't included in Steve Bryant's comparison of DAL tools, I think primarily because it's inseparable from FarCry, which if memory serves is an 8MB download (that's compressed). According to their own documentation, the objective of FourQ is to ensure that as a programmer, the word "database" never enters your mind. I'm not sure how effective they've been at achieving that goal, since I don't use it with any regularity. It may work beautifully if that's what you're after. But it does mean that you won't get the kind of querying flexibility that a system designed as a querying language like DataFaucet will give you.

It's not my intention to promote or to bash anyone here (except of course obviously to promote DataFaucet). But I think Karwin made some good points and these are worth considering when choosing between DAL or ORM frameworks.

DataFaucet ORM API Documentation

Dan Lancelot just submitted to the mailing list this API documentation for the framework created using Mark Mandel's ColdDoc application.

Dan also noted a couple of leftovers in the documentation where I had used "save as" and then neglected to update the hint on the CFC. Oops!

Anyway, I decided to put it up on the framework site for folks and say thanks to Dan and Mark. Of course as new versions of the framework are published, we'll also update the API information, and I may actually decide to make it part of the distribution.

Query Optimization Hints - thanks to Rick Osborne and Ben Nadel

Ben Nadel posted this blog early this morning with a bunch of query optimization hints from Rick Osborne. Thanks Ben and thanks Rick!

I don't have a DB2 database handy, which is part of the reason why there's not currently a DB2 sql-agent in DataFaucet. There is however no reason why anyone couldn't simply create a sql-agent for DB2. These comments about query optimization however lead to some interesting thoughts about potential improvements for the sql-agents in general in terms of making the agent perform more efficiently for its target platform.

As an example:

Rick says: "Yes, put as much of the filtering as you can in ON clauses. Not only does it put the conditions where they are most relevant, but in some engines you'll get orders of magnitude better performance. The DB2/400 optimizer is so dumb (how dumb is it?) that if you put the conditions in the WHERE instead of the ON it will do the joins first, no matter how big the tables, and then only apply the conditions at the end. For extremely large tables, this is a nightmare."

By default, DataFaucet's query-builder automatically puts all the join conditions in the ON clauses and when performing a left join it properly places any filters on the joined table inside the ON clause as well, so that you can filter on a left join without turning it into an inner join in disguise.
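For anyone who hasn't run into the "inner join in disguise" problem, here's a hand-written sketch of the difference. This is plain SQL in a cfquery, not output from DataFaucet's query-builder, and the datasource name "myDSN" plus the table and column names are placeholders I've picked for illustration.

```cfml
<!--- "myDSN" and the tables/columns are placeholder names for illustration --->

<!--- Filter in the ON clause: every category is returned, and the product
      columns are simply NULL for categories with no product under 10 --->
<cfquery name="qFilterInOn" datasource="myDSN">
	SELECT c.categoryid, p.productname
	FROM tblProductCategory c
	LEFT JOIN tblProduct p
		ON p.categoryid = c.categoryid
		AND p.productprice < <cfqueryparam value="10" cfsqltype="cf_sql_numeric" />
</cfquery>

<!--- Same filter in the WHERE clause: rows where p.productprice is NULL
      fail the comparison, so categories without a match disappear and
      the left join behaves like an inner join --->
<cfquery name="qFilterInWhere" datasource="myDSN">
	SELECT c.categoryid, p.productname
	FROM tblProductCategory c
	LEFT JOIN tblProduct p
		ON p.categoryid = c.categoryid
	WHERE p.productprice < <cfqueryparam value="10" cfsqltype="cf_sql_numeric" />
</cfquery>
```

The first query generally returns at least as many rows as the second, because the categories with no matching product are kept.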

But what struck me about Rick's comment here was that it would be pretty easy to write the sql agent so that it places those filters first before the join condition to improve the query performance just on DB2. For that matter Rick mentions a number of potential optimizations that could be similarly handled inside the abstraction. DataFaucet doesn't currently handle anything in that kind of detail, for example, reordering tables based on size, etc. but in the future it could. It's at least an idea worth keeping in mind for now. :)

Something else Rick said: "And no, the dream of having one query work perfectly on multiple engines is really just a dream."

If you're talking about flat out queries, yes, that may be true. Part of the reason why I started working on DataFaucet in the first place however, way back in the CF5 days, was to produce platform agnostic SQL. So while it may not be possible within an individual query, it might certainly still be possible within the abstraction. ;)

There are two other comments from Rick that I'd like to highlight here.

Rick said: "In most modern DBMSes, you almost don't need to index as long as you have Primary *and* Foreign keys set up. Joins are where indexes really shine, so proper keying will get you 90% of the way there."

Wouldn't you know, I've been trying to convince people to use foreign key constraints for years. ;) DataFaucet makes really good use of them and also makes them really easy to build. If you're using the built-in DDL features that allow the objects to automatically install tables, making a foreign key constraint is as easy as declaring a join (or easier I think). Here's an example from a previous article.

<cfcomponent output="false" extends="datafaucet.system.activerecord">
	<cfproperty name="productid" type="uuid" required="true" key="1" />
	<cfproperty name="productname" type="string" required="true" length="100*" />
	<cfproperty name="productdescription" type="string" required="false" length="long*" />
	<cfproperty name="productprice" type="numeric" required="true" length="real" />
	<!--- create a foreign key constraint to ensure this product is placed in a category --->
	<cfproperty name="categoryid" type="uuid" required="true" references="tblProductCategory.CategoryID" />

	<cfset setTable("tblProduct") />
</cfcomponent>

And lastly I'll just encourage you to go read the article on Ben's blog, because Rick made a really clear analogy that helps to explain "selectivity" of an index, which you may have also heard described as the "cardinality" of the data. It's a good help to understanding how indexes work to improve the performance of your queries.

Database Introspection and the MetaData Facade

Being a forerunner is often kind of exciting. There's a certain sense of pride that goes along with being the first person to achieve a particular goal. Y'know, ego stroking. ;) But there's another more challenging side to being a forerunner. You're the first person to get hit when things are launched in your direction.

Okay well I may not have been the first to get hit in this case but my challenges over the past couple days certainly stem from having been a forerunner.

As of today I'm running my ColdFusion 8 installation on my horribly under-powered, not to mention OLD notebook with the new CF Admin option "disable access to internal ColdFusion Java components".

The challenge in doing this was that I needed to find a way to get all the metadata I needed from the database without being able to get an actual JDBC metadata object. The JDBC object I need is actually *NOT* part of the ColdFusion server, oddly enough. The problem stems from the fact that it needs to be associated with an active java.sql.Connection object and the only way I can get that is by going through the undocumented "coldfusion.server.ServiceFactory".
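For anyone who hasn't seen it, the ServiceFactory route looks roughly like this. It's a sketch rather than the code in DataFaucet. The datasource name "myDSN", the "dbo" schema and the tblProduct table are placeholders, and because the ServiceFactory is undocumented, the getDataSourceService().getDatasource() chain is just the commonly circulated pattern and not something Adobe promises to keep working. The getMetaData() and getImportedKeys() calls and the FK_NAME column are standard JDBC.

```cfml
<cfscript>
	// Undocumented class; this approach fails when the CF Admin option
	// "disable access to internal ColdFusion Java components" is turned on.
	factory = createObject("java", "coldfusion.server.ServiceFactory");

	// "myDSN" is a placeholder datasource name.
	connection = factory.getDataSourceService().getDatasource("myDSN").getConnection();

	// java.sql.DatabaseMetaData, tied to the open connection.
	metadata = connection.getMetaData();

	// Foreign keys imported by tblProduct (placeholder table, "dbo" schema assumed).
	// A null catalog means the catalog is not used to narrow the search.
	keys = metadata.getImportedKeys(javaCast("null", ""), "dbo", "tblProduct");

	while (keys.next()) {
		// FK_NAME is the constraint name that you need in order to drop the key later.
		writeOutput(
			keys.getString("FK_NAME") & ": " & keys.getString("FKCOLUMN_NAME")
			& " references " & keys.getString("PKTABLE_NAME") & "." & keys.getString("PKCOLUMN_NAME")
			& "<br />"
		);
	}

	// Release the result set and hand the connection back.
	keys.close();
	connection.close();
</cfscript>
```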

Until Adobe released ColdFusion 8, this wasn't a problem. And Adobe actually tried to resolve the problem I have by also introducing the CFDBINFO tag in ColdFusion 8. (I'm pretty sure I submitted the enhancement request to Adobe for this tag and I've had several people thank me for it, saying it's one of their favorite new features, so I put that in the win category.) ;) Unfortunately, although the tag does return some good and useful introspection data, it failed to solve my problems in particular. There are a handful of problems with the data returned by the cfdbinfo tag, like the fact that it won't tell you the names of any of the foreign key constraints (which you MUST know if you ever plan to drop them programmatically). The data's there - it's part of the standard set of data returned by the JDBC metadata object even.
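To make that concrete, here's roughly what asking cfdbinfo for column information looks like. The datasource "myDSN" and the table "tblProduct" are placeholder names, and the result columns used here are the ones I understand the tag to document for type="columns". Notice that there's a flag for "this column is a foreign key" and the table and column it points to, but nothing in this result that names the constraint itself.

```cfml
<!--- "myDSN" and "tblProduct" are placeholder names --->
<cfdbinfo datasource="myDSN" type="columns" table="tblProduct" name="qColumns" />

<cfoutput query="qColumns">
	<cfif qColumns.is_foreignkey>
		<!--- we learn what the column references, but not the constraint name --->
		#qColumns.column_name# references
		#qColumns.referenced_primarykey_table#.#qColumns.referenced_primarykey#<br />
	</cfif>
</cfoutput>
```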

That wasn't such a huge thing. The bigger issue (and one I may still need to address more) is that it doesn't return any schema information for tables, views or foreign key constraints. Most of the time your applications will only work in one particular schema (on SQL Server it's usually, although not always, dbo), but particularly if you work with Oracle for the enterprise at all there's a good chance that you may need to draw data from alternate schemas. So the DataFaucet introspection code was designed to account for that -- after all, it's been part of the SQL standard for many, many years! And without that information the foreign key constraint code in particular will fail. Because of that I couldn't use cfdbinfo for all of the metadata I needed. Instead I had to go back to using information_schema for constraint information (although it's in the sqlagent.cfc, which means you can customize it for different databases if you need to).
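As an illustration of the information_schema route, a query along these lines returns both the constraint name and the schema for each foreign key on a table. This is a sketch and not the query that ships in sqlagent.cfc. The datasource "myDSN" and the table "tblProduct" are placeholders, and it assumes a database that exposes the standard information_schema views (Oracle, for one, would need something different).

```cfml
<!--- "myDSN" and "tblProduct" are placeholder names --->
<cfquery name="qForeignKeys" datasource="myDSN">
	SELECT tc.constraint_schema, tc.constraint_name,
		tc.table_schema, tc.table_name, kcu.column_name
	FROM information_schema.table_constraints tc
	INNER JOIN information_schema.key_column_usage kcu
		ON kcu.constraint_schema = tc.constraint_schema
		AND kcu.constraint_name = tc.constraint_name
	WHERE tc.constraint_type = 'FOREIGN KEY'
		AND tc.table_name = <cfqueryparam value="tblProduct" cfsqltype="cf_sql_varchar" />
</cfquery>

<cfoutput query="qForeignKeys">
	#qForeignKeys.table_schema#.#qForeignKeys.table_name#.#qForeignKeys.column_name#
	is constrained by #qForeignKeys.constraint_name#<br />
</cfoutput>
```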

Don't get me wrong. I understand exactly what Adobe was trying to do when they implemented cfdbinfo. They were trying to take the complexities of JDBC introspection and boil them down into a package that would be easier for developers to digest. And so instead of requesting imported vs. exported foreign keys, you simply request columns and the column list says which columns are foreign keys. That much I understand and could even agree with. But they dropped vital information while they were at it. Hopefully they'll fix those oversights in a future release.

I have actually used information_schema before. Many years ago, I started doing database introspection when we were still limited to ColdFusion 5. That was before CFCs even; all I had to work with were custom tags. NOBODY was talking about writing db abstraction in ColdFusion at the time. In fact, pretty much everyone recommended to avoid even trying it. Except me. I'm not sure Mark was even around yet and Transfer certainly wasn't. Neither was Reactor. Of course now it's common, but not back then. And those tags I had to work with were full of an amalgam of information_schema and proprietary meta-data tools. Ugh! But that was the price of being a forerunner, working on things that weren't easy to pull off. It wasn't until around the time that 6.1 (Red Sky) was released that I converted everything to use JDBC, specifically because even between databases that support the information_schema standard, the data from JDBC is still more consistent and reliable.

So ultimately the existing code in DataFaucet that relies on the serviceFactory is still the most solid way of handling all the things that DataFaucet does. So it's going to try and use that by default, because that's a best case scenario. But we've been working on a project with Eric Jones the past week and their hosting provider disables access to the ColdFusion classes. That meant I had to work on an alternative solution. And I've found myself more or less in the same sort of situation I was in back when I was doing this with CF5, slogging through a maze of different and none-too-consistent standards for getting the information I need.

For that matter, I even asked both Mark Mandel and Sean Corfield if they had any suggestions on the Java front for an easier way that maybe I could get at a loaded metadata object without going through the serviceFactory since it's not a ColdFusion class (and so wouldn't be protected). Neither of them had any advice and for that matter, Mark doesn't even do database introspection generally speaking. So here I am again, the forerunner... bitten in the rear by my success. ;)

I can't say that the solution hasn't been interesting. I had to design a system that would attempt to get the serviceFactory and then on failure create a couple of new components. These new components are facades for the connection and metadata objects. They're just CFCs like everything else in the framework, but what they're doing is pretending to be the java objects the framework wanted. (And hooray for duck-typing in ColdFusion! Because this problem would have been a lot harder to solve in an explicitly typed language like Java.) So these objects have to return the data in the same format that's in use. Since 6.1 that's been the JDBC format... and guess what, neither cfdbinfo nor information_schema follow the same format or naming conventions. So I can't say that it hasn't been interesting (and certainly challenging), but I do wish it weren't necessary.
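The "try the serviceFactory, then fall back to a facade" idea might be sketched like this. Everything named here is an assumption for illustration: the getMetaDataSource() method and the metadatafacade CFC are hypothetical stand-ins and not DataFaucet's actual code, and the facade would have to exist and answer the same method calls as the JDBC object for the fallback branch to work.

```cfml
<cfcomponent output="false" hint="Hypothetical sketch, not DataFaucet's actual code">

	<cffunction name="getMetaDataSource" access="public" output="false" returntype="any"
		hint="Returns the JDBC metadata object, or a CFC that answers the same method calls">
		<cfargument name="dsn" type="string" required="true" />
		<cfset var connection = "" />

		<cftry>
			<!--- best case: the real java.sql.DatabaseMetaData via the undocumented ServiceFactory --->
			<cfset connection = createObject("java", "coldfusion.server.ServiceFactory").getDataSourceService().getDatasource(arguments.dsn).getConnection() />
			<!--- the connection stays open because the metadata object needs it;
			      real code would keep a reference and close it when finished --->
			<cfreturn connection.getMetaData() />

			<cfcatch type="any">
				<!--- internal classes are blocked: hand back the facade (hypothetical CFC) --->
				<cfreturn createObject("component", "metadatafacade").init(arguments.dsn) />
			</cfcatch>
		</cftry>
	</cffunction>

</cfcomponent>
```

The returntype is "any" on purpose. The caller doesn't check whether it got a Java object or a CFC; it just calls the methods it expects, which is the duck-typing that makes the facade possible.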

But then, I'm used to these kinds of tough challenges. I've never been the kind of guy to run away from a technical challenge. ;) And this does mean that if you've had problems using the ORM due to the limitations of your hosting there's now a solution. :) (Unless they disable the creation of Java objects altogether in your sandbox.)

Anyway I haven't released the code yet, but I have checked it in to the SVN repository. So if you have an immediate need, you can try the new code by getting the BER from SVN. I probably won't release a new downloadable archive until I create a small sample application and some documentation for the new Persistence Service that I've talked about in my last few blogs. And at that point because of the new service and because of the metadata facade I think I may roll it up from 1.0 to version 1.1.
