Greywire(s) - Thoughts from a Nerd: 7/22/07

Friday, July 27, 2007

ORM BAD! ARGGHH!!!

The more I think about it the more I realize two things:

1. True Object Relational Mapping is inherently flawed. Database tables are not objects, nor are records. They cannot inherit from another. They do not encapsulate data with access methods.

2. Lots of systems calling themselves ORM really are not by the strictest definition of ORM. They are Object Oriented Database Abstractions... but that just doesn't have as nice a ring to it. It also doesn't fit in with the politically correct "Everything should be an object" doctrine.

Look, people: Object Orientation is great. But so is procedural programing, arrays and set theory, aspect oriented programing, etc.

What is it that these ORM systems are often trying to solve? In many cases its not "I have a ton of objects I need persisted to a storage medium". I think more often than not, its "I don't want to have to write an assload of code to do basically the same stuff over and over again when accessing my database". They are not so much concerned with what form the data is in, just that they can Create it, Read it, Update it or Delete it. Yes folks, its CRUD. And I think CRUD is set to whip ORM's ass just like REST is beating SOAP and JSON vs XML.

That's not to say that we can't or shouldn't use OO design to create our CRUD code. We just shouldn't be trying to directly map our objects into data.

So, now that we have that out of the way, lets look at the two common perspectives on how this should work:

1. Define the Code first, then generate a database schema to match.
2. Define the database schema first and then generate the code to access it.

A sub part of method 2 is whether to go strictly by the schema itself, or to add a config file of some sort.

The main idea here is be further lazy (ok, its supposed to be to keep the code and schema in sync; in database terms, normalized..) and define the data structure only once. Going with the code first method is excelent if you don't have a database already created. But what if you have an existing DB? What if you have a DBA who makes changes to the database but is not a programmer? What if you have another part your application written in another language and framework? What if you just want to go into phpMyAdmin and tweek the database that way?

Using the schema first method eliminates all those issues, but has some of its own. What if you want to switch to another type of database backend? With the previous method, it could be as easy as plugging in the new database adapter and regenerating the schema. What if your database design (or your database system) doesn't provide enough or the right kind of meta-data to be able to create usable code from? This is where the config file comes in, covering what's missing or non-standard.

A third option is actually possible, wherein the config file takes full control and defines the schema in a code and database neutral form, from which both the code and schema are generated from.

I definitely am ruling out the code first option. I think there are too many problems with this, and databases should really be designed by a database expert. I think bad code is much easier to fix later on than a badly designed database. Especially if you can regenerate much of the code from the database. The third option is great if you intend to target more than one database backend.

Suitable for framing

I have been looking for a framework to adopt for an existing large project. I have been using a RAD tool (Codecharge Studio, CCS) since the project's inception some years ago. It has treated me relatively well, allowing me to rapidly get the thing working, where it would probably have taken too long any other way.

But over that time I have come up against the limitations of this tool. It works great for what it does.

Which is this: it creates your CRUD pages from an existing database. This is somewhat ORM-ish, as it creates a set of classes that loosly match the MVC concept. And that works great for a lot of things, and its fast.

But these classes are not interoperable. You can't have one interact with another and have them create a SQL JOIN for instance, or even just embed one inside another (say updating one table requires inserting something in another). You also can't easy swap out parts of the MVC structure, like say having a choice of views for HTML or JSON output, or changing the model to one that doesn't interact with the database (lets say the data came from a remote call to another application). Its also hard to make the view more flexible, like having a variable number of columns.

So I've wound up hand coding some of the parts of my app.

Now I will probably still use CCS for some things, like the admin pages, were it works well. But the core of my app needs reworking to support more advanced AJAX style features, more modularity, etc.

But enough background.

What interests me in particular is the database layer, and the MVC structure. The ORM concept (be it model-creates-schema, schema-creates-model, or external tool creates both) is somewhat flawed, at least when it becomes "the end all solution" (see this), but I see it as similar to CCS -- a rapid dev tool that covers maybe 80% of the mundane stuff. The only thing that seems to be lacking really is that the ORM tools only cover the model and database, they don't help you with the view, where CCS does all three.

I would like to see the database layer have multiple options. I see no reason you can't have it support both defining the model first and then asking it to generate a schema, or having it be able to look at the schema and generate the model. Either way, when the application is done, the code should be largely the same. That is to say, the only time code or schema generation takes place is when you as a developer tell it to do so.

Now, on to MVC. From what I can see, it seems like all the PHP frameworks use MVC. Which is to say, that none of them use PAC or HMVC. The main thing here, for me, is that I want flexibility in the layout. I want to put multiple components (MVC/PAC/HMVC triads) on a web page, some of which may be collections of components, and I want a method of specifying arguments to each of them from the URL. The only framework I saw that did this was Claw, which seems to have been abandoned.

So what's the PHP framework for me? I don't know yet.