Mar 22 2013

Yesterday, the wolf pack was talking through a particularly complex domain model.  It was much more complex than your general one-to-many, one-to-one issues; it contained objects that were very fluidly related and often cross-related.  Object A has a collection of Object B, except sometimes Object B is a child object of Object C in Object A’s other collection.  Object B is also the parent of Object D, which may or may not be associated with an Object A.  So on, and so forth.

Of course, this wasn’t a discussion about coding; this was a discussion about how our customers wanted to use the software and what kinds of interactions best fit how they wanted to use the system.

So, we got a basic model put together that seemed to be all right, and just for kicks, I let Entity Framework generate the database tables.  It did a great job, making decisions about keys and constraints and who owned what relationships that I probably would not have thought of.  It was pretty genius.  But here’s the catch: although the object model made sense, using that database table structure – ingenious as it was – would cause lots of multiple table joins for even some of the most basic things users would want to see and do.

Then, the conversation moved toward the most efficient relational database design to get the performance we wanted, knowing we could use the ORM to map our objects to whatever structure we came up with.

A lot of times, when people think of ORMs, the main value they think of is not having to write a lot of data access code.  This is certainly an important value, and I don’t want to minimize that, but that’s sort of like saying the primary value of TDD is the ability to refactor your code without breaking anything.  It’s certainly important, but not really where most of the bang for your buck comes in.

To me, the real value of the ORM is the decoupling of your database tables from your object model.  This allows me to write my domain in the way that best reflects how these things actually interact in our business, in our code, and for our customers – while at the same time being free to optimize the database for the most performant way to store and retrieve the data relationally.  Having the extra layer of abstraction also allows me to tinker with either layer without breaking anything.
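Here is a rough sketch of that decoupling, with a hand-written mapper standing in for the ORM. Everything in it is made up for illustration: the customer_orders table, its columns, and the load_customer function are assumptions, not anything from our actual project or from Entity Framework. The point is only that the storage can be one flat, join-free table while the code still works with a Customer that owns a list of Orders.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list = field(default_factory=list)


def load_customer(conn, customer_id):
    """Hypothetical mapper: flat rows in, object graph out."""
    rows = conn.execute(
        "SELECT customer_id, customer_name, order_id, order_total "
        "FROM customer_orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    customer = Customer(rows[0][0], rows[0][1])
    customer.orders = [Order(row[2], row[3]) for row in rows]
    return customer


# A deliberately denormalized table, chosen for read speed rather than purity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders ("
    "customer_id INTEGER, customer_name TEXT, order_id INTEGER, order_total REAL)"
)
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?, ?)",
    [(1, "Ada", 3, 20.0), (1, "Ada", 7, 5.5)],
)

customer = load_customer(conn, 1)
```

If the table layout changes later, only load_customer has to change; Customer and Order, and everything written against them, stay put.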

For simple applications, your database tables and domain classes may look similar, and for applications using NoSQL, this issue is barely on the radar if at all, but for complex apps using relational databases, you really don’t want your domain looking like a bunch of database tables and vice-versa.  Those two things have completely different roles in your application and have very different definitions of efficiency and good design.

So, if you struggle with your domain being hard to model in a database, or your domain is full of classes like “UserToRoles,” you might look into whether or not an ORM would help you out.

Jan 10 2012

If I had the time and the expertise, I’d love to make a 60s style health film about the Select N+1 problem.

This problem arises usually when you need to get related data from more than one database table.

Let’s say I have a Customers table with CustomerId as a primary key.  There is an Orders table that uses CustomerId as a foreign key, and there is an OrderLineItems table that uses OrderId as a foreign key.

On my website, let’s say I have a page where I want to show a customer a list of their orders and the details of those orders.  From a pure SQL standpoint, what I want to do (probably) is do one SQL SELECT statement where I join those three tables together using those IDs to do it.  Makes sense, right?

But imagine this.  Imagine I took my CustomerId, did a SELECT in the Orders table to get all the OrderIds tied to that Customer (that’s one SQL statement), then proceeded to execute a SELECT statement against the OrderLineItems table for each individual OrderId that I’d found.  In other words, instead of something like this:

SELECT * FROM Customers C, Orders O, OrderLineItems OL WHERE C.CustomerId = O.CustomerId AND O.OrderId = OL.OrderId AND C.CustomerId = @CustomerId

I did something like this:

SELECT O.OrderId FROM Customers C, Orders O WHERE C.CustomerId = O.CustomerId AND C.CustomerId = @CustomerId

SELECT * FROM OrderLineItems WHERE OrderId = 3

SELECT * FROM OrderLineItems WHERE OrderId = 7

SELECT * FROM OrderLineItems WHERE OrderId = 13

And so on.

That second solution is what we call a Select N+1: instead of doing one join, you’re doing a SELECT over and over to get your data out of each “join id” value individually.
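If you want to see the difference in round trips for yourself, here is a small sketch using Python and an in-memory SQLite database. The table names follow the example above; the sample rows, the LineItemId and Sku columns, and the run helper that counts statements are my own assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    CREATE TABLE OrderLineItems (LineItemId INTEGER PRIMARY KEY, OrderId INTEGER, Sku TEXT);
    INSERT INTO Customers VALUES (1, 'Ada');
    INSERT INTO Orders VALUES (3, 1), (7, 1), (13, 1);
    INSERT INTO OrderLineItems VALUES (1, 3, 'A'), (2, 3, 'B'), (3, 7, 'C'), (4, 13, 'D');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it as one database round trip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def line_items_with_join(customer_id):
    # One statement: the join does all the work.
    return run(
        "SELECT OL.LineItemId FROM Orders O "
        "JOIN OrderLineItems OL ON OL.OrderId = O.OrderId "
        "WHERE O.CustomerId = ? ORDER BY OL.LineItemId",
        (customer_id,),
    )


def line_items_n_plus_1(customer_id):
    # One statement for the orders, then one more per order found.
    items = []
    orders = run(
        "SELECT OrderId FROM Orders WHERE CustomerId = ? ORDER BY OrderId",
        (customer_id,),
    )
    for (order_id,) in orders:
        items += run(
            "SELECT LineItemId FROM OrderLineItems WHERE OrderId = ? ORDER BY LineItemId",
            (order_id,),
        )
    return items


stats["queries"] = 0
joined = line_items_with_join(1)
join_queries = stats["queries"]

stats["queries"] = 0
looped = line_items_n_plus_1(1)
n_plus_1_queries = stats["queries"]
```

Both functions return the same line items. With three orders, the join costs one statement and the loop costs four, and the gap grows by one for every additional order.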

Sometimes, people will do this in raw code or SQL operations, but the problem is far more common when you use ORMs that lazy load.  If my ORM lazy loads by default, then in code, when I do something like this:

foreach(var lineItem in myCustomer.Orders[0].LineItems)

My ORM is going to think, “Ok, I need this customer’s orders.  Ok, I’ll hit the database and load that.  Oh, wait, I also need all the line items.  Better go back and get those.”

For the example above, that’s not too bad because we’re only touching one order.  But if we wanted to iterate through the line items in every order (maybe we want to show a customer who is viewing an item on our website whether they’ve already ordered it), the odds are very good that your ORM will go back and fetch the line items one order at a time, which is a Select N+1.

So, how do you make sure this doesn’t happen?

All the solutions to this problem revolve around eagerly fetching the data that you’re going to use rather than your ORM having to “go back and get it.”  Unfortunately, there’s not a one size fits all solution for every situation.  Even in the exact same application, it might be better to solve the Select N+1 one way in one spot, and take care of it a different way in a different spot.

One solution is to turn off lazy loading.  Most of the time, this is not ideal.  Not only does it typically mean worse performance across the board, it also means your ORM can’t do a lot of its automated magic, and you end up having to do a lot of synchronization in your context, etc.  Sometimes, though, it might be useful to disable this at a property level if you find you are always using the data from a property along with its aggregate root.

Another solution is to specify the things you need to eagerly fetch in your query.  This is usually how I deal with this problem.  In my query, I just tell it, “When you get a customer this time, go ahead and load their orders and line items, too.”  I mean, I don’t say that out loud, but I code that into the query.
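To make the lazy versus eager difference concrete, here is a toy sketch in Python. It is not NHibernate or any real ORM: FakeDb, Order, load_orders, and the fetch_line_items flag are all names I invented, and the "eager" path here uses one batched lookup where a real ORM might use a join instead. What it does show is the shape of the problem: the calling code is identical, and only the query decides how many round trips happen.

```python
class FakeDb:
    """Stands in for the database and counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.orders = {1: [3, 7, 13]}                            # customer id -> order ids
        self.line_items = {3: ["A", "B"], 7: ["C"], 13: ["D"]}   # order id -> SKUs

    def orders_for(self, customer_id):
        self.round_trips += 1
        return list(self.orders[customer_id])

    def line_items_for(self, order_ids):
        self.round_trips += 1
        return {oid: list(self.line_items[oid]) for oid in order_ids}


class Order:
    def __init__(self, db, order_id, line_items=None):
        self._db = db
        self.order_id = order_id
        self._line_items = line_items

    @property
    def line_items(self):
        if self._line_items is None:
            # Lazy load: nobody fetched these up front, so go back to the database.
            self._line_items = self._db.line_items_for([self.order_id])[self.order_id]
        return self._line_items


def load_orders(db, customer_id, fetch_line_items=False):
    order_ids = db.orders_for(customer_id)
    # Eager fetch: one extra round trip that covers every order at once.
    prefetched = db.line_items_for(order_ids) if fetch_line_items else {}
    return [Order(db, oid, prefetched.get(oid)) for oid in order_ids]


def all_skus(orders):
    return [sku for order in orders for sku in order.line_items]


lazy_db = FakeDb()
lazy_skus = all_skus(load_orders(lazy_db, 1))

eager_db = FakeDb()
eager_skus = all_skus(load_orders(eager_db, 1, fetch_line_items=True))
```

With three orders, the lazy path makes four round trips and the eager path makes two, for exactly the same list of SKUs.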

Another solution may involve how your object model is laid out.  Maybe you have two entities joining together when, in reality, one is actually a component of the other.  Maybe you’re using a lot of joined entities to get something you could get from a simpler object that just represents the data that you need for that operation (typically, these are DTOs).  Maybe you’re querying an aggregate root when you don’t need anything from it and can query the “child” objects directly.
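As a sketch of the DTO idea, suppose all the page needs to know is which items a customer has already ordered. Instead of loading the customer, then the orders, then the line items, one query can fill a flat object with just those fields. The OrderedItemDto class, the Sku column, and the sample data below are assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderedItemDto:
    """Carries only what the page needs; it is not a domain entity."""
    order_id: int
    sku: str


def ordered_items(conn, customer_id):
    # Query the child rows directly; the Customers table is never touched.
    rows = conn.execute(
        "SELECT O.OrderId, OL.Sku FROM Orders O "
        "JOIN OrderLineItems OL ON OL.OrderId = O.OrderId "
        "WHERE O.CustomerId = ? ORDER BY O.OrderId, OL.Sku",
        (customer_id,),
    ).fetchall()
    return [OrderedItemDto(order_id, sku) for order_id, sku in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    CREATE TABLE OrderLineItems (LineItemId INTEGER PRIMARY KEY, OrderId INTEGER, Sku TEXT);
    INSERT INTO Orders VALUES (3, 1), (7, 1), (13, 2);
    INSERT INTO OrderLineItems VALUES (1, 3, 'A'), (2, 7, 'B'), (3, 13, 'C');
""")

items = ordered_items(conn, 1)
already_ordered_b = any(item.sku == "B" for item in items)
already_ordered_c = any(item.sku == "C" for item in items)
```

There is nothing to lazy load on a DTO, so there is nothing for a Select N+1 to sneak in through.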

In any case, using an ORM does not free an application team from having to worry about the SQL.  Even if your mappings are great and your object model is awesome, you can still run into Select N+1 without even knowing it.  Check the SQL your ORM is generating and decide if it needs tweaking.

Dec 12 2011

There are a lot of popular misconceptions about how NHibernate Sessions work, especially things like the Save() and Update() methods, and what the ramifications are for Transactions.

Imagine the NHibernate Session like a container. When you load an object with NHibernate, it gets put into this container. Any change you make to that object anywhere, NHibernate knows about it. Those changes also happen in the container. They aren’t “disconnected.”

Let me say this a different way: If you use an NHibernate Session to Get() or Load() or whatever an object, NHibernate is now aware of that object and all of its changes until the Session is closed.

So, you might end up with two or three different objects or collections of objects in this container.

Most people use an NHibernate Transaction for everything, which is a good practice, considering that NHibernate uses an implicit Transaction for everything whether you specify one or not.

Here’s the kicker.

When your Transaction commits, NHibernate will take your objects in the container and synch up the database with the state of those objects, whether you have explicitly told it to save those objects or not.

This can seriously bite you in the butt if you’re not aware of it.

Trying to save some overhead by not loading an object’s child collections? It goes into the Session without collections, so guess what? When the Transaction commits, NHibernate will delete the child records, whether you have explicitly told it to or not.

Did you change some data in your object for display purposes? NHibernate will save that change back to the database when the transaction commits.

Now, the overwhelming majority of the time, NHibernate tracking our changes automatically is very nice and saves us from having to atomically manage every add, update, and delete on every single object. However, it is important to note that these changes will be committed to the database whether or not you have explicitly told NHibernate to do so.
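If the container idea is hard to picture, here is a toy version in Python. To be clear, this is an analogy I made up, not NHibernate’s API or its real implementation: ToySession, its get and commit methods, and the dictionary standing in for the database are all hypothetical. It keeps a snapshot of each object as loaded and, at commit, writes back anything that no longer matches its snapshot.

```python
import copy


class ToySession:
    """Toy unit of work: an analogy for the Session, not NHibernate itself."""

    def __init__(self, database):
        self._database = database   # dict: id -> dict of column values
        self._tracked = {}          # id -> (live object, snapshot taken at load)

    def get(self, entity_id):
        if entity_id not in self._tracked:
            entity = copy.deepcopy(self._database[entity_id])
            self._tracked[entity_id] = (entity, copy.deepcopy(entity))
        return self._tracked[entity_id][0]

    def commit(self):
        """Write back every tracked object that differs from its snapshot."""
        written = []
        for entity_id, (entity, snapshot) in list(self._tracked.items()):
            if entity != snapshot:
                self._database[entity_id] = copy.deepcopy(entity)
                self._tracked[entity_id] = (entity, copy.deepcopy(entity))
                written.append(entity_id)
        return written


database = {1: {"name": "Ada", "display_name": "Ada"}}
session = ToySession(database)

customer = session.get(1)
customer["display_name"] = "ADA (VIP)"   # changed "just for display"

written = session.commit()               # note: no save or update call anywhere
```

Nobody asked the session to save anything, yet after the commit the database holds the display-only change. That is the behavior to keep in mind.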

Thankfully, for those times you’d like to manipulate an object and not have those changes synched with the database, NHibernate offers a very simple way to detach an object from the Session.

Session.Evict(yourObjectHere);

And then you can do whatever you want, but keep in mind NHibernate is now effectively “blind” to this object.

If for some reason you need to attach a disconnected object to the Session, you can do that by:

  • Session.Update(yourObjectHere); attaches a changed object
  • Session.Lock(yourObjectHere, LockMode.None); attaches an unchanged object
  • Session.Merge(yourObjectHere); copies the detached object’s state onto the instance with the same identifier in the Session, loading that instance from the database first if it isn’t already there. Merge() returns the attached instance; the object you passed in stays detached.
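Detaching and re-attaching can be pictured with the same kind of toy container. Again, this is a hypothetical Python analogy and not NHibernate code: the evict and update methods below only mimic the idea of the session forgetting an object and being told about it again, and this toy writes back everything it tracks rather than doing real dirty checking.

```python
import copy


class ToySession:
    """Toy container with detach and re-attach; an analogy, not NHibernate."""

    def __init__(self, database):
        self._database = database   # dict: id -> dict of column values
        self._tracked = {}          # id -> live object the session is watching

    def get(self, entity_id):
        if entity_id not in self._tracked:
            self._tracked[entity_id] = copy.deepcopy(self._database[entity_id])
        return self._tracked[entity_id]

    def evict(self, entity_id):
        # The session forgets the object; later changes are invisible to it.
        self._tracked.pop(entity_id, None)

    def update(self, entity_id, entity):
        # Re-attach a detached (possibly changed) object.
        self._tracked[entity_id] = entity

    def commit(self):
        for entity_id, entity in self._tracked.items():
            self._database[entity_id] = copy.deepcopy(entity)


database = {1: {"name": "Ada"}}
session = ToySession(database)

customer = session.get(1)
session.evict(1)
customer["name"] = "Scratch value"
session.commit()
name_after_evict = database[1]["name"]     # the session never saw the change

session.update(1, customer)
session.commit()
name_after_update = database[1]["name"]    # now the change is written
```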

It’s important to note that none of this tracking applies once objects are sent over the wire somewhere else, such as via a web service. NHibernate has no way to know what you’re doing to those objects outside that system boundary, so for all practical purposes, your web service acts like a blanket Evict() statement for the client.

Sep 21 2011

The problem isn’t their work ethic.  Your developers work hard.  They often work overtime that you may not even know about.  They kill themselves to meet their deadlines.  They aren’t playing Quake when you aren’t looking.

The problem isn’t their intelligence.  Your developers are extremely smart people.  They have put themselves in an industry that is constantly changing, and the rate of that change will only increase.  They have to deal with abstractions and concepts that rival math, linguistic analysis, and other varieties of pattern-spotting and computational jobs.

The problem isn’t their industry knowledge.  Your developers are reading magazine articles, blog posts, and books, going to conferences and Code Camps, and working on side projects at home to sharpen their skills and stay on top of things.

The problem with your developers is that they know there are better ways to do what they do, and they’d use them if someone would show them how to implement them successfully in real projects instead of just handing them theoretical knowledge and examples.  They don’t have that someone.

It will take them literally years to compensate for that – to claw their way up into a higher level of development skill and productivity – all for the want of someone just to show them how these fancy concepts like TDD, Interface-Based Programming, Dependency Injection, etc. actually work together to get a real project done.

That’s their problem.

Feb 24 2011


On February 17, 2011, Ayende said this stuff.

Oh, look everyone – a lesson on simplicity from the man who invented HBM files!

Leaving aside the specifics of the recommendation and the perhaps unwarranted comparison to CQRS, every developer needs to put aside their ego for a moment and think about the values that Ayende is trying to preserve: simplicity of understanding code and ease of modification.  I hope that any developer who isn’t deliberately trying to show off or maintain job security by way of obtuseness agrees that these are good values to protect.  So, we’re on the same page in terms of what’s important in this discussion.

What is the general way Ayende is recommending to protect these values?

Don’t have abstractions for the sake of having abstractions.

Here is where there may be a fork in the road, and I have to side with Ayende on this.  We love our DDDs, SRPs, SOLIDs, and a whole lotta other acronyms.  I see too many developers start every solution with an ORM, an IoC container, an MVC framework, a services layer, a domain model layer, and all of this spread out over fifty different assemblies without even stopping to think what value all of this brings to their specific project.  It’s just what you do.

The problem is that nothing is the right way to do every project.  Even something as common as using a relational database for persistence isn’t right for every project.  There’s a reason this guy notes patterns like Active Record and Transaction Script.  The fact is, a thoroughgoing multilayered approach to every application is not appropriate, even if it might be the right choice for a good number of them.

The project I’m working on now is distributed across multiple servers and a WCF layer (on its own machine) bridges between the database server and the web server (on their own machines).  Does my solution have several layers in it?  Yup.  I have assemblies that will be on different machines and a requirement to make all calls through WCF.

Furthermore, the infrastructure of my project is volatile.  The RDBMS hasn’t even been decided on, and we started development a month ago.  This application also needs to communicate with a mainframe and possibly one other proprietary data store.  The jury is still out on whether or not we have to make all queries via stored procs.  The ability to swap out implementations of the repositories, ORM, etc. has real value to our project.
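For what that swappability looks like in miniature, here is a sketch in Python rather than our actual .NET code. Every name in it is hypothetical: the CustomerRepository interface, both implementations, the GetCustomerName procedure, and the call_proc callable that stands in for whatever would really execute a stored proc.

```python
from typing import Optional, Protocol


class CustomerRepository(Protocol):
    """The only thing calling code is allowed to depend on."""

    def find_name(self, customer_id: int) -> Optional[str]: ...


class InMemoryCustomerRepository:
    def __init__(self, names):
        self._names = dict(names)

    def find_name(self, customer_id):
        return self._names.get(customer_id)


class StoredProcCustomerRepository:
    """Delegates to an injected callable that pretends to run a stored proc."""

    def __init__(self, call_proc):
        self._call_proc = call_proc

    def find_name(self, customer_id):
        rows = self._call_proc("GetCustomerName", customer_id)
        return rows[0] if rows else None


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    name = repo.find_name(customer_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


def fake_call_proc(proc_name, customer_id):
    # Stand-in for a real database call, so the sketch runs anywhere.
    return ["Ada"] if customer_id == 1 else []


in_memory = InMemoryCustomerRepository({1: "Ada"})
stored_proc = StoredProcCustomerRepository(fake_call_proc)

greetings = [greeting(in_memory, 1), greeting(stored_proc, 1)]
```

The greeting function never learns which implementation it was handed, which is the whole point when the data store underneath has not been chosen yet.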

Do all projects need this level of modularity?  Not at all.  And the less modularity I think I need, the fewer abstractions I will have.

Then we get to the actual, specific recommendation: Ayende wants to see the UI make direct calls to NHibernate.

The irony of this is, if you wanted to use the simplest data access route possible, especially if you go a CQRS route, then NHibernate is the kind of thing you don’t necessarily want and certainly don’t need.  DataTables work just fine.  Stored procs work just fine.

The reason we love NHibernate (and I do) is that it solves a specific problem: disparity between a domain model and the structure of a data store.  If you don’t need a domain model, or there is no disparity between your model and data store, why on earth would you add an extra layer of mapping in there, even for write operations (note: Ayende’s example is a read operation)?

The thing is, a domain model does more than provide a means of shuffling data around – it models a business and business processes.  Domain objects have dependencies on each other because distinct business elements have dependencies on each other in reality.  If all your app does is act as a front end to a database, and your infrastructure is nice and stable, then by all means, lose the domain model.  Use stored procs and DataTables and DataReaders and go nuts with your poorly-abstracted self, because abstractions will buy you very little.

But if your application is actually automating business processes, then maybe the simplest route to data access shouldn’t be your primary consideration.  Maybe it’s worth modeling business truth in your app and not a convoluted SQL query or HBM file.  Maybe the ability to test that you’ve accurately captured your business’ logic and not just whether you’ve got the right fields on a page will buy you something.