T-SQL Tuesday 152: An ORM Will Not Save You From Yourself
T-SQL Tuesday is a monthly blog party hosted by a different community member each month, and this month Deborah Melkin (blog | twitter) asks us for a database-related rant.
Who are we kidding? I like to rant. Here we go.
Sell Me on an ORM
Some people really like object-relational mapping frameworks (ORMs) such as Entity Framework, Dapper, or NHibernate. They’re sold as a boon to developer productivity, you can abstract the database from your application code, your application might be RDBMS-agnostic, and lots of things are just “handled” automatically. As a developer, you’re freed from having to deal with the DBA, you don’t have to deal with messy, not-like-any-other-language-you’ve-used-before, hard-to-grok SQL, and the only data types you’re dealing with are the ones you already know and love from C#, Java, or whatever language you’re working in at the application tier.
Too many (read: more than zero) developers believe that because of ORMs, they don’t need to understand anything about what’s happening beyond the ORM-generated classes they’re coding against.
At this point, the ORM is getting in your way about as much as it’s helping.
But the ORM Handles Things for Me!
RDBMS Agnostic? Yes, but No.
Let’s say you have a developer, maybe a whole team of developers, who prefers working on Linux. Thanks to things like .NET Core, they can build apps that will happily run on Windows Server. But those same developers also don’t have an instance of SQL Server accessible to them, and SQL Server is what’s used in production.
So, the developer opts to use SQLite for their local development. SQLite is a nice database engine, and very useful in the right places. And it sure beats having to stand up a whole database server on your desktop. The ORM works great with it. Code code code, test test test, a couple weeks go by and a build is offered up to be deployed for users to test.
Then…the database migration fails when it’s pointed at SQL Server. Cue “works on my computer.”
The problem? Somehow (I don’t think I ever saw the details of “how”), part of the application was coded in such a way that the ORM allowed one of the fields in the composite primary key to be
NULL. SQL Server doesn’t like this. The SQL standard doesn’t like this. But SQLite allows it. From SQLite’s documentation:
According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an
INTEGER PRIMARY KEYor the table is a
WITHOUT ROWIDtable or a
STRICTtable or the column is declared
NOT NULL, SQLite allows
NULLvalues in a
PRIMARY KEYcolumn. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most
Either the ORM didn’t allow the developer to create a
STRICT table, or the developer wasn’t aware that it was a thing. Either way, the ORM didn’t handle this situation. And multiple people lost most of an afternoon trying to figure out why the migration broke.
Naming Things is Hard
Once past the primary key issue, we had the database migration working. Terrific! Then, I noticed something odd about the index names. First off - good on the developer for identifying that indexes are needed (whether they’re good indexes remains to be seen). But the index names were long. How long? As many as 128 characters! Which happens to be the largest an identifier can be in SQL Server.
How’d we get such long index names? Because by default, the ORM creates the index name by concatenating the name of each index key field, separated by underscores. Long column names, many columns, and we hit the limit. It’s just so easy to accept the defaults because the ORM is taking care of everything for me, right? Who cares how long the name is, anyway?
As the DBA, I care. I have to look at these things in execution plans. I have to deal with them when I’m doing index analysis. It’s going to blow out column widths in Excel, Plan Explorer, the output of tools like
sp_blitzindex, and more. These names are just…🤮.
The thing is, the ORM probably had a facility for the developer to give the index a shorter name. Entity Framework does, for example. It just takes a quick look at the documentation, pausing, and asking oneself “could I make this easier on people down the line?”.
The Easy Path (Probably) Doesn’t Scale
I’m sticking with the ORM thing here, and the same application as above. At this point the database is deployed, there are classes in the application that correspond to the tables in the database, and the developers can get to work building their data analysis engine. They grab some data from object (table) A, some more from object (table) B, then bring everything together in the application, merge the two data sets, do some aggregation, a few loops (maybe even nested) and give the user their results.
It works great on the developer’s workstation with 1000 records in each table. Job’s done, we can go to Happy Hour and celebrate.
Then the user gets their hands on it. It’s not working anywhere near as well for them. They click the buttons, wait, wait, then wait some more. Finally, after 45 minutes, they get an error message. They’re running the analysis over tens of millions of records in each table. Once you start looping those, and nesting loops, and then doing math along the way, it’s game over.
Let’s pause for a second and ask ourselves a very important question: What are some of the things relational databases are really good at?
- Filtering datasets
- Combining datasets
- Performing aggregations and other basic (and not so basic) arithmetic on large sets of data
But it’s easier to do it in the application code using the objects the ORM gave us because we didn’t have to “bypass” the ORM (tell the ORM to run a query instead of creating a collection of basic object from a table), or figure out how to write the SQL query. The easy path doesn’t scale and you have to take a different approach. You can’t just say “well, it’s a lot of data, that’s just how it is” because the problem doesn’t lie in the tool, or the data volume, or the database you’re using.
The Root Cause
The developers here made assumptions. Assumptions about what the ORM would do for them, assumptions about how their data persistence layer worked, and assumptions that things couldn’t be made better. Sometimes, that was because they didn’t understand what was happening or what was possible by looking one more layer down the stack. Or maybe they just didn’t know that it was possible to circumvent the “let the ORM do everything for me” approach when called for.
But that’s not an excuse. You can’t just accept everything at face value. Sometimes, you have to get your hands a little dirtier to get the job done right.
Thanks to Deborah for hosting this month’s T-SQL Tuesday and letting me get this off my chest! I had a few things I could rant about, but decided to go with this one for a few reasons:
- It’s new (for this blog. I think.)
- I kind of ranted about the first thing I thought of last year
- I saw the top of Andy Yun’s (blog | twitter) entry for the month and as he correctly surmised, I thought it would be about data types. I didn’t want to double up on that. Now that I’ve gone read the whole thing after writing this, I realize we overlapped anyway! (seriously, go read Andy’s post too)