I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately (, , , ). Since I work on the Drizzle project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.
Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL.
Even though I realized this was a poor name for such a panel, I went with it anyways because this “debate” was really starting to heat up. The conclusion I was hoping for is that the two are not at odds because the two categories of projects can peacefully co-exist in the same toolbox for data management. Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team) who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.
One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due it’s popularity. Anyone who has ever read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories.
The real value these new projects are providing are in their implementation details, especially with dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them – there are some tricks to learn.
Square Peg, Round Hole
One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may have some truth, if you gloss over the high-level logical data representations, this is just wrong. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins.
I think there is still some value in supporting some form of a SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires.
I fully embrace the “NoSQL” projects out there, there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job, this may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to making a decision just remember: Dear everyone who is not Facebook: You are not Facebook.