• Thoughts

    Are open source licenses as important as before?

    MongoDB’s efforts to obtain approval from the Open Source Initiative for a more business-friendly license, the SSPL, have failed. The company has therefore chosen to proceed without that approval, and this could well be a turning point in the history of open source.

    What is happening right now is interesting. Never has open source been so ubiquitous in software, and yet it has never seemed as fluid as it does now. Faced with cloud giants like Amazon Web Services, which seem capable of crushing them outright, companies that manage open source projects, like MongoDB and Elasticsearch, have sought ways to defend themselves while encouraging customers to pay.

    The problem of open source licensing

    Judging by these companies’ financial results, the AWS threat has been somewhat overestimated. But it is understandable that MongoDB and others are looking for ways to protect their investments. Eliot Horowitz, MongoDB’s CTO, recently said that his company had spent more than $300 million to develop its database, which is then made available free of charge to everyone as open source. The fact that AWS or another cloud service provider can take this code without giving anything in return is a real problem.

    Hence the use of the SSPL license, which essentially says: “If you make MongoDB available as a service, you must contribute back the code of that service.” It may go a little far, but it is understandable why MongoDB chose this approach. It is also not difficult to understand why the company has just decided to stop seeking the blessing of the Open Source Initiative for the SSPL.

    MongoDB changes strategy

    The outcry against the SSPL by some members of the open source community was loud and sustained. Despite MongoDB’s reasonable efforts to amend the SSPL to address the objections, the company finally decided to throw in the towel, as Eliot Horowitz explained: “We continue to believe that the SSPL is consistent with the open source definition and the four essential software freedoms. However, given its reception by the entire community, the consensus necessary to support OSI approval does not seem to exist at this time. Therefore, we now remove the SSPL from the consideration of the OSI Board of Directors.”

    Horowitz went on to detail what he intends to do to refine the license and work with other industry players to try to find a way to defend against the looming threat of the cloud. In the meantime, MongoDB will continue to offer its community edition under the SSPL as if it were open source, by allowing users to “examine, modify and distribute the software or redistribute modifications made to the software following the license.” It is not open source as such, but it gives most users freedoms similar to those provided by open source. And that’s when it gets interesting.

  • Thoughts

    Thoughts on “NoSQL”

    I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately ([1], [2], [3], [4]). Since I work on the Drizzle project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.

    Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL.

    Even though I realized this was a poor name for such a panel, I went with it anyways because this “debate” was really starting to heat up. The conclusion I was hoping for is that the two are not at odds because the two categories of projects can peacefully co-exist in the same toolbox for data management. Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team) who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.

    One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due to its popularity. Anyone who has ever read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories.

    The real value these new projects provide is in their implementation details, especially dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them; there are some tricks to learn.
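
    To make “tunable quorum” a bit more concrete, here is a minimal sketch, not taken from any particular project. It assumes the common N/W/R formulation: each key lives on N replicas, a write waits for W acknowledgements, and a read waits for R answers. The class and function names are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical per-request settings for a replicated key."""
    n: int  # total replicas holding each key
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def __post_init__(self):
        if not (1 <= self.w <= self.n and 1 <= self.r <= self.n):
            raise ValueError("w and r must each be between 1 and n")

    def reads_see_latest_write(self) -> bool:
        # The read set and the write set are guaranteed to share at
        # least one replica only when R + W > N.
        return self.r + self.w > self.n

    def tolerated_write_failures(self) -> int:
        # A write still succeeds while at least W replicas are reachable.
        return self.n - self.w


def describe(cfg: QuorumConfig) -> str:
    """Label a configuration by the guarantee it can offer."""
    if cfg.reads_see_latest_write():
        return "overlapping quorums"
    return "eventual consistency"


if __name__ == "__main__":
    for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1)):
        print(cfg, "->", describe(cfg))
```

    The “tunable” part is that W and R can be picked per workload: a larger W plus R buys fresher reads at the cost of latency and availability, while small values lean toward eventual consistency.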

    Square Peg, Round Hole

    One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may hold some truth if you look only at the high-level logical data representations, it is just wrong. Sure, one of these projects may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins.

    I think there is still some value in supporting some form of a SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires.

    I fully embrace the “NoSQL” projects out there; there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job, which may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to make a decision just remember: Dear everyone who is not Facebook: You are not Facebook.