Since I announced SlackDB a few weeks ago, I’ve had a number of questions and interesting conversations in response. I thought I would summarize the initial feedback and answer some questions to help clarify things. One of the biggest questions was “Isn’t this what Drizzle is doing?”, and the answer is no. They are both being designed for “the cloud” and speak the MySQL protocol, but they provide very different guarantees around consistency and high-availability. The simple answer is that SlackDB will provide true multi-master configurations through a deterministic and idempotent replication model (conflicts will be resolved via timestamps), where Drizzle still maintains transactions and ACID properties, which imply single master. Drizzle could add support for clustered configurations and distributed transactions (like the NDB storage engine), but writes would still happen on the majority (maintain quorum) since the concept of global state needs to be maintained.
This led Mark Callaghan to ask why not just modify Drizzle to support these behaviors? He has a good point since most of the properties I’m talking about exist at the storage engine level. There are still a number of changes that would need to happen in the kernel around catalog, database, and table creation to support the replication model. SlackDB also won’t need a number of constructs provided by the Drizzle kernel (various locks, transaction support) so query processing can be lighter-weight. So while it’s probably possible with enough patches and plugins to make this work in Drizzle, I believe it will be easier (both socially and technically) to do this from scratch. With either approach there is still a fair amount of code to be written, and I’ve decided to use Erlang since it allows programmers to express ideas concisely and more quickly with an acceptable trade-off in runtime efficiency. This would make it even more difficult to integrate with Drizzle.
A couple folks asked why I chose the BSD license instead of GPL or Apache. I didn’t want a copyleft license, so GPL was out, but after chatting some more I decided to switch SlackDB to the Apache 2.0 license for the patent protection clause. As much as I dislike patents and would prefer not to acknowledge them, I figured having the protection clauses in there would make it less likely that anyone using the software would have to deal with them once there are other contributors who may hold patents.
I presented the techniques I’m using behind SlackDB in a session at OpenSQL Camp Boston last weekend, and overall they were well received. There was a lot of great feedback and suggestions about other projects and libraries doing related things that may help speed things along. I was glad to see I wasn’t the only person thinking about these properties for relational databases, as Josh Berkus of PostgreSQL fame also led a session on ordering events and conflict resolution within relational data when you loosen up consistency.
I also attended Surge in Baltimore and listened to a talk by Justin Sheehy about “Embracing Concurrency At Scale.” You can see another recording of the same talk here. Justin explained the concepts and problems with systems trying to maintain any kind of globally consistent state quite well, and I agree with almost everything in his presentation. This recent blog post by Coda Hale also explains some of the other key principles around what you must give up in order to get the level of availability required by most systems these days. These help explain the reasons why I started SlackDB – I’m trying to combine these properties with a relational data model. Right now I’m still only able to put my limited spare time into it, but I’m hoping to find a way to put more time into the project. Hopefully you will agree we need a database like this and will help out too. 🙂