Back in January when I was between jobs I had a free weekend to do some fun hacking. I decided to start a new open source project that had been brewing in the back of my head and since then have been poking at it on the weekends and an occasional late night. I decided to call it Scale Stack because it aims to provide a scalable network service stack. This may sound a bit generic and boring, but let me show a graph of a database proxy module I slapped together in the past couple days:
Database Proxy Graph
I setup MySQL 5.5.2-m2 and ran the sysbench read-only tests against it with 1-8192 threads. I then started up the database proxy module built on Scale Stack so sysbench would route through that, and you can see the concurrency improved quite a bit at higher thread counts. The database module doesn’t do much, it simply does connection concentration, mapping M to N connections, where N is a fixed parameter given at startup.
In this case I always mapped all incoming sysbench connections down to 128 connections between Scale Stack and MySQL. It also uses a fixed number of threads and is entirely non-blocking. As you can see the max throughput around 64 threads is a bit lower, but I’ve not done much to optimize this yet (there should be some easy improvements where I simply stuck in a mutex instead of doing a lockless queue). It’s only a simple proof-of-concept module to see how well this would work, but it’s a start to a potentially useful module built on the other Scale Stack components. One other thing to mention is that these tests were run on a single 16-core Intel machine. I’d really like to test this with multiple machines at some point.
So, what is Scale Stack?
Check out the website for a simple overview of what it is. The goal is to pick up where the operating system kernel leaves off with the network stack. It is written in C++ and is extremely modular with only the module loader, option parsing, and basic log in the kernel library. It uses Monty Taylor’s pandora-build autoconf files to provide a sane modular build system, along with some modifications I made so dependency tracking is done between modules. You can actually use it to write modules that would do anything, I’m just most interested in network service based modules.
The kernel/module loader is also just a library, so you can actually embed this into existing applications as well. Some of the modules I’ve written for it are a threaded event handling module based on libevent/pthreads and a TCP socket module. There is also an echo server and simple proxy module I created while testing the event and socket modules. The database proxy module builds on top of the event and socket module. The code is under the BSD license and is up on Launchpad, so feel free to check it out and contribute. If you need a base to build high-performance network services on, you should definitely take a look and talk with me.
What’s up next?
I have a long list of things I would like to do with this, but first up are still some basics. This includes other socket type modules like TLS/SSL, UDP, and Unix sockets. Then are some more protocol modules such as Drizzle, a real MySQL protocol module, and others like HTTP, Gearman, and memcached. It’s fairly trivial to write these since the socket modules handle all buffering and provide a simple API. As for the DatabaseProxy module, I’d like to rework how things are now so it’s not MySQL protocol specific, integrate other protocol modules, improve performance, add in multi-tenancy support for quality-of-service queuing based on account rules, and a laundry list of other features I won’t bore you with right now.
I also have plans for other services besides a database proxy, especially one that could combine a number of protocols into a generic URI server with pluggable handlers so you can do some interesting translations between modules (like Apache httpd but not http-centric). For example, think of the crazy things you can do with Twisted for Python, but now with a fast, threaded C++ kernel. I also still need to experiment with live reloading of modules, but I’m not sure if this will be worthwhile yet.
If any of this sounds interesting, get in touch, I’d love to have some help! I’ll have some blog posts later on how to get started writing modules, but for now just take a look at the existing modules. The EchoServer is a good place to start since it is pretty simple. Also, if you’ll be at the MySQL Conference and Expo next week, I’d be happy to talk more about it then.
Since I announced SlackDB a few weeks ago, I’ve had a number of questions and interesting conversations in response. I thought I would summarize the initial feedback and answer some questions to help clarify things. One of the biggest questions was “Isn’t this what Drizzle is doing?”, and the answer is no. They are both being designed for “the cloud” and speak the MySQL protocol, but they provide very different guarantees around consistency and high-availability. The simple answer is that SlackDB will provide true multi-master configurations through a deterministic and idempotent replication model (conflicts will be resolved via timestamps), where Drizzle still maintains transactions and ACID properties, which imply single master. Drizzle could add support for clustered configurations and distributed transactions (like the NDB storage engine), but writes would still happen on the majority (maintain quorum) since the concept of global state needs to be maintained.
This led Mark Callaghan to ask why not just modify Drizzle to support these behaviors? He has a good point since most of the properties I’m talking about exist at the storage engine level. There are still a number of changes that would need to happen in the kernel around catalog, database, and table creation to support the replication model. SlackDB also won’t need a number of constructs provided by the Drizzle kernel (various locks, transaction support) so query processing can be lighter-weight. So while it’s probably possible with enough patches and plugins to make this work in Drizzle, I believe it will be easier (both socially and technically) to do this from scratch. With either approach there is still a fair amount of code to be written, and I’ve decided to use Erlang since it allows programmers to express ideas concisely and more quickly with an acceptable trade-off in runtime efficiency. This would make it even more difficult to integrate with Drizzle.
A couple folks asked why I chose the BSD license instead of GPL or Apache. I didn’t want a copyleft license, so GPL was out, but after chatting some more I decided to switch SlackDB to the Apache 2.0 license for the patent protection clause. As much as I dislike patents and would prefer not to acknowledge them, I figured having the protection clauses in there would make it less likely that anyone using the software would have to deal with them once there are other contributors who may hold patents.
I presented the techniques I’m using behind SlackDB in a session at OpenSQL Camp Boston last weekend, and overall they were well received. There was a lot of great feedback and suggestions about other projects and libraries doing related things that may help speed things along. I was glad to see I wasn’t the only person thinking about these properties for relational databases, as Josh Berkus of PostgreSQL fame also led a session on ordering events and conflict resolution within relational data when you loosen up consistency.
I also attended Surge in Baltimore and listened to a talk by Justin Sheehy about “Embracing Concurrency At Scale.” You can see another recording of the same talk here. Justin explained the concepts and problems with systems trying to maintain any kind of globally consistent state quite well, and I agree with almost everything in his presentation. This recent blog post by Coda Hale also explains some of the other key principles around what you must give up in order to get the level of availability required by most systems these days. These help explain the reasons why I started SlackDB – I’m trying to combine these properties with a relational data model. Right now I’m still only able to put my limited spare time into it, but I’m hoping to find a way to put more time into the project. Hopefully you will agree we need a database like this and will help out too. 🙂
Walmart and Microsoft announced a five-year technology partnership to fight against Amazon’s influence.
At Microsoft’s Inspire conference in Las Vegas July 14-18, Walmart revealed that the two companies have sealed a strategic partnership. Based on the Cloud computing services offered by Microsoft, the agreement would allow its partner to benefit from the Azure application platform and Microsoft 365. Companies will also work on the development of new projects focused on artificial intelligence and service improvement using machine learning technology. For example, artificial intelligence solutions could help Walmart personalize its marketing campaigns according to Internet users.
Walmart is Amazon’s biggest retail competitor, while Microsoft is Amazon’s biggest rival when it comes to cloud services.
It was therefore logical that the two companies finally joined forces to try to limit the power of their common enemy.
Nevertheless, it is important to remember that the partnership comes at a time when Microsoft has expressed interest in designing a technology rivalling the one operated by physical Amazon Go stores. Announced in January 2018, Amazon’s first store successfully replaced staff with ambitious technologies based on artificial intelligence. Similarly, customers no longer have to wait in line because the cash registers no longer exist: now you pay with your Amazon account and mobile phone.
Microsoft, for its part, would be looking at similar technology that could operate cameras attached to shopping carts in stores. To date, the firm has already hired a computer vision specialist from Amazon. Moreover, Bill Gates’ company would be in talks with Walmart, which would certainly find its interest in this project since it would enable it to catch up with Bezos’ company.
However, this assumption was not made when the partnership was officially announced.