As some of you may already know, we are on the way of transforming our architecture to SOA (not the TV show). We already have a few not-that-micro services, but they still live in the same repository as sbt subprojects in order to make the transition smoother: direct function calls between services could be left as is, so we didn't have to stop doing any other work on the code while rewriting everything to HTTP API calls.
We also separating the database to a per-service level which is an interesting and challenging problem. Luckily for now, our current architecture setup makes it easier to move tables as we kept the already separated apps next to each other with mysql slave on every app server, mostly for saving network traffic bypassing https with unix sockets. I won't go into details on pros and cons on per-appserver mysql slave setup, I leave the honor to our honoured ops.
The way I first approached the problem is a more versatile solution, I'll describe it later. Péter Boros from Percona came up with a simpler idea which works on the current setup: you can just use
RENAME TABLE to move tables from one database to another atomically if the two databases are in the same MySQL server. This was moving two tables with ~3 million records in each tables:
root@db-dev (localhost) [(none)]> RENAME TABLE kinja.blog TO sso.blog, kinja.index_bloghost TO sso.index_bloghost; Query OK, 0 rows affected (0.01 sec)
It took 10 milliseconds and it moved the two tables in one transaction, so there was no problem with queries joining this two tables.
It's nice, but how do you do it on a running application without a failed request or any planned or unplanned downtime?
After the table moved, your queries to the original database will raise an exception with the message
"Table '__tablename__' doesn't exist", so you can be sure that the table was moved instantly.
Back in the time we did select queries like this (sometimes we need to query the master database to bypass replication lag:
Or inserts like this:
What I didn't like in this approach is that it violates the DRY principle, and what is equally important is that I'm lazy enough to dislike it for the first sight: you can't even use code completion in Idea for this, or at least not in one round.
So I came up with two convenience functions, one for reading and one for writing the database, and put it in the base trait that was mixed in to every repository class:
This way the code could look like this:
We can write code faster, and it really comes handy when you need to change the way you execute queries: instead of changing every function, you can just update this two convenience functions without touching anything else and do what you need to. And when you do something temporary, like in our case, you can override them in the actual classes and support database migration for example.
I created a helper trait which accepted two connection pool classes (which holds 3 DataSources: reading slave, writing slave and reading master). It tries to execute a query on the original connection pool and when it gets an exception about the table being nonexistent, it falls back to the other connection pool and execute the query there.
I just mixeded the trait into the actual classes and overrided
executeWriteOnProperConnectionPool, and did the same with read queries, and that's it. Deploying the code live didn't change anything, and after banyek moved the tables to their new place everything worked just as fine as before. Of course, there was some performance degradation due to an exception being raised with every query to the database, so mean response time went up by ~2ms. As a quick fix the two connection pools in the helper trait got switched, so everything went back to normal with a deploy containing a small changeset.
I mentioned earlier a more versatile method. Basically you do the same with having convenience functions and a helper trait, but in the trait you have unlimited possibilities. For example if you have to move the table to a database in another server you can first start to execute write (insert/update/delete) queries on both connections.
By the time you already have a migration script that you can start at this point. It goes through the table and tries to insert every record it found in the original database to the new one with
INSERT IGNORE, so records inserted by the dual-write helper won't be replaced (because it can be more actual data than that you selected half second ago in the actual chunk you're processing). Also you have to select data from the old database with
SELECT [...] LOCK IN SHARED MODE, in a transaction, then insert the rows to the new database, and finally commit the transaction, so any updates to the rows you're processing will wait until they are present in the new database. This way you can be sure that the row(s) will be updated in both databases. It's an edge case, but you can actually lose data, so it has to be dealt with.