Here at Kinja Tech, we've made a new year's resolution to repartition our system into smaller, loosely-coupled services. These services may coexist in the same process, but they should also be independently deployable. We're using the Typesafe stack, so it seems like a no-brainer to use akka for communication — an in-process component can be made remote with a simple configuration change. There's a catch: messages must have a "wire" format. No problem. Messages can be case classes and Scala's case classes are java.io.Serializable by default. But there is a problem, and it isn't that "Java serialization is slow." Java serialization is fragile. Here's a simple example:

Our "update" service has been modified to accept a configurable number of replicas for write. To allow this, a parameter was added to the message. This isn't a problem if the actors are compiled together and running in the same process. If they aren't, and we are relying on Java's serialization for wire transport, there's a big problem — their serializations are completely incompatible. All components that send this message need to be recompiled and rolled-out with the update service — very carefully.

Of course, we might agree not to modify a class once it has been used as a message. A class, "DurablyFlushUpdatesWithReplicaCount", could be a solution. It may even be the correct solution in some cases, but it is easy to imagine an explosion of classes as more complex messages evolve.

There aren't any earth-shattering revelations here. This is a well-known problem with some good solutions. We already use one of them, and it is used internally by Akka as well: Protocol Buffers.

You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

Akka supports protocol buffer messages out of the box. So we'll just use those directly? Possible? Yes. Ideal? No. For every simple message, like DurablyFlushUpdates, we'd need to create a protocol buffer definition and have it compiled to a Java class. Because our language of choice is Scala, we really want our endpoints to handle messages built from Options, Scala collections, etc. and not generated Java "bean" methods.

Akka Remoting: Creating a Custom SerializerS

Let's use a protocol buffer as an "envelope" that contains a sequence of name-value pairs. Values can be fundamental types like String or Int, but they may also be protocol buffers, nested messages, or sequences.

All messages implement a trait, KinjaMessage. A custom akka serializer is configured to handle them.

It is important that the trait extend java.io.Serializable. Otherwise, the serialization for case classes is ambiguous and akka will choose Java serialization. With the help of a few base classes, we'll define the evolution of our "flush" message like this:

The "materializer" object is a convention. When receiving a message, the akka framework supplies the Class to which the message belongs. An instance of it has to be created and "materialized" from binary somehow. This is fairly simple to do when dealing with a Java class: a known static method can be acquired using reflection. In fact, this is exactly how akka's protocol buffer serializer works. It isn't so simple with Scala because there are no "class" methods. An object definition translates to a lazily-created singleton instance of a class. Our custom serializer will use reflection to obtain the instance of a special "materializer" object that implements the trait, KinjaMessageMaterializer. Once obtained, a reference to this instance may be cached to avoid using reflection for every message.

Will it work? Let's imagine that we will deploy to a bunch of servers — both the update service and a few clients that use it. Will there be any serialization problems during the transition? If a newly deployed instance of the update service receives an "old" message, it will use the new default for replicaCount. If an instance of the previous version receives a "new" message, the replicaCount will be silently ignored and won't raise an error. Of course, it is up to the developer to make sure serialization evolution is backward compatible.

Here's a slightly more complicated message that contains nested protocol buffers.

Admittedly, this system introduces some extra work when creating and maintaining messages. Unlike Java serialization, it isn't magic. (Just add a "marker" interface!). Protocol buffers don't require any code to be hand-written, but they do require a definition to be maintained and compiled. I think this serialization scheme is a decent compromise; it sacrifices some performance and a little developer time for a lot of flexibility and compatibility.