Some of us have recently attended Scala.io, a 2-day, 3-track conference in Paris, and I'd like to leave a few notes about it. The organizers did a really good job, everything went smoothly, so it was not hard to focus on the discussed topics. 2+ track events usually have the common drawback that it's hard to choose between talks, but this was not the case now. One of the rooms held exclusively French talks, which was surprising at an international developer conference, but the other two tracks gave us plenty of stuff to think about.

If I'd have to choose the single most dominant topic of the talks we've heard, I'd say it's the communication of distributed services. As Akka matures and implements more and more use cases which are often desired in distributed systems (location transparency, persistence, clustering, cluster sharding, etc.) it's starting to get recognized as the default solution for setting up distributed services in the Scala ecosystem.

Moving away from a monolithic architecture to a microservice-based one often requires tremendous effort. As method calls are replaced with API calls, Akka actor messages, or basically anything which can fail outside the running application, and has higher latency, the business logic often needs to be redesigned to keep the number of outside calls as small of possible, and a more convoluted error handling is required.

Advertisement

Depending on our needs, our interservice communication protocols need planning as well. A simple message dispatch is considered an at-most-once delivery, as the message may be lost in transit or if the target node is unable to recieve our data. Introducing the concept of message acknowledgements solves this problem, but as ACKs may be lost as well, the new problem that needs solving is message duplication.

We may of course log recieved messages and discard them if they were acted on already, but as the application may fail between bookkeeping and reacting, it appears we have two ways to go. If we take care of logging first, then what we get is called almost-exactly-but-at-most-once delivery, as there's a chance that the application fails before executing the desired function, but it won't try again once the command arrives again, because that's logged as taken care of. If we act first, the problem is reversed, and we'll get almost-exactly-but-at-least-once delivery, which is often still not what we want.

Effectively-exactly once delivery may be achieved if our business logic allows it: if it is possible to make our commands idempotent (that is: acting on them once or multiple times always yields the same result) then dispatching them with an at-least-once delivery protocol ensures that we'll get the expected results, but it might be worth to log recieved messages after we've acted on them as described in the previous paragraph, especially if they typically require heavy computation.

Sponsored

Finally, we have to consider how our message queues need to behave. Using unbounded queues is a straightforward way to introduce possible stability issues into our application, as once the queue's size surpasses the memory and/or disc constraints, our application will become unresponsive. Bounded queues, on the other hand, may lose incoming messages if they can't fit into it. Losing messages is a necessary evil at this point, but if our application can tolerate losing older unprocessed messages better then losing new ones, implementing a ring buffer is the way to go. Once the queue fills up, old messages which are still in the buffer will be overwritten by the arriving new ones.

As I've stated in the beginning, many talks were about implementing specific patterns in distributed systems. We've seen how Akka can be tailored to implement the ones I've described above, and how RabbitMQ tackles the majority of them with minimal configuration.

RabbitMQ is a multiprotocol messaging server written in Erlang, built on top of OTP, which is built specifically for facilitating the communication of distributed nodes. The system architecture consists of producers, exchanges (routers on steroids), the message queue instances, and the clients, of course. Load balancing between queues, sharding, different ACK-modes, and PubSub and topic-based dispatch are already implemented in the solution, and setting them up is just a matter of configuration, so it can easily fit numerous common use cases.

As monolithic web applications are becoming more and more a thing of the past, it was really interesting to get to know more about distributed computing in general, and as Kinja itself will be moved to a microservice architecture, we'll strive to maintain its consistency and give our users the best publishing experience.