Some of the things we learned while working on a fairly big Scala codebase
For more than a year now I've been involved in large refactoring projects at Kinja, and it's been an instructive journey. I thought I could write a post about some of the code style related things I have learned from these projects so far, especially since on my current project I work together with a slightly more junior Scala developer, and I continuously frustrate him with comments and remarks about how we do stuff in general. It is frustrating for him because instead of being able to show him some kind of summary of Kinja Scala conventions, a wiki or a document, I can only correct him whenever I come across an issue, and to him it feels like I am inventing these arbitrary rules on the spot. Obviously I don't aspire to write such a code style rulebook now, but rather to provoke a discussion about it with a subjective, rough, and incomplete list of talking points on code formatting, the use of specific language features, and generic principles.
1. Modularisation
To me it seemed that the most problematic aspect of our code has always been modularisation. A majority of the issues were, one way or another, related to the question of modules. And this always starts with the question of what belongs together, how to group functions. We actually started to group methods around models, and that of course made sense in the repository layer: blog model - blog repository, post model - post repository etc. But applying this same logic to the service and the controller layer obviously didn't work: the boundaries became unclear very quickly, and it was kind of arbitrary where a particular function was put. For example, where to put the functionality of a user following a blog: in the UserService or in the BlogService? I think all this confusion contributed a lot to the dependency mess we eventually got into, where everything depended on everything else, and to why it has been such hard work to decouple the different components from each other.
People tend to concentrate on data structures and algorithms, while in my opinion writing modular, decoupled applications is way more important from a systems point of view. Before concentrating on the algorithm, or on figuring out the right data structure, I think developers should really ask how the component to be created will integrate into the system without introducing tight coupling, hidden dependencies etc., because this is most crucial to writing scalable, extensible, and maintainable systems. It's easier to refactor algorithms and data structures, but changing the pattern of higher level building blocks is actually pretty hard. And it also affects how scalable the engineering organisation is. High level planning may help to avoid a lot of mistakes, but I think these small, individual choices of where to put a particular method accumulate over time, and eventually determine how modular the application is, so being very thoughtful about them pays off.
2. Separation of data and behaviour
I have an OOP background, and it took me a long time to even understand why functional programming does not embrace the idea of encapsulation. But when I found business logic in domain models and their companion objects (and kind of everywhere, for that matter), I understood that it was very difficult to draw a line between what functionality belongs in a model object and what should rather go into a service class. It's way easier and clearer to simply say that no behaviour should be included in classes used for data. This matters especially when code is being shared between services: we want to share as little as possible, only models most of the time, and we don't want any behaviour to leak.
But separating data and behaviour cannot be a generic rule, as this question is related to the expression problem too. The expression problem is about defining a datatype by cases, and then adding new cases, or new functions over the datatype, without recompiling existing code. Here's an example, which I fully borrowed from Daniel Spiewak's excellent keynote at flatMap 2013, which by the way inspired many of the things I'm writing about in this post.
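The code from the talk is not reproduced here, so what follows is only a minimal sketch of this kind of example. Expr comes from the text; the constructors Num, Add, and Sub, the functions value and show, and the ExprFunctions wrapper object are assumed names for illustration.

```scala
// Hypothetical names throughout, apart from Expr.
object ExprFunctions {
  // The datatype is defined by cases (an algebraic datatype).
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Sub(l: Expr, r: Expr) extends Expr

  // A function over the datatype handles every case by pattern matching.
  def value(e: Expr): Int = e match {
    case Num(n)    => n
    case Add(l, r) => value(l) + value(r)
    case Sub(l, r) => value(l) - value(r)
  }

  // A second function can be added later without touching Expr or value.
  def show(e: Expr): String = e match {
    case Num(n)    => n.toString
    case Add(l, r) => s"(${show(l)} + ${show(r)})"
    case Sub(l, r) => s"(${show(l)} - ${show(r)})"
  }
}
```

Because Expr is sealed in this sketch, a new constructor would have to go into the same file, and every existing match would need a new branch.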
In this example we have an algebraic datatype, and a function defined over it. We use pattern matching to identify all cases in the function. As the example shows, it is very easy to add new functions over this datatype without recompiling the code, but adding a new case, let's say a Mult constructor for multiplication, isn't actually possible without modifying (and recompiling) all existing functions over the datatype.
The object oriented solution suffers from the opposite problem:
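A minimal sketch of the object oriented encoding, assuming each case is a class that implements a value method. Only Expr and Mult are named in the text; Num, Add, and the ExprObjects wrapper are hypothetical.

```scala
// Hypothetical names apart from Expr and Mult.
object ExprObjects {
  // Every case carries its own behaviour.
  trait Expr {
    def value: Int
  }

  class Num(n: Int) extends Expr {
    def value: Int = n
  }

  class Add(l: Expr, r: Expr) extends Expr {
    def value: Int = l.value + r.value
  }

  // A new case is just one more class; no existing code is modified.
  class Mult(l: Expr, r: Expr) extends Expr {
    def value: Int = l.value * r.value
  }
}
```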
It's easy to add a new case, but to add a new function over Expr, all cases must be modified.
So the question of separating data and behaviour somewhat depends on the actual case. If the domain model can be thought of as an algebra, then it is probably good style to separate data from behaviour, but otherwise it is not necessarily the right way.
The pattern that I'm starting to use more and more is defining data as algebraic datatypes, and using typeclasses for functionality over that data. The typeclass pattern is kind of another answer to the expression problem, although in this example below it's not immediately obvious how it helps, due to the amount of boilerplate involved:
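A minimal sketch of the typeclass encoding, assuming a Value typeclass with one instance per constructor, each placed in that constructor's companion object. The constructors Num, Add, and Mult, the evaluate helper, and the ExprTypeclass wrapper are hypothetical names.

```scala
// Hypothetical names apart from Value.
object ExprTypeclass {
  // The typeclass: evidence that an A can be evaluated to an Int.
  trait Value[A] {
    def value(a: A): Int
  }

  // Each case is plain data. Its instance lives in its companion object,
  // so the compiler finds it in implicit scope without an import.
  final case class Num(n: Int)
  object Num {
    implicit val numValue: Value[Num] = new Value[Num] {
      def value(a: Num): Int = a.n
    }
  }

  final case class Add[L, R](l: L, r: R)
  object Add {
    implicit def addValue[L, R](implicit lv: Value[L], rv: Value[R]): Value[Add[L, R]] =
      new Value[Add[L, R]] {
        def value(a: Add[L, R]): Int = lv.value(a.l) + rv.value(a.r)
      }
  }

  // A new case: a new constructor plus an instance, nothing else changes.
  final case class Mult[L, R](l: L, r: R)
  object Mult {
    implicit def multValue[L, R](implicit lv: Value[L], rv: Value[R]): Value[Mult[L, R]] =
      new Value[Mult[L, R]] {
        def value(a: Mult[L, R]): Int = lv.value(a.l) * rv.value(a.r)
      }
  }

  // Entry point that asks the compiler for the right instance.
  def evaluate[A](a: A)(implicit v: Value[A]): Int = v.value(a)
}
```

With these definitions, evaluate(Mult(Add(Num(1), Num(2)), Num(3))) resolves the nested instances at compile time.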
To be fair, it is not exactly the same example, as the signatures slightly differ. But it models the same problem, and to add a new case, we just need to add a new constructor, and a new instance of the Value typeclass in the constructor's companion object. To add a new function, we can define a new typeclass, and instances of it can easily be created; they just need to be put in implicit scope somehow (we can't modify the existing companion objects, but there are patterns for doing this in a nice way). It is also possible to write macros to automate the creation of typeclass instances. A good example of that, among many, is the Play JSON library.
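One way to put instances of a new typeclass in implicit scope without touching the existing companion objects is to define them in the companion object of the new typeclass itself. The sketch below assumes a hypothetical Show typeclass and a render helper; Num and Add stand in for existing data that we cannot modify.

```scala
// Hypothetical names throughout.
object NewOperation {
  // Existing data, assumed to live in code we cannot modify.
  final case class Num(n: Int)
  final case class Add[L, R](l: L, r: R)

  // A new operation is a new typeclass...
  trait Show[A] {
    def show(a: A): String
  }

  // ...whose instances live in its own companion object, which is also
  // part of the implicit scope searched for a Show[A].
  object Show {
    implicit val numShow: Show[Num] = new Show[Num] {
      def show(a: Num): String = a.n.toString
    }
    implicit def addShow[L, R](implicit ls: Show[L], rs: Show[R]): Show[Add[L, R]] =
      new Show[Add[L, R]] {
        def show(a: Add[L, R]): String = s"(${ls.show(a.l)} + ${rs.show(a.r)})"
      }
  }

  def render[A](a: A)(implicit sh: Show[A]): String = sh.show(a)
}
```

This only works when the author of the new typeclass also writes the instances; instances added by a third party would have to be imported explicitly.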
The expression problem, and the question of extensibility, is not merely an abstract and academic one; it is closely related to decoupling. All of this applies mainly to library code, but at Kinja, for example, we have all our business logic in fat libraries that are dependencies of thin applications, so these considerations are quite relevant for us.
I also have to mention here that we've been having this argument with one of the lead devs at Kinja about how patterns like typeclasses are just too complex to expect junior developers or less experienced people to understand and use them properly. In my opinion these are just syntactic patterns that may look scary at first sight, but it is just a matter of our eyes getting used to them. The underlying logic is usually quite simple and straightforward. On the other hand, not even trying to address issues related to decoupling (or not using any abstractions) in order to make our code more accessible to juniors means that we are building all the inexperience into the actual codebase, and thus preserving it. While being a junior developer is a transient thing, the junior developer's code stays with us longer.
3. Dependency injection
A great thing about Scala is that modules can be values and thus be passed around. This helps a lot when it comes to dealing with dependencies between modules. Most of the time this can be done by passing constructor parameters. One thing I would probably do differently compared to what we did last year, when we turned our backend code inside out to impose some structure on it, is that I wouldn't try to apply the same DI method to every abstraction layer of the application. Passing constructor parameters, for example, works perfectly well in the repository layer (when passing db connection providers to repositories etc.).
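As a minimal sketch of constructor injection in the repository layer, assume a hypothetical ConnectionProvider trait and a BlogRepository that receives it as a parameter. The connection is modelled as a plain String here only to keep the example self-contained; none of these names come from the Kinja codebase.

```scala
// Hypothetical names throughout.
object ConstructorInjection {
  // Abstract interface over how a repository obtains a connection.
  trait ConnectionProvider {
    def withConnection[A](f: String => A): A
  }

  // The repository receives its dependency as a plain constructor parameter.
  class BlogRepository(connections: ConnectionProvider) {
    def describe(blogId: Long): String =
      connections.withConnection(conn => s"blog $blogId via $conn")
  }

  // A stand-in provider, e.g. for tests; a real one would wrap a pool.
  object InMemoryProvider extends ConnectionProvider {
    def withConnection[A](f: String => A): A = f("in-memory")
  }
}
```

Swapping the provider is then just a matter of passing a different value when the repository is constructed.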
We chose to use the cake pattern everywhere, mainly because our dependency graph was so tangled that we had a lot of circular dependencies, and this method supported that in a quite straightforward way. It sounds terrible, but when we started the project, circular dependencies were the least of our problems: we had no clear boundaries between layers, low level and high level code was all over the place, and all our components were singleton objects with hardcoded dependencies, tightly coupled together. Using any DI method to decouple things felt liberating, and actually the cake pattern is just a more boilerplatey implementation of the general principle that complex module dependencies can sometimes be expressed efficiently through inheritance.
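A minimal sketch of the cake pattern with two mutually dependent services. UserService and BlogService are the names used earlier in this post; the component traits, the method names, and the Registry object are hypothetical.

```scala
// Hypothetical component and method names.
object CakeSketch {
  trait UserServiceComponent {
    def userService: UserService
    trait UserService {
      def displayName(userId: Long): String
      def followedBlog(userId: Long): String
    }
  }

  trait BlogServiceComponent {
    def blogService: BlogService
    trait BlogService {
      def title(blogId: Long): String
      def ownerName(blogId: Long): String
    }
  }

  // Implementations declare what they need through self types...
  trait DefaultUserServiceComponent extends UserServiceComponent {
    self: BlogServiceComponent =>
    lazy val userService: UserService = new UserService {
      def displayName(userId: Long): String = s"user-$userId"
      def followedBlog(userId: Long): String = blogService.title(userId)
    }
  }

  trait DefaultBlogServiceComponent extends BlogServiceComponent {
    self: UserServiceComponent =>
    lazy val blogService: BlogService = new BlogService {
      def title(blogId: Long): String = s"blog-$blogId"
      def ownerName(blogId: Long): String = userService.displayName(blogId)
    }
  }

  // ...and the needs are satisfied when the traits are mixed together.
  object Registry extends DefaultUserServiceComponent with DefaultBlogServiceComponent
}
```

The two implementations refer to each other only through the abstract component traits, and the lazy vals let the circular reference resolve at first use.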
But the most important aspect of dealing with dependencies, no matter what the actual pattern is, is that modules should depend on each other through abstract interfaces, in order to decouple them. I know how obvious this is, but I've seen cases where people epically forgot about this principle.
4. Composition over inheritance?
This is really about code reuse, and inheritance is not the simplest way to achieve that. So this principle may still be relevant, and actually it is just another way of saying that the simplest way of dealing with dependencies is passing constructor arguments. As I just wrote above, inheritance is really for more complex cases.
5. Namespaces
It's another aspect of modularisation (although not the most important one), and I think the question of namespaces is mainly about what we use Java packages for, and how to use them in the most beneficial way. To me it has just always seemed the most logical way to create packages per abstraction layer. We have projects organised around features or functional units, like publishing or user management, and I think these components should all use the same namespace structure, which should reflect the layering of our application: repository, service, controller etc., so that these two orthogonal aspects are both expressed in our code. Currently we include the service name too in the package hierarchy, because, due to operational constraints, code from different projects can quite often be on the same classpath, so name clashes can occur. But this namespace pattern gives rise to questions like what is better: feature.layer or layer.feature? To group code first by services, and under that by abstraction layers, or vice versa? I don't know if there is any good answer to these, and even to argue about it feels a bit futile, or at least boring.
By the way, the Scala feature of writing the package declaration as a chain of package clauses, and having the automatic import from com.kinja.foo._, is really awesome, and this way base classes of a project or subproject could be kept in the root of it, and be automatically imported into every file in subpackages, without the extra import statement.
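A minimal sketch of how this visibility works. The subpackage bar and the classes BaseService and BarService are hypothetical; the nested brace syntax is used only so that both packages fit in one file.

```scala
// Equivalent to a file in the subpackage that starts with the two clauses
//   package com.kinja.foo
//   package bar
// instead of the single clause `package com.kinja.foo.bar`.
package com.kinja.foo {
  // Hypothetical base class kept in the root package of the project.
  abstract class BaseService {
    def name: String
  }

  package bar {
    // No import needed: members of com.kinja.foo are visible here.
    class BarService extends BaseService {
      def name: String = "bar"
    }
  }
}
```

With the single clause package com.kinja.foo.bar, the same file would need an explicit import of com.kinja.foo.BaseService.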
6. Package objects
I'm not sure this is a good idea. I've seen code failing to type check in a package object, but succeeding otherwise, for a reason I still haven't been able to figure out. As far as I know IDEs also struggle with package objects, and they make it difficult to search for things in text editors too.
You can find the second part of this post here.