What would happen if I opened someone's Kinja in my browser, and started to click the follow / unfollow button in rapid succession? What if I opened the same page in another tab, and somehow did the same thing there too? Can I actually expect the displayed follower count or follower list to show the same values in both tabs?
What we are trying to do here is synchronize state over the network, which is unreliable: messages can get lost, arrive out of order, or even be duplicated. For example, a follow request could get routed to a slow node in the cluster, while the subsequent unfollow goes to a fast one, so the unfollow ends up being applied first. If we stored followers as a plain set, this would leave the follower list, and with it the follower count, inconsistent across replicas.
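To see the divergence concretely, here is a minimal Scala sketch (the names are mine, purely for illustration) of two replicas of a plain follower set that receive the same two events in different orders:

```scala
// Two replicas of a plain follower set receive the same two events,
// but the network delivers them in a different order to each replica.
val follow:   Set[String] => Set[String] = _ + "alice"
val unfollow: Set[String] => Set[String] = _ - "alice"

val replicaA = unfollow(follow(Set.empty))   // follow applied first, then unfollow
val replicaB = follow(unfollow(Set.empty))   // unfollow arrives first, then follow

// replicaA is Set(), replicaB is Set("alice"): the replicas have diverged.
```

Nothing in the plain set's operations forces the two replicas back into agreement; that is exactly the property we are after.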
There is a "trick", though, to make distributed state converge and achieve (eventual) consistency without consensus. As Peter Bourgon from SoundCloud puts it in a talk linked below, we want a system whose state ends up the same no matter how the operations on it are reordered or repeated: put all the operations in a box, shake the box, then open it and pour the operations back onto the system, and it still eventually reaches the same new state. The trick lies in how we store state, that is, in what data type we use for it. To be a bit more formal, the mentioned operations need to be associative, commutative and idempotent:
S ∘ (a ∘ b) = (S ∘ a) ∘ b (associativity)
a ∘ b = b ∘ a (commutativity)
a ∘ a = a (idempotence)
The problem with a plain set is that its addition and removal operations don't commute with each other. Or, if we were really only interested in the follower count and therefore used a plain counter, the problem is that addition and subtraction of numbers, while they do commute, are not idempotent. On the other hand, set addition (set union, to be precise) by itself obeys all three rules:
A ∪ (B ∪ C) = (A ∪ B) ∪ C (associativity)
A ∪ B = B ∪ A (commutativity)
A ∪ A = A (idempotence)
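These three laws are easy to spot-check for concrete sets, for instance in a Scala REPL:

```scala
// Set union satisfies all three laws required for convergence.
val a = Set(1, 2)
val b = Set(2, 3)
val c = Set(3, 4)

assert((a union (b union c)) == ((a union b) union c)) // associative
assert((a union b) == (b union a))                     // commutative
assert((a union a) == a)                               // idempotent
```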
Similarly, it is possible to create a counter that can only grow (so it has no subtraction operation) and uses the idempotent max operation to merge states.
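Such a grow-only counter fits in a few lines of Scala. This is just a sketch, assuming replicas are identified by string ids; each replica increments its own slot, and merging takes the element-wise maximum:

```scala
// A state-based grow-only counter, sketched for illustration.
// Each replica increments only its own slot; the value is the sum of all slots.
case class GCounter(counts: Map[String, Long] = Map.empty) {
  def increment(replicaId: String): GCounter =
    GCounter(counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + 1))

  def value: Long = counts.values.sum

  // Merge takes the per-replica max; it is associative, commutative
  // and idempotent because max is.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)
}
```

Merging two replicas in either order, or merging a replica with itself, yields the same value, which is exactly what the three laws promise.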
Unfortunately, this is not good enough to represent followers, unless users can never unfollow anyone, which is not the case in Kinja. But it's a good starting point. We could store followers in a data type consisting of two such grow-only sets: one for additions and one for removals (the tombstones). The add operation would insert an item into the first set, and the remove operation would insert it into the second; the actual follower list is the difference of the two sets. Similarly, a counter supporting decrements can be built from two of the previously described grow-only counters, with its actual value computed as the difference of the two. Both of these data types obey the above rules.
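The two-set construction can be sketched like this (again, the names are mine and this is only an illustration, not a production implementation):

```scala
// A set built from two grow-only sets: additions and removals (tombstones).
// The visible contents are the difference of the two.
case class TwoPhaseSet[A](added: Set[A] = Set.empty[A],
                          removed: Set[A] = Set.empty[A]) {
  def add(a: A): TwoPhaseSet[A]    = copy(added = added + a)
  def remove(a: A): TwoPhaseSet[A] = copy(removed = removed + a)

  def contains(a: A): Boolean = added(a) && !removed(a)
  def elements: Set[A]        = added -- removed

  // Merge unites both grow-only sets; union is associative, commutative
  // and idempotent, so replicas converge.
  def merge(other: TwoPhaseSet[A]): TwoPhaseSet[A] =
    TwoPhaseSet(added ++ other.added, removed ++ other.removed)
}
```

Note that once an item is in `removed`, adding it again has no visible effect, which is the limitation discussed next.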
While such a counter would be a perfect solution for our follower count problem, the set combining two grow-only sets falls short, as an item can only be removed from it once: once an item lands in the removal set, it stays there, so the item can never be re-added. This can be remedied by storing globally unique transaction identifiers (e.g. UUIDs) together with the added and removed items, so that a removal only cancels the additions it has actually observed.
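A sketch of this tagged variant, with illustrative names of my own choosing: every add generates a fresh UUID tag, and a remove tombstones only the tags visible at that replica, so a later add (with a new tag) makes the item visible again.

```scala
// A set where adds carry unique tags and removes tombstone only the
// observed tags, so items can be re-added after removal.
case class TaggedSet[A](adds: Map[A, Set[String]] = Map.empty[A, Set[String]],
                        removes: Map[A, Set[String]] = Map.empty[A, Set[String]]) {
  private def tags(m: Map[A, Set[String]], a: A): Set[String] =
    m.getOrElse(a, Set.empty)

  // Each add is marked with a globally unique transaction id.
  def add(a: A): TaggedSet[A] =
    copy(adds = adds.updated(a, tags(adds, a) + java.util.UUID.randomUUID().toString))

  // Remove tombstones only the add-tags this replica has observed.
  def remove(a: A): TaggedSet[A] =
    copy(removes = removes.updated(a, tags(removes, a) ++ tags(adds, a)))

  def contains(a: A): Boolean = (tags(adds, a) -- tags(removes, a)).nonEmpty

  // Merge unites the tag sets on both sides; both maps only ever grow.
  def merge(other: TaggedSet[A]): TaggedSet[A] = TaggedSet(
    (adds.keySet ++ other.adds.keySet)
      .map(a => a -> (tags(adds, a) ++ tags(other.adds, a))).toMap,
    (removes.keySet ++ other.removes.keySet)
      .map(a => a -> (tags(removes, a) ++ tags(other.removes, a))).toMap
  )
}
```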
I need to note here, since we use master/slave replication at Kinja, that storing followers this way means there is no need for a master database: it is enough for each process to have its own local replica of the distributed state. All that is needed is a communication channel that somehow broadcasts operations or local states to all the other processes (sites) of the distributed system.
The data types described above are called CRDTs: Conflict-free Replicated Data Types. They come in two flavours: state-based (called Convergent Replicated Data Types, or CvRDTs) and operation-based (called Commutative Replicated Data Types, or CmRDTs). The previous examples were all state-based: the grow-only counter is called a G-Counter, the grow-only set a G-Set, and the combined ones a PN-Counter and a 2P-Set. The set with the unique transaction ids is an OR-Set (observed-remove set). State-based means that each of these has a merge method that merges in another instance of the same data type; that is how replicas converge. Other data types, like registers and graphs, can be built in the same manner; the collaborative text editing algorithms mentioned below are great examples of CRDT-based graphs.
CRDTs are proven to converge and to achieve eventual consistency, which is one great thing about them. If you can describe your problem in terms of them (not every concurrency problem can be, and there are trade-offs, such as space, to consider too), you get eventual consistency in your distributed system. There is a well-known paper by Shapiro, Preguica, Baquero and Zawirski which formalizes the concept, provides the proofs, and describes a number of CRDTs. It also demonstrates how CvRDTs can be emulated by CmRDTs and vice versa, so for a particular problem the choice between the two really only depends on practical considerations; it is not as if one class of problems could only be solved by one flavour and another class only by the other.
I first heard about these data types a couple of weeks ago at the Scala eXchange conference in London, where David Brooks gave an excellent talk on how CRDTs are used at Whisk, an app that turns online recipes into shopping lists on your phone, to synchronize shopping lists between mobile devices. It wasn't the first time Scala eXchange had presentations on this topic, though: last year Noel Welsh from Underscore gave a similar introduction, followed by Richard Dallaway's talk introducing WOOT, a collaborative text editing algorithm that is also based on CRDTs. Collaborative text editing is one of the popular examples used to introduce these data types (see for example the paper on Treedoc), and the shopping cart is another recurring theme.
Although I've come across several attempts to implement CRDTs in Scala, like Joseph Moniz's Convergence or the work-in-progress implementation in Akka, these data types are so simple that anybody can implement them in minutes. There are also databases with built-in CRDT support, like Riak.
If you only have time for one presentation, paper or article on this topic, I recommend Peter Bourgon's talk at Strange Loop this year, in which he not only explains the concept, but also describes how SoundCloud uses CRDTs in production, leveraging Redis' sorted sets, and how they built their own Redis cluster: