I know there’s already a lot of content out there about category theory, Monads and stuff like that, but I always felt that all this knowledge is a bit fragmented or too wordy for me: there are aspects that are not described in one or the other, so it never felt like I could put the whole puzzle together based on one “tutorial”. So I decided that the only way to put this right in my head and start to understand category theory and related notions in terms of programming (in Scala) is to write it down and fill in the holes with practical explanations. If you find something that would fit in this summary (and short enough to digest easily), please leave a reply so that I can update the post with your help. Thanks!
After reading through several articles on the topic, these answers started to hover in my monadic Nirvana...
What’s all the fuss about Monads? Where did it start?
As functional programming is getting more widespread, so do the fundamental concepts that are used heavily in FP. Such things are category theory and related notions like Functor, Monad or Applicative. These concepts help us to abstract over side effects or function composition.
Ok, then what is category theory?
Category theory is a branch of mathematics that deals with algebraic structures. A category itself is a labeled directed graph:
A category consists of objects (concepts) and arrows (morphisms). You can think of these graphs as types and functions between them (category of types). The essence of a category is composition, at the end of the day it helps you to make programs composable. If you have an arrow
f: A => B and
g: B => C you can compose them to make
h: A => C. A category should obey these laws as well:
- For each
f: A => Band
g: B => C, there exists a composite edge:
g ∘ f: A => C(g is in the same category, so a category is closed for composition)
- For each node A, there exists an identity edge (idA)
- For each three edges
f: A => B,
g: B => Cand
h: C => D, the following associativity equation holds:
h ∘ ( g ∘ f ) = ( h ∘ g ) ∘ f
- For each edge A -> B the following identity equations hold:
f ∘ idA = f, idB ∘ f = f
It seems quite simple. Where does Functor come into the picture?
A Functor is a mapping between two categories (graphs). A Functor is a homomorphism, that means structure preserving mapping. It’s structure preserving in the sense that edges are mapped to edges.
If you think of categories as types and functions between them (A as input, B as output), a Functor helps you to map types and functions to another universe. In programming, we use Functors mostly to add some context / side channel to a computation result. Such context can be an error or a missing value. There are a lot of examples for Functors, such as making IO operations, indicate errors or dealing with asynchronous computations. So Functors can help to embellish types: map types like Int or Person to a new universe (category) where we have Option[Int] and Option[Person]. Option[Person] means that it was intended to compute a Person result, but the context suggests that it may be missing. A Functor maps not just types (graph nodes), but functions (edges) as well. It also preserves the structure of a category: what’s connected in one category will be connected in the other. If you have a function of
Person => Int, there will be a function in the new universe with a signature of
Option[Person] => Option[Int].
In Scala, the Functor typeclass looks like this:
You can see that point (also called apply or unit) can be used to map objects and map can be used to map morphisms.
There are two laws that a Functor must hold:
- identity: map preserves identity, meaning
fa map (a => a) = fa
- composition: map respects composition, meaning
(fa map f) map g == fa map (g(f(_)))
You should notice that these laws are much stronger than the “structure preserving” behavior coming from homomorphisms. These laws are about values, not just types.
Why are these laws important?
There are more ways to transform one directed graph to the other, even if you respect the structure preserving behavior. But as a “user” of an Option, you’d consider trivial that if you map a function that does nothing, it doesn’t touch the inner value of the Option either. Also, you’d like to get the same result regardless you map first f and then g as if you map the composition of them:
Some(10) -> map (x * 2) -> map (y + 1) == Some(10) -> map(2x + 1)
So if something provides the mapping and guarantees these laws to hold, we call that a Functor.
You said that we can think of the category of types. Wouldn’t it mean that a Functor just points back to the original category if all the types are already there?
You can define your category as you’d like (if the laws hold). You can create a category A of two types Int and Person and category B of Option[Int] and Option[Person] nodes. In case you define A as the category of all types, then yes, F may map A to itself. That’s exactly what an endofunctor is: endofucntor is a functor that maps a category to itself.
It sounds cool! So now I have a tool that I can use to wrap a “normal” value and give context like asynchronicity. Even I can apply a function (that takes a “normal” value) to a value of an embellished type. What if I have multiple values of embellished types (i.e. in Functor context) and I’d like to apply a function that takes multiple “normal” arguments?
It’s something that a Functor can not solve for you. But a different structure, called Applicative can. Applicative is a special kind of Functor.
You can lift a function with any number of arguments by extending the Functor with one more behavior:
Then you can lift a function with e.g. 3 arguments (using currying in Scala):
Okay, then I guess there are some other rules for Applicative to preserve world equilibrium.
Here you are:
ap(fa)(point(x=>x)) == fa
ap(point(a))(point(f)) == point(f(a))
ap(point(a))(ff) == ap(ff)(point(x=>x(a)))
map(fa)(f) == ap(fa)(point(f))
I won’t go into details, but it’s worth to take some time and think about what all these mean.
Are we still far from the loved ones, the shiny and blessed Monads?
We just arrived to Monads. Previously we learned that if a function has a side effect, it may encode it in the return type. In other words, it may return a Functor object. But what if you would like to compose multiple functions that return a Functor object? How to compose
Int=>Option[Int] functions? Just by using Functors, you’d end up with an
Option[Option[Int]] after the first map call. Monad helps to propagate and combine side effects by chaining flatMap calls:
Here are the laws:
- Left identity:
flatMap(point(a))(f) == f(a)
- Right identity:
flatMap(fa)(point) == fa
fa flatMap f flatMap g == fa flatMap (x => f(x) flatMap g)
Now there’s one more concept that we introduce here: Monoid. A Monoid is just a type together with a binary operation (append) over that type, satisfying associativity and having an identity element (zero):
append(x, append(y, z)) == append(append(x, y), z)
append(a, zero) == append(zero, a) == a
Monoids can be used e.g. to implement MapReduce jobs or logging.
But how does Monoid relate to Monad at all?
If you think of Monads that compose
A => F[B] monadic functions, the Monad laws become much clearer:
def compose[A,B,C](f: A => F[B], g: B => F[C]): A => F[C]
A => F[B] functions are called Kleisli arrows. The third law is often expressed with compose:
compose(compose(f, g), h) == compose(f, compose(g, h))
Although we have only two functions above, we can think of fa as a result of the first function (f).
Furthermore, as zero was an identity element for a Monoid, point is an identity element for a Monad (identity laws):
- Left identity:
compose(unit, f) == f
- Right identity:
compose(f, unit) == f
Then the three Monad laws look the same as the identity and associative laws for Monoid, meaning that a Monad is a Monoid, right?
\o/ ! Any other aspects of Monad?
Well, as you could see, Monad is just an implementation of a set of monadic combinators (e.g. point and flatMap) satisfying the laws of identity and associativity. A chain of flatMap calls is like an imperative program that also combines the side channel. The Monad specifies what occurs at statement boundaries. For example, Option may return None and thus terminate the program:
If all load methods return an Option, it’s up to the flatMap implementation how to combine the “optionality” of loadBook and loadAuthor (the side channel). The user code only have to deal with the value that’s within the Monad/Functor.
Then when to use Applicative or when Monad?
Monad is a subtype of Applicative. This means all Monads are Applicatives but not all Applicatives are Monads. Apart from the Applicative style of data processing, Monads add another way to transform results.
Applicative functors allow you to take a “normal” function (that takes non-functorial arguments) use it to operate on several values that are in functor contexts. It is useful if you’d like to transform independent pieces of data, where the results are already there - within a Functor context (transform
(A, B) => C).
Monads are more restrictive than Applicatives. But there are quite a few use cases where you need to have a Monad and not an Applicative. One such use case is when you would like to change the sequence of an effectful computation depending on a previous outcome. Monads are about transforming dependent pieces of data. In fact, you compose functions like
A => F[B] and
B => F[C] (Kleisli arrows) that use each other’s output as an input. The client of the Monad decides how to create the Functor object (
F[B]) whereas the client of the Applicative can not influence this aspect, just passes in the
A => B.
A good example to highlight the difference is Validation. With a monadic bind, the computation would be terminated on the first error as we don’t have the result to be passed to the next flatMap call. Applicatives allow us to sequence through the computation results and accumulate all the effects.
Applicatives are useful for parallel processing while monadic workflows are sequential.
I understand that Monad is an Applicative with some extra capabilities. Then why not always use Monads instead of Applicatives?
Good question. There are cases when you’d like to work with stacked Monads, like
Option[IO[_]]. Although you can compose Functors and Applicatives infinitely deep, the composition can not be generalized for monads:
To do that, you have to write specific Monad transformers for all the types where the given Monad is the inner type:
The good news is that Monad transformers are Monads too, so transformers can wrap each other, thus multiple effects can be stacked.
Finally, here’s a cheatsheet of all that we talked about:
- Functors are something that can be mapped over. They allow us to embellish normal values, e.g. add error context. They can also transform functions taking normal parameters
- Applicatives help to combine multiple values that are in Functor context
- Monads help to combine Kleisli arrows like
A => F[B]
I still have a few holes in the puzzle that I’ll dedicate some time to (or you can leave a reply with the solution):
- Given a category A and an F mapping, are the Functor laws necessary and sufficient to guarantee that the result will be a category as well?
- Applicative.ap takes a
F[A => B]parameter... Can we say that all morphisms are concepts in the category of types?
- AFAIK in Haskell all functions are curried so they eventually take one parameter. It aligns well with the first part about categories (arrows between two nodes). In category theory, how to deal with functions that take multiple arguments? Can I say that A is the product of two or more types?
If you’d like to browse more on the topic, I suggest the great posts of Bartosz Milewski on category theory, visually appealing tutorial from adit, Haskell wiki, Graham Hutton’s Introduction to Category Theory course, scalaz overview from eed3si9n and of course the excellent book of Functional Programming in Scala.