Peter asked me to write this blog post, so here it is! To start us off I'll explain a few common concepts in Haskell and show you their Scala equivalent. We'll start out with the basics: algebraic data types. Algebraic data types, along with primitives, are how we represent data in Haskell.

data Sum = One | Two data Product = Product Int Int

This example shows the two kinds of algebraic data types, sum types and product types. A Sum type has multiple constructors—here they are One and Two—and its cardinality (number of possible values) is the sum of the cardinality of its constituents; here it is 2 (One and Two each have cardinality of 1).


A Product type takes type parameters; Product has two type parameters and each is Int. The cardinality of a product type is the product of the cardinality of its components.

These can be combined as in the following example:

data ProductSum = Product Int Int | One | Two

Scala does not have algebraic data types but we can model them. A case class without methods can be a Product type:

case class Product(a: Int, b: Int)

and Sum types can be modeled using sealed traits and inheritance.

sealed trait Sum object One extends Sum object Two extends Sum

Algebraic data types don't have any concept of subtyping, so how do we write generic code? The answer lies in a property of many programming languages, including Haskell: parametric polymorphism. Any place a type could appear in Haskell, one can substitute a type variable.

-- ADT data Wrapper a = Wrapper a -- Function id a = a

And in Scala:

// ADT case class Wrapper[A](a: A) // Function def id[A](a: A): A = a

This, in itself is not very useful. It requires us to write our function without any knowledge of the type we are manipulating. This is where class constraints come into effect. A class constraint allows a programmer to limit the types that can be used to fulfill a type variable.

add :: Num a => a -> a -> a add a b = a + b

There are a few things here that we haven't seen before. What exactly is going on in that first line? To start off our explanation, I'll tell you that this first line is what is called a type signature. Unlike in Scala, where our types are intermixed with our function definition, in Haskell the type signature sits on its own line. So the type signature of this Scala function: def func(a: Int, b: String): Float looks like this: func :: Int -> String -> Float.


That still doesn't explain this part: Num a =>. This means that all a in the type signature must be an instance of the type class Num. A type class just tells us something about the types that inhabit it, information we can use in our function. The Num type class looks like this:

class Num a where (+), (-), (*) :: a -> a -> a negate :: a -> a fromInteger :: Integer -> a x - y = x + negate y negate x = 0 - x

If our class inhabits the Num type class then we know that there are some basic functions we can use on it. Here those functions are +, -, *, and negate. Num also requires that the type provide a function to convert integers to that type. Num also provides default implementations for the - and negate functions, written in terms of each other.

How does a type become a Num? We (or somebody else) have to provide an instance for it. Below is the instance of Num for Integer:

instance Num Integer where (+) = plusInteger (-) = minusInteger (*) = timesInteger negate = negateInteger fromInteger x = x

And finally, here is everything we have just seen but in Scala:

def add[A: Num](a: A, b: A): A = implicitly[Num[A]].+(a, b) trait Num[A] { def +(a: A, b: A): A def -(a: A, b: A): A def *(a: A, b: A): A def negate(a: A): A def fromInteger(i: Int): A } implicit object NumInt extends Num[Int] { def +(a: Int, b: Int): Int = a + b def -(a: Int, b: Int): Int = a - b def *(a: Int, b: Int): Int = a * b def negate(a: Int): Int = -a def fromInteger(i: Int): Int = i }

You might have noticed that a type class is a kind of interface. Where one might use an interface in Java or a trait in Scala, in Haskell they would use a type class. As shown above, type classes in Scala are even implemented with traits.


If we have to use traits in order to implement a type class, why don't we just extend the trait? Why treat it as a type class? Type classes have a number of interesting properties. Unlike a normal trait, a typeclass instance can be declared by anybody, not just the author of the type: a big win for interoperability of libraries. Type classes also present a solution to the expression problem.

Haskell takes a somewhat different approach to interfaces than many other languages. Haskell has more concrete interfaces such as ListLike (similar to Array in Java or List in Scala), but it also contains much more abstract interfaces, inspired by Category Theory.

For instance, Haskell has the type class "things that can be appended" containing such diverse members as List and Int. This type class is called Monoid. Why "Monoid" and not "Appendable"? The term "Monoid" is borrowed from category theory, where it describes any structure with one associative binary operation and an identity, which is the definition it has in Haskell.


One of these type classes, which we'll be working with, is Arrow. An Arrow is anything which can be modeled as an arrow in category theory. Functions are arrows in the category of sets. It might help to think of Arrows as abstract representations of computation, one level higher than functions.

class Category a => Arrow a where -- | Lift a function to an arrow. arr :: (b -> c) -> a b c -- | Send the first component of the input through the argument -- arrow, and copy the rest unchanged to the output. first :: a b c -> a (b,d) (c,d) -- | A mirror image of 'first'. -- -- The default definition may be overridden with a more efficient -- version if desired. second :: a b c -> a (d,b) (d,c) second f = arr swap >>> first f >>> arr swap where swap :: (x,y) -> (y,x) swap ~(x,y) = (y,x) -- | Split the input between the two argument arrows and combine -- their output. Note that this is in general not a functor. -- -- The default definition may be overridden with a more efficient -- version if desired. (***) :: a b c -> a b' c' -> a (b,b') (c,c') f *** g = first f >>> second g -- | Fanout: send the input to both argument arrows and combine -- their output. -- -- The default definition may be overridden with a more efficient -- version if desired. (&&&) :: a b c -> a b c' -> a b (c,c') f &&& g = arr (\b -> (b,b)) >>> f *** g -- | Right-to-left composition (<<<) :: Category cat => cat b c -> cat a b -> cat a c (<<<) = (.) -- | Left-to-right composition (>>>) :: Category cat => cat a b -> cat b c -> cat a c f >>> g = g . f

Arrows can be composed "lengthwise" using >>> and "sideways" using *** and &&&. >>> works much the same way that . (Haskell) or andThen (Scala) do for functions, except that it operates on Arrows instead. *** takes a tuple, splits it, and applies the composed Arrows to their respective halfs. &&& Takes one input, duplicates it into each of the composed Arrows, then outputs a tuple of the results.


Side Note: The functionality of Arrows is actually spread out among many type classes and includes more functions than the ones shown here. For simplicity's sake I've collapsed the most used function on Arrows into the one type class show here. We'll see one more basic arrow function called app later, which is part of the ArrowApply type class.

Arrows are a powerful and versatile abstraction used across many Haskell libraries. If you'd like to explore arrows in Scala, they are available from the Scalaz library.

We'll now apply the knowledge we've learned to a practical problem, using the Haskell XML Toolbox. At Kinja we often half to modify Html that our users submit to us. We modify html once on storage and once on display; we call these latter modifications "decorators". We'll be writing one such decorator for inserting images into post bodies.

To begin with we'll define our Image type:

data Image = Image { uid :: String , width :: Int , height :: Int , format :: String }

This looks a little weird but it is just a shorthand way of writing

data Image = Image String Int Int String uid (Image u _ _ _) = u width (Image _ w _ _) = w height (Image _ _ h _) = h format (Image _ _ _ f) = f

Normally we would get our list of post images and our post body from the post object which ultimately originates from the database but for this post we'll just define these statically.

postImages = [ Image "1a2b" 400 300 "jpg" , Image "3c4d" 800 800 "gif" , Image "5e6f" 640 360 "jpg" , Image "7g8h" 200 200 "bmp" ] postBody = "<p><image uid="3c4d" /></p>\ \<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse \ \consequat leo ac ipsum sollicitudin, et fermentum metus vulputate. Curabitur \ \volutpat erat a orci mattis, vitae ultricies nisl commodo.</p>\ \<p><image uid="1a2b" /></p>\ \<p>Donec ullamcorper condimentum ultricies. Sed at nisl facilisis, adipiscing \ \lectus sed, pellentesque odio. Fusce ut augue nec odio faucibus consequat eget \ \nec ipsum.</p>\ \<p><image uid="5e6f" /></p>\ \<p>Proin ultricies, mauris id vulputate luctus, justo erat fringilla felis, ac \ \dignissim tortor purus et purus. Vivamus et augue nibh. In auctor tincidunt \ \purus, sed bibendum augue ultricies ac. </p>\ \<p><image uid="7g8h" /></p>\ \<p>Donec ullamcorper condimentum ultricies. Sed at nisl facilisis, adipiscing \ \lectus sed, pellentesque odio. Fusce ut augue nec odio faucibus consequat eget \ \nec ipsum.</p>"

Our goal is to turn all of these "image" tags into proper "img" tags based on the corresponding image in postImages. In Kinja our image urls are a bit more complicated (and based on information outside of the image object), but for this post we'll keep that logic simple.

imageUrl :: Image -> String imageUrl image = "" ++ (uid image) ++ "." ++ (format image)

With that out of the way, we can dig into the HXT. HXT contains arrows both for selecting and modifying XML data. We'll be operating on the ArrowXml type class which extends Arrow, adding XML specific functionality. We'll start off with our modification arrows.


For our <image> tags to become valid <img> tags the first thing we need to do is rename them! We can do that using setElemName, which is a function setElemName :: ArrowXml a => QName -> a XmlTree XmlTree (It takes a QName and gives us back an arrow that sets an elements name to that QName). We can use it like so:

mkImageTag :: ArrowXml a => a XmlTree XmlTree mkImageTag = setElemName (mkName "img")

Next we'll need the appropriate attributes; an <img> tag isn't very useful without a src attribute! We'll use addAttr and and removeAttr to modify the elements attributes.

mkImageTag' :: ArrowXml a => Image -> a XmlTree XmlTree mkImageTag' img = addAttr "src" (imageUrl (uid img) (format img)) mkImageTag'' :: ArrowXml a => Image -> a XmlTree XmlTree mkImageTag'' img = addAttr "width" (show $ width img) mkImageTag''' :: ArrowXml a => Image -> a XmlTree XmlTree mkImageTag''' img = addAttr "height" (show $ height img) mkImageTag'''' :: ArrowXml a => a XmlTree XmlTree mkImageTag'''' = removeAttr "uid"

Since we'll always be using these together, let's make them into one mkImageTag function by composing our arrows with >>>. This works just like . (in Haskell) or andThen (in Scala) does for functions (and in fact that is the definition of >>> for the function Arrow instance).

mkImageTag :: ArrowXml a => Image -> a XmlTree XmlTree mkImageTag img = setElemName (mkName "img") >>> addAttr "src" (imageUrl (uid img) (format img)) >>> addAttr "width" (show $ width img) >>> addAttr "height" (show $height img) >>> removeAttr "uid"

Now we need to use XML selection to find the image we'll need to pass to mkImageTag. Attribute values can easily be gotten witht the getAttrValue which has a type of ArrowXml a => String -> a XmlTree XmlTree which should look familiar to you by now. getAttrValue "uid" will easily get our uid for us.


We can compose getAttrValue and mkImageTag with a little help from arr which will lift mkImageTag into an arrow `ArrowXml a => a String (a XmlTree XmlTree).

decorateImage = getAttrValue "uid" >>> arr (\uid -> maybe this imageTag (selectImage uid postImages) where selectImage :: String -> [Image] -> Maybe Image selectImage uid images = find (\(Image u _ _ _) -> u == uid) images

Here we've passed to arr an anonymous function that trys to find the correct image and modifies the element if it does. maybe is a simple function b -> (a -> b) -> Maybe a -> b / def maybe[A, B](b: B, f: A => B, a: Option[A]): B which runs a function on the contents of a Maybe value or returns the default. Here we've specified the default to be this which is the identity arrow of ArrowXml.


Things are beginning to look up! This function looks a lot like what we want, but we're not quire there yet. If you can't spot the problem, the type signature might help: ArrowXml a => a XmlTree (a XmlTree XmlTree). Our arrow doesn't return what we want! In fact, our arrow returns a different arrow which is the arrow we want. However will we use our gorgeous ArrowXml a => a XmlTree XmlTree arrow when it is trapped in the return type of this other arrow?

The answer to our dilemna lies in the app arrow which (for our purposes) has the type ArrowXml a => a (a XmlTree XmlTree, XmlTree) XmlTree. It takes a tupled input and applies the arrow in the left half to the right half! In order to use app we'll need to split our input XmlTree which we can do with &&&

decorateImage = getAttrValue "uid" &&& this >>> ...

We can no longer compose our first arrow with our second arrow, since their types don't line up: ArrowXml a => a XmlTree (String, XmlTree) vs ArrowXml a => a String (a XmlTree XmlTree). We'll have to modify our second arrow to line up with the first one.

decorateImage = getAttrValue "uid" &&& this >>> (first (arr (\uid -> maybe this imageTag (selectImage uid postImages))) ... where selectImage :: String -> [Image] -> Maybe Image selectImage uid images = find (\(Image u _ _ _) -> u == uid) images

And finally we can compose our second arrow with app, to give us the overall arrow that we want.

decorateImage :: ArrowXml a => a XmlTree XmlTree decorateImage = getAttrValue "uid" &&& this >>> (first (arr (\uid -> maybe this imageTag (selectImage uid postImages))) >>> app where selectImage :: String -> [Image] -> Maybe Image selectImage uid images = find (\(Image u _ _ _) -> u == uid) images

We now have an arrow that turns our <image> tags into <img> tags, we just have to apply it to our post body! Before we do that, we'll have to make a few preparations. First, we only want our decorator to work on image tags and nothing else. Fortunately, HXT provides an arrow ifA which operates on arrows and does what you would expect.

ifA (hasName "image") decorateImage this

Next we need to modify our arrow to apply recursively to the whole post body, instead of individual elements. This is again made easy by HXT with processTopDown

processTopDown (ifA (hasName "image") decorateImage this)

Now we can put this all to use.

main = do -- Turns our string into an arrowified XmlTree. let xml = readString [withParseHTML yes, withWarnings no] postBody -- Run our decorators then turn the result into a string. parsedBody <- xshow (runX (xml >>> (processTopDown (ifA (hasName "image") decorateImage this)))) -- Print the result out to the console. mapM_ putStrLn parsedBody

We have successfully written our decorator! Although we've only written one decorator, this technique is easily expandable to any arbitary number of decorators: try implementing some of your own.

main = do let xml = readString [withParseHTML yes, withWarnings no] postBody parsedBody <- xshow (runX (xml >>> (processTopDown ((ifA (hasName "image") decorateImage this) >>> ((ifA (hasName "video") decorateVideo this)) >>> ((ifA (hasAttrValue "awesome" (=="true")) awesomeDecorator this))))) mapM_ putStrLn parsedBody

You might notice that we never specified what a was at any point in our decorator. This is a powerful idea; it means that we could substitute any implementation of ArrowXml into our codebase and everything would work just the same. The patterns we used (although not some of the specific functions) are also portable across different libraries; we could define new functions that work not just for HXT but also for request handling, file manipulation, or anything else that we can model with arrows!