# Algebraic patterns — Monoid

Posted on July 21, 2016

## Definition

The Monoid pattern is simply the combination of the two patterns Identity Element and Semigroup. A monoid therefore is a datatype with composition ⊗ and element e, satisfying

x ⊗ e = x
e ⊗ x = x
x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z


Some understanding of these patterns is assumed in this article. If you feel like you need intuition on what these equations mean, read the entries on these individual patterns before continuing.

## Intuition

The monoid pattern models the many structures that are semigroups and also have identity elements. In such situations, it is often convenient to consider these patterns in concert in order to derive elegant models and laws.

For instance the semigroup of lists have the empty list [] as identity. Treating lists as a semigroup only often result in less elegant laws where the empty list has to be treated as a separate case.

In this article we’ll give some examples of monoids and develop some models suitable for problem solving in Map-Reduce style programming models.

### Notation

To refer to a particular monoid we take the triple of its type, composition and identity. For instance (Number, +, 0) is the monoid of numbers with addition.

### Folds

For any monoid we can define a function called fold. It takes a list of elements of that monoid to their “product”. For the monoid (Number, +, 0), we define fold (by example) as

fold([]) = 0
fold() = 1
fold([5, 6, 3, 1])
= 5 + 6 + 3 + 1
= 15


The fold function simply inserts the monoid composition (in our case +) between each element. For the empty list it returns the identity element (0). The fold for the monoid (Number, +, 0) then is just the sum function.

Let’s repeat the construction above with a different monoid, (Number, max, -∞). In this case we get

fold([]) = -Infinity
fold() = 10
fold([9, 6, 5, 12])
= 9 max 6 max 5 max 12
= 12


so the fold for this monoid is the maximum function which finds the largest element in a list, and returns its identity -∞ for the empty list.

For the boolean monoid (Bool, &&) fold is the every function

fold([true, false, true])
= true && false && true
= false
fold([]) = true


which checks if all elements in a list are true.

and for (Bool, ||) we get the some function

fold([true, false, true])
= true || false || true
= true
fold([]) = false


which checks if some element is true.

Another two interesting examples of folds are the head and last functions that find the first and last element of a list respectively. These arise out of the semigroup operations ⨮ and ⨭ defined as.

x ⨭ y = x
x ⨮ y = y


which simply discard one of their arguments.

head then is the function

fold([x, y, z])
= x ⨭ y ⨭ z
= x


and last is the function

fold([x, y, z])
= x ⨮ y ⨮ z
= z


Unfortunately we can not give meaning to the expression fold([]). This is because ⨮ and ⨭ define semigroups that are not monoids, so these functions err on the empty list. This illustrates the problem of working with semigroups only, when our domain of study are lists.

### Algebra for parallelism

The relation of folds to the map-reduce programming model and parallel computation in general can be captured in the fact that they satisfy the following distributive law.

fold(xs ++ ys) = fold(xs) ⊗ fold(ys)


For a list that is the concatenation of lists xs and ys, fold(xs) and fold(ys) could be computed on different machines, or CPU cores, so such a law is a suitable condition for when a problem can be solved in a distributed or parallel way. At the end the two partial solutions are re-combined using the monoid composition ⊗, and this law then states that this behaves “as if the problem was solved sequentially”, by folding the entire list in sequence.

Since sum and maximum are both folds, they can be computed in parallel. The distributive law is these cases become

sum([1, 2, 3, 4, 5, 6])
= sum([1, 2, 3] ++ [4, 5, 6])
= sum([1, 2, 3]) + sum([4, 5, 6])

maximum([9, 6, 5, 12])
= maximum([9, 6] ++ [5, 12])
= maximum([9, 6]) max maximum([5, 12])


Of course, such a law can be repeatedly applied

sum([1, 2, 3, 4, 5, 6]) =
sum([1, 2]) + sum([3, 4, 5, 6]) =
sum([1, 2]) + sum([3, 4]) + sum([5, 6])


to distribute such a problem to any number of machines or cores.

Note that the requirement for an identity element arises naturally out of such a law:

fold(xs) = fold(xs ++ []) = fold(xs) ++ fold([])
fold(xs) = fold([] ++ xs) = fold([]) = fold(xs)


the value fold([]) must be such that it is an identity element for the range of fold, providing further evidence that the concept of a monoid is a natural extension of that of a semigroup when dealing with possibly empty lists.

The distributive law above is the fundamental property exploited in the map-reduce model, but folds do not cover all functions that can be solved in this way. To provide a better classification we generalize.

## Monoid morphisms

To define the concept of a monoid morphism, we pair the distributive law mentioned above with folds behaviour on the empty list, which by defintion returns the empty element of the target monoid.

fold([]) = e
fold(xs ++ ys) = fold(xs) ⊗ fold(ys)


We say that folds respects monoid structure, because they map the identity element of lists ([]) to identity elements in their domains(e), and they map monoid compositions (++) to compositions in the target monoid (⊗).

A function that respects monoid structure is called a monoid morphism. Folds then, are monoid morphisms from the list monoid to another.

In general, monoid morphisms need not be from the list monoid. In general h is a monoid morphism if it satisfies

h(e) = f
h(a ⊕ b) = h(a) ⊗ h(b)


for some source monoid (M, e, ⊕) to a target monoid (N, f, ⊗).

As we have seen sum is a fold and thus a monoid morphism, in this case targetting the monoid of numbers with addition. Another morphism with the same target monoid is length. It is a monoid morphism as it also respects monoid structure.

length([]) = 0
length(xs ++ ys) = length(xs) + length(ys)


Length is of course also another example of a function that is computable in parallel (albeit not a very interesting one). It is not a fold however, and doesn’t even “type-check” as such.

For some list, e.g. [4, 6, 1], we can apply the distributive laws for sum and length over and over until we get to single-element lists.

sum([4, 6, 1]) = sum() + sum() + sum()
length([4, 6, 1]) = length() + length() + length()


We can always make an argument of this type. It must then be the case that the difference between sum and length is only really in how they behave on single-element lists.

sum([x]) = x
length([x]) = 1


As there is nothing special about sum or length we can generalize:

Theorem A monoid morphism from lists is determined uniquely by its target monoid and its behaviour on single-element lists.

A monoid morphism that both starts and ends in the list monoid, is map(f), the higher-order function that maps a function f over each element of a list.

map(f)([]) = []
map(f)(xs ++ ys) = map(f)(xs) ++ map(f)(ys)


map(f) is thus another parallelizable function, that also happen to be a monoid morphism. By our previous discussion, it also possible to define map(f) as the unique monoid morphism from lists to lists satisfying

map(f)([x]) = [f(x)]


Since any possible behaviour on single element lists can be expressed by some function f, we see that.

Theorem Any monoid morphism from lists can be written on the form

fold ∘ map(f)


for some function f, clearly providing some validity to Map-Reduce as a computational model — it covers completely the set of functions "naturally" parallelizable through the distributive law defining monoid morphisms.

There is a way to extend any semigroup (S, ⊗) into a monoid. We simply add to its underlying type another value, that we’ll call None. It’s composition will be the same as ⊗, except for if either side is None, in which case we’ll make None an identity by defintion.

None ⊗₊ x = x
x ⊗₊ None = x
x ⊗₊ y = x ⊗ y  // otherwise


This construction is simply the Option or Maybe type, along with a suitably defined monoid structure.

Now we can define safeHead and safeLast as the folds of ⨭₊ and ⨮₊. For instance safeHead is the fold

fold([]) = None
fold([1, 2, 3])
= 1 ⨭₊ 2 ⨭₊ 3
= 1


Creating “safe” functions on lists can be seen as correcting a mismatch in structure between lists (that are monoids), and semigroups (that are not).

The fact that we chose maximum([]) = -Infinity is a similar correction, in fact it is of exactly the same form, except we named None as -Infinity.

Functions and maps are monoids if their domain is a monoid, where composition is performed pointwise.

function composeFunctions(f,g) {
return x => f(x) ⊗ g(x);
}


we call this the pointwise lifting of the monoid over the range.

Frequency maps are an example of this construction, they are the pointwise lifted additive monoid on numbers (Number, + 0).

Exercise Consider the semigroup of the set { LESS, GREATER } with composition ⨮. Define the monoid of comparators starting with this semigroup, and using the Option and pointwise lifting constructions.

Exercise Counting the votes in an election is a good real-word example of a parallelizable problem. Define a monoid morphism from a list of votes to some monoid giving the election results. Define the target monoid as the pointwise lift of another monoid.

Exercise Show that the fundamental theorem of arithmetic induces a monoid morphism from (Number, *, 1) to the monoid (Number, +, 0) lifted pointwise over the prime numbers.