The Monoid pattern is simply the combination of the two patterns
Identity Element
and Semigroup. A
monoid therefore is a datatype with composition ⊗
and element e
,
satisfying
x ⊗ e = x
e ⊗ x = x
x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z
Some understanding of these patterns is assumed in this article. If you feel like you need intuition on what these equations mean, read the entries on these individual patterns before continuing.
The monoid pattern models the many structures that are semigroups and also have identity elements. In such situations, it is often convenient to consider these patterns in concert in order to derive elegant models and laws.
For instance the semigroup of lists have the empty list []
as
identity. Treating lists as a semigroup only often result in less
elegant laws where the empty list has to be treated as a separate
case.
In this article we’ll give some examples of monoids and develop some models suitable for problem solving in Map-Reduce style programming models.
To refer to a particular monoid we take the triple of its type,
composition and identity. For instance (Number, +, 0)
is the monoid
of numbers with addition.
For any monoid we can define a function called fold
. It takes a list
of elements of that monoid to their “product”. For the monoid
(Number, +, 0)
, we define fold
(by example) as
fold([]) = 0
fold([1]) = 1
fold([5, 6, 3, 1])
= 5 + 6 + 3 + 1
= 15
The fold
function simply inserts the monoid composition (in our case
+
) between each element. For the empty list it returns the identity
element (0
). The fold
for the monoid (Number, +, 0)
then is
just the sum
function.
Let’s repeat the construction above with a different monoid,
(Number, max, -∞)
. In this case we get
fold([]) = -Infinity
fold([10]) = 10
fold([9, 6, 5, 12])
= 9 max 6 max 5 max 12
= 12
so the fold
for this monoid is the maximum
function which finds
the largest element in a list, and returns its identity -∞
for the
empty list.
For the boolean monoid (Bool, &&)
fold
is the every
function
fold([true, false, true])
= true && false && true
= false
fold([]) = true
which checks if all elements in a list are true.
and for (Bool, ||)
we get the some
function
fold([true, false, true])
= true || false || true
= true
fold([]) = false
which checks if some element is true.
Another two interesting examples of folds are the head
and last
functions that find the first and last element of a list
respectively. These arise out of the semigroup operations ⨮
and ⨭
defined as.
x ⨭ y = x
x ⨮ y = y
which simply discard one of their arguments.
head
then is the function
fold([x, y, z])
= x ⨭ y ⨭ z
= x
and last
is the function
fold([x, y, z])
= x ⨮ y ⨮ z
= z
Unfortunately we can not give meaning to the expression
fold([])
. This is because ⨮
and ⨭
define semigroups that are not
monoids, so these functions err on the empty list. This illustrates
the problem of working with semigroups only, when our domain of study
are lists.
The relation of folds
to the map-reduce programming model and
parallel computation in general can be captured in the fact that they
satisfy the following distributive law.
fold(xs ++ ys) = fold(xs) ⊗ fold(ys)
For a list that is the concatenation of lists xs
and ys
,
fold(xs)
and fold(ys)
could be computed on different machines, or
CPU cores, so such a law is a suitable condition for when a problem
can be solved in a distributed or parallel way. At the end the two
partial solutions are re-combined using the monoid composition ⊗
,
and this law then states that this behaves “as if the problem was
solved sequentially”, by folding the entire list in sequence.
Since sum
and maximum
are both folds, they can be computed in
parallel. The distributive law is these cases become
sum([1, 2, 3, 4, 5, 6])
= sum([1, 2, 3] ++ [4, 5, 6])
= sum([1, 2, 3]) + sum([4, 5, 6])
maximum([9, 6, 5, 12])
= maximum([9, 6] ++ [5, 12])
= maximum([9, 6]) `max` maximum([5, 12])
Of course, such a law can be repeatedly applied
sum([1, 2, 3, 4, 5, 6]) =
sum([1, 2]) + sum([3, 4, 5, 6]) =
sum([1, 2]) + sum([3, 4]) + sum([5, 6])
to distribute such a problem to any number of machines or cores.
Note that the requirement for an identity element arises naturally out of such a law:
fold(xs) = fold(xs ++ []) = fold(xs) ++ fold([])
fold(xs) = fold([] ++ xs) = fold([]) = fold(xs)
the value fold([])
must be such that it is an identity element for
the range of fold
, providing further evidence that the concept of a
monoid is a natural extension of that of a semigroup when dealing with
possibly empty lists.
The distributive law above is the fundamental property exploited in
the map-reduce model, but fold
s do not cover all functions that can
be solved in this way. To provide a better classification we generalize.
To define the concept of a monoid morphism, we pair the distributive
law mentioned above with fold
s behaviour on the empty list, which by
defintion returns the empty element of the target monoid.
fold([]) = e
fold(xs ++ ys) = fold(xs) ⊗ fold(ys)
We say that fold
s respects monoid structure, because they map the
identity element of lists ([]
) to identity elements in their
domains(e
), and they map monoid compositions (++
) to
compositions in the target monoid (⊗
).
A function that respects monoid structure is called a monoid morphism. Folds then, are monoid morphisms from the list monoid to another.
In general, monoid morphisms need not be from the list monoid. In general
h
is a monoid morphism if it satisfies
h(e) = f
h(a ⊕ b) = h(a) ⊗ h(b)
for some source monoid (M, e, ⊕)
to a target monoid (N, f, ⊗)
.
As we have seen sum
is a fold and thus a monoid morphism, in this
case targetting the monoid of numbers with addition. Another morphism
with the same target monoid is length
. It is a monoid morphism as it
also respects monoid structure.
length([]) = 0
length(xs ++ ys) = length(xs) + length(ys)
Length is of course also another example of a function that is
computable in parallel (albeit not a very interesting one). It is not
a fold
however, and doesn’t even “type-check” as such.
For some list, e.g. [4, 6, 1]
, we can apply the
distributive laws for sum
and length
over and over until we
get to single-element lists.
sum([4, 6, 1]) = sum([4]) + sum([6]) + sum([1])
length([4, 6, 1]) = length([4]) + length([6]) + length([1])
We can always make an argument of this type. It must then be the case
that the difference between sum
and length
is only really in how
they behave on single-element lists.
sum([x]) = x
length([x]) = 1
As there is nothing special about sum
or length
we can
generalize:
Theorem A monoid morphism from lists is determined uniquely by its target monoid and its behaviour on single-element lists.
A monoid morphism that both starts and ends in the list monoid, is
map(f)
, the higher-order function that maps a function f
over
each element of a list.
map(f)([]) = []
map(f)(xs ++ ys) = map(f)(xs) ++ map(f)(ys)
map(f)
is thus another parallelizable function, that also happen to
be a monoid morphism. By our previous discussion, it also possible to
define map(f)
as the unique monoid morphism from lists to lists
satisfying
map(f)([x]) = [f(x)]
Since any possible behaviour on single element lists can be
expressed by some function f
, we see that.
Theorem Any monoid morphism from lists can be written on the form
fold ∘ map(f)
for some function f
, clearly providing some validity to Map-Reduce
as a computational model — it covers completely the set of functions
"naturally" parallelizable through the distributive law defining
monoid morphisms.
There is a way to extend any semigroup (S, ⊗)
into a monoid. We
simply add to its underlying type another value, that we’ll call
None
. It’s composition will be the same as ⊗
, except for if either
side is None
, in which case we’ll make None
an identity by defintion.
None ⊗₊ x = x
x ⊗₊ None = x
x ⊗₊ y = x ⊗ y // otherwise
This construction is simply the Option
or Maybe
type, along with a
suitably defined monoid structure.
Now we can define safeHead
and safeLast
as the folds of ⨭₊
and
⨮₊
. For instance safeHead
is the fold
fold([]) = None
fold([1, 2, 3])
= 1 ⨭₊ 2 ⨭₊ 3
= 1
Creating “safe” functions on lists can be seen as correcting a mismatch in structure between lists (that are monoids), and semigroups (that are not).
The fact that we chose maximum([]) = -Infinity
is a
similar correction, in fact it is of exactly the same form, except we
named None
as -Infinity
.
Functions and maps are monoids if their domain is a monoid, where composition is performed pointwise.
function composeFunctions(f,g) {
return x => f(x) ⊗ g(x);
}
we call this the pointwise lifting of the monoid over the range.
Frequency maps are an example of this construction, they are the pointwise
lifted additive monoid on numbers (Number, + 0)
.
Exercise Consider the semigroup of the set { LESS, GREATER }
with
composition ⨮
. Define the monoid of comparators starting with this
semigroup, and using the Option
and pointwise lifting constructions.
Exercise Counting the votes in an election is a good real-word example of a parallelizable problem. Define a monoid morphism from a list of votes to some monoid giving the election results. Define the target monoid as the pointwise lift of another monoid.
Exercise Show that the fundamental theorem of arithmetic induces a
monoid morphism from (Number, *, 1)
to the monoid (Number, +, 0)
lifted pointwise over the prime numbers.