Oleksandr Manzyuk's Blog

Musings of a mathematician and aspiring programmer

Ariadne plugin for Emacs

I am pleased to announce ariadne-el, an Emacs interface to Ariadne.

Ariadne is a new tool written by Roman Cheplyaka that provides the go-to-definition functionality for Haskell code. It is designed as a server that responds to queries from IDEs and text editor plugins transmitted via the BERT-RPC protocol over TCP. The server is implemented in Haskell using Roman’s haskell-names name resolution library. It is fully Haskell-aware and can properly locate prefixed names (like T.head) and locally bound names, which makes it quite a bit smarter than TAGS. Ariadne is in an early development stage, and as of v0.1.2 it can only find definitions in the current file, but I am sure this and other limitations will be lifted in future versions.

ariadne-el allows you to communicate with Ariadne from Emacs. It uses my own bert-el, BERT serialization library for Emacs, for encoding/decoding BERT-RPC protocol messages. I have to admit that I never worked with networking facilities of Emacs before, so I shamelessly borrowed the design of low-level networking from SLIME.

If you are a Haskell programmer and an Emacs user, I encourage you to give Ariadne and ariadne-el a try. The installation is pretty straightforward:

  1. Install Ariadne and run the ariadne-server executable.
  2. Install bert-el and ariadne-el by cloning the Git repositories and adding their paths to load-path. If you use Marmalade, you can also obtain both packages by running M-x package-install ariadne.
  3. Add the following lines to your .emacs file:
(require 'ariadne)
(add-hook 'haskell-mode-hook
          (lambda ()
            (define-key haskell-mode-map "\C-cd" 'ariadne-goto-definition)))

Now try to open any Haskell source file, put the cursor on any name, and press C-c d.

Co-Yoneda Lemma

The purpose of this humble blog post is to understand this insightful comment by Edward Kmett to the question “Is operational really isomorphic to a free monad?” posted at StackOverflow.

Consider the following data type definition in Haskell, which can also be found in Edward Kmett’s package kan-extensions:

data CoYoneda f a = forall b. CoYoneda (b -> a) (f b)

In Haskell, a forall before the data constructor means existential rather than universal quantification, so that CoYoneda f a is isomorphic to the type \exists b. (b\to a, f\, b). The forall hints at the polymorphic type of the constructor

CoYoneda :: forall b. (b -> a) -> f b -> CoYoneda f a

Every polymorphic function satisfies a theorem that can be extracted by reading its type as a relation. This is referred to as parametricity. The reader is referred to Wadler’s classic paper “Theorems for free!” for more details.

Suppose that f is a functor. Then parametricity implies the following free theorem for the function CoYoneda: if g :: b -> a and y :: f c, then CoYoneda (g . h) y == CoYoneda g (fmap h y) for any h :: c -> b.

Consider the following two functions:

phi :: f a -> CoYoneda f a
phi x = CoYoneda id x

psi :: Functor f => CoYoneda f a -> f a
psi (CoYoneda g y) = fmap g y

Then obviously psi . phi == id:

(psi . phi) x == fmap id x == x

The above free theorem implies also that phi . psi == id:

(phi . psi) (CoYoneda g y) == CoYoneda id (fmap g y) == CoYoneda g y

We conclude that CoYoneda f is isomorphic f. This is a Haskell version of the co-Yoneda Lemma.


The only thing that is missing in Wadler’s paper is how to associate a relation with the type f b. Wadler only explains this in the case of the list functor. We assume that the Haskell functor f is modeled by a functor F on the category of sets. Suppose that \mathcal{B}: B \Leftrightarrow B' is a relation associated with the type b. That is, \mathcal{B} is a subset of B\times B'. Let \iota: \mathcal{B}\hookrightarrow B\times B' denote the inclusion map. We define the relation associated with the type f b as the image of the composite \langle F\pi, F\pi'\rangle\circ F\iota: F(\mathcal{B})\to FB\times FB', where \pi: B\times B'\to B and \pi': B\times B'\to B' are the canonical projections.

For example, if f = [], so that F=(-)^* is the list functor, then the above definition says that two lists xs \in B^* and ys \in B'^* are related if there exists a list zs\in \mathcal{B}^* \subset (B\times B')^* such that

\displaystyle  \begin{array}{rcl} map\; fst\; zs & \equiv & xs,\\ map\; snd\; zs & \equiv & ys. \end{array}

In other words, xs and ys are related if they have the same length and corresponding elements are related, which coincides with Wadler’s definition.

It follows from this definition that if \mathcal{B} is the relation associated with a function g: B\to B', then F(\mathcal{B}) is the relation associated with the function Fg. Indeed, if \mathcal{B} is the graph of g, i.e., the image of the map \langle\mathrm{id}_B, g\rangle: B\to B\times B', then F(\mathcal{B}) is the image of the composite \langle F\pi, F\pi'\rangle\circ F(\langle\mathrm{id}_B, g\rangle), which coincides with \langle F\mathrm{id}_B, Fg\rangle = \langle \mathrm{id}_{FB}, Fg\rangle. Therefore, F(\mathcal{B}) coincides with the graph of Fg.

Category Theory Helpline

One of my colleagues, Ben Moseley, has recently asked an interesting category theoretic question. I’ve decided to write up my answer in a blog post.


This blog post is a literate Haskell program. First, we are going to enable a few language extensions and import a few modules:

> {-# LANGUAGE FlexibleInstances     #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE TupleSections         #-}
> {-# LANGUAGE TypeOperators         #-}
> import Control.Arrow
> import Data.Traversable
> import Prelude hiding (sequence)

Next, we are going to recall the definitions of the fixed point of a functor f, the free monad generated by f, and the cofree comonad generated by f:

> newtype Fix f = Fix { unFix :: f (Fix f) }
> data Free f a = Pure a | Free (f (Free f a))
> data Cofree f a = a :< f (Cofree f a)

All three definitions are fairly standard. The last two can be found, for example, in Edward Kmett’s package free.

Fix f is both an initial algebra and final coalgebra for f: the f-algebra structure is given by Fix :: f (Fix f) -> Fix f, and the f-coalgebra structure is given by unFix :: Fix f -> f (Fix f).

The initiality means that for each f-algebra alg :: f a -> a there exists a unique f-algebra morphism cata alg :: Fix f -> a, called catamorphism. cata alg being an f-algebra morphism means that the following equation holds:

cata alg . Fix == alg . fmap (cata alg)

from which we can read off the definition of cata alg:

> cata :: Functor f => (f a -> a) -> Fix f -> a
> cata alg = alg . fmap (cata alg) . unFix

Similarly, the finality means that for each f-coalgebra coalg :: a -> f a there exists a unique f-coalgebra morphism ana coalg :: a -> Fix f, called anamorphism. ana coalg being an f-coalgebra morphism means that the following equation holds:

unFix . ana coalg == fmap (ana coalg) . coalg

from which we can read off the definition of ana coalg:

> ana :: Functor f => (a -> f a) -> a -> Fix f
> ana coalg = Fix . fmap (ana coalg) . coalg

The original version of this blog post fell into the fallacy of claiming that every function from resp. to Fix f could be expressed a catamorphism resp. an anamorphism. This is obviously not the case. “When is a function a fold or an unfold?” by Jeremy Gibbons, Graham Hutton, and Thorsten Altenkirch gives necessary and sufficient conditions for when a set-theoretic function can be written as a catamorphism resp. an anamorphism (thanks to Ben for pointing this out). The fallacy doesn’t affect the calculations below. However, they become somewhat less natural than I originally thought.

The functions cata and ana are only two examples of the plethora of recursion schemes associated with the type Fix f. The types Free f a and Cofree f a also have associated recursion schemes:

> fold :: Functor f => (Either a (f b) -> b) -> Free f a -> b
> fold f m = f $ case m of
>   Pure x -> Left x
>   Free n -> Right $ fmap (fold f) n

> unfold :: Functor f => (b -> (a, f b)) -> b -> Cofree f a
> unfold f c = case f c of
>   (x, d) -> x :< fmap (unfold f) d

The question

Ben wrote the following functions:

> cofree2fix :: Functor f => Cofree f a -> Fix f
> cofree2fix = ana (\(_ :< fx) -> fx)

> fix2free :: Functor f => Fix f -> Free a
> fix2free = cata Free

> fix2cofree :: Functor f => (a, Fix f) -> Cofree f a
> fix2cofree (a, fx) = unfold (const a &&& unFix) fx

> free2fix :: Traversable f => Free f a -> Either a (Fix f)
> free2fix = fold (Left ||| (right Fix . sequence))


(&&&) :: (a -> b) -> (a -> c) -> a -> (b, c)


(|||) :: (b -> d) -> (c -> d) -> Either b c -> d

are the standard mediating morphisms for products and sums. They are imported from Control.Arrow. The function sequence is the generalized version from Data.Traversable.

Ben then wrote:

I’m puzzled by the last one… It all typechecks, but there’s a clear asymmetry between fix2cofree and free2fix which worries me…

The rest of this blog post is an attempt to explain what is going on here.

It’s fixed points all the way down

We first observe that Free f a is isomorphic to the fixed point of the functor Either a :. f, and Cofree f a is isomorphic to the fixed point of the functor (,) a :. f, where :. denotes functor composition (thanks to Roman Cheplyaka for suggesting this notation):

> newtype (f :. g) a = C { unC :: f (g a) }

> instance (Functor f, Functor g) => Functor (f :. g) where
>   fmap t (C x) = C $ fmap (fmap t) x

We can encode isomorphisms between types by means of a typeclass, Iso:

> class Iso a b where
>   iso :: a -> b
>   inv :: b -> a

The isomorphisms between Free f a and Fix (Either a :. f), and between Cofree f a and Fix ((,) a :. f) are then defined as follows:

> instance Functor f => Iso (Free f a) (Fix (Either a :. f)) where
>   iso = fold (Fix . C)
>   inv = cata (either Pure Free . unC)

> instance Functor f => Iso (Cofree f a) (Fix ((,) a :. f)) where
>   iso = ana (\(x :< fx) -> C (x, fx))
>   inv = unfold (unC . unFix)

Under this identification, the recursion schemes fold and unfold identify with cata and ana respectively.

We shall show now that the functions cofree2fix, fix2free, fix2cofree, and free2fix can naturally be expressed as cata- resp. anamorphisms of some naturally arising algebras resp. coalgebras over the functor f.


The finality of Fix f allows us to manufacture a function

Fix ((,) a :. f) -> Fix f

out of a function

Fix ((,) a :. f) -> f (Fix ((,) a :. f))

There is an obvious natural representative of the latter type, namely the composite snd . unC . unFix, leading to the following definition:

> cofree2fix' :: Functor f => Fix ((,) a :. f) -> Fix f
> cofree2fix' = ana (snd . unC . unFix)


The initiality of Fix f implies that each function

f (Fix (Either a :. f)) -> Fix (Either a :. f)

gives rise to a function

Fix f -> Fix (Either a :. f)

There is an obvious natural representative of the former type, namely the composite Fix . C . Right, which corresponds to the constructor Free under the isomorphism between Free f a and Fix (Either a :. f). Therefore, we can define

> fix2free' :: Functor f => Fix f -> Fix (Either a :. f)
> fix2free' = cata (Fix . C . Right)


By the finality of Fix ((,) a :. f), we obtain a function

(a, Fix f) -> Fix ((,) a :. f)

from each function

(a, Fix f) -> ((,) a :. f) (a, Fix f)

The type ((,) a :. f) (a, Fix f) is isomorphic to (a, f (a, Fix f)) by means of the constructor C. Therefore, we need to define a function of type

(a, Fix f) -> (a, f (a, Fix f))

By the universal property of product, any such function is necessarily of the form u &&& v for uniquely determined

u :: (a, Fix f) -> a


v :: (a, Fix f) -> f (a, Fix f)

There is an obvious candidate for u: the function fst (and, in fact, because we are after a polymorphic u, this is the only choice we have). Is there also a natural choice of v? The answer is yes, and the fundamental reason for that is that Haskell functors are strong. That is, an arbitrary functor f admits the following natural transformation, which in category theory is called the right tensorial strength:

> strength :: Functor f => (a, f b) -> f (a, b)
> strength (x, fy) = fmap (x,) fy

This allows us to define

> fix2cofree' :: Functor f => (a, Fix f) -> Fix ((,) a :. f)
> fix2cofree' = ana (C . (fst &&& strength . second unFix))


The initiality of Fix (Either a :. f) implies that each function

(Either a :. f) (Either a (Fix f)) -> Either a (Fix f)

gives rise to a function

Fix (Either a :. f) -> Either a (Fix f)

Therefore, in order to construct a function of the latter type it suffices to come up with a function of the former type.

The type (Either a :. f) (Either a (Fix f)) is isomorphic to Either a (f (Either a (Fix f))) by means of C, therefore we are after a function of type

Either a (f (Either a (Fix f))) -> Either a (Fix f)

By the universal property of sum, any such function is necessarily of the form p ||| q with uniquely determined

p :: a -> Either a (Fix f)


q :: f (Either a (Fix f)) -> Either a (Fix f)

We have an obvious candidate for p: the function Left (and because p has to be polymorphic, this is the only choice we have). By analogy with the previous case, we might expect that there always exists a natural transformation of type

Functor f => f (Either a b) -> Either a (f b)

Alas, this is not the case. There is, however, a large class of functors that do admit a natural transformation of this type: Traversable functors, and the required function is sequence.

Therefore, for Traversable functors we can define

> free2fix' :: Traversable f => Fix (Either a :. f) -> Either a (Fix f)
> free2fix' = cata ((Left ||| (right Fix . sequence)) . unC)


The short answer to Ben’s question is this: fix2cofree works for any functor because all functors in Haskell are strong: for each functor f, there is a natural transformation strength :: Functor f => (a, f b) -> f (a, b), called the right tensorial strength, subject to coherence axioms.

The fundamental fact underlying the existence of strength is that, categorically, every Haskell functor is enriched: its action on morphisms is given not merely as a map from the set of functions a -> b to the set of functions f a -> f b, but as a morphism (in the category Hask) from the internal object of morphisms from a to b, the type a -> b, to the internal object of morphisms from f a to f b, the type f a -> f b. This morphism is fmap, of course. There is a one-to-one correspondence between enrichments of a functor F from a cartesian closed category \mathcal{C} to itself and right tensorial strengths on F. For more information on strong functors, see this nLab web page, or refer to the original paper by Anders Kock.

On the other hand, not every Haskell functor f admits a dual natural transformation Functor f => f (Either a b) -> Either a (f b). Traversable functors do, but the former condition is weaker, I think.

Update: User sclv on reddit suggested that (up to the order of summands) the function f (Either a b) -> Either a (f b) is called costrength in Edward Kmett’s post.

Gröbner bases in Haskell: Part II

In the previous post we discussed the representation of variables, monomials, and monomial orderings in a toy EDSL for polynomial computations groebner. In this post, we are going to consider how polynomials are represented in the EDSL and implement the Gröbner basis algorithm.


A term is a monomial times a coefficient (an element of some ground ring r):

data Term r v o = T r (Monomial v o) deriving (Eq, Show)

Terms as well as monomials form a monoid:

instance (Num r, Ord v) => Monoid (Term r v o) where
    mempty = T 1 mempty
    T a m `mappend` T b n = T (a * b) (m `mappend` n)

Polynomials are represented as lists of non-zero terms, ordered in descending order by the their monomials with respect to some monomial order. This makes equality test and extraction of the leading monomial very simple and cheap:

newtype Polynomial r v o = P [Term r v o] deriving Eq

lm :: Polynomial r v o -> Monomial v o
lm (P ((T _ m):_)) = m
lm (P [])          = error "lm: zero polynomial"

The relative order of terms does not change if the polynomial is multiplied by a term:

(*^) :: (Num r, Ord v) => Term r v o -> Polynomial r v o -> Polynomial r v o
u *^ P vs = P [ u `mappend` v | v <- vs ]

Polynomials have degree: the degree of the zero polynomial is usually defined to be -1, and the degree of a non-zero polynomial is the maximum of the degrees of its monomials:

instance Ord v => HasDegree (Polynomial r v o) where
    degree (P []) = -1
    degree (P ts) = maximum [ degree m | T _ m <- ts ]

We are trying to make the display of polynomials as close to the mathematical notation as possible. Because we don’t know what the ground ring r can be, we apply some heuristics:

instance (Eq r, Show r, Num r, Ord v, Show v) => Show (Polynomial r v o) where
    show (P [])     = "0"
    show (P (t:ts)) = showHead t ++ showTail ts
          showHead (T c m) = prefix ++ show m
                prefix = case c of
                           1  -> ""
                           -1 -> "-"
                           _  -> show c
          showTerm (T c m) = prefix ++ show m
                prefix = case signum c of
                           1  -> '+':a
                           -1 -> '-':a
                           _  -> "(" ++ show c ++ ")"
                a = if abs c == 1 then "" else show (abs c)
          showTail = concatMap showTerm

Arithmetic operations on polynomials are defined to preserve the invariant of the representation of polynomials:

instance (Eq r, Num r, Ord v, Show v, Ord (Monomial v o))
    => Num (Polynomial r v o) where
    f@(P (u@(T a m):us)) + g@(P (v@(T b n):vs))
        | m == n && a + b /= 0
        = let P ws = P us + P vs in P $ T (a + b) m:ws
        | m == n && a + b == 0
        = P us + P vs
        | m < n
        = let P ws = f + P vs in P $ v:ws
        | otherwise
        = let P ws = P us + g in P $ u:ws
    f + P [] = f
    P [] + g = g

    P (u:us) * P (v:vs)
        = let P ws = P us * P vs + u *^ P vs + v *^ P us
          in P $ (u `mappend` v):ws
    _ * P [] = P []
    P [] * _ = P []

    negate (P ts) = P $ [ T (negate a) m | T a m <- ts ]
    -- Inclusion of 'abs' and 'signum' into 'Num' was a stupid idea.
    abs _ = error "abs is undefined for polynomials"
    signum _ = error "signum is undefined for polynomials"
    fromInteger = constant . fromInteger

Constants can be viewed as polynomials (of degree 0 unless the constant is 0):

constant :: (Eq r, Num r, Ord v) => r -> Polynomial r v o
constant 0 = P []
constant c = P [T c mempty]

Variables can be viewed as polynomials of degree 1:

variable :: (Num r, Eq v) => v -> Polynomial r v o
variable x = P [T 1 (inject x)]

Suppose that f and g are two polynomials over a field r, and let x^\alpha and x^\beta be the leading monomials of f and g respectively. Let x^\gamma be the least common multiple of x^\alpha and x^\beta. Then the s-polynomial of f and g is defined to be

\displaystyle \mathrm{spoly}(f, g) = x^{\gamma-\alpha}f - \frac{\mathrm{lc}(f)}{\mathrm{lc}(g)}x^{\gamma-\beta}g

where \mathrm{lc} denotes the leading coefficient of a polynomial. In Haskell:

spoly :: (Eq r, Fractional r, Ord v, Show v, Ord (Monomial v o))
      => Polynomial r v o -> Polynomial r v o -> Polynomial r v o
spoly f@(P (u@(T a m):us)) g@(P (v@(T b n):vs)) = n' *^ f - m' *^ g
      n' = T 1       (complement m n)
      m' = T (a / b) (complement n m)

Normal Forms and Gröbner Bases

A normal form of a polynomial f with respect to a list s of polynomials is essentially the remainder from multivariate division of f by polynomials from the list s. It is computed using Buchberger’s algorithm. Instead of explaining the algorithm in words, I’ll let the code speak for itself. The following definition is in fact very close to the pseudo-code that can be found, for example, in A Singular Introduction to Commutative Algebra:

nf :: (Eq r, Fractional r, Ord v, Show v, Ord (Monomial v o))
   => Polynomial r v o -> [Polynomial r v o] -> Polynomial r v o
nf f s = go f
      go h | h == 0      = 0
           | []    <- s' = h
           | (g:_) <- s' = go (spoly h g)
             s' = [g | g <- s, lm h `isDivisibleBy` lm g]

The function groebner implements the Gröbner basis algorithm. It takes a list of generators of an ideal and returns a Gröbner basis of that ideal:

groebner :: (Eq r, Fractional r, Ord v, Show v, Ord (Monomial v o))
         => [Polynomial r v o] -> [Polynomial r v o]
groebner i = go i ps
      ps = [(f, g) | f <- i, g <- i, f /= g]
      go s [] = s
      go s ps@((f, g):ps')
          | h == 0    = go s ps'
          | otherwise = go (h:s) (ps' ++ [(h, f) | f <- s])
            h = nf (spoly f g) s

Product Criterion

The product criterion allows us to decrease the number of pairs that have to be considered by the Groebner basis algorithm. The criterion says that if the least common multiple of the leading monomials of f and g is their product, then the s-polynomial of f and g reduces to 0 with respect to the set \{f, g\}, and hence the pair (f, g) can be dropped. We implement a function pc that tests if two polynomials f and g satisfy the product criterion as follows:

pc f g = null (variables (lm f) `intersect` variables (lm g))

Note that instead of computing the least common multiple of the leading monomials of f and g we check if the sets of variables occurring in each of the monomials are disjoint.

The optimized version of the Gröbner basis algorithm reads as follows:

groebner :: (Eq r, Fractional r, Ord v, Show v, Ord (Monomial v o))
         => [Polynomial r v o] -> [Polynomial r v o]
groebner i = go i ps
      ps = [(f, g) | f <- i, g <- i, f /= g, not (pc f g)]
      go s [] = s
      go s ps@((f, g):ps')
          | h == 0    = go s ps'
          | otherwise = go (h:s) (ps' ++ [(h, f) | f <- s, not (pc h f)])
            h = nf (spoly f g) s

Declaring Variables

Having to define an enumeration type and to write an Enumerable instance for it in order to declare variables is tedious. Here we address this problem.

First, let us define the sum of types a and b:

data a :<: b = Inl a | Inr b deriving (Eq, Ord)
infixr 6 :<:

instance (Show a, Show b) => Show (a :<: b) where
    show (Inl x) = show x
    show (Inr x) = show x

It will become clear shortly why we have chosen to denote the sum by :<:. If both a and b are enumerable, then so is the sum of a and b:

instance (Enumerable a, Enumerable b) => Enumerable (a :<: b) where
    enumerate = map Inl enumerate ++ map Inr enumerate

Note that the elements of a are enumerated before the elements of b.

We can now define each variable as a singleton type:

data X = X
data Y = Y
data Z = Z

and join these types into X :<: Y :<: Z. Each of the types X, Y, Z is Enumerable:

instance Enumerable X where enumerate = [X]
instance Enumerable Y where enumerate = [Y]
instance Enumerable Z where enumerate = [Z]

Hence, the type X :<: Y :<: Z is Enumerable too. It is isomorphic to

data XYZ = X | Y | Z

However, the elements of X :<: Y :<: Z are somewhat unwieldy to write: Inl X, Inr (Inl Y), and Inr (Inr Z). We solve this problem by the trick used in Data types à la carte. We introduce the following typeclass that expresses the fact that a type a is a subtype of a type b:

class Sub a b where
    inj :: a -> b

Instead of writing the injections using Inl and Inr, the injections will be inferred using this typeclass. The Sub typeclass has only three instances:

a is a subtype of a:

instance Sub a a where
    inj = id

a is also a subtype of the sum a :<: b:

instance Sub a (a :<: b) where
    inj = Inl

Finally, if a is a subtype of c, then a is also a subtype of the sum b :<: c:

instance Sub a c => Sub a (b :<: c) where
    inj = Inr . inj

These instances require quite a few GHC extensions including OverlappingInstances (in fact, we have implicitly used many GHC extensions above), but as with data types à la carte, this shouldn’t result in an unexpected behaviour, provided that user never explicitly nests sums.

The following function allows us to view a variable from a set v as a polynomial over any wider set of variables w:

var :: (Sub v w, Ord (Monomial w o), Num r, Eq w) => v -> Polynomial r w o
var = variable . inj

We can now introduce

x = var X
y = var Y
    z = var Z

Thus, x is a polynomial over any ring r, over any set of variables w containing X, with respect to any monomial ordering:

*Main> :t x
x :: (Eq w, Num r, Ord (Monomial w o), Sub X w) => Polynomial r w o

Ditto for y and z.

Consequently, for example, x * y + z is a polynomial over any ring r, over any set of variables w containing X, Y, and Z, in particular it is a polynomial over the set of variables X :<: Y :<: Z.

Writing separate definitions for X, Y, Z and Enumerable instances is still tedious. Fortunately, we can write a Template Haskell macro allowing us to write

$(defineVariables ["X", "Y", "Z"])

See Variable.hs for details.


Consider the ideal generated by polynomials x^{10} + x^9y^2 and y^8 - x^2y^7:

ideal :: Ord (Monomial (X :<: Y) o) => [Polynomial Rational (X :<: Y) o]
ideal = [x ^ 10 + x ^ 9 * y ^ 2, y ^ 8 - x ^ 2 * y ^ 7]

Note that ideal is polymorphic in the monomial ordering. Let basis be the Gröbner basis of ideal:

basis :: Ord (Monomial (X :<: Y) o) => [Polynomial Rational (X :<: Y) o]
basis = groebner ideal

We can now compute basis with respect to different monomial orderings:

*Main> basis :: [Polynomial Rational (X :<: Y) Lex]
*Main> basis :: [Polynomial Rational (X :<: Y) RevLex]
*Main> basis :: [Polynomial Rational (X :<: Y) DegLex]
*Main> basis :: [Polynomial Rational (X :<: Y) DegRevLex]

Gröbner bases in Haskell: Part I


Although I am primarily a category theorist (my PhD research was at the border between category theory and homological algebra), I also studied algebraic geometry and computer algebra for my master degree at the University of Kaiserslautern. Computer algebra is a diverse subject encompassing many fields of symbolic computations. Our lectures in Kaiserslautern focused on computations in polynomial rings with an eye towards applications to algebraic geometry.

Functional programming languages are a particularly good fit for symbolic computations. My goal in this blog post is to illustrate this by showing you an implementation in Haskell of Buchberger’s algorithm for computing Gröbner bases of polynomial ideals.

Haskell is also a great host language for embedded domain-specific languages. To illustrate this, we are going to implement Buchberger’s algorithm as part of a tiny EDSL for polynomial computations.

Before we dive in, I would like to make a couple of remarks about the desiderata that guided the implementation.

First, my goal was to design a toy EDSL, not a fully-fledged computer algebra system, with the syntax that is as close to the mathematical notation as possible, possibly at the cost of efficiency. Don’t expect this implementation to be competitive with the advanced computer algebra systems like Singular, Macaulay, Maple etc.

Second, the whole point of using Haskell, and not say Common Lisp, as the host language is to leverage Haskell’s highly expressive type system. For example, in the computer algebra system Singular, developed in Kaiserslautern, polynomial rings are described by three pieces of information: the ground field, list of variables, and monomial ordering. In my EDSL this information is encoded in types, so that polynomials over different fields, or in different sets of variables, or of different monomial orderings are values of different types and can’t be accidentally mixed in programs. This particular desideratum has influenced many design choices we are about to discuss.

Today we are going to look at how variables, monomials, and monomial orderings are represented, leaving the representation of polynomials and actual implementation of the Gröbner basis algorithm for the next post. The complete code of the EDSL is available at GitHub.


Somewhat unconventionally, I have chosen to represent variables by enumeration types. For example, the set of three variables X, Y, Z is represented by the following data type:

data XYZ = X | Y | Z deriving (Eq, Ord, Show)

Every constructor is nullary and simply names a variable. If you want to compute with polynomials in n variables, you first need to define an enumeration type with n constructors, e.g., if n = 5:

data Vars = X1 | X2 | X3 | X4 | X5 deriving (Eq, Ord, Show)

Defining enumeration types by hand is arguably tedious and ugly (having to invent names for enumeration types is particularly obnoxious). We shall see later how we can make declaring variables more pleasant using some Template Haskell.


A monomial over a finite set of variables represented by an enumeration type v is represented internally by a Map from variables to integers (exponents):

newtype Monomial v o = M (Map v Int) deriving Eq

The type Monomial is a phantom type: the type variable o does not appear on the right hand side of the definition of Monomial. We are going to use o later as a tag allowing us to define different instances of the Ord typeclass (i.e., different monomial orderings) on Monomial.

We have a number of helper functions to construct monomials. A variable can be viewed as a monomial:

inject :: Eq v => v -> Monomial v o
inject x = M $ Map.singleton x 1

We can convert a monomial to a list of variable-exponent pairs, and we can build a monomial from such a list:

toList :: Ord v => Monomial v o -> [(v, Int)]
toList (M m) = [ p | p@(x, n) <- Map.toList m, n /= 0 ]

fromList :: Ord v => [(v, Int)] -> Monomial v o
fromList xs = M $ Map.fromList [ p | p@(x, n) <- xs, n /= 0 ]

Note that the variables with zero exponents are dropped. We can look up the exponent of a given variable in a monomial:

exponent :: Ord v => v -> Monomial v o -> Int
exponent x (M m) = fromMaybe 0 (Map.lookup x m)

We can also collect the variables occurring in a monomial with non-zero exponents:

variables :: Ord v => Monomial v o -> [v]
variables = map fst . toList

Monomials are shown as power products of variables:

instance (Ord v, Show v) => Show (Monomial v o) where
    show m
        | null support
        = "1"
        | otherwise
        = concat [ show x ++ suffix
                 | (x, n) <- support
                 , let suffix = if n == 1
                                then ""
                                else "^" ++ show n
          support = toList m

Monomials over a set of variables v are naturally a monoid:

instance Ord v => Monoid (Monomial v o) where
    mempty = M Map.empty
    M a `mappend` M b = M $ Map.unionWith (+) a b

Monomials have degree. Because polynomials also have degree, it is convenient to have an overloaded function degree:

class HasDegree a where
  degree :: a -> Int

The degree of a monomial is the sum of the exponents of its variables:

instance Ord v => HasDegree (Monomial v o) where
    degree (M m) = Map.fold (+) 0 m

We can test whether one monomial is divisible by another:

isDivisibleBy :: Ord v => Monomial v o -> Monomial v o -> Bool
isDivisibleBy (M a) (M b) = Map.isSubmapOfBy (<=) b a

We can divide one monomial by another:

div :: Ord v => Monomial v o -> Monomial v o -> Monomial v o
div (M a) (M b) = M $ Map.differenceWith sub a b
      sub x y | x > y     = Just (x - y)
              | otherwise = Nothing

We are going to use the function div only when one monomial is known to be divisible by the other, but the above definition gives a plausible answer also when this is not the case.

The least common multiple of monomials is defined as follows:

lcm :: Ord v => Monomial v o -> Monomial v o -> Monomial v o
lcm (M a) (M b) = M $ Map.unionWith max a b

Finally, complement m n computes the product of factors in n that are missing in m:

complement :: Ord v => Monomial v o -> Monomial v o -> Monomial v o
complement m n = lcm m n `div` m

Monomial Orderings

A monomial ordering on the set of monomials is a total ordering that is compatible with multiplication of monomials, i.e., m_1 \le m_2 implies m_1n \le m_2n.

We equip monomials Monomial v o over the set of variables v with different orderings by supplying different tags o:

instance (Show v, Enumerable v) => Ord (Monomial v Lex) where
    (<=) = lex
instance (Show v, Enumerable v) => Ord (Monomial v RevLex) where
    (<=) = revlex
instance (Show v, Enumerable v) => Ord (Monomial v DegLex) where
    (<=) = deglex
instance (Show v, Enumerable v) => Ord (Monomial v DegRevLex) where
    (<=) = degrevlex

Here Lex, RevLex, DegLex, and DegRevLex are empty data types:

data Lex         -- Lexicographic ordering
data RevLex      -- Reverse lexicographic ordering
data DegLex      -- Degree lexicographic ordering
data DegRevLex   -- Reverse degree lexicographic ordering

that are used as tags so that we can define different Ord instances on the Monomial type. Instead of making Monomial a phantom type, we could also define newtype wrappers around Map v Int, but then the definition of polynomials would have to become more involved.

The definitions of orderings are written in a slightly unintuitive style because they define (<=), not (>) or (<) as is customary. This is necessary because a minimal instance declaration of Ord requires either compare or (<=). In particular, if we define only (>), then the default implementation of (<=) isn’t in terms of (>) but in terms of compare, which in turn by default is defined in terms of (<=), leading to an infinite loop.

lex' :: (Ord v, Show v) => Monomial v o -> Monomial v o -> [v] -> Bool
lex' a b []     = True
lex' a b (x:xs) = exponent x a <= exponent x b
                  && (exponent x a /= exponent x b || lex' a b xs)

lex, revlex, deglex, degrevlex :: (Enumerable v, Show v)
                               => Monomial v o -> Monomial v o -> Bool
lex       a b = lex' a b enumerate
revlex    a b = lex' a b (reverse enumerate)
deglex    a b = degree a <= degree b
                && (degree a /= degree b || a `lex` b)
degrevlex a b = degree a <= degree b
                && (degree a /= degree b || b `revlex` a)

The definitions of orderings rely on the order of variables and the knowledge which variables can occur in monomials. We encode this information in the typeclass Enumerable:

class Ord a => Enumerable a where
    enumerate :: [a]

We have added the Ord constraint to the context in order to save some typing (and because it makes sense). For each type v of variables, we have to define an instance of Enumerable. For example:

data XYZ = X | Y | Z deriving (Eq, Ord, Show)

instance Enumerable XYZ where enumerate = [X, Y, Z]


That’s it for this time. Thanks for reading this far! In this blog post we have discussed how variables, monomials, and monomial orderings are represented in our EDSL. In the next post we shall consider how polynomials are represented in the EDSL and implement the Gröbner basis algorithm.

My Setup

On and off, I read interviews about people’s setups at usesthis.com. I particularly enjoyed the interviews of Andrew Huang, Benjamin Mako Hill, Phil Hagelberg, and Russ Cox. For amusement, I’ve decided to reflect on my own computer usage and compile a description of my setup.

Who are you, and what do you do?

I’ve been trained as a mathematician, but over the last a few years I’ve become more interested in programming, particularly functional and mathematically structured programming, and in theoretical computer science, most prominently programming languages theory and compilers. I’ve spent the last two years as a PhD student at NUIM thinking primarily about design and implementation of a novel functional programming language for scientific computing with support for automatic differentiation. I’m beginning a new job in industry in September. I’m going to work on a large commercial product written in Haskell, and I’m looking forward to it.

What hardware do you use?

My only machine is a Lenovo ThinkPad SL 500 laptop running a 2.00 GHz Intel Core 2 Duo T5870, 3Gb of RAM. It is a two years old machine, and its age is starting to show. Still, at the moment of purchase, it was probably the best laptop with no Windows pre-installed one could buy in Ukraine.

I have recently acquired a Kindle 4, non-touch edition, which I use for reading fiction. I have many papers and books in PDF format that I’d also like to read away from my computer, but I haven’t found an affordable device suitable for that.

And what software?

I run Ubuntu 10.04 (64 bit) on my laptop. I’ve been reluctant to upgrade because upgrades are know to break things and because this particular version has been working pretty well for me. When I want the latest version of some software, I build it from source. The following is a list of applications I use on a daily basis, in the loose order of descending importance.


When I’m working (and often also when I’m not), I’m spending most of my time in Emacs. Compared to other Emacs users, the severity of my addiction to Emacs is modest: Emacs is my text editor, organizer, file manager, and shell. I also have an Emacs interface to Google Translate, which I use as a poor man’s dictionary.

Emacs is one of few pieces of software that I build from source.


The second most used application is the browser. I’ve tried a few different ones: Firefox, Google Chrome, Opera, Chromium, Conkeror. I’m not happy with any of them. The browser I’ve enjoyed most was Conkeror, which is to browsers what Emacs is to editors. However, I had to abandon it because occasionally it would blow up and consume all available RAM and half of available swap, and because it reliably crashes when the laptop is awaken from suspend mode.

Right now I’m using Chromium, for no particular reason. I try to keep the number of open tabs low, otherwise Chromium as well as Google Chrome become a memory drain on my machine.


My window manager is xmonad. A lot has been said in praise of tiling window managers, in particular, about how they make you more productive than conventional desktop environments. I’m not sure I buy that.

I started to use xmonad mainly out of curiosity and because of the hype surrounding it, and also because it is implemented in Haskell, so getting into it was an opportunity to learn some Haskell. Later, when I was interning at Streamtech, a webdev startup from The Hague, I enjoyed the ability to easily tile the browser and a bunch of terminal windows on my screen. Still later, when I was doing PhD at NUIM, I discovered what a joy it was to use xmonad with a dual head setup.

However, right now I’m using it only on my laptop, and I’m not sure if it’s a such a big win there. Unlike other xmonad users, I’m not a terminal junkie. In fact, I don’t run terminals outside Emacs at all, except for the case when I want to ssh to a remote host. I also rarely tile my windows because I don’t have enough screen real estate for that. I run pretty much every application fullscreen all the time, with only a few exceptions. I do like the idea of having dedicated workspaces for different activities, but I rarely use more than 3 simultaneously.

Running pure xmonad on a laptop is also somewhat awkward because one has to think about many things DEs take care for you, for example, having different keyboard layouts and a layout indicator, having a system tray with the network manager applet and battery status, mounting devices etc. That’s why I run xmonad as a drop-in replacement for Metacity in Gnome. This way I get the best of both worlds.

However, I don’t like the direction in which Gnome is heading, and I’ve never liked KDE, so most likely I’m stuck with xmonad, and should I need to reinstall my system, I’ll probably go for some combination of xmonad and Gnome tools.


I use git as my version control system. I almost never use the command line interface. Instead, for frequent tasks like stage, commit, push, pull, create/change/delete a branch, and view diff/log I use magit, an awesome Emacs interface to git. For more complex git surgery there is gitk. I also use git-gui when I want to split a large commit into smaller chunks.


I use LaTeX for typesetting my papers. AUCTeX mode for Emacs is superb. I’m surprised to see people using both LaTeX and Emacs and not using AUCTeX, but rather invoking LaTeX, BibTeX, dvitops and other tools from a shell, either manually or using a makefile. Those people don’t know what they are missing.


I use Evince to view my PDF, PostScript, and DJVU files. Evince is not particularly configurable, and I hate that I have to change settings for every new open file, but it does its job sufficiently well that I don’t have an urge to switch to something better.

My only gripe about Evince is that its DVI viewer doesn’t support forward/inverse search.


That’s why I use xdvi when working with LaTeX. xdvi is old and ugly as hell, but it is fast and its forward/inverse search are killer features for me. Making inverse search work with Emacs used to be a PITA, but AUCTeX takes care of that, too.


I use Skype for VOIP. I like how much cleaner Skype looks on Linux compared to the bloated interface on Windows. For example, I was unable to share screen using Skype on my wife’s Windows laptop.


As I mentioned above, I bought myself a Kindle 4 recently. Because I didn’t want to deal with Amazon, I needed an application to conveniently upload e-books to my device and to occasionally convert between different formats. Calibre performs these tasks seamlessly.

Compilers and Interpreters

I like to explore programming languages, and I have a few compilers and interpreters installed to support my explorations.

I use GHC for Haskell, which is the top pick for my language design experiments. I use SBCL and SLIME for Common Lisp hacking, which I don’t do very often these days and only for small exploratory programs. I use MIT Scheme as my Scheme implementation only because that’s what my friend Alexey uses for his language design experiments.

I used to write some Ruby, but I haven’t done any Ruby hacking for quite a while now. I do have some Ruby scripts lying around that I use occasionally (for downloading BibTeX items from MathSciNet and ACM).

Soon I’m going to add Erlang to the list of language I use/play with. One (or both) of Standard ML and Caml are also interesting, primarily because being strict and non-pure makes programming in these language sufficiently different from programming in Haskell.

Well, that’s it folks. There are, of course, other pieces of software, but I don’t use them nearly as often as the above. I have a large collection of music, which I used to listen to using rhythmbox. These days, I prefer to tune in to Radio Paradise. I don’t do any photo editing except for extremely rare case when I have to resize or crop a picture. I use totem for watching videos, but most of the videos I watch are online. I’ve learned to love Emacs dired mode, and don’t start Nautilus, the default file manager in Gnome, except by accident. I don’t play games. I rarely run OpenOffice, mainly when I’m forced to edit/view Microsoft Office files. For quick document preparation, I prefer Google Docs.

Oh, there is a small Windows app that I run under Wine because I haven’t found a good Linux alternative: AP Guitar Tuner.

Non-Confluence of the Perturbative λ-Calculus

Over the last month, I have been trying to prove that the reduction relation defined in my MFPS paper “A Simply Typed Lambda-Calculus of Forward Automatic Differentiation” is confluent. Alas, this is not true. A counterexample is provided by the term \mathsf{T}\,(\lambda x.\;(\lambda y.\;f x y) x), where f is a free variable. There are two reduction sequences that produce two different normal forms:

\displaystyle \begin{array}{cl}   & \lambda x.\;\iota_1\,((\pi_1\,((\mathsf{T}\,f) x)) (\pi_2\,x)) \\ {} + & \lambda x.\;\iota_1\,(\pi_1\,((\mathsf{T}\,(f (\pi_2\,x))) x)) \\ {} + & \lambda x.\;\iota_2 ((f (\pi_2\,x)) (\pi_2\,x)), \\ \neq  & \lambda x.\;\iota_1\,((\pi_1\,((\mathsf{T}\,f) x)) (\pi_2\,x)) \\ {} + & \lambda x.\;\iota_1\,(\pi_1\,((\mathsf{T}\,(f (\pi_2\,x))) (\iota_2\,(\pi_2\,x)))) \\ {} + & \lambda x.\;\iota_1\,(\pi_1\,((\mathsf{T}\,(f (\pi_2\,x))) x)) \\ {} + & \lambda x.\;\iota_1\,((\pi_1\,((\mathsf{T}\,f) (\iota_2\,(\pi_2\,x)))) (\pi_2\,x)) \\ {} + & \lambda x.\;\iota_2\,((f (\pi_2\,x)) (\pi_2\,x)). \end{array}

The latter contains terms of the form \pi_1\,((\mathsf{T}\,M) (\iota_2\,N)), which are semantically zero but don’t reduce. The reduction sequences have been discovered with the help of Redex, which proved to be an extremely valuable tool. An encoding of the perturbative λ-calculus in Redex is available here.

Monads from monoids

Given a category \mathbf{C} and a functor T: \mathbf{C} \to \mathbf{C}, can you merely by inspecting the definition of T conclude that T admits the structure of a monad?

In some cases, you can. One way is to realize that T can be decomposed into a composition of two adjoint functors and apply a well-known theorem. Presumably, checking that the functors are adjoint should be less work than defining the unit and multiplication directly and checking that they satisfy the monad laws.

For example, the state monad with state object S on a cartesian closed category \mathbf{C} is given by TX = S\Rightarrow (X\times S). One sees immediately that T arises from the adjunction -\times S \vdash S \Rightarrow -, which expresses the fact \mathbf{C} is closed.

Another way is to realize that T is the image of a monoid in a monoidal category \mathbf{D} under a lax monoidal functor \Phi : \mathbf{D} \to [\mathbf{C}, \mathbf{C}]. The goal of this post is to explain and illustrate by examples what this means.

A monoidal category is a category equipped with a bifunctor \otimes : \mathbf{D}\times\mathbf{D} \to \mathbf{D}, called the tensor product, that is associative up to a natural isomorphism \alpha: (X\otimes Y)\otimes Z \to X\otimes (Y\otimes Z), and an object I, called the unit object, that is both a left and right identity for \otimes, again up to natural isomorphisms \lambda: I\otimes X\to X and \rho: X\otimes I\to X. The natural isomorphisms \alpha, \lambda, and \rho are subject to two coherence conditions: a commutative pentagon, whose vertices correspond to the five possible ways to parenthesize the tensor product W\otimes X\otimes Y\otimes Z, and a commutative triangle, whose vertices are (X\otimes I)\otimes Y, X\otimes (I\otimes Y), and X\otimes Y.

For example, a cartesian category is naturally a monoidal category: the tensor product is the cartesian product \times and the unit object is the terminal object 1.

Another interesting example of monoidal category is the category [\mathbf{C}, \mathbf{C}] of endofunctors on a category \mathbf{C}: the tensor product is the composition of functors and the unit object is the identity functor. The category [\mathbf{C}, \mathbf{C}] is an example of strict monoidal category, i.e., one in which the natural associativity and identity isomorphisms are identity transformations.

The definition of monoid can be given relative to any monoidal category (\mathbf{C}, \otimes, I). A monoid in \mathbf{C} is an object M equipped with a morphism \mu : M\otimes M\to M, called the multiplication, that is associative, and a morphism \eta: I \to M, called the unit, that is both a left and right identity for \mu.

For example, in the category of sets viewed as a monoidal category with respect to the cartesian product, monoids are precisely what we are used to call “monoids”: sets equipped with a binary associative operation together with a distinguished element that is both a left and right identity.

More interestingly, monoids in the category [\mathbf{C}, \mathbf{C}] are precisely monads on the category \mathbf{C} !

If M and N are monoids in \mathbf{C}, then a monoid morphism from M to N is a morphism f: M\to N in \mathbf{C} that preserves multiplication and unit in the obvious sense. Monoids in \mathbf{C} thus form a category, which we denote by \mathbf{Mon}(\mathbf{C}).

A suitable notion of morphism between monoidal categories is that of lax monoidal functor. If (\mathbf{C}, \otimes, I_{\mathbf{C}}) and (\mathbf{D}, \bullet, I_{\mathbf{D}}) are monoidal categories, then a lax monoidal functor from \mathbf{C} to \mathbf{D} consists of a functor F: \mathbf{C}\to \mathbf{D} together with a morphism \phi^0: I_{\mathbf{D}}\to FI_{\mathbf{C}} and a natural transformation \phi^2_{X, Y}: FX\bullet FY\to F(X\otimes Y) subject to certain coherence conditions. A lax monoidal functor (F, \phi^0, \phi^2) is called strong or simply a monoidal functor if \phi^0 and \phi^2 are isomorphisms.

Lax monoidal functors have the following remarkable property: a lax monoidal functor (F, \phi^0, \phi^2) : \mathbf{C} \to \mathbf{D} gives rise to a functor F_* : \mathbf{Mon}(\mathbf{C}) \to \mathbf{Mon}(\mathbf{D}). Namely, if (M, \mu, \eta) is a monoid in \mathbf{C}, then FM becomes a monoid in \mathbf{D} if we define the multiplication by the composite F\mu\circ \phi^2_{M, M} and the unit by the composite F\eta \circ \phi^0. If f : M\to N is a monoid morphism, then so is Ff, with respect to the multiplication and unit defined above.

Therefore, if we have a lax monoidal functor \Phi : \mathbf{D} \to [\mathbf{C}, \mathbf{C}] and a monoid M in \mathbf{D}, we can conclude that \Phi M is a monoid in the category [\mathbf{C}, \mathbf{C}], i.e., a monad.

Enough abstract nonsense. Here are two examples illustrating this observation.

The first is the writer monad. Let \mathbf{C} be a cartesian category. There is a functor \Phi from \mathbf{C} to the category [\mathbf{C}, \mathbf{C}] that maps an object X to the functor X\times -. This functor is clearly monoidal:

\displaystyle (\Phi M \circ \Phi N) X = M\times (N\times X) \simeq (M\times N)\times X = (\Phi (M\times N)) X

I omit the verification of the coherence conditions, which is straightforward. Therefore, if M is a monoid in \mathbf{C}, then \Phi M = M\times - is a monad. The underlying functor of this monad coincides with the underlying functor of the writer monad. One can check that the monad structure obtained by translating the monoid structure on M along the functor \Phi is the usual structure of the writer monad.

The second example is of the reader monad. Let \mathbf{C} be a cartesian closed category. There is a functor \Phi from the opposite \mathbf{C}^\text{op} of the category \mathbf{C} to [\mathbf{C}, \mathbf{C}] that maps an object X to the functor X\Rightarrow -. The domain of \Phi is \mathbf{C}^\text{op} because \Phi is a contravariant functor. It is obviously monoidal, where the monoidal structure on \mathbf{C}^\text{op} is given by the cartesian product on \mathbf{C} (more generally, the opposite of a monoidal category is naturally a monoidal category):

\displaystyle (\Phi X \circ \Phi Y) Z = X\Rightarrow (Y\Rightarrow Z) \simeq (X\times Y)\Rightarrow Z=(\Phi(X\times Y))Z

Again, I omit the verification of the coherence conditions. If follows that if E is a monoid in \mathbf{C}^\text{op}, then \Phi E = E\Rightarrow - is a monad. However, every object E of \mathbf{C}^\text{op} is naturally a monoid! Namely, the multiplication is a morphism E\times E\to E in \mathbf{C}^\text{op}, i.e., a morphism E\to E\times E in \mathbf{C}; we take it to be the diagonal morphism. Similarly, the unit is a morphism 1 \to E in \mathbf{C}^\text{op}, i.e., a morphism E\to 1 in \mathbf{C}, and there is only one such morphism. It is straightforward to check that these morphisms turn E into a monoid in \mathbf{C}^\text{op}. Hence \Phi E = E\Rightarrow - is a monad. Again, a simple computation confirms that its structure, obtained by translating the monoid structure of E along the functor \Phi, is the usual structure of the reader monad.

In both of these examples, the lax monoidal functor \Phi is actually strong. This need not be the case in general, and in a sequel to this post I hope to show an example of such a lax monoidal functor and monads its gives rise to.

MFPS slides

Unfortunately, I couldn’t attend the MFPS conference because of some silly visa issues. However, the organizers unanimously expressed the wish that my paper be presented, and Andrej Bauer kindly agreed to make a presentation provided that I prepare some slides. Andrej gave a talk on my paper yesterday, which I hear was great, and I’d like to thank him for that. I thought I’d make the slides available online, together with the notes, so here they are.

A Simply Typed Lambda-Calculus for Forward Automatic Differentiation

I am proud to announce that my first computer science paper has been accepted to MFPS XXVIII! In this post I would like to give a down-to-earth introduction to the subject of the paper and to explain the problem it is attempting to solve.

I am working on design and implementation of a functional programming language with support for automatic differentiation. Automatic differentiation (AD) is a powerful technique for computing derivatives of functions given by programs in programming languages. Let us contrast AD with two other techniques for programmatically computing derivatives of functions.

First, there is numerical differentiation, which approximates the derivative of a function f by the Newton’s difference quotient

\displaystyle f'(x)\approx\frac{f(x+h)-f(x)}{h}

for a small value of h. The choice of a suitable h is a non-trivial problem because of the intricacies of floating point arithmetic. If h is too small, you are going to subtract two nearly equal numbers, which may cause extreme loss of accuracy. In fact, due to rounding errors, the difference in the numerator is going to be zero if h is small enough. On the other hand, if h is not sufficiently small, then the difference quotient is a bad estimate on the derivative.

Second, you are probably familiar with symbolic differentiation, which works by applying the rules for computing derivatives you have learned in your calculus course (sum rule, product rule, chain rule) and by using a table of derivatives of elementary functions:

\displaystyle  \begin{array}{rcl} (f + g)'(x) & = & f'(x) + g'(x) \\ (f\,\cdot\, g)'(x) & = & f'(x)\cdot g(x) + f(x)\cdot g'(x)\\ (f\circ g)'(x) & = & f'(g(x))\cdot g'(x)\\ \exp'\,(x) & = & \exp\,(x)\\ \log'\,(x) & = & 1/x\\ \sin'\,(x) & = & \cos\,(x)\\ \cos'\,(x) & = & -\sin\,(x)\\ & \dots & \end{array}

Unlike numerical differentiation, symbolical differentiation is exact. However, there are also two main drawbacks to the symbolic approach to differentiation. First, symbolic differentiation requires access to the source code of the computation, and places restrictions on that source code. Second, more importantly, symbolic differentiation suffers from the loss of sharing. What does it mean? I find the following example illuminating. Consider the problem of computing the derivative of a product of n functions: f(x) = f_1(x) \cdot f_2(x) \cdot \ldots \cdot f_n(x). Applying the product rule, we arrive at the expression for the derivative, which has size quadratic in n:

\displaystyle  \displaystyle \begin{array}{rcl}f'(x) & = & f'_1(x)\cdot f_2(x)\cdot\ldots\cdot f_n(x)\\ & + & f_1(x)\cdot f'_2(x)\cdot\ldots\cdot f_n(x)\\ & + & \dots \\ & + & f_1(x)\cdot f_2(x)\cdot\ldots\cdot f'_n(x). \end{array}

Evaluating it naively would result in evaluating each f_i(x) n-1 times. A more sophisticated implementation may try, for example, to eliminate common subexpression before evaluation, but AD elegantly eliminates (hey, a pun!) the necessity of this step.

AD simultaneously manipulates values and derivatives, leading to more sharing of the different instances of the derivative of a subterm in the computation of the derivative of a bigger term. Unlike numerical and symbolic differentiation, AD is exact: there are no rounding errors, and in fact the answer produced by AD coincides with that produced by symbolic differentiation. Furthermore, AD is efficient: it offers strong complexity guarantees (in particular, evaluation of the derivative takes no more than a constant factor times as many operations as evaluation of the function).

AD comes in several variations: forward mode, reverse mode, as well as mixtures thereof. I will only focus on forward mode AD here.

Forward mode AD makes use of dual numbers. Let us recall the definition. It resembles that of complex numbers, with the difference that dual numbers are obtained from real numbers by adjoining a symbol \varepsilon with the property that \varepsilon^2=0. Thus, the set \mathbb{D} of dual numbers is the set of pairs of real numbers, each pair being written as a formal sum a+a'\varepsilon. The arithmetic operations on dual numbers are defined by the formulas

\displaystyle  \begin{array}{rcl} (a + a'\varepsilon) + (b + b'\varepsilon) & = & (a + b) + (a' + b')\varepsilon,\\ (a + a'\varepsilon)\times(b + b'\varepsilon) & = & (a \times b) + (a\times b' + a'\times b)\varepsilon \end{array}

(and similar formulas for subtraction and division).

We refer to the coefficients a and a' as to the primal value and perturbation respectively. A dual number a+a'\varepsilon can be thought as a small perturbation of the real number a.

Every differentiable function f: \mathbb{R}\to\mathbb{R} induces a function f_* : \mathbb{D}\to\mathbb{D}, called the pushforward of f and defined by the formula

\displaystyle f_*(x + x'\varepsilon) = f(x) + f'(x)x'\varepsilon,

which is essentially the formal Taylor series of f truncated at degree 1. It follows from the properties of differentiation that the pushforward transform commutes with sum, product, and composition, as well as with subtraction and division.

OK, so here is the idea of forward mode AD: overload arithmetic operations and elementary mathematical functions to operate not only on real numbers but also on dual numbers. In particular, the extension of each basis function from real numbers to dual numbers is given by the pushforward of that function:

data D = D Double Double

instance Num D where
    D x x' + D y y' = D (x + y) (x' + y')
    D x x' * D y y' = D (x * y) (x * y' + x' * y)
    negate (D x x') = D (negate x) (negate x')
instance Floating D where
    exp (D x x') = D (exp x) (exp x * x')
    log (D x x') = D (log x) (x' / x)
    sin (D x x') = D (sin x) (cos x * x')

Here in the example I am using Haskell because it supports overloading and because I am more familiar with it. But the same trick can be performed in any language supporting overloading (e.g., Scheme or C++).

What overloading achieves is that any function built out of the overloaded primitives contains information about its own derivative. Any such function is generic: it can accept arguments of both types, real numbers as well as dual numbers. Effectively, any function in the program can compute two mathematically different functions: the primal function when given an argument that is a real number, and its pushforward when given an argument that is a dual number. This follows from the properties of pushforward.

This allows us to compute the derivative of f at some point x by simply evaluating f at the point x+\varepsilon and extracting the coefficient in front of \varepsilon. For example, define

f :: Floating a => a -> a
f z = sqrt (3 * sin z)

and try it out in GHCi:

*Main> f (D 2 1)
D 1.6516332160855343 (-0.3779412091869595)

More formally, define the derivative operator D by D\, f\, x = E\, f(x+\varepsilon), where the accessor E is defined by E\, (x + x'\varepsilon) = x'.

Many applications of AD require that derivative operators nest properly. In other words, we would like to be able to differentiate functions that internally differentiate some other functions, and get correct results! For example, you may want to use AD for bi-level optimization, when the values of the function you are optimizing are found as optima of some other function.

Let us look at the example:

\displaystyle D\,(\lambda x.\, x\times (D\,(\lambda y.\, x\times y)\,2))\,1.

Note that the inner function that is being differentiated depends on x, which is free in the body of the function, but is bound by the enclosing lambda.

It is easy to compute the correct answer: the inner function being differentiated is linear in y, and its derivative is x at any point, therefore the outer function becomes x^2 and its derivative at 1 is 2.

On the other hand, what does our formalism produce? Let us compute. To visually distinguish between different invocations of E I am going to use braces of different shape:

\displaystyle  \begin{array}{cl} & D\,(\lambda x.\, x\times (D\,(\lambda y.\, x\times y)\,2))\,1\\ = & E\,[(\lambda x.\, x\times (D\, (\lambda y.\, x\times y)\,2)) (1+\varepsilon)]\\ = & E\,[(1+\varepsilon)\times (D\, (\lambda y.\, (1+\varepsilon)\times y)\, 2)]\\ = & E\,[(1+\varepsilon)\times E\,\{(\lambda y.\, (1+\varepsilon)\times y) (2+\varepsilon)\}]\\ = & E\,[(1+\varepsilon)\times E\,\{(1+\varepsilon)\times  (2+\varepsilon)\}]\\ = & E\,[(1+\varepsilon)\times E\,\{2 + 3\varepsilon\}]\\ = & E\,[(1+\varepsilon) \times 3]\\ = & 3 \ne 2 \end{array}

Uh-oh! What happened?

The root of this error, known as “perturbation confusion”, is our failure to distinguish between the perturbations introduced by the inner and outer invocations of D. There are ways to solve this problem, for instance by tagging perturbations with a fresh \varepsilon every time D is invoked and incurring the bookkeeping overhead of keeping track of which \varepsilon is associated with which invocation of D. Nonetheless, I hope the example serves to illustrate the value and nontriviality of a clear semantics for a \lambda-calculus with AD. In the paper I make some first steps towards tackling this problem.


Get every new post delivered to your Inbox.