Oleksandr Manzyuk's Blog

Musings of a mathematician and aspiring programmer


Strawberry fields

TL;DR version: I present my solution in Haskell to one of ITA’s hiring puzzles, discuss the choices I’ve made, and describe some surprises I discovered while writing the program.

About two years ago, when I was excited about Common Lisp, a new programming language I was learning at the time, I thought it would be a great experience to intern at a company developing software in Common Lisp. That’s how I learned about ITA. Their website has a page listing their current hiring puzzles, and one of them caught my attention as particularly interesting. It is called “Strawberry Fields”.

Strawberries are growing in a rectangular field of length and width at most 50. You want to build greenhouses to enclose the strawberries. Greenhouses are rectangular, axis-aligned with the field (i.e., not diagonal), and may not overlap. The cost of each greenhouse is $10 plus $1 per unit of area covered.

Write a program that chooses the best number of greenhouses to build, and their locations, so as to enclose all the strawberries as cheaply as possible. Heuristic solutions that may not always produce the lowest possible cost will be accepted: seek a reasonable tradeoff of efficiency and optimality.

Your program must read a small integer 1 ≤ N ≤ 10 representing the maximum number of greenhouses to consider, and a matrix representation of the field, in which the ‘@’ symbol represents a strawberry. Output must be a copy of the original matrix with letters used to represent greenhouses, preceded by the covering’s cost. Here is an example input-output pair:

Input
4

..@@@@@...............
..@@@@@@........@@@...
.....@@@@@......@@@...
.......@@@@@@@@@@@@...
.........@@@@@........
.........@@@@@........

Output
90

..AAAAAAAA............
..AAAAAAAA....CCCCC...
..AAAAAAAA....CCCCC...
.......BBBBBBBCCCCC...
.......BBBBBBB........
.......BBBBBBB........

In this example, the solution cost of $90 is computed as (10+8*3) + (10+7*3) + (10+5*3).

Run your program on the 9 sample inputs found in this file and report the total cost of the 9 solutions found by your program, as well as each individual solution.

I must admit that I failed to solve this puzzle back then, but it stuck in my head. When Alexey mentioned that he had interned at ITA, I asked him if he knew about the “Strawberry Fields” puzzle. He didn’t, but he suggested an idea that was remarkably similar to the algorithm I had thought of two years earlier and discarded, because it was clear that for some examples it would produce covers that are provably not optimal. This time I decided to implement the algorithm, keeping in mind that a solution to the problem isn’t required to always produce an optimal cover. To my surprise, it produced pretty good covers for the 9 sample inputs. Moreover, in the cases where it produced a solution that was obviously not optimal, it was clear how the solution could be improved by simple heuristics, which I nevertheless chose not to implement.

However, the most fun part of solving the puzzle was optimizing the algorithm, gradually turning a very slow program into a reasonably fast one. I wrote my solution in Haskell because I like Haskell and because I think it lends itself nicely to this problem. On the other hand, I am not an experienced Haskell programmer (yet), and I am not claiming that the program below is a good example of functional algorithm design. In fact, I am publishing my solution in the hope of attracting some attention from more experienced Haskellers. If you have any remarks, suggestions, or critique, I’d be glad to hear from you.

Finally, if you are planning to apply to ITA, please do not copy my program. I challenge you to come up with a better solution!

Algorithm

Let us think about how one could solve the problem, ignoring the performance aspects for the time being. So, suppose we are given a field, i.e., a set of strawberries (each given by a pair of integer coordinates), and an integer N, the maximum number of greenhouses to consider. One obvious way to compute an optimal cover would be the following. First, enumerate all partitions of the set of strawberries into a disjoint union of at most N non-empty subsets. Second, given such a partition, for each element of the partition, compute its minimum bounding greenhouse. If these greenhouses are pairwise disjoint, they constitute an acceptable cover. Compute its cost. Finally, pick a cover with the lowest cost.
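To make the brute-force approach concrete, here is a small sketch (my addition, not part of the final program; partitionsUpTo and picks are hypothetical names) of its core step, enumerating the partitions of a list into at most n non-empty blocks:

partitionsUpTo :: Int -> [a] -> [[[a]]]
partitionsUpTo _ []     = [[]]
partitionsUpTo n (x:xs) =
    [ p' | p <- partitionsUpTo n xs, p' <- insertEverywhere x p ]
    where
      -- Put x into one of the existing blocks, or open a new block
      -- if the partition has fewer than n blocks so far.
      insertEverywhere y blocks =
          [ (y : b) : rest | (b, rest) <- picks blocks ]
          ++ [ [y] : blocks | length blocks < n ]
      -- All ways to pick one block out of a list of blocks.
      picks []       = []
      picks (b : bs) = (b, bs) : [ (b', b : bs') | (b', bs') <- picks bs ]

One would then map each block to its minimum bounding greenhouse, discard the partitions whose greenhouses overlap, and keep the cheapest of the remaining covers.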

What is the complexity of this solution? The number of partitions of an m-element set into n pairwise disjoint non-empty subsets is denoted S(m, n) and is called the Stirling number of the second kind. It can be computed, for example, as follows (see Wolfram MathWorld for more details):

\displaystyle S(m, n) = \frac{1}{n!}\sum_{i=0}^n (-1)^i \binom{n}{i} (n-i)^m.
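As a quick aside (my addition; stirling2 is a hypothetical helper, not part of the solution), the formula is easy to evaluate directly:

stirling2 :: Integer -> Integer -> Integer
stirling2 m n =
    sum [ (-1)^i * binomial n i * (n - i)^m | i <- [0..n] ] `div` factorial n
    where
      factorial k  = product [1..k]
      binomial k i = factorial k `div` (factorial i * factorial (k - i))

-- Sanity checks: stirling2 4 2 == 7 and stirling2 5 3 == 25; the values
-- explode quickly as m grows while n stays fixed.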

It follows from this formula that for a fixed n, S(m, n) grows exponentially with m. Therefore, enumerating all partitions is not doable even for relatively small sizes of the field. On the other hand, for many of these partitions, the corresponding minimum bounding greenhouses will overlap. Can we avoid such partitions and somehow ensure that the cover associated with a partition is disjoint?

One way to achieve this is to restrict our attention to the partitions that can be obtained by a sequence of horizontal and vertical cuts. Given a set of strawberries with coordinates (x_1, y_1), (x_2, y_2), …, (x_m, y_m), and a split point x, the horizontal cut at x is defined as the pair of sets \{(x_i, y_i) \mid x_i \le x\} and \{(x_i, y_i) \mid x_i > x\} (here and in what follows the x-axis is vertical and points downwards, and the y-axis is horizontal and points from left to right). The vertical cut is defined similarly. It is clear that for any partition obtained by successively subdividing the field horizontally or vertically, the associated cover consisting of the minimum bounding greenhouses of the partition’s elements is disjoint.

The program arranges these ideas slightly differently. First, it is clearly sufficient to find for each 1 \le i \le n an optimal cover consisting of exactly i greenhouses. Given a field F and an exact number n of greenhouses to consider, the algorithm splits F vertically and horizontally at all possible split points, producing pairs of fields F_1 and F_2. It also generates all partitions n=n_1+n_2 of n into a sum of two positive integers. For each combination of the subfields F_1, F_2 and numbers n_1, n_2 the algorithm recursively tries to optimally cover F_i by exactly n_i greenhouses. If this is possible, the covers of F_1 and F_2 are combined into a cover of F. All such covers of F are accumulated into a list, and finally a cover with the lowest cost is chosen. Clearly, there are many ways to get from a field F to its subfield F' by a sequence of cuts. The algorithm uses memoization to avoid redundant recomputation of covers of the subfields that have already been processed.

Implementation

Here is the program. It’s been through many iterations, and the version presented below is the one that performs best. On my machine it takes less than 43 seconds to process all 9 examples and uses less than 350 MB of memory. (The machine has 8 cores, but that doesn’t matter here because the program uses only one.) The full source is available here. You can look at the output of the program here.

import Data.Array
import Data.Char
import Data.List
import Data.Maybe
import Data.Ord

import Control.Applicative

import System.Environment
import System.IO.Unsafe

import Data.Hashable
import qualified Data.HashTable.IO as H

I am using “ugly memoization” via unsafePerformIO, as described in Lennart Augustsson’s post. To further improve performance I am using a mutable hash table from the hashtables package instead of a Map stored in an IORef. Early in the development process I tried Conal Elliott’s MemoTrie package, but I was dissatisfied with its performance: the program below, modified to use MemoTrie combinators instead of “ugly memoization”, requires almost 5 minutes to process all examples and uses about 2.5 GB of memory.

memoIO :: (Eq a, Hashable a) => (a -> b) -> IO (a -> IO b)
memoIO f = do
  t <- H.new :: IO (H.CuckooHashTable a b)
  let f' x = do v <- H.lookup t x
                case v of
                  Nothing -> do let r = f x
                                H.insert t x r
                                return r
                  Just r  -> return r
  return f'

memo :: (Eq a, Hashable a) => (a -> b) -> (a -> b)
memo f = let f' = unsafePerformIO (memoIO f)
         in \x -> unsafePerformIO (f' x)

A strawberry is represented by a pair of integers, its coordinates. Initially I used ordinary pairs (so that Strawberry was simply a type synonym for (Int, Int)); however, because lists of strawberries are used as keys in a hash table, I decided to experiment with a custom hash function. Because both coordinates of each strawberry lie between 0 and 50, there can only be finitely many distinct strawberry objects in the program, and it is possible to preallocate a pool of strawberries, giving each strawberry a unique ID, which can then be used as a perfect hash function. Furthermore, instead of allocating a new strawberry every time I need one, I can take it from the pool. This helps to reduce consing and improves memory usage. This aspect is not as important in the present version of the program, but it was important when I was trying out an optimization that I explain in more detail below (see the section “Surprises”).

data Strawberry = Strawberry { strawberryID :: !Int
                             , strawberryX  :: !Int
                             , strawberryY  :: !Int }

instance Eq Strawberry where
    Strawberry id1 _ _ == Strawberry id2 _ _ = id1 == id2

instance Hashable Strawberry where
    hash = strawberryID

strawberries :: Array (Int, Int) Strawberry
strawberries = array ((0, 0), (50, 50))
               [((i, j), Strawberry (51 * i + j) i j)
                    | i <- [0..50]
                    , j <- [0..50]]

mkStrawberry :: Int -> Int -> Strawberry
mkStrawberry i j = strawberries ! (i, j)

A greenhouse is represented as a quadruple of integers, the coordinates of the top left and bottom right corners:

data Greenhouse = Greenhouse !Int !Int !Int !Int deriving Eq

A field is a list of strawberries, and a cover is a list of greenhouses. We also store the cover’s cost in the cover data structure to avoid its recomputation.

type Field = [Strawberry]
data Cover = Cover [Greenhouse] Cost

type Cost = Int

cost :: Cover -> Cost
cost (Cover _ p) = p
{-# INLINE cost #-}

The function merge combines disjoint covers:

merge :: Cover -> Cover -> Cover
merge (Cover gs1 p1) (Cover gs2 p2)
    = Cover (gs1 ++ gs2) (p1 + p2)
{-# INLINE merge #-}

The area of a greenhouse is the product of its dimensions:

type Area = Int

area :: Greenhouse -> Area
area (Greenhouse xmin ymin xmax ymax)
    = (xmax - xmin + 1) * (ymax - ymin + 1)
{-# INLINE area #-}

The function boundingGreenhouse computes the minimum bounding greenhouse of a field:

boundingGreenhouse :: Field -> Greenhouse
boundingGreenhouse ((Strawberry _ x1 y1):ss)
    = foldl extend (Greenhouse x1 y1 x1 y1) ss
    where
      extend (Greenhouse xmin ymin xmax ymax) (Strawberry _ x y)
          = Greenhouse (min xmin x) (min ymin y)
                       (max xmax x) (max ymax y)

The function cover' tries to optimally cover a field with exactly n greenhouses. The arguments are tupled to simplify memoization. The return type is Maybe Cover. If there are fewer strawberries than greenhouses, then some greenhouse is going to be empty. We would like to avoid such covers (they mean that the field can be covered more cheaply by a smaller number of greenhouses), so we return Nothing in this case. If n is 1, then there is always a cover consisting of the minimum bounding greenhouse. Otherwise, we find optimal covers obtained by splitting the field once vertically or horizontally, and return the cover with the lower cost.

cover' :: (Int, Field) -> Maybe Cover
cover' (n, field) | n > length field = Nothing
cover' (1, field) = Just (Cover [g] p)
    where
      g = boundingGreenhouse field
      p = area g + 10
cover' (n, field) = minimumByCost maybe_covers
    where
      maybe_cover1 = coverSplit strawberryX n field
      maybe_cover2 = coverSplit strawberryY n field
      maybe_covers = [maybe_cover1, maybe_cover2]

The function minimumByCost takes a list of Maybe Cover values and returns Just a cover with the lowest cost, or Nothing if the list contains only Nothings:

minimumByCost :: [Maybe Cover] -> Maybe Cover
minimumByCost maybe_covers
    | null covers
    = Nothing
    | otherwise
    = Just $ minimumBy (comparing cost) covers
    where
      covers = catMaybes maybe_covers

The function splitField splits a field at a given split point along the axis specified by the coordinate function coord:

splitField :: (Strawberry -> Int)
           -> Int -> Field -> (Field, Field)
splitField coord point = partition ((<= point) . coord)

The function coverSplit splits a field along the axis given by the coordinate function coord at all possible places, tries to cover the obtained subfields, and combines the covers (if any):

coverSplit :: (Strawberry -> Int) -> Int -> Field -> Maybe Cover
coverSplit coord n field = minimumByCost maybe_covers
    where
      split_points = init . sort . nub . map coord $ field
      maybe_covers
          = [liftA2 merge (memoCover' (i,   field1))
                          (memoCover' (n-i, field2))
                 | i <- [1..n-1], point <- split_points
                 , let (field1, field2)
                           = splitField coord point field]

memoCover' = memo cover'

This memoCover' is a memoized version of cover'. Finally, the function cover takes a field and a maximum number n of greenhouses to consider, tries to cover the field with exactly i greenhouses for each i from 1 to n, and picks a cover with the lowest cost:

cover :: Int -> Field -> Cover
cover n field = fromJust $ minimumByCost
                [memoCover' (i, field) | i <- [1..n]]

The rest of the program does parsing of examples from the input file and printing the solutions to the standard output. If you are interested in how these are done, take a look at the full source of the program here.

I compile the program with the -O2 flag. Adding -funbox-strict-fields squeezes out a little more performance (the program runs about 5% faster with it).

Surprises

While writing this program I encountered a few “surprises”: changes that I was sure would improve performance, but that actually dramatically slowed the program down.

One such “optimization” was based on the following observation: if one field is obtained from another by translation, then an optimal cover of the former can be obtained by translating an optimal cover of the latter. This should be faster than doing the recursive crunch. That’s why, initially, before calling memoCover' I was shifting the field towards the origin, to make it touch the axes, and then shifting the cover returned by memoCover' back. Because I was manufacturing lots of new fields, and in particular lots of freshly allocated strawberries, the program was using a lot of memory (and as a consequence was spending a lot of time in GC, which I confirmed by profiling). That was the reason why I switched to the preallocated pool of strawberries mentioned above. This helped to reduce memory usage by about 40% and sped up the program by about 5%. I was happy about this improvement and didn’t even consider that shifting fields might not be such a good idea. However, today I tried removing this “optimization”, and suddenly the program ran almost twice as fast (with the “optimization” it required 72 seconds, whereas without it, only 43 seconds). Apparently, the benefits of the optimization are outweighed by the excessive consing caused by shifting fields, which in turn stresses the GC.

The second surprise was awaiting me when I tried to replace the composition init . sort . nub . map coord $ field with a more efficient algorithm for computing the list of split points:

splitPoints :: (Strawberry -> Int) -> Field -> [Int]
splitPoints coord field = tail (indices checklist)
    where
      checklist = accumArray (||) False (0, 50)
                  [(coord strawberry, True) | strawberry <- field]

This algorithm is based on an idea borrowed from Richard Bird’s “Pearls of Functional Algorithm Design”. It computes the split points in linear time by setting up a bit vector whose i-th entry is True if and only if i occurs as a coord-coordinate of some strawberry (this only works because we know that each coordinate lies between 0 and 50). Replacing init . sort . nub . map coord $ field, which is quadratic in the length of the field, with a linear algorithm should not worsen the performance, right? Well, I don’t know exactly why, but at least on the provided set of examples the program with splitPoints ran 3 times slower (126 seconds) than the other, obviously inefficient version. I can only guess that GHC does a pretty good job at fusing the loops in init . sort . nub . map coord $ field (and a worse job at optimizing arrays), and that even though splitPoints requires only linear time, the constant factor screws us here.

Update: As Alexey has pointed out, one possible reason why the version with splitPoints loses can be the function indices, which traverses the entire array (51 elements) every time regardless of the size of the field (which is often smaller).

He has also suggested another clever way to compute the list of split points: sort the list of coordinates first; then removing duplicates from it can be done by traversing the list once, in linear time. I have implemented this strategy:

splitField :: (Strawberry -> Int)
           -> Int -> Field -> (Field, Field)
splitField coord point = partition ((< point) . coord)

splitPoints :: (Strawberry -> Int) -> Field -> [Int]
splitPoints coord = tail . nubSorted . sort . map coord

nubSorted :: Eq a => [a] -> [a]
nubSorted = foldr f []
    where
      f x [] = [x]
      f x ys@(y:_) | x == y = ys
                   | otherwise = x:ys

coverSplit :: (Strawberry -> Int) -> Int -> Field -> Maybe Cover
coverSplit coord n field = minimumByCost maybe_covers
    where
      split_points = splitPoints coord field
      maybe_covers
          = [liftA2 merge (memoCover' (i,   field1))
                          (memoCover' (n-i, field2))
                 | i <- [1..n-1], point <- split_points
                 , let (field1, field2)
                           = splitField coord point field]

The functions splitField and coverSplit required small tweaks. This version does speed up the program, even though the improvement is smaller than I expected (less than 2%). Nonetheless, I think this version is cleaner.


Using Google Translate from Emacs

Update (Jan 6, 2012): I have added an option to use ido-style completion instead of the vanilla Emacs completion mechanism. I like the former a little more, as it offers a snippet of the list of available options and allows fuzzy (flexible) matching. To enable it, set the variable google-translate-enable-ido-completion to t. In particular, “Detect language” appears as the first option in the Translate from: prompt, and you can select it simply by pressing RET.

Update (Jan 3, 2012): I have recently reimplemented the whole thing in Emacs Lisp, thus eliminating the dependency on Ruby. The result is google-translate, an Emacs interface to Google Translate. It is more flexible than the Ruby script described below. In particular, it defines a function google-translate-query-translate, which I bind to C-c t, which queries for the source and target languages and the text to translate, and shows a buffer with available translations of the text. If you specify a default for the source (resp. target) language by customizing the variable google-translate-default-source-language (resp. google-translate-default-target-language), that part won’t be queried. However, even if defaults are set, they can always be overridden by supplying a C-u prefix argument to the function google-translate-query-translate.

For example, I set google-translate-default-source-language to "en" and google-translate-default-target-language to "ru". When I press C-c t, my prompt looks simply as Translate from English to Russian:, and I can quickly enter the text to translate. If I want to translate something from Russian to English, I press C-u C-c t. This time the prompt looks as Translate from:. I enter (with completion) Russian and press RET. The prompt changes to Translate from Russian to:. I enter (again, with completion, and in fact, typing only the first two characters) English and press RET. The prompt changes to Translate from Russian to English:, at which point I enter the text to translate. Pressing RET when queried for the source language (leaving it blank) allows me to let Google Translate detect the source language for me. If you want the source language to always be detected by Google Translate, set the variable google-translate-default-source-language to "auto".

The code is available at GitHub, together with installation and customization instructions.


I am not a native English speaker, and from time to time I need to look up a word in a dictionary. Google Translate makes this easy, but it is somewhat awkward to use because of the overhead involved: you need to load the web interface first (and probably open a new tab in your browser before that), and regardless of how you do that (by clicking on a bookmark, pressing Ctrl-L to focus the location bar and typing translate.google.com, or googling “google translate”), it takes a few precious seconds before you can actually type in your word. Not so long ago the interface of Google Translate had an obnoxious “bug”: after loading the page, the text area was not focused, and you had to click on it. Argh! Fortunately, they have fixed that.

I am also an Emacs user, and I spend most of my time in Emacs. Having to switch between Emacs and the browser to only look up a word is particularly annoying. That’s why I wanted to have access to Google Translate directly from Emacs. Here is how I do that.

I’ve written the following Ruby script, which I call translate.rb, and which is symlinked as ~/bin/translate:

#!/usr/bin/env ruby

# Usage: ruby translate.rb text
#
# Translate text from English to Russian using Google Translate.

require 'open-uri'
require 'json'

# Retrieve contents of URL as UTF-8 encoded string.  Google Translate
# won't allow us to make a request unless we send a "User-Agent"
# header it recognizes, e.g. "Mozilla/4.0".
def page(url)
  open(url, "User-Agent" => "Mozilla/4.0") do |response|
    response.read.encode("UTF-8")
  end
end

def translate(text, sl="en", tl="ru")
  baseURL     = "http://translate.google.com/translate_a/t"
  parameters  = [["client", "t" ],
                 ["text"  , text],
                 ["sl"    , sl  ],
                 ["tl"    , tl  ]]
  requestURL  = baseURL \
              + "?"     \
              + parameters.map do |p|
                  "#{p[0]}=#{URI.escape(p[1])}"
                end.join("&")
  # Deal with invalid (obfuscated?) JSON that Google sends us back.
  contents    = page(requestURL).gsub(/,(?=[\],])/, ",null")
  json        = JSON.parse(contents)
  dictionary  = json[1]
  if dictionary
    dictionary.map do |item|
      index = 0
      item[1].map do |translation|
        sprintf("%2d. %s", index+=1, translation)
      end.unshift(item[0]).join("\n")
    end.join("\n\n")
  else
    json[0][0][0]
  end
end

if ARGV.size == 0
  ARGF.each {|line| puts translate(line)}
elsif ARGV.size == 1
  puts translate(ARGV[0])
else
  puts "Usage: translate TEXT"
end

A few words about the script. It makes a GET request to http://translate.google.com/translate_a/t, passing the text to be translated and the source and target languages as query parameters. For some reason it is also necessary to set the client parameter. Also, Google Translate won’t accept requests unless they come from user agents it recognizes. We circumvent this by passing “Mozilla/4.0” as the “User-Agent” header. Google Translate sends back something that looks almost like JSON, except that it is not valid (it is rather deliberately obfuscated) JSON: it contains substrings ",,". This is easily fixed by replacing each comma that is immediately followed by another comma or a closing bracket with ",null". After that we have valid JSON that can be parsed by the standard JSON parser that comes with Ruby. It remains to extract and display the interesting parts of the parsed object.

To make this script as unobtrusive to use as possible, I’ve written the following Emacs Lisp function:

(defun google-translate (text)
  (interactive
   (list
    (read-from-minibuffer "Translate: ")))
  (with-output-to-temp-buffer "*Google Translate*"
    (set-buffer "*Google Translate*")
    (insert (format "%s" text))
    (facemenu-set-face 'bold (point-min) (point-max))
    (insert (format "\n\n%s"
                    (shell-command-to-string
                     (format "translate \"%s\"" text))))))

I bind it to C-c t. It prompts for a word and displays its translation in the *Google Translate* buffer, which is put into help-mode; in particular, pressing q dismisses it.

Of course, this solution is not perfect. For example, the script performs no error handling. However, it’s been serving me well enough over the last few months that I don’t feel an urge to fix it. Note also that I am using a fixed pair of languages (English and Russian); change it to whatever pair of languages you want to translate between. With a little more work one could make the script accept the source and target languages as command-line arguments. I leave this as an exercise for the interested reader.

On monad composition

This blog post is an HTML translation of a note I wrote for myself some time ago. I am publishing it here with the hope that somebody may find it useful.

King and Wadler claimed in [3] that the list monad can be composed with any monad. As observed in [2] and more recently in [4], the definitions given in [3] are not correct: e.g., they do not work if one tries to compose the list monad with itself. We show how these definitions can be falsified using QuickCheck.

The notion of distributive law was introduced by Beck [1] in order to classify monad compositions. Roughly speaking, if K and H are monads on a category \mathcal{C}, then a distributivity law of K over H is a natural transformation \lambda : HK\to KH subject to some axioms. Given such a distributivity law, the composition (of functors) KH can be equipped with the structure of a monad. Composing (or, more generally, combining) monads plays an important role in programming language theory, because it allows us to combine various types of computations. For example, the list monad L can be used to model a form of nondeterminism, where instead of returning one value, a list of values is returned, and the empty list indicates failure. Therefore, composing the list monad with another monad M amounts in practice to adding nondeterminism to the class of computations modeled by M. In [3], King and Wadler claimed that the list monad can be composed with any monad. Unfortunately, the definitions given in [3] are not correct, as observed, e.g., by Jones and Duponcheel [2] and more recently by Manes and Mulry [4]. We show how this error could have been spotted by translating these definitions into Haskell and QuickCheck’ing them. This blog post is a literate Haskell program.
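To make the role of a distributivity law concrete, here is a small standalone sketch (my addition, not part of the literate program that follows; joinC and swap are hypothetical names, and the law’s axioms are of course not expressed in the type) of how such a λ induces a join on the composite functor:

{-# LANGUAGE RankNTypes #-}

import Control.Monad (join, liftM)

-- Given monads k and h, a distributive law  swap :: h (k a) -> k (h a)
-- lets us flatten the composite: push the inner k out through h with swap,
-- collapse the two k layers with join, then collapse the two h layers.
joinC :: (Monad k, Monad h)
      => (forall a. h (k a) -> k (h a))   -- the distributivity law λ : HK → KH
      -> k (h (k (h b)))
      -> k (h b)
joinC swap = liftM join . join . liftM swap

With h the list monad and swap taken to be the cp defined below, joinC agrees (up to the monad laws) with the joinML defined below.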

First, we need a few imports:

> import Control.Monad
> import Test.QuickCheck

Given any monad m and type a, King and Wadler observe that m [a] becomes a monoid with respect to the following operation:

> (<>) :: Monad m => m [a] -> m [a] -> m [a]
> a <> b = do  x <- a
>              y <- b
>              return (x ++ y)

Equivalently, (<>) = liftM2 (++). The identity element for (<>) is

> e :: Monad m => m [a]
> e = return []

We can also multiply any list of elements of type m [a]:

> prod :: Monad m => [m [a]] -> m [a]
> prod = foldl (<>) e

King and Wadler then define a function

> joinML :: Monad m => m [m [a]] -> m [a]
> joinML = join . liftM prod

and claim that together with

> unitML :: Monad m => a -> m [a]
> unitML = return . return

and

> mapML :: Monad m => (a -> b) -> m [a] -> m [b]
> mapML = liftM . map

it makes the composition of m and [] into a monad. However, we can easily check that the associativity does not hold for joinML if m is specialized to []. Namely, if we define a QuickCheck property

> associativity :: [[[[[[Int]]]]]] -> Bool
> associativity t
>     = (joinML . joinML $ t) == (joinML . mapML joinML $ t)

then testing it with QuickCheck yields

*Main> quickCheck associativity
*** Failed! Falsifiable (after 5 tests and 14 shrinks):
[[[[[[],[]]]],[[[[0]]],[]]]]

In fact, King and Wadler make an even stronger claim, namely that any monad m admits a distributivity law over the list monad, given by

> cp :: Monad m => [m a] -> m [a]
> cp = prod . map (liftM return)

However, it is easy to check that one of the distributive law axioms fails, namely the one called (cp-4) in [3] and (DL B) in [4]. Again, it is easily QuickCheck’ed:

> cp4 :: [[[Int]]] -> Bool
> cp4 xsss
>     = (cp . map concat $ xsss) == (concat . map cp . cp $ xsss)

Testing it with QuickCheck yields

*Main> quickCheck cp4
*** Failed! Falsifiable (after 5 tests and 10 shrinks):
[[[0,0]],[[1],[2]]]

It is shown in [4] that a distributivity law of m over the list monad exists provided that m is a commutative monad (e.g., m is a reader monad, or the writer monad associated with a commutative monoid). Let us describe this distributivity law. The list monad is a linear quotient of the monad of binary trees:

> data Tree a  =  Empty
>              |  Leaf a
>              |  Branch (Tree a) (Tree a)

> tau                 ::  Tree a -> [a]
> tau Empty           =   []
> tau (Leaf x)        =   [x]
> tau (Branch t1 t2)  =   (tau t1) ++ (tau t2)

The distributivity law lambda :: Monad m => [m a] -> m [a] of m over the list monad is then uniquely determined by the equation psi == lambda . tau (see [4], p. 204), where psi is defined as follows:

> psi                :: Monad m => Tree (m a) -> m [a]
> psi Empty          =  e
> psi (Leaf x)       =  liftM return x
> psi (Branch t1 t2) =  (psi t1) <> (psi t2)

It is easy to check that the function cp satisfies the equation psi == cp . tau. Indeed, we need to show that psi t == cp . tau $ t for any tree t :: Tree (m a). The proof is by induction on the structure of t. If t == Empty, we obtain:

   cp . tau $ Empty
== {- definition of tau -}
   cp []
== {- definition of map -}
   prod []
== {- definition of foldl -}
   e
== {- definition of psi -}
   psi Empty

If t == Leaf x, then

   cp . tau $ Leaf x
== {- definition of tau -}
   cp [x]
== {- property of map -}
   prod [liftM return x]
== {- property of foldl -}
   liftM return x
== {- definition of psi -}
   psi (Leaf x)

Finally, if t == Branch t1 t2, then

   cp . tau $ Branch t1 t2
== {- definition of tau -}
   cp $ (tau t1) ++ (tau t2)
== {- definition of cp -}
   prod . map (liftM return) $ (tau t1) ++ (tau t2)
== {- map is a homomorphism of monoids -}
   prod $ (map (liftM return) (tau t1))
          ++ (map (liftM return) (tau t2))
== {- (<>) is associative and
      foldl is a homomorphism of monoids -}
   (prod . map (liftM return) $ tau t1)
   <> (prod . map (liftM return) $ tau t2)
== {- induction hypothesis -}
   (psi t1) <> (psi t2)
== {- definition of psi -}
   psi (Branch t1 t2)

Therefore, the distributivity law whose existence is proven in [4] coincides with cp. Where does the commutativity of the monad m show up? The proof of the distributivity law axioms for cp relies on the following property of prod (stated as (prod-4) in [3]):

prod . map (join . liftM prod) == join . liftM prod . prod

which does not hold in general (and can likewise be falsified using QuickCheck, as with cp). The left hand side of the equation applied to a list [a1, a2, ..., an] :: [m [m [a]]] can be seen to be equal to

prod . map (join . liftM prod) $ [a1, a2, ..., an]
    == do  x1 <- a1
           z1 <- prod x1
           x2 <- a2
           z2 <- prod x2
           ...
           xn <- an
           zn <- prod xn
           return (z1 ++ z2 ++ ... ++ zn)

whereas the right hand side applied to [a1, a2, ..., an] can be identified with

join . liftM prod . prod $ [a1, a2, ..., an]
    == do  x1 <- a1
           x2 <- a2
           ...
           xn <- an
           z1 <- prod x1
           z2 <- prod x2
           ...
           zn <- prod xn
           return (z1 ++ z2 ++ ... ++ zn)

If m is commutative, then the order of the statements in the do block does not matter, and the obtained expressions are equal.
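For completeness, here is a sketch (my addition; prod4 is a hypothetical name) of the corresponding QuickCheck property, with m again specialized to the list monad:

> prod4 :: [[[[[Int]]]]] -> Bool
> prod4 xsss
>     = (prod . map (join . liftM prod) $ xsss)
>       == (join . liftM prod . prod $ xsss)

Running quickCheck prod4 should, just like cp4 above, quickly produce a small counterexample.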

The existence of a distributivity law for a general m is a subtle question. In fact, it is an open problem whether the list monad admits a distributivity law over itself. It is worth pointing out that the monad of nonempty lists does admit a distributive law over itself (see [4], Example 5.1.10).

References

[1] Jon Beck, Distributive laws, In Sem. on Triples and Categorical Homology Theory (ETH, Zürich, 1966/67), pp. 119-140. Springer, Berlin, 1969.

[2] Mark P. Jones, Luc Duponcheel, Composing monads, 1993; available here.

[3] David King and Philip Wadler, Combining Monads, Mathematical Structures in Computer Science, 1992, pp. 61-78.

[4] Ernie Manes and Philip Mulry, Monad Composition I: General Constructions and Recursive Distributive Laws, Theory and Applications of Categories, Vol. 18, No. 7, 2007, pp. 172-208.

A quiz

Alexey and I have recently discussed the following quiz:

You are given an array of length n>0 of pairwise independent random real numbers uniformly distributed on [0,1]. Suppose you are searching for the maximal element in the array by traversing it from left to right and updating a temporary variable holding the maximal value found so far (the variable is initialized with the 0th element of the array before the loop). How many updates will you be making on average?

Frankly, my own intuition for these kinds of things is pretty weak, which is why I felt challenged and sat down to solve the quiz. Below is what I ended up with. This is how not to solve it.

First, let us formalize the problem. Let T(n) be the answer for an array of length n. Let us figure out what exactly the “on average” part means here. Let u(x) denote the number of updates required by an array x of length n. If the set X of all possible arrays were finite, say X=\{x_1, x_2, \dots, x_N\}, we would count the number of updates for each array, and then T(n) would be the average of these numbers:

\displaystyle T(n) = \frac{1}{N}\sum_{p=1}^N u(x_p).

Because u(x) can have as its values only integers between 0 and n-1, we can rearrange the terms in the above formula and write it as follows:

\displaystyle T(n) = \sum_{i=1}^{n-1}i \cdot \frac{\#\{x\in X \mid u(x) = i\}}{\# X}.

Here \# S denotes the cardinality of a set S. The quotient

\displaystyle \frac{\#\{x\in X \mid u(x) = i\}}{\# X}

can be interpreted as the probability that an array picked at random from the set X requires i updates, provided all arrays in X are equally likely. In other words, T(n) becomes the expectation of u. Unfortunately, the set X is infinite; in fact, it can be identified with the n-dimensional cube [0, 1]^n. And yet the logic remains the same. Because each coordinate is uniformly distributed over the interval [0, 1], the distribution of arrays in X is also uniform. That is, the probability that an array picked at random from the set X is contained in a subset A\subset X is the n-dimensional volume \mathrm{vol}_n(A) of A. In the fancy language of probability theory, the n-dimensional volume \mathrm{vol}_n is the product of n copies of the Lebesgue measure on [0,1] corresponding to the uniform distribution of each of the n coordinates. The measure \mathrm{vol}_n is a probability measure on the set X, i.e., \mathrm{vol}_n(X)=1, and hence the pair (X,\mathrm{vol}_n) is a probability space. Furthermore, u is a function X\to\mathbb{N}, i.e., a discrete random variable, and T(n) is the expectation of u:

\displaystyle T(n) = \sum_{i=1}^{n-1}i\cdot\mathrm{vol}_n(u^{-1}(i)).

Let us describe the set u^{-1}(i)=\{x\in X \mid u(x)=i\}. An array x requires i updates if and only if there exists a sequence of indexes 1 \le k_1 < k_2 < \dots < k_i \le n-1 at which the updates happen, i.e., such that x(0) < x(k_1) < x(k_2) < \dots < x(k_i) and, for each 0 \le j \le i and each index k with k_j < k < k_{j+1}, we have x(k_j) \ge x(k); here we put k_0=0 and k_{i+1}=n for uniformity. If the indexes k_1 < k_2 < \dots < k_i are fixed, the volume of the set of arrays x satisfying these inequalities is given by the integral

\displaystyle \int_0^1\int_0^{t_i}\dots\int_0^{t_1}t_0^{k_1-1}t_1^{k_2-k_1-1} \dots t_{i-1}^{k_i-k_{i-1}-1}t_i^{n-k_i-1}dt_0dt_1\dots dt_i,

which is fairly easy to compute and which is equal to

\displaystyle \frac{1}{k_1\cdot k_2\cdot\ldots\cdot k_i\cdot n}.
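Indeed, spelling out the intermediate steps (my addition; the original leaves the computation to the reader), integrating one variable at a time contributes one factor per integration, with the outermost integration contributing the factor 1/n:

\displaystyle \int_0^{t_1}t_0^{k_1-1}\,dt_0 = \frac{t_1^{k_1}}{k_1}, \qquad \int_0^{t_2}\frac{t_1^{k_2-1}}{k_1}\,dt_1 = \frac{t_2^{k_2}}{k_1 k_2}, \qquad \dots, \qquad \int_0^1\frac{t_i^{n-1}}{k_1\cdot\ldots\cdot k_i}\,dt_i = \frac{1}{k_1\cdot\ldots\cdot k_i\cdot n}.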

The volume of the preimage u^{-1}(i) is equal to the sum of these integrals over all possible sequences of indexes:

\displaystyle \mathrm{vol}_n(u^{-1}(i))=\frac{1}{n}\sum_{1\le k_1 < k_2 < \dots < k_i \le n-1}\frac{1}{k_1\cdot k_2\cdot\ldots\cdot k_i}.

The average number of updates is then given by the formula

\displaystyle T(n) = \frac{1}{n}\sum_{i=1}^{n-1}\sum_{1\le k_1 < k_2 < \dots < k_i \le n-1}\frac{i}{k_1\cdot k_2\cdot\ldots\cdot k_i}.

Clearly, this cannot possibly be an answer to a job interview quiz. It is too complicated. I wasn’t pleased with my solution, so I emailed the quiz to Alexey to see if he could come up with anything better. And indeed, his solution turned out to be very slick: the probability that the largest item in the array is the last one is 1/n. Whether it is or not, the prefix of length n-1 is an instance of the original problem. Thus we have to do T(n-1) work, and with probability 1/n we have one more update after that:

\displaystyle T(n) = 1/n + T(n-1).

An array of length 1 doesn’t require updates, so that T(1) = 0. Together these equations imply that T(n) = 1/2 + 1/3 + \dots + 1/n, the n-th harmonic number less 1. In particular, this immediately suggests that the number of updates grows logarithmically in n, which is very hard to see from my formula.
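The harmonic-number answer is also easy to check empirically. Here is a small Haskell sketch (my addition; it assumes the random package, and updates, averageUpdates, and expectedUpdates are hypothetical names) that simulates the process and compares the average with 1/2 + 1/3 + … + 1/n:

import System.Random (newStdGen, randomRs)

-- Number of updates made by the left-to-right maximum scan:
-- position k > 0 causes an update iff x(k) exceeds the maximum of x(0..k-1).
updates :: [Double] -> Int
updates xs = length [ () | (m, x) <- zip (scanl1 max xs) (tail xs), x > m ]

-- Average number of updates over the given number of random trials.
averageUpdates :: Int -> Int -> IO Double
averageUpdates trials n = do
  counts <- mapM (\_ -> do g <- newStdGen
                           return (updates (take n (randomRs (0, 1) g))))
                 [1 .. trials]
  return (fromIntegral (sum counts) / fromIntegral trials)

-- Expected value according to the recurrence: the n-th harmonic number less 1.
expectedUpdates :: Int -> Double
expectedUpdates n = sum [ 1 / fromIntegral k | k <- [2 .. n] ]

For instance, averageUpdates 100000 10 should come out close to expectedUpdates 10, which is roughly 1.929.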

By the way, is my answer actually correct? At first, I didn’t think so. How can something as hairy as

\displaystyle \frac{1}{n}\sum_{i=1}^{n-1}\sum_{1 \le k_1 < k_2 < \dots < k_i \le n-1}\frac{i}{k_1\cdot k_2\cdot\ldots\cdot k_i}

possibly be equal to 1/2 + 1/3 + ... + 1/n? I tried to prove that by induction and didn’t succeed. However, this is indeed the case, and I have discovered an elegant algebraic proof, which I am pleased to present.
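Before the proof, a quick numeric sanity check in Haskell (my addition; hairy and harmonic1 are hypothetical names) that the two expressions agree for small n:

import Data.List (subsequences)

-- The "hairy" expression for T(n) derived above: each non-empty increasing
-- sequence k_1 < ... < k_i drawn from [1 .. n-1] contributes i / (k_1 ... k_i).
hairy :: Int -> Double
hairy n = (1 / fromIntegral n) *
          sum [ fromIntegral (length ks) / product (map fromIntegral ks)
              | ks <- subsequences [1 .. n - 1], not (null ks) ]

-- The closed form: 1/2 + 1/3 + ... + 1/n.
harmonic1 :: Int -> Double
harmonic1 n = sum [ 1 / fromIntegral k | k <- [2 .. n] ]

Evaluating map (\n -> hairy n - harmonic1 n) [1 .. 12] gives differences that are zero up to floating-point rounding (hairy, being exponential in n, is of course useless beyond small n).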

First, observe that the sum

\displaystyle \sum_{1 \le k_1 < k_2 < \dots < k_i \le n-1}\frac{1}{k_1\cdot k_2\cdot\ldots\cdot k_i}

occurring in T(n) is the i-th elementary symmetric polynomial in 1, 1/2, …, 1/(n-1). Consider the polynomial

\displaystyle \begin{array}{rcl}P(u) & = & \displaystyle \Bigl(u + 1\Bigr)\Bigl(\frac{u}{2}+ 1\Bigr)\dots\Bigl(\frac{u}{n-1} + 1\Bigr) \\ \\ & = & \displaystyle u^{n-1}\cdot\frac{1}{1\cdot 2\cdot\ldots\cdot(n-1)}\\ \\ & + & \displaystyle\dots \\ \\ & + & \displaystyle u^2\cdot\Bigl(\frac{1}{1\cdot 2} + \frac{1}{1\cdot 3} + \dots + \frac{1}{(n-1)(n-2)}\Bigr) \\ \\ & + & \displaystyle u\cdot\Bigl(1 + \frac{1}{2} + \dots + \frac{1}{n-1}\Bigr) \\ \\ & + & 1. \end{array}

The coefficients of P are precisely the elementary symmetric polynomials in 1, 1/2, \dots, 1/(n-1). The value of P at the point 1 is the sum of these polynomials, which is almost T(n), except that in T(n) these polynomials show up with coefficients 1, 2, \dots, n-1. That’s pretty easy to fix. The trick is to consider the derivative of P. On the one hand, it equals

\displaystyle \begin{array}{rcl} P'(u) & = & \displaystyle u^{n-2}\cdot (n-1)\cdot\frac{1}{1\cdot 2\cdot \ldots\cdot (n-1)} \\ \\ & + & \displaystyle\dots \\ \\ & + & \displaystyle u \cdot 2\cdot\Bigl(\frac{1}{1\cdot 2} + \frac{1}{1\cdot 3} + \dots + \frac{1}{(n-1)(n-2)}\Bigr) \\ \\ & + & \displaystyle\Bigl(1 + \frac{1}{2} + \dots + \frac{1}{n-1}\Bigr) \end{array}

and therefore T(n) = P'(1)/n. On the other hand, using the product rule, we obtain

\displaystyle P'(u) = P(u)\Bigl(\frac{1}{u+1}+\frac{1}{u+2} + \dots + \frac{1}{u+n-1}\Bigr).

Furthermore, P(1) = 2 \cdot 3/2 \cdot 4/3 \cdot\ldots\cdot n/(n-1) = n, and hence T(n) = P'(1)/n = 1/2 + 1/3 + \dots + 1/n. Q. E. D.