Tag Archives: Program Derivation

Algebra of programming in Agda: dependent types for relational program derivation

S-C. Mu, H-S. Ko, and P. Jansson. In Journal of Functional Programming, Vol. 19(5), pp. 545-579. Sep. 2009 [PDF]

Relational program derivation is the technique of stepwise refining a relational specification to a program by algebraic rules. The program thus obtained is correct by construction. Meanwhile, dependent type theory is rich enough to express various correctness properties to be verified by the type checker.

We have developed a library, AoPA, to encode relational derivations in the dependently typed programming language Agda. A program is coupled with an algebraic derivation whose correctness is guaranteed by the type system.

Two non-trivial examples are presented: an optimisation problem, and a derivation of quicksort where well-founded recursion is used to model terminating hylomorphisms in a language with inductive types.

This article extends the paper we published in Mathematics of Program Construction 2008. Code accompanying the paper has been developed into an Agda library AoPA.

Tail-Recursive, Linear-Time Fibonacci

For two consecutive years I have been responsible for teaching the program derivation course in the FLOLAC summer school. It was an exam question last year to derive a linear-time solution computing the Fibonacci function, whose definition we all know very well:

fib 0 = 0
fib 1 = 1
fib (n+2) = fib (n+1) + fib n

I would like the students to see the derivation as an instance of tupling: to generalise the targeted function by having it return more results, hoping that the generalisation can actually be performed faster. For this problem, one may consider the generalisation:

fib2 n = (fib n, fib (n+1))

from which we may easily construct a recursive definition for fib2 by simply unfolding and folding the definitions. Once we finish, take fib = fst . fib2.

Tupling and Fold Fusion

As a practice, however, I instructed the students to see this as an instance of fold fusion with the fold on natural numbers:

foldN f e 0 = e
foldN f e (n+1) = f (foldN f e n)

The derivation goes like

=    { since id = foldN (1+) 0 }
    fib2 . foldN (1+) 0
=    { fold fusion, see below }
    foldN step (0,1)

To derive step through the fusion condition, we reason:

    fib2 (n+1)
=    { def. of fib2 }
    (fib (n+1), fib (n+2))
=    { def. of fib }
    (fib (n+1), fib (n+1) + fib n)
=    { let step (x,y) = (y,x+y) }
    step (fib n, fib (n+1))
=    { def. of fib2 }
    step (fib2 n)

Therefore, we have fib2 = foldN step (0,1) where step (x,y) = (y,x+y).

Oleg Kiselyov also used a similar definition of fib as a fold (on a list of length n) as an example in his library of fold transformers.

Accumulating Parameters

I was perhaps so happy with seeing fib as a fold-fusion that I forgot one may want a tail recursive definition. This is one of the exercises of the functional programming course (taught by Kung Chen) this year, where the students may just write down the (obvious) tail recursive, linear-time function computing Fibonacci. The hint was to “introduce two accumulating parameters.” Can I derive this definition?

When I taught the students about accumulating parameters, I also told them to “look for some generalisation by adding more parameters.” The canonical example is perhaps generalising quadratic time reverse to fastrev xs ys = reverse xs ++ ys. For fib, it appeared reasonable to generalise the base cases. I was later told by Oleg that this function is called gib:

gib 0 x y = x
gib 1 x y = y
gib (n+2) x y = gib (n+1) x y + gib n x y

From this definition, however, a linear-time implementation of gib does not follow immediately. We will need this lemma:

gib n (fib k) (fib (k+1)) = fib (n+k) (1)

which can be proved by induction on n. For an intuitive understanding, it says that gib allows us to compute Fibonacci not only from the base cases 0 and 1, but also from the middle: if we take fib k and fib (k+1) as base cases, we get fib (n+k).

The property (1) came from some slides Kung Chen showed me, originated from John Hughes. John Hughes actually suggested to start with (1) as the specification. Either way, I wonder how one could come up with (1) in the first place. Given the definition of gib, (1) can be proved by induction. The rest of the derivation is not too hard. This is the derived program. Details omitted.

gib 0 x y = x
gib (n+1) x y = gib n y (x+y)

Tail-Recursive Fibonacci

Meanwhile, Josh Ko gave me this generalisation:

ffib n x y = x × fib n + y × fib (n+1)

The motivation came from, perhaps, the observation that every fib n can always be expressed as an linear combination of some smaller fib i and fib j:

fib 2 = fib 1 + fib 0
fib 3 = fib 2 + fib 1 = 2 × fib 1 + fib 0
fib 4 = fib 3 + fib 2 = 2 × fib 2 + fib 1 = 3 × fib1 + 2 × fib 0

Given the definition, it is immediate that ffib 0 x y = y and:

ffib (n+1) x y
= x × fib (n+1) + y × fib (n+2)
= x × fib (n+1) + y × fib (n+1) + y × fib n
= (x+y) × fib (n+1) + y × fib n
= ffib n y (x+y)

Therefore we have derived:

ffib 0 x y = y
ffib (n+1) x y = ffib n y (x+y)

which is more or less what one would expect. It differs from the derived linear-time implementation of gib only on the base case.

Finally, since gib and ffib define similar functions, ffib satisfies a property similar to (1):

ffib n (fib k) (fib (k+1)) = fib (n+k+1) (2)

The proof proceeds by induction on n. The base case is trivial. For the inductive case:

   ffib (n+1) (fib k) (fib (k+1))
= fib k × fib (n+1) + fib (k+1) ×fib (n+2)
= fib k × fib (n+1) + fib (k+1) × fib (n+1) + fib (k+1) × fib n
= (fib k + fib (k+1)) × fib (n+1) + fib (k+1) × fib n
= ffib n (fib (k+1)) (fib (k+2))
=   { induction }
   fib (n+k+2)

thus completes the proof.

Algebra of programming using dependent types

S-C. Mu, H-S. Ko, and P. Jansson. In Mathematics of Program Construction 2008, LNCS 5133, pp 268-283. July 2008. [PDF]

Dependent type theory is rich enough to express that a program satisfies an input/output relational specification, but it could be hard to construct the proof term. On the other hand, squiggolists know very well how to show that one relation is included in another by algebraic reasoning. We demonstrate how to encode functional and relational derivations in a dependently typed programming language. A program is coupled with an algebraic derivation from a specification, whose correctness is guaranteed by the type system.

Code accompanying the paper has been developed into an Agda library AoPA.

A Simple Exercise using the Modular Law

Josh needs a property for which there is an obvious pointwise proof. He wonders whether it is possible to prove it in point-free style. I haven’t practised point-free proofs for a while, so I’d give it a try.

The actual property he needs is that FT . C ⊆ C . FT . C where C is a coreflexive, given T ⊆ ⦇S⦈ and F⦇S⦈ . C ⊆ C . F⦇S⦈ . C. It is sufficient to prove this generalised property:

S ⊆ C . S     ⇐     R ⊆ C . R   ∧   S ⊆ R

Intuitively speaking, if C covers the range of R and S ⊆ R, then C covers the range of S too.

Well, the proof goes:

    C . S
= { since C . C = C }
    C . C . S
{ since X ⊇ X ∩ Y }
     C . ((C . S) ∩ R)
{ modular law X ∩ (Y.Z) ⊆ Y . ((Y˘.X) ∩ Z), and C˘ = C }
     S ∩ (C . R)
{ given: R ⊆ C . R }
     S ∩ R
= { given: S ⊆ R }

It is hard to explain the intuition behind the use of modular law. If we take C as a predicate, the property can be written pointwisely as:

(b S a ⇒ C b ∧ b S a)     ⇐     (b R a ⇒ C b ∧ b R a)   ∧   b S a ⇒ b R a

where a and b are both universally quantified. We may simplify b S a ⇒ C b ∧ b S a to b S a ⇒ C b, but that makes it more difficult to see the connection. Essentially, I was mimicking the following obvious proof:

    C b ∧ b S a
{ strengthening }
    C b ∧ b S a ∧ b R a
{ since b R a ⇒ C b ∧ b R a }
    b S a ∧ b R a
{ since b S a ⇒ b R a }
     b S a

The strengthening step corresponds to the use of X ⊇ X ∩ Y, while the next two steps respectively correspond to the last two steps of the point-free proof. The modular law, implicit in the pointwise proof, was used to group C and R to the right places.

Well-Foundedness and Reductivity

Before learning dependent types, all I knew about program termination was from the work of Henk Doornbos and Roland Backhouse [DB95, DB96]. I am curious about the connection between their formulation of well-foundedness by relational calculation and the type theoretic view.

It is generally accepted that a relation _<_ is well-founded if, for all x, there is no infinite chain x > x₁ > x₂ > x₃ > …. . A related concept is inducvitity. We say that a relation admits induction of we can use it to perform induction. For example, the less-than ordering on natural number is inductive.

Most of the type theoretical paper I have seen equate well-foundedness to the concept of accessibilty, which I talked about in a previous post. For example, Bengt Nordström [Nor88] introduced the datatype Acc as a “constructively more useful description” of well-foundedness.

Doornbos and Backhouse generalised well-foundedness and other related notions to an arbitrary F-algebra. Well-foundedness, however, was generalised to “having a unique solution.” A relation R : FA ← A is F-well-founded if there is a unique solution for X = S . FX . R.


Inductivity, on the other hand, is generalised to F-reductivity. While F-reductivity is a generalisation of strong induction, Doornbos and Backhouse called it reductivity “in order to avoid confusion with existing notions of inductivity.” [DB95] (Nevertheless, an equivalent concept in Algebra of Programming modelled using membership relation rather than monotype factors, calls itself “inductivity”. Different usage of similar terminology can be rather confusing sometimes.)

A relation R : FA ← A is F-reductive if

μ(λP . R⧷FP) = id

where is the monotype factor defined by ran(R . A) ⊆ B ≡ A ⊆ R⧷B. The function λP . R⧷FP takes coreflexive (a relation that is a subset if id) P to coreflexive R⧷FP, and μ takes its least fixed-point.

Well, it is a definition that is very concise but also very hard to understand, unless you are one of the few who really like these sort of things. If we take P to be a predicate, R\FP expands to a predicate on x that is true if

∀y . y R x ⇒ FP y

Taking the fixed point means to look for the minimum P such that

∀x . (∀y . y R x ⇒ FP y) ⇒ P x

The expression now resembles strong induction we all know very well. Computationally, how do we take the fixed point? Of course, by defining a recursive datatype:

data Acc {a : Set}(R : a -> a -> Set) : a -> Set where
  acc : (x : a) -> (forall y -> y < x -> f (Acc R) y) -> Acc R x

where f : (a -> Set) -> (F a -> Set) is the arrow-operation of functor F. When we take F to be the identity functor, it is exactly the definition of accessibility in my previous post! Finally, the requirement that μ(λP . R⧷FP) = id means every element in a is accessible.

So, it seems that reductivity of Doornbos and Backhouse is in fact accessibility, which is often taken by type theorists to be an alternative definition of well-foundedness. Doornbos and Backhouse, however, showed that reductivity guarantees uniqueness solution of hylo equations (which is their definition of F-well-foundedness). The converse, however, is not true in general.

New Reductive Relations from Old

Doornbos and Backhouse also developed some principles to build new reductive relations from existing ones. Let us try to see their Agda counterparts. For simplicity, we take F to be the identity functor in the discussion below.

All initial algebras are reductive. For example, given Pred relating n and suc n, one can show that Pred is reductive.

data Pred : ℕ -> ℕ -> Set where
  pre : (n : ℕ) -> Pred n (suc n)

ℕPred-wf : well-found Pred
ℕPred-wf n = acc n (access n)
    access : (n : ℕ) -> (m : ℕ) -> Pred m n -> Acc Pred m
    access .(suc n) .n (pre n) = acc n (access n)

We have seen in the previous post that the less-than relation is reductive.

If R is reductive and S ⊆ R, then S is reductive too:

acc-⊑ : {a : Set}{S R : a -> a -> Set} ->
      S ⊑ R-> (x : a) -> Acc R x -> Acc S x
acc-⊑ {a}{S} S⊑R x (acc .x h) = acc x access
    access : (y : a) -> (y S x) -> Acc S y
    access y y≪x = acc-⊑ S⊑R y (h y (S⊑R y x ySx))

If f . R . f˘ is reductive for some function f, so is R. This is how we often prove termination of loops using f as the variant.

acc-fRf˘ : {a b : Set}{R : a -> a -> Set}{f : a -> b} ->
    (x : a) -> Acc (fun f ○ R ○ ((fun f) ˘)) (f x) -> Acc R x
acc-fRf˘ {a}{b}{R}{f} x (acc ._ h) = acc x access
    access : (y : a) -> R y x -> Acc R y
    access y xRy = acc-fRf˘ y
      (h (f y) (exists x (exists y (≡-refl , xRy) , ≡-refl)))

where the operators for relational composition (_○_) and converse () is that defined in AoPA.

Finally, for the special case where F is the identity functor, R is reductive if and only if its transitive closure R⁺ is. Let us first define transitive closure:

data _⁺ {a : Set}(R : a -> a -> Set) : a -> a -> Set where
  ⁺-base : forall {x y} -> R x y -> (R ⁺) x y
  ⁺-step : forall {x z} -> ∃ a (\y -> R y z × (R ⁺) x y) -> (R ⁺) x z

The “if” direction is easy. Let us prove the “only-if” direction.

acc-tc : {a : Set}{R : a -> a -> Set} ->
    (x : a) -> Acc R x -> Acc (R ⁺) x
acc-tc {a}{R} x ac = acc x (access x ac)
    access : (x : a) -> Acc R x ->
        (y : a) -> (R ⁺) y x -> Acc (R ⁺) y
    access x (acc .x h) y (⁺-base yRx) = acc-tc y (h y yRx)
    access x (acc .x h) y (⁺-step (exists z (zRx , yR⁺z))) =
      access z (h z zRx) y yR⁺z

Combined with the proof that Pred is reductive, we have another way to show that the less-than relation is reductive. One may compare it with the proof of well-found _<′_ in the previous post.


[DB95] Henk Doornbos and Roland Backhouse. Induction and recursion on datatypes. B. Moller, editor, Mathematics of Program Construction, 3rd Internatinal Conference, volume 947 of LNCS, pages 242-256. Springer-Verslag, July 1995.

[DB96] Henk Doornbos and Roland Backhouse. Reductivity. Science of Computer Programming, 26, pp. 217--236, 1996.

[Nor88] Bengt Nordström. Terminating general recursion. BIT archive, Volume 28, Issue 3, pp 605-619. 1988.

AoPA — Algebra of Programming in Agda

2015/03/29: AoPA is on GitHub now, updated to work with Agda and Standard Library 0.9.

An Agda library accompanying the paper Algebra of Programming in Agda: Dependent Types for Relational Program Derivation, developed in co-operation with Hsiang-Shang Ko and Patrik Jansson.

Dependent type theory is rich enough to express that a program satisfies an input/output relational specification, but it could be hard to construct the proof term. On the other hand, squiggolists know very well how to show that one relation is included in another by algebraic reasoning. The AoPA library allows one to encode Algebra of Programming style program derivation, both functional and relational, in Agda.


The following is a derivation of insertion sort in progress:

isort-der : ∃ (\f → ordered? ○ permute ⊒ fun f )
isort-der = (_ , (
      ordered? ○ permute
  ⊒⟨ (\vs -> ·-monotonic ordered? (permute-is-fold vs)) ⟩
      ordered? ○ foldR combine nil
  ⊒⟨ foldR-fusion ordered? ins-step ins-base ⟩
      foldR (fun (uncurry insert)) nil
  ⊒⟨ { foldR-to-foldr insert []}0
      { fun (foldr insert [])
  ⊒∎ }1))

isort : [ Val ] -> [ Val ]
isort = proj₁ isort-der

The type of isort-der is a proposition that there exists a function f that is contained in ordered ? ◦ permute , a relation mapping a list to one of its ordered permutations. The proof proceeds by derivation from the specification towards the algorithm. The first step exploits monotonicity of and that permute can be expressed as a fold. The second step makes use of relational fold fusion. The shaded areas denote interaction points — fragments of (proof ) code to be completed. The programmer can query Agda for the expected type and the context of the shaded expression. When the proof is completed, an algorithm isort is obtained by extracting the witness of the proposition. It is an executable program that is backed by the type system to meet the specification.

The complete program is in the Example directory of the code.

The Code

The code consists of the following files and folders:

  • AlgebraicReasoning: a number of modules supporting algebraic reasoning. At present we implement our own because the PreorderReasoning module in earlier versions of the Standard Library was not expressive enough for our need. We may adapt to the new Standard Library later.
  • Data: defining relational fold, unfold, hylomorphism (using well-founded recursion), the greedy theorem, and the converse-of-a-function theorem, etc, for list and binary tree.
  • Examples: currently we have prepared four examples: a functional derivation of the maximum segment sum problem, a relational derivation of insertion sort and quicksort (following the paper Functional Algorithm Design by Richard Bird), and solving an optimisation problem using the greedy theorem.
  • Relations: modules defining various properties of relations.
  • Sets: a simple encoding of sets, upon with Relations are built.


Download from Github: https://github.com/scmu/aopa.

Constructing List Homomorphism from Left and Right Folds

Since Josh has already mentioned it, I had better give it a full account.

Back in 2003 when I just started my postdoc in PSD Lab in University of Tokyo, my colleagues there were discussing about the third homomorphism theorem. The theorem says that if a function f can be expressed both as a foldr and a foldl:

f = foldr (≪) e
f = foldl (≫) e

there exists some associative binary operator such that

f [] = e
f [a] = a ≪ e = e ≫ a
f (xs ⧺ ys) = f xs ⊚ f ys

Being a list homomorphism implies the potential of parallel computation.

The paper we studied was, of course, The Thrid Homomorphism Theorem by Jeremy. The aim then was to automatically, or semi-automatically, derive , given , , and e. The motivating examples include:

  • f = sum, where can simply be +.
  • f = sort, where can be the function merge merging two sorted lists.
  • f = scanr ⊛ e. While scanr appears to be an inherently sequential function, it is actually possible to compute f “from the middle”, provided that is associative, if we take xs ⊚ (y : ys) = map (⊛y) xs ⧺ (y : ys).

Can we come up with a method to derive for all these examples?

I myself was not particularly interested in automation. I was interested in the challenge for two reasons. Firstly, it appears that some kind of inverse function is needed. Secondly, when I looked at Jeremy’s proof, I felt there is a relational proof inside waiting to be discovered. So I tried.

Setting the Stage

For the ease of point-free calculation, we define alternatives of folds where the input is paired with the base-cases:

foldrp (≪) ([],a) = a
foldrp (≪) (x:xs,a) = x ≪ foldrp (≪) (xs,a)
foldlp (≫) (a,[]) = a
foldlp (≫) (a, xs⧺[x]) = foldlp (≫) (a,xs) ≫ x

The advantage is that we can shift a suffix or a prefix of the input list to the base case. That is:

foldr (≪) e (xs⧺ys) = foldrp≪ (xs, foldr≪ e ys)
foldl (≫) e (xs⧺ys) = foldlp≪ (foldl ≪ e xs, ys)

or, in point-free style:

foldr (≪) e . cat = foldrp (≪) . (id × foldr (≪) e) (1)
foldl (≫) e . cat = foldlp (≫) . (foldlp (≫) e × id) (2)

where cat (xs,ys) = xs ⧺ ys and (f × g) (a,b) = (f a, g b).

The key property, however, is one that was shown in (possibly among other literature) Algebra of Programming: for a simple relation (i.e. a partial function) S, we have:

S . S° . S = S

where ° stands for relational converse (the relational concept of an “inverse function”).

Recall our aim: given f, look for such that f xs ⊚ f ys = f (xs⧺ys). It translates to point-free style as ⊚ . (f × f) = f . cat.

Proving the Theorem

The third homomorphism theorem is a direct corollary of the following lemma:

Lemma 1: f = foldr (≪) e = f = foldl (≫) e implies that f is prefix and postfix insensitive. That is:

f xs = f xs’ ∧ f ys = f ys’ ⇒ f (xs⧺ys) = f (xs’⧺ys’).

In point-free style: f . cat . (f°. f × f°. f) = f . cat.

Once we have the lemma proved, the theorem follows by taking ⊚ = f . cat . (f °× f °). It has to be a (partial) function because ⊚ . (f × f) = f . cat is a function. Futhermore, is associative because cat is.

The proof of Lemma 1 is simply some applications of (1), (2), and S . S° . S = S:

     f . cat
=      { (1) }
     foldrp (≪) . (id × f)
=      { since f . f° . f = f }
     foldrp (≪) . (id × f) . (id × f° . f)
=      { (1) }
     f . cat . (id × f° . f)
=      { (2) }
     foldlp (≫) . (f × id) . (id × f° . f)
=      { since f . f° . f = f }
     foldlp (≫) . (f × id) . (f° . f × f° . f)
=      { (2) }
     f . cat . (f° . f × f° . f)

The proof is still the same as that of Jeremy’s, but in a relational disguise.

Refining to Functions

To derive actual algorithms, we have yet to refine ⊚ = f . cat . (f°× f°) so that it uses functional components only. We shall pick some g ⊆ f° whose domain equals the range of f, such that ⊚ = f . cat . (g × g). (An equality, rather than inclusion, holds because in both definitions are partial functions having the same domain.)

For example, consider the special case sum = foldr (+) 0 = foldl (+) 0. Here ⊚ = sum . cat . (sum°× sum°). One simple choice is to pick wrap ⊆ sum°, where wrap a = [a]. In this case a ⊚ b = sum [a, b] = a + b.

For sort, define sorted = ran sort. It is a partial function such that sorted xs = xs iff xs is sorted. Notice that sorted ⊆ sort°. Therefore, a possible choice for would be sort . cat . (sorted × sorted) — concatenating two sorted lists, and sort them again. Jeremy then went on with this definition and proved that ⊚ = merge, taking advantage of the fact that both input lists are sorted.

Some more Properties

Some more properties are needed to deal with scanr. The following properties allow foldrp and foldrp to proceed by shifting elements of the list to the base case:

foldrp (≪) (xs⧺[x],a) = foldrp (≪) (xs, x ≪ a)
foldlp (≫) (a, x:xs) = foldlp (≫) (a ≫ x, xs) (3)

When simple approaches of refining ⊚ = f . cat . (f°× f°) does not work, the following approach sometimes does. Since f is a fold, one may attempt to take one of the as an unfold, thus forming an “unfold, then fold” pattern:

    ⊚ = f . cat . (f°× f°)
=      { (1) }
    foldrp (≪) . (id × f) . (f°× f°)
⊇      { since f . f° ⊇ ran f }
    foldrp (≪) . (f°× ran f)
=      { since f = foldl ≫ [] }
    foldrp (≪) . ((foldl ≫ [])°× ran f)


⊚ = foldlp (≫) . (ran f × (foldr ≪ [])°) (4)

Scanr “from the Middle”

Finally, consider f = scanr ⊛ e for some associative with identity e (that is, e ⊛ x = x ⊛ e = x). A scan can be defined both as a foldr and a foldl as below:

scanr ⊛ e = foldr ≪ [e]
    x ≪ (y : xs) = x ⊛ y : y : xs
scanr ⊛ e = foldl ≫ [e]
    xs ≫ x = map (⊛x) xs ⧺ [e]

From ⊚ = f . cat . (f°× f°), we can prove that xs ⊚ (y : ys) = map (⊛ y) xs ⧺ ys. Here is the inductive case:

    xs ⊚ (x : y : ys)
=      { (4) }
    foldlp (≫) (xs, (foldr ≪ [e])° (x : y :ys))
=      { definition of , let z ⊛ y = x }
    foldlp (≫) (xs, z : (foldr ≪ [e])° (y :ys))
=      { (3) }
    foldlp (≫) (xs ≫ z, (foldr ≪ [])° (y :ys))
=      { (4) }
    (xs ≫ z) ⊚ (y : ys)
=      { induction }
    map (⊛ y) (xs ≫ z) ⧺ ys
=      { definition of }
    map (⊛ y) (map (⊛ z) xs ⧺ [e]) ⧺ ys
=      { map, e identity }
     map (λv → (v ⊛ y) ⊛ z) xs ⧺ [y] ⧺ ys
=      { associtivity of }
     map (λv → v ⊛ (y ⊛ z)) xs ⧺ [y] ⧺ ys
=      { z ⊛ y =x }
    map (⊛ x) xs ⧺ [x] ⧺ (y:ys)


Given the complexity of the proof above, I did not think there was a hope to automatically construct for a reasonably useful set of list homomorphisms. My colleagues were talking about “weak inverses” — their attempts to look for a refinement of , which I considered too ad-hoc.

Being just graduated from AoP, I was perhaps too arrogant and proud of all the clever derivation we did to care about automatic program construction. (Like Jeremy, in the end of my thesis I quoted “善數不用籌策 (Those who good at calculation need no mechanical aids)” from 老子 Lao Tzu. And didn’t Dijkstra say “I hardly used the computer they gave me. I was perfectly happy without it.”?) The relational method, which seemed to cover everything, gave me a false feeling that I knew the problem inside out.

Last year, however, my colleagues and I met again and I was told that they eventually published a paper on this subject:

Kazutaka Morita, Akimasa Morihata, Kiminori Matsuzaki, Zhenjiang Hu, Masato Takeichi. Automatic Inversion Generates Divide-and-Conquer Parallel Programs, PLDI 2007.

They focused on weak inverses that returns only lists with fixed lengths. The advantage is that calculations like the one above are no longer necessary — since the lists are of fixed-length, always takes constant time. On the other hand, scanr and scanl are special cases dealt with on a different level. Such distinction is enforced by the language they allow to construct f. No, they do not seem to have handled sort. But their approach still covered a reasonably useful collection of functions.

Well, I think the moral is that we cannot just stop when it appears that is no elegant solution that suits our taste. It sometimes pays to get our hands dirty, through which we may eventually discover the beauty within.

Deriving a Virtual Machine

After reading Atsushi Igarashi’s paper Deriving Compilers and Virtual Machines for a Multi-Level Language, I gave myself the following exercise: given an interpreter of a small language, derive a virtual machine and a corresponding compiler. Such techniques probably date back to Ager et al., where they built connections between not one but five pairs of denotational semantics and existing abstract machines (Krivine’s, CEK, CLS, SECD, and Categorical Abstract Machine). They continued to explore a number of variations in subsequent papers. Igarashi’s paper also deals with a more difficult topic — to derive an abstract machine for a language with quotes/unquotes. As a starting point to appreciating their work, I tried this small example one could do in one afternoon.

Let us start with this simple language with numerical values, a binary operator for addition, lambda abstraction and variables represented by de Bruijn notation, and application:

data Exp = Val Int
         | Var Int
         | Plus Exp Exp
         | Lam Exp 
         | App Exp Exp deriving Show

The expression (λf a b. f a + f b) ((λc d. c + d) 1) 2 3, for example, would be represented by:

  (App (App (App 
      (Lam (Lam (Lam (Plus (App (Var 2) (Var 1)) 
                           (App (Var 2) (Var 0))))))
      (App (Lam (Lam (Plus (Var 0) (Var 1))))
           (Val 1)))
    (Val 2)) (Val 3))

An expression evaluates to either a number or a closure:

data Val = Num Int
         | Clos Env Exp deriving Show

And the evaluator can be defined, in an almost standard manner, in the continuation passing style:

type Cont = Val -> Val

type Env = [Val]

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = k (Num n)
eval (Var n) env k = k (env !! n)
eval (Plus e1 e2) env k = 
   eval e1 env (\(Num i) -> 
        eval e2 env (\(Num j) -> 
              k (Num (i + j))))
eval (Lam e) env k =
   k (Clos env e)
eval (App e1 e2) env k =
   eval e1 env (\(Clos env' body) ->
      eval e2 env (\v -> eval body (v : env') k))

Like in many derivations, we have cheated a little from the beginning. The use of de Bruijn indices makes env act like a stack, while CPS abstracts the control flow of the virtual machine to be derived.

Defunctionalising the Continuation

The derivation relies on Reynolds’ defunctionalisation: to enumerate the instances where a higher order function is used, and represent them by first-order structures. As the first step, we aim to replace the continuation Cont, a synonym of Val -> Val, by some datatype:

data Cont = Cont 0 | Cont1 ... | Cont2 ... | ...

and to define a function appK :: Cont -> Val -> Val pattern-matching Cont and performing the actions that were to be done in eval. Where there used to be a continuation application in eval, we replace it with an application of appK:

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)

The case for Plus is more complicated. In the original definition:

eval (Plus e1 e2) env k = 
   eval e1 env (\(Num i) -> 
        eval e2 env (\(Num j) -> 
              k (Num (i + j))))

the red subexpression should be replaced by a first-order data structure. We call it Cont1, leaving the free variables e2, env, and k as its parameters:

eval (Plus e1 e2) env k = 
   eval e1 env (Cont1 e2 env k)

the function appK, upon matching Cont1, carries out what was supposed to be computed in eval:

appK (Cont1 e2 env k) (Num i) =
    eval e2 env (\(Num j) -> k (Num (i + j)))

However, the red subexpression should be replaced by a first order construct too. Introducing a new constructor Cont2 and abstracting over the free variables i and k, we get:

appK (Cont1 e2 env k) (Num i) =
    eval e2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))

The next few cases are dealt with in the same way. Eventually we get the code below. The modified parts are marked in red:

data Cont = Cont0
          | Cont1 Exp Env Cont
          | Cont2 Int Cont
          | Cont3 Exp Env Cont     
          | Cont4 Exp Env Cont  deriving Show

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)
eval (Plus e1 e2) env k = eval e1 env (Cont1 e2 env k)
eval (Lam e) env k = appK k (Clos env e)
eval (App e1 e2) env k =
   eval e1 env (Cont3 e2 env k)

appK :: Cont -> Val -> Val
appK Cont0 v = v
appK (Cont1 e2 env k) (Num i) =
    eval e2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))
appK (Cont3 e2 env k) (Clos env' body) =
    eval e2 env (Cont4 body env' k)
appK (Cont4 body env' k) v = 
    eval body (v : env') k

Separating the Evaluator from the Virtual Machine

The functions eval and appK are mutually recursive. Later, however, appK is to evolve into a part of the virtual machine and eval the compiler. We do not want to mix the two stages. The next task is therefore to separate them.

In the definition of appK, the function eval is called in three places. In all these cases, eval is applied to an expression captured in Cont. Rather than performing the application in appK, we can also make it happen earlier in eval and store the partially applied result in Cont:

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)
eval (Plus e1 e2) env k = eval e1 env (Cont1 (eval e2) env k)
eval (Lam e) env k = appK k (Clos env (eval e))
eval (App e1 e2) env k =
   eval e1 env (Cont3 (eval e2) env k)

appK :: Cont -> Val -> Val
appK Cont0 v = v
appK (Cont1 c2 env k) (Num i) = 
    c2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))
appK (Cont3 c2 env k) (Clos env' cb) =
    c2 env (Cont4 cb env' k)
appK (Cont4 cb env' k) v = 
    cb (v : env') k 

The datatypes Cont and Val no longer store expressions, but partially applied instances of eval. Define:

type Compt = Env -> Cont -> Val

The datatypes are updated to:

data Val = Num Int
         | Clos Env Compt

data Cont = Cont0
          | Cont1 Compt Env Cont
          | Cont2 Int Cont
          | Cont3 Compt Env Cont     
          | Cont4 Compt Env Cont

Notice that while Env is a data stack, Cont, having a list-like structure, acts as a control stack. Constructors Cont1, Cont3, and Cont4 store a pointer (Env) into the data stack. What about Compt? As we will soon see, by applying another defunctionalisation, Compt becomes a list of instructions. The pointer in Cont with type Compt can therefore be seen as an instruction pointer.

Defunctionalising the Computation

By defunctionalising, we replace Compt by a first-order data structure, and create a function appC :: Compt -> Env -> Cont -> Val. This time, however, we try to choose more telling names for the introduced instructions. The first two clauses of eval, for example, is split into:

data Compt = Lit Int | Access Int | ...

eval :: Exp -> Compt
eval (Val m) = Lit m
eval (Var n) = Access n

appC :: Compt -> Env -> Cont -> Val
appC (Lit m) env k = appK k (Num m)
appC (Access n) env k = appK k (env !! n)

Examining the more interesting case eval (Plus e1 e2) env k = eval e1 env (Cont1 (eval e2) env k), we realise that we only have to abstract the two calls to eval as parameters:

data Compt = Lit Int | Access Int | Push Compt Compt ...

eval (Plus e1 e2) = Push1 (eval e1) (eval e2)

appC (Push1 is1 is2) env k = appC is1 env (Cont1 is2 env k)

The transformed program is given below. We also rename the constructors of Cont to more telling names, roughly following the naming by Igarashi :

data Cont = Halt
          | EvOpnd Compt Env Cont
          | EvSum Int Cont
          | EvArg Compt Env Cont     
          | EvBody Compt Env Cont  deriving Show

data Compt = Lit Int 
          | Access Int
          | Close Compt
          | Push1 Compt Compt
          | Push2 Compt Compt

eval :: Exp -> Compt
eval (Val m) = Lit m
eval (Var n) = Access n
eval (Plus e1 e2) = Push1 (eval e1) (eval e2)
eval (Lam e) = Close (eval e)
eval (App e1 e2) = Push2 (eval e1) (eval e2)

appC :: Compt -> Env -> Cont -> Val
appC (Lit m) env k = appK k (Num m)
appC (Access n) env k = appK k (env !! n)
appC (Close is) env k = appK k (Clos env is)
appC (Push1 is1 is2) env k = appC is1 env (EvOpnd is2 env k)
appC (Push2 is1 is2) env k = appC is1 env (EvArg is2 env k)

appK :: Cont -> Val -> Val
appK Halt v = v
appK (EvOpnd c2 env k) (Num i) =
    appC c2 env (EvSum i k)
appK (EvSum i k) (Num j) = appK k (Num (i + j))
appK (EvArg c2 env k) (Clos env' cb) =
    appC c2 env (EvBody cb env' k)
appK (EvBody cb env' k) v = 
    appC cb (v : env') k

Now, functions appC and appK together constitute a virtual machine, while eval compiles a program to its machine code Compt. The sample expression (λf a b. f a + f b) ((λc d. c + d) 1) 2 3, for example, is compiled to:

                (Access 2)
                (Access 1))
                (Access 2)
                (Access 0))))))
              (Access 0)
              (Access 1))))
        (Lit 1)))
    (Lit 2))
  (Lit 3)

Once you understand how it works, the derived result is not so surprising. Igarashi sort of mentioned that it is not too difficult to add recursion. See his paper for more interesting results, such as extension to multi-level languages — programs that generate programs.

Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths

S-C. Mu. In Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008. (20/74) [PDF] [GZipped Postscript]

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for the squiggolists, are still active topics for algorithm designers. In this paper we examine the new developments from the view of relational program calculation. It turns out that, while the classical MSS problem is solved by the Greedy Theorem, by applying the Thinning Theorem, we get a linear-time algorithm for MSS with upper bound on length.

To derive a linear-time algorithm for the maximum segment density problem, on the other hand, we purpose a variation of thinning based on an extended notion of monotonicity. The concepts of left-negative and right-screw segments emerge from the search for monotonicity conditions. The efficiency of the resulting algorithms crucially relies on exploiting properties of the set of partial solutions and design efficient data structures for them.

Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture in FLOLAC where we used the maximum segment sum (MSS) as the main example. Seeing his proof, I was curious to know how much program derivation I can do in Agda and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder, given a list of (possibly negative) numbers, the MSS is about computing the maximum sum among all its consecutive segments. Typically, the specification is:

  mss = max ○ map⁺ sum ○ segs

where segs = concat⁺ ○ map⁺ inits ○ tails computes all segments of a list.

A dependent pair is defined by:

  data _∣_ (A : Set) (P : A -> Set) : Set where
    sub : (x : A) -> P x -> A ∣ P

such that sub x p is a pair where the type of the second component p depends on the value of the first component x. The idea is to use a dependent pair to represent a derivation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ?
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ?)

It says that mss-der is a function taking a list x and returns a value of type Val, with the constraint that the value returned must be equal to (max ○ map⁺ sum ○ segs) x.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by ?, until we come to an implementation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub RESULT
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ...
           === ...
           === RESULT)

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps due to my being unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to make every details right. For example, the usual definition of max I would use in a paper would be:

   max = foldr _↑_ -∞

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:

   max = foldr _↑_ 0

led me to prove a base case 0 ↑ max x ≡ max x, which is not true. That the definition does work in practice relies on the fact that segs always returns empty list as one of the possible segment. Alternatively, I could define max on non-empty lists only:

  max : List⁺ Val -> Val
  max = foldr⁺ _↑_ id

where List⁺ A is defined by:

  data List⁺ (A : Set) : Set where
    [_]⁺ : A -> List⁺ A
    _::⁺_ : A -> List⁺ A -> List⁺ A

and refine the types of inits, tails, etc, to return non-empty lists. This is the approach I took.

Eventually, I was able to give a derivation of mss this way:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ((max ○ scanr _⊗_ ø) x)
       (chain> (max ○ map⁺ sum ○ segs) x 
           === (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x 
                   by refl
           === (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x 
                   by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
                   by cong max (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
                   by cong max (map⁺-eq max-sum-prefix (tails x)) 
           === (max ○ scanr _⊗_ ø) x  
                   by cong max (scanr-pf x) )
  where _⊕_ : Val -> List⁺ Val -> List⁺ Val 
        a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

        _⊗_ : Val -> Val -> Val
        a ⊗ b = ø ↑ (a + b)

where max-sum-prefix consists of two fold fusion:

max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø 
max-sum-prefix x =
 chain> max (map⁺ sum (inits x))
        === max (foldr _⊕_ [ ø ]⁺ x)
               by cong max (foldr-fusion (map⁺ sum) lemma1 x)
        === foldr _⊗_ ø x
               by foldr-fusion max lemma2 x
 where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) -> 
                          map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
       lemma1 a xs = ?
       lemma2 : (a : Val) -> (x : List⁺ Val) ->
                               max (a ⊕ x) ≡ (a ⊗ max x)
       lemma2 a x = ?

The two lemmas are given in the files attached below. Having the derivation, mss is given by:

mss : List Val -> Val
mss x = depfst (mss-der x)

it is an executable program that is proved to be correct.

The complete Agda program is split into five files:

  • MSS.agda: the main program importing all the sub-modules.
  • ListProperties.agda: some properties I need about lists, such as fold fusion, concat ○ map (map f) = map f ○ concat, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
  • List+.agda: non-empty lists and some of its properties.
  • Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data and operators + and such that + is associative, and a + (b ↑ c) = (a ↑ b) + (a ↑ c). The reason of this parameterisation, however, was that I did not know how to prove the properties above, until Josh showed me the proof.
  • IntRNG.agda: proofs from Josh that Data.Int actually satisfy the properties above. (Not quite complete yet.)