Functional pearl: finding a densest segment

Sharon Curtis and Shin-Cheng Mu. To be submitted.
[Paper | Supplementary Proofs | Code]

The problem of finding a densest segment of a list is similar to the well-known maximum segment sum problem, but its solution is surprisingly challenging. We give a general specification of such problems, and formally develop a linear-time online solution, using a sliding window style algorithm. The development highlights some elegant properties of densities, involving partitions that are decreasing and all right-skew.
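
As a rough, runnable illustration of the kind of specification involved, here is a naive cubic-time sketch in Haskell (my names and my formulation, not the accompanying code's). It assumes the usual statement of the maximum-density segment problem: each element carries a value and a width, and we seek a segment whose total width is at least a lower bound lb, maximising total value divided by total width.

import Data.List (inits, tails, maximumBy)
import Data.Ord (comparing)

type Elem = (Double, Double)   -- (value, width)

density :: [Elem] -> Double
density xs = sum (map fst xs) / sum (map snd xs)

segments :: [a] -> [[a]]
segments = concatMap inits . tails

-- Brute force, for contrast with the paper's linear-time algorithm.
-- Assumes lb > 0, and is partial if no segment is wide enough.
naiveMDS :: Double -> [Elem] -> [Elem]
naiveMDS lb = maximumBy (comparing density)
            . filter ((lb <=) . sum . map snd)
            . segments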

The Code

The accompanying program consists of the following files:

  • FDSList.hs: a main program that, for expository purposes, represents sequences by lists.
  • FDSSeq.hs: a linear-time implementation of the algorithm, using refined data structures as described in the paper.
  • ProbMDS.hs: problem specification for the MDS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • ProbMMS.hs: problem specification for the MMS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • MDSTest.hs: QuickCheck tests for the MDS problem.
  • MMSTest.hs: QuickCheck tests for the MMS problem.

Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the Journal of Logic and Algebraic Programming, Vol 81(6), pages 680–704, August 2012.
[PDF]

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This article introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

This is an extended version of our conference paper published in RAMiCS 2011.

Proving the Church-Rosser Theorem Using a Locally Nameless Representation

Around two years ago, for an earlier project of mine (which has not seen the light of day yet!) in which I had to build a language with variables and prove its properties, I surveyed a number of ways to handle binders. For some background: people have noticed that, when proving properties about a language with bound variables, one often ends up spending most of the effort proving “uninteresting” things about names, substitution, weakening, etc. For formal theorem proving to be practical, this effort has to be reduced. A number of approaches have been proposed. Among them, the “locally nameless” style appeared to be interesting and promising.

With the only reference I used, Engineering Formal Metatheory by Aydemir et al., which outlines the principal ideas of the approach, I imagined how it works and tried to implement my own version in Agda. My first few implementations, however, all ended up in a mess. There appeared to be an endless number of properties to prove. Besides the complexity of the language I was implementing, there must have been something I got wrong about the locally nameless representation. Realising that I could not finish the project this way, I eventually decided to learn from the basics.

I started with the tutorial by Chargueraud, which comes with complete code in Coq, and followed in his footsteps using Agda. The task is a classical one: define untyped λ-calculus and its reduction rules, and prove the Church-Rosser theorem.

Term

At an abstract level, there is nothing too surprising. We define a syntax for untyped λ-calculus that distinguishes between free and bound variables:

data Term : Set where
  bv : (i : BName) → Term
  fv : (x : FName) → Term
  ƛ : (e : Term) → Term
  _·_ : (e₁ : Term) → (e₂ : Term) → Term

where BName = ℕ represents bound variables by de Bruijn indices, while FName is the type of free variables. The latter can be any type that supports an equality check and a method that generates a new variable not in a given set (in fact, a List) of variables. If one takes FName = String, the expression λ x → x y, where y occurs free, is represented by ƛ (bv 0 · fv "y"). For ease of implementation, one may take FName = ℕ as well.
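
For readers who prefer Haskell, here is a transcription of the datatype (my rendering, with BName as Int and FName as String; the post's actual code is in Agda):

data Term = BV Int          -- bound variable, a de Bruijn index
          | FV String       -- free (named) variable
          | Lam Term        -- ƛ
          | App Term Term   -- _·_
          deriving (Eq, Show)

-- λ x → x y, with y free:
ex :: Term
ex = Lam (App (BV 0) (FV "y"))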

Not all terms you can build are valid. For example, ƛ (bv 1 · fv "y") is not a valid term since there is only one ƛ binder. How do we distinguish valid terms from invalid ones? I would (and did) switch to a dependent datatype Term n, indexed by the number of enclosing binders, and let BName = Fin n. The index is passed top-down and is incremented each time we encounter a ƛ. Closed terms are then represented by the type Term 0.

The representation above, if it works, has the advantage that any term that can be built at all is valid. Choosing such a representation was perhaps the first thing I did wrong, however. Chargueraud mentioned a similar predicate that also passes the “level” information top-down, and claimed that such a predicate, on which we have to perform induction to prove properties about terms, does not fit the usual pattern of induction. This was probably why I had so much trouble proving properties about terms.

The way to go, instead, is to use a predicate that assembles the information bottom up. The predicate LC (“locally-closed” — a term is valid if it is locally closed) is defined by:

data LC : Term → Set where
  fv : ∀ x → LC (fv x)
  ƛ : (L : FNames) → ∀ {e} → 
       (fe : ∀ {x} → (x∉L : x ∉ L) → LC ([ 0 ↦ fv x ]  e)) → LC (ƛ e)
  _·_ : ∀ {e₁ e₂} → LC e₁ → LC e₂ → LC (e₁ · e₂)

A free variable alone is a valid term. An application f · e is valid if both f and e are. And an abstraction ƛ e is valid if e becomes a valid term after we substitute any free variable x for the first (0-th) bound variable. There can be an additional constraint on x: that it is not in L, a finite set of “protected” variables — such co-finite quantification is one of the features of the locally nameless style.

The “open” operator [ n ↦ t ] e substitutes the term t for the n-th bound variable in e. It is defined by

[_↦_] : ℕ → Term → Term → Term
[ n ↦ t ] (bv i) with n ≟ i
...        | yes _ = t
...        | no _ = bv i
[ n ↦ t ] (fv y) = fv y
[ n ↦ t ] (ƛ e) = ƛ ([ suc n ↦ t ] e)
[ n ↦ t ] (e₁ · e₂) = [ n ↦ t ] e₁ · [ n ↦ t ] e₂

Note how n is incremented each time we go into a ƛ. A dual operator,

[_↤_] : ℕ → FName → Term → Term

abstracts a free variable: [ n ↤ x ] e replaces the free variable x in e by the n-th bound variable.
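
In the Haskell transcription above, the two operators could be rendered as follows (again my code, mirroring the Agda definitions):

-- [ n ↦ t ] e: replace the n-th bound variable in e by the term t.
open_ :: Int -> Term -> Term -> Term
open_ n t (BV i) | n == i    = t
                 | otherwise = BV i
open_ _ _ (FV y)      = FV y
open_ n t (Lam e)     = Lam (open_ (n + 1) t e)   -- n bumped under ƛ
open_ n t (App e1 e2) = App (open_ n t e1) (open_ n t e2)

-- [ n ↤ x ] e: replace the free variable x in e by the n-th bound variable.
close :: Int -> String -> Term -> Term
close _ _ (BV i)      = BV i
close n x (FV y) | x == y    = BV n
                 | otherwise = FV y
close n x (Lam e)     = Lam (close (n + 1) x e)   -- n bumped under ƛ
close n x (App e1 e2) = App (close n x e1) (close n x e2)

For instance, close 0 "y" ex yields Lam (App (BV 0) (BV 1)), and open_ 0 (FV "y") maps the result back to ex.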

β Reduction and Church-Rosser Theorem

Small-step β reduction can be defined by:

data _β→_ : Term → Term → Set where
  β-red : ∀ {t₁ t₂} → Body t₁ → LC t₂ 
          → ((ƛ t₁) · t₂) β→ [ 0 ↦ t₂ ] t₁
  β-app₁ : ∀ {t₁ t₁' t₂} → LC t₂ 
           → t₁ β→ t₁'
           → (t₁ · t₂) β→ (t₁' · t₂)
  β-app₂ : ∀ {t₁ t₂ t₂'} → LC t₁ 
           → t₂ β→ t₂'
           → (t₁ · t₂) β→ (t₁ · t₂')
  β-ƛ : ∀ L {t₁ t₁'}
           → (∀ x → x ∉ L → ([ 0 ↦ fv x ] t₁) β→ ([ 0 ↦  fv x ] t₁'))
           → ƛ t₁ β→ ƛ t₁'

where β-red reduces a redex, β-app₁ and β-app₂ allow reduction on the left and right hand sides of an application, respectively, and β-ƛ goes into a ƛ abstraction — again we use co-finite quantification.
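
The relation has an executable cousin in the Haskell transcription: a function enumerating all one-step β-reducts of a term (my sketch; the LC side conditions are dropped, and β-ƛ picks one fresh name instead of quantifying co-finitely):

-- Free variables of a term, and a name avoiding a given list.
fvars :: Term -> [String]
fvars (BV _)      = []
fvars (FV x)      = [x]
fvars (Lam e)     = fvars e
fvars (App e1 e2) = fvars e1 ++ fvars e2

fresh :: [String] -> String
fresh used = head [ x | n <- [0 :: Int ..]
                      , let x = "x" ++ show n, x `notElem` used ]

step :: Term -> [Term]
step (App t1 t2) =
     [ open_ 0 t2 b | Lam b <- [t1] ]     -- β-red
  ++ [ App t1' t2   | t1' <- step t1 ]    -- β-app₁
  ++ [ App t1 t2'   | t2' <- step t2 ]    -- β-app₂
step (Lam b) =                            -- β-ƛ: open, reduce, close
  let x = fresh (fvars b)
  in  [ Lam (close 0 x b') | b' <- step (open_ 0 (FV x) b) ]
step _ = []

Note how the β-ƛ case follows the locally nameless discipline: to reduce under a binder, open the body with a fresh free variable, reduce, then close again. Reducing the raw body directly would mishandle de Bruijn indices.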

Given _β→_ we can define its reflexive, transitive closure _β→*_, and the reflexive, transitive, symmetric closure _β≣_. The aim is to prove that _β→*_ is confluent:

β*-confluent : 
  ∀ {m s t} → (m β→* s) → (m β→* t)
                  → ∃ (λ u → (s β→* u) × (t β→* u))

which leads to the Church-Rosser property:

β-Church-Rosser : ∀ {t₁ t₂} → (t₁ β≣ t₂)
                    → ∃ (λ t → (t₁ β→* t) × (t₂ β→* t))

At an abstract level, the proof follows the classical route: it turns out that it is easier to prove the confluence of a “parallel reduction” relation _⇉_, which allows β reduction to happen in several places of a term in one step. We then prove that _β→*_ is equivalent to _⇉*_, thereby proving the confluence of _β→*_ as well. All this can be carried out relatively cleanly.

The gory details, however, hide in the infrastructural properties supporting the abstract view of the proofs.

Infrastructures

Confluence, Church-Rosser… this is the interesting stuff we want to prove. However, we often end up spending most of the time proving infrastructural properties instead — which is why there has been so much recent research hoping to find better representations that simplify them. The locally nameless style is supposed to be such a representation. (Another, orthogonal topic is to seek generic representations such that the proofs can be done once for all languages.)

In my code, most of these properties are piled up in the file Infrastructure.agda. They range from properties you might expect to have:

open-term : ∀ k t {e} → LC e → e ≡ [ k ↦ t ] e
close-var-open-aux : ∀ k x e → LC e → e ≡ [ k ↦ fv x ] ([ k ↤ x ] e)

to ones less obvious:

open-var-inj : ∀ k x t u → x ∉ fvars t → x ∉ fvars u
               → [ k ↦ fv x ] t ≡ [ k ↦ fv x ] u → t ≡ u
open-term-aux : ∀ j t i u e → ¬ (i ≡ j)
                → [ j ↦ t ] e ≡ [ i ↦ u ] ([ j ↦ t ] e) 
                → e ≡ [ i ↦ u ] e

The lemma open-var-inj is one of the early lemmas that appear in Chargueraud’s tutorial, which might give one the impression that it is an easy first lemma to prove. On the contrary, it is among the tedious ones — I needed a 40-line proof (most of the cases were simply eliminated by contradiction, though).

It takes experience and intuition to know what lemmas are needed. Without a promise that it would work, I would have thought something had gone wrong when I found myself having to prove weird-looking lemmas like:

close-var-rec-open : 
  ∀ x y z t i j 
  → ¬(i ≡ j) → ¬(y ≡ x) → y ∉ fvars t
  → [ i ↦ fv y ] ([ j ↦ fv z ] ([ j ↤ x ] t))
    ≡ [ j ↦ fv z ] ([ j ↤ x ] ([ i ↦ fv y ] t))

which is not easy to prove either.

The Bottom Line?

So, is the locally nameless representation what it claims to be — a way to represent binders that simplifies the infrastructural proofs and is easier to scale up? When I was struggling with some of the proofs in Infrastructure.agda I did wonder whether the claim is true only for Coq, with its cleverly designed proof tactics, but not for Agda, where everything is (so far) done by hand. Once the infrastructural proofs were done, however, the rest was carried out very pleasantly.

To make a fair comparison, I should re-implement everything using de Bruijn notation. That has to wait till some other time, though. (Anyone want to give it a try?)

It could be the case that, while some proofs are easily dismissed in Coq using tactics, in Agda the programmer has to develop some more abstractions. I did find myself repeating some proof patterns, and found one or two lemmas that are not present in Chargueraud’s tutorial which, when used in Agda, simplify the proofs a bit. There could be more, but at this moment I am perhaps too involved in the details to see the patterns from a higher viewpoint.

The exercise does pay off, though. Now I feel I am much more familiar with this style, and am perhaps more prepared to use it in my own project.

Code

A zip file containing all the code.

References

  1. Brian Aydemir, Arthur Chargueraud, Benjamin Pierce, Randy Pollack, and Stephanie Weirich. Engineering Formal Metatheory. POPL ’08.
  2. Arthur Chargueraud. The Locally Nameless Representation. Journal of Automated Reasoning, May 2011.

Constructing list homomorphisms from proofs

Yun-Yan Chi and Shin-Cheng Mu. In the 9th Asian Symposium on Programming Languages and Systems (APLAS 2011), LNCS 7078, pages 74-88.
[PDF]

The well-known third list homomorphism theorem states that if a function h is both an instance of foldr and foldl, it is a list homomorphism. Plenty of previous works devoted to constructing list homomorphisms, however, overlook the fact that proving h is both a foldr and a foldl is often the hardest part which, once done, already provides a useful hint about what the resulting list homomorphism could be. In this paper we propose a new approach: to construct a possible candidate of the associative operator and, at the same time, to transform a proof that h is both a foldr and a foldl to a proof that h is a list homomorphism. The effort constructing the proof is thus not wasted, and the resulting program is guaranteed to be correct.
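
As a toy reminder of what the theorem says (my example, not from the paper): sum is an instance of both foldr and foldl, so the theorem guarantees that it is a list homomorphism, here with (+) as the associative operator:

sumR, sumL :: [Integer] -> Integer
sumR = foldr (+) 0   -- sum as a foldr
sumL = foldl (+) 0   -- sum as a foldl

-- The homomorphism the theorem promises:
--   sum (xs ++ ys) == sum xs + sum ys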

Generalising and dualising the third list-homomorphism theorem

Shin-Cheng Mu and Akimasa Morihata. In the 16th ACM SIGPLAN International Conference on Functional Programming (ICFP 2011), pages 385-391.
[PDF]

The third list-homomorphism theorem says that a function is a list homomorphism if it can be described as an instance of both a foldr and a foldl. We prove a dual theorem for unfolds and generalise both theorems to trees: if a function generating a list can be described both as an unfoldr and an unfoldl, the list can be generated from the middle, and a function that processes or builds a tree both upwards and downwards may independently process/build a subtree and its one-hole context. The point-free, relational formalism helps to reveal the beautiful symmetry hidden in the theorem.

Calculating Programs from Galois Connections

Galois connections are ubiquitous in mathematics and computing science. One is often amazed that, once two functions are identified as a Galois connection, a long list of nice and often useful properties follows from one concise, elegant defining equation. But how does one construct a program from a specification given as a Galois connection? This is the topic of a recent work by José Nuno Oliveira and myself, and this article is an advertisement.

Galois Connections as Specifications

In program construction one often encounters program specification of the form “… the smallest such number”, “the longest prefix of the input list satisfying …”, etc. A typical example is whole number division: given a natural number x and a positive integer y, x / y is the largest natural number that, when multiplied by y, is at most x. For another example, the Haskell function takeWhile p returns the longest prefix of the input list such that all elements satisfy predicate p.

Such specifications can be seen as consisting of two parts. The easy part specifies a collection of solution candidates: numbers that are at most x after multiplication with y, or all prefixes of the input list. The hard part, on the other hand, picks one optimal solution, such as the largest, the longest, etc., among the collection.

Our goal is to calculate programs for such specifications. But how best should the specification be given in the first place? Take division for example: one might start from a specification that literally translates our description above into mathematics:

    x / y = ⋁{ z | z * y ≤ x } 

As we know, however, suprema are in general not easy to handle. One could also explicitly name the remainder:

    z = x / y  ≡  (∃ r : 0 ≤ r < y : x = z * y + r)

at the cost of existentially quantifying over the remainder.

A third option looks surprisingly simple: given x and y, the value x / y is such that for all z,

    z * y ≤ x  ≡  z ≤ x / y        (1)

Why is this sufficient as a definition of x / y? Firstly, substituting x / y for z, the right hand side of (1) reduces to true, and we obtain on the left hand side (x / y) * y ≤ x. This tells us that x / y is a candidate: it satisfies the easy part of the specification. Secondly, reading the definition from left to right, z * y ≤ x ⇒ z ≤ x / y says that x / y is the largest among all the numbers satisfying the easy part.
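
The easy part of (1) is even directly executable, if inefficiently so. A linear search for the largest candidate (a sketch of mine, not a serious program):

    -- The largest z with z * y <= x. Assumes x ≥ 0 and y ≥ 1, as in the
    -- problem statement; the candidates form an initial segment of
    -- [0 .. x], so the last one kept is the largest.
    divGC :: Integer -> Integer -> Integer
    divGC x y = last [ z | z <- [0 .. x], z * y <= x ]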

Equations of this form are called Galois connections. Given preorders ⊑ and ≤, functions f and g form a Galois connection if for all x and z we have

    f z ⊑ x  ≡  z ≤ g x        (2)

The function f is called the lower adjoint and g the upper adjoint.

The definition of division above is a Galois connection where f = (* y) and g = (/ y). For another example, takeWhile p can be specified as an upper adjoint:

    map p? zs ⊑ xs  ≡  zs ⊑ takeWhile p xs        (3)

where ⊑ is the prefix ordering: ys ⊑ xs if ys is a prefix of xs, and map p? is a partial function: map p? xs = xs if p x holds for each x in xs.
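
Both sides of (3) are executable too. In this sketch (my names) the easy part enumerates the prefixes whose elements all satisfy p, and picking the longest of them reproduces takeWhile:

    import Data.List (inits)

    -- All prefixes of xs accepted by map p?, in increasing length.
    candidates :: (a -> Bool) -> [a] -> [[a]]
    candidates p xs = [ zs | zs <- inits xs, all p zs ]

    -- takeWhile p xs == last (candidates p xs)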

We love Galois connections because once two functions are identified as such, a long list of useful properties follows: f (g x) ⊑ x, z ≤ g (f z), f and g are monotonic, they are inverses of each other on each other's ranges, etc.

These are all very nice. But can one calculate a program from a Galois connection? Given ⊑, ≤, and f, how does one construct g?

The "Shrink" Operator

José discovered and proposed a relational operator to handle such calculations. To use the operator, we have to turn the Galois connection (2) into point-free style. We look at the left hand side of (2): f z ⊑ x, and try to write it as a relation between z and x. Let f° denote the relational converse of f (roughly, think of it as the inverse function of f: it maps f z to z), and let (∘) denote relational composition, function composition extended to relations. Thus f z ⊑ x translates to

    f° ∘ (⊑)

It is a relation between z and x: putting x on the left hand side of f° ∘ (⊑), it relates, through ⊑, to f z, which is then mapped to z through f°.

Then we wish that f° ∘ (⊑) can be transformed into a (relational) fold or unfold, which is often the case because the defining components ⊑, ≤, and f are often folds or unfolds. Consider the lower adjoint of takeWhile p in (3). Since ⊑, the relation that takes a list and returns a prefix of the list, can be defined as a fold on lists, (map p?)° ∘ (⊑) is, by fold fusion, also a fold. Consider (1): since ≤ and (* y) are both folds on natural numbers, (* y)° ∘ (≤) can be both a fold and an unfold.

In our paper we showed that a Galois connection (2) can be transformed into

    g = (f° ∘ (⊑)) ↾ (≥)

where ↾ is the new operator José introduced. The relation S ↾ R, pronounced "S shrunk by R", is a sub-relation of S that yields, for each input, an optimal result under relation R. Note that the equation makes the easy/hard division explicit: f° ∘ (⊑) is the easy part (we want a solution z that satisfies f z ⊑ x), while (≥) is the criterion we use, in the hard part, to choose an optimal solution.
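
A toy functional model may help (my encoding, glossing over the relational subtleties): rendering a relation as a set-valued function, S ↾ R keeps, for each input, only those results of S that are optimal under R:

    type Rel a b = a -> [b]

    -- y is kept iff r relates it to every result for the same input.
    shrink :: (b -> b -> Bool) -> Rel a b -> Rel a b
    shrink r s x = [ y | y <- s x, all (y `r`) (s x) ]

    -- Division again: the easy part shrunk by (>=) keeps the largest z.
    -- shrink (>=) (\x -> [ z | z <- [0 .. x], z * 3 <= x ]) 10 == [3]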

The ↾ operator is similar to the min operator of Bird and de Moor, without having to use sets (which would need a power allegory). It satisfies a number of useful properties. In particular, we have theorems stating when (↾ R) promotes into folds and unfolds. For example,

    (fold S) ↾ R ⊇ fold (S ↾ R)

if R is transitive and S is monotonic on R.

With the theorems we can calculate g. Given g, specified as an upper adjoint in a Galois connection with lower adjoint f, we first try to turn f° ∘ (⊑) into a fold or an unfold, and then apply the theorems to promote (↾ (≥)). For more details, take a look at our paper!

Programming from Galois connections — principles and applications

Shin-Cheng Mu and José Nuno Oliveira. Technical Report TR-IIS-10-009, Academia Sinica, December 2010.
[PDF]

This report is an extended version of our conference submission Programming from Galois connections.

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This report introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the 12th International Conference on Relational and Algebraic Methods in Computer Science (RAMiCS #12), LNCS 6663, pages 294-313. May 30 – June 3, 2011.
[PDF]

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This paper introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

An accompanying technical report is available online.

Constructing datatype-generic fully polynomial-time approximation schemes using generalised thinning

Shin-Cheng Mu, Yu-Han Lyu, and Akimasa Morihata. In the 6th ACM SIGPLAN workshop on Generic programming (WGP 2010), pages 97-108, Sep. 2010. [PDF]

The fully polynomial-time approximation scheme (FPTAS) is a class of approximation algorithms that is able to deliver an approximate solution within any chosen ratio in polynomial time. By generalising Bird and de Moor’s Thinning Theorem to a property between three orderings, we come up with a datatype-generic strategy for constructing fold-based FPTASs. Greedy, thinning, and approximation algorithms can thus be seen as a series of generalisations. Components needed in constructing an FPTAS are often natural extensions of those in the thinning algorithm. Design of complex FPTASs is thus made easier, and some of the resulting algorithms turn out to be simpler than those in previous works.

Evaluating Simple Polynomials

At the end of FLOLAC ’10 I had a chance to show the students, in 25 minutes, what functional program calculation is about. The students had been exposed to functional programming just a week before, in a three-hour course, after which they had written some simple programs handling concrete data but might have had trouble grasping more abstract concepts like folds. I have talked about maximum segment sum way too many times (in the context of imperative program derivation, though), and it is perhaps too complex to cover in 25 minutes. The steep list problem, on the other hand, can be dealt with in 5 minutes. Thus I needed another example.

This is what I eventually came up with: given a list of numbers a₀, a₁, a₂ ... aₙ and a constant X, compute a₀ + a₁X + a₂X² + ... + aₙXⁿ. In Haskell it can be specified as a one-liner:

  poly as = sum (zipWith (×) as (iterate (×X) 1))
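
In real Haskell the constant X cannot float free in this way; a directly runnable rendering (a sketch of mine, with my name polyS) passes it as a parameter:

  polyS :: Num a => a -> [a] -> a
  polyS x as = sum (zipWith (*) as (iterate (* x) 1))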

One problem of this example is that the specification is already good enough: it is a nice linear time algorithm. To save some multiplications, perhaps, we may try to further simplify it.

It is immediate that poly [] = 0. For the non-empty case, we reason:

   poly (a : as)
 =   { definition of poly }
   sum (zipWith (×) (a:as) (iterate (×X) 1))
 =   { definition of iterate }
   sum (zipWith (×) (a:as) (1 : iterate (×X) X))
 =   { definition of zipWith } 
   sum (a : zipWith (×) as (iterate (×X) X))
 =   { definition of sum }
   a + sum (zipWith (×) as (iterate (×X) X))

The expression to the right of a + is unfortunately not poly as — the last argument to iterate is X rather than 1. One possibility is to generalise poly to take another argument. For this problem, however, we can do slightly better:

   a + sum (zipWith (×) as (iterate (×X) X))
 =   { since iterate f (f b) = map f (iterate f b) }
   a + sum (zipWith (×) as (map (×X) (iterate (×X) 1)))
 =   { zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as 
          if ⊗ associative }
   a + sum (map (×X) (zipWith (×) as (iterate (×X) 1)))
 =   { sum . map (×X) = (×X) . sum }
   a + (sum (zipWith (×) as (iterate (×X) 1))) × X
 =   { definition of poly }
   a + (poly as) × X

We have thus come up with the program

  poly [] = 0
  poly (a : as) = a + (poly as) × X
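
In runnable form, again with the constant passed as a parameter, the result is the familiar Horner scheme:

  polyH :: Num a => a -> [a] -> a
  polyH _ []       = 0
  polyH x (a : as) = a + polyH x as * x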


Besides the definitions of sum, zipWith, iterate, etc., the rules used include:

  1. map f (iterate f x) = iterate f (f x)
  2. zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as if ⊗ is associative
  3. sum . map (×X) = (×X) . sum, a special case of foldr ⊕ e . map (⊗X) = (⊗X) . foldr ⊕ e if (a ⊕ b) ⊗ X = (a ⊗ X) ⊕ (b ⊗ X) and e ⊗ X = e.
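
These rules can be sanity-checked with QuickCheck, instantiating ⊗ and ⊕ to multiplication and addition on Integers (a sketch; rule 1 is compared on finite prefixes only):

  import Test.QuickCheck

  prop_rule1 :: Integer -> Bool
  prop_rule1 x = take 20 (map (* x) (iterate (* x) 1))
              == take 20 (iterate (* x) (1 * x))

  prop_rule2 :: Integer -> [Integer] -> [Integer] -> Bool
  prop_rule2 x as bs = zipWith (*) as (map (* x) bs)
                    == map (* x) (zipWith (*) as bs)

  prop_rule3 :: Integer -> [Integer] -> Bool
  prop_rule3 x as = sum (map (* x) as) == sum as * x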

Well, this is not a very convincing example. Ideally I’d like to have a derivation, like the one for steep lists, where we gain some improvement in complexity by calculation.

What is your favourite example for functional program calculation?