# Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the Journal of Logic and Algebraic Programming, Vol 81(6), pages 680–704, August 2012.
[PDF]

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This article introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

This is an extended version of our conference paper published in RAMiCS 2011.

# Proving the Church-Rosser Theorem Using a Locally Nameless Representation

Around two years ago, for an earlier project of mine (which has not seen the light of day yet!) in which I had to build a language with variables and prove its properties, I surveyed a number of ways to handle binders. For some background: people have noticed that, when proving properties of a language with bound variables, one often ends up spending most of the effort proving “uninteresting” things about names, substitution, weakening, etc. For formal theorem proving to be practical, this effort has to be reduced. A number of approaches have been proposed. Among them, the “locally nameless” style appeared to be interesting and promising.

With the only reference I used, Engineering Formal Metatheory by Aydemir et al., which outlines the principal ideas of the approach, I imagined how it works and tried to implement my own version in Agda. My first few implementations, however, all ended up in a mess: there appeared to be an endless number of properties to prove. Besides the complexity of the language I was implementing, there must have been something I got wrong about the locally nameless representation. Realising that I could not finish the project this way, I eventually decided to learn from the basics.

I started with the tutorial by Chargueraud, which comes with complete code in Coq, and followed in his footsteps using Agda. The task is a classical one: define the untyped λ-calculus and its reduction rules, and prove the Church-Rosser theorem.

### Term

At an abstract level, there is nothing too surprising. We define a syntax for the untyped λ-calculus that distinguishes between free and bound variables:
```agda
data Term : Set where
  bv  : (i : BName) → Term
  fv  : (x : FName) → Term
  ƛ   : (e : Term) → Term
  _·_ : (e₁ : Term) → (e₂ : Term) → Term
```

where `BName = ℕ` represents bound variables by de Bruijn indices, while `FName` is the type of free variables. The latter can be any type that supports an equality check and a method that generates a new variable not in a given set (in fact, a `List`) of variables. If one takes `FName = String`, the expression `λ x → x y`, where `y` occurs free, is represented by `ƛ (bv 0 · fv "y")`. For ease of implementation, one may take `FName = ℕ` as well.

Not all terms one can build are valid. For example, `ƛ (bv 1 · fv "y")` is not a valid term, since there is only one `ƛ` binder. How do we distinguish valid terms from invalid ones? I would (and did) switch to a dependent datatype `Term n`, indexed by the number of enclosing binders, and let `BName = Fin n`. The index is passed top-down and is incremented each time we encounter a `ƛ`. Closed terms are then represented by the type `Term 0`.

This representation, if it works, has the advantage that any term that can be built at all must be valid. Choosing such a representation was perhaps the first thing I did wrong, however. Chargueraud mentioned a similar predicate that also passes the “level” information top-down, and claimed that such a predicate, on which we have to perform induction to prove properties about terms, does not fit the usual pattern of induction. This was probably why I had so much trouble proving properties about terms.

The way to go, instead, is to use a predicate that assembles the information bottom-up. The predicate `LC` (“locally closed” — a term is valid if it is locally closed) is defined by:
```agda
data LC : Term → Set where
  fv  : ∀ x → LC (fv x)
  ƛ   : (L : FNames) → ∀ {e} →
        (fe : ∀ {x} → (x∉L : x ∉ L) → LC ([ 0 ↦ fv x ] e)) → LC (ƛ e)
  _·_ : ∀ {e₁ e₂} → LC e₁ → LC e₂ → LC (e₁ · e₂)
```

A free variable alone is a valid term. An application `f · e` is valid if both `f` and `e` are. And an abstraction `ƛ e` is valid if `e` becomes a valid term after we substitute any free variable `x` for the first (`0`-th) bound variable. There can be an additional constraint on `x`: that it is not in `L`, a finite set of “protected” variables — such co-finite quantification is one of the features of the locally nameless style.

The “open” operator `[ n ↦ t ] e` substitutes the term `t` for the `n`-th bound variable in `e`. It is defined by
```agda
[_↦_] : ℕ → Term → Term → Term
[ n ↦ t ] (bv i) with n ≟ i
...              | yes _ = t
...              | no _  = bv i
[ n ↦ t ] (fv y)    = fv y
[ n ↦ t ] (ƛ e)     = ƛ ([ suc n ↦ t ] e)
[ n ↦ t ] (e₁ · e₂) = [ n ↦ t ] e₁ · [ n ↦ t ] e₂
```

Note how `n` is incremented each time we go into a `ƛ`. A dual operator,
```agda
[_↤_] : ℕ → FName → Term → Term
```

called “close”, goes the other way: `[ n ↤ x ] e` abstracts the free variable `x`, turning its occurrences in `e` into the `n`-th bound variable.
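To make the pair of operators concrete, here is a minimal Haskell sketch of the same idea (my own rendering for illustration — the constructor and function names are hypothetical, not from the Agda development). The round-trip `open`-after-`close` mirrors the lemma `close-var-open-aux` mentioned later:

```haskell
-- Locally nameless terms: bound variables are de Bruijn indices,
-- free variables are names.
data Term = BV Int | FV String | Lam Term | App Term Term
  deriving (Eq, Show)

-- [ n ↦ t ] e : substitute t for the n-th bound variable of e ("open")
open' :: Int -> Term -> Term -> Term
open' n t (BV i)      = if n == i then t else BV i
open' _ _ (FV y)      = FV y
open' n t (Lam e)     = Lam (open' (n + 1) t e)
open' n t (App e1 e2) = App (open' n t e1) (open' n t e2)

-- [ n ↤ x ] e : abstract free variable x as the n-th bound variable ("close")
close :: Int -> String -> Term -> Term
close _ _ (BV i)      = BV i
close n x (FV y)      = if x == y then BV n else FV y
close n x (Lam e)     = Lam (close (n + 1) x e)
close n x (App e1 e2) = App (close n x e1) (close n x e2)
```

For instance, closing `x` in `x y` yields the body `App (BV 0) (FV "y")` of the representation of `λ x → x y`, and opening it again with `FV "x"` restores the original term.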

### β Reduction and Church-Rosser Theorem

Small-step β reduction can be defined by:
```agda
data _β→_ : Term → Term → Set where
  β-red  : ∀ {t₁ t₂} → Body t₁ → LC t₂
           → ((ƛ t₁) · t₂) β→ [ 0 ↦ t₂ ] t₁
  β-app₁ : ∀ {t₁ t₁' t₂} → LC t₂
           → t₁ β→ t₁'
           → (t₁ · t₂) β→ (t₁' · t₂)
  β-app₂ : ∀ {t₁ t₂ t₂'} → LC t₁
           → t₂ β→ t₂'
           → (t₁ · t₂) β→ (t₁ · t₂')
  β-ƛ    : ∀ L {t₁ t₁'}
           → (∀ x → x ∉ L → ([ 0 ↦ fv x ] t₁) β→ ([ 0 ↦ fv x ] t₁'))
           → ƛ t₁ β→ ƛ t₁'
```

where `β-red` reduces a redex, `β-app₁` and `β-app₂` allow reduction on the left and right hand sides of an application, respectively, and `β-ƛ` goes into a `ƛ` abstraction — again we use co-finite quantification.

Given `_β→_` we can define its reflexive, transitive closure `_β→*_`, and the reflexive, transitive, symmetric closure `_β≣_`. The aim is to prove that `_β→*_` is confluent:
```agda
β*-confluent :
  ∀ {m s t} → (m β→* s) → (m β→* t)
  → ∃ (λ u → (s β→* u) × (t β→* u))
```

which leads to the Church-Rosser property:
```agda
β-Church-Rosser : ∀ {t₁ t₂} → (t₁ β≣ t₂)
  → ∃ (λ t → (t₁ β→* t) × (t₂ β→* t))
```

At an abstract level, the proof follows the classical route: it turns out to be easier to prove the confluence of a “parallel reduction” relation `_⇉_`, which allows β reduction to happen at several places in a term in one step. We then prove that `_β→*_` is equivalent to `_⇉*_`, thereby proving the confluence of `_β→*_` as well. All this can be carried out relatively cleanly.

The gory details, however, hide in proving the infrastructural properties supporting the abstract view of the proofs.

### Infrastructures

Confluence, Church-Rosser… these are the interesting things we want to prove. However, we often end up spending most of the time proving the infrastructural properties we have to prove — which is why there has been so much recent research hoping to find better representations that simplify them. The locally nameless style is supposed to be such a representation. (Another, orthogonal topic is to seek generic representations such that the proofs can be done once for all languages.)

In my code, most of these properties are piled up in the file `Infrastructure.agda`. They range from properties you might expect to have:
```agda
open-term : ∀ k t {e} → LC e → e ≡ [ k ↦ t ] e
close-var-open-aux : ∀ k x e → LC e → e ≡ [ k ↦ fv x ] ([ k ↤ x ] e)
```

to ones not so obvious:
```agda
open-var-inj : ∀ k x t u → x ∉ fvars t → x ∉ fvars u
  → [ k ↦ fv x ] t ≡ [ k ↦ fv x ] u → t ≡ u
open-term-aux : ∀ j t i u e → ¬ (i ≡ j)
  → [ j ↦ t ] e ≡ [ i ↦ u ] ([ j ↦ t ] e)
  → e ≡ [ i ↦ u ] e
```

The lemma `open-var-inj` is one of the early lemmas in Chargueraud’s tutorial, which might give one the impression that it is an easy first lemma to prove. On the contrary, it is among the tedious ones — I needed a 40-line proof (most of the cases were simply eliminated by contradiction, though).

It takes experience and intuition to know which lemmas are needed. With no promise that it would work, I would think something must have gone wrong when I found myself having to prove weird-looking lemmas like:
```agda
close-var-rec-open :
  ∀ x y z t i j
  → ¬ (i ≡ j) → ¬ (y ≡ x) → y ∉ fvars t
  → [ i ↦ fv y ] ([ j ↦ fv z ] ([ j ↤ x ] t))
  ≡ [ j ↦ fv z ] ([ j ↤ x ] ([ i ↦ fv y ] t))
```

which is not easy to prove either.

### The Bottom Line?

So, is the locally nameless representation what it claims to be — a way to represent binders that simplifies the infrastructural proofs and is easier to scale up? When I was struggling with some of the proofs in `Infrastructure.agda` I did wonder whether the claim is true only for Coq, with its cleverly designed proof tactics, but not for Agda, where everything is done by hand (so far). Once the infrastructural proofs are done, however, the rest can be carried out very pleasantly.

To make a fair comparison, I should re-implement everything using de Bruijn notation. That will have to wait till some other time, though. (Anyone want to give it a try?)

It could be the case that, while some proofs are easily dismissed in Coq using tactics, in Agda the programmer should develop some more abstractions. I did find myself repeating some proof patterns, and found one or two lemmas, not present in Chargueraud’s tutorial, which simplify the Agda proofs a bit. There could be more, but at the moment I am perhaps too involved in the details to see the patterns from a higher viewpoint.

The exercise does pay off, though. I now feel much more familiar with this style, and am perhaps better prepared to use it in my own project.

### Code

A zip file containing all the code.

### References

1. Brian Aydemir, Arthur Chargueraud, Benjamin Pierce, Randy Pollack, and Stephanie Weirich. Engineering Formal Metatheory. POPL ’08.
2. Arthur Chargueraud. The Locally Nameless Representation. Journal of Automated Reasoning, May 2011

# Constructing list homomorphisms from proofs

Yun-Yan Chi and Shin-Cheng Mu. In the 9th Asian Symposium on Programming Languages and Systems (APLAS 2011), LNCS 7078, pages 74-88.
[PDF]

The well-known third list homomorphism theorem states that if a function `h` is both an instance of `foldr` and `foldl`, it is a list homomorphism. Plenty of previous works devoted to constructing list homomorphisms, however, overlook the fact that proving `h` is both a `foldr` and a `foldl` is often the hardest part which, once done, already provides a useful hint about what the resulting list homomorphism could be. In this paper we propose a new approach: to construct a possible candidate of the associative operator and, at the same time, to transform a proof that `h` is both a `foldr` and a `foldl` to a proof that `h` is a list homomorphism. The effort constructing the proof is thus not wasted, and the resulting program is guaranteed to be correct.

# Generalising and dualising the third list-homomorphism theorem

Shin-Cheng Mu and Akimasa Morihata. In the 16th ACM SIGPLAN International Conference on Functional Programming (ICFP 2011), pages 385-391.
[PDF]

The third list-homomorphism theorem says that a function is a list homomorphism if it can be described as an instance of both a `foldr` and a `foldl`. We prove a dual theorem for unfolds and generalise both theorems to trees: if a function generating a list can be described both as an `unfoldr` and an `unfoldl`, the list can be generated from the middle, and a function that processes or builds a tree both upwards and downwards may independently process/build a subtree and its one-hole context. The point-free, relational formalism helps to reveal the beautiful symmetry hidden in the theorem.

# Calculating Programs from Galois Connections

Galois connections are ubiquitous in mathematics and computing science. One is often amazed that, once two functions are identified as a Galois connection, a long list of nice and often useful properties follows from one concise, elegant defining equation. But how does one construct a program from a specification given as a Galois connection? This is the topic of a recent work by José Nuno Oliveira and myself, and this article is an advertisement.

### Galois Connections as Specifications

In program construction one often encounters program specification of the form “… the smallest such number”, “the longest prefix of the input list satisfying …”, etc. A typical example is whole number division: given a natural number `x` and a positive integer `y`, `x / y` is the largest natural number that, when multiplied by `y`, is at most `x`. For another example, the Haskell function `takeWhile p` returns the longest prefix of the input list such that all elements satisfy predicate `p`.

Such specifications can be seen as consisting of two parts. The easy part specifies a collection of solution candidates: numbers that are at most `x` after multiplication with `y`, or all prefixes of the input list. The hard part, on the other hand, picks one optimal solution, such as the largest, the longest, etc., among the collection.

Our goal is to calculate programs for such specifications. But how is the specification best given in the first place? Take division, for example: one might start with a specification that literally translates our description above into mathematics:
```
x / y = ⋁ { z | z * y ≤ x }
```

As we know, however, suprema are in general not easy to handle. One could also explicitly name the remainder:
```
z = x / y  ≡  (∃ r : 0 ≤ r < y : x = z * y + r)
```

at the cost of existentially quantifying over the remainder.

A third option looks surprisingly simple: given `x` and `y`, the value `x / y` is such that for all `z`,
```
z * y ≤ x  ≡  z ≤ x / y        (1)
```

Why is this sufficient as a definition of `x / y`? Firstly, by substituting `x / y` for `z`, the right hand side of `≡` reduces to true, and we obtain on the left hand side `(x / y) * y ≤ x`. This tells us that `x / y` is a candidate — it satisfies the easy part of the specification. Secondly, read the definition from left to right: `z * y ≤ x ⇒ z ≤ x / y`. It says that `x / y` is the largest among all the numbers satisfying the easy part.
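For the sceptical, the defining equivalence can be checked by brute force over a small range. A quick Haskell sketch (names mine; `div` plays the role of `/` on naturals):

```haskell
-- Check the Galois connection (1) defining whole-number division:
--   z * y <= x  ≡  z <= x `div` y
-- pointwise, for small naturals x, positive y.
gcDiv :: Int -> Int -> Int -> Bool
gcDiv z y x = (z * y <= x) == (z <= x `div` y)

gcDivHolds :: Bool
gcDivHolds = and [ gcDiv z y x | x <- [0 .. 50], y <- [1 .. 10], z <- [0 .. 50] ]
```

Evaluating `gcDivHolds` returns `True`: every instance of the equivalence holds on this range.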

Equations of this form are called Galois connections. Given preorders `⊑` and `≤`, functions `f` and `g` form a Galois connection if for all `x` and `z` we have
```
f z ⊑ x  ≡  z ≤ g x        (2)
```

The function `f` is called the lower adjoint and `g` the upper adjoint.

The definition of division above is a Galois connection where `f = (* y)` and `g = (/ y)`. For another example, `takeWhile p` can be specified as an upper adjoint:
```
map p? zs ⊑ xs  ≡  zs ⊑ takeWhile p xs        (3)
```

where `⊑` is the prefix ordering: `ys ⊑ xs` if `ys` is a prefix of `xs`, and `map p?` is a partial function: `map p? xs = xs` if `p x` holds for each `x` in `xs`.
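Connection `(3)` can likewise be checked by brute force on small lists. In this Haskell sketch (my own, for illustration) the left hand side `map p? zs ⊑ xs` is rendered as “`zs` is a prefix of `xs` all of whose elements satisfy `p`”:

```haskell
import Data.List (isPrefixOf)

-- lhs: zs survives map p? (all elements satisfy p) and is a prefix of xs.
-- rhs: zs is a prefix of takeWhile p xs.
lhs, rhs :: (Int -> Bool) -> [Int] -> [Int] -> Bool
lhs p zs xs = all p zs && zs `isPrefixOf` xs
rhs p zs xs = zs `isPrefixOf` takeWhile p xs

-- Check the equivalence over all lists of length <= 3 over [0..2].
gcTakeWhileHolds :: Bool
gcTakeWhileHolds =
  and [ lhs even zs xs == rhs even zs xs | xs <- lists, zs <- lists ]
  where lists = [ l | n <- [0 .. 3], l <- mapM (const [0 .. 2]) [1 .. n] ]
```

Here `gcTakeWhileHolds` evaluates to `True`; the two sides agree on every pair of lists in the range.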

We love Galois connections because once two functions are identified as such, a long list of useful properties follows: `f (g x) ⊑ x`, `z ≤ g (f z)`, `f` and `g` are monotonic and are inverses of each other in each other's range... etc.

These are all very nice. But can one calculate a program from a Galois connection? Given `⊑`, `≤`, and `f`, how does one construct `g`?

### The "Shrink" Operator

José discovered and proposed a relational operator to handle such calculations. To use the operator, we first turn the Galois connection `(2)` into point-free style. We look at the left hand side of `(2)`, `f z ⊑ x`, and try to write it as a relation between `z` and `x`. Let `f°` denote the relational converse of `f` — roughly, think of it as the inverse function of `f`, that is, it maps `f z` to `z` — and let `∘` denote relational composition — function composition extended to relations. Thus `f z ⊑ x` translates to
```
f° ∘ (⊑)
```

It is a relation between `z` and `x`: putting `x` on the left hand side of `f° ∘ (⊑)`, it relates, through `⊑`, to `f z`, which is then mapped to `z` through `f°`.

Then we wish that `f° ∘ (⊑)` can be transformed into a (relational) fold or unfold, which is often the case because the defining components `⊑`, `≤`, and `f` are often folds or unfolds. Consider the lower adjoint of `takeWhile p` in `(3)`: since `⊑`, the relation that takes a list and returns a prefix of the list, can be defined as a fold on lists, `(map p?)° ∘ (⊑)` is, by fold fusion, also a fold. As for `(1)`: since `≤` and `(* y)` are both folds on natural numbers, `(* y)° ∘ (≤)` can be both a fold and an unfold.

In our paper we showed that a Galois connection `(2)` can be transformed into
```
g = (f° ∘ (⊑)) ↾ (≥)
```

where `↾` is the new operator José introduced. The relation `S ↾ R`, pronounced "`S` shrunk by `R`", is a sub-relation of `S` that yields, for each input, an optimal result under relation `R`. Note that the equation makes the easy/hard division explicit: `f° ∘ (⊑)` is the easy part — we want a solution `z` satisfying `f z ⊑ x` — while `(≥)` is the criterion we use, in the hard part, to choose an optimal solution.

The `↾` operator is similar to the `min` operator of Bird and de Moor, but without having to use sets (which would need a power allegory). It satisfies a number of useful properties. In particular, we have theorems stating when `(↾ R)` promotes into folds and unfolds. For example,
```
(fold S) ↾ R ⊇ fold (S ↾ R)
```

if `R` is transitive and `S` is monotonic on `R`.

With these theorems we can calculate `g`: given `g`, specified as an upper adjoint in a Galois connection with lower adjoint `f`, we first try to turn `f° ∘ (⊑)` into a fold or an unfold, and then apply the theorems to promote `(↾ (≥))`. For more details, take a look at our paper!
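To get a feel for what `↾` does, here is a toy Haskell model (entirely my own rendering, not the paper's allegorical setting), with a relation from `a` to `b` represented as a function mapping each input to the list of its related outputs:

```haskell
-- A relation from a to b, as a set-valued (here: list-valued) function.
type Rel a b = a -> [b]

-- S ↾ R: keep, for each input, only those results of S that are optimal
-- with respect to R, where `r z w` reads "z is at least as good as w".
shrink :: Rel a b -> (b -> b -> Bool) -> Rel a b
shrink s r x = [ z | z <- s x, all (r z) (s x) ]

-- The easy part of division (1): relate x to every z with z * y <= x.
candidates :: Int -> Rel Int Int
candidates y x = [ z | z <- [0 .. x], z * y <= x ]

-- Shrinking by (>=) picks the largest candidate, i.e. x / y.
divByShrink :: Int -> Rel Int Int
divByShrink y = shrink (candidates y) (>=)
```

For instance, `divByShrink 3 10` yields `[3]`, the largest `z` with `z * 3 ≤ 10` — the easy part enumerates the candidates, and the shrink picks the optimum.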

# Programming from Galois connections — principles and applications

Shin-Cheng Mu and José Nuno Oliveira. Technical Report TR-IIS-10-009, Academia Sinica, December 2010.
[PDF]

This report is an extended version of our conference submission Programming from Galois connections.

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This report introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

# Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the 12th International Conference on Relational and Algebraic Methods in Computer Science (RAMiCS #12), LNCS 6663, pages 294-313. May 30 – June 3, 2011.
[PDF]

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This paper introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

An accompanying technical report is available online.

# Constructing datatype-generic fully polynomial-time approximation schemes using generalised thinning

Shin-Cheng Mu, Yu-Han Lyu, and Akimasa Morihata. In the 6th ACM SIGPLAN workshop on Generic programming (WGP 2010), pages 97-108, Sep. 2010. [PDF]

The fully polynomial-time approximation scheme (FPTAS) is a class of approximation algorithms that is able to deliver an approximate solution within any chosen ratio in polynomial time. By generalising Bird and de Moor’s Thinning Theorem to a property between three orderings, we come up with a datatype-generic strategy for constructing fold-based FPTASs. Greedy, thinning, and approximation algorithms can thus be seen as a series of generalisations. Components needed in constructing an FPTAS are often natural extensions of those in the thinning algorithm. Design of complex FPTASs is thus made easier, and some of the resulting algorithms turn out to be simpler than those in previous works.

# Evaluating Simple Polynomials

At the end of FLOLAC ’10 I had a chance to show the students, in 25 minutes, what functional program calculation is about. The students had just been exposed to functional programming a week before, in a three-hour course, after which they had written some simple programs handling concrete data but might still have had trouble grasping more abstract concepts like folds. I have talked about maximum segment sum way too many times (in the context of imperative program derivation, though), and it is perhaps too complex to cover in 25 minutes. The steep list problem, on the other hand, can be dealt with in 5 minutes. Thus I needed another example.

This is what I eventually came up with: given a list of numbers `a₀, a₁, a₂ … aₙ` and a constant `X`, compute `a₀ + a₁X + a₂X² + … + aₙXⁿ`. In Haskell it can be specified as a one-liner:
```haskell
poly as = sum (zipWith (×) as (iterate (×X) 1))
```

One problem with this example is that the specification is already good enough: it is a nice linear-time algorithm. To save some multiplications, perhaps, we may try to simplify it further.

It is immediate that `poly [] = 0`. For the non-empty case, we reason:
```
   poly (a : as)
 =   { definition of poly }
   sum (zipWith (×) (a : as) (iterate (×X) 1))
 =   { definition of iterate }
   sum (zipWith (×) (a : as) (1 : iterate (×X) X))
 =   { definition of zipWith }
   sum (a : zipWith (×) as (iterate (×X) X))
 =   { definition of sum }
   a + sum (zipWith (×) as (iterate (×X) X))
```

The expression to the right of `a +` is unfortunately not `poly as` — the last argument to `iterate` is `X` rather than `1`. One possibility is to generalise `poly` to take another argument. For this problem, however, we can do slightly better:
```
   a + sum (zipWith (×) as (iterate (×X) X))
 =   { since iterate f (f b) = map f (iterate f b) }
   a + sum (zipWith (×) as (map (×X) (iterate (×X) 1)))
 =   { zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as,
       if ⊗ associative }
   a + sum (map (×X) (zipWith (×) as (iterate (×X) 1)))
 =   { sum . map (×X) = (×X) . sum }
   a + (sum (zipWith (×) as (iterate (×X) 1))) × X
 =   { definition of poly }
   a + (poly as) × X
```

We have thus come up with the program
```haskell
poly []       = 0
poly (a : as) = a + (poly as) × X
```

Besides the definitions of `sum`, `zipWith`, `iterate`, etc., the rules used include:

1. `map f (iterate f x) = iterate f (f x)`
2. `zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as` if `⊗` associative
3. `sum . map (×X) = (×X) . sum`, a special case of `foldr ⊕ e . map (⊗X) = (⊗X) . foldr ⊕ e` if `(a ⊕ b) ⊗ X = (a ⊗ X) ⊕ (b ⊗ X)` and `e ⊗ X = e`.
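The derivation can be replayed in runnable Haskell if we turn the constant `X` into a parameter (the names `polySpec`/`polyCalc` are mine):

```haskell
-- The specification: sum of a_i * x^i.
polySpec :: Num a => a -> [a] -> a
polySpec x as = sum (zipWith (*) as (iterate (* x) 1))

-- The calculated program (Horner-style evaluation).
polyCalc :: Num a => a -> [a] -> a
polyCalc _ []       = 0
polyCalc x (a : as) = a + polyCalc x as * x
```

For example, `polySpec 2 [1,2,3]` and `polyCalc 2 [1,2,3]` both evaluate to `1 + 2·2 + 3·4 = 17`.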

Well, this is not a very convincing example. Ideally I’d like a derivation, like the steep list one, where we gain some improvement in complexity by calculation.

What is your favourite example for functional program calculation?

# Sum of Squares of Differences

In the final exam of the Program Construction course in FLOLAC ’10, I gave the students this problem (from Kaldewaij’s book):
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r : int;
   S
   { r = (Σ i,j : 0 ≤ i < j < N : (a.i - a.j)²) }
]|
```

In words, given an array of integers having at least two elements, compute the sum of squares of the difference between all pairs of elements. (Following the convention of the guarded command language, function application is written `f.x`, and an array is seen as a function from indices to values.)

It is not hard to quickly write up an `O(N²)` program using nested loops, which, I have to confess, is what I would have done before reading Kaldewaij’s book and realising that it is possible to do the task in linear time using one loop. Unfortunately, not many students managed to come up with this solution, so I think it is worth some discussion.

### Quantifiers

Before we solve the problem, let us review the “Dutch style” quantifier syntax and rules. Given a commutative, associative binary operator `⊕` with unit element `e`, if we informally denote the (integral) values in the interval `[A .. B)` by `i₀, i₁, i₂ … iₙ`, the quantified expression:
```
(⊕ i : A ≤ i < B : F.i)
```

informally denotes `F.i₀ ⊕ F.i₁ ⊕ F.i₂ ⊕ … ⊕ F.iₙ`. More generally, if all values satisfying predicate `R` can be enumerated as `i₀, i₁, i₂ … iₙ`, the expression
```
(⊕ i : R.i : F.i)
```

denotes `F.i₀ ⊕ F.i₁ ⊕ F.i₂ ⊕ … ⊕ F.iₙ`. We omit the `i` in `R.i` and `F.i` when there can be no confusion.

A more formal characterisation of the quantified expression is given by the following rules:

1. `(⊕ i : false : F.i) = e`
2. `(⊕ i : i = x : F.i) = F.x`
3. `(⊕ i : R : F) ⊕ (⊕ i : S : F) = (⊕ i : R ∨ S : F) ⊕ (⊕ i : R ∧ S : F)`
4. `(⊕ i : R : F) ⊕ (⊕ i : R : G) = (⊕ i : R : F ⊕ G)`
5. `(⊕ i : R.i : (⊕ j : S.j : F.i.j)) = (⊕ j : S.j : (⊕ i : R.i : F.i.j))`

Rules 1 to 3 give rise to a useful rule, "split off `n`": consider `i` such that `0 ≤ i < n + 1`. If `n > 0`, the set of possible values of `i` can be split into two subsets: `0 ≤ i < n` and `i = n`. By rule 3 (reversed), together with rules 1 and 2, we get:
```
(⊕ i : 0 ≤ i < n + 1 : F.i) = (⊕ i : 0 ≤ i < n : F.i) ⊕ F.n
```

Expressions quantifying over more than one variable can be expressed in terms of quantifiers over single variables:
```
(⊕ i,j : R.i ∧ S.i.j : F.i.j) = (⊕ i : R.i : (⊕ j : S.i.j : F.i.j))
```

If `⊗` distributes into `⊕`, we have an additional property:
```
x ⊗ (⊕ i : R : F) = (⊕ i : R : x ⊗ F)
```

As a convention, `(+ i : R : F)` is often written `(Σ i : R : F)`.
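Over a finite domain, such quantified expressions are just folds, which makes the rules easy to sanity-check. A small Haskell sketch (encoding and names are mine), instantiating "split off `n`" for `Σ`:

```haskell
-- (⊕ i : R.i : F.i) over a finite domain, as a fold with unit e;
-- ⊕ is assumed commutative and associative.
quant :: (b -> b -> b) -> b -> [a] -> (a -> Bool) -> (a -> b) -> b
quant op e dom r f = foldr op e [ f i | i <- dom, r i ]

-- Split off n:  (Σ i : 0 ≤ i < n+1 : F.i) = (Σ i : 0 ≤ i < n : F.i) + F.n
splitOffHolds :: Int -> Bool
splitOffHolds n =
  quant (+) 0 [0 .. n] (const True) sq
    == quant (+) 0 [0 .. n - 1] (const True) sq + sq n
  where sq i = i * i   -- F.i = i², an arbitrary choice
</code>
```

`splitOffHolds` returns `True` for every `n ≥ 0` in the tested range.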

### Computing the Sum of Squares of Differences

The first step is to turn the constant `N` into a variable `n`. The main worker of the program is going to be a loop, in which we try to maintain the invariant:
```
P  ≣  r = (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²)
```

At the end of the loop body we increment `n`, and the loop terminates when `n` coincides with `N`:
```
   { Inv: P ∧ 2 ≤ n ≤ N , Bound: N - n }
   do n ≠ N → ... ; n := n + 1 od
```

We shall then find out how to update `r` before `n := n + 1` in a way that preserves `P`.

Assume that `P` and `2 ≤ n ≤ N` hold. To find out how to update `r`, we substitute `n + 1` for `n` in the desired value of `r`:
```
   (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²)[n+1 / n]
 = (Σ i,j : 0 ≤ i < j < n + 1 : (a.i - a.j)²)
 =   { split off j = n }
   (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²) +
   (Σ i : 0 ≤ i < n : (a.i - a.n)²)
 =   { P }
   r + (Σ i : 0 ≤ i < n : (a.i - a.n)²)
```

This is where most people stop the calculation and start constructing a loop computing `(Σ i : 0 ≤ i < n : (a.i - a.n)²)`. One might later realise, however, that most of the computation is repeated. Indeed, the expression above can be expanded further:
```
   r + (Σ i : 0 ≤ i < n : (a.i - a.n)²)
 =   { (x - y)² = x² - 2xy + y² }
   r + (Σ i : 0 ≤ i < n : a.i² - 2 × a.i × a.n + a.n²)
 =   { Rule 4 }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - (Σ i : 0 ≤ i < n : 2 × a.i × a.n)
     + (Σ i : 0 ≤ i < n : a.n²)
 =   { a.n is a constant, multiplication distributes into addition }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - 2 × (Σ i : 0 ≤ i < n : a.i) × a.n
     + (Σ i : 0 ≤ i < n : a.n²)
 =   { simplifying the last term }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - 2 × (Σ i : 0 ≤ i < n : a.i) × a.n + n × a.n²
```

which suggests that we store the values of `(Σ i : 0 ≤ i < n : a.i²)` and `(Σ i : 0 ≤ i < n : a.i)` in two additional variables:
```
Q₀  ≣  s = (Σ i : 0 ≤ i < n : a.i²)
Q₁  ≣  t = (Σ i : 0 ≤ i < n : a.i)
```

It merely takes some routine calculation to find out how to update `s` and `t`. The resulting code is:
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r, s, t, n : int;

   r, s, t, n := (a.0 - a.1)², a.0² + a.1², a.0 + a.1, 2
   { Inv: P ∧ Q₀ ∧ Q₁ ∧ 2 ≤ n ≤ N , Bound: N - n }
 ; do n ≠ N →
     r := r + s - 2 × t × a.n + n × a.n²;
     s := s + a.n²;
     t := t + a.n;
     n := n + 1
   od
   { r = (Σ i,j : 0 ≤ i < j < N : (a.i - a.j)²) }
]|
```
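As a sanity check, the loop can be transcribed into Haskell (my own transcription, folding over the list while maintaining `r`, `s`, `t`, and `n` exactly as the loop body does) and compared against the `O(N²)` specification:

```haskell
-- Linear-time version: r accumulates the answer; s = Σ a.i², t = Σ a.i
-- over the prefix processed so far; n counts the prefix length.
sumSqDiff :: [Int] -> Int
sumSqDiff = go 0 0 0 0
  where
    go r _ _ _ []       = r
    go r s t n (a : as) =
      go (r + s - 2 * t * a + n * a * a) (s + a * a) (t + a) (n + 1) as

-- O(N²) specification: Σ over 0 ≤ i < j < N of (a.i - a.j)².
sumSqDiffSpec :: [Int] -> Int
sumSqDiffSpec as =
  sum [ (x - y) ^ (2 :: Int) | (i, x) <- idx, (j, y) <- idx, i < j ]
  where idx = zip [0 :: Int ..] as
```

For instance, both functions give `4` on `[3, 1]`, and they agree on every list I tried.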

### Another “One Loop” Solution

Among those students who did come up with a program, most resorted to a typical two-loop, `O(N²)` solution. Given that this 9-hour course was, for almost all of them, their first exposure to program derivation, I should perhaps be happy enough that around 3 or 4 out of 38 students came up with something like the program above.

One student, however, delivered a program I did not expect to see:
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r, i, j : int;

   r, i, j := 0, 0, 0
   { Inv: ... ∧ 0 ≤ i ≤ j ∧ 0 ≤ j ≤ N, Bound: ? }
 ; do j ≠ N →
     if i < j → r := r + (a.i - a.j)²; i := i + 1
      | i = j → i, j := 0, j + 1
     fi
   od
]|
```

The program uses only one loop, but is still `O(N²)` — on closer inspection one realises that it is actually simulating the inner loop manually. Still, I’d be happy if the student could show me a correctness proof, with a correct loop invariant and a bound, since both are more complex than what I expected them to learn. Unfortunately, in the answer handed in, the program, the invariant, and the bound all contain some bugs. Anyone want to give it a try?