# The Maximum Segment Sum Problem: Its Origin, and a Derivation

In a previous paper of mine, regrettably, I wrongly attributed the origin of the maximum segment sum problem to Dijkstra and Feijen’s Een methode van programmeren. In fact, the story behind the problem was told very well in Jon Bentley’s Programming Pearls.

### The Problem, and the Linear-Time Algorithm

Given a list of numbers, the task is to compute the largest possible sum of a consecutive segment. In a functional language the problem can be specified by:

mss = max . map sum . segments

where segments = concat . map inits . tails enlists all segments of the input list, map sum computes the sum of each of the segments, before max :: Ord a ⇒ [a] → a picks the maximum. The specification, if executed, is a cubic time algorithm. Yet there is a linear time algorithm scanning through the list only once:

mss = snd . foldr step (0,0)
where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)

where a ↑ b yields the maximum of a and b.

Both the specification and the linear time program are short. The program is merely a foldr that can be implemented as a simple for-loop in an imperative language. Without some reasoning, however, it is not that trivial to see why the program is correct (hint: the foldr computes a pair of numbers, the first one being the maximum sum of all prefixes of the given list, while the second is the maximum sum of all segments). Derivation of the program (given below) is mostly mechanical, once you learn the basic principles of program calculation. Thus the problem has become a popular choice as the first non-trivial example of program derivation.
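To see the specification and the linear-time program agree, here is a quick executable check in Python (a sketch of mine; the names mss_spec and mss are not from the original, and reduce over the reversed list plays the role of foldr):

```python
from functools import reduce

def mss_spec(xs):
    """Cubic-time specification: the maximum over all segment sums.
    Empty segments are included, so the result is at least 0."""
    n = len(xs)
    return max(sum(xs[i:j]) for i in range(n + 1) for j in range(i, n + 1))

def mss(xs):
    """Linear-time foldr: p is the maximum prefix sum so far,
    s the maximum segment sum so far."""
    def step(x, ps):
        p, s = ps
        p = max(0, x + p)
        return (p, max(p, s))
    return reduce(lambda acc, x: step(x, acc), reversed(xs), (0, 0))[1]
```

On Bentley's example [31, -41, 59, 26, -53, 58, 97, -93, -23, 84] both give 187.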

### Origin

Jon Bentley recorded in Programming Pearls that the problem was proposed by Ulf Grenander of Brown University. In a pattern-matching procedure he designed, a subarray having maximum sum is the most likely to yield a certain pattern in a digitised image. The two-dimensional problem took too much time to solve, so he simplified it to one dimension in order to understand its structure.

In 1977 [Grenander] described the problem to Michael Shamos of UNILOGIC, Ltd. (then of Carnegie-Mellon University) who overnight designed Algorithm 3. When Shamos showed me the problem shortly thereafter, we thought that it was probably the best possible; … A few days later Shamos described the problem and its history at a Carnegie-Mellon seminar attended by statistician Jay Kadane, who designed Algorithm 4 within a minute.

Jon Bentley, Programming Pearls (1st edition), page 76.

Jay Kadane’s Algorithm 4 is the now well-known linear time algorithm, the imperative version of the functional program above:

maxpre, maxseg = 0, 0
for i in range(0, N):
    maxpre = 0 ↑ (maxpre + a[i])
    maxseg = maxpre ↑ maxseg

Algorithm 3, on the other hand, is a divide-and-conquer algorithm. An array a is split into two halves a₁ ⧺ a₂, and the algorithm recursively computes the maximum segment sums of a₁ and a₂. However, there could be some segment across a₁ and a₂ that yields a good sum; therefore the algorithm performs two additional loops, respectively computing the maximum suffix sum of a₁ and the maximum prefix sum of a₂, whose sum is the maximum sum of a segment crossing the edge. The algorithm runs in O(N log N) time. (My pseudo-Python translation of the algorithm is given below.)

In retrospect, Shamos did not have to compute the maximum prefix and suffix sums in two loops each time. The recursive function could have computed a quadruple of (maximum prefix sum, maximum segment sum, maximum suffix sum, sum of the whole array) for each array. The prefix and suffix sums could thus be computed bottom-up. I believe that would result in an O(N) algorithm. This linear time complexity might suggest that the “divide” is superficial — we do not have to divide the array in the middle. It is actually easier to divide the array into a head and a tail — which was perhaps how Kadane quickly came up with Algorithm 4!
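The quadruple idea can be sketched as follows (a hypothetical reconstruction in Python, not Shamos's actual code); passing indices rather than slicing keeps the combine step constant-time, which is what gives the linear behaviour:

```python
def mss_quad(a, l=0, u=None):
    """Return (max prefix sum, max segment sum, max suffix sum, total) of a[l:u].
    All three maxima include the empty segment, so they are at least 0."""
    if u is None:
        u = len(a)
    if l == u:                       # empty array
        return (0, 0, 0, 0)
    if l + 1 == u:                   # singleton array
        x = max(0, a[l])
        return (x, x, x, a[l])
    m = (l + u) // 2
    p1, s1, q1, t1 = mss_quad(a, l, m)
    p2, s2, q2, t2 = mss_quad(a, m, u)
    return (max(p1, t1 + p2),        # best prefix may extend into the right half
            max(s1, s2, q1 + p2),    # best segment is in a half, or crosses the cut
            max(q2, q1 + t2),        # best suffix may extend into the left half
            t1 + t2)

def mss_dc(a):
    return mss_quad(a)[1]
```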

### A Functional Derivation

I learned the functional derivation of the maximum segment sum problem from one of Jeremy’s papers [3] and was very amazed. It was perhaps one of the early incidents that inspired my interest in program calculation. The derivation does not appear to be very well known outside the program derivation circle — not even among functional programmers — so I would like to redo it here.

The first few steps of the derivation go:

max . map sum . segs
=    { definition of segs }
max . map sum . concat . map inits . tails
=    { since map f . concat = concat . map (map f) }
max . concat . map (map sum) . map inits . tails
=    { since max . concat = max . map max }
max . map max . map (map sum) . map inits . tails
=    { since map f . map g = map (f.g) }
max . map (max . map sum . inits) . tails

The purpose of the book-keeping transformation above is to push max . map sum closer to inits. The fragment max . map sum . inits is a function which, given a list of numbers, computes the maximum sum among all its prefixes. We denote it by mps, for maximum prefix sum. The specification has been transformed to:

mss = max . map mps . tails

This is a common strategy for segment problems: to solve a problem looking for an optimal segment, proceed by looking for an optimal prefix of each suffix. (Symmetrically we could process the list the other way round, look for an optimal suffix for each prefix.)

We wish that mps for each of the suffixes can be efficiently computed in an incremental manner. For example, to compute mps [-1,3,3,-4], rather than actually enumerating all suffixes, we wish that it can be computed from -1 and mps [3,3,-4] = 6, which can in turn be computed from 3 and mps [3,-4] = 3, all in constant time. In other words, we wish that mps is a foldr using a constant time step function. If this is true, one can imagine that we could efficiently implement map mps . tails in linear time. Indeed, scanr f e = map (foldr f e) . tails!
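The law scanr f e = map (foldr f e) . tails can be checked concretely. Below is a small Python demonstration with hand-rolled stand-ins for the Haskell functions, using summation as the step function:

```python
from functools import reduce

def foldr(f, e, xs):
    """Right fold: foldr f e [x1, x2, ...] = f x1 (f x2 (... e))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), e)

def tails(xs):
    """All suffixes, longest first, including the empty list."""
    return [xs[i:] for i in range(len(xs) + 1)]

def scanr(f, e, xs):
    """Right scan: records foldr f e of every suffix in one linear pass."""
    out = [e]
    for x in reversed(xs):
        out.append(f(x, out[-1]))
    return out[::-1]

plus = lambda x, y: x + y
xs = [-1, 3, 3, -4]
# map (foldr plus 0) (tails xs) enumerates all suffix sums in quadratic time;
# scanr plus 0 xs computes the same list in a single traversal
```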

The aim now is to turn mps = max . map sum . inits into a foldr. Luckily, inits is actually a foldr. In the following we will perform foldr-fusion twice, respectively fusing map sum and max into inits, thus turning the entire expression into a foldr.

The first fusion goes:

max . map sum . inits
=    { definition of inits }
max . map sum . foldr (\x xss -> [] : map (x:) xss) [[]]
=    { fold fusion, see below }
max . foldr zplus [0]

The fusion condition can be established below, through which we also construct the definition of zplus:

map sum ([] : map (x:) xss)
=  0 : map (sum . (x:)) xss
=    { by definition, sum (x : xs) = x + sum xs }
0 : map (x+) (map sum xss)
=    { define zplus x xss = 0 : map (x+) xss }
zplus x (map sum xss)

We continue with the derivation and perform another fusion:

max . foldr zplus [0]
=    { fold fusion, let zmax x y = 0 ↑ (x+y) }
foldr zmax 0

For the second fold fusion to work, we have to prove the following fusion condition:

max (0 : map (x+) xs)
=  0 ↑ max (map (x+) xs)
=    { since  max (map (x +) xs) = x + max xs }
0 ↑ (x + max xs)

The property max (map (x +) xs) = x + max xs in the last step follows from that (↑) distributes into (+), that is, (x + y) ↑ (x + z) = x + (y ↑ z). This is the key property that allows the whole derivation to work.

By performing foldr-fusion twice we have established that

mps = foldr zmax 0

In words, mps (x : xs), the best prefix sum of x : xs, can be computed by zmax x (mps xs). The definition of zmax says that if x + mps xs is positive, it is the maximum prefix sum; otherwise we return 0, sum of the empty prefix.
Therefore, mss can be computed by a scanr:

mss
=    { reasoning so far }
max . map (foldr zmax 0) . tails
=    { introducing scanr }
max . scanr zmax 0

We have derived mss = max . scanr zmax 0, where zmax x y = 0 ↑ (x+y).
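As a sanity check, here is the derived algorithm max . scanr zmax 0 transcribed into Python (my transcription; linear time, but the list of suffix results takes linear space):

```python
def mss_scan(xs):
    """mss = max . scanr zmax 0, where zmax x y = 0 ↑ (x + y)."""
    zmax = lambda x, y: max(0, x + y)
    sufs = [0]                   # scanr builds its results right to left
    for x in reversed(xs):
        sufs.append(zmax(x, sufs[-1]))
    return max(sufs)             # sufs holds mps of every suffix
```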

Many functional derivations stop here. This gives us an algorithm that runs in linear time, but takes linear space. A tupling transformation eliminates the need for linear space:

mss = snd . (head &&& max) . scanr zmax 0

where (f &&& g) a = (f a, g a). The part (head &&& max) . scanr zmax 0 returns a pair, the first component being the result of mps, the second mss. By some mechanical simplification we get the final algorithm:

mss = snd . foldr step (0,0)
where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)

### A Relational Derivation?

The maximum segment sum problem later turned out to be an example of Richard and Oege’s Greedy Theorem [2]. It is an exercise in the Algebra of Programming book, but I have not seen the solution given anywhere. For completeness, I recorded a relational derivation in a paper of mine about some other variations of the maximum segment sum problem [4].

### References

1. Bentley, Jon. Programming Pearls. Addison-Wesley, Inc, 1987.
2. Bird, Richard and de Moor, Oege. Algebra of Programming. Prentice-Hall, 1997
3. Gibbons, Jeremy. Calculating Functional Programs. Proceedings of ISRG/SERG Research Colloquium, Oxford Brookes University, November 1997.
4. Mu, Shin-Cheng. Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths. Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008.

### Appendix: Algorithm 3

def mss(l, u):
    if l > u:
        return 0                  # empty array
    elif l == u:
        return 0 ↑ a[l]           # singleton array
    else:
        m = (l + u) / 2

        # compute maximum suffix sum of a[l..m]
        sum, maxToLeft = 0, 0
        for i in range(m, l-1, -1):
            sum = sum + a[i]
            maxToLeft = maxToLeft ↑ sum
        # compute maximum prefix sum of a[m+1..u]
        sum, maxToRight = 0, 0
        for i in range(m+1, u+1):
            sum = sum + a[i]
            maxToRight = maxToRight ↑ sum
        maxCrossing = maxToLeft + maxToRight

        # recursively compute mss of a[l..m] and a[m+1..u]
        maxInL = mss(l, m)
        maxInR = mss(m+1, u)
        return maxInL ↑ maxCrossing ↑ maxInR

# Maximally Dense Segments — Code and Proof

The following is superseded by our later work Functional pearl: finding a densest segment.

Code and proof accompanying a forthcoming paper of Sharon Curtis and me: Functional Pearl: Maximally Dense Segments.

Quick links: [expository program | linear time program | proofs (late update: 2010.04.17)].

errata:
• Page 3: “This input sequence does not have a solution…” — what we meant was “This input does not have a prefix that is within bounds.” We used another example, in which the input does not have a feasible segment at all, before changing to this one, but I forgot to change the text accordingly.
• Page 4, Proof of Theorem 3.2: the first mdsM x ⇑d win (a:x) should be mdsM x ⇑d wp (trim (a:x)); a : x <b L and a : x ≥b L should respectively be trim (a : x) <b L and trim (a : x) ≥b L.
• Thanks to Josh Ko for pointing out both errors.

### Expository Code

The expository program [download here] is intended to be an executable implementation of the code in the paper. For clarity we use Haskell lists for all sequences and do not implement optimisations such as storing the areas and breadths of segments; thus the program is not linear-time yet. A linear-time implementation will follow soon.

The code is mostly consistent with the paper, with some minor differences: many functions take an additional argument of type BreadthSpec = (Breadth, Maybe Breadth). The first component is the lower bound, while the second is an optional upper bound. The main function is:

mds :: BreadthSpec ->  [Elem] -> Maybe [Elem]

which takes a BreadthSpec and switches between the modes with or without an upper bound.

To try the code, you may either load the module Main into ghci and invoke the function mds:

mds (lb, Just ub) [x1, x2, x3, ... ]

or load the module Test and run some QuickCheck properties:

Test.QuickCheck.quickCheck (prop_mds_correct bb (lb, Just ub))

where lb and ub are the lower and upper bounds, and bb is the bound on breadths of generated test elements. The property prop_mds_correct asserts that mds (lb,ub) x =d mds_spec (lb,ub) x for all x.

The gzipped file consists of the following Haskell modules:

• Main: containing the main program mds, our variant of zygomorphism zh, wp2, smsp2, maxChop, trim, etc.
• Spec: containing a specification of the maximally dense segment problem:
mds_spec :: BreadthSpec -> [Elem] -> Maybe [Elem]
mds_spec bs = maxlistM d . filter (bounds bs) . nonEmptySegs

Many types having area, breadth, and density defined are collected into a type class Block, and functions like maxChop are defined on the type class.

• DRSP: specification of right-skew segments and DRSP, with functions including rightSkew, sdars, lrsp, and drsp.
• DPTrees: defining DTrees and PTrees, and functions like addD and prependD allowing one to construct DRSPs in a fold.
• Utilities: some utility functions processing lists.
• Test: a module defining some QuickCheck properties to test the code.

### Linear Time Implementation

A linear time implementation can be downloaded here. The program uses Data.Sequence to represent the compulsory part and the first level of the DForest and the PForest of the window, as well as annotating them with areas and breadths. The subtrees of a DTree and a PTree, however, can be represented simply by snoc-lists and cons-lists respectively.

Organisation of the code is the same as the first program.

### Proofs

Proofs accompanying the paper [PDF]. Theorems and lemmas are labelled with both their own numbers as well as the numbers in the paper, if any. For example, Lemma A.1 (3.1) is Lemma 3.1 in the paper.

# A Survey of Binary Search

If you think you know everything you need to know about binary search, but have not read Netty van Gasteren and Wim Feijen’s note The Binary Search Revisited, you should.

In the FLOLAC summer school this year we plan to teach the students some basics of Hoare logic and Dijkstra’s weakest precondition calculus — a topic surprisingly little known among Taiwanese students. Binary search seems to be a topic I should cover. A precise specification will be given later, but basically the requirement is this: given a sorted array of N numbers and a value to search for, either locate the position where the value resides in the array, or report that the value is not present in the array, in O(log N) time.

Given that everybody should have learned about binary search in their algorithms class, you would not expect it to be a hard programming task. Yet in his popular book Programming Pearls, Jon Bentley noted that surprisingly few professional programmers managed to implement the algorithm without bugs at their first attempt. I tried it myself and, being very careful (and having learned something about program construction already), I produced a program that I believe to be correct, but rather inelegant. I tried it on my students and they did produce code with some typical bugs. So it seems to be a good example for a class on correct program construction.

### The Van Gasteren-Feijen Approach

The surprising fact that van Gasteren and Feijen pointed out was that binary search does not apply only to sorted lists! In fact, the usual practice of introducing binary search by comparing it to searching for a word in a dictionary is, according to van Gasteren and Feijen, a major educational blunder.

Van Gasteren and Feijen considered solving a more general problem: let M and N be integral numbers with M < N, and let Φ be a relation such that M Φ N (where Φ is written infix), with some additional constraints to be given later. The task is to find l such that

M ≤ l < N   ∧   l Φ (l+1)

This is the program:

{ M < N ∧ M Φ N }
l, r := M, N
{ Inv: M ≤ l < r ≤ N  ∧  l Φ r,   Bound: r - l }
; do l+1 ≠ r →
{ l + 2 ≤ r }
m := (l + r) / 2
; if m Φ r → l := m
[] l Φ m → r := m
fi
od
{ M ≤ l < N  ∧  l Φ (l+1) }

Notice first that the loop guard l+1 ≠ r, if satisfied, guarantees that l and r are not adjacent numbers, therefore assigning m := (l + r) / 2 establishes l < m < r, and thus the bound r - l is guaranteed to decrease. The if statement clearly maintains the invariant, provided that at least one of the guards is always satisfied:

l Φ r  ∧  l < m < r   ⇒   l Φ m  ∨  m Φ r    (*)

For Φ satisfying the condition above, at the end of the loop we will find some l such that l Φ (l+1).

What relations satisfy (*)? Examples given by van Gasteren and Feijen include:

• i Φ j = a[i] ≠ a[j] for some array a. Van Gasteren and Feijen suggested using this as the example when introducing binary search.
• i Φ j = a[i] < a[j],
• i Φ j = a[i] × a[j] ≤ 0,
• i Φ j = a[i] ∨ a[j], etc.
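The generic program can be transcribed almost literally; the sketch below (my transcription) resolves the nondeterministic choice by testing m Φ r first, which is fine since any guard that holds may be taken:

```python
def search(phi, M, N):
    """Find l with M <= l < N and phi(l, l+1).
    Assumes M < N, phi(M, N), and condition (*):
    phi(l, r) and l < m < r imply phi(l, m) or phi(m, r)."""
    l, r = M, N
    while l + 1 != r:
        m = (l + r) // 2     # l < m < r, so the bound r - l decreases
        if phi(m, r):
            l = m
        else:                # by (*), phi(l, m) must hold
            r = m
    return l

# the first example relation: i Φ j = a[i] ≠ a[j]
a = [1, 2, 2, 3]
l = search(lambda i, j: a[i] != a[j], 0, len(a) - 1)
```

The returned l satisfies a[l] ≠ a[l+1], as promised by the postcondition.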

### Searching for a Key in a Sorted Array

To search for a key K in an ascending-sorted array a, it seems that we could just pick this Φ:

i Φ j = a[i] ≤ K < a[j]

and check whether a[i] = K after the loop. There is only one problem, however -- we are not sure we can establish the precondition a[l] ≤ K < a[r]!

Van Gasteren and Feijen's solution is to add two imaginary elements to the two ends of the array. That is, for a possibly empty array a[0..N), we imagine two elements a[-1] such that a[-1] ≤ x and a[N] such that x < a[N] for all x. I believe this is equivalent to using this Φ:

i Φ j  =  (i = -1  ∨  a[i] ≤ K)  ∧  (K < a[j]  ∨  j = N)

which still satisfies (*) if a is sorted. And here is the program

{ 0 ≤ N ∧ -1 Φ N }
l, r := -1, N
{ Inv: -1 ≤ l < r ≤ N  ∧  l Φ r,   Bound: r - l }
; do l+1 ≠ r →
{ l + 2 ≤ r }
m := (l + r) / 2
; if a[m] ≤ K → l := m
[] K < a[m] → r := m
fi
od
{ -1 ≤ l < N  ∧  l Φ (l+1) }
; if l > -1 → found := a[l] = K
[] l = -1 → found := false
fi

Do not worry about the idea of "adding" elements to a. The invariant implies that -1 < m < N, thus a[-1] and a[N] are never accessed, and the array a need not actually be altered. They are just there to justify the correctness of the program. It also enables us to handle possibly empty arrays, while the loop body seems to be designed for the case when the range [l..r) is non-empty.
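The program with the two imaginary elements carries over directly; since the invariant keeps -1 < m < N, the sentinels are never inspected (a Python sketch of mine, with search_sorted a name I made up):

```python
def search_sorted(a, K):
    """Return whether K occurs in the ascending-sorted list a, using the
    virtual sentinels a[-1] = -infinity and a[N] = +infinity."""
    l, r = -1, len(a)
    while l + 1 != r:
        m = (l + r) // 2     # the invariant gives -1 < m < len(a)
        if a[m] <= K:
            l = m
        else:                # K < a[m]
            r = m
    return l > -1 and a[l] == K
```

Note that an empty list skips the loop entirely and reports False, exactly as the sentinel argument predicts.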

### Bentley's Program

Bentley's program for binary search in Programming Pearls can be rephrased as below:

l, r := 0, N-1
; found := false
; do l ≤ r →
m := (l + r) / 2
; if a[m] < K → l := m + 1
[] a[m] = K → found := true; break
[] K < a[m] → r := m - 1
fi
od

I would like to be able to derive this program in class, since this appears to be the more popular version. Apart from the presence of break, for which I do not yet know an easy variation of Hoare logic that helps in its derivation, to relate the test a[m] < K to l := m + 1 I will have to bring in the fact that a is sorted at an earlier stage of the development. Thus it is harder to put it in a more general picture.

For several reasons I used to believe that Bentley's program would be preferred; for example, it seems to shrink the range more effectively, assigning l and r to m + 1 and m - 1 rather than m. On second thought I realised that this might not be true. Variable l can be assigned m + 1 because the possibility of a[m] = K is covered in another case with an early exit, and r is assigned m - 1 because this algorithm represents an array segment with an inclusive right bound, as opposed to the previous algorithm.

The two algorithms do not solve exactly the same problem. With multiple occurrences of K in the array, Bentley's algorithm is non-deterministic about which index it returns, while the van Gasteren-Feijen algorithm, enforced by the specification, always returns the largest index. When K does not appear in the array, van Gasteren and Feijen's program could be more efficient because it needs only one comparison in the loop, rather than two as in Bentley's case (I am assuming that the last comparison is a catch-all case and need not be implemented). What if K does appear in the array? An analysis by Timothy J. Rolfe concluded that a single-comparison approach is still preferable on average: the benefit of the early exit does not outweigh the cost of the extra comparison in the loop.

### On Computing the Middle Index

There are some other interesting stories regarding the assignment m := (l + r) / 2. Joshua Bloch from Google noted that for large arrays, adding l and r may cause an overflow, and Bloch was not picking on Bentley -- the bug was reported by Sun. Bloch suggests one of the following:

int m = l + ((r - l) / 2);                          /* for Java/C/C++ */
int m = (l + r) >>> 1;                              /* for Java */
m = ((unsigned int)l + (unsigned int)r) >> 1;       /* for C/C++ */

Since the publication of the blog post, there have been numerous discussions on whether it should be considered a bug in the binary search algorithm or in the integer datatype, and on more machine-dependent issues, such as whether one may have an array so large that it cannot be indexed by an int, etc.
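Python integers do not overflow, but we can mimic 32-bit machine arithmetic to watch the bug happen and to check Bloch's first fix (add32 is my simulation, not part of any library):

```python
def add32(x, y):
    """Addition in 32-bit two's complement, mimicking a C/Java int."""
    s = (x + y) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

l, r = 2**30, 2**31 - 1      # both valid indices, but l + r exceeds INT_MAX
naive = add32(l, r) // 2     # the wrapped sum yields a negative "midpoint"
safe = l + (r - l) // 2      # Bloch's fix: r - l cannot overflow
assert naive < 0
assert safe == (l + r) // 2
```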

To be more language-independent, on the other hand, Roland Backhouse in his book Program Construction: Calculating Implementations from Specifications suggested using (l + r - 1) / 2, such that the value of m will be ⌊(l + r)/2⌋, regardless of whether the integral division operator actually rounds the number up or down.

### Exercise?

Among the exercises suggested by van Gasteren and Feijen, this one caught my eye: let array a[0..N), with 0 < N, be the concatenation of a strictly increasing and a strictly decreasing array. Use binary search to find the maximum element. (For this problem I think it is reasonable to assume that the two sub-arrays could be empty, while a is non-empty.) This happens to be a sub-routine I needed for an O(N log N) algorithm for the maximum segment density problem (there are linear-time algorithms for this problem, though), and I do remember I started off treating it as an unimportant sub-routine but had a hard time getting it right. I am glad that now I know more about it.
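One way to carry out the exercise (my sketch, not van Gasteren and Feijen's solution) keeps the invariant that the maximum lies within a[l..r]; strictness of the two runs is essential for the comparison of neighbours to be conclusive:

```python
def bitonic_max(a):
    """Maximum of a non-empty strictly-increasing-then-strictly-decreasing list."""
    l, r = 0, len(a) - 1
    while l < r:
        m = (l + r) // 2     # l <= m < r, so a[m+1] is in range
        if a[m] < a[m + 1]:
            l = m + 1        # still ascending: the maximum is to the right
        else:
            r = m            # descending already: the maximum is a[m] or left of it
    return a[l]
```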

# A “Side-Swapping” Lemma Regarding Minimum, Using Enriched Indirect Equality

Yu-Han Lyu and I were studying some paper from the algorithm community, and we noticed a peculiar kind of argument. For a much simplified version, let X and D be two relations of type A → B, denoting two alternative approaches to non-deterministically compute possible solution candidates to a problem. Also let ≤ be a transitive relation on B, and ≥ its converse. The relation min ≤ : {B} → B, given a set, returns one of its elements that is no larger (under ≤) than any element in the set, if such a minimum exists. We would like to find a solution that is as small as possible under ≤.

When arguing for the correctness of its algorithm, the paper we are studying claims that the method X is no worse than D in the following sense: if every solution returned by D is no better than some solution returned by X, which we translate to:

D ⊆ ≥ . X

then the best (smallest) solution by X must be no worse than (one of the) best solutions returned by D:

min ≤ . ΛX ⊆ ≤ . min ≤ . ΛD

where Λ converts a relation A → B to a function A → {B} by collecting its results to a set. Note that, awkwardly, X and D are swapped to different sides of relational inclusion.

“What? How could this be true?” was my first reaction. I bombarded Yu-Han with lots of emails, making sure that we didn’t misinterpret the paper. An informal way to see it is that since every result of D is outperformed by something returned by X, collectively, the best result among the latter is “lower-bounded” by the optimal result of D. But this sounds unconvincing to me. Something is missing.

### Totality and Well-Boundedness

It turns out that the reasoning can be correct, but we need some more constraints on D and ≤. Firstly, D must yield some result whenever X does. Otherwise it could be that D ⊆ ≥ . X is true but ΛD returns an empty set, while ΛX still returns something. This is bad because X is no longer a safe alternative to D — it could sometimes do too much. One way to prevent this from happening is to demand that ΛD = dom ∈ . ΛD, where ∈ is the membership relation, and dom ∈, the domain of ∈, consists only of non-empty sets. It will be proved later that this is equivalent to demanding that D be total.

Secondly, we need to be sure that every non-empty set has a minimum, that is, that min ≤ always yields something for non-empty sets. Then min ≤ . ΛD cannot fall back to the empty relation. Formally, this can be expressed as dom ∈ = dom (min ≤). Bird and de Moor called this property well-boundedness of ≤.

Recall that min ≤ = ∈ ∩ ≤/∋. The ∈ part guarantees that min ≤ returns something that is in the given set, while the ≤/∋ part guarantees that the returned value is a lower bound of the given set. Since ΛD (as well as ΛX) is a function, we also have min ≤ . ΛD = D ∩ ≤/D°, following from the laws of division.

Later we will prove an auxiliary lemma stating that if ≤ is well-bounded, we have:

≤/∋ . dom ∈   ⊆   ≤ . min ≤ . dom ∈

The right-hand side, given a non-empty set, takes its minimum and returns something possibly smaller. The left-hand side merely returns some lower bound of the given set. It sounds weaker because it does not demand that the set have a minimum. Nevertheless, the inclusion holds if ≤ is well-bounded.

An algebraic proof of the auxiliary lemma was given by Akimasa Morihata. The proof, to be discussed later, is quite interesting to me because it makes an unusual use of indirect equality. With the lemma, proof of the main result becomes rather routine:

min ≤ . ΛX   ⊆   ≤ . min ≤ . ΛD
≣   { since ΛD = dom ∈ . ΛD }
min ≤ . ΛX   ⊆   ≤ . min ≤ . dom ∈ . ΛD
⇐  { ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈, see below }
min ≤ . ΛX   ⊆   ≤/∋ . dom ∈ . ΛD
≣   { since ΛD = dom ∈ .  ΛD }
min ≤ . ΛX   ⊆   ≤/∋ . ΛD
≣   { since ΛD is a function, R/S . f = R/(f° . S) }
min ≤ . ΛX   ⊆   ≤/D°
≣   { Galois connection }
min ≤ . ΛX . D°   ⊆   ≤
⇐   { min ≤ . ΛX ⊆ ≤/X° }
≤/X°. D°   ⊆   ≤
⇐   { since  D ⊆ ≥ . X }
≤/X°. X° . ≤   ⊆   ≤
⇐   { division }
≤ . ≤   ⊆   ≤
≣   { ≤ transitive }
true

### Proof Using Enriched Indirect Equality

Now we have to prove that ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈, provided that ≤ is well-bounded. To prove this lemma I had to resort to first-order logic. I passed the problem to Akimasa Morihata and he quickly came up with a proof. We start with some preparation:

≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈
⇐   { since min ≤ ⊆ ∈ }
≤/(min ≤)° . dom ∈ ⊆ ≤ . min ≤ . dom ∈

And then we use proof by indirect (in)equality. The proof, however, is unusual in two ways. Firstly, we need the enriched indirect equality proposed by Dijkstra in EWD 1315: Indirect equality enriched (and a proof by Netty). Typically, proof by indirect equality exploits the property:

x = y    ≡   (∀u. u ⊆ x ≡ u ⊆ y)

and also:

x ⊆ y   ≡   (∀u. u ⊆ x ⇒ u ⊆ y)

When we know that both x and y satisfy some predicate P, enriched indirect equality allows us to prove x = y (or x ⊆ y) by proving a weaker premise:

x = y   ≡   (∀u. P u ⇒ u ⊆ x ≡ u ⊆ y)

Note that both ≤/(min ≤)° . dom ∈ and ≤ . min ≤ . dom ∈ satisfy X = X . dom ∈. Later we will try to prove:

X ⊆ ≤/(min ≤)° . dom ∈    ⇒    X ⊆ ≤ . min ≤ . dom ∈

for X such that X = X . dom ∈.

The second unusual aspect is that, rather than starting from one of X ⊆ ≤/(min ≤)° . dom ∈ or X ⊆ ≤ . min ≤ . dom ∈ and ending at the other, Morihata’s proof took the goal as a whole and used rules like (P ⇒ Q) ⇒ (P ⇒ P ∧ Q). The proof goes:

(X ⊆ ≤/(min ≤)° . dom ∈  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { dom ∈  ⊆ id }
(X  ⊆ ≤/(min ≤)°  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
≣    { Galois connection }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { (P ⇒ Q) ⇒ (P ⇒ P ∧ Q) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (min ≤)° . min ≤ . dom ∈)
⇐   { R ∩ S ⊆ R  }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (((min ≤)° . min ≤) ∩ id) . dom ∈)
≣   { dom R = (R° . R) ∩ id }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom (min ≤) . dom ∈)
≣   { ≤ well-bounded: dom ∈ = dom (min ≤) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈ . dom ∈)
≣   { dom ∈ . dom ∈ = dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈)
≣   { X = X . dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  true)
≣ true

### Auxiliary Proofs

Finally, this is a proof that the constraint ΛD = dom ∈ . ΛD is equivalent to D being total, that is id ⊆ D° . D. Recall that dom ∈ = ((∋ . ∈) ∩ id). We simplify dom ∈ . ΛD a bit:

dom ∈ . ΛD
= ((∋ . ∈) ∩ id) . ΛD
=   { ΛD a function }
(∋ . ∈ . ΛD) ∩ ΛD
=   { ∈ . ΛD = D }
(∋ . D) ∩ ΛD

We reason:

dom ∈ . ΛD = ΛD
≡   { R ∩ S = S iff S ⊆ R }
ΛD ⊆ ∋ . D
≡   { ΛD function, shunting }
id ⊆ (ΛD)° . ∋ . D
≡ id ⊆ D° . D

which is the definition of totality.

# Determining List Steepness in a Homomorphism

The steep list problem is an example we often use when we talk about tupling. A list of numbers is called steep if each element is larger than the sum of elements to its right. Formally,

steep [] = True
steep (x : xs) = x > sum xs ∧ steep xs

The definition above is a quadratic-time algorithm. Can we determine the steepness of a list in linear time? To do so, we realise that we have to compute some more information. Let steepsum = ⟨steep, sum⟩, where ⟨f, g⟩ x = (f x, g x); we discover that steepsum can be computed as a foldr:

steepsum = foldr step (True, 0)
where step x (b, s) = (x > s ∧ b, x + s)

which takes linear time. After computing steepsum, simply take steep = fst . steepsum.
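A quick executable check of the tupling in Python (my sketch; steep_spec is the quadratic definition, steepsum the tupled foldr):

```python
from functools import reduce

def steep_spec(xs):
    """Quadratic definition: each element exceeds the sum of elements after it."""
    return all(xs[i] > sum(xs[i + 1:]) for i in range(len(xs)))

def steepsum(xs):
    """Tupled foldr computing (steep xs, sum xs) in one linear pass."""
    def step(x, bs):
        b, s = bs
        return (x > s and b, x + s)
    return reduce(lambda acc, x: step(x, acc), reversed(xs), (True, 0))

def steep(xs):
    return steepsum(xs)[0]
```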

In the Software Construction course we also talked about list homomorphism, that is, functions that can be defined in the form

h [] = e
h [a] = f a
h (xs ⧺ ys) = h xs ⊚ h ys

where ⊚ is associative and e is the identity element of ⊚. The discussion would be incomplete if we didn’t mention the third homomorphism theorem: if a function on lists can be computed both from right to left and from left to right, that is, both by a foldr and a foldl, it can be computed from anywhere in the middle by a homomorphism, which has the potential of being parallelised. Hu sensei had this idea of using steepsum as an exercise: can we express steepsum as a foldl, and as a list homomorphism?

Unfortunately, we cannot — steepsum is not a foldl. To determine the steepness from left to right, we again have to compute some more information — not necessarily in the form of a tuple.

### Right Capacity

The first idea would be to tuple steepsum with yet another element: some information that would allow us to determine what could be appended to the right of the input. Given a list of numbers xs, let rcap xs (for right capacity) be a (non-inclusive) upper bound: a number y < rcap xs can be safely appended to the right of xs without invalidating the steepness within xs. It can be defined by:

rcap [] = ∞
rcap (x : xs) = (x - sum xs) ↓ rcap xs

where ↓ stands for binary minimum. For an explanation of the second clause, consider x : xs ⧺ [y]. To make it a steep list, y has to be smaller than x - sum xs and, furthermore, y has to be small enough that xs ⧺ [y] is steep. For example, rcap [9,5] is 4, therefore [9,5,3] is still a steep list.

Following the theme of tupling, one could perhaps try to construct ⟨steep, sum, rcap⟩ as a foldl. By doing so, however, one would soon discover that rcap itself contains all information we need. Firstly, a list xs is steep if and only if rcap xs is greater than 0:
Lemma 1: steep xs ⇔ rcap xs > 0.
Proof: induction on xs.
case []: True ⇔ ∞ > 0.
case (x : xs):

steep (x : xs)
⇔ x > sum xs ∧ steep xs
⇔ x - sum xs > 0  ∧ steep xs
⇔   { induction }
x - sum xs > 0 ∧ 0 < rcap xs
⇔ ((x - sum xs) ↓ rcap xs) > 0
⇔ rcap (x : xs) > 0

Secondly, it turns out that rcap is itself a foldl:
Lemma 2: rcap (xs ⧺ [x]) = (rcap xs - x) ↓ x.
Proof: induction on xs.
case []: rcap [x] = x = (∞ - x) ↓ x.
case (y : xs):

rcap (y : xs ⧺ [x])
= ((y - sum xs) - x) ↓ rcap (xs ⧺ [x])
=   { induction }
((y - sum xs) - x) ↓ (rcap xs - x) ↓ x
=   { since (a - x) ↓ (b - x) = (a ↓ b) - x }
(((y - sum xs) ↓ rcap xs) - x) ↓ x
= (rcap (y : xs) - x) ↓ x

Therefore we have rcap = foldl (λ y x → (y - x) ↓ x) ∞. If the aim were to determine steepness from left to right, our job would be done.
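In Python, with float('inf') for ∞, the foldl and the inductive definition of rcap can be compared directly (rcap_spec and rcap are names of mine):

```python
from functools import reduce

INF = float('inf')

def rcap_spec(xs):
    """Direct definition: rcap [] = ∞; rcap (x:xs) = (x - sum xs) ↓ rcap xs."""
    if not xs:
        return INF
    x, rest = xs[0], xs[1:]
    return min(x - sum(rest), rcap_spec(rest))

def rcap(xs):
    """The foldl form justified by Lemma 2: rcap (xs ++ [x]) = (rcap xs - x) ↓ x."""
    return reduce(lambda y, x: min(y - x, x), xs, INF)
```

For instance, rcap([9, 5]) and rcap_spec([9, 5]) both yield 4, matching the example above.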

However, the aim is to determine steepness from both directions. To efficiently compute rcap using a foldr, we still need to tuple rcap with sum. In summary, the function ⟨sum, rcap⟩ can be computed both in terms of a foldl:

⟨sum, rcap⟩ = foldl step (0, ∞)
where step (s, y) x = (s + x, (y - x) ↓ x)

and a foldr:

⟨sum, rcap⟩ = foldr step (0, ∞)
where step x (s, y) = (x + s, (x - s) ↓ y)

Now, can we compute ⟨sum, rcap⟩ by a list homomorphism?

### List Homomorphism

Abbreviate ⟨sum, rcap⟩ to sumcap. The aim now is to construct ⊚ such that sumcap (xs ⧺ ys) = sumcap xs ⊚ sumcap ys. The paper Automatic Inversion Generates Divide-and-Conquer Parallel Programs by Morita et al. suggested the following approach (which I discussed, using more confusing notations, in a previous post): find a weak inverse of ⟨sum, rcap⟩, that is, some g such that sumcap (g z) = z for all z in the range of sumcap. Then we may take z ⊚ w = sumcap (g z ⧺ g w).

For this problem, however, I find it hard to look for the right g. Anyway, this is the homomorphism that seems to work:

sumcap [] = (0, ∞)
sumcap [x] = (x, x)
sumcap (xs ⧺ ys) = sumcap xs ⊚ sumcap ys
where (s1,c1) ⊚ (s2,c2) = (s1+s2, (c1-s2) ↓ c2)

However, I basically constructed ⊚ by guessing, and proved it correct afterwards. It takes a simple inductive proof to show that rcap (xs ⧺ ys) = (rcap xs - sum ys) ↓ rcap ys.
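The guessed homomorphism can at least be checked against the foldr characterisation of ⟨sum, rcap⟩ given earlier. A Haskell sketch (merge plays the role of ⊚; the name and the balanced split are my own choices, and Double gives us ∞ as 1/0):

```haskell
-- Using Double so that ∞ is available as 1/0.
infty :: Double
infty = 1 / 0

-- <sum, rcap> via the foldr given earlier
sumcap :: [Double] -> (Double, Double)
sumcap = foldr (\x (s, y) -> (x + s, (x - s) `min` y)) (0, infty)

-- the guessed homomorphism, written as a plain binary function
merge :: (Double, Double) -> (Double, Double) -> (Double, Double)
merge (s1, c1) (s2, c2) = (s1 + s2, (c1 - s2) `min` c2)

-- sumcap computed divide-and-conquer style with merge
sumcapH :: [Double] -> (Double, Double)
sumcapH []  = (0, infty)
sumcapH [x] = (x, x)
sumcapH xs  = sumcapH ys `merge` sumcapH zs
  where (ys, zs) = splitAt (length xs `div` 2) xs
```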

I am still wondering what weak inverse of sumcap would lead us to the solution above, however.

# The Windowing Technique for Longest Segment Problems

In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.

A brief review: the longest segment problem takes the form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is an abbreviation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

A predicate p is suffix-closed if p (xs ⧺ ys) ⇒ p ys. For suffix-closed p, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using a foldr from the right to the left, during which we try to maintain the longest segment satisfying p so far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the window.

If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy p, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.

Derivation of the algorithm is a typical case of tupling.

### Tupling

Given a function h, we attempt to compute it efficiently by turning it into a foldr. It would be possible if the value of the inductive case h (x : xs) were determined solely by x and h xs, that is:

h (x : xs) = f x (h xs)

for some f. With some investigation, however, it would turn out that h (x : xs) also depends on some g:

h (x : xs) = f x (g (x : xs)) (h xs)

Therefore, we instead try to construct their split ⟨ h , g ⟩ as a fold, where the split is defined by:

⟨ h , g ⟩ xs = (h xs, g xs)

and h = fst . ⟨ h , g ⟩.

If ⟨ h , g ⟩ is indeed a fold, it should scan through the list and construct a pair of an h-value and a g-value. To make it feasible, it is then hoped that g (x : xs) can be determined by g xs and h xs. Otherwise, we may have to repeat the process again, making the fold return a triple.

### Segment/Prefix Decomposition

Let us look into the longest segment problem. For suffix-closed p it is reasonable to assume that p [] is true — otherwise p would be false everywhere. Therefore, for the base case we have max# ∘ p ◁ ∘ segs ▪ [] = []. We denote function application by ▪ to avoid too many parentheses.

Now the inductive case. It is not hard to derive an alternative definition of segs:

segs [] = [[]]
segs (x : xs) = inits (x:xs) ⧺ segs xs

therefore, we derive:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)

where xs ↑# ys returns the longer one between xs and ys.

It suggests that we maintain, by a foldr, a pair containing the longest segment and the longest prefix satisfying p (that is, max# ∘ p ◁ ∘ inits). It is then hoped that max# ∘ p ◁ ∘ inits ▪ (x : xs) can be computed using max# ∘ p ◁ ∘ inits ▪ xs. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (max# ∘ p ◁ ∘ inits ▪ xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys).

Proposition 1 says that the list (or set) of all the prefixes of x : xs that satisfy p can be computed from the longest prefix of xs (call it ys) satisfying p, provided that p is suffix-closed. A naive way to do so is simply to compute all the prefixes of x : ys and do the filtering again, as is done in finits.

This was the route taken in the previous post. It would turn out, however, that to come up with an efficient implementation of f we need some more properties of p, such as it also being overlap-closed.

### The “Window”

Proposition 1 can be strengthened: to compute all the prefixes of x : xs that satisfy p using finits we do not strictly have to start with ys. Any prefix of xs at least as long as ys will do.

Proposition 2: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (take i xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys), and i ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs.

In particular, we may choose i to be the length of the longest segment:

Lemma 1:

length ∘ max# ∘ p ◁ ∘ segs ▪ xs ≥
length ∘ max# ∘ p ◁ ∘ inits ▪ xs

Appealing to intuition, Lemma 1 is true because segs xs is a superset of inits xs.

Remark: Zantema proved Proposition 1 by contradiction. The purpose of an earlier post was to give a constructive proof of Proposition 1, which was considerably harder than I expected. I’d be interested to see a constructive proof of Proposition 2.

Now we resume the reasoning:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)
=   { Proposition 2 and Lemma 1 }
let s = max# ∘ p ◁ ∘ segs ▪ xs
in (max# ∘ finits ▪ (take (length s) xs)) ↑# s

Define window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs. The reasoning above suggests that we may try the following tupling:

max# ∘ p ◁ ∘ segs
= fst ∘ ⟨ max# ∘ p ◁ ∘ segs , window ⟩

### Maintaining the Longest Segment and the Window

The task now is to express ⟨ max# ∘ p ◁ ∘ segs , window ⟩ as a foldr. We can do so only if both max# ∘ p ◁ ∘ segs ▪ (x : xs) and window (x : xs) can be determined by max# ∘ p ◁ ∘ segs ▪ xs and window xs. Let us see whether it is the case.

#### Maintaining the Longest Segment

Regarding max# ∘ p ◁ ∘ segs ▪ (x : xs), we have derived:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
=   { as shown above, let s = max# ∘ p ◁ ∘ segs ▪ xs }
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s

Let s = max# ∘ p ◁ ∘ segs ▪ xs. We consider two cases:

1. Case p (x : take (length s) xs). We reason:
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s
=   { see below }
(x : take (length s) xs) ↑# s
=   { since the LHS is one element longer than the RHS }
x : take (length s) xs
=   { definition of window }
x : window xs

The first step is correct because, for all zs, p zs implies that max# ∘ p ◁ ∘ inits ▪ zs = zs.

2. Case ¬ p (x : take (length s) xs). In this case (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s must be s, since ¬ p zs implies that length∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs.

#### Maintaining the Window

Now consider the window. Also, we do a case analysis:

1. Case p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length (x : take (length s) xs)) (x : xs)
=   { take and length }
x : take (length (take (length s) xs)) xs
=   { take and length }
x : take (length s) xs
= x : window xs

2. Case ¬ p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length s) (x : xs)
=   { take and length }
x : take (length s - 1) xs
= x : init (window xs)

#### The Algorithm

In summary, the reasoning above shows that

⟨ max# ∘ p ◁ ∘ segs , window ⟩ = foldr step ([], [])

where step is given by

step x (s, w) | p (x : w) = (x : w, x : w)
| otherwise = (s, x : init w)

As is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that s is the optimal segment and that w is as long as s, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.

We are not quite done yet. The window w had better be implemented using a queue, so that init w can be performed efficiently. The algorithm then runs in time linear to the length of the input list, provided that p (x : w) can be performed in constant time -- which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to compute p efficiently. But I shall stop here.
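To make the algorithm concrete, here is a Haskell sketch of the windowing fold, instantiated with a hypothetical predicate p xs = sum xs ≤ 10, which on non-negative inputs is suffix-closed but not overlap-closed. I write the window update as take (length s) (x : w) rather than x : init w, so that the empty window needs no special care:

```haskell
import Data.List (inits, tails)

-- a hypothetical predicate: segment sum at most 10; on non-negative
-- inputs this is suffix-closed but not overlap-closed
p :: [Int] -> Bool
p xs = sum xs <= 10

-- the cubic-time specification: a longest segment satisfying p
spec :: [Int] -> [Int]
spec = foldr1 longer . filter p . concatMap inits . tails
  where longer a b = if length a >= length b then a else b

-- the windowing fold: s is the longest segment so far, w the window
longestP :: [Int] -> [Int]
longestP = fst . foldr step ([], [])
  where
    step x (s, w)
      | p (x : w) = (x : w, x : w)
      | otherwise = (s, take (length s) (x : w))
```

As noted above, the test p (x : w) here takes time linear in the window, so this sketch is not yet linear-time; tupling the fold with the running sum of w would make the test constant-time.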

# Longest Segment Satisfying Suffix and Overlap-Closed Predicates

I spent most of the week preparing for the lecture on Monday, in which we plan to talk about segment problems. One of the things we would like to do is to translate the derivations in Hans Zantema’s Longest Segment Problems to Bird-Meertens style. Here is a summary of the part I managed to do this week.

Zantema’s paper considered problems of this form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is a shorter notation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

Of course, we have to assume certain properties from the predicate p. A predicate p is:

• suffix-closed, if p (xs ⧺ ys) ⇒ p ys;
• overlap-closed, if p (xs ⧺ ys) ∧ p (ys ⧺ zs) ∧ ys ≠ [] ⇒ p (xs ⧺ ys ⧺ zs).

For example, ascending is suffix- and overlap-closed, while p xs = (all (0 ≤) xs) ∧ (sum xs ≤ C) for some constant C is suffix-closed but not overlap-closed. Note that for suffix-closed p, it is reasonable to assume that p [] is true, otherwise p would be false everywhere. It also saves us the trouble of making sure that max# is always applied to a non-empty list.

I denote function application by an infix operator that binds looser than function composition but tighter than other binary operators. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

### Prefix/Suffix Decomposition

Let us begin with the usual prefix/suffix decomposition:

max# ∘ p ◁ ∘ segs
= max# ∘ p ◁ ∘ concat ∘ map inits ∘ tails
= max# ∘ concat ∘ map (p ◁) ∘ map inits ∘ tails
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Like what we do with the classical maximum segment sum, if we can somehow turn max# ∘ p ◁ ∘ inits into a fold, we can then implement map (foldr ...) ∘ tails by a scanr. Let us denote max# ∘ p ◁ ∘ inits by mpi.

If you believe in structural recursion, you may attempt to fuse max# ∘ p ◁ into inits by fold-fusion. Unfortunately, it does not work this way! In the fold-fusion theorem:

h ∘ foldr f e = foldr g (h e)   ⇐   h (f x y) = g x (h y)

notice that x and y are universally quantified, which is too strong for this case. Many properties we have, to be shown later, do need information from the context — e.g. some properties are true only if y is the result of inits.

### Trimming Unneeded Prefixes

One of the crucial properties we need is the following:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) =
p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)

For some intuition, let x = 1 and xs = [2,3,4]. The right-hand side first computes all prefixes of xs:

[] [2] [2,3] [2,3,4]

before filtering them. Let us assume that only [] and [2,3] pass the check p. We then pick the longest one, [2,3], cons it with 1, and compute all its prefixes:

[] [1] [1,2] [1,2,3]

before filtering them with p again.

The left-hand side, on the other hand, performs filtering on all prefixes of [1,2,3,4]. However, the proposition says that it is the same as the right-hand side — filtering on the prefixes of [1,2,3] only. We lose nothing if we drop [1,2,3,4]. Indeed, since p is suffix-closed, if p [1,2,3,4] were true, p [2,3,4] would have been true on the right-hand side.

Proof of Proposition 1 was the topic of a previous post. The proposition is useful for us because:

mpi (x : xs)
= max# ∘ p ◁ ∘ inits ▪ (x : xs)
=    { Proposition 1 }
max# ∘ p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)
= mpi (x : mpi xs)

Therefore mpi is a fold!

mpi = foldr (λx ys → mpi (x : ys)) []

### Refining the Step Function

We still have to refine the step function λx ys → mpi (x : ys) to something more efficient. Luckily, for overlap-closed p, mpi (x : ys) is either [], [x], or x : ys — if ys is the result of mpi.

Proposition 2: If p is overlap-closed, mpi (x : mpi xs) = x ⊙ mpi xs, where ⊙ is defined by:

x ⊙ ys | p (x : ys) = x : ys
| p [x] = [x]
| otherwise = []

To see why Proposition 2 is true, consider mpi (x : mpi xs).

• If mpi (x : mpi xs) = [], we are done.
• Otherwise it must be x : zs for some zs ∈ inits (mpi xs). And we have p (x : zs) because it is a result of mpi. Again consider two cases:
• If zs = [], both sides reduce to [x], otherwise…
• Let us not forget that p (mpi xs) must be true. Also, since zs ∈ inits (mpi xs), we have mpi xs = zs ⧺ ws for some ws. Together that implies p (x : zs ⧺ ws) = p (x : mpi xs) must be true.

Notice that the reasoning above (from Zantema) is a proof-by-contradiction. I do not yet know how hard it is to build a constructive proof.

With Proposition 1 and 2 we have turned mpi to a fold. That leads to the derivation:

max# ∘ p ◁ ∘ segs
=   { derivation above }
max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (foldr (⊙) []) ∘ tails
= max# ∘ scanr (⊙) []

with the definition of ⊙ given above. It turns out to be a rather simple algorithm: we scan through the list, and in each step we choose among three outcomes: [], [x], and x : ys. Like the maximum segment sum problem, it is a simple algorithm whose correctness is not that easy to justify.

The algorithm would be linear-time if ⊙ could be computed in constant time. With the presence of p in ⊙, however, that is unlikely the case.
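For a concrete instance, take p = ascending. Then p [x] always holds, so the [] branch of ⊙ vanishes, and testing p (x : ys) only requires comparing x with the head of ys, making ⊙ constant-time. A Haskell sketch (the names oplus and longestAscending are mine):

```haskell
-- ⊙ specialised to ascending: p [x] always holds, so the [] branch
-- vanishes, and p (x : ys) reduces to comparing x with the head of ys
oplus :: Int -> [Int] -> [Int]
oplus x ys
  | null ys || x <= head ys = x : ys
  | otherwise               = [x]

-- max# ∘ scanr (⊙) []
longestAscending :: [Int] -> [Int]
longestAscending = foldr1 longer . scanr oplus []
  where longer a b = if length a >= length b then a else b
```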

### Efficient Testing

So let us compute, during the fold, something that allows p to be determined efficiently. Assume that there exists some φ :: [A] → B that is a fold (φ = foldr (⊗) ι for some ⊗ and ι), such that p (x : xs) = p xs ∧ f x (φ xs) for some f. Some example choices of φ and f:

• p = ascending. We may pick:

φ xs = if null xs then Nothing else Just (head xs)
f x Nothing = true
f x (Just y) = x ≤ y
• p xs = all elements in xs are equal modulo 3. We may pick:
φ xs = if null xs then Nothing else Just (head xs `mod` 3)
f x Nothing = true
f x (Just y) = x `mod` 3 == y

Let us tuple mpi with φ, and turn them into one fold. Let ⟨ f , g ⟩ x = (f x, g x), we derive:

max# ∘ p ◁ ∘ inits
=   { f = fst ∘  ⟨ f , g ⟩, see below}
fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩
= fst ∘ foldr step ([], ι)

where step is given by

step x (xs, b)
| f x b = (x : xs , x ⊗ b)
| f x ι = ([x], x ⊗ ι)
| otherwise = ([], ι)

Notice that the property f = fst ∘ ⟨ f , g ⟩ is true when the domain of f is in the domain of g, in particular, when they are both total, which again shows why we prefer to work in a semantics with total functions only.

Let us restart the main derivation again, this time use the tupling:

max# ∘ p ◁ ∘ segs
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩) ∘ tails
=   { since max# ∘ map fst = fst ∘ max#', see below }
fst ∘ max#' ∘ map ⟨ max# ∘ p ◁ ∘ inits , φ ⟩ ∘ tails
=   { derivation above }
fst ∘ max#' ∘ map (foldr step ([], ι)) ∘ tails
= fst ∘ max#' ∘ scanr step ([], ι)

where max#' compares the lengths of the first components. This is a linear-time algorithm.
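Here is the tupled fold instantiated for p = ascending, with φ returning the head wrapped in a Maybe (a sketch with my own names; for this p the guard f x ι is always true, so the ([], ι) branch of step disappears):

```haskell
-- the tupled fold for p = ascending: carry the longest ascending
-- prefix together with φ of it, here its head as a Maybe
step :: Int -> ([Int], Maybe Int) -> ([Int], Maybe Int)
step x (ys, b)
  | f x b     = (x : ys, Just x)  -- extend the longest prefix
  | otherwise = ([x], Just x)     -- f x ι always holds for ascending
  where
    f _ Nothing  = True
    f v (Just y) = v <= y

-- fst ∘ max#' ∘ scanr step ([], ι)
longestAscendingT :: [Int] -> [Int]
longestAscendingT = fst . foldr1 longer . scanr step ([], Nothing)
  where longer a b = if length (fst a) >= length (fst b) then a else b
```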

### Next… Windowing?

What if p is not overlap-closed? Zantema used a technique called windowing, which I will defer to next time…

# On a Basic Property for the Longest Prefix Problem

In the Software Construction course next week we will, inevitably, talk about maximum segment sum. A natural next step is to continue with the theme of segment problems, which doesn’t feel complete without mentioning Hans Zantema’s Longest Segment Problems.

The paper deals with problems of this form:

ls = max# ∘ p ◁ ∘ segs

That is, computing the longest consecutive segment of the input list that satisfies predicate p. When writing on paper I found it much easier denoting filter p by the Bird-Meertens style p ◁ and I will use the latter for this post too. The function segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails, returns all consecutive segments of the input list, and max# :: [[a]] → [a] returns the longest list from the input list of lists. To avoid long nested parentheses, I denote function application by an infix operator ▪ that binds looser than function composition ∘. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

Standard transformation turns the specification to the form

ls = max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Therefore we may solve ls if we manage to solve its sub-problem on prefixes:

lp = max# ∘ p ◁ ∘ inits

that is, computing the longest prefix of the input list satisfying predicate p. One of the key propositions in the paper says:

Proposition 1: If p is suffix-closed (that is, p (x ⧺ y) ⇒ p y), we have:

p ◁ ∘ inits ▪ (a : x) =
p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

It is useful because, by composing max# on both sides we get

lp (a : x) = max# ∘ p ◁ ∘ inits ▪ (a : lp x)

that is, lp can be computed by a foldr.

Of course, we are not quite done yet. We then have to somehow simplify p ◁ ∘ inits ▪ (a : lp x) to something more efficient. Before we move on, however, proving Proposition 1 turns out to be an interesting challenge in itself.
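Before turning to the proof, the foldr characterisation of lp can be checked executably, with p = ascending as a sample suffix-closed predicate (a Haskell sketch; lpSpec plays the role of max# ∘ p ◁ ∘ inits, and the names are mine):

```haskell
import Data.List (inits)

-- a sample suffix-closed predicate: the list is ascending
p :: [Int] -> Bool
p xs = and (zipWith (<=) xs (tail xs))

-- max#, specialised to pick a longest list
maxLen :: [[Int]] -> [Int]
maxLen = foldr1 (\a b -> if length a >= length b then a else b)

-- the specification lp = max# ∘ p ◁ ∘ inits
lpSpec :: [Int] -> [Int]
lpSpec = maxLen . filter p . inits

-- lp as a foldr, justified by Proposition 1
lp :: [Int] -> [Int]
lp = foldr (\a ys -> lpSpec (a : ys)) []
```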

### Intuition

What does Proposition 1 actually say? Let x = [1,2,3] and a = 0. On the left-hand side, we are performing p ◁ on

[] [0] [0,1] [0,1,2] [0,1,2,3]

The right hand side says that we may first filter the prefixes of [1,2,3]:

[] [1] [1,2] [1,2,3]

Assume that only [] and [1,2] get chosen. We may then keep the longest prefix [1,2] only, cons it with 0, compute all the prefixes of [0,1,2] (that is, [] [0] [0,1] [0,1,2]), and filter the latter again. In other words, we lose no information by dropping [0,1,2,3]: by suffix-closure, p [0,1,2,3] ⇒ p [1,2,3], so if [1,2,3] does not pass p, p [0,1,2,3] cannot be true either.

Zantema has a nice and brief proof of Proposition 1 by contradiction. However, the theme of this course has mainly focused on proof by induction and, to keep the possibility of someday encoding our derivations in tools like Coq or Agda, we would like to have a constructive proof.

So, is it possible to prove Proposition 1 in a constructive manner?

### The Proof

I managed to come up with a proof. I’d be happy to know if there is a better way, however.

For brevity, I denote if p then x else y by p → x; y. Also, define

a ⊕p x = p a → a : x ; x

Therefore p ◁ is defined by

p ◁ = foldr ⊕p []

Here comes the main proof:

Proposition 1

p ◁ ∘ inits ▪ (a : x) = p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

if p is suffix-closed.
Proof.

p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of p ◁ }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 2 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=      { definition of p ◁ }
p ◁ ([] : map (a :) ∘ inits ▪ x)
=      { definition of inits }
p ◁ ∘ inits ▪ (a : x)

The main proof refers to two “decomposition” lemmas, both are of the form f ∘ g = f ∘ g ∘ f:

• Lemma 1: p ◁ ∘ map (a:) = p ◁ ∘ map (a:) ∘ p ◁ if p suffix-closed.
• Lemma 2: p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits for all predicate p.

Both are proved by structural induction. For Lemma 1 we need the conditional distribution rule:

f (p →  x; y) = (p →  f x; f y)

If we are working in CPO we need the side condition that f is strict, which is true for the cases below anyway:
Lemma 1

p ◁ ∘ map (a:) =  p ◁ ∘ map (a:) ∘ p ◁

if p is suffix-closed.
Proof. Structural induction on the input.
Case []: trivial.
Case (x : xs):

p ◁ ∘ map (a:) ∘ p ◁ ▪ (x : xs)
=   { definition of p ◁ }
p ◁ ∘ map (a:) ▪ (p x →  x : p ◁ xs ; p ◁ xs)
=   { map distributes into conditionals }
p ◁ ▪ (p x → (a : x) : map (a :) ∘ p ◁ ▪ xs ; map (a :) ∘ p ◁ ▪ xs)
=   { p ◁ distributes into conditionals }
p x → p ◁ ((a : x) : map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
=   { definition of p ◁ }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ∘ p ◁ ▪ xs ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
=   { induction }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ;
p ◁ ∘ map (a :) ▪ xs) ;
p ◁ ∘ map (a :) ▪ xs
=   { since p (a : x) ⇒ p x by suffix closure }
p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ;
p ◁ ∘ map (a :) ▪ xs
=   { definition of p ◁ }
p ◁ ((a : x) : map (a :) xs)
=   { definition of map }
p ◁ ∘ map (a :) ▪ (x : xs)

For Lemma 2, it is important that p is universally quantified. We need the following map-filter exchange rule:

p ◁ ∘ map (a :) =  map (a :) ∘ (p ∘ (a:)) ◁

The proof goes:
Lemma 2 For all predicate p we have

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits

Proof. Structural induction on the input.
Case []: trivial.
Case (a : x):

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ (a : x)
= p ◁ ∘ inits ∘ max# ∘ p ◁ ▪ ([] : map (a :) (inits x))

Consider two cases:
1. Case null (p ◁ ∘ map (a :) ∘ inits ▪ x):
If ¬ p [], both sides are undefined. Otherwise:

...
= p ◁ ∘ inits ∘ max# ▪ [[]]
= p ◁ ∘ inits ▪ []
= [[]]
= p ◁ ▪ ([] : p ◁ ∘ map (a : ) ∘ inits ▪ x)
= p ◁ ∘ inits ▪ (a : x)

2. Case ¬ (null (p ◁ ∘ map (a :) ∘ inits ▪ x)):

...
= p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ map (a :) ∘ inits ▪ x
=   { map-filter exchange }
p ◁ ∘ inits ∘ max# ∘ map (a :) ∘ (p ∘ (a:)) ◁ ∘ inits ▪ x
=   { since  max# ∘ map (a :) =  (a :) ∘ max# }
p ◁ ∘ inits ∘ (a :) ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x
=   { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { definition of p ◁ }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
[] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { induction }
[] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=   { definition of p ◁ }
p ◁ ( [] :  map (a :) ∘ inits ▪ x)
=   { definition of inits }
p ◁ ∘ inits ▪ (a : x)

# Algebra of programming in Agda: dependent types for relational program derivation

S-C. Mu, H-S. Ko, and P. Jansson. In Journal of Functional Programming, Vol. 19(5), pp. 545-579. Sep. 2009 [PDF]

Relational program derivation is the technique of stepwise refining a relational specification to a program by algebraic rules. The program thus obtained is correct by construction. Meanwhile, dependent type theory is rich enough to express various correctness properties to be verified by the type checker.

We have developed a library, AoPA, to encode relational derivations in the dependently typed programming language Agda. A program is coupled with an algebraic derivation whose correctness is guaranteed by the type system.

Two non-trivial examples are presented: an optimisation problem, and a derivation of quicksort where well-founded recursion is used to model terminating hylomorphisms in a language with inductive types.

This article extends the paper we published in Mathematics of Program Construction 2008. Code accompanying the paper has been developed into an Agda library AoPA.

# Tail-Recursive, Linear-Time Fibonacci

For two consecutive years I have been responsible for teaching the program derivation course in the FLOLAC summer school. It was an exam question last year to derive a linear-time solution computing the Fibonacci function, whose definition we all know very well:

fib 0 = 0
fib 1 = 1
fib (n+2) = fib (n+1) + fib n

I would like the students to see the derivation as an instance of tupling: to generalise the targeted function by having it return more results, hoping that the generalisation can actually be performed faster. For this problem, one may consider the generalisation:

fib2 n = (fib n, fib (n+1))

from which we may easily construct a recursive definition for fib2 by simply unfolding and folding the definitions. Once we finish, take fib = fst . fib2.

### Tupling and Fold Fusion

As a practice, however, I instructed the students to see this as an instance of fold fusion with the fold on natural numbers:

foldN f e 0 = e
foldN f e (n+1) = f (foldN f e n)

The derivation goes like

fib2
=    { since id = foldN (1+) 0 }
fib2 . foldN (1+) 0
=    { fold fusion, see below }
foldN step (0,1)

To derive step through the fusion condition, we reason:

fib2 (n+1)
=    { def. of fib2 }
(fib (n+1), fib (n+2))
=    { def. of fib }
(fib (n+1), fib (n+1) + fib n)
=    { let step (x,y) = (y,x+y) }
step (fib n, fib (n+1))
=    { def. of fib2 }
step (fib2 n)

Therefore, we have fib2 = foldN step (0,1) where step (x,y) = (y,x+y).
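Transcribed into Haskell, the result looks like this (a direct rendering of the derivation, not a further optimisation):

```haskell
-- fold on natural numbers
foldN :: (a -> a) -> a -> Integer -> a
foldN _ e 0 = e
foldN f e n = f (foldN f e (n - 1))

-- the derived generalisation: fib2 n = (fib n, fib (n+1))
fib2 :: Integer -> (Integer, Integer)
fib2 = foldN step (0, 1)
  where step (x, y) = (y, x + y)

-- recover fib by projection
fib :: Integer -> Integer
fib = fst . fib2
```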

Oleg Kiselyov also used a similar definition of fib as a fold (on a list of length n) as an example in his library of fold transformers.

### Accumulating Parameters

I was perhaps so happy with seeing fib as a fold-fusion that I forgot one may want a tail recursive definition. This is one of the exercises of the functional programming course (taught by Kung Chen) this year, where the students may just write down the (obvious) tail recursive, linear-time function computing Fibonacci. The hint was to “introduce two accumulating parameters.” Can I derive this definition?

When I taught the students about accumulating parameters, I also told them to “look for some generalisation by adding more parameters.” The canonical example is perhaps generalising quadratic time reverse to fastrev xs ys = reverse xs ++ ys. For fib, it appeared reasonable to generalise the base cases. I was later told by Oleg that this function is called gib:

gib 0 x y = x
gib 1 x y = y
gib (n+2) x y = gib (n+1) x y + gib n x y

From this definition, however, a linear-time implementation of gib does not follow immediately. We will need this lemma:

gib n (fib k) (fib (k+1)) = fib (n+k)    (1)

which can be proved by induction on n. For an intuitive understanding, it says that gib allows us to compute Fibonacci not only from the base cases 0 and 1, but also from the middle: if we take fib k and fib (k+1) as base cases, we get fib (n+k).

The property (1) came from some slides Kung Chen showed me, originated from John Hughes. John Hughes actually suggested to start with (1) as the specification. Either way, I wonder how one could come up with (1) in the first place. Given the definition of gib, (1) can be proved by induction. The rest of the derivation is not too hard. This is the derived program. Details omitted.

gib 0 x y = x
gib (n+1) x y = gib n y (x+y)
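A Haskell rendering of both definitions, which lets one check the derived version against the specification (the names gib and gibT are mine):

```haskell
-- the specification: naive, exponential-time gib
gib :: Integer -> Integer -> Integer -> Integer
gib 0 x _ = x
gib 1 _ y = y
gib n x y = gib (n - 1) x y + gib (n - 2) x y

-- the derived tail-recursive, linear-time version
gibT :: Integer -> Integer -> Integer -> Integer
gibT 0 x _ = x
gibT n x y = gibT (n - 1) y (x + y)
```

Note that gib n 0 1 is exactly fib n, and property (1) can be tested with fib k computed as gibT k 0 1.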

### Tail-Recursive Fibonacci

Meanwhile, Josh Ko gave me this generalisation:

ffib n x y = x × fib n + y × fib (n+1)

The motivation came from, perhaps, the observation that every fib n can always be expressed as a linear combination of some smaller fib i and fib j:

fib 2 = fib 1 + fib 0
fib 3 = fib 2 + fib 1 = 2 × fib 1 + fib 0
fib 4 = fib 3 + fib 2 = 2 × fib 2 + fib 1 = 3 × fib 1 + 2 × fib 0
⋮

Given the definition, it is immediate that ffib 0 x y = y and:

ffib (n+1) x y
= x × fib (n+1) + y × fib (n+2)
= x × fib (n+1) + y × fib (n+1) + y × fib n
= (x+y) × fib (n+1) + y × fib n
= ffib n y (x+y)

Therefore we have derived:

ffib 0 x y = y
ffib (n+1) x y = ffib n y (x+y)

which is more or less what one would expect. It differs from the derived linear-time implementation of gib only on the base case.
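In Haskell, with the specification alongside (a sketch; fibN is a helper of mine for computing Fibonacci numbers):

```haskell
-- a linear-time helper for Fibonacci, used only by the specification
fibN :: Integer -> Integer
fibN n = go n 0 1
  where go 0 a _ = a
        go k a b = go (k - 1) b (a + b)

-- the specification: ffib n x y = x * fib n + y * fib (n+1)
ffibSpec :: Integer -> Integer -> Integer -> Integer
ffibSpec n x y = x * fibN n + y * fibN (n + 1)

-- the derived tail-recursive version
ffib :: Integer -> Integer -> Integer -> Integer
ffib 0 _ y = y
ffib n x y = ffib (n - 1) y (x + y)
```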

Finally, since gib and ffib define similar functions, ffib satisfies a property similar to (1):

ffib n (fib k) (fib (k+1)) = fib (n+k+1)    (2)

The proof proceeds by induction on n. The base case is trivial. For the inductive case:

ffib (n+1) (fib k) (fib (k+1))
= fib k × fib (n+1) + fib (k+1) × fib (n+2)
= fib k × fib (n+1) + fib (k+1) × fib (n+1) + fib (k+1) × fib n
= (fib k + fib (k+1)) × fib (n+1) + fib (k+1) × fib n
= ffib n (fib (k+1)) (fib (k+2))
=   { induction }
fib (n+k+2)

which completes the proof.