Tag Archives: Segment Problems

An Exercise Utilising Galois Connections

Given two partial orders (A, ⊑) and (B, ≼), two functions f : A → B and g : B → A form a Galois connection between them if for all a : A and b : B we have

  f a ≼ b ≣ a ⊑ g b

We will refer to this defining property as “GC” later. The function f is called the lower adjoint and g the upper adjoint of the Galois connection. Galois connections are interesting because once two functions are identified as such, they immediately satisfy a rich collection of useful properties:

  • letting a := g b in GC, we get f (g b) ≼ b;
  • letting b := f a, we get a ⊑ g (f a);
  • f is monotonic, since:
     f a₁ ≼ f a₂
    ≣   { GC }
     a₁ ⊑ g (f a₂)
    ⇐  {  since a ⊑ g (f a) }
     a₁ ⊑ a₂
  • similarly, g is monotonic: b₁ ≼ b₂ ⇒ g b₁ ⊑ g b₂,

and many more.

In the recent work of Sharon and me on maximally dense segments we needed quite a number of functions to be monotonic, idempotent, etc. It only occurred to me after submitting the paper: could they be defined as adjoints of Galois connections? The paper needs a large number of properties, and it would be nice to establish them from fewer, more basic ones. And it looks prettier.

Longest Prefix Up to a Certain Sum

One such function is trim in the paper, but it is sufficient to consider a simplification: let sam : [Int] → [Int] (for “sum at most”) return the longest prefix of the input list whose sum is no larger than a constant U. Denote “x is a prefix of y” by x ⊑ y. We want to show that sam satisfies

  • monotonicity: x ⊑ y ⇒ sam x ⊑ sam y, and
  • idempotence: sam (sam x) = sam x.

Can they be derived by defining sam as a Galois connection?

I learned from José N. Oliveira’s talk A Look at Program “G”alculation in the IFIP WG 2.1 #65 Meeting how (the uncurried version of) take can be defined as a Galois connection. It turns out that sam is just the same. We consider a slight generalisation sam' : (Int, [Int]) → [Int] that takes the upper bound as a parameter. It can be characterised by:

sum y ≤ b  ∧  y ⊑ x   ≣   y ⊑ sam' (b, x)

There is in fact a Galois connection hidden already! To see that, define ⟨f, g⟩ a = (f a, g a) (in the Haskell Hierarchical Libraries it is defined in Control.Arrow as &&&), and denote the product of binary relations by ×, that is, (a,x) is related to (b,y) by ≤×⊑ exactly when a ≤ b and x ⊑ y. We write such a composed relation as an infix operator by surrounding it in square brackets: (a,x) [≤×⊑] (b,y).

Using these notations, the defining equation of sam' can be rewritten as:

⟨sum, id⟩ y [≤×⊑] (b,x)   ≣   y ⊑ sam' (b,x)

Thus sam' is the upper adjoint in a Galois connection between ((Int, [Int]), ≤×⊑) and ([Int], ⊑)!
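To make this concrete, here is a small executable sketch (the Haskell rendering is mine, not the paper’s trim). It assumes non-negative elements and a non-negative bound, so that the prefixes whose sum is at most b form a prefix-closed set with a longest member; the QuickCheck property checks GC with y ranging over the prefixes of x, which makes the conjunct y ⊑ x hold by construction:

import Data.List (inits)
import Test.QuickCheck

-- the longest prefix of x whose sum is at most b; with non-negative
-- elements the prefix sums are non-decreasing, so takeWhile suffices
sam' :: (Int, [Int]) -> [Int]
sam' (b, x) = last (takeWhile ((<= b) . sum) (inits x))

-- GC:  y ⊑ sam' (b,x)  ≣  sum y ≤ b  ∧  y ⊑ x
prop_GC :: NonNegative Int -> [NonNegative Int] -> Bool
prop_GC (NonNegative b) xs =
  and [ (y `elem` inits (sam' (b, x))) == (sum y <= b)
      | y <- inits x ]
  where x = map getNonNegative xs

Running quickCheck prop_GC then succeeds; without the non-negativity assumptions the characterisation itself breaks down, since the prefixes whose sums are at most b need not form a prefix-closed set.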

Now that ⟨sum, id⟩ and sam' form a Galois connection, we have:

  • f (g b) ≼ b instantiates to ⟨sum, id⟩ (sam' (b,x)) [≤×⊑] (b,x), that is, sum (sam' (b,x)) ≤ b and sam' (b,x) ⊑ x;
  • a ⊑ g (f a) instantiates to x ⊑ sam' (sum x, x). Together with the previous property we have x = sam' (sum x, x);
  • monotonicity of the lower adjoint instantiates to y₁ ⊑ y₂ ⇒ sum y₁ ≤ sum y₂ ∧ y₁ ⊑ y₂;
  • monotonicity of the upper adjoint instantiates to
    (b₁,x₁) [≤×⊑] (b₂,x₂)   ⇒   sam' (b₁,x₁) ⊑ sam' (b₂,x₂) 

    that is

    b₁ ≤ b₂   ∧  x₁ ⊑ x₂   ⇒   sam' (b₁,x₁) ⊑ sam' (b₂,x₂) 

    a generalisation of the monotonicity we want.

Finally, to show idempotence, we reason

   sam' (b₁, x) ⊑ sam' (b₁, sam' (b₂, x))
≣   { GC }
   ⟨sum, id⟩ (sam' (b₁, x)) [≤×⊑]  (b₁, sam' (b₂, x))
≣   { definitions }
   sum (sam' (b₁, x)) ≤ b₁   ∧   sam' (b₁, x) ⊑ sam' (b₂, x)
⇐  { properties above }
   b₁ ≤ b₂

Setting b₁ = b₂ = b in the calculation, and noting that sam' (b, sam' (b, x)) ⊑ sam' (b, x) follows from the first property (with x := sam' (b, x)), antisymmetry gives us idempotence: sam' (b, sam' (b, x)) = sam' (b, x).

These are all nice and pretty. There is another function, however, that is much harder to deal with, which I will write about next time.

Finding Maximally Dense Segments

Sharon and I have finally concluded, for now, our work on the maximally dense segment problem (draft, with an errata already!), on which we have been working on and off for the past two years. Considering the algorithm itself and its derivation/proofs, I am quite happy with what we have achieved. The algorithm is rather complex, however, and it is a challenge presenting it in an accessible way. Sharon has done a great job polishing the paper, and I do hope more people would be interested in reading it and it would, eventually, inspire more work on interesting program derivations.

The basic form of the problem looks like a natural variation of the classical maximum segment sum problem: given a list of numbers, find a consecutive segment whose average, that is, sum divided by length, is maximum. The problem would be trivial without more constraints, since one could simply return the largest element, thus we usually impose a lower bound L on the length of feasible segments.

It was noticed by Huang [3] that a segment having maximum average need not be longer than 2L - 1: given a segment of 2L elements or more, we may cut it in the middle. If the two halves have different averages, we keep the denser one; otherwise the two halves have the same average and we may keep either. Either way, we get a shorter, feasible segment whose average is not lower. This fact hints at a trivial O(nL) algorithm: for each suffix of the list, find its best prefix of up to 2L - 1 elements.
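The trivial scheme can be written down directly. The sketch below is mine, not the paper’s code; it assumes the input has at least L elements, and since it recomputes each sum naively it is in fact O(nL²); caching prefix sums would recover O(nL).

import Data.List (inits, tails)

-- for each suffix, try its prefixes of lengths l .. 2l-1 and keep the
-- best average; assumes length xs >= l, so a feasible segment exists
maxAverage :: Int -> [Double] -> Double
maxAverage l xs =
  maximum [ sum seg / fromIntegral (length seg)
          | suf <- tails xs
          , seg <- take l (drop l (inits suf)) ]   -- lengths l .. 2l-1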

A difficult challenge, however, is to come up with an algorithm that is O(n), independent of L. The problem can be generalised to the case where the elements do not have length 1, but each has a width, and the goal is to maximise the density — the sum of the elements divided by the sum of their widths. This makes the problem slightly more complicated, but does not change its nature. If we go on to impose an upper bound U on the length as well, however, the problem becomes much more difficult. There was a published algorithm that claimed to be linear, only to be found not so. We discovered that two later algorithms, which appeared to have concluded the problem, also fail for a boundary case. The bug is easy to fix for one of the algorithms, but might not be so for the other.

Our algorithm closely relates to that of Chung and Lu [1] and that of Goldwasser et al [2]. The algorithm is perhaps too complex to present in detail in a blog post (that’s why we need a paper!), but I will try to give an outline using pictures from the paper, my slides and poster.

One of the ways to visualise the problem is to see each element as a block, the number being the area of the block, and the density would be its height. The input is a list of (area, width) pairs, and the goal is to find a consecutive segment maximising the height. Shown below is the input list [(9,6),(6,2),(14,7),(20,4),(-10,5),(20,8),(-2,2),(27,6)], and the dashed line is their average height:

Notice that an area can be negative. In the paper, since the letter w is used for “window” (to be explained below), we instead refer to the width as “breadth”.

Prefixes of Suffixes, and the Window

Many optimal segment problems (finding some optimal segment of a given list) are solved by finding, for each suffix, its optimal prefix, as shown below. Each bar is a suffix of the input, and the blue part is its optimal prefix:

It is preferable that an optimal prefix of a : x can be computed from the optimal prefix of x, that is, that the function computing the optimal prefix is a foldr. If so, the algorithm may keep a pair of (optimal segment, optimal prefix). Each time a new element is read, it computes the new optimal prefix using the previous optimal prefix, and updates the optimal segment if the new prefix is better. If you like structured recursion (or the so-called “origami programming”), this form of computation is an instance of a zygomorphism.
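As a hedged sketch of this scheme (zygo is my own generic rendering, not a combinator from the paper), here is the pair-of-results foldr, instantiated with maximum prefix sum so that it degenerates to the classical maximum segment sum fold:

-- zygomorphism on lists: the main fold f may consult the result of an
-- auxiliary fold g at each step
zygo :: (a -> b -> c -> c) -> c -> (a -> b -> b) -> b -> [a] -> c
zygo f z g e = fst . foldr step (z, e)
  where step x (seg, pre) = let pre' = g x pre
                            in (f x pre' seg, pre')

-- instance: the optimal prefix is the maximum prefix sum, and the
-- optimal segment is updated whenever the new prefix beats it
mss :: [Int] -> Int
mss = zygo (\_ pre seg -> pre `max` seg) 0
           (\x pre -> 0 `max` (x + pre)) 0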

For each optimal prefix to be computable from the previous optimal prefix, it may not extend further than the latter. We do not want the following to happen:

However, this appears to be possible for the maximally dense prefix! Imagine adding a very small, or even negative, area. We might get a denser prefix by extending further to the right since the denominator is larger.

The first theorem we had to prove aimed to show that it does not matter — if a maximally dense prefix extends further than the previous one, it is going to be suboptimal anyway. Thus it is safe if we always start from the right end of the previous prefix. That is, we do not compute the maximally dense prefix of the entire input, but merely the maximally dense prefix of the previous prefix.

This is an instance of the sliding window scheme proposed by Zantema [4]. The blue part is like a “window” over the list, containing enough information to guarantee the correctness of the algorithm. As the algorithm progresses, the two ends of the window keep sliding to the left, hence the name.

To formally show that the window contains enough information to compute the maximally dense segment, we have to state clearly what the window is, and what invariant it satisfies. It turned out to be quite tricky to state formally the intuition that “the window does not always give you the optimal prefix, but it does when it matters”, and this was the first challenge we met.

Since we aim at computing a segment at least L units in breadth, it might be handy to split the window into a “compulsory part” (the shortest prefix that is at least L units wide) and the rest, the “optional part”. The algorithm thus looks like this:

where the yellow bars are the compulsory parts and blue bars the optional parts. Each time we read an element into the compulsory part, zero or more elements (since the elements have non-uniform breadths) may be shifted from the compulsory part to the optional part. Then we compute a maximally dense prefix (the yellow and the blue parts together) that does not extend further than the previous one. The best among all these prefixes is the maximally dense segment.

We want a linear time algorithm, which means that all the computation from a pair of yellow-blue bars to the next pair has to be done in (amortised) constant time — how is that possible at all? To do so we will need to exploit some structure in the optional part, based on properties of density and segments.

Right-Skew Segments, and the DRSP

A non-empty list of elements x is called right-skew if, for every non-empty x₁ and x₂ such that x₁ ⧺ x₂ = x, we have density x₁ ≤ density x₂. Informally, a right-skew list is drawn as the blue wavy block below:

The rising wavy slope informally hints that the right half has a higher density than the left half wherever you cut it. However, the drawing risks the misunderstanding that a right-skew segment is a list of elements with ascending areas or densities. Note that neither the areas nor the densities of individual elements have to be ascending. For example, the list [(9,6),(6,2),(14,7)], with densities [1.5, 3, 2], is right-skew.
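The definition can be rendered directly as a naive, quadratic executable check; the representation of elements as (area, breadth) pairs below is mine, chosen to match the example:

type Block = (Double, Double)   -- (area, breadth)

density :: Block -> Double
density (a, w) = a / w

-- fuse a list of elements into one block by summing areas and breadths
fuse :: [Block] -> Block
fuse xs = (sum (map fst xs), sum (map snd xs))

-- x is right-skew if for every split x = x1 ++ x2 with both halves
-- non-empty, the left half is no denser than the right half
rightSkew :: [Block] -> Bool
rightSkew x = and [ density (fuse x1) <= density (fuse x2)
                  | k <- [1 .. length x - 1]
                  , let (x1, x2) = splitAt k x ]

Indeed, rightSkew [(9,6),(6,2),(14,7)] evaluates to True.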

Right-skew lists are useful because of the following property. Imagine placing a list z next to x, as depicted above. To find a maximally dense prefix of z ⧺ x starting with z, it is sufficient to consider only z and z ⧺ x — nothing in the middle, such as z ⧺ x₁, can be denser than the two ends!

Given a window with compulsory part c and optional part x, if we can partition x into x₁ ⧺ x₂ ⧺ ... ⧺ xn, such that x₁, x₂, … xn are all right-skew, then to compute the maximally dense prefix of c ⧺ x, we only need to consider c, c ⧺ x₁, c ⧺ x₁ ⧺ x₂,… and c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xn.

Such a partition is always possible for any list x — after all, each element by itself constitutes a singleton right-skew list. However, there is a unique right-skew partition such that the densities of x₁, x₂, … xn are strictly decreasing. This is called the decreasing right-skew partition (DRSP) of x. We will partition the optional part of the window into its DRSP. A window now looks like the picture below:

Sharon summarised many nice properties of the DRSP in the paper, for which we unfortunately do not have space here. We will only look at some properties that matter for this blog post. Firstly, consider the diagram below:

In the bottom row, the leftmost block is the density of c, and the second block is the density of c ⧺ x₁, etc. If segments x₁, x₂, … xn have decreasing densities, the densities of c, c ⧺ x₁, c ⧺ x₁ ⧺ x₂,… and c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xn must be bitonic — first ascending, then descending. It helps to efficiently locate the maximally dense prefix.
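Concretely: the density of two blocks fused together always lies between their individual densities, so a block at the right end whose density does not exceed that of the whole candidate can be chopped off without decreasing the density, and by bitonicity this greedy chop from the right stops exactly at the peak. Below is a sketch reusing Block, density, and fuse from above; maxDensePrefix is my name, not the paper’s maxChop.

-- append two summarised blocks in O(1)
append :: Block -> Block -> Block
append (a1, w1) (a2, w2) = (a1 + a2, w1 + w2)

-- given compulsory part c and DRSP blocks ps (densities strictly
-- decreasing), find the densest of c, c+p1, c+p1+p2, ... by chopping
-- blocks off the right while doing so does not decrease the density
maxDensePrefix :: Block -> [Block] -> Block
maxDensePrefix c ps = chop (foldl append c ps) (reverse ps)
  where
    chop whole [] = whole
    chop whole (p : rest)
      | density p <= density whole = chop (detach whole p) rest
      | otherwise                  = whole
    detach (a, w) (a', w') = (a - a', w - w')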

Secondly, the DRSP can be built and maintained in a foldr. The following diagram depicts how the DRSP for the list of areas [1,4,2,5,3] (all with breadth 1) can be built by adding elements from the left one by one (which eventually results in one big partition):

The rule is that a block newly added from the left keeps merging with blocks to its right until it encounters a block less dense than itself. The top-left of the diagram indicates that the DRSP of [3] is itself. Since 5 > 3, adding 5 results in a partition containing two segments. When 2 is added, it is merged with 5 to form a new segment with density 3.5. No merging is triggered by the addition of 4, since 4 > 3.5 and thus [4, 3.5, 3] form a decreasing sequence. The newly added 1 first merges with 4, forming a block having density 2.5. Since 2.5 < 3.5, the block again merges with the block [2,5], and eventually all elements are grouped into one segment with density 3. One important thing here is that adding a new element only involves merging some initial part of the DRSP.
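The merging rule is again a small foldr. A sketch, reusing Block, density, and append from above (in the example every element has breadth 1):

-- add one element from the left: keep merging with the leftmost segment
-- while the new block is not strictly denser, so that the densities of
-- the segments stay strictly decreasing
addL :: Block -> [Block] -> [Block]
addL b (p : ps) | density b <= density p = addL (append b p) ps
addL b ps = b : ps

drsp :: [Block] -> [Block]
drsp = foldr addL []

For instance, map density (drsp [(1,1),(4,1),(2,1),(5,1),(3,1)]) yields [3.0], matching the diagram, while dropping the leading 1 yields [4.0,3.5,3.0].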

Algorithm Overview

Recall that our algorithm computes, for each suffix, a prefix (a window) that is possibly optimal and contains enough information to compute all future optimal solutions. Since a feasible prefix is at least L units wide, we split it into a (yellow) compulsory part and a (blue) optional part. To obtain a linear-time algorithm, we have to compute one row from the previous row in amortised constant time (the corresponding diagram is duplicated here):

The diagram below depicts how to go from one row to the next. The blue part is partitioned into DRSP. Each time an element is added to the yellow part, some elements may be shifted to the blue part, and that may trigger some right-skew segments in the blue part to be merged (second row). Then we look for a maximally dense prefix by going from right to left, chopping away segments, until we find the peak (third row):

Note that the operation shown on the third row (chopping to find the maximum) always chops away a right-skew segment in its entirety. It is important that the merging happens at the left end of the optional part, while the chopping happens at the right end. By using a tree-like data structure, each merge can be an O(1) operation. With the data structure, we may argue that, since each element can be merged at most once, throughout the algorithm only O(n) merges can happen. Similarly, each element can be chopped away at most once, so chopping can happen at most O(n) times as well. Therefore the operations in the second and third rows above are both amortised O(1).

Problem with Having an Upper Bound

The discussion so far already allows us to develop an algorithm for the maximally dense segment problem without an upper bound on the breadth of feasible segments. Having the upper bound makes the problem much harder because, different from the chopping depicted above, an upper bound may cut through a right-skew segment in the middle:

And a right-skew segment, with some elements removed, might not be right-skew anymore!

Our solution is to develop another data structure that allows efficient removal from the right end of a DRSP, while maintaining the DRSP structure. The final configuration of a window looks like the diagram below, where the new data structure is represented by the green blocks:

Unfortunately, it is inefficient to add new elements from the left into the green blocks. Therefore we have to maintain the window in a way similar to how a queue is implemented using two lists. New elements are added from the left into the blue blocks; when we need to remove elements from the right, blue blocks are converted into green blocks in large chunks.

For more details, see the paper!

References

  1. Chung, Kai-Min and Lu, Hsueh-I. An Optimal Algorithm for the Maximum-Density Segment Problem. SIAM Journal on Computing 34(2):373-387, 2004.
  2. Goldwasser, Michael H. and Kao, Ming-Yang and Lu, Hsueh-I. Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications. Journal of Computer and System Sciences, 70(2):128-144, 2005.
  3. Huang, Xiaoqiu. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement. Computer Applications in the Biosciences 10(3):219-225, 1994.
  4. Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.

Functional pearl: maximally dense segments

Sharon Curtis and Shin-Cheng Mu. Submitted.
[PDF]

errata:
  • Page 3: “This input sequence does not have a solution…” what we meant was “This input does not have a prefix that is within bounds.” We used another example, in which the input does not have a feasible segment at all, before changing to this example, but I forgot to change the text accordingly.
  • Page 4, Proof of Theorem 3.2: the first mdsM x ⇑d win (a:x) should be mdsM x ⇑d wp (trim (a:x)); a : x <b L and a : x ≥b L should respectively be trim (a : x) <b L and trim (a : x) ≥b L.
  • Thanks to Josh Ko for pointing out both errors.

The problem of finding a maximally dense segment (MDS) of a list is a generalisation of the well-known maximum segment sum (MSS) problem, but its solution is more challenging. We extend and illuminate some recent work on this problem with a formal development of a linear-time online algorithm, in the form of a sliding window zygomorphism. The development highlights some elegant properties of densities, involving partitions which are decreasing and all right-skew.

Code and supplementary proofs are available online.

keywords: program derivation, segment problem, maximum density, sliding window, zygomorphism, right-skew.

The Maximum Segment Sum Problem: Its Origin, and a Derivation

In a previous paper of mine, regrettably, I wrongly attributed the origin of the maximum segment sum problem to Dijkstra and Feijen’s Een methode van programmeren. In fact, the story behind the problem was told very well in Jon Bentley’s Programming Pearls.

The Problem, and the Linear-Time Algorithm

Given a list of numbers, the task is to compute the largest possible sum of a consecutive segment. In a functional language the problem can be specified by:

 mss = max . map sum . segs

where segs = concat . map inits . tails enlists all segments of the input list, map sum computes the sum of each of the segments, before max :: Ord a ⇒ [a] → a picks the maximum. The specification, if executed, is a cubic-time algorithm. Yet there is a linear-time algorithm scanning through the list only once:

mss = snd . foldr step (0,0)
  where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)

where a ↑ b yields the maximum of a and b.

Both the specification and the linear time program are short. The program is merely a foldr that can be implemented as a simple for-loop in an imperative language. Without some reasoning, however, it is not that trivial to see why the program is correct (hint: the foldr computes a pair of numbers, the first one being the maximum sum of all prefixes of the given list, while the second is the maximum sum of all segments). Derivation of the program (given below) is mostly mechanical, once you learn the basic principles of program calculation. Thus the problem has become a popular choice as the first non-trivial example of program derivation.

Origin

Jon Bentley recorded in Programming Pearls that the problem was proposed by Ulf Grenander of Brown University. In a pattern-matching procedure he designed, a subarray having maximum sum is the most likely to yield a certain pattern in a digitised image. The two-dimensional problem took too much time to solve, so he simplified it to one dimension in order to understand its structure.

In 1977 [Grenander] described the problem to Michael Shamos of UNILOGIC, Ltd. (then of Carnegie-Mellon University) who overnight designed Algorithm 3. When Shamos showed me the problem shortly thereafter, we thought that it was probably the best possible; … A few days later Shamos described the problem and its history at a Carnegie-Mellon seminar attended by statistician Jay Kadane, who designed Algorithm 4 within a minute.

Jon Bentley, Programming Pearls (1st edition), page 76.

Jay Kadane’s Algorithm 4 is the now well-known linear time algorithm, the imperative version of the functional program above:

maxpre, maxseg = 0, 0
for i in range (0, N):
     maxpre = 0 ↑ (maxpre + a[i])
     maxseg = maxpre ↑ maxseg

Algorithm 3, on the other hand, is a divide-and-conquer algorithm. An array a is split into two halves a₁ ⧺ a₂, and the algorithm recursively computes the maximum segment sums of a₁ and a₂. However, there could be some segment across a₁ and a₂ that yields a good sum, therefore the algorithm performs two additional loops respectively computing the maximum suffix sum of a₁ and the maximum prefix sum of a₂, whose sum is the maximum sum of a segment crossing the boundary. The algorithm runs in O(N log N) time. (My pseudo-Python translation of the algorithm is given below.)

In retrospect, Shamos did not have to compute the maximum prefix and suffix sums in two loops each time. The recursive function could have computed a quadruple of (maximum prefix sum, maximum segment sum, maximum suffix sum, sum of the whole array) for each array. The prefix and suffix sums could thus be computed bottom-up. I believe that would result in an O(N) algorithm. This linear time complexity might suggest that the “divide” is superficial — we do not have to divide the array in the middle. It is actually easier to divide the array into a head and a tail — which was perhaps how Kadane quickly came up with Algorithm 4!
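A hedged sketch of the quadruple idea (my rendering, not Shamos’s code): summarise each sublist by (maximum prefix sum, maximum segment sum, maximum suffix sum, total sum), and combine two summaries in constant time. A foldr over singletons already runs in linear time, echoing the head/tail remark:

type Quad = (Int, Int, Int, Int)   -- (mps, mss, mts, total)

combine :: Quad -> Quad -> Quad
combine (p1, m1, s1, t1) (p2, m2, s2, t2) =
  ( p1 `max` (t1 + p2)            -- best prefix: left only, or all of left plus a prefix of right
  , m1 `max` m2 `max` (s1 + p2)   -- best segment: within a half, or crossing the cut
  , s2 `max` (t2 + s1)            -- best suffix: right only, or all of right plus a suffix of left
  , t1 + t2 )

single :: Int -> Quad
single x = (0 `max` x, 0 `max` x, 0 `max` x, x)

mssQuad :: [Int] -> Int
mssQuad = (\(_, m, _, _) -> m) . foldr (combine . single) (0, 0, 0, 0)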

A Functional Derivation

I learned the functional derivation of the maximum segment sum problem from one of Jeremy’s papers [3] and was very amazed. It was perhaps one of the early incidents that inspired my interest in program calculation. The derivation does not appear to be very well known outside the program derivation circle — not even among functional programmers — so I would like to redo it here.

The first few steps of the derivation goes:

   max . map sum . segs
=    { definition of segs }
   max . map sum . concat . map inits . tails
=    { since map f . concat = concat . map (map f) }
   max . concat . map (map sum) . map inits . tails 
=    { since max . concat = max . map max } 
   max . map max .  map (map sum) . map inits . tails
=    { since map f . map g = map (f.g) }
   max . map (max . map sum . inits) . tails

The purpose of the book-keeping transformation above is to push max . map sum closer to inits. The fragment max . map sum . inits is a function which, given a list of numbers, computes the maximum sum among all its prefixes. We denote it by mps, for maximum prefix sum. The specification has been transformed to:

   mss = max . map mps . tails 

This is a common strategy for segment problems: to solve a problem looking for an optimal segment, proceed by looking for an optimal prefix of each suffix. (Symmetrically we could process the list the other way round, look for an optimal suffix for each prefix.)

We wish that mps for each of the suffixes can be efficiently computed in an incremental manner. For example, to compute mps [-1,3,3,-4], rather than actually enumerating all suffixes, we wish that it can be computed from -1 and mps [3,3,-4] = 6, which can in turn be computed from 3 and mps [3,-4] = 3, all in constant time. In other words, we wish that mps is a foldr using a constant time step function. If this is true, one can imagine that we could efficiently implement map mps . tails in linear time. Indeed, scanr f e = map (foldr f e) . tails!

The aim now is to turn mps = max . map sum . inits into a foldr. Luckily, inits is actually a foldr. In the following we will perform foldr-fusion twice, respectively fusing map sum and max into inits, thus turning the entire expression into a foldr.

The first fusion goes:

   max . map sum . inits
=    { definition of inits }
   max . map sum . foldr (\x xss -> [] : map (x:) xss) [[]]
=    { fold fusion, see below }
   max . foldr zplus [0]

The fusion condition is established as follows; through the calculation we also construct the definition of zplus:

   map sum ([] : map (x:) xss)
=  0 : map (sum . (x:)) xss
=    { by definition, sum (x : xs) = x + sum xs }
   0 : map (x+) (map sum xss) 
=    { define zplus x xss = 0 : map (x+) xss }  
   zplus x (map sum xss)

We continue with the derivation and perform another fusion:

   max . foldr zplus [0]
=    { fold fusion, let zmax x y = 0 ↑ (x+y) }
   foldr zmax 0

For the second fold fusion to work, we have to prove the following fusion condition:

   max (0 : map (x+) xs)
=  0 ↑ max (map (x+) xs)
=    { since  max (map (x +) xs) = x + max xs }
   0 ↑ (x + max xs)

The property max (map (x +) xs) = x + max xs in the last step follows from that (↑) distributes into (+), that is, (x + y) ↑ (x + z) = x + (y ↑ z). This is the key property that allows the whole derivation to work.

By performing foldr-fusion twice we have established that

mps = foldr zmax 0

In words, mps (x : xs), the best prefix sum of x : xs, can be computed by zmax x (mps xs). The definition of zmax says that if x + mps xs is positive, it is the maximum prefix sum; otherwise we return 0, sum of the empty prefix.
Therefore, mss can be computed by a scanr:

   mss
=    { reasoning so far }
   max . map (foldr zmax 0) . tails
=    { introducing scanr }
   max . scanr zmax 0

We have derived mss = max . scanr zmax 0, where zmax x y = 0 ↑ (x+y).

Many functional derivations usually stop here. This gives us an algorithm that runs in linear time, but takes linear space. A tupling transformation eliminates the need for linear space:

  mss = snd . (head &&& max) . scanr zmax 0 

where (f &&& g) a = (f a, g a). The part (head &&& max) . scanr zmax 0 returns a pair, the first component being the result of mps, the second mss. By some mechanical simplification we get the final algorithm:

mss = snd . foldr step (0,0)
  where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)

A Relational Derivation?

The maximum segment sum problem later turned out to be an example of Richard and Oege’s Greedy Theorem [2]. It is an exercise in the Algebra of Programming book, but I have not seen the solution given anywhere. For completeness, I recorded a relational derivation in a paper of mine about some other variations of the maximum segment sum problem [4].

References

  1. Bentley, Jon. Programming Pearls. Addison-Wesley, Inc, 1987.
  2. Bird, Richard and de Moor, Oege. Algebra of Programming. Prentice-Hall, 1997.
  3. Gibbons, Jeremy. Calculating Functional Programs. Proceedings of ISRG/SERG Research Colloquium, Oxford Brookes University, November 1997.
  4. Mu, Shin-Cheng. Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths. Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008.

Appendix: Algorithm 3


def mss(l,u):
  if l > u: 
    return 0           # empty array
  elif l == u:
    return (0 ↑ a[l])  # singleton array
  else:
    m = (l + u) / 2

    # compute maximum suffix sum of a[l..m]
    sum, maxToLeft = 0, 0
    for i in range (m, l-1, -1):
      sum = sum + a[i]
      maxToLeft = maxToLeft ↑ sum
    # compute maximum prefix sum of a[m+1..u]
    sum, maxToRight = 0, 0
    for i in range (m+1, u+1):
      sum = sum + a[i]
      maxToRight = maxToRight ↑ sum
    maxCrossing = maxToLeft + maxToRight

    # recursively compute mss of a[l..m] and a[m+1..u]
    maxInL = mss(l,m)
    maxInR = mss(m+1,u)
    return (maxInL ↑ maxCrossing ↑ maxInR)

Maximally Dense Segments — Code and Proof

Code and proof accompanying a forthcoming paper of Sharon Curtis and me: Functional Pearl: Maximally Dense Segments.

Quick links: [expository program | linear time program | proofs (late update: 2010.04.17)].

errata:
  • Page 3: “This input sequence does not have a solution…” what we meant was “This input does not have a prefix that is within bounds.” We used another example, in which the input does not have a feasible segment at all, before changing to this example, but I forgot to change the text accordingly.
  • Page 4, Proof of Theorem 3.2: the first mdsM x ⇑d win (a:x) should be mdsM x ⇑d wp (trim (a:x)); a : x <b L and a : x ≥b L should respectively be trim (a : x) <b L and trim (a : x) ≥b L.
  • Thanks to Josh Ko for pointing out both errors.

Expository Code

The expository program [download here] is intended to be an executable implementation of the code in the paper. For clarity we use Haskell lists for all sequences, and do not implement optimisations such as storing the areas and breadths of segments, thus the program is not linear-time yet. A linear-time implementation will follow soon.

The code is mostly consistent with the paper, with some minor differences: many functions take an additional argument of type BreadthSpec = (Breadth, Maybe Breadth). The first component is the lower bound, while the second is an optional upper bound. The main function is:

mds :: BreadthSpec ->  [Elem] -> Maybe [Elem]

which takes a BreadthSpec and switches between the modes with or without an upper bound.

To try the code, you may either load the module Main into ghci and invoke the function mds:

mds (lb, Just ub) [x1, x2, x3, ... ]

or load the module Test and run some QuickCheck properties:

Test.QuickCheck.quickCheck (prop_mds_correct bb (lb, Just ub))

where lb and ub are the lower and upper bounds, and bb is the bound on breadths of generated test elements. The property prop_mds_correct asserts that mds (lb,ub) x =d mds_spec (lb,ub) x for all x.

The gzipped file consists of the following Haskell modules:

  • Main: containing the main program mds, our variant of zygomorphism zh, wp2, smsp2, maxChop, trim, etc.
  • Spec: containing a specification of the maximally dense segment problem:
    mds_spec :: BreadthSpec -> [Elem] -> Maybe [Elem]
    mds_spec bs = maxlistM d . filter (bounds bs) . nonEmptySegs

    Types having area, breadth, and density are collected into a type class Block, and functions like maxChop are defined on the type class.

  • DRSP: specification of right-skew segments and DRSP, with functions including rightSkew, sdars, lrsp, and drsp.
  • DPTrees: defining DTrees and PTrees, and functions like addD and prependD allowing one to construct DRSPs in a fold.
  • Utilities: some utility functions processing lists.
  • Test: a module defining some QuickCheck properties to test the code.

Linear Time Implementation

A linear time implementation can be downloaded here. The program uses Data.Sequence to represent the compulsory part and the first level of the DForest and the PForest of the window, as well as annotating them with areas and breadths. The subtrees of a DTree and a PTree, however, can be represented simply by snoc-lists and cons-lists respectively.

Organisation of the code is the same as the first program.

Proofs

Proofs accompanying the paper [PDF]. Theorems and lemmas are labelled with both their own numbers as well as the numbers in the paper, if any. For example, Lemma A.1 (3.1) is Lemma 3.1 in the paper.

The Windowing Technique for Longest Segment Problems

In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.

A brief review: the longest segment problem takes the form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is an abbreviation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

A predicate p is suffix-closed if p (xs ⧺ ys) ⇒ p ys. For suffix-closed p, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using a foldr from the right to the left, during which we try to maintain the longest segment satisfying p so far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the window.

If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy p, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.

Derivation of the algorithm is a typical case of tupling.

Tupling

Given a function h, we attempt to compute it efficiently by turning it into a foldr. It would be possible if the value of the inductive case h (x : xs) were determined solely by x and h xs, that is:

h (x : xs) = f x (h xs)

for some f. With some investigation, however, it would turn out that h (x : xs) also depends on some g:

h (x : xs) = f x (g (x : xs)) (h xs)

Therefore, we instead try to construct their split ⟨ h , g ⟩ as a fold, where the split is defined by:

⟨ h , g ⟩ xs = (h xs, g xs)

and h = fst . ⟨ h , g ⟩.

If ⟨ h , g ⟩ is indeed a fold, it should scan through the list and construct a pair of a h-value and a g-value. To make it feasible, it is then hoped that g (x : xs) can be determined by g xs and h xs. Otherwise, we may have to repeat the process again, making the fold return a triple.
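For a concrete instance, consider h = maximum segment sum and g = maximum prefix sum: this is exactly the mss foldr from the maximum segment sum derivation, with the components of the pair swapped so that the h-value comes first (mssTupled is my name for the sketch):

mssTupled :: [Int] -> Int
mssTupled = fst . foldr step (0, 0)
  where step x (s, p) = let p' = 0 `max` (x + p)   -- g (x : xs) from g xs
                        in  (s `max` p', p')       -- h (x : xs) from g (x : xs) and h xs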

Segment/Prefix Decomposition

Let us look into the longest segment problem. For suffix-closed p it is reasonable to assume that p [] is true — otherwise p would be false everywhere. Therefore, for the base case we have max# ∘ p ◁ ∘ segs ▪ [] = []. We denote function application by ▪ to avoid too many parentheses.

Now the inductive case. It is not hard to derive an alternative definition of segs:

segs [] = [[]]
segs (x : xs) = inits (x:xs) ⧺ segs xs

therefore, we derive:

   max# ∘ p ◁ ∘ segs ▪ (x : xs)
 = max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs) 
 = (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
     (max# ∘ p ◁ ∘ segs ▪ xs)

where xs ↑# ys returns the longer one between xs and ys.

It suggests that we maintain, by a foldr, a pair containing the longest segment and the longest prefix satisfying p (that is, max# ∘ p ◁ ∘ inits). It is then hoped that max# ∘ p ◁ ∘ inits ▪ (x : xs) can be computed using max# ∘ p ◁ ∘ inits ▪ xs. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:

Proposition 1: If p is suffix-closed, we have:

   p ◁ ∘ inits ▪ (x : xs) = finits (max# ∘ p ◁ ∘ inits ▪ xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys).

Proposition 1 says that the list (or set) of all the prefixes of x : xs that satisfy p can be computed from the longest prefix of xs (call it ys) satisfying p, provided that p is suffix-closed. A naive way to do so is simply to compute all the prefixes of x : ys and do the filtering again, as is done in finits.

This was the route taken in the previous post. It would turn out, however, that to come up with an efficient implementation of the step function we need some more properties of p, such as overlap-closedness.

The “Window”

Proposition 1 can be strengthened: to compute all the prefixes of x : xs that satisfy p using finits we do not strictly have to start with ys. Any prefix of xs longer than ys will do.

Proposition 2: If p is suffix-closed, we have:

   p ◁ ∘ inits ▪ (x : xs) = finits (take i xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys), and i ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs.

In particular, we may choose i to be the length of the longest segment:

Lemma 1:

   length ∘ max# ∘ p ◁ ∘ segs ▪ xs ≥ 
      length ∘ max# ∘ p ◁ ∘ inits ▪ xs

Appealing to intuition, Lemma 1 is true because segs xs is a superset of inits xs.

Remark: Zantema proved Proposition 1 by contradiction. The purpose of an earlier post was to give a constructive proof of Proposition 1, which was considerably harder than I expected. I’d be interested to see a constructive proof of Proposition 2.

Now we resume the reasoning:

   max# ∘ p ◁ ∘ segs ▪ (x : xs)
 = max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs) 
 = (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
     (max# ∘ p ◁ ∘ segs ▪ xs)
 =   { Proposition 2 and Lemma 1 }
   let s = max# ∘ p ◁ ∘ segs ▪ xs
    in (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s

Define window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs; the reasoning above suggests that we may try the following tupling:

   max# ∘ p ◁ ∘ segs
 = fst ∘ ⟨ max# ∘ p ◁ ∘ segs , window ⟩

Maintaining the Longest Segment and the Window

The task now is to express ⟨ max# ∘ p ◁ ∘ segs , window ⟩ as a foldr. We can do so only if both max# ∘ p ◁ ∘ segs ▪ (x : xs) and window (x : xs) can be determined by max# ∘ p ◁ ∘ segs ▪ xs and window xs. Let us see whether it is the case.

Maintaining the Longest Segment

Regarding max# ∘ p ◁ ∘ segs ▪ (x : xs), we have derived:

   max# ∘ p ◁ ∘ segs ▪ (x : xs)
 =   { as shown above, let s = max# ∘ p ◁ ∘ segs ▪ xs }
   (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s

Let s = max# ∘ p ◁ ∘ segs ▪ xs. We consider two cases:

  1. Case p (x : take (length s) xs). We reason:
        (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s
      =   { see below }
        (x : take (length s) xs) ↑# s
      =   { since the LHS is one element longer than the RHS }
        x : take (length s) xs
      =   { definition of window }
        x : window xs
    

    The first step is correct because, for all zs, p zs implies that max# ∘ p ◁ ∘ inits ▪ zs = zs.

  2. Case ¬ p (x : take (length s) xs). In this case (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s must be s, since ¬ p zs implies that length ∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs.

Maintaining the Window

Now consider the window. Also, we do a case analysis:

  1. Case p (x : take (length s) xs). We reason:
        window (x : xs)
      = take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
      =   { by the reasoning above }
        take (length (x : take (length s) xs)) (x : xs)
      =   { take and length }
        x : take (length (take (length s) xs)) xs
      =   { take and length }
        x : take (length s) xs
      = x : window xs
    
  2. Case ¬ p (x : take (length s) xs). We reason:
      window (x : xs)
      = take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
      =   { by the reasoning above }
        take (length s) (x : xs)
      =   { take and length }
        x : take (length s - 1) xs
      = x : init (window xs)
    

The Algorithm

In summary, the reasoning above shows that

⟨ max# ∘ p ◁ ∘ segs , window ⟩ = foldr step ([], [])

where step is given by

step x (s, w) | p (x : w) = (x : w, x : w)
              | otherwise = (s, x : init w)

As is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that s is the optimal segment and that w is as long as s, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.

We are not quite done yet. The window w had better be implemented using a queue, so that init w can be performed efficiently. The algorithm then runs in time linear in the length of the input list, provided that p (x : w) can be tested in constant time, which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to compute p efficiently. But I shall stop here.
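To make the summary concrete, here is a runnable instantiation (a sketch; longestSeg and c are my names). The predicate, all elements non-negative with sum at most c, is suffix-closed but not overlap-closed. The test p (x : w) is performed naively, so this version is not linear time, and a guard is added for the corner case of an empty window, where init would fail:

longestSeg :: Int -> [Int] -> [Int]
longestSeg c = fst . foldr step ([], [])
  where
    p xs = all (0 <=) xs && sum xs <= c
    step x (s, w)
      | p (x : w) = (x : w, x : w)
      | null w    = (s, w)            -- empty window: nothing to slide
      | otherwise = (s, x : init w)

For example, longestSeg 5 [6,2,3,7,1,4] yields [1,4].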

Reference

Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.

Longest Segment Satisfying Suffix and Overlap-Closed Predicates

I spent most of the week preparing for the lecture on Monday, in which we plan to talk about segment problems. One of the things we would like to do is to translate the derivations in Hans Zantema’s Longest Segment Problems to Bird-Meertens style. Here is a summary of the part I managed to do this week.

Zantema’s paper considered problems of this form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is a shorter notation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

Of course, we have to assume certain properties from the predicate p. A predicate p is:

  • suffix-closed, if p (xs ⧺ ys) ⇒ p ys;
  • overlap-closed, if p (xs ⧺ ys) ∧ p (ys ⧺ zs) ∧ ys ≠ [] ⇒ p (xs ⧺ ys ⧺ zs).

For example, ascending is suffix- and overlap-closed, while p xs = (all (0 ≤) xs) ∧ (sum xs ≤ C) for some constant C is suffix-closed but not overlap-closed. Note that for suffix-closed p, it is reasonable to assume that p [] is true, otherwise p would be false everywhere. It also saves us the trouble of ensuring that max# is always applied to a non-empty list.

I denote function application by an infix operator ▪ that binds looser than function composition but tighter than other binary operators. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

Prefix/Suffix Decomposition

Let us begin with the usual prefix/suffix decomposition:

   max# ∘ p ◁ ∘ segs
 = max# ∘ p ◁ ∘ concat ∘ map inits ∘ tails
 = max# ∘ concat ∘ map (p ◁) ∘ map inits ∘ tails
 = max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Like what we do with the classical maximum segment sum, if we can somehow turn max# ∘ p ◁ ∘ inits into a fold, we can then implement map (foldr ...) ∘ tails by a scanr. Let us denote max# ∘ p ◁ ∘ inits by mpi.

If you believe in structural recursion, you may attempt to fuse max# ∘ p ◁ into inits by fold fusion. Unfortunately, it does not work this way! In the fold-fusion theorem:

h ∘ foldr f e = foldr g (h e)   ⇐   h (f x y) = g x (h y)

notice that x and y are universally quantified, which is too strong for this case. Many properties we have, to be shown later, do need information from the context — e.g. some properties are true only if y is the result of inits.

Trimming Unneeded Prefixes

One of the crucial properties we need is the following:

Proposition 1: If p is suffix-closed, we have:

   p ◁ ∘ inits ▪ (x : xs) =
    p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)

For some intuition, let x = 1 and xs = [2,3,4]. The right-hand side first computes all prefixes of xs:

[] [2] [2,3] [2,3,4]

before filtering them. Let us assume that only [] and [2,3] pass the check p. We then pick the longest one, [2,3], cons it with 1, and compute all its prefixes:

[] [1] [1,2] [1,2,3]

before filtering them with p again.

The left-hand side, on the other hand, performs filtering on all prefixes of [1,2,3,4]. The proposition says that it is the same as the right-hand side — filtering on the prefixes of [1,2,3] only. We lose nothing by dropping [1,2,3,4]: since p is suffix-closed, if p [1,2,3,4] were true, p [2,3,4] would have been true, and [2,3,4] would have been kept on the right-hand side.

Proof of Proposition 1 was the topic of a previous post. The proposition is useful for us because:

  mpi (x : xs)
= max# ∘ p ◁ ∘ inits ▪ (x : xs)
=    { Proposition 1 }
  max# ∘ p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)
= mpi (x : mpi xs)

Therefore mpi is a fold!

mpi = foldr (λx ys → mpi (x : ys)) []

Refining the Step Function

We still have to refine the step function λx ys → mpi (x : ys) to something more efficient. Luckily, for overlap-closed p, mpi (x : ys) is either [], [x], or x : ys — if ys is the result of mpi.

Proposition 2: If p is overlap-closed, mpi (x : mpi xs) = x ⊙ mpi xs, where ⊙ is defined by:

x ⊙ ys | p (x : ys) = x : ys
       | p [x] = [x]
       | otherwise = []

To see why Proposition 2 is true, consider mpi (x : mpi xs).

  • If mpi (x : mpi xs) = [], we are done.
  • Otherwise it must be x : zs for some zs ∈ inits (mpi xs). And we have p (x : zs) because it is a result of mpi. Again consider two cases:
    • If zs = [], both sides reduce to [x], otherwise…
    • Let us not forget that p (mpi xs) must be true. Also, since zs ∈ inits (mpi xs), we have mpi xs = zs ⧺ ws for some ws. Together that implies p (x : zs ⧺ ws) = p (x : mpi xs) must be true.

Notice that the reasoning above (from Zantema) is a proof-by-contradiction. I do not yet know how hard it is to build a constructive proof.

With Propositions 1 and 2 we have turned mpi into a fold. That leads to the derivation:

  max# ∘ p ◁ ∘ segs
=   { derivation above } 
  max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (foldr (⊙) []) ∘ tails
= max# ∘ scanr (⊙) []

with the definition of ⊙ given above. It turns out to be a rather simple algorithm: we scan through the list, and in each step we choose among three outcomes: [], [x], and x : ys. Like the maximum segment sum problem, it is a simple algorithm whose correctness is not that easy to justify.

The algorithm would be linear-time if ⊙ could be computed in constant time. With the presence of p in ⊙, however, that is unlikely to be the case.

Efficient Testing

So let us compute, during the fold, something that allows p to be determined efficiently. Assume that there exists some φ :: [A] → B that is a fold (φ = foldr ⊗ ι for some ⊗ and ι), such that p (x : xs) = p xs ∧ f x (φ xs) for some f. Some example choices of φ and f:

  • p = ascending. We may pick:

    φ xs = if null xs then Nothing else Just (head xs)
    f x Nothing = True
    f x (Just y) = x ≤ y
  • p xs = all elements in xs are equal modulo 3. We may pick:
    φ xs = if null xs then Nothing else Just (head xs `mod` 3)
    f x Nothing = True
    f x (Just y) = x `mod` 3 == y

Let us tuple mpi with φ, and turn them into one fold. Let ⟨ f , g ⟩ x = (f x, g x), we derive:

   max# ∘ p ◁ ∘ inits
=   { f = fst ∘  ⟨ f , g ⟩, see below}
   fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩ 
= fst ∘ foldr step ([], ι)

where step is given by

step x (xs, b) 
     | f x b = (x : xs , x ⊗ b)
     | f x ι = ([x], x ⊗ ι)
     | otherwise = ([], ι)

Notice that the property f = fst ∘ ⟨ f , g ⟩ is true when the domain of f is included in the domain of g, in particular, when they are both total, which again shows why we prefer to work in a semantics with total functions only.

Let us restart the main derivation again, this time use the tupling:

  max# ∘ p ◁ ∘ segs
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩) ∘ tails
=   { since max# ∘ map fst = fst ∘ max#', see below} 
  fst ∘ max#' ∘ map ⟨ max# ∘ p ◁ ∘ inits , φ ⟩ ∘ tails
=   { derivation above } 
  fst ∘ max#' ∘ map (foldr step ([], ι)) ∘ tails
= fst ∘ max#' ∘ scanr step ([], ι)

where max#' compares the lengths of the first components. This is a linear-time algorithm.
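As a runnable rendering of the ascending instance (a sketch with my own names: φ is the head wrapped in Maybe, ι = Nothing, and max#' is realised naively with maximumBy, so for genuine linear time one would also cache the lengths):

import Data.List (maximumBy)
import Data.Ord (comparing)

longestAscending :: Ord a => [a] -> [a]
longestAscending =
  fst . maximumBy (comparing (length . fst)) . scanr step ([], Nothing)
  where
    -- f x ι is always true for this p, so the middle guard of the
    -- generic step collapses to ([x], Just x)
    step x (xs, b)
      | f x b     = (x : xs, Just x)
      | otherwise = ([x], Just x)
    f _ Nothing  = True
    f x (Just y) = x <= y

For example, longestAscending [3,1,2,0] returns [1,2].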

Next… Windowing?

What if p is not overlap-closed? Zantema used a technique called windowing, which I will defer to next time…

Reference

Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.

On a Basic Property for the Longest Prefix Problem

In the Software Construction course next week we will, inevitably, talk about maximum segment sum. A natural next step is to continue with the theme of segment problems, which doesn’t feel complete without mentioning Hans Zantema’s Longest Segment Problems.

The paper deals with problems of this form:

ls = max# ∘ p ◁ ∘ segs

That is, computing the longest consecutive segment of the input list that satisfies predicate p. When writing on paper I found it much easier to denote filter p by the Bird-Meertens style p ◁, and I will use the latter for this post too. The function segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails, returns all consecutive segments of the input list, and max# :: [[a]] → [a] returns the longest list from the input list of lists. To avoid long nested parentheses, I denote function application by an infix operator ▪ that binds looser than function composition ∘. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

Standard transformation turns the specification to the form

ls = max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Therefore we may solve ls if we manage to solve its sub-problem on prefixes:

lp = max# ∘ p ◁ ∘ inits

that is, computing the longest prefix of the input list satisfying predicate p. One of the key propositions in the paper says:

Proposition 1: If p is suffix-closed (that is, p (x ⧺ y) ⇒ p y), we have:

   p ◁ ∘ inits ▪ (a : x) =
    p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

It is useful because, by composing max# on both sides we get

   lp (a : x) = max# ∘ p ◁ ∘ inits ▪ (a : lp x)

that is, lp can be computed by a foldr.

Of course, we are not quite done yet. We then have to somehow simplify p ◁ ∘ inits ▪ (a : lp x) to something more efficient. Before we move on, however, proving Proposition 1 turns out to be an interesting challenge in itself.

Intuition

What does Proposition 1 actually say? Let x = [1,2,3] and a = 0. On the left-hand side, we are performing p ◁ on

  [] [0] [0,1] [0,1,2] [0,1,2,3]

The right-hand side says that we may first filter the prefixes of [1,2,3]:

  [] [1] [1,2] [1,2,3]

Assuming that only [] and [1,2] get chosen, we keep the longest prefix [1,2] only, cons it with 0, and generate all the prefixes of [0,1,2] (which would be [] [0] [0,1] [0,1,2]), before filtering the latter with p again. In other words, we lose no information dropping [1,2,3] if it fails predicate p, since by suffix-closure, p ([0] ⧺ [1,2,3]) ⇒ p [1,2,3]. If [1,2,3] doesn’t pass p, p [0,1,2,3] cannot be true either.

Zantema has a nice and brief proof of Proposition 1 by contradiction. However, the theme of this course has mainly focused on proof by induction and, to keep the possibility of someday encoding our derivations in tools like Coq or Agda, we would like to have a constructive proof.

So, is it possible to prove Proposition 1 in a constructive manner?

The Proof

I managed to come up with a proof. I’d be happy to know if there is a better way, however.

For brevity, I denote if p then x else y by p → x; y. Also, define

a ⊕p x = p a → a : x ; x

Therefore p ◁ is defined by

p ◁ = foldr ⊕p []

Here comes the main proof:

Proposition 1

p ◁ ∘ inits ▪ (a : x) = p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

if p is suffix-closed.
Proof.

    p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of inits }
     p ◁ ([] : map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of p ◁ }
     [] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
    [] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 2 }
    [] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
    [] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=      { definition of p ◁ }
    p ◁ ([] : map (a :) ∘ inits ▪ x)
=      { definition of inits }
    p ◁ ∘ inits ▪ (a : x)

The main proof refers to two “decomposition” lemmas, both are of the form f ∘ g = f ∘ g ∘ f:

  • Lemma 1: p ◁ ∘ map (a:) = p ◁ ∘ map (a:) ∘ p ◁ if p suffix-closed.
  • Lemma 2: p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits for all predicate p.

Both are proved by structural induction. For Lemma 1 we need the conditional distribution rule:

f (p →  x; y) = (p →  f x; f y)

If we are working in CPO we need the side condition that f is strict, which is true for the cases below anyway:
Lemma 1

p ◁ ∘ map (a:) =  p ◁ ∘ map (a:) ∘ p ◁

if p is suffix-closed.
Proof. Structural induction on the input.
Case []: trivial.
Case (x : xs):

   p ◁ ∘ map (a:) ∘ p ◁ ▪ (x : xs)
 =   { definition of p ◁ }
   p ◁ ∘ map (a:) ▪ (p x →  x : p ◁ xs ; p ◁ xs)
 =   { map distributes into conditionals }
   p ◁ ▪ (p x → (a : x) : map (a :) ∘ p ◁ ▪ xs ; map (a :) ∘ p ◁ ▪ xs)
 =   { p ◁ distributes into conditionals }
   p x → p ◁ ((a : x) : map (a :) ∘ p ◁ ▪ xs) ; 
          p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
 =   { definition of p ◁ }
   p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ∘ p ◁ ▪ xs ; 
                      p ◁ ∘ map (a :) ∘ p ◁ ▪ xs) ; 
         p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
 =   { induction }
   p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ; 
                      p ◁ ∘ map (a :) ▪ xs) ; 
         p ◁ ∘ map (a :) ▪ xs
 =   { since p (a : x) ⇒ p x by suffix closure }
   p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ; 
                p ◁ ∘ map (a :) ▪ xs
 =   { definition of p ◁ }
   p ◁ ((a : x) : map (a :) xs)
 =   { definition of map }
   p ◁ ∘ map (a :) ▪ (x : xs)

For Lemma 2, it is important that p is universally quantified. We need the following map-filter exchange rule:

p ◁ ∘ map (a :) =  map (a :) ∘ (p ∘ (a:)) ◁

The proof goes:
Lemma 2 For all predicate p we have

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits

Proof. Structural induction on the input.
Case []: trivial.
Case (a : x):

   p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ (a : x)
 = p ◁ ∘ inits ∘ max# ∘ p ◁ ▪ ([] : map (a :) (inits x))

Consider two cases:
1. Case null (p ◁ ∘ map (a :) ∘ inits ▪ x):
If ¬ p [], both sides are undefined. Otherwise:

      ...
    = p ◁ ∘ inits ∘ max# ▪ [[]]
    = p ◁ ∘ inits ▪ []
    = [[]]
    = p ◁ ▪ ([] : p ◁ ∘ map (a :) ∘ inits ▪ x)
    = p ◁ ∘ inits ▪ (a : x)

2. Case ¬ (null (p ◁ ∘ map (a :) ∘ inits ▪ x)):

      ...
    = p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ map (a :) ∘ inits ▪ x
    =   { map-filter exchange }
      p ◁ ∘ inits ∘ max# ∘ map (a :) ∘ (p ∘ (a:)) ◁ ∘ inits ▪ x
    =   { since  max# ∘ map (a :) =  (a :) ∘ max# }
      p ◁ ∘ inits ∘ (a :) ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x
    =   { definition of inits }
      p ◁ ([] : map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
    =   { definition of p ◁ }
      [] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
    =   { map-filter exchange } 
      [] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
    =   { induction }
      [] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
    =   { map-filter exchange }
      [] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
    =   { definition of p ◁ }
      p ◁ ( [] :  map (a :) ∘ inits ▪ x)
    =   { definition of inits }
      p ◁ ∘ inits ▪ (a : x)

Reference

Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.

Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths

S-C. Mu. In Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008. (20/74) [PDF] [GZipped Postscript]

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for the squiggolists, are still active topics for algorithm designers. In this paper we examine the new developments from the view of relational program calculation. It turns out that, while the classical MSS problem is solved by the Greedy Theorem, by applying the Thinning Theorem, we get a linear-time algorithm for MSS with upper bound on length.

To derive a linear-time algorithm for the maximum segment density problem, on the other hand, we propose a variation of thinning based on an extended notion of monotonicity. The concepts of left-negative and right-skew segments emerge from the search for monotonicity conditions. The efficiency of the resulting algorithms crucially relies on exploiting properties of the set of partial solutions and on designing efficient data structures for them.

Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture in FLOLAC where we used the maximum segment sum (MSS) problem as the main example. Seeing his proof, I was curious to know how much program derivation I could do in Agda, and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder, given a list of (possibly negative) numbers, the MSS is about computing the maximum sum among all its consecutive segments. Typically, the specification is:


  mss = max ○ map⁺ sum ○ segs

where segs = concat⁺ ○ map⁺ inits ○ tails computes all segments of a list.

A dependent pair is defined by:


  data _∣_ (A : Set) (P : A -> Set) : Set where
    sub : (x : A) -> P x -> A ∣ P

such that sub x p is a pair where the type of the second component p depends on the value of the first component x. The idea is to use a dependent pair to represent a derivation:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ?
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ?)

It says that mss-der is a function taking a list x and returning a value of type Val, with the constraint that the value returned must be equal to (max ○ map⁺ sum ○ segs) x.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by ?, until we come to an implementation:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub RESULT
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ...
           === ...
           === RESULT)

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps due to my being unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to get every detail right. For example, the usual definition of max I would use in a paper would be:


   max = foldr _↑_ -∞

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:


   max = foldr _↑_ 0

led me to prove a base case 0 ↑ max x ≡ max x, which is not true. That the definition does work in practice relies on the fact that segs always returns the empty list as one of the possible segments. Alternatively, I could define max on non-empty lists only:


  max : List⁺ Val -> Val
  max = foldr⁺ _↑_ id

where List⁺ A is defined by:


  data List⁺ (A : Set) : Set where
    [_]⁺ : A -> List⁺ A
    _::⁺_ : A -> List⁺ A -> List⁺ A

and refine the types of inits, tails, etc, to return non-empty lists. This is the approach I took.

Eventually, I was able to give a derivation of mss this way:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ((max ○ scanr _⊗_ ø) x)
       (chain> (max ○ map⁺ sum ○ segs) x 
           === (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x 
                   by refl
           === (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x 
                   by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
                   by cong max (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
                   by cong max (map⁺-eq max-sum-prefix (tails x)) 
           === (max ○ scanr _⊗_ ø) x  
                   by cong max (scanr-pf x) )
  where _⊕_ : Val -> List⁺ Val -> List⁺ Val 
        a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

        _⊗_ : Val -> Val -> Val
        a ⊗ b = ø ↑ (a + b)

where max-sum-prefix consists of two fold fusions:


max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø x
max-sum-prefix x =
 chain> max (map⁺ sum (inits x))
        === max (foldr _⊕_ [ ø ]⁺ x)
               by cong max (foldr-fusion (map⁺ sum) lemma1 x)
        === foldr _⊗_ ø x
               by foldr-fusion max lemma2 x
 where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) -> 
                          map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
       lemma1 a xs = ?
       lemma2 : (a : Val) -> (x : List⁺ Val) ->
                               max (a ⊕ x) ≡ (a ⊗ max x)
       lemma2 a x = ?

The two lemmas are given in the files attached below. Having the derivation, mss is given by:


mss : List Val -> Val
mss x = depfst (mss-der x)

it is an executable program that is proved to be correct.

The complete Agda program is split into five files:

  • MSS.agda: the main program importing all the sub-modules.
  • ListProperties.agda: some properties I need about lists, such as fold fusion, concat ○ map (map f) = map f ○ concat, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
  • List+.agda: non-empty lists and some of its properties.
  • Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data and operators + and ↑ such that + is associative and a + (b ↑ c) = (a + b) ↑ (a + c). The reason for this parameterisation, however, was that I did not know how to prove the properties above, until Josh showed me the proof.
  • IntRNG.agda: proofs from Josh that Data.Int actually satisfy the properties above. (Not quite complete yet.)