Research
λ calculus Agda Bidirectional Updating Burrows-Wheeler Transform Concurrency Converse-of-a-Function Theorem Curry-Howard Data Structure Dependent Type Fibonacci GADT Greedy Theorem Haskell HaXML Indirect Equality List Homomorphism Logic Logic Programming Optimisation Problems Program Derivation Program Inversion Quantum Programming Quine Regular Expression Segment Problems Termination Thinning Theorem Types XML Streaming-
Recent Comments
The Windowing Technique for Longest Segment Problems
In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.
A brief review: the longest segment problem takes the form:
where
segs :: [a] → [[a]], defined bysegs = concat ∘ map inits ∘ tailsreturns all consecutive segments of the input list;p ◁is an abbreviation forfilter p, andmax# :: [[a]] → [a]returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicatep.A predicate
pis suffix-closed ifp (xs ⧺ ys) ⇒ p ys. For suffix-closedp, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using afoldrfrom the right to the left, during which we try to maintain the longest segment satisfyingpso far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the window.If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy
p, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.Derivation of the algorithm is a typical case of tupling.
Tupling
Given a function
h, we attempt to compute it efficiently by turning it into afoldr. It would be possible if the value of the inductive caseh (x : xs)were determined solely byxandh xs, that is:for some
f. With some investigation, however, it would turn out thath (x : xs)also depends on someg:Therefore, we instead try to construct their split
⟨ h , g ⟩as a fold, where the split is defined by:and
h = fst . ⟨ h , g ⟩.If
⟨ h , g ⟩is indeed a fold, it should scan through the list and construct a pair of ah-value and ag-value. To make it feasible, it is then hoped thatg (x : xs)can be determined byg xsandh xs. Otherwise, we may have to repeat the process again, making the fold return a triple.Segment/Prefix Decomposition
Let us look into the longest segment problem. For suffix-closed
pit is reasonable to assume thatp []is true — otherwisepwould be false everywhere. Therefore, for the base case we havemax# ∘ p ◁ ∘ segs ▪ [] = []. We denote function application by▪to avoid too many parentheses.Now the inductive case. It is not hard to derive an alternative definition of
segs:therefore, we derive:
where
xs ↑# ysreturns the longer one betweenxsandys.It suggests that we maintain, by a
foldr, a pair containing the longest segment and the longest prefix satisfyingp(that is,max# ∘ p ◁ ∘ inits). It is then hoped thatmax# ∘ p ◁ ∘ inits ▪ (x : xs)can be computed usingmax# ∘ p ◁ ∘ inits ▪ xs. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:Proposition 1 says that the list (or set) of all the prefixes of
x : xsthat satisfiespcan be computed by the longest prefix ofxs(call itys) satisfyingp, provided thatpis suffix-closed. A naive way to do so is simply by computing all the prefixes ofx : ysand do the filtering again, as is done infinits.This was the route taken in the previous post. It would turn out, however, to come up with an efficient implementation of
fwe need some more properties fromp, such as that it is also overlap-closed.The “Window”
Proposition 1 can be strengthened: to compute all the prefixes of
x : xsthat satisfiespusingfinitswe do not strictly have to start withys. Any prefix ofxslonger thanyswill do.In particular, we may choose
ito be the length of the longest segment:Appealing to intuition, Lemma 1 is true because
segs xsis a superset ofinits xs.Now we resume the reasoning:
Define
window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs, the reasoning above suggest that we may try the following tupling:Maintaining the Longest Segment and the Window
The task now is to express
⟨ max# ∘ p ◁ ∘ segs , window ⟩as afoldr. We can do so only if bothmax# ∘ p ◁ ∘ segs ▪ (x : xs)andwindow (x : xs)can be determined bymax# ∘ p ◁ ∘ segs ▪ xsandwindow xs. Let us see whether it is the case.Maintaining the Longest Segment
Regarding
max# ∘ p ◁ ∘ segs ▪ (x : xs), we have derived:Let
s = max# ∘ p ◁ ∘ segs ▪ xs. We consider two cases:p (x : take (length s) xs). We reason:The first step is correct because, for all
zs,p zsimplies thatmax# ∘ p ◁ ∘ inits ▪ zs = zs.¬ p (x : take (length s) xs). In this case(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# smust bes, since¬ p zsimplies thatlength∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs.Maintaining the Window
Now consider the window. Also, we do a case analysis:
p (x : take (length s) xs). We reason:¬ p (x : take (length s) xs). We reason:The Algorithm
In summary, the reasoning above shows that
where
stepis given byAs is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that
sis the optimal segment and thatwis as long ass, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.We are not quite done yet. The window
whad better be implemented using a queue, so thatinit wcan be performed efficiently. The algorithm then runs in time linear to the length of the input list, provided thatp (x : w)can be performed in constant time -- which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to computepefficiently. But I shall stop here.Reference