Category Archives: Publication

An executable monadic specification for Spark aggregation

Yu-Fang Chen, Chih-Duo Hong, Ondřej Lengál, Shin-Cheng Mu, Nishant Sinha, and Bow-Yaw Wang. Submitted.
[Code|Supplementary Proofs]

Spark is a new promising platform for scalable data-parallel computation that provides several high-level APIs to perform efficient data aggregation. Distributed computation is inherently non-deterministic, while programmers generally wish that their aggregation yield the same result regardless of how the data is distributed. We present an executable, monadic, formal specification for Spark aggregate combinators in Haskell. By algebraic, equational reasoning of monadic programs, requirements for deterministic outcomes from distributed Spark aggregation are precisely characterized. We report several case studies to analyze deterministic outcomes and correctness of Spark programs, and also illustrate how our executable specification helps developing a distributed Spark vertex coloring program.

Queueing and glueing for optimal partitioning

Shin-Cheng Mu, Yu-Hsi Chiang, and Yu-Han Lyu. In the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016), Sep. 2016. Accepted.
[Paper | Code]

The queueing-glueing algorithm is the nickname we give to an algorithmic pattern that provides amortised linear time solutions to a number of optimal list partition problems that have a peculiar property: at various moments we know that two of three candidate solutions could be optimal. The algorithm works by keeping a queue of lists, glueing them from one end, while chopping from the other end, hence the name. We give a formal derivation of the algorithm, and demonstrate it with several non-trivial examples.

Code accompanying this paper is available on GitHub:
https://github.com/scmu/queueing-glueing.

Formal derivation of greedy algorithms from relational specifications: a tutorial

Yu-Hsi Chiang, Shin-Cheng Mu. Journal of Logical and Algebraic Methods in Programming, in press.
[Paper(doi:10.1016/j.jlamp.2015.12.003) | Code]

Many programming tasks can be specified as optimisation problems in which a relation is used to generate all possible solutions, from which we wish to choose an optimal one. A relational operator “shrink”, developed by José N. Oliveira, is particularly suitable for constructing greedy algorithms from such specifications. Meanwhile, it has become standard in many sub-fields in programming language that proofs must be machine-verified. This tutorial leads the readers through the development of algebraic derivations of three greedy algorithms, one fold-based and two unfold-based, using AoPA, a library designed for machine-verified relational program calculation.

Modular reifiable matching: a list-of-functors approach to two-level types

Bruno C. d. S. Oliveira, Shin-Cheng Mu, and Shu-Hung You. In the 8th ACM SIGPLAN Symposium on Haskell (Haskell 2015), pages 82-93. Sep. 2015.
[Paper (doi: 10.1145/2804302.2804315)| Code]

This paper presents Modular Reifiable Matching (MRM): a new approach to two level types using a fixpoint of list-of-functors representation. MRM allows the modular definition of datatypes and functions by pattern matching, using a style similar to the widely popular Datatypes a la Carte (DTC) approach. However, unlike DTC, MRM uses a fixpoint of list-of-functors approach to two-level types. This approach has advantages that help with various aspects of extensibility, modularity and reuse. Firstly, modular pattern matching definitions are collected using a list of matches that is fully reifiable. This allows for extensible pattern matching definitions to be easily reused/inherited, and particular matches to be overridden. Such flexibility is used, among other things, to implement extensible generic traversals. Secondly, the subtyping relation between lists of functors is quite simple, does not require backtracking, and is easy to model in languages like Haskell. MRM is implemented as a Haskell library, and its use and applicability are illustrated through various examples in the paper.

Code accompanying this paper is available on GitHub:
https://github.com/scmu/mrm.

Calculating a linear-time solution to the densest segment problem

Sharon Curtis and Shin-Cheng Mu. Journal of Functional Programming, Vol. 25, 2015.
[Paper (doi: 10.1017/S095679681500026X)| Supplementary Proofs | Code]

The problem of finding a densest segment of a list is similar to the well-known maximum segment sum problem, but its solution is surprisingly challenging. We give a general specification of such problems, and formally develop a linear-time online solution, using a sliding window style algorithm. The development highlights some elegant properties of densities, involving partitions that are decreasing and all right-skew.

The Code

The accompanying Haskell programs and Agda proofs are available on GitHub:
https://github.com/scmu/mds
The Haskell program consists of the following files:

  • FDSList.hs: a main program that, for expository purpose, represents sequences by lists.
  • FDSSeq.hs: a linear-time implementation of the algorithm, using refined data structures as described in the paper.
  • ProbMDS.hs: problem specification for the MDS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • ProbMMS.hs: problem specification for the MMS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • MDSTest.hs: QuickCheck tests for the MDS problem.
  • MMSTest.hs: QuickCheck tests for the MMS problem.

Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the Journal of Logic and Algebraic Programming, Vol 81(6), pages 680–704, August 2012.
[PDF]

Problem statements often resort to superlatives such as in eg. “… the smallest such number”, “… the best approximation”, “… the longest such list” which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This article introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

This is an extended version of our conference paper published in RAMiCS 2011.

Constructing list homomorphisms from proofs

Yun-Yan Chi and Shin-Cheng Mu. In the 9th Asian Symposium on Programming Languages and Systems (APLAS 2011), LNCS 7078, pages 74-88.
[PDF]

The well-known third list homomorphism theorem states that if a function h is both an instance of foldr and foldl, it is a list homomorphism. Plenty of previous works devoted to constructing list homomorphisms, however, overlook the fact that proving h is both a foldr and a foldl is often the hardest part which, once done, already provides a useful hint about what the resulting list homomorphism could be. In this paper we propose a new approach: to construct a possible candidate of the associative operator and, at the same time, to transform a proof that h is both a foldr and a foldl to a proof that h is a list homomorphism. The effort constructing the proof is thus not wasted, and the resulting program is guaranteed to be correct.

Generalising and dualising the third list-homomorphism theorem

Shin-Cheng Mu and Akimasa Morihata. In the 16th ACM SIGPLAN International Conference on Functional Programming (ICFP 2011), pages 385-391.
[PDF]

The third list-homomorphism theorem says that a function is a list homomorphism if it can be described as an instance of both a foldr and a foldl. We prove a dual theorem for unfolds and generalise both theorems to trees: if a function generating a list can be described both as an unfoldr and an unfoldl, the list can be generated from the middle, and a function that processes or builds a tree both upwards and downwards may independently process/build a subtree and its one-hole context. The point-free, relational formalism helps to reveal the beautiful symmetry hidden in the theorem.

Programming from Galois connections — principles and applications

Shin-Cheng Mu and José Nuno Oliveira. Technical Report TR-IIS-10-009, Academia Sinica, December 2010.
[PDF]

This report is an extended version of our conference submission Programming from Galois connections.

Problem statements often resort to superlatives such as in eg. “… the smallest such number”, “… the best approximation”, “… the longest such list” which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This report introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the 12th International Conference on Relational and Algebraic Methods in Computer Science (RAMiCS #12), LNCS 6663, pages 294-313. May 30 – June 3, 2011.
[PDF]

Problem statements often resort to superlatives such as in eg. “… the smallest such number”, “… the best approximation”, “… the longest such list” which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting the optimal such solution (the hard part).

This paper introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution at target.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

An accompanying technical report is available online.