Research Summary (as of April 2001)

Our works on programming languages focus on high-level language constructs for complex data access. A major part of our works has been on efficient data representations for abstract data types. We often place emphasis on aggregate data access, in which efficiency is measured by the cost of a sequence of access operations, but not by the cost of a single access. Examples of aggregate data access include maintaining persistent data structures for functional languages, I/O primitives of C++ objects, and sparse matrix operations in Fortran 90.

We have used several programming languages in our research, including C++, Fortran 90, High Performance Fortran, Java, and the ML family of functional languages. The main research theme, however, remains the same: language and run-time supports for efficient aggregate access to complex data. We group our publications into three categories. A brief introduction is given below.

Functional Programming:
For aggregate data access, functional languages pose additional challenges because of their side-effect free semantics. We designed and implemented efficient data representations for various abstract data types in and for functional languages, including array [p1, p4], double-end queue [p3], and a general class of abstract data types [p5]). These schemes use amortization and randomization techniques, and require some complexity analyses to prove their efficiency. Based on the Bird-Meertens formalism, we devised functional primitives for efficient array operations [p6] and out-of-core data processing [p17]. We also developed a syntactical method for the analysis of recursive, higher-order functional programs [p2, p8], which is more effective than previous methods.

We utilize the parametric module facilities of ML-like functional languages to specify and generate high-level libraries for sparse matrices [t5] and distributed data structures [t6]. A system integrating Objective Caml (developed at INRIA), Message Passing Interface (a standard communication library), and fold/unfold functional primitives has been developed on an IBM SP2 cluster (distributed memory) and a 4-CPU Linux PC (shared memory) for parallel execution of functional programs. Recently we further used parametric modules of Objective Caml to derive generic functions for XML document processing [p24].

For the 10 years since 1992, we present 5 papers in the annual International Conference on Functional Programming (ICFP), and the biennial Lisp and Functional Programming and Functional Programming and Computer Architecture. (The later two conferences merged to become the annual ICFP in 1996.) ICFP is the most important conference in its field. Of the 5 papers, 3 of them appear after I joined the institute in 1994.

Language and Run-time Supports for Parallel Sparse Computations:
We showed how to convert dense array programs that use Fortran 90 array expressions for parallel sparse execution [p7, p12, p16, p20, p23] It is based on performance-calibrated sparse routines, structural information of the sparse matrices, and techniques of abstraction interpretation. We also considered a run-time selection scheme that utilized the language features of Java for adaptive sparse computation [p10]. Recently we showed how to use High Performance Fortran, a language not designed for sparse computations, to generate unstructured meshes in parallel [p13, p19, p21].

Software Systems and Tools:
We designed and implemented a system called ObjectStream to automatically support object I/O in C++ [p9]. The system is non-intrusive to user-defined class declarations, hence ensures wide applications. We further experimented with C++ software architectures and applications based on non-intrusive object introspection [p11, p25]. For distributed computing, we developed a programming model and a system based on Java and existing Web technologeis [p14]. One novelty of this model is the separation of the policy from the mechanism of task partition and distribution. We also implemented a Web-based Java-coded system for teaching and experimenting graph algorithms [p22].

We are also interested in social implications of information technologies. For example, we addressed privacy, security, and policy issues in government-controlled civil information systems such as the proposed (now canceled) smartcard-based combined National ID scheme [p15], and the Resident Information System [m3]. We also investigated differentiating factors in Internet access [m1, m2], social networks and their analyses [w1], and issues related to digital libraries [p18].