Department of Computer Science
National Chengchi University
Taipei, Taiwan 116, R.O.C.
In relational databases, a composition requires three operations: join, projection and duplicate elimination. An external sort is required to eliminate duplicates in a large file. Both join and external sort are expensive in database systems because they incur a nonlinear I/O overhead. Most conventional composition algorithms take join and duplicate elimination as separate operations to achieve savings on either operation but not both. In this paper, we analyze the characteristics of the composition operation and find that executing the composition as a single primitive may require much less I/O overhead. Several algorithms taking this approach are proposed to achieve savings on both operations. Several experiments have been conducted showing that this new approach outperforms other approaches under almost every condition.
Keywords: database, composition, join, duplicate elimination
Received June 28, 1995; revised March 13, 1996.
Communicated by Wei-Pang Yang.