中央研究院 資訊科學研究所

活動訊息

友善列印

列印可使用瀏覽器提供的(Ctrl+P)功能

學術演講

:::

TIGP (SNHCC) -- Finite-memory map-reduce evaluation of relational algebra queries

  • 講者陳偉鬆 教授 (國立臺灣大學資訊工程學系)
    邀請人:TIGP SNHCC Program
  • 時間2016-10-05 (Wed.) 14:00 ~ 16:00
  • 地點資訊所新館106演講廳
摘要

We present three formal models of distributed systems for query evaluation on massive databases: Distributed Streaming with Register Automata (DSAs), Distributed Streaming with Register Transducers (DSTs), and Distributed Streaming with Register Transducers and Joins (DSTJs). These models are based on the MapReduce paradigm where the input is transformed into a dataset of key-value pairs, and on each key a local computation is performed on the values associated with that key resulting in another set of key-value pairs. Computation proceeds in a constant number of rounds, where the result of the last round is the input to the next round, and transformation to key-value pairs is required to be generic. The difference between the three models is in the local computation part. In DSAs it is limited to making one pass over its input using a register automaton, while in DSTs it can make two passes: in the first pass it uses a finite-state automaton and in the second it uses a register transducer. The third model DSTJs is an extension of DSTs, where local computations are capable of constructing the Cartesian product of two sets.

Some of the results that will be presented: (1) DSAs can evaluate relational algebra queries over bounded degree databases; (2) DSTs can evaluate semi join algebra queries over arbitrary databases; (3) DSTJs can evaluate the whole relational algebra over arbitrary databases; (4) DSTJs are strictly stronger than DSTs, which in turn, are strictly stronger than DSAs; (5) within DSAs, DSTs and DSTJs there is a strict hierarchy w.r.t. the number of rounds.

During the talk, we will briefly review some of the database and automata concepts needed to follow the talks.

If time permits, we will also present some implementations (on top of Hadoop) and their experimental results on multi-semi-join evaluation based on DSTs, and compare their performance with well-known systems such as Pig and Hive.