您的瀏覽器不支援JavaScript語法,網站的部份功能在JavaScript沒有啟用的狀態下無法正常使用。

Institute of Information Science, Academia Sinica

Events

Print

Press Ctrl+P to print from browser

Seminar

:::

TIGP (SNHCC) -- Finite-memory map-reduce evaluation of relational algebra queries

  • LecturerProf. Tony Tan (Department of Computer Science & Information Engineering, National Taiwan University)
    Host: TIGP SNHCC Program
  • Time2016-10-05 (Wed.) 14:00 ~ 16:00
  • LocationAuditorium 106 at IIS new Building
Abstract

We present three formal models of distributed systems for query evaluation on massive databases: Distributed Streaming with Register Automata (DSAs), Distributed Streaming with Register Transducers (DSTs), and Distributed Streaming with Register Transducers and Joins (DSTJs). These models are based on the MapReduce paradigm where the input is transformed into a dataset of key-value pairs, and on each key a local computation is performed on the values associated with that key resulting in another set of key-value pairs. Computation proceeds in a constant number of rounds, where the result of the last round is the input to the next round, and transformation to key-value pairs is required to be generic. The difference between the three models is in the local computation part. In DSAs it is limited to making one pass over its input using a register automaton, while in DSTs it can make two passes: in the first pass it uses a finite-state automaton and in the second it uses a register transducer. The third model DSTJs is an extension of DSTs, where local computations are capable of constructing the Cartesian product of two sets.

Some of the results that will be presented: (1) DSAs can evaluate relational algebra queries over bounded degree databases; (2) DSTs can evaluate semi join algebra queries over arbitrary databases; (3) DSTJs can evaluate the whole relational algebra over arbitrary databases; (4) DSTJs are strictly stronger than DSTs, which in turn, are strictly stronger than DSAs; (5) within DSAs, DSTs and DSTJs there is a strict hierarchy w.r.t. the number of rounds.

During the talk, we will briefly review some of the database and automata concepts needed to follow the talks.

If time permits, we will also present some implementations (on top of Hadoop) and their experimental results on multi-semi-join evaluation based on DSTs, and compare their performance with well-known systems such as Pig and Hive.