Research Fellow  |  Wu, Jan-Jan  
Software Name:Smart Spark with Partial Replaceable Resilient Data Set
Inventors:Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu

To cope with frequent user data updates, we propose and implement the partial-update RDD (PRDD) in Spark. PRDD addresses the problem of repeatedly reloading an RDD whenever user data is updated. With PRDD, one does not need to reload the entire data set from external storage to rebuild an RDD; instead, only the partitions that contain updated records are replaced. This approach gives an RDD finer-grained updatability and also improves the performance of data reloading.
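
The partition-level replacement described above can be illustrated with a minimal sketch in plain Python. It is not Spark's API and not the PRDD implementation: the record shape, the `partition_of` partitioner, and the `load_partition` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

Record = Tuple[str, int]  # (key, value); an illustrative record shape


def partition_of(key: str, num_partitions: int) -> int:
    """Hypothetical deterministic partitioner: sum of the key's bytes modulo the partition count."""
    return sum(key.encode("utf-8")) % num_partitions


def dirty_partitions(updated_keys: Iterable[str], num_partitions: int) -> Set[int]:
    """Return the ids of partitions that contain at least one updated record."""
    return {partition_of(key, num_partitions) for key in updated_keys}


def partial_update(
    partitions: List[List[Record]],
    updated_keys: Iterable[str],
    load_partition: Callable[[int], List[Record]],
) -> List[List[Record]]:
    """Build a new partition list, reloading only the dirty partitions.

    Clean partitions are reused as-is, and the old partition list is left
    untouched, in the spirit of an immutable data set.
    """
    dirty = dirty_partitions(updated_keys, len(partitions))
    return [
        load_partition(pid) if pid in dirty else part
        for pid, part in enumerate(partitions)
    ]
```

In this sketch, an update to a single key triggers exactly one call to `load_partition`; every other partition is carried over without touching external storage.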
We further improve data reloading efficiency by storing frequently updated records in the same partition, so that only a few partitions need to be replaced in order to rebuild the RDD. That is, a large number of updated records are concentrated in just a few partitions, and only those partitions are rebuilt. As a result, how to place records into partitions so as to minimize the data loading cost becomes an interesting problem. We propose a dynamic programming algorithm to solve it. The prototype system has been used in ChungHua Telecommunication for CDR billing.
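
The description above does not give the cost model or the recurrence, so the following is only a sketch of one way such a placement problem could be posed as a dynamic program. Everything in it is an assumption: each record has an independent update probability, a partition is reloaded in full if any of its records changes, the number of partitions `k` is fixed, and partitions are contiguous runs of the records sorted by update probability. The function names are hypothetical.

```python
from typing import List, Tuple


def expected_reload_cost(probs: List[float]) -> float:
    """Expected number of records reloaded for one partition.

    Assumed model: the whole partition is reloaded if any record in it was updated.
    """
    p_clean = 1.0
    for p in probs:
        p_clean *= 1.0 - p
    return len(probs) * (1.0 - p_clean)


def place_records(probs: List[float], k: int) -> Tuple[float, List[int]]:
    """Split records, sorted by descending update probability, into k contiguous partitions.

    Returns (minimum expected cost, boundaries); partition m holds the sorted
    records boundaries[m]:boundaries[m + 1].
    """
    order = sorted(probs, reverse=True)
    n = len(order)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")
    inf = float("inf")
    # best[m][i]: minimum cost of placing the first i records into m partitions
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for i in range(m, n + 1):
            for j in range(m - 1, i):  # last partition holds records j..i-1
                cost = best[m - 1][j] + expected_reload_cost(order[j:i])
                if cost < best[m][i]:
                    best[m][i], cut[m][i] = cost, j
    # Walk the recorded cut points backwards to recover the boundaries.
    boundaries = [n]
    i = n
    for m in range(k, 0, -1):
        i = cut[m][i]
        boundaries.append(i)
    return best[k][n], boundaries[::-1]
```

Under this assumed model, two frequently updated records and two static records split into two partitions end up with the hot records together, so that the static partition is never reloaded. A real placement would also need a partition-size constraint, which this sketch omits.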

Software Name:Kylin: a distributed large-graph data processing system based on the BSP model
Inventors:Li-Yung Ho, Jan-Jan Wu

Kylin is a distributed computing and data management system for large-graph processing. Several optimization techniques were developed to speed up computation and to reduce communication and disk I/O costs. Currently, Kylin is more efficient and more scalable than existing open-source systems for graph computing.
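
For readers unfamiliar with the BSP (bulk synchronous parallel) model, the following single-process Python sketch shows its superstep structure: local computation, message sending, then a barrier. It is a generic illustration and not Kylin's API; the function name and the choice of connected-component labelling as the example are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def bsp_min_label(graph: Dict[int, List[int]]) -> Dict[int, int]:
    """Label each vertex with the smallest vertex id in its connected component.

    `graph` maps every vertex to its neighbours and is assumed to be undirected.
    """
    label = {v: v for v in graph}
    # Superstep 0: every vertex sends its own id to its neighbours.
    inbox: Dict[int, List[int]] = defaultdict(list)
    for v, neighbours in graph.items():
        for u in neighbours:
            inbox[u].append(label[v])
    while inbox:  # halt when no messages are in flight
        outbox: Dict[int, List[int]] = defaultdict(list)
        for v, messages in inbox.items():  # local computation phase
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for u in graph[v]:  # communication phase
                    outbox[u].append(smallest)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return label
```

In a distributed setting the vertices would be spread across workers and the barrier would be a real synchronization point; the sketch keeps only the control structure.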

Software Name:LnQ: an efficient dynamic binary translator for full-system emulation
Inventors:Chun-Chen Hsu, Jan-Jan Wu, Wei-Chung Hsu

Full system emulators provide virtual platforms for several important applications, such as kernel and system software development, co-verification with cycle accurate CPU simulators, or application development for hardware still in development. Full system emulators usually use dynamic binary translation to obtain reasonable performance. This system focuses on optimizing the performance of full system emulators. First, we optimize performance by enabling classic control transfer optimizations of dynamic binary translation in full system emulation, such as indirect branch target caching and block chaining. Second, we improve the performance of memory virtualization of cross-ISA virtual machines by improving the efficiency of software translation lookaside buffer (software TLB). We implement our optimizations on QEMU, an industrial-strength full system emulator, along with the Android emulator. Experimental results show that our optimizations achieve an average speedup of 1.92X for ARM-to-X86-64 QEMU running SPEC CINT2006 benchmarks with train inputs. We use a set of real applications downloaded from Google Play as benchmarks for the Android emulator. Experimental results show that our optimizations achieve an average speedup of 1.42X for the Android emulator running these applications. The system was made open-source in January 2015.
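
As background for the software TLB mentioned above, here is a minimal Python sketch of a direct-mapped software TLB with a fast path and a refill path. It is not QEMU's or LnQ's data structure: the class name, the page size, the table size, and the dictionary standing in for the guest page-table walk are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

PAGE_BITS = 12   # 4 KiB guest pages (assumed)
TLB_SIZE = 256   # number of direct-mapped entries (assumed, power of two)


class SoftTLB:
    """Direct-mapped cache from guest virtual page number to host page base."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.page_table = page_table  # stand-in for the guest page-table walk
        self.entries: List[Optional[Tuple[int, int]]] = [None] * TLB_SIZE
        self.hits = 0
        self.misses = 0

    def translate(self, guest_addr: int) -> int:
        vpn = guest_addr >> PAGE_BITS
        offset = guest_addr & ((1 << PAGE_BITS) - 1)
        index = vpn & (TLB_SIZE - 1)
        entry = self.entries[index]
        if entry is not None and entry[0] == vpn:  # fast path: tag matches
            self.hits += 1
            return entry[1] + offset
        self.misses += 1                            # slow path: walk and refill
        host_base = self.page_table[vpn]            # KeyError models a guest page fault
        self.entries[index] = (vpn, host_base)
        return host_base + offset
```

Two pages whose page numbers share the same low bits evict each other in this layout, which is one reason the efficiency of the lookup structure matters for memory virtualization performance.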

Software Name:HQEMU: a high-performance and retargetable dynamic binary translator on multicores
Inventors:Ding-Yong Hong, Chun-Chen Hsu, Jan-Jan Wu

HQEMU is a high-performance and retargetable dynamic binary translation system. It uses QEMU and the LLVM compiler backend as its building blocks, and is enhanced with advanced trace optimization and dynamic runtime optimization techniques. Experimental results with the PARSEC multithreaded benchmark suite demonstrate that HQEMU improves QEMU performance by more than 25X with 32 threads. This is a very encouraging result for application migration across different ISAs (Instruction Set Architectures).

HQEMU was made open source in August 2014.

Software Name:SQLMR: a scalable database management system for cloud computing
Inventors:Meng-Ju Hsieh, Chao-Jui Chang, Li-Yung Ho, Jan-Jan Wu

SQLMR compiles SQL-like queries into a sequence of MapReduce jobs. Existing SQL-based applications are seamlessly compatible with SQLMR, and users can manage terabyte- to petabyte-scale data with SQL-like queries instead of writing MapReduce code. We also devise a number of optimization techniques to improve the performance of SQLMR. Our experimental results with the standard TPC-H benchmark suite demonstrate both the performance and the scalability advantages of SQLMR compared to Hadoop's Hive.
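
To illustrate what compiling a SQL-like query into a MapReduce job can look like, the sketch below hand-translates one hypothetical query, `SELECT city, SUM(trips) FROM rides WHERE year = 2014 GROUP BY city`, into map, shuffle, and reduce phases in plain Python. The table, column names, and function names are invented for illustration; this is not SQLMR's generated code or its API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """WHERE and projection: emit (group key, value) for rows that pass the filter."""
    for row in rows:
        if row["year"] == 2014:
            yield row["city"], row["trips"]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """GROUP BY: collect all values that share a key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """SUM: aggregate the values of each group."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows: Iterable[Row]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory table."""
    return reduce_phase(shuffle(map_phase(rows)))
```

A query with joins or nested subqueries would typically need several such jobs chained together, which is where the "sequence of MapReduce jobs" comes from.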

SQLMR has been used for data processing in e-learning and e-traffic applications (e.g., UBike).