Efficiently Compiling Efficient Query Plans for Modern Hardware

Buzzwords

tuple at a time
1. Volcano iterator model
operation at a time
1. MonetDB
vector at a time
1. VectorWise, Peloton
Transpilation
1. GCC
2. compilation time may be long
JIT Compilation
1. push-based
2. data-centric
3. keep tuples in CPU registers
4. LLVM toolkit
5. intermediate representation (IR)
HyPer's Adaptive Execution model

Summary

For an in-memory DBMS, disk I/O is no longer the bottleneck; CPU efficiency is. The paper therefore introduces a novel compilation strategy that translates a query into LLVM intermediate representation, producing code with good locality and predictable branches. It also introduces the notion of a pipeline breaker and generates code that keeps each tuple in CPU registers for as long as possible between pipeline breakers.

The paper proposes a query compilation strategy that uses a push-based, data-centric algorithm and relies on LLVM to compile queries into native machine code. It also defines the pipeline breaker: an algebraic operator is a pipeline breaker for a given input side if it takes an incoming tuple out of the CPU registers. If we maximize the regions between pipeline breakers, more work is done on each tuple while it is still in registers, and performance improves.
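The difference between the pull-based iterator model and the push-based, data-centric model can be sketched in a few lines. This is only an illustration in Python, not the paper's LLVM code: the `(id, amount)` schema, the class names, and the query (sum of `amount` where `amount > threshold`) are all assumptions made up for the example, and Python cannot show the register-level effect itself. What it does show is the shape of the code: the pull version crosses an operator boundary for every tuple, while the fused version handles the whole pipeline in one tight loop that ends at the aggregation, which is the pipeline breaker here.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, int]  # hypothetical schema: (id, amount)


# Pull-based (Volcano-style): every operator hands out one tuple at a time,
# so each tuple passes through one generator step per operator.
class Scan:
    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        for row in self.rows:
            yield row


class Filter:
    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[Row]:
        for row in self.child:
            if self.predicate(row):
                yield row


def volcano_sum(rows: List[Row], threshold: int) -> int:
    plan = Filter(Scan(rows), lambda row: row[1] > threshold)
    total = 0
    for row in plan:  # the aggregate pulls tuples from its child
        total += row[1]
    return total


# Push-based, data-centric: the same pipeline fused into a single loop.
# Operator boundaries disappear; only the aggregate materializes state.
def fused_sum(rows: List[Row], threshold: int) -> int:
    total = 0
    for _id, amount in rows:        # scan
        if amount > threshold:      # filter
            total += amount         # aggregate (pipeline breaker)
    return total


rows = [(1, 5), (2, 20), (3, 15), (4, 8)]
assert volcano_sum(rows, 10) == fused_sum(rows, 10) == 35
```

In compiled code, the fused loop is what lets the compiler keep `amount` and `total` in registers across the scan, the filter, and the aggregation step.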

There are two approaches to code generation. The first is transpilation, which emits source code and hands it to an off-the-shelf compiler such as GCC; compilation time can be high, partly because the compiler runs as a separate process started through system calls like fork and exec. The second is JIT compilation with a toolkit like LLVM, which generates an assembly-like IR that is quickly compiled into native code (much like Java bytecode). Moreover, LLVM and C++ code can be combined: logic that is hard to generate is written in C++, while the hot path is generated as LLVM code, which both simplifies the code and improves performance.
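The code generation step itself can be sketched with produce/consume-style operators that emit source text for one fused loop. This is a toy under stated assumptions: Python's built-in `compile` and `exec` stand in for LLVM IR generation and JIT compilation, the operator classes and the single-column `amount` table are hypothetical, and the filter condition is interpolated as a raw string purely for illustration.

```python
from textwrap import indent


class Scan:
    """Starts a pipeline: emits the loop and lets the parent fill its body."""

    def __init__(self, table_name: str) -> None:
        self.table_name = table_name
        self.parent = None

    def produce(self) -> str:
        body = self.parent.consume()
        return f"for amount in {self.table_name}:\n" + indent(body, "    ")


class Select:
    """Filter: contributes an `if` around whatever its parent emits."""

    def __init__(self, child, condition: str) -> None:
        self.child = child
        self.condition = condition
        self.parent = None
        child.parent = self

    def produce(self) -> str:
        return self.child.produce()

    def consume(self) -> str:
        body = self.parent.consume()
        return f"if {self.condition}:\n" + indent(body, "    ")


class SumAggregate:
    """Pipeline breaker: accumulates state instead of passing tuples on."""

    def __init__(self, child) -> None:
        self.child = child
        child.parent = self

    def produce(self) -> str:
        return "total = 0\n" + self.child.produce()

    def consume(self) -> str:
        return "total += amount\n"


def compile_query(plan):
    """Generate source for the plan and compile it into a callable."""
    body = plan.produce() + "return total\n"
    source = "def query(table):\n" + indent(body, "    ")
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"], source


plan = SumAggregate(Select(Scan("table"), "amount > 10"))
query, source = compile_query(plan)
print(source)
assert query([5, 20, 15, 8]) == 35
```

The operators never process tuples themselves; they only decide what code is emitted for their part of the pipeline, which is the role the operators play in the compilation strategy described above.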

Strength of the paper

A cool idea for improving the performance of in-memory DBMSs.
The idea of the pipeline breaker is awesome: it emphasizes CPU registers and code locality as the way to improve performance.

Weakness of the paper

Many of the LLVM code examples need more detailed explanation.

Lessons learned

For in-memory DBMSs, the way we think about performance has to change: the focus shifts from disk I/O to CPU efficiency, meaning registers, code locality, and branch prediction.