Query Processing with LLVM Paper Review
Efficiently Compiling Efficient Query Plans for Modern Hardware
Buzzwords
- tuple at a time
- Volcano iterator model
- MonetDB
- VectorWise, Peloton
- operation at a time
- vector at a time
- Transpilation
- GCC
- compilation time may be long
- JIT Compilation
- push-based
- data-centric
- keep tuple in CPU register
- LLVM toolkit
- intermediate representation (IR)
- HyPer's Adaptive Execution model
Summary
For an in-memory DBMS, the CPU rather than disk I/O becomes the
bottleneck. The paper therefore introduces a compilation strategy that
translates a query into LLVM intermediate representation to obtain good
code locality and predictable branches. It also introduces the notion of
a pipeline breaker and structures the generated code so that a tuple
stays in CPU registers for as long as possible.
The paper proposes a query compilation strategy that uses a push-based
algorithm and relies on LLVM to compile queries into native machine
code. It also defines the pipeline breaker: an algebraic operator is a
pipeline breaker for a given input side if it takes an incoming tuple
out of the CPU registers. If we maximize the region between pipeline
breakers, tuples stay in registers for longer and we obtain better
performance.
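The difference between the pull-based iterator model and the push-based, data-centric form can be sketched in a few lines. This is only an illustration under stated assumptions: the tables `R` and `S`, the query (a filtered hash join), and all function names are hypothetical, and Python cannot show register residency, only the control-flow shape. The build side of the hash join is the pipeline breaker; everything on the probe side runs in one loop.

```python
from collections import defaultdict

# Hypothetical toy tables (not from the paper): R(a, b) and S(b, c).
R = [(1, 10), (2, 20), (3, 10), (4, 30)]
S = [(10, "x"), (20, "y"), (40, "z")]


# Pull-based (Volcano-style): every operator is an iterator and each
# tuple crosses one iterator boundary per operator.
def scan(table):
    for row in table:
        yield row


def select(child, predicate):
    for row in child:
        if predicate(row):
            yield row


def hash_join(build_child, probe_child, build_key, probe_key):
    table = defaultdict(list)
    for row in build_child:  # pipeline breaker: build side is materialized
        table[build_key(row)].append(row)
    for row in probe_child:
        for match in table.get(probe_key(row), ()):
            yield row + match


def volcano_query():
    plan = hash_join(
        scan(S),
        select(scan(R), lambda r: r[0] > 1),
        build_key=lambda s: s[0],
        probe_key=lambda r: r[1],
    )
    return list(plan)


# Push-based, data-centric: one tight loop per pipeline, with operator
# boundaries fused away inside each loop.
def data_centric_query():
    # Pipeline 1: scan S and push each tuple into the hash table.
    table = defaultdict(list)
    for s in S:
        table[s[0]].append(s)

    # Pipeline 2: scan R, filter, probe, and emit inside a single loop.
    out = []
    for r in R:
        if r[0] > 1:
            for s in table.get(r[1], ()):
                out.append(r + s)
    return out
```

Both functions compute the same join; the second is roughly the shape a query compiler would emit, with the loop body for pipeline 2 being the region between pipeline breakers.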
There are two approaches to code generation. The first is transpilation:
emit source code and compile it with an off-the-shelf compiler such as
GCC. Compilation time may be high, partly because invoking an external
compiler involves system calls like fork and exec. The second is JIT
compilation with a toolkit like LLVM, which relies on an IR similar to
assembly that can be compiled quickly into native code (much like Java
bytecode). Moreover, LLVM and C++ code can be used together: complex,
hard-to-write logic stays in C++ while the hot path is generated as LLVM
code, which both simplifies the code logic and improves performance.
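The mixed model can be illustrated with a small sketch. It is an analogy, not the paper's implementation: Python's built-in `compile` and `exec` stand in for LLVM, the pre-written helper `accumulate` stands in for the C++ side, and the query shape, column indexes, and all names are hypothetical.

```python
def accumulate(groups, key, value):
    """Pre-written runtime helper (stands in for the pre-compiled C++ code)."""
    groups[key] = groups.get(key, 0) + value


def generate_source(filter_col, threshold, group_col, sum_col):
    """Emit source for the hot loop of a filtered group-by sum."""
    return (
        "def query(rows):\n"
        "    groups = {}\n"
        "    for row in rows:\n"
        f"        if row[{filter_col}] > {threshold}:\n"
        f"            accumulate(groups, row[{group_col}], row[{sum_col}])\n"
        "    return groups\n"
    )


def compile_query(source):
    """Compile the generated source at runtime and return the query function."""
    # The generated code may call the pre-written helper by name.
    namespace = {"accumulate": accumulate}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]


rows = [(1, "a", 10), (5, "b", 20), (7, "a", 30), (9, "b", 40)]
query = compile_query(
    generate_source(filter_col=0, threshold=4, group_col=1, sum_col=2)
)
result = query(rows)
```

The filter and loop are specialized per query (constants and column positions are baked into the generated code), while the aggregation logic is written once and simply called from the generated loop.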
Strength of the paper
- A cool idea for improving the performance of in-memory DBMSs.
- The idea of the pipeline breaker is awesome: it emphasizes registers and code locality as the way to improve performance.
Weakness of the paper
- Many of the LLVM code listings need more detailed explanation.
Lessons learned
- For an in-memory DBMS, the approach to performance improvement has to change: the focus shifts to CPU efficiency.