Query Processing with LLVM Paper Review
Efficiently Compiling Efficient Query Plans for Modern Hardware
Buzzwords
- tuple at a time
- Volcano iterator model
- MonetDB
- VectorWise, Peloton
- operation at a time
- vector at a time
- Transpilation
- GCC
- compilation time may be long
- JIT compilation
- push-based
- data-centric
- keep tuples in CPU registers
- LLVM toolkit
- intermediate representation (IR)
- HyPer's Adaptive Execution model
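To make the tuple-at-a-time Volcano iterator model concrete, here is a minimal Python sketch. The classes `Operator`, `Scan`, and `Select` and the `open()`/`next()`/`close()` protocol are illustrative assumptions, not code from any specific system. Every operator pulls one tuple at a time from its child through a `next()` call, and this per-tuple call overhead is what the paper's compilation approach tries to remove.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, ...]

class Operator:
    """Hypothetical base class for a Volcano-style pull operator."""
    def open(self) -> None: ...
    def next(self) -> Optional[Row]:
        raise NotImplementedError
    def close(self) -> None: ...

class Scan(Operator):
    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0
    def open(self) -> None:
        self.pos = 0
    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

class Select(Operator):
    def __init__(self, child: Operator, pred: Callable[[Row], bool]) -> None:
        self.child = child
        self.pred = pred
    def open(self) -> None:
        self.child.open()
    def next(self) -> Optional[Row]:
        # Keep pulling from the child until a tuple qualifies.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self) -> None:
        self.child.close()

def run(root: Operator) -> List[Row]:
    """Drive the plan from the top: one next() call per output tuple."""
    root.open()
    out: List[Row] = []
    while (row := root.next()) is not None:
        out.append(row)
    root.close()
    return out

plan = Select(Scan([(1, 5), (2, 15), (3, 25)]), lambda r: r[1] > 10)
print(run(plan))  # [(2, 15), (3, 25)]
```

Each output tuple costs a chain of `next()` calls through every operator, which hurts code locality and branch prediction when the data is already in memory.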
Summary
For in-memory DBMSs, the CPU, rather than disk I/O, becomes the
bottleneck. So the paper introduces a novel compilation strategy that
translates a query into LLVM intermediate representation (IR) to obtain
good code locality and predictable branches. Moreover, the paper uses
the notion of a pipeline breaker to keep a tuple in CPU registers as
long as possible.
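As a rough picture of what data-centric compiled code aims to look like, take the hypothetical query `SELECT SUM(a) FROM R WHERE b > 10`. Scan, filter, and aggregation fuse into one tight loop, so each tuple's values live in local variables (registers, in native code) with no calls across operator boundaries. Python will not actually keep values in registers. The sketch below, with the assumed name `fused_sum_where`, only shows the shape of the loop a code generator would emit.

```python
from typing import List, Tuple

def fused_sum_where(rows: List[Tuple[int, int]], threshold: int) -> int:
    """Hypothetical hand-written equivalent of generated code for
    SELECT SUM(a) FROM R WHERE b > threshold: one loop, no operators."""
    total = 0
    for a, b in rows:          # scan: each tuple is read once
        if b > threshold:      # filter: evaluated in the same loop body
            total += a         # aggregate: the accumulator stays local
    return total

print(fused_sum_where([(1, 5), (2, 15), (3, 25)], 10))  # 5
```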
The paper proposes a query compilation strategy that uses a push-based
(data-centric) algorithm and relies on LLVM to compile queries into
native machine code. It also defines the pipeline breaker: an algebraic
operator is a pipeline breaker for a given input side if it takes an
incoming tuple out of the CPU registers, for example by materializing it
in a hash table. If we maximize the region between pipeline breakers,
tuples stay in registers longer and we obtain better performance.
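The push-based model and pipeline breakers can be sketched with a hash join. Here the classes `Filter`, `HashBuild`, `HashProbe`, `Sink`, and the `consume()` method are illustrative assumptions, loosely inspired by the paper's produce/consume idea but interpreted rather than compiled. A scan drives its pipeline by pushing tuples upward. `HashBuild` is the pipeline breaker, because it materializes every incoming tuple in a hash table, so the query splits into two pipelines: scan R, filter, build; then scan S, probe, output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

Row = Tuple[int, ...]

class Consumer(Protocol):
    def consume(self, row: Row) -> None: ...

class Sink:
    """Collects results at the top of the final pipeline."""
    def __init__(self) -> None:
        self.rows: List[Row] = []
    def consume(self, row: Row) -> None:
        self.rows.append(row)

class Filter:
    def __init__(self, pred: Callable[[Row], bool], parent: Consumer) -> None:
        self.pred, self.parent = pred, parent
    def consume(self, row: Row) -> None:
        if self.pred(row):
            self.parent.consume(row)  # push the qualifying tuple upward

class HashBuild:
    """Pipeline breaker: every incoming tuple is materialized in memory."""
    def __init__(self, key: int) -> None:
        self.key = key
        self.table: Dict[int, List[Row]] = defaultdict(list)
    def consume(self, row: Row) -> None:
        self.table[row[self.key]].append(row)

class HashProbe:
    def __init__(self, build: HashBuild, key: int, parent: Consumer) -> None:
        self.build, self.key, self.parent = build, key, parent
    def consume(self, row: Row) -> None:
        # .get() does not insert into the defaultdict on a miss.
        for match in self.build.table.get(row[self.key], ()):
            self.parent.consume(match + row)

def scan(rows: Iterable[Row], parent: Consumer) -> None:
    """A scan drives its pipeline by pushing each tuple to its parent."""
    for row in rows:
        parent.consume(row)

R = [(1, 5), (2, 15), (3, 25)]         # (k, x)
S = [(2, 200), (3, 300), (4, 400)]     # (k, y)
build = HashBuild(key=0)
scan(R, Filter(lambda r: r[1] > 10, build))  # pipeline 1 ends at the breaker
sink = Sink()
scan(S, HashProbe(build, key=0, parent=sink))  # pipeline 2
print(sink.rows)  # [(2, 15, 2, 200), (3, 25, 3, 300)]
```

Within pipeline 2, each S tuple flows from scan through probe to output without being materialized in between. That stretch between breakers is the region the paper wants to keep in registers.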
There are two code generation approaches. One is transpilation, which
generates C/C++ source and compiles it with an off-the-shelf compiler
such as GCC. Compilation time can be high, both because a full C/C++
compiler is slow and because launching it requires system calls such as
fork and exec. The second is JIT compilation with LLVM: the DBMS emits
LLVM IR, a low-level, assembly-like representation that LLVM quickly
compiles into native code (similar to how a Java JIT compiles bytecode).
Moreover, LLVM and C++ code can be used together: complex logic that is
hard to write in IR stays in precompiled C++, while the hot path is
generated in LLVM IR, which both simplifies the code logic and improves
performance.
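Python has no LLVM backend here, so this is only a loose analogy of the code generation step, closer to transpilation than to LLVM IR. It generates source text specialized for one query and compiles it at runtime with the built-in `compile()` and `exec()`. Note that Python compiles to bytecode, not native code. The function name `generate_query_fn` and the query shape are hypothetical.

```python
from typing import Callable, List, Tuple

def generate_query_fn(col_sum: int, col_filter: int,
                      threshold: int) -> Callable[[List[Tuple[int, ...]]], int]:
    """Generate and compile a specialized loop for
    SELECT SUM(col[col_sum]) FROM R WHERE col[col_filter] > threshold."""
    # Query constants are baked into the generated source, so the compiled
    # loop does no interpretation of column indices or the predicate.
    # int() ensures only integer literals are spliced into the source text.
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if row[{int(col_filter)}] > {int(threshold)}:\n"
        f"            total += row[{int(col_sum)}]\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated query>", "exec"), namespace)
    return namespace["query"]

query = generate_query_fn(col_sum=0, col_filter=1, threshold=10)
print(query([(1, 5), (2, 15), (3, 25)]))  # 5
```

The generated function is the same fused loop a hand-written query would use, but produced at query time. In the paper's setting the generator would emit LLVM IR for this hot loop and call precompiled C++ for the complex parts.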
Strength of the paper
A cool idea for improving the performance of in-memory DBMSs.
The idea of pipeline breakers is awesome: it emphasizes keeping data in
registers and code locality to improve performance.
Weakness of the paper
- Much of the LLVM code needs a more detailed explanation.
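What I learned from the paper
- For in-memory DBMSs, the way we think about performance improvements must change: the bottleneck moves from I/O to the CPU, so registers, code locality, and predictable branches matter most.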