OLTP Through the Looking Glass, and What We Found There

Buzzwords

main memory transaction processing
performance measurement
new architecture
1. logless
2. single-threaded
3. transaction-less
TPC-C benchmark
cluster computing trend
1. shared disk
2. shared memory
3. shared nothing

Summary

Much of the history of DBMSs has been about working around hardware limitations, and the architecture of many current DBMSs is still similar to that of systems from the 1970s. For example, because disk I/O was extremely slow, a buffer pool manager and concurrency control with logging and locking were state-of-the-art designs at that time. The situation is quite different today: structured data can often fit entirely in main memory, and transactions can be processed in milliseconds.

The paper therefore discusses how the current DBMS architecture could be improved. According to the paper, only about 7% of all instructions in a disk-based system do actual DBMS work, so the authors examine each component of the current design and measure its cost separately. The benchmark results are surprising: stripping out any single component has a relatively small impact, while a fully stripped-down system provides a performance improvement of a factor of twenty or more. Across the workloads used in the paper, the components are B-tree key comparison optimizations, logging, locking, latching, the buffer pool manager, and some miscellaneous optimizations.
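The "one component versus all components" result can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the 7% useful-work share is taken from the summary above, while the even split of the remaining 93% across five components is a made-up assumption, not the paper's measured breakdown. It also counts instructions only, so it is not expected to reproduce the factor-of-twenty figure exactly.

```python
# Illustrative arithmetic only. USEFUL comes from the summary above; the even
# split of the remaining overhead is a hypothetical assumption, not paper data.
USEFUL = 0.07
COMPONENTS = ["btree_key_optimizations", "logging", "locking", "latching", "buffer_pool"]
OVERHEAD = {name: (1.0 - USEFUL) / len(COMPONENTS) for name in COMPONENTS}


def speedup(removed):
    """Instruction-count speedup after removing the given components."""
    remaining = 1.0 - sum(OVERHEAD[name] for name in removed)
    return 1.0 / remaining


if __name__ == "__main__":
    for name in COMPONENTS:
        print(f"remove only {name}: {speedup([name]):.2f}x")
    print(f"remove everything: {speedup(COMPONENTS):.2f}x")
```

Under these assumptions, removing a single component leaves more than 80% of the instructions in place, so the gain stays small, while removing all of them leaves only the useful 7% and the gain is an order of magnitude.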

The paper also gives some advice for future OLTP engines. First, new concurrency control schemes should be designed for main-memory workloads. Second, multi-core support should be considered together with locking and latching. Moreover, weak consistency may be more suitable than strong consistency. Finally, B-tree optimization and replication management are also worth redesigning.

Strengths of this paper

Many interesting ideas are proposed in this paper, and they are similar to the current state-of-the-art designs of in-memory DBMSs.
The way the paper benchmarks current DBMSs is interesting. The idea of measuring the cost of each component by stripping components out step by step has been influential.

Learned from class

The bottlenecks are completely different from before.
1. Locking/Latching
2. Cache-line misses
3. Pointer chasing
4. Predicate evaluation (B-tree comparison optimization)
5. Data movement and copying (malloc and so on)
6. Network (can be improved with stored procedures)
Indexes may be rebuilt on the fly.
Query processing needs to be rethought for a main-memory DBMS. For example, the traditional tuple-at-a-time model is a poor fit, and a sequential scan is no longer significantly faster than random access.
Lightweight logging schemes may be possible (redo only).
Non-volatile memory (NVM) becomes an option as new hardware emerges.
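To make the "redo only" point concrete, here is a minimal sketch of a redo-only log for an in-memory store. The class `RedoOnlyStore` and its methods are hypothetical names invented for illustration, and a Python list stands in for a durable log file. The sketch assumes that writes are applied only at commit, which is why no undo (before-image) records are needed.

```python
import json


class RedoOnlyStore:
    """Toy in-memory key-value store with a redo-only log (hypothetical design)."""

    def __init__(self):
        self.data = {}   # committed state, lives only in memory
        self.log = []    # stands in for a durable append-only log file

    def commit(self, writes):
        # Log the new values first, then apply them. Uncommitted changes never
        # reach self.data, so no undo (before-image) records are needed.
        self.log.append(json.dumps(writes, sort_keys=True))
        self.data.update(writes)

    @classmethod
    def recover(cls, log):
        # Rebuild the in-memory state by replaying committed records in order.
        store = cls()
        for record in log:
            store.data.update(json.loads(record))
        store.log = list(log)
        return store


if __name__ == "__main__":
    store = RedoOnlyStore()
    store.commit({"stock:1": 10})
    store.commit({"stock:1": 9, "order:7": "new"})
    recovered = RedoOnlyStore.recover(store.log)
    print(recovered.data == store.data)
```

Indexes are deliberately absent from the log in this sketch; in line with the note above, they could be rebuilt from the recovered data rather than logged.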