Skip to Content

PNUTS Paper Review

PNUTS: Yahoo!’s Hosted Data Serving Platform

Buzzwords

  • data storage
    1. hashed table
    2. ordered table
  • geographically distributed service
  • automated load-balancing and failover
  • relaxed consistency
    1. per-record timeline consistency
  • asynchronous notification
    1. Pub-Sub Message System
    2. Yahoo! Message Broker(YMB)
  • hosting
  • various consistency guarantee
    1. read-any
    2. read-critical
    3. read-latest
    4. write
    5. test-and-set-write
  • scatter-gather engine

Summary

The paper aims at Web Application which requires scalability, short response time even geographically distributed, high availability, fault tolerance, and relaxed consistency. Overall, Yahoo! purposes the idea of PNUTS meets the above requiremenrs. The main components are asynchronous notification, Pub/Sub message system(Yahoo!\’s YMB) and record level mastering algorithm. And then PNUTS hosts them into a single service for simplicity.

PNUTS supports several consistency models with asynchronous interface similar to Zookeeper which is implemented by Yahoo! too. While PNUTS is more complex according to its per-record timeline consistency. PNUTS is consist of storage units, tablet controller, routers and YMB. Each record is a region and has a master. And the records are divided into tablet in terms of its storage type. The routers map between key and storage units. The read is just assigned locally. And the write should be delivered to the master and exposes asynchronous interface while also offers powerful functionality.

The update story is as below:

  1. app sends the update to local router
  2. router forwards to local storage units for the key
  3. storage unit sends the update request to its master YMB
  4. R2\’s YMB stores on disk + backup(commit point)
  5. YMB asynchronously sends the update to YMB at every region
  6. every region updates local copy (YMB -> router -> storage unit)
  7. reply with the app(ensure some stricter consistency operations)

Because of the single master and fully replicated replicas in multiple regions, the concurrent updates are serialized in one order. The write is relatively fast for asynchronous operations(provided by YSM) while also needs to be delivered to master for consistency issues. The read is fast for local operations. Moreover, the YMB is a neat idea. It supports atomic and reliable operations for publish and subscribe, and it also makes use of asynchronous sends. In terms of recovery part, the YMB uses WAL(maybe) and sends the messages to the restarted machines.

The read may be stable because of local operations, while the workflow cares little about the temporary stale data. The PNUTS consistency model also solves some weak consistency problems through the single master model, such as read-modify-write problems listed in the paper. In addition, range writes for several regions may be not consistent according to its per-record timeline consistency which may be awkward for the app developers.

Above all, PNUTS gives his answer to the real world about the distributed web services. And it makes use of the above components to build a nice hosted service.

Strength of the paper

  • YMB is a neat solution for decoupling, reliable message services and atomicity.

  • The asynchronous YMB and single master a neat idea for their relaxed consistency.

  • A nice solution to real-world problems.

Weakness of the paper

  • Many interesting parts is not clear. For example, the order problems and multiple version problems are not detailedly explained.

Comments for the paper

  • In my idea, maybe the delivery of write may be slow even the write is relatively small, I think we need some more carefully designed algorithms like conflict resolution. While the global order is also a big problem to get rid of the delivery of write to master.

  • Maybe a comparison between YMB and log-based recovery is necessary to explain whether the decoupling is feasible and even better.

Paper learned

  • How to make use of relax consistency and solve the real world problems.
comments powered by Disqus