Synopsis

Computer systems spread out over multiple machines, also known as distributed systems, power most of today's digital world and the vast majority of web applications. Such systems include storage systems, search engines, and web services. Spreading a system across machines has multiple benefits, including (1) the ability to survive machine failures by moving computation to a different machine and (2) the ability to leverage the capabilities of multiple machines to finish computations faster. To realize these benefits, the machines within a distributed system must coordinate among themselves to ensure that the result is identical to the idealized illusion of executing the computation on a single machine. This project will make coordination significantly faster by developing a new abstraction, deadline-ordered multicast (DOM), which combines (1) multicast: the ability to simultaneously transmit a message from a sender to multiple receivers, and (2) synchronized clocks: technology to ensure that, at any instant, clocks on various machines display the same value of time and progress in lock step with each other. The project will demonstrate how DOM can be used to accelerate widely used computer systems, such as distributed databases, coordination services, and blockchains. The project will also use a variety of approaches to train both undergraduate and graduate students to understand the benefits of synchronized clocks when thinking about distributed systems.

Distributed systems are at the heart of computing today and include widely used systems such as distributed databases, fault-tolerant key-value stores, and distributed ledgers. At the core of these systems are distributed protocols such as crash-fault-tolerant consensus, concurrency control, and Byzantine-fault-tolerant consensus. Much effort has been expended on improving the performance of distributed protocols over the years, including recent efforts that leverage richer network services. Such services include switch multicast, programmable switches, programmable network-interface cards, in-network priority queues, and control over routing. However, many distributed systems today are deployed by cloud tenants, who have no access to such rich network services and therefore cannot benefit from the protocol performance improvements that these services enable.

This project will develop a new network primitive called deadline-ordered multicast (DOM) that will make it easier to construct high-performance distributed protocols on the public cloud. DOM leverages two key techniques: (1) synchronized clocks as a service, whose recent availability now permits tightly synchronized clocks in the public cloud, and (2) multicast, which struggled to find traction on the Internet but is ideal for the one-sender-multi-receiver communication at the heart of distributed systems. DOM delivers a multicast message from a sender to multiple receivers at or after the message's deadline and delivers multiple messages in deadline order. DOM thus provides a consistent order (the order of deadlines) in which different receivers process a set of messages, accelerating several distributed protocols in the process. This project will develop the DOM abstraction, design an optimized DOM service, and prototype several applications demonstrating DOM's value.
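
The delivery rule described above (hold each message until its deadline, then release messages in deadline order) can be illustrated with a minimal receiver-side sketch. This is not the project's implementation: the class name `DomReceiver`, its methods, and the `(deadline, sender_id, seq)` ordering key are hypothetical names chosen for illustration, and the sketch assumes that the caller supplies a synchronized clock reading as `now` and that `(sender_id, seq)` is unique per message. It does not handle message loss, clock error, or messages that arrive after their deadline has already passed (such a message would simply be released on the next call, possibly after messages with later deadlines).

```python
import heapq


class DomReceiver:
    """Hypothetical receiver-side buffer illustrating deadline-ordered delivery."""

    def __init__(self):
        # Min-heap keyed by (deadline, sender_id, seq). The sender_id and seq
        # fields are an assumed tie-breaker so that equal deadlines are still
        # released in the same order at every receiver.
        self._pending = []

    def on_receive(self, deadline, sender_id, seq, payload):
        """Buffer an incoming multicast message until its deadline."""
        heapq.heappush(self._pending, (deadline, sender_id, seq, payload))

    def deliver_ready(self, now):
        """Release, in deadline order, every message whose deadline is <= now.

        `now` is assumed to come from a clock synchronized across receivers.
        """
        ready = []
        while self._pending and self._pending[0][0] <= now:
            deadline, sender_id, _seq, payload = heapq.heappop(self._pending)
            ready.append((deadline, sender_id, payload))
        return ready


if __name__ == "__main__":
    # Two receivers see the same messages in different network arrival orders.
    messages = [(10.0, "s1", 0, "x"), (5.0, "s2", 0, "y")]
    a, b = DomReceiver(), DomReceiver()
    for m in messages:
        a.on_receive(*m)
    for m in reversed(messages):
        b.on_receive(*m)
    # Both release the messages in the same (deadline) order.
    assert a.deliver_ready(now=12.0) == b.deliver_ready(now=12.0)
```

In this sketch, two receivers that buffer the same set of messages before the deadlines pass release them in the same order regardless of arrival order, which is the consistency property the paragraph above attributes to DOM.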

Personnel

  • Anirudh Sivaraman, NYU
  • Muhammad Haseeb, NYU
  • Jinkun Geng, Stony Brook University
  • Xiyu Hao, NYU
  • Daniel Duclos-Cavalcanti, NYU
  • Ulysses Butler, NYU
  • Daniel Qian, NYU

Collaborators

  • Shuai Mu, Stony Brook University
  • Aurojit Panda, NYU
  • Jinyang Li, NYU
  • Joseph Tassarotti, NYU
  • Balaji Prabhakar, Stanford
  • Srinivas Narayana, Rutgers
  • Radhika Mittal, UIUC

Publications

Papers:

  • Tiga: Lightweight and Latency-Optimal Geo-Distributed Transactions with Loosely Synchronized Clocks
    Jinkun Geng, Shuai Mu, Anirudh Sivaraman, and Balaji Prabhakar
    SOSP 2025
  • Network Support For Scalable And High-Performance Cloud Exchanges
    Muhammad Haseeb, Jinkun Geng, Daniel Duclos-Cavalcanti, Xiyu Hao, Ulysses Butler, Radhika Mittal, Srinivas Narayana, and Anirudh Sivaraman
    SIGCOMM 2025

Code

Educational activities

We will cover material related to these projects in graduate networking and distributed systems courses at NYU.

Outreach and other broader impact

We have given talks at Google, discussed these projects with practitioners at Coinbase, and will give talks about the projects at SIGCOMM and SOSP.