Applications enabled by programmable networks

We have worked on new use cases enabled by programmability within networks, both new algorithms for switches and routers that leverage such programmability and new distributed applications.

Papers:

HULA (SOSR 2016), a scalable and fault-tolerant load-balancing algorithm for datacenters that leveraged then-emerging programmable switching chips.
On-Ramp (NSDI 2021), a bump-in-the-wire underlay that improves the performance of any congestion-control algorithm in public cloud environments.
Nezha (VLDB 2023), a consensus protocol based on a new deadline-ordered multicast network primitive, in turn enabled by clock synchronization.
CloudEx (HotOS 2021), a stock exchange in the cloud that makes use of time-synchronized in-network gateways.
Snicket (HotNets 2021), a distributed tracing system using WebAssembly extensions for programmable RPC processing.
COLA (arXiv), an autoscaler for microservice-based applications.

Impact:

On-Ramp is featured in Peterson and Davie's book on TCP.
Considerable interest from industry in On-Ramp
300+ citations for HULA

Solver-aided compilers for fast packet processing

Programming network devices today is rudimentary and akin to programming microprocessors in the 1970s or GPUs in the 2000s—before productive languages and good compilers were developed. On the bright side, unlike at the dawn of compiler research, we now have at our disposal high-quality solver engines for many optimization and satisfiability problems (e.g., program synthesis engines, SMT solvers, ILP solvers). We have used such solvers to pose compiler problems declaratively as optimization or constraint satisfaction problems. Our thesis is that such an approach can both (1) simplify compiler development by using solvers to do a compiler’s algorithmic heavy lifting (e.g., code generation, resource allocation) and (2) improve the quality of the compiler’s output via exhaustive search, which is critical for network devices that must provide high performance.
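
As a rough illustration of posing a compiler problem declaratively, the sketch below states a toy resource-allocation problem (placing match-action tables into pipeline stages) as constraints plus an objective, and leaves the search to a generic engine. It is only a sketch under stated assumptions: the table names, memory sizes, dependencies, and stage limits are hypothetical, and a brute-force enumeration stands in for the ILP or SMT solver a real compiler would call. It is not the method of any paper listed here.

```python
from itertools import product

# Hypothetical toy input (not taken from any real device or paper).
TABLES = {"acl": 2, "route": 3, "nat": 2, "count": 1}  # table -> memory blocks needed
DEPS = [("acl", "route"), ("route", "nat")]            # (a, b): a must sit in an earlier stage than b
NUM_STAGES = 4                                         # pipeline depth
BLOCKS_PER_STAGE = 4                                   # memory blocks available per stage


def feasible(assign):
    """Constraints: dependencies are respected and no stage exceeds its memory."""
    for earlier, later in DEPS:
        if assign[earlier] >= assign[later]:
            return False
    for stage in range(NUM_STAGES):
        used = sum(TABLES[t] for t, s in assign.items() if s == stage)
        if used > BLOCKS_PER_STAGE:
            return False
    return True


def allocate():
    """Exhaustively search all placements; return (stages_used, assignment) or None.

    Objective: minimize the number of pipeline stages used. The enumeration
    below is a stand-in for a real solver, which would explore the same space
    far more efficiently.
    """
    names = sorted(TABLES)
    best = None
    for stages in product(range(NUM_STAGES), repeat=len(names)):
        assign = dict(zip(names, stages))
        if not feasible(assign):
            continue
        cost = max(stages) + 1
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


if __name__ == "__main__":
    print(allocate())
```

The compiler writer only describes what a valid and good allocation looks like; the search procedure is generic. Because the search is exhaustive, the result is optimal with respect to the stated objective, which is the property the thesis above relies on.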

Papers:

HotNets 2019 paper describing our overall thesis and some initial results.
Chipmunk (SIGCOMM 2020), a code generator for packet-processing pipelines, which uses program synthesis to improve the quality of generated code.
Gauntlet (OSDI 2020), tools to find bugs in packet-processing compilers using SMT solvers and fuzzing techniques.
K2 (SIGCOMM 2021), an eBPF compiler based on superoptimization techniques that combine randomized search with SMT-based verification.
p4testgen (SIGCOMM 2023), a test oracle for P4 powered by symbolic and concolic execution leveraging SMT.
CaT (ASPLOS 2023), high-level synthesis for packet processing pipelines, leveraging program synthesis for code generation and ILP techniques for resource allocation.

Impact:

Gauntlet has found 100+ new and confirmed bugs in P4 compilers and led to several P4 specification changes. It now runs within the open-source P4 compiler's continuous integration workflow.
p4testgen is a P4 community open-source project, similar to the P4 compiler and behavioral model, and has found bugs in mature P4 toolchains.

Multi-tenancy for packet processing pipelines

Network programmability would be of very limited value if only the network’s owners were able to program their devices. Ideally anyone writing an application running over a network—even one they don’t own—should be able to program the network. To that end, we envision a future where cloud providers offer programmable packet processing as a service to tenants. To support this, we need mechanisms to run multiple tenant programs on a network device.

Papers:

HotCloud 2020 paper describing our overall vision for packet processing as a service.
NetVRM (NSDI 2022), a system to virtualize memory on programmable switches.
Menshen (NSDI 2022), hardware primitives for enabling isolation between different modules on a packet-processing pipeline.

Impact:

Menshen's open-source Verilog code for an RMT-like pipeline has been used by the SimBricks project (SIGCOMM 2022).

Hardware and abstractions for packet processing

We developed new router hardware designs to walk the tightrope between programmability and performance. The unifying theme in these designs was a focus on restricted, but important, classes of router functionality—providing programmability without losing performance.

Papers:

Domino (SIGCOMM 2016), a programming language, compiler, and instruction set to write router algorithms in a transactional style and compile them to run on a programmable router at line rate.
PIFOs (SIGCOMM 2016), or Push-In First-Out queues, the first abstraction for programmable scheduling: flexibly deciding which packet is transmitted next from a router’s buffer.
Marple (SIGCOMM 2017), an abstraction and hardware design to measure network performance (e.g., packet latencies, loss rates, and reordering rates) on a high-speed router.
dRMT (SIGCOMM 2017), a new hardware architecture for programmable routers that substantially improves hardware utilization relative to the standard pipeline-based architecture for programmable routers.
PANIC (OSDI 2020), a new hardware architecture for multi-tenant programmable network interface cards.
Calendar queues (NSDI 2020), an abstraction and implementation for high-speed programmable scheduling on multi-Tbit/s switches.
DC.p4 (SOSR 2015), a case study of programming a datacenter router’s forwarding plane in P4.
In-Band Network Telemetry (SIGCOMM Demo 2015), a proposal for switches to piggyback measurement information (queueing delays, queue sizes, etc.) on data packets.
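
To make the Push-In First-Out abstraction from the list above concrete, here is a minimal software sketch, assuming the usual reading of the name: a packet can be pushed into any position according to a rank, but packets are only ever removed from the head. The class name `PIFO`, its methods, and the tie-breaking rule (first-in first-out among equal ranks) are illustrative assumptions. This is not the hardware design from the paper.

```python
import bisect
import itertools


class PIFO:
    """Software sketch of a push-in first-out queue (hypothetical API)."""

    def __init__(self):
        # Entries are kept sorted as (rank, arrival_sequence, packet).
        self._entries = []
        # The arrival counter breaks ties so equal ranks leave in FIFO order,
        # and it also ensures packets themselves are never compared.
        self._seq = itertools.count()

    def push(self, rank, packet):
        """Insert a packet at the position given by its rank (lower leaves first)."""
        bisect.insort(self._entries, (rank, next(self._seq), packet))

    def pop(self):
        """Remove and return the packet at the head; raises IndexError if empty."""
        _rank, _seq, packet = self._entries.pop(0)
        return packet

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    q = PIFO()
    # The scheduling policy lives entirely in how ranks are computed;
    # here the rank is just a hypothetical priority level.
    for rank, name in [(2, "a"), (1, "b"), (2, "c"), (0, "d")]:
        q.push(rank, name)
    print([q.pop() for _ in range(len(q))])
```

Different scheduling policies correspond to different rank computations at enqueue time, while the dequeue side stays fixed.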

Impact:

Domino's programming model led to several changes to P4-14, which are now part of P4-16: sequential semantics for actions, support for atomic blocks, and support for conditional operations.
The PIFO project has seen significant follow-on work in the academic community, including programmable scheduling designs for FPGAs, NICs, and host stacks, as well as approximations of programmable scheduling using priority and first-in first-out queues.
The PIFO project led to the formation of a P4 working group on programmable scheduling.
Domino's instruction set provided stateful manipulation support for the P4-NetFPGA project.
INT is now an industry-wide standard for network telemetry and has also been used for other purposes such as congestion control.
DC.p4 led to the switch.p4 program, the most widely used P4 program.

Application-centric networked systems

Papers:

WiFi, LTE, or Both? (IMC 2014), a measurement study of application- and transport-layer performance of multi-homed mobile hosts connected simultaneously to WiFi and LTE.
Mahimahi (USENIX ATC 2015), tools to record HTTP resources during a page load and replay the page load under emulated network conditions.
ExCamera (NSDI 2017), a system for low-latency video encoding that parallelizes video encoding across thousands of threads running on AWS Lambda.
WatchTower (MobiSys 2019), a system for accelerating Web page loads using cloud proxies while preserving user privacy.

Impact:

Mahimahi has been used in the evaluations of several networking research papers, is available as a Debian package, and has been used in several networking courses.
ExCamera was an early example of what has now come to be called burst-parallel computing.

Congestion control

Papers:

Model-driven interpretability (SIGCOMM CCR 2021), a technique for interpretable congestion control that proposes a unified Markov model representation for congestion-control algorithms.
Sprout (NSDI 2013), a congestion-control protocol designed for high throughput and low latency over highly variable cellular networks.
Learnability of congestion control (SIGCOMM 2014), an empirical study of the difficulty of learning congestion-control protocols given an imperfect model of the network.
Protocol-Design Contests (SIGCOMM CCR 2014), a classroom contest to design good congestion-control protocols.

Impact:

The protocol design contest has since been run in Stanford's graduate networking course.
Sprout has been used as a comparison baseline by several subsequent congestion-control protocols.