Operator Rebinding for Stream Processing on NUMA Machines

Xiaorui Du, Andrea Piccione, Adriano Pimpini, Stefano Bortoli, Alessandro Pellegrini, and Alois Knoll


Published in: Software: Practice and Experience, 2026

Abstract:
Modern stream processing engines are increasingly deployed on high-core-count servers with Non-Uniform Memory Access (NUMA) architectures, where the cost of inter-socket memory access poses a significant challenge to achieving low latency and high throughput. Existing approaches to operator placement either rely on static assignments that degrade under workload variations or employ dynamic migrations that incur excessive overhead due to blocking synchronization or global barriers. This paper introduces a lock-free, NUMA-aware operator rebinding mechanism that dynamically reallocates operator tasks across threads with minimal disruption. The mechanism uses an autonomic controller to detect imbalance in per-thread queues and enacts rebinding via control messages and atomic updates, ensuring correctness without stalling execution. A two-level policy is proposed, combining NUMA-level partitioning with intra-node thread-level refinements, triggered by latency thresholds. Extensive experiments using a 300-query urban traffic analytics workload demonstrate that the proposed method achieves a non-negligible throughput improvement and lower latency compared to state-of-the-art static and METIS-based approaches. Furthermore, it reduces latency variance by an order of magnitude, illustrating the importance of fine-grained NUMA-aware scheduling in memory-bound stream processing.
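The two-level policy described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `Thread`, `greedy_assign`, and `rebind` are hypothetical, per-operator load stands in for whatever per-thread queue metric the controller observes, and a simple greedy load balancer replaces the paper's actual partitioning criteria. A single latency reading is compared against a threshold. If the threshold is exceeded, operators are first partitioned across NUMA nodes (level 1) and then spread across the threads of each node (level 2). The result is the list of rebinding moves that a controller would then enact.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int   # worker thread id (hypothetical)
    node: int  # NUMA node the thread is assumed to be pinned to


def greedy_assign(loads, bins):
    """Greedy balancing (a stand-in, not the paper's policy):
    the heaviest remaining item goes to the currently least-loaded bin."""
    totals = {b: 0.0 for b in bins}
    placement = {}
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(bins, key=lambda n: (totals[n], n))
        placement[item] = target
        totals[target] += load
    return placement


def rebind(op_load, binding, threads, latency_ms, threshold_ms):
    """Return (operator, old_tid, new_tid) moves, or [] if latency is acceptable.

    op_load: operator -> observed load (e.g., a queue-length estimate; assumption)
    binding: operator -> thread id it is currently bound to
    """
    if latency_ms <= threshold_ms:
        return []  # trigger not met: keep the current binding
    nodes = sorted({t.node for t in threads})
    # Level 1: partition operators across NUMA nodes.
    op_node = greedy_assign(op_load, nodes)
    # Level 2: within each node, refine placement across that node's threads.
    new_binding = {}
    for node in nodes:
        local_ops = {op: l for op, l in op_load.items() if op_node[op] == node}
        local_tids = [t.tid for t in threads if t.node == node]
        new_binding.update(greedy_assign(local_ops, local_tids))
    # Only operators whose owner changes need a rebinding.
    return [(op, binding[op], new_binding[op])
            for op in sorted(op_load) if binding[op] != new_binding[op]]


if __name__ == "__main__":
    threads = [Thread(0, 0), Thread(1, 0), Thread(2, 1), Thread(3, 1)]
    loads = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
    binding = {"a": 0, "b": 0, "c": 0, "d": 0}  # everything on thread 0
    print(rebind(loads, binding, threads, latency_ms=12.0, threshold_ms=10.0))
    # prints [('b', 0, 2), ('c', 0, 3), ('d', 0, 1)]
```

In the abstract's design, each returned move would be enacted through a control message and an atomic update of the operator's owner, so that execution is not stalled. This sketch only computes the moves. It makes no attempt to model the lock-free handoff.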

BibTeX Entry:

@article{Dux26,
author = {Du, Xiaorui and Piccione, Andrea and Pimpini, Adriano and Bortoli, Stefano and Pellegrini, Alessandro and Knoll, Alois},
title = {Operator Rebinding for Stream Processing on NUMA Machines},
journal = {Software: Practice and Experience},
year = {2026},
issn = {1097-024X},
publisher = {Wiley},
doi = {10.1002/spe.70064},
series = {SPE},
note = {To appear}
}