Umang Mathur
Timestamping through a Data-Structure Lens
Many applications in distributed and concurrent computing crucially rely on logical timestamping. This involves assigning timestamps to different events in the execution of a system so that the timestamps of any two events can be used to infer the causal ordering between them.
I will first argue that timestamping (i.e., the process of computing and assigning timestamps to events) should be viewed from a data structure perspective. This perspective allows us to formulate questions such as ""what is the best data structure for timestamping?"" and paves the way for thinking about the design of new data structures for timestamping. The most popular data structure is the vector clock data structure --- a vector clock stores an integer for each node/thread in a flat array. The two basic operations on vector clocks, namely join (or merge) and copy (or update), employ k integer operations, where k is the number of threads/processes. In applications where timestamps need to be computed at every event, such vector clock updates can be a computational bottleneck, especially when k is large. Next, I will describe tree clocks, a new timestamping data structure as a replacement for vector clocks. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being actually modified, which can often be smaller than k. Hence, the two operations do not suffer the a-priori cost of k integer operations. I will then discuss the application of tree clocks in data race detection for shared memory multi-threaded programs synchronizing over locks. In such programs, the happens-before (HB) partial ordering on events is used to infer the presence of data races dynamically. For computing the HB partial order, tree clocks are optimal, in the sense that no other data structure can lead to a smaller asymptotic running time. Finally, I will present empirical evidence that this data structure can significantly speed up applications such as data race detection. For the theoretically inclined, I will also discuss some future extensions and applications.
I will first argue that timestamping (i.e., the process of computing and assigning timestamps to events) should be viewed from a data structure perspective. This perspective allows us to formulate questions such as ""what is the best data structure for timestamping?"" and paves the way for thinking about the design of new data structures for timestamping. The most popular data structure is the vector clock data structure --- a vector clock stores an integer for each node/thread in a flat array. The two basic operations on vector clocks, namely join (or merge) and copy (or update), employ k integer operations, where k is the number of threads/processes. In applications where timestamps need to be computed at every event, such vector clock updates can be a computational bottleneck, especially when k is large. Next, I will describe tree clocks, a new timestamping data structure as a replacement for vector clocks. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being actually modified, which can often be smaller than k. Hence, the two operations do not suffer the a-priori cost of k integer operations. I will then discuss the application of tree clocks in data race detection for shared memory multi-threaded programs synchronizing over locks. In such programs, the happens-before (HB) partial ordering on events is used to infer the presence of data races dynamically. For computing the HB partial order, tree clocks are optimal, in the sense that no other data structure can lead to a smaller asymptotic running time. Finally, I will present empirical evidence that this data structure can significantly speed up applications such as data race detection. For the theoretically inclined, I will also discuss some future extensions and applications.
back to overview
Watch Recording