Baptiste Lepers
Johnny Cache: the End of DRAM Cache Conflicts in Tiered Main Memory Systems
We demonstrate that hardware management of a tiered memory system offers better performance for many applications than current methods of software management. Hardware management treats the fast memory as a cache on slower memory. The advantages are that caching can be done at cache line granularity and that data appears in fast memory as soon as it is accessed. The potential for cache conflicts has, however, led previous works to conclude these hardware methods generally perform poorly.
In this talk, we show that the use of low-overhead conflict avoidance techniques eliminates conflicts almost entirely and thereby address the above limitation. We explore two techniques. One technique simply tries to avoid conflicts between pages at page allocation time. The other technique relies on sampling memory accesses to avoid specifically conflicts between hot pages, at page allocation time and by dynamic remapping in the case of conflicts between hot pages missed at allocation time.
We have implemented these techniques in the Linux kernel on an Intel Optane machine in a system called JC. We use HPC applications, key value stores and databases to compare JC to the default Linux tiered memory management implementation and to a state-of-the-art software management approach, as exemplified by the HeMem system. Our measurements show that JC outperforms Linux and HeMem by up to 5x. A surprising conclusion of this paper is that a cache can be made to perform close-to-optimally by minimizing conflicts at page allocation time, without any monitoring of hot pages or dynamic page remapping.
In this talk, we show that the use of low-overhead conflict avoidance techniques eliminates conflicts almost entirely and thereby address the above limitation. We explore two techniques. One technique simply tries to avoid conflicts between pages at page allocation time. The other technique relies on sampling memory accesses to avoid specifically conflicts between hot pages, at page allocation time and by dynamic remapping in the case of conflicts between hot pages missed at allocation time.
We have implemented these techniques in the Linux kernel on an Intel Optane machine in a system called JC. We use HPC applications, key value stores and databases to compare JC to the default Linux tiered memory management implementation and to a state-of-the-art software management approach, as exemplified by the HeMem system. Our measurements show that JC outperforms Linux and HeMem by up to 5x. A surprising conclusion of this paper is that a cache can be made to perform close-to-optimally by minimizing conflicts at page allocation time, without any monitoring of hot pages or dynamic page remapping.
back to overview
Watch Recording