Paolo Valente
The BFQ I/O scheduler: a development journey from academia to industry
The BFQ I/O scheduler was born as a scientific paper. We wanted to guarantee, to each entity competing for a given disk, its desired fraction of the bandwidth. The main difficulty in reaching this goal was that the bandwidth of a rotating disk varies, even by orders of magnitude, with the pattern and the position of the I/O. Fortunately, the idea behind BFQ did work.
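The proportional-share idea can be illustrated with a toy weighted-fair-queueing loop: each queue is charged virtual time in proportion to the service it receives divided by its weight, and the scheduler always serves the queue that is furthest behind. This is only a sketch of the general principle, not BFQ's actual engine (B-WF2Q+), and all names here are invented for illustration:

```python
class Queue:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.vtime = 0.0   # virtual time consumed so far
        self.served = 0    # total sectors of service received

def schedule(queues, request_sizes, rounds):
    """Repeatedly dispatch from the queue with the smallest virtual time,
    charging it size/weight: a queue with twice the weight accumulates
    virtual time half as fast, so it receives twice the actual service."""
    for _ in range(rounds):
        q = min(queues, key=lambda q: q.vtime)
        size = request_sizes[q.name]
        q.vtime += size / q.weight
        q.served += size
    return {q.name: q.served for q in queues}
```

With two queues of weights 2 and 1 issuing equal-size requests, the service received converges to a 2:1 ratio, regardless of how large the individual requests are.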
Yet, once we put our first implementation of BFQ into the real world, the real world showed us that much more than a scheduling engine with provable service guarantees was required. First, bandwidth fairness was not enough. Rather, to guarantee high responsiveness when I/O is involved, interactive and real-time tasks needed to be detected and drastically privileged. We added that to BFQ.
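One way to privilege interactive tasks is to detect queues that resume issuing I/O after an idle period (a typical interactive pattern) and temporarily multiply their weight. The sketch below is a deliberately simplified toy heuristic; BFQ's real detection and weight-raising logic is considerably more elaborate, and every constant and name here is a made-up placeholder:

```python
IDLE_THRESHOLD = 100   # hypothetical: idle time (ms) that suggests interactivity
BOOST_FACTOR = 10      # hypothetical: how much to privilege the queue
BOOST_DURATION = 500   # hypothetical: how long (ms) the boost lasts

class TaskQueue:
    def __init__(self, base_weight):
        self.base_weight = base_weight
        self.last_active = None
        self.boost_until = 0

    def on_request(self, now):
        # A queue that wakes up after being idle looks interactive:
        # raise its weight for a limited time window.
        if self.last_active is not None and now - self.last_active >= IDLE_THRESHOLD:
            self.boost_until = now + BOOST_DURATION
        self.last_active = now

    def weight(self, now):
        return self.base_weight * (BOOST_FACTOR if now < self.boost_until else 1)
```

The boosted weight then feeds straight into the proportional-share engine, so a freshly awakened interactive task briefly receives a much larger share of the bandwidth.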
But a second, more general challenge needed to be faced. Modern drives queue I/O internally, and a scheduler has no control over the service order of requests already queued in the drive. So the initial version of BFQ basically dispatched one request at a time, to keep control. But this killed throughput, because modern drives reach high throughput only if filled with many I/O requests. Paradoxically, BFQ was able to give everyone their slice of the cake, with no errors, but it made the cake so small that everyone would have received a larger slice if BFQ had just stepped away! This led to the introduction of a new technique peculiar to BFQ: I/O injection.
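The core of the injection idea can be sketched as follows: prefer requests from the queue currently in service, but when that queue has nothing ready to dispatch, inject requests from other queues rather than leaving the drive's internal queue partly empty. This toy version ignores what real BFQ must handle, such as limiting injection so it does not hurt the in-service queue's latency; the function and variable names are invented for illustration:

```python
def dispatch(in_service, others, drive_free_slots):
    """Fill the drive's internal queue: serve the in-service queue first,
    then inject requests from other queues whenever it runs dry, so the
    drive never sits on empty slots while other work is pending."""
    dispatched = []
    while drive_free_slots > 0:
        if in_service:
            dispatched.append(in_service.pop(0))
        else:
            # In-service queue is dry: inject from the first backlogged queue.
            donor = next((q for q in others if q), None)
            if donor is None:
                break
            dispatched.append(donor.pop(0))
        drive_free_slots -= 1
    return dispatched
```

With this policy, the drive keeps enough in-flight requests to sustain throughput, while the in-service queue still gets first claim on every free slot.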
Eventually, BFQ's karma seems to have reversed: instead of posing new difficulties, the real world apparently rewarded BFQ by choosing it to handle a new generation of rotating devices: multi-actuator drives. On the downside, BFQ's overhead is too high for fast SSDs. These devices are so fast that I/O processing in the kernel has become the actual bottleneck. Yet the next iteration of the research & development loop may be starting, as some new ideas may allow BFQ's benefits to be extended even to very fast devices.