From: Fenggang Wu <wuxx0835(a)umn.edu>
Date: Wednesday, January 31, 2018 at 5:24 PM
To: James Harris <james.r.harris(a)intel.com>
Cc: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Performance Scaling in BlobFS/RocksDB by Multiple I/O Threads
I also have some quick follow-up questions about WRR inline below.
Thank you very much!
Department of Computer Science and Engineering<http://www.cs.umn.edu/>
College of Science and Engineering<http://cse.umn.edu/>
University of Minnesota, Twin Cities<http://www.umn.edu>
On Wed, Jan 31, 2018 at 12:09 PM, Harris, James R
The max IOPS number is per-device, not per-queue. The observed latency for each I/O -
from submission to completion - will be the same whether the 128 I/O are submitted on one
queue or across four queues. Spreading the I/O across four queues instead of one just
means that the device will process I/O from each of the four queues at ¼ of the rate it
would achieve if all of the I/O were submitted on a single queue.
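To put rough numbers on the point above, here is a small C sketch. It assumes the device is saturated, so that average latency follows Little's law (outstanding I/O divided by throughput). The function names and the 400,000 IOPS figure used in the note below are made up for illustration and are not measurements of any device.

```c
#include <assert.h>

/* Average latency in microseconds for a saturated device, from Little's law:
 * latency = outstanding I/O / throughput. The number of queues does not
 * appear in the formula at all. */
double avg_latency_us(unsigned outstanding, double device_iops)
{
    assert(device_iops > 0.0);
    return (double)outstanding * 1e6 / device_iops;
}

/* Rate the device delivers to each queue when the same total load is
 * spread evenly across num_queues queues. */
double per_queue_iops(double device_iops, unsigned num_queues)
{
    assert(num_queues > 0);
    return device_iops / (double)num_queues;
}
```

With 128 outstanding I/O and a hypothetical 400,000 IOPS device, avg_latency_us(128, 400000.0) gives 320 microseconds whether the I/O sit on one queue or on four queues of 32, while per_queue_iops(400000.0, 4) gives 100,000 IOPS per queue.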
For BlobFS, spreading the I/O across multiple NVMe queues would not normally help with
latency. There are NVMe features such as Weighted Round Robin (WRR), which provide
different priorities to different queues. With WRR, multiple NVMe queues could be used to
separate high priority I/O (e.g. WAL writes) from lower priority I/O (e.g. background
compaction I/O). Most NVMe devices today do not support WRR, however, and even on devices
that do, it is still questionable whether WRR alone would be sufficient or if additional
software queuing would be required.
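As a rough illustration of how weighted arbitration separates two classes of I/O, here is a simplified C model. It is a sketch only: the real NVMe arbiter also has an urgent priority class and an arbitration burst setting, and the weights, queue depths, and the function name wrr_arbitrate are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Serve up to `budget` commands from `nq` queues. In each round, queue i may
 * have at most weights[i] commands fetched. pending[] is decremented in place
 * and served[] receives the per-queue totals. */
void wrr_arbitrate(const unsigned weights[], unsigned pending[], size_t nq,
                   unsigned budget, unsigned served[])
{
    assert(weights != NULL && pending != NULL && served != NULL);

    for (size_t i = 0; i < nq; i++) {
        served[i] = 0;
    }

    while (budget > 0) {
        bool progress = false;

        for (size_t i = 0; i < nq && budget > 0; i++) {
            /* Fetch min(weight, pending, remaining budget) from this queue. */
            unsigned burst = weights[i] < pending[i] ? weights[i] : pending[i];
            if (burst > budget) {
                burst = budget;
            }
            pending[i] -= burst;
            served[i] += burst;
            budget -= burst;
            if (burst > 0) {
                progress = true;
            }
        }

        if (!progress) {
            break; /* every queue is empty or has zero weight */
        }
    }
}
```

With weights {3, 1} for a WAL queue and a compaction queue that both have plenty of commands pending, a budget of 40 commands is split 30 to 10. If the WAL queue holds only 2 commands, the compaction queue picks up the remaining budget, so nothing is wasted when the high priority queue is idle.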
Is WRR a feature of device or a driver software feature?
It is both. WRR must first be supported by the device. The AMS field in the CAP
(Controller Capabilities) register specifies if the controller supports WRR. I would
recommend reading the section on Command Arbitration in the NVMe specification for more details.
But the driver also requires changes to support WRR. For example,
spdk_nvme_ctrlr_alloc_io_qpair() takes an spdk_nvme_io_qpair_opts structure where the user
can specify the priority for the allocated queue.
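The device-side check mentioned above can be sketched in C as follows. The code only decodes a raw 64-bit CAP value, using the register layout in the NVMe specification (AMS in bits 18:17). How the value is read from the controller is left out, and the macro and function names are illustrative rather than SPDK identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* CAP.AMS occupies bits 18:17 of the Controller Capabilities register. */
#define CAP_AMS_SHIFT 17
#define CAP_AMS_MASK  0x3u
#define CAP_AMS_WRR   0x1u /* bit 17: Weighted Round Robin with Urgent Priority Class */
#define CAP_AMS_VS    0x2u /* bit 18: vendor specific arbitration */

/* Returns true if the controller advertises WRR arbitration in CAP.AMS. */
bool cap_supports_wrr(uint64_t cap)
{
    uint32_t ams = (uint32_t)((cap >> CAP_AMS_SHIFT) & CAP_AMS_MASK);
    return (ams & CAP_AMS_WRR) != 0;
}
```

Plain round robin is always supported and has no bit of its own, so a CAP value with both AMS bits clear means the controller offers round robin only.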
If it's a device feature: our research lab currently has two 300GB P3700 SSDs. Do they
support WRR?
I do not believe the P3700 supports WRR – but you can check using the SPDK identify tool –
examples/nvme/identify/identify. Look for the section “Arbitration Mechanisms
If it's a software feature: does the SPDK block driver support WRR now? I am guessing the
BlobFS layer does not support WRR, in that there is only one dedicated core/thread/qpair
doing the I/O. Is my understanding right?
The SPDK bdev layer does not support I/O prioritization currently. If it did, BlobFS would
still likely submit I/O from a single thread, but that thread would allocate multiple
qpairs – one for each level of prioritization used.