Hi Fenggang,


See below.

From: Fenggang Wu <wuxx0835@umn.edu>
Date: Wednesday, January 31, 2018 at 5:24 PM
To: James Harris <james.r.harris@intel.com>
Cc: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Performance Scaling in BlobFS/RocksDB by Multiple I/O Threads


Hi Jim,


I also have some quick follow-up questions about WRR inline below.


Thank you very much!



On Wed, Jan 31, 2018 at 12:09 PM, Harris, James R <james.r.harris@intel.com> wrote:

Hi Fenggang,


The max IOPS number is per-device – not per-queue.  The observed latency for each I/O - from submission to completion - will be the same whether the 128 I/O are submitted on one queue or across four queues.  Spreading the I/O across four queues instead of one just means that the device will process I/O from each of the four queues at ¼ the rate it would if all of the I/O were submitted on a single queue.
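
The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the device is saturated, Little's law (average latency = outstanding I/O / throughput) applies, and the 400,000 IOPS figure is a made-up device limit, not a measured P3700 number. The function names are hypothetical.

```python
# Illustrative only: max_iops is an assumed device limit, not a real spec value.

def per_queue_iops(device_max_iops: float, num_queues: int) -> float:
    """Rate at which a saturated device services each of num_queues equal queues."""
    return device_max_iops / num_queues


def avg_latency_us(outstanding_ios: int, iops: float) -> float:
    """Little's law: average latency = outstanding I/O / throughput (in microseconds)."""
    return outstanding_ios * 1e6 / iops


max_iops = 400_000          # assumed per-device limit
total_outstanding = 128     # total I/O in flight, as in the example above

# Case 1: all 128 I/O on a single queue, serviced at the full device rate.
one_queue_latency = avg_latency_us(total_outstanding, max_iops)

# Case 2: 32 I/O on each of four queues, each serviced at 1/4 of the device rate.
four_queue_latency = avg_latency_us(total_outstanding // 4, per_queue_iops(max_iops, 4))

print(one_queue_latency, four_queue_latency)
```

Under these assumptions both cases come out to the same average latency, because the queue depth per queue and the service rate per queue shrink by the same factor.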


For BlobFS, spreading the I/O across multiple NVMe queues would not normally help with latency.  There are NVMe features such as Weighted Round Robin (WRR), which provide different priorities to different queues.  With WRR, multiple NVMe queues could be used to separate high priority I/O (e.g. WAL writes) from lower priority I/O (e.g. background compaction I/O).  Most NVMe devices today do not support WRR, however, and even on devices that do, it’s still questionable whether WRR alone would be sufficient or if additional software queuing would be required.
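
To make the idea concrete, here is a toy Python model of weighted round-robin arbitration between two queues. It is a sketch only: it does not model the NVMe arbitration burst, the urgent priority class, or any real controller behavior, and the queue names, weights, and `weighted_round_robin` function are hypothetical.

```python
from collections import deque


def weighted_round_robin(queues: dict, weights: dict) -> list:
    """Toy arbiter: in each round, fetch up to weights[name] commands from each queue.

    Returns the order in which commands would be fetched.
    """
    if any(weights[name] <= 0 for name in queues):
        raise ValueError("every queue needs a positive weight")

    pending = {name: deque(cmds) for name, cmds in queues.items()}
    order = []
    while any(pending.values()):          # an empty deque is falsy
        for name, backlog in pending.items():
            for _ in range(weights[name]):
                if backlog:
                    order.append(backlog.popleft())
    return order


# Hypothetical workload: WAL writes on a high-priority queue,
# compaction I/O on a low-priority queue.
queues = {
    "high": [f"wal-{i}" for i in range(4)],
    "low": [f"compact-{i}" for i in range(4)],
}
weights = {"high": 3, "low": 1}

order = weighted_round_robin(queues, weights)
print(order)
```

With these weights the high-priority queue drains roughly three times as fast as the low-priority one, but low-priority commands are still interleaved rather than starved.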


Is WRR a feature of the device or a driver software feature?


It is both.  WRR must first be supported by the device.  The AMS field in the CAP (Controller Capabilities) register specifies if the controller supports WRR.  I would recommend reading the section on Command Arbitration in the NVMe specification for additional details.


But the driver also requires changes to support WRR.  For example, spdk_nvme_ctrlr_alloc_io_qpair() takes an spdk_nvme_io_qpair_opts structure where the user can specify the priority for the allocated queue.
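
For the device-side check, the AMS field can be decoded from a raw CAP value as sketched below in Python. In the NVMe specification AMS occupies CAP bits 18:17, where bit 17 indicates Weighted Round Robin with Urgent Priority Class and bit 18 indicates a vendor-specific mechanism; plain round robin is always supported. The `decode_ams` helper and the sample CAP values are made up for illustration and are not read from a real controller.

```python
AMS_SHIFT = 17          # AMS occupies CAP bits 18:17
AMS_MASK = 0x3
AMS_WRR = 0x1           # CAP bit 17: Weighted Round Robin with Urgent Priority Class
AMS_VENDOR = 0x2        # CAP bit 18: vendor-specific arbitration


def decode_ams(cap: int) -> dict:
    """Decode the Arbitration Mechanism Supported field from a 64-bit CAP value."""
    ams = (cap >> AMS_SHIFT) & AMS_MASK
    return {
        "round_robin": True,  # mandatory for all controllers
        "weighted_round_robin": bool(ams & AMS_WRR),
        "vendor_specific": bool(ams & AMS_VENDOR),
    }


# Made-up CAP values for illustration only.
cap_without_wrr = 0x0000_0000_0000_0FFF   # AMS bits clear
cap_with_wrr = cap_without_wrr | (1 << 17)

print(decode_ams(cap_without_wrr))
print(decode_ams(cap_with_wrr))
```

In practice the identify tool already reports this information, so a helper like this is only useful for understanding what that output is derived from.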


If it's a device feature: our research lab currently has two 300GB P3700 SSDs. Do they support WRR?


I do not believe the P3700 supports WRR – but you can check using the SPDK identify tool – examples/nvme/identify/identify.  Look for the section “Arbitration Mechanisms Supported”.

If it's a software feature: Does the SPDK block driver now support WRR? I am guessing the BlobFS layer does not support WRR, in that there is only one dedicated core/thread/qpair doing the I/O. Is my understanding right?


The SPDK bdev layer does not currently support I/O prioritization.  If it did, BlobFS would still likely submit I/O from a single thread, but that thread would allocate multiple qpairs – one for each level of prioritization used.