On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
On Wed, Jan 10, 2018, 20:17 Walker, Benjamin
> On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
> > It appears quite logical to start submission with a check for pending
> > completions, doesn't it? Or check for completions if downstream bdev
> > busy status. That would definitely meet app expectations whatever the
> > pool size is.
> We've considered checking for completions inside the submission path if we
> would otherwise return ENOMEM. So far, we've decided not to go that direction
> for two reasons:
> 1) Even if we do this, there are still cases where we'll return ENOMEM. For
> instance, if there are no completions to reap yet.
While theoretically possible, such a case is problematic to imagine in practice.
The user has a queue depth of 512 available and is submitting I/O in a tight
loop. The submission path through the blobstore and into the NVMe driver probably
takes on the order of 500ns to run. That means you can submit your full queue
depth's worth in about 256us (512 x 500ns). On many NAND SSDs that's well within
P99 latency
expectations for 4KiB I/O, and it gets increasingly likely with larger I/O to
the point where it is almost guaranteed to happen with 128KiB requests. The user
is free to reduce the available queue depth to save memory as well.
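To make the arithmetic above concrete, here is a minimal sketch in C. The 500ns per-submission cost is only the rough estimate given above, not a measurement, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time (in nanoseconds) needed to submit a full queue
 * depth of I/O back to back, assuming a fixed cost per submission.
 */
uint64_t
queue_fill_time_ns(uint32_t queue_depth, uint64_t submit_cost_ns)
{
	/* Widen before multiplying so large inputs cannot overflow 32 bits. */
	return (uint64_t)queue_depth * submit_cost_ns;
}
```

With a queue depth of 512 and 500ns per submission this gives 256000ns, i.e. the 256us quoted above. Halving the queue depth halves the time before the queue can fill.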
> 2) This would result in completion callbacks in response to a submit call.
> Today, the expectations are set that completions are called in response to a
> poll call only.
Feel free to correct me if I'm wrong, but my recollection is that the completion
callback may be called on the submission path in case of an error.
I just checked and for the nvme and bdev libraries an error code will be given
to the user as the return code for the function. The callback will not be called
because the failure is known immediately. For the blobstore library it works the
opposite way - the functions have no return code and instead always call the
user callback. I think this is probably a design mistake on my part. For these
ENOMEM cases, we need to return that to the user as a return code. That makes it
much easier to handle the situation and makes it consistent with the other
libraries.
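As a rough illustration of why a return code is easy to handle, here is a small self-contained sketch in C. Everything in it (mock_dev, mock_submit, mock_poll, submit_all, the tiny QUEUE_DEPTH) is a hypothetical mock, not SPDK API. It also assumes that every outstanding request has finished by the time we poll, which, as noted earlier in this thread, is not guaranteed on real hardware.

```c
#include <errno.h>
#include <stdint.h>

#define QUEUE_DEPTH 4 /* deliberately tiny; stand-in for the real queue depth */

/* Hypothetical mock device: only counts requests, does no real I/O. */
struct mock_dev {
	uint32_t outstanding; /* submitted but not yet reaped */
	uint32_t completed;   /* completions delivered so far */
};

/* Submission never runs callbacks; a full queue is reported as -ENOMEM. */
int
mock_submit(struct mock_dev *dev)
{
	if (dev->outstanding == QUEUE_DEPTH) {
		return -ENOMEM;
	}
	dev->outstanding++;
	return 0;
}

/* Completions are delivered only from the poll call. Returns number reaped. */
uint32_t
mock_poll(struct mock_dev *dev)
{
	uint32_t reaped = dev->outstanding; /* assumption: all have finished */

	dev->completed += reaped;
	dev->outstanding = 0;
	return reaped;
}

/*
 * Submit `total` requests, polling whenever the queue is full.
 * Returns how many times the caller had to poll to make progress.
 */
uint32_t
submit_all(struct mock_dev *dev, uint32_t total)
{
	uint32_t submitted = 0;
	uint32_t polls = 0;

	while (submitted < total) {
		if (mock_submit(dev) == -ENOMEM) {
			mock_poll(dev); /* free up queue slots, then retry */
			polls++;
			continue;
		}
		submitted++;
	}
	return polls;
}
```

The caller sees the failure immediately at the call site and can decide to poll and retry, without a callback firing from inside the submit call.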
The case in question is, apparently, a corner one, as the application must poll
for completions if bdev returns busy status. One cannot run an unlimited-rate
client atop a rate-limited server without a poll enforced at some point.
It might also be helpful to add a parameter to the poll call specifying the
minimum number of completions to reap before returning control to the app, to
deal with deadlocks like this one.
There already is a parameter that limits the number of completions reaped in a
single poll call. Even if you don't specify a limit, the drivers enforce
sensible limits by default.
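A sketch of how such a limit typically behaves, in C. In the NVMe driver the parameter is the max_completions argument of spdk_nvme_qpair_process_completions(), where 0 means the caller sets no limit. The helper name and the default cap of 128 below are assumptions for illustration only, not the driver's actual values.

```c
#include <stdint.h>

/* Hypothetical driver-side default cap; not SPDK's actual number. */
#define DEFAULT_MAX_COMPLETIONS 128

/*
 * Hypothetical helper: how many completions a single poll call reaps,
 * given how many are ready and the caller's limit (0 = no caller limit).
 */
uint32_t
completions_to_reap(uint32_t ready, uint32_t max_completions)
{
	uint32_t limit = max_completions;

	/* No caller limit, or one above the cap: fall back to the default. */
	if (limit == 0 || limit > DEFAULT_MAX_COMPLETIONS) {
		limit = DEFAULT_MAX_COMPLETIONS;
	}
	return ready < limit ? ready : limit;
}
```

Note that this is an upper bound only. A poll call returns once there is nothing left to reap, so it cannot guarantee a minimum number of completions.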
> SPDK mailing list