Maybe it would help (me at least) if you described the complete & exact steps for your
test - both setup of the env & test and command to profile. Can you send that out?
From: Mittal, Rishabh [mailto:firstname.lastname@example.org]
Sent: Wednesday, September 4, 2019 2:45 PM
To: Walker, Benjamin <benjamin.walker(a)intel.com>; Harris, James R
<james.r.harris(a)intel.com>; spdk(a)lists.01.org; Luse, Paul E <paul.e.luse(a)intel.com>
Cc: Chen, Xiaoxi <xiaoxchen(a)ebay.com>; Kadayam, Hari <hkadayam(a)ebay.com>;
Szmyd, Brian <bszmyd(a)ebay.com>
Subject: Re: [SPDK] NBD with SPDK
Yes, I am using a queue depth of 64 with one thread in fio, using AIO. This profiling is for
the entire system. I don't know why the SPDK threads are idle.
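[For reference, a minimal fio invocation matching that description might look like the
following. The device path /dev/nbd0, the runtime, and direct=1 are assumptions for
illustration, not details from this thread; direct=1 is what lets libaio actually sustain
queue depth 64 rather than completing synchronously through the page cache.]

    fio --name=nbd-4k-write --filename=/dev/nbd0 --ioengine=libaio \
        --direct=1 --rw=write --bs=4k --iodepth=64 --numjobs=1 \
        --runtime=60 --time_based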
On 9/4/19, 11:08 AM, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
On Fri, 2019-08-30 at 22:28 +0000, Mittal, Rishabh wrote:
I got the run again. It is with 4k write.
13.16% vhost [.]
6.08% vhost [.]
4.77% vhost [.]
2.85% vhost [.]
You're doing high queue depth for at least 30 seconds while the trace runs,
right? Using fio with the libaio engine on the NBD device is probably the way to
go. Are you limiting the profiling to just the core where the main SPDK process
is pinned? I'm asking because SPDK still appears to be mostly idle, and I
suspect the time is being spent in some other thread (in the kernel). Consider
capturing a profile for the entire system. It will have fio stuff in it, but the
expensive stuff still should generally bubble up to the top.
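[A system-wide capture along those lines might look like this; the 30-second window
follows Ben's suggestion, and the report options are illustrative rather than required:]

    perf record -a -g -- sleep 30
    perf report --no-children --sort comm,symbol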
On 8/29/19, 6:05 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:
I got the profile with first run.
27.91% vhost [.]
12.94% vhost [.]
11.00% vhost [.]
6.15% vhost [.]
4.35% [kernel] [k]
3.91% vhost [.]
3.38% vhost [.]
2.83% [unknown] [k]
1.45% vhost [.]
1.20% [kernel] [k]
1.14% libpthread-2.27.so [.]
1.00% libc-2.27.so [.]
0.99% libc-2.27.so [.] 0x000000000018ef79
On 8/19/19, 7:42 AM, "Luse, Paul E" <paul.e.luse(a)intel.com> wrote:
That's great. Keep an eye out for the items Ben mentions below - at
least the first one should be quick to implement, so you can compare both profile data
and measured performance.
Don't forget about the community meetings either - they're a great place to chat
about these kinds of things.
The next one is tomorrow morning, US time.
From: SPDK [mailto:email@example.com] On Behalf Of Mittal,
Rishabh via SPDK
Sent: Thursday, August 15, 2019 6:50 PM
To: Harris, James R <james.r.harris(a)intel.com>; Walker, Benjamin <benjamin.walker(a)intel.com>
Cc: Mittal, Rishabh <rimittal(a)ebay.com>; Chen, Xiaoxi <xiaoxchen(a)ebay.com>; Szmyd, Brian <bszmyd(a)ebay.com>; Kadayam, Hari <hkadayam(a)ebay.com>
Subject: Re: [SPDK] NBD with SPDK
Thanks. I will get the profiling by next week.
On 8/15/19, 6:26 PM, "Harris, James R" wrote:
On 8/15/19, 4:34 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:
What tool do you use for profiling?
Mostly I just use "perf top".
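[For example, to limit perf top to the core the SPDK reactor is pinned to - core 0 here
is an assumption; substitute your actual core:]

    perf top -C 0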
On 8/14/19, 9:54 AM, "Harris, James R" wrote:
On 8/14/19, 9:18 AM, "Walker, Benjamin" wrote:
When an I/O is performed in the process initiating the I/O to a file, the data goes into the OS page cache buffers at a layer far above the bio stack (somewhere up in VFS). If SPDK were to reserve some memory and hand it off to your kernel driver, your kernel driver would still need to copy it to that location out of the page cache buffers. We can't safely share the page cache buffers with a user space process.
I think Rishabh was suggesting the SPDK reserve the virtual address space only. Then the kernel could map the page cache buffers into that virtual address space. That would not require a data copy, but would require the
I think the profiling data would be really helpful - to quantify how much of the 50us is due to copying the 4KB of data. That can help drive next steps on how to optimize the SPDK NBD module.
As Paul said, I'm skeptical that the memcpy is significant in the overall performance you're measuring. I encourage you to go look at some profiling data and confirm that the memcpy is really showing up. I suspect the overhead is instead primarily in these spots:
1) Dynamic buffer allocation in the SPDK NBD backend. As Paul indicated, the NBD target is dynamically allocating memory for each I/O. The NBD backend wasn't designed to be fast - it was designed to be simple. Pooling would be a lot faster and is something fairly easy to implement (see the pooling sketch after this list).
2) The way SPDK does the syscalls when it implements the NBD backend. Again, the code was designed to be simple, not high performance. It simply calls read() and write() on the socket for each command. There are much higher performance ways of doing this, they're just more complex to implement (see the writev() sketch after this list).
3) The lack of multi-queue support in NBD. Every I/O is funneled through a single sockpair up to user space. That means there is locking going on. I believe this is just a limitation of NBD today - it doesn't plug into the block-mq stuff in the kernel and expose multiple sockpairs. But someone more knowledgeable on the kernel stack would need to take a look.
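[To illustrate the pooling idea from item 1, here is a minimal sketch using SPDK's
mempool API. The names g_nbd_buf_pool, NBD_BUF_SIZE, and NBD_POOL_COUNT are invented
for this example; the real nbd.c integration would look different.]

    #include "spdk/env.h"

    #define NBD_BUF_SIZE   (128 * 1024)  /* assumed max payload per NBD command */
    #define NBD_POOL_COUNT 1024          /* assumed number of preallocated buffers */

    static struct spdk_mempool *g_nbd_buf_pool;

    /* Create the pool once at startup instead of allocating per I/O. */
    static int
    nbd_buf_pool_init(void)
    {
        g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs", NBD_POOL_COUNT,
                                             NBD_BUF_SIZE,
                                             SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                             SPDK_ENV_SOCKET_ID_ANY);
        return g_nbd_buf_pool != NULL ? 0 : -1;
    }

    /* Hot path: constant-time get/put, no general-purpose allocator involved. */
    static void *
    nbd_buf_get(void)
    {
        return spdk_mempool_get(g_nbd_buf_pool);
    }

    static void
    nbd_buf_put(void *buf)
    {
        spdk_mempool_put(g_nbd_buf_pool, buf);
    }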
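[And for item 2, one example of reducing the syscall count is sending the NBD reply
header and the payload with a single writev() instead of two write() calls. This is a
sketch of the general technique under stated assumptions, not the SPDK implementation;
struct nbd_reply comes from <linux/nbd.h>.]

    #include <sys/types.h>
    #include <sys/uio.h>
    #include <linux/nbd.h>

    /* One syscall for header + data. A real implementation must loop on
     * short writes and handle EAGAIN on a nonblocking socket. */
    static ssize_t
    nbd_send_reply(int fd, struct nbd_reply *reply, void *payload, size_t len)
    {
        struct iovec iov[2] = {
            { .iov_base = reply,   .iov_len = sizeof(*reply) },
            { .iov_base = payload, .iov_len = len },
        };

        return writev(fd, iov, len > 0 ? 2 : 1);
    }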
> Couple of things that I am not really sure about in this flow:
> 1. How memory registration is going to work with the RDMA driver.
> 2. What changes are required in spdk memory
>
> Rishabh Mittal