I got the run again. This one is with 4K writes.
13.16% vhost [.] spdk_ring_dequeue
6.08% vhost [.] rte_rdtsc
4.77% vhost [.] spdk_thread_poll
2.85% vhost [.] _spdk_reactor_run
2.43% [kernel] [k] syscall_return_via_sysret
2.17% [kernel] [k] copy_user_enhanced_fast_string
2.05% [kernel] [k] _raw_spin_lock
1.83% vhost [.] _spdk_msg_queue_run_batch
1.56% vhost [.] _spdk_event_queue_run_batch
1.56% [kernel] [k] memcpy_erms
1.39% [kernel] [k] switch_mm_irqs_off
1.33% [kernel] [k] radix_tree_next_chunk
1.17% [kernel] [k] native_queued_spin_lock_slowpath
1.13% [unknown] [k] 0xfffffe000000601b
1.02% [kernel] [k] _raw_spin_lock_irqsave
0.94% [kernel] [k] unix_stream_read_generic
0.92% [kernel] [k] load_new_mm_cr3
0.87% [kernel] [k] _raw_spin_lock_irq
0.83% [kernel] [k] cmpxchg_double_slab.isra.61
0.78% [kernel] [k] mutex_lock
0.78% [kernel] [k] unix_stream_sendmsg
0.77% [kernel] [k] sock_wfree
0.74% [kernel] [k] __schedule
On 8/29/19, 6:05 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:
I got the profile with first run.
27.91% vhost [.] spdk_ring_dequeue
12.94% vhost [.] rte_rdtsc
11.00% vhost [.] spdk_thread_poll
6.15% vhost [.] _spdk_reactor_run
4.35% [kernel] [k] syscall_return_via_sysret
3.91% vhost [.] _spdk_msg_queue_run_batch
3.38% vhost [.] _spdk_event_queue_run_batch
2.83% [unknown] [k] 0xfffffe000000601b
1.45% vhost [.] spdk_thread_get_from_ctx
1.20% [kernel] [k] __fget
1.14% libpthread-2.27.so [.] __libc_read
1.00% libc-2.27.so [.] 0x000000000018ef76
0.99% libc-2.27.so [.] 0x000000000018ef79
On 8/19/19, 7:42 AM, "Luse, Paul E" <paul.e.luse(a)intel.com> wrote:
That's great. Keep an eye out for the items Ben mentions below - at least
the first one should be quick to implement, so you can compare both the profile data and the measured performance.
Don't forget about the community meetings either - they're a great place to chat about
these kinds of things.
The next one is tomorrow morning, US time.
From: SPDK [mailto:email@example.com] On Behalf Of Mittal, Rishabh via
Sent: Thursday, August 15, 2019 6:50 PM
To: Harris, James R <james.r.harris(a)intel.com>; Walker, Benjamin
Cc: Mittal, Rishabh <rimittal(a)ebay.com>; Chen, Xiaoxi
<xiaoxchen(a)ebay.com>; Szmyd, Brian <bszmyd(a)ebay.com>; Kadayam, Hari
Subject: Re: [SPDK] NBD with SPDK
Thanks. I will get the profiling by next week.
On 8/15/19, 6:26 PM, "Harris, James R" <james.r.harris(a)intel.com>
On 8/15/19, 4:34 PM, "Mittal, Rishabh" <rimittal(a)ebay.com>
What tool do you use for profiling?
Mostly I just use "perf top".
On 8/14/19, 9:54 AM, "Harris, James R"
On 8/14/19, 9:18 AM, "Walker, Benjamin"
When an I/O is performed in the process initiating the I/O to a
file, the data
goes into the OS page cache buffers at a layer far above the bio
(somewhere up in VFS). If SPDK were to reserve some memory and
hand it off to
your kernel driver, your kernel driver would still need to copy it
to that location out of the page cache buffers. We can't safely share
the page cache
buffers with a user space process.
I think Rishabh was suggesting that SPDK reserve the virtual address space only.
Then the kernel could map the page cache buffers into that virtual address space.
That would not require a data copy, but would require the mapping to be done on every I/O.
I think the profiling data would be really helpful - to quantify how
much of the 50us
is due to copying the 4KB of data. That can help drive next steps on
how to optimize
the SPDK NBD module.
As Paul said, I'm skeptical that the memcpy is significant in the
performance you're measuring. I encourage you to go look at
some profiling data
and confirm that the memcpy is really showing up. I suspect the time is
instead spent primarily in these spots:
1) Dynamic buffer allocation in the SPDK NBD backend.
As Paul indicated, the NBD target is dynamically allocating memory
for each I/O.
The NBD backend wasn't designed to be fast - it was designed
to be simple.
Pooling would be a lot faster and is something fairly easy to implement
(a rough sketch follows after item 3 below).
2) The way SPDK does the syscalls when it implements the NBD backend.
Again, the code was designed to be simple, not high performance.
It simply calls
read() and write() on the socket for each command. There are much
higher performance ways of doing this, they're just more complex to
implement (a writev()-based sketch also follows below).
3) The lack of multi-queue support in NBD.
Every I/O is funneled through a single sockpair up to user space, so
there is locking going on. I believe this is just a limitation of
NBD today - it
doesn't plug into the block-mq stuff in the kernel and expose multiple
sockpairs. But someone more knowledgeable on the kernel stack
would need to take a look.
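
A minimal sketch of the pooling idea from item 1, assuming spdk_mempool from the
SPDK env library. The pool sizes and the helper names (nbd_buf_pool_init,
nbd_io_get_buf, nbd_io_put_buf) are made up for illustration, not the actual nbd.c code:

#include <errno.h>
#include "spdk/env.h"

#define NBD_BUF_POOL_SIZE  1024          /* buffers pre-allocated at startup (assumed) */
#define NBD_BUF_SIZE       (128 * 1024)  /* large enough for a max-size request (assumed) */

static struct spdk_mempool *g_nbd_buf_pool;

static int
nbd_buf_pool_init(void)
{
	/* One-time allocation; nothing is malloc'd in the I/O path afterwards. */
	g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs", NBD_BUF_POOL_SIZE,
					     NBD_BUF_SIZE,
					     SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
					     SPDK_ENV_SOCKET_ID_ANY);
	return g_nbd_buf_pool ? 0 : -ENOMEM;
}

static void *
nbd_io_get_buf(void)
{
	/* O(1) get from the per-core cache instead of a per-I/O malloc. */
	return spdk_mempool_get(g_nbd_buf_pool);
}

static void
nbd_io_put_buf(void *buf)
{
	spdk_mempool_put(g_nbd_buf_pool, buf);
}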
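
And a rough sketch of one way item 2 could be improved: coalescing the NBD reply
header and the read payload into a single writev() syscall rather than issuing
separate write() calls per command. struct nbd_reply comes from <linux/nbd.h>;
the helper name and the simplified error handling (no short-write loop) are assumptions:

#include <sys/uio.h>
#include <unistd.h>
#include <errno.h>
#include <linux/nbd.h>

static int
nbd_send_reply(int fd, struct nbd_reply *reply, void *payload, size_t payload_len)
{
	struct iovec iov[2] = {
		{ .iov_base = reply,   .iov_len = sizeof(*reply) },
		{ .iov_base = payload, .iov_len = payload_len },
	};
	ssize_t rc;

	/* One syscall for header + data; a real version must handle short writes. */
	rc = writev(fd, iov, payload_len ? 2 : 1);
	if (rc < 0) {
		return -errno;
	}

	return 0;
}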
A couple of things that I am not really sure about in this flow:
1. How memory registration is going to work with the RDMA driver.
2. What changes are required in SPDK memory management.
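
On the second question, one piece that would likely be involved is SPDK's memory map:
externally obtained buffers have to be registered before SPDK's address translation
(and any driver that relies on it, including RDMA-capable transports) can see them.
A hedged sketch using spdk_mem_register()/spdk_mem_unregister(); the 2 MB
alignment/length assumption and the surrounding function names are illustrative only:

#include "spdk/env.h"

#define EXT_BUF_LEN (2 * 1024 * 1024)   /* assume a 2 MB aligned, 2 MB long region */

int
register_external_buffer(void *vaddr)
{
	/* Makes the region visible to SPDK's address translation and the drivers using it. */
	return spdk_mem_register(vaddr, EXT_BUF_LEN);
}

int
unregister_external_buffer(void *vaddr)
{
	return spdk_mem_unregister(vaddr, EXT_BUF_LEN);
}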