Best practices on driver binding for SPDK in production environments
by Lance Hartmann ORACLE
This email to the SPDK list is a follow-on to a brief discussion held during a recent SPDK community meeting (Tue Jun 26 UTC 15:00).
Lifted and edited from the Trello agenda item (https://trello.com/c/U291IBYx/91-best-practices-on-driver-binding-for-spd...):
During development many (most?) people rely on running SPDK's scripts/setup.sh to perform a number of initializations, among them unbinding the Linux kernel nvme driver from the NVMe controllers targeted for use by SPDK and then binding them to either uio_pci_generic or vfio-pci. This script is suitable for development environments, but it is not intended for production systems employing SPDK.
I'd like to confer with my fellow SPDK community members on ideas, suggestions and best practices for handling this driver unbinding/binding. I wrote some udev rules, along with updates to a few other Linux system conf files, to automatically load either the uio_pci_generic or vfio-pci module. I also had to update my initramfs so that when the system comes all the way up, the desired NVMe controllers are already bound to the driver SPDK needs. As a bonus, it should "just work" when a hotplug occurs as well. However, there may be additional considerations I have overlooked, and I'd appreciate input on those. There's also the question of how (and whether) to semi-automate this configuration via some kind of script, how that might vary across Linux distros, and how to decide between uio_pci_generic and vfio-pci.
And, now some details:
1. I performed this on an Oracle Linux (OL) distro. I'm currently unaware which configuration files might differ from distro to distro. Oracle Linux is RedHat-compatible, so I'm confident my implementation should work similarly on RedHat-based systems, but I've yet to delve into other distros like Debian, SuSE, etc.
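For reference, the module-loading half of this is small. A minimal sketch of what I mean, assuming a systemd-based, dracut-using distro like OL (the file name below is just an example, not necessarily what I ended up with):

# /etc/modules-load.d/vfio-pci.conf
# systemd-modules-load reads this at boot and loads the module.
vfio-pci

# Rebuild the initramfs afterwards so the early-boot environment picks up
# the new configuration; depending on the distro you may need to tell
# dracut explicitly to include custom udev rules or modules.
dracut --force

The same idea applies for uio_pci_generic if you go that route instead.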
2. In preparation for writing my own udev rules, I unbound a specific NVMe controller from the Linux nvme driver by hand. Then, in another window, I launched "udevadm monitor -k -p" so that I could observe the usual udev events when an NVMe controller is bound to the nvme driver. On my system, I observed four (4) udev kernel events (output abbreviated/edited to avoid this becoming excessively long):
(Event 1)
KERNEL[382128.187273] add /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0 (nvme)
ACTION=add
DEVNAME=/dev/nvme0
…
SUBSYSTEM=nvme
(Event 2)
KERNEL[382128.244658] bind /devices/pci0000:00/0000:00:02.2/0000:30:00.0 (pci)
ACTION=bind
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0
DRIVER=nvme
…
SUBSYSTEM=pci
(Event 3)
KERNEL[382130.697832] add /devices/virtual/bdi/259:0 (bdi)
ACTION=add
DEVPATH=/devices/virtual/bdi/259:0
...
SUBSYSTEM=bdi
(Event 4)
KERNEL[382130.698192] add /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1 (block)
ACTION=add
DEVNAME=/dev/nvme0n1
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1
DEVTYPE=disk
...
SUBSYSTEM=block
3. My udev rule triggers on (Event 2) above: the bind action. Upon this action, my udev rule appends operations to the special udev RUN variable so that udev essentially mirrors what SPDK's scripts/setup.sh does: unbind from the nvme driver and bind to, in my case, the vfio-pci driver.
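To make the mechanics concrete, here is an untested sketch of the general shape of such a rule and its helper; the file names, script path and BDF are placeholders rather than my actual files:

# /etc/udev/rules.d/99-spdk-rebind.rules (example name)
# When the kernel nvme driver binds to this specific controller, hand the
# device over to vfio-pci via the helper script.
ACTION=="bind", SUBSYSTEM=="pci", DRIVER=="nvme", KERNEL=="0000:40:00.0", RUN+="/usr/local/sbin/spdk-rebind.sh %k"

# /usr/local/sbin/spdk-rebind.sh (example name)
#!/bin/bash
bdf="$1"
# Mirror what scripts/setup.sh does via sysfs: detach from nvme, steer the
# device toward vfio-pci, then ask the PCI core to re-probe it.
echo -n "$bdf" > /sys/bus/pci/drivers/nvme/unbind
echo vfio-pci > "/sys/bus/pci/devices/$bdf/driver_override"
echo -n "$bdf" > /sys/bus/pci/drivers_probe

(driver_override is shown here only because it is per-device; writing the vendor/device ID to the target driver's new_id file is another common approach.)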
4. With my new udev rules in place, I was able to get specific NVMe controllers (selected by bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci. However, I made a couple of observations in the kernel log (dmesg). In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind from nvme and bind to vfio-pci:
[ 35.534279] nvme nvme1: pci function 0000:40:00.0
[ 37.964945] nvme nvme1: failed to mark controller live
[ 37.964947] nvme nvme1: Removing after probe failure status: 0
One theory I have for the above is that my udev RUN rule was invoked while the nvme driver's probe() was still running on this controller, and perhaps the unbind request came in before probe() completed, hence the "nvme1: failed to mark controller live". This has left me wondering whether, instead of triggering on (Event 2) when the bind occurs, I should try to trigger on the "last" udev event, an "add", where the NVMe namespaces are instantiated. Of course, I'd need to know ahead of time how many namespaces exist on that controller so that I'd trigger on the last one. I'm wondering if that would help avoid what looks like a complaint in the middle of probe() for that particular controller. Then again, maybe I can just safely ignore it and not worry about it at all? Thoughts?
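For discussion's sake, here is roughly what I imagine the namespace-triggered variant would look like (untested; file name and helper path are placeholders). It keys off the block-device "add" (Event 4) and matches the parent PCI device:

# /etc/udev/rules.d/99-spdk-rebind-late.rules (example name)
# Fire when the first namespace block device (nvme*n1) appears under the
# target controller. Triggering on the *last* namespace instead, as mused
# above, would require knowing the namespace count up front.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="nvme*n1", SUBSYSTEMS=="pci", KERNELS=="0000:40:00.0", RUN+="/usr/local/sbin/spdk-rebind.sh 0000:40:00.0"

Note this only covers controllers that expose at least one namespace, and I don't know whether deferring the trigger like this truly avoids racing probe() or merely narrows the window - that's exactly the kind of feedback I'm after.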
I discovered another issue during this experimentation that is somewhat tangential to this task, but I’ll write a separate email on that topic.
thanks for any feedback,
--
Lance Hartmann
lance.hartmann(a)oracle.com
1 year, 2 months
Chandler Build Pool Test Failures
by Howell, Seth
Hi all,
There has been a rash of failures on the test pool starting last night. I was able to root-cause the failures to a point in the NVMe-oF shutdown tests. The main substance of the failure is that QAT and the DPDK framework don't always play well with secondary DPDK processes. In the interest of avoiding these failures on future builds, please rebase your changes on the following patch series, which includes the fix of not running bdevperf as a secondary process in the NVMe-oF shutdown tests.
https://review.gerrithub.io/c/spdk/spdk/+/435937/6
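If it helps, one way to do that locally with git (assuming the usual Gerrit change-ref layout; the ref below is derived from the change number and patch set in the link above):

git fetch https://review.gerrithub.io/spdk/spdk refs/changes/37/435937/6
git rebase FETCH_HEAD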
Thanks,
Seth Howell
1 year, 3 months
Topic from last week's community meeting
by Luse, Paul E
Hi Shuhei,
I was out of town last week and missed the meeting but saw on Trello you had the topic below:
"a few idea: log structured data store , data store with compression, and metadata replication of Blobstore"
I'd be pretty interested in working on that with you, or at least hearing more about it. When you get a chance (no hurry), can you please expand a little on how the conversation went and what you're looking at specifically?
Thanks!
Paul
1 year, 3 months
Add py-spdk client for SPDK
by We We
Hi, all
I have submitted the py-spdk code at https://review.gerrithub.io/#/c/379741/. Please take some time to review it; I would be very grateful.
py-spdk is a client that helps upper-level applications communicate with SPDK-based apps (such as nvmf_tgt, vhost, iscsi_tgt, etc.). Should I submit it to a separate repo of its own rather than the SPDK repo? I ask because I think it is a relatively independent kit built on top of SPDK.
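For context, SPDK applications expose a JSON-RPC interface (by default on a Unix domain socket), and py-spdk wraps requests to it. A rough, untested illustration of the kind of exchange a client performs - the socket path and method name below are the defaults as I understand them and may differ by SPDK version:

# Assumes the target was started with the default RPC socket.
printf '{"jsonrpc": "2.0", "method": "get_rpc_methods", "id": 1}\n' | nc -U /var/tmp/spdk.sock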
If you have some thoughts about the py-spdk, please share with me.
Regards,
Helloway
1 year, 3 months
SPDK virtio-vhost-user series
by Stojaczyk, Dariusz
Hi Nikos,
I'm SPDK core maintainer responsible for the vhost library.
I saw your virtio-vhost-user patch series on gerrithub. I know you've
been talking about it on SPDK community meeting over a month ago,
although I was on holiday at that time.
I wanted to give you some background of what is currently going on
around SPDK vhost.
SPDK currently keeps an internal copy of DPDK's rte_vhost with a
couple of storage-specific changes. We tried to upstream those
changes to DPDK, but they were rejected [1]. Although they were
critical to support vhost-scsi and vhost-blk, they also altered how
vhost-net operated, and that was DPDK's major concern. We kept the
internal rte_vhost copy but still haven't decided whether to try to
switch to DPDK's version or to diverge from DPDK completely and
maintain our own vhost library. At one point we also put together a
list of rte_vhost issues - one of which was a vhost-user specification
non-compliance that eventually made our vhost-scsi unusable with QEMU
2.12+. The amount of "fixes" that rte_vhost required was huge.
Instead, we tried to create a new, even lower-level vhost library in
DPDK [2]. The initial API proposal was warmly welcomed [3], but a few
months later, after a PoC implementation was ready, the whole library
was rejected as well [4]. (One of the concerns the new library would
address was creating an abstraction and environment for
virtio-vhost-user, but apparently the DPDK team didn't find that
useful at the time.)
We still have the rte_vhost copy in SPDK and we still haven't decided
on its future strategy, which is why we were so reluctant to review
your patches.
Just last week we seem to have finally made some progress, as a DPDK
patch that would potentially allow SPDK to use DPDK's rte_vhost
directly [5] was approved for DPDK 19.05. Around the end of February I
believe SPDK will try to stop using its rte_vhost copy and switch to
DPDK's rte_vhost with the mentioned patch.
After that happens, I would like to ask you to rebase your patches on
the latest DPDK rte_vhost and resubmit them to DPDK. I can certainly
help with upstreaming vfio no-iommu support in SPDK and am even
willing to implement registering non-2MB-aligned memory, but rte_vhost
changes belong in DPDK.
I'm sorry for the previous lack of transparency in this matter.
D.
[1] https://www.mail-archive.com/dev@dpdk.org/msg91788.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg101943.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg102042.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg104886.html
[5] http://patches.dpdk.org/patch/49921/
1 year, 8 months
Third Annual U.S. Summit!
by Walker, Benjamin
Our 3rd annual U.S. Summit is tentatively scheduled for April 16th and April
17th at the Dolce Hayes Mansion in San Jose, CA (same venue as last year). We
are working to close on the venue and will send a subsequent announcement as
soon as an agreement is in place. The agenda for the two-day event is still to
be determined. Sessions will include keynote talks, what's new, and what's
coming in the future. There will be technical deep dives on exciting topics
including NVMe-oF, Open Channel SSDs, Persistent Memory, and Networking, and
sessions from multiple community members.
At this time, we'd like to solicit talks for the summit. If your project is
using SPDK, PMDK, ISA-L, and/or DPDK and you'd like to give a 30 minute or one
hour technical talk about the cool and innovative ways you're using or
contributing to these projects, please write a short abstract and send it
directly to me (benjamin.walker(a)intel.com) by February 15th. Please include your
name, your contact information, and your company or organization. Space will be
limited.
Last year’s event had 200 attendees and was not only full of great content and
conversation, but was a ton of fun! We hope to see you there!
1 year, 11 months
Re: [SPDK] Strange CI failure
by Harris, James R
Thanks Shahar. For now, you can reply to your own patch on GerritHub with just the word "retrigger" - it will re-run your patch through the test pool. That will get your patch unblocked while Paul looks at the intermittent test failure.
-Jim
On 1/29/19, 8:48 AM, "SPDK on behalf of Luse, Paul E" <spdk-bounces(a)lists.01.org on behalf of paul.e.luse(a)intel.com> wrote:
Thanks! I've got a few hours of meetings coming up, but here's what I see. If you can repro, that'd be great - we can get a GitHub issue up and going. If not, I can look deeper into this later if someone else doesn't jump in by then with an "aha" moment :)
Starting SPDK v19.01-pre / DPDK 18.11.0 initialization...
[ DPDK EAL parameters: identify -c 0x1 -n 1 -m 0 --base-virtaddr=0x200000000000 --file-prefix=spdk0 --proc-type=auto ]
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/spdk0/mp_socket_835807_c029d817e596b
EAL: Probing VFIO support...
EAL: VFIO support initialized
test/nvme/nvme.sh: line 108: 835807 Segmentation fault (core dumped) $rootdir/examples/nvme/identify/identify -i 0
08:50:18 # trap - ERR
08:50:18 # print_backtrace
08:50:18 # [[ ehxBE =~ e ]]
08:50:18 # local shell_options=ehxBE
08:50:18 # set +x
========== Backtrace start: ==========
From: Shahar Salzman [mailto:shahar.salzman@kaminario.com]
Sent: Tuesday, January 29, 2019 8:35 AM
To: Luse, Paul E <paul.e.luse(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: Strange CI failure
https://ci.spdk.io/spdk-jenkins/results/autotest-per-patch/builds/21382/a...
I can copy paste it if you cannot reach the link.
________________________________
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces@lists.01.org>> on behalf of Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse@intel.com>>
Sent: Tuesday, January 29, 2019 5:22 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] Strange CI failure
Can you send a link to the full log?
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Shahar Salzman
Sent: Tuesday, January 29, 2019 8:21 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: [SPDK] Strange CI failure
Hi,
I have encountered a CI failure that has nothing to do with my code.
The reason that I know it has nothing to do with it, is that the change is a gdb macro.
Do we know that this test machine is unstable?
Here is the backtrace:
========== Backtrace start: ==========
in test/nvme/nvme.sh:108 -> main()
...
103 report_test_completion "nightly_nvme_reset"
104 timing_exit reset
105 fi
106
107 timing_enter identify
=> 108 $rootdir/examples/nvme/identify/identify -i 0
109 for bdf in $(iter_pci_class_code 01 08 02); do
110 $rootdir/examples/nvme/identify/identify -r "trtype:PCIe traddr:${bdf}" -i 0
111 done
112 timing_exit identify
113
...
Shahar
1 year, 11 months
Spdk NvmeOverTcp with VPP crashes on perf tool exit
by Ramaraj Pandian
I am trying to run the spdk-perf-tcp tool against an SPDK TCP target with VPP enabled. The test ran successfully, but on exit the target crashes with the following backtrace:
(gdb) bt
#0 0x00007f91e414d1f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f91e414e8e8 in __GI_abort () at abort.c:90
#2 0x000000000060e435 in os_panic () at /home/vpp_18.01.1/build-data/../src/vppinfra/unix-misc.c:176
#3 0x000000000055b63c in debugger () at /home/vpp_18.01.1/build-data/../src/vppinfra/error.c:84
#4 0x000000000055ba43 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x6a0400 "%s:%d (%s) assertion `%s' fails")
at /home/vpp_18.01.1/build-data/../src/vppinfra/error.c:143
#5 0x00000000005984bd in mheap_maybe_unlock (v=0x7f91b8001000) at /home/vpp_18.01.1/build-data/../src/vppinfra/mheap.c:80
#6 0x000000000059a6ec in mheap_put (v=0x7f91b8001000, uoffset=0) at /home/vpp_18.01.1/build-data/../src/vppinfra/mheap.c:862
#7 0x00000000005eae35 in clib_mem_free (p=0x7f91b80145e8) at /home/vpp_18.01.1/build-data/../src/vppinfra/mem.h:186
#8 0x00000000005eb15a in vec_resize_allocate_memory (v=0x7f91b80145ec, length_increment=6, data_bytes=57, header_bytes=4, data_align=4)
at /home/vpp_18.01.1/build-data/../src/vppinfra/vec.c:96
#9 0x000000000056b0d9 in _vec_resize (v=0x7f91b80145ec, length_increment=6, data_bytes=53, header_bytes=0, data_align=0)
at /home/vpp_18.01.1/build-data/../src/vppinfra/vec.h:142
#10 0x000000000056c3dc in va_format (s=0x7f91b80145ec "vppcom_session_close:2175: [447] vpp handle 0x1", fmt=0x6acaf0 "[%d] vpp handle 0x%llx, sid %d: closing session...", va=0x7f91e29fd4d0)
at /home/vpp_18.01.1/build-data/../src/vppinfra/format.c:403
#11 0x000000000055b8f2 in _clib_error (how_to_die=4, function_name=0x6ae7c0 <__FUNCTION__.38260> "vppcom_session_close", line_number=2175, fmt=0x6acaf0 "[%d] vpp handle 0x%llx, sid %d: closing session...")
at /home/vpp_18.01.1/build-data/../src/vppinfra/error.c:127
#12 0x000000000063d3f0 in vppcom_session_close (session_index=6) at /home/vpp_18.01.1/build-data/../src/vcl/vppcom.c:2174
#13 0x0000000000469e9c in spdk_vpp_sock_close (_sock=0x15bb810) at vpp.c:361
#14 0x00000000004aed70 in spdk_sock_close (sock=0x15bb8b8) at sock.c:113
#15 0x000000000047ed83 in spdk_nvmf_tcp_qpair_destroy (tqpair=0x15bb850) at tcp.c:548
#16 0x0000000000484d8c in spdk_nvmf_tcp_close_qpair (qpair=0x15bb850) at tcp.c:2804
#17 0x000000000047d73a in spdk_nvmf_transport_qpair_fini (qpair=0x15bb850) at transport.c:233
#18 0x000000000047bb40 in _spdk_nvmf_qpair_destroy (ctx=0x7f91d403ead0, status=0) at nvmf.c:734
I would really appreciate your help.
Thanks
Ram
1 year, 11 months
Implementing NVM Subsystem Reset (NSSR) in SPDK
by Gill, John
We are trying to implement NSSR using SPDK. With nvme-cli and the default Linux NVMe driver, we can get the reset to occur, and the kernel driver seemingly re-probes the system automatically. We would like to achieve similar behavior when using the SPDK driver, without toggling drivers or performing a host reboot.
I am not sure if anyone has had similar needs. If so, I would appreciate any suggestions on accomplishing this.
1 year, 11 months
Strange CI failure
by Shahar Salzman
Hi,
I have encountered a CI failure that has nothing to do with my code.
The reason that I know it has nothing to do with it, is that the change is a gdb macro.
Do we know that this test machine is unstable?
Here is the backtrace:
========== Backtrace start: ==========
in test/nvme/nvme.sh:108 -> main()
...
103 report_test_completion "nightly_nvme_reset"
104 timing_exit reset
105 fi
106
107 timing_enter identify
=> 108 $rootdir/examples/nvme/identify/identify -i 0
109 for bdf in $(iter_pci_class_code 01 08 02); do
110 $rootdir/examples/nvme/identify/identify -r "trtype:PCIe traddr:${bdf}" -i 0
111 done
112 timing_exit identify
113
...
Shahar
1 year, 11 months