Hi Jim,
Comments below:
On Jul 2, 2018, at 11:54 AM, Harris, James R <james.r.harris(a)intel.com> wrote:
Hi Lance,
I haven’t seen this problem before. Does this system have multiple NVMe devices, and
only one (or a subset) of them has driver_override set?
Correct. However, I’ve worked on other systems with the same configuration (not only the
same model of NVMe controllers, but also installed in the same physical slots) and did
not see this behavior on them. So far it has occurred on only one particular system.
I notice that the two examples you gave were both for PCI BDF 40:00.0 – I assume that
means this was collected on two different systems?
I’m sorry, that was a typo. For the case where driver_override was null, the BDF was
0000:40:00.0. Then, on that same system, the driver_override was “nvme” for BDF
0000:30:00.0.
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci seems to
indicate that something must be writing “nvme” to that file, i.e. it’s not done
internally by the kernel. Maybe our setup.sh script should at least check for this, i.e.
not reset driver_override to (null), but print a warning message if it finds it set to
“nvme”.
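As a rough illustration of the check suggested above, here is a minimal sketch that warns instead of silently resetting the attribute. The function name check_driver_override and its optional sysfs-root argument (present only so it can be exercised against a scratch directory) are hypothetical; this is not code from SPDK’s scripts/setup.sh. It treats both an empty file and the literal string “(null)” as unset, since that is what an unset attribute reads back as on the system in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not taken from SPDK's scripts/setup.sh.
# Usage: check_driver_override <BDF> [sysfs_root]
# Warns on stderr and returns 1 if the device's driver_override holds a
# driver name; returns 0 if it is empty, reads "(null)", or is absent.
# The attribute itself is never modified.
check_driver_override() {
    local bdf=$1
    local sysfs_root=${2:-/sys}
    local attr="$sysfs_root/bus/pci/devices/$bdf/driver_override"
    local value

    # Kernels without driver_override support simply lack the file.
    [ -r "$attr" ] || return 0

    value=$(cat "$attr")
    case "$value" in
        ""|"(null)")
            return 0 ;;
    esac

    echo "WARNING: $bdf has driver_override=\"$value\";" \
         "only a driver named \"$value\" can bind to it" >&2
    return 1
}

# Example: warn before trying to bind a controller to vfio-pci.
# check_driver_override 0000:30:00.0 || echo "skipping 0000:30:00.0"
```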
I agree completely, but so far my find/grep (for the strings “override” and “30:00.0”)
has failed to locate the source; again, neither under my real rootfs’s /etc nor anywhere
in my initramfs. I also checked my kernel command line. The mystery continues. Like you,
I was also considering modifying SPDK’s scripts/setup.sh to inspect driver_override
and at least print some kind of message if its contents are not “(null)”.
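For the search itself, a small hypothetical helper can scan several trees in one pass. One place a find/grep limited to /etc would miss is udev rules shipped by packages, which typically live under /usr/lib/udev/rules.d or /lib/udev/rules.d; udev rules can assign sysfs attributes through ATTR{...}= keys. The directory list below is an assumption about a typical layout, and an initramfs would first have to be unpacked into a scratch directory with the distribution’s own tools before being passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every text file under the given directories
# that mentions "driver_override" or the given string (fixed-string match).
# Usage: find_override_sources <BDF-or-fragment> <dir>...
find_override_sources() {
    local pattern=$1
    local dir
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this distribution.
        [ -d "$dir" ] || continue
        # -r recurse, -l print file names only, -I skip binary files,
        # -F fixed strings (the '.' in a BDF would otherwise match anything).
        grep -rlIF -e driver_override -e "$pattern" "$dir" 2>/dev/null || true
    done
    return 0
}

# Example (directories are assumptions about a typical layout):
# find_override_sources 30:00.0 /etc /usr/lib/udev/rules.d /lib/udev/rules.d
```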
--
Lance Hartmann
lance.hartmann(a)oracle.com
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Lance Hartmann ORACLE <lance.hartmann(a)oracle.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Tuesday, June 26, 2018 at 10:17 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Debugging /sys driver_override affecting driver binding
While experimenting with unbinding NVMe controllers from the Linux nvme driver and
then binding them to vfio-pci for use with SPDK, I encountered unusual behavior with one
of the controllers. For some initially inexplicable reason, one of the NVMe
controllers was unbound from the nvme driver as desired, but it refused to bind to
vfio-pci, whereas all the other NVMe controllers had no trouble at all binding to
vfio-pci. Inspecting the kernel log (dmesg) didn’t help. After a good deal of
debugging I uncovered the culprit: the per-device sysfs attribute driver_override. By
default, all of my NVMe controllers appeared to have that attribute empty/null, e.g.:
> # cat /sys/bus/pci/devices/0000:40:00.0/driver_override
> (null)
However, I discovered that for the NVMe controller that refused to bind to vfio-pci, its
driver_override attribute contained the string “nvme”:
> # cat /sys/bus/pci/devices/0000:40:00.0/driver_override
> nvme
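To spot any other devices in the same state, here is a sketch that walks every PCI device and prints those whose driver_override is set, together with the driver currently bound (if any). The function name list_driver_overrides and its sysfs-root parameter are hypothetical conveniences, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each PCI device whose driver_override is set
# (neither empty nor "(null)"), print "<BDF> <override> <bound driver>".
# Usage: list_driver_overrides [sysfs_root]
list_driver_overrides() {
    local sysfs_root=${1:-/sys}
    local dev value bound
    for dev in "$sysfs_root"/bus/pci/devices/*; do
        # Also skips the unexpanded glob when no devices exist.
        [ -r "$dev/driver_override" ] || continue
        value=$(cat "$dev/driver_override")
        case "$value" in
            ""|"(null)") continue ;;
        esac
        # The "driver" symlink exists only while a driver is bound.
        if [ -L "$dev/driver" ]; then
            bound=$(basename "$(readlink "$dev/driver")")
        else
            bound="(unbound)"
        fi
        printf '%s %s %s\n' "$(basename "$dev")" "$value" "$bound"
    done
}
```

A controller in the situation described here, unbound from nvme but refusing vfio-pci, would show up as a line such as `0000:30:00.0 nvme (unbound)`.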
Per the Linux kernel documentation, ABI/testing/sysfs-bus-pci:
> This file allows the driver for a device to be specified which
> will override standard static and dynamic ID matching. When
> specified, only a driver with a name matching the value written
> to driver_override will have an opportunity to bind to the
> device.
> …
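Based on that description, one way to move the stuck controller to vfio-pci is to overwrite its override, unbind it from its current driver, and ask the PCI core to re-run matching by writing the BDF to /sys/bus/pci/drivers_probe. The same ABI document also notes that writing driver_override neither unbinds the current driver nor loads the named one, and that the override can be cleared by writing an empty string. The helper name rebind_with_override is hypothetical; it has to run as root, with vfio-pci already loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Run as root; load the target driver first
# (e.g. modprobe vfio-pci), because writing driver_override does not load it.
# Usage: rebind_with_override <BDF> <driver> [sysfs_root]
rebind_with_override() {
    local bdf=$1
    local driver=$2
    local sysfs_root=${3:-/sys}
    local dev="$sysfs_root/bus/pci/devices/$bdf"

    # From now on, only a driver named $driver may bind to this device.
    echo "$driver" > "$dev/driver_override" || return 1

    # driver_override does not detach the current driver, so unbind explicitly.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind" || return 1
    fi

    # Ask the PCI core to run driver matching for this device again.
    echo "$bdf" > "$sysfs_root/bus/pci/drivers_probe"
}

# To return the device to standard ID matching instead:
# echo > /sys/bus/pci/devices/0000:30:00.0/driver_override
```

For example, `rebind_with_override 0000:30:00.0 vfio-pci` would replace the stray “nvme” override and hand the controller to vfio-pci.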
Eureka! That explains why one particular NVMe device refused to bind to
vfio-pci. I wanted to share this discovery with other folks in case they run into a
similar issue. Now, the mystery that remains: how and why did this particular NVMe
controller get its driver_override attribute set to “nvme”? It’s not being used as a boot
device, I’ve never attempted to use it with LVM (Linux Logical Volume Management),
and I haven’t built any file systems on it or done anything of the sort. I grep’d through my
real rootfs’s /etc and searched my initramfs as well, but I’ve yet to discover what’s
responsible for setting that particular NVMe controller’s driver_override. Does anyone
have any ideas?
thanks,
--
Lance Hartmann
lance.hartmann(a)oracle.com
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk