Thunderbolt 3 GPU is locking up Linux host
by Roman Babenko
Hello, I have the following problem.
I am running Ubuntu Xenial (kernel 4.13.8) on a laptop (ThinkPad P50) with an
Akitio Node eGPU enclosure together with an RX570 GPU.
After I authorize the Thunderbolt connection, I get two additional PCI
devices registered with the kernel. Then I pass those devices (a GPU
and an HDMI Audio device) through to a Windows 10 KVM VM using
vfio-pci.
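For vfio-pci passthrough it usually matters which IOMMU group each device lands in, since devices sharing a group are generally assigned together. The following is a minimal sketch, assuming the standard sysfs layout under /sys/kernel/iommu_groups; the function name iommu_groups is hypothetical and chosen only for illustration.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to the sorted device names it contains.

    Assumes the usual sysfs layout: <root>/<group>/devices/<pci address>.
    Returns an empty dict if the directory does not exist (for example
    when the IOMMU is disabled).
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    # Group names are numeric on a real system, so sort them as integers.
    for group, devices in sorted(iommu_groups().items(), key=lambda kv: int(kv[0])):
        print(f"group {group}: {', '.join(devices)}")
```

If the GPU and its HDMI audio function share a group with other devices, that may be worth mentioning in a bug report.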
This is a setup with many moving parts; however, it is the only
possibility I see to provide GPU power to a VM on a laptop.
Everything works perfectly for some time, then the VM and the host
lock up. After the VM locks up, the mouse on the host can still be moved
for a couple of seconds with increasing lag, and then the host stops
responding entirely. There are no dmesg errors. While the host is locked
up, I can pull the Thunderbolt cable and the host starts responding again.
dmesg then shows lines like the following:
rtkit-daemon[1145]: The canary thread is apparently starving. Taking action.
rtkit-daemon[1145]: Demoting known real-time threads.
or
sched: RT throttling activated
So apparently something related to Thunderbolt is monopolizing the CPU.
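To collect evidence for a report, the starvation messages quoted above can be filtered out of a saved kernel log. This is only a sketch: the marker strings are copied from the messages in this post, and the names MARKERS and starvation_lines are hypothetical.

```python
# Substrings taken from the log lines quoted in this post.
MARKERS = (
    "The canary thread is apparently starving",
    "Demoting known real-time threads",
    "RT throttling activated",
)


def starvation_lines(log_text):
    """Return the log lines that contain any of the starvation markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]


# Small demonstration on the lines quoted above plus one unrelated line.
SAMPLE = """\
pci_bus 0000:09: Allocating resources
rtkit-daemon[1145]: The canary thread is apparently starving. Taking action.
rtkit-daemon[1145]: Demoting known real-time threads.
sched: RT throttling activated
"""

for hit in starvation_lines(SAMPLE):
    print(hit)
```

In practice the input would be the text of a log captured after the host recovers, for example the saved output of dmesg.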
How can I further debug the problem to the point where I can file a
meaningful bug report against thunderbolt/vfio-pci/qemu?
If this list is not the right one, who should I contact instead?
Thanks a lot in advance, Roman
3 years, 4 months
Thunderbolt not working
by Richard Thornton
Hi,
I had Thunderbolt working on my NUC.
I moved the NVMe drive with the latest Clear Linux installed from my NUC to
a bigger PC, and Thunderbolt isn't working there.
It's an X299 motherboard with an Alpine Ridge add-in card; the BIOS
sees Thunderbolt.
Plugging in a device that used to work on the NUC doesn't load the
thunderbolt module, and loading the module manually just gives me:
pcieport 0000:00:1b.4: Intel SPT PCH root port ACS workaround enabled
Nothing in the /sys/bus/thunderbolt/devices directory.
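A quick way to confirm what the driver has registered is to walk that sysfs directory and read a few attributes. The sketch below assumes the attribute names documented for the in-kernel Thunderbolt driver (device_name and authorized on devices, security on domain entries); the helper names read_attr and thunderbolt_devices are hypothetical.

```python
from pathlib import Path


def read_attr(device_dir, name):
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        return (device_dir / name).read_text().strip()
    except OSError:
        return None


def thunderbolt_devices(root="/sys/bus/thunderbolt/devices"):
    """List entries under the Thunderbolt bus with a few of their attributes.

    Returns an empty list when the directory is missing or empty, which
    matches the situation described in this post.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return [
        {
            "name": dev.name,
            "device_name": read_attr(dev, "device_name"),
            "authorized": read_attr(dev, "authorized"),
            "security": read_attr(dev, "security"),  # only set on domain entries
        }
        for dev in sorted(base.iterdir())
    ]


if __name__ == "__main__":
    devices = thunderbolt_devices()
    if not devices:
        print("no Thunderbolt domains or devices registered")
    for dev in devices:
        print(dev)
```

An empty result even with the module loaded would suggest that the driver did not bind to a controller at all, rather than a device authorization problem.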
Thanks for taking a look.
Richard
3 years, 4 months
Networking with the new userspace
by Joel Wirāmu Pauling
Hi there,
I was previously using Thunderbolt 3 (TB3) networking between two Alpine
Ridge devices (an HP ZBook Studio G3 and a Gigabyte Z270 Gaming 7 desktop
motherboard).
With the 4.11 kernel tree and the old userspace I was
able to get the userspace net device to appear and
then address it with iproute2 tools, bridges, etc.
With the new 4.13 in-kernel codebase and tbtadm I am unable to get the
device chain to recognize the endpoints.
I have tried connecting them directly, and via a TB3 hub (which is my
normal use case: the laptop is connected to the hub, and the desktop to the
hub via a cable). Cable lengths are 30 cm in both cases.
The hub enumerates correctly on the laptop, and I have no security set in
the BIOS on either side.
I have attempted to change the security level to user on both sides without
success.
Desktop side:
aenertia@kiorewha:/vol/8tb/build/qmk_firmware/keyboards$ tbtadm topology
Controller 0
+- Details:
+- Name: GA-Z270X-GAMING 7, GIGABYTE
+- Security level: SL0 (none)
Laptop side:
aenertia@hurarongo:~$ tbtadm topology
Controller 0
+- Details:
|  +- Name: HP ZBook Studio G3, HP Inc.
|  +- Security level: SL0 (none)
|
+- HP Thunderbolt 3 Dock, HP Inc.
   +- Details:
      +- Route-string: 0-3
      +- Authorized: Yes
      +- In ACL: No
      +- UUID: d3010000-0000-8f08-a224-34ca4ed0221
After an unplug-replug on the desktop side in dmesg:
[927250.309988] pci_bus 0000:09: Allocating resources
[1012107.615850] pci 0000:0a:00.0: [8086:15d2] type 00 class 0x088000
[1012107.615879] pci 0000:0a:00.0: reg 0x10: [mem 0xebf00000-0xebf3ffff]
[1012107.615888] pci 0000:0a:00.0: reg 0x14: [mem 0xebf40000-0xebf40fff]
[1012107.616014] pci 0000:0a:00.0: supports D1 D2
[1012107.616015] pci 0000:0a:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[1012107.616090] iommu: Adding device 0000:0a:00.0 to group 18
[1012107.616191] pci_bus 0000:09: Allocating resources
[1012107.616638] thunderbolt 0000:0a:00.0: NHI initialized, starting thunderbolt
[1012107.616640] thunderbolt 0000:0a:00.0: allocating TX ring 0 of size 10
[1012107.616742] thunderbolt 0000:0a:00.0: allocating RX ring 0 of size 10
[1012107.616752] thunderbolt 0000:0a:00.0: control channel created
[1012107.616753] thunderbolt 0000:0a:00.0: control channel starting...
[1012107.616754] thunderbolt 0000:0a:00.0: starting TX ring 0
[1012107.616759] thunderbolt 0000:0a:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1)
[1012107.616760] thunderbolt 0000:0a:00.0: starting RX ring 0
[1012107.616765] thunderbolt 0000:0a:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)
[1012107.731448] thunderbolt 0000:0a:00.0: current switch config:
[1012107.731449] thunderbolt 0000:0a:00.0: Switch: 8086:15d3 (Revision: 6, TB Version: 2)
[1012107.731450] thunderbolt 0000:0a:00.0: Max Port Number: 11
[1012107.731450] thunderbolt 0000:0a:00.0: Config:
[1012107.731451] thunderbolt 0000:0a:00.0: Upstream Port Number: 5 Depth: 0 Route String: 0x0 Enabled: 1, PlugEventsDelay: 254ms
[1012107.731451] thunderbolt 0000:0a:00.0: unknown1: 0x0 unknown4: 0x0
[1012107.742507] thunderbolt 0000:0a:00.0: 0: uid: 0x8086608ac4719700
[1012107.742840] thunderbolt 0000:0a:00.0: Port 0: 8086:15d3 (Revision: 6, TB Version: 1, Type: Port (0x1))
[1012107.742840] thunderbolt 0000:0a:00.0: Max hop id (in/out): 7/7
[1012107.742841] thunderbolt 0000:0a:00.0: Max counters: 8
[1012107.742841] thunderbolt 0000:0a:00.0: NFC Credits: 0x800000
[1012107.742960] thunderbolt 0000:0a:00.0: Port 1: 8086:15d3 (Revision: 6, TB Version: 1, Type: Port (0x1))
[1012107.742961] thunderbolt 0000:0a:00.0: Max hop id (in/out): 15/15
[1012107.742961] thunderbolt 0000:0a:00.0: Max counters: 16
[1012107.742962] thunderbolt 0000:0a:00.0: NFC Credits: 0x3c00000
[1012107.743092] thunderbolt 0000:0a:00.0: Port 2: 8086:15d3 (Revision: 6, TB Version: 1, Type: Port (0x1))
[1012107.743092] thunderbolt 0000:0a:00.0: Max hop id (in/out): 15/15
[1012107.743093] thunderbolt 0000:0a:00.0: Max counters: 16
[1012107.743093] thunderbolt 0000:0a:00.0: NFC Credits: 0x3c00000
[1012107.743094] thunderbolt 0000:0a:00.0: 0:3: disabled by eeprom
[1012107.743094] thunderbolt 0000:0a:00.0: 0:4: disabled by eeprom
[1012107.743094] thunderbolt 0000:0a:00.0: 0:5: disabled by eeprom
[1012107.743125] thunderbolt 0000:0a:00.0: Port 6: 8086:15d3 (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
[1012107.743126] thunderbolt 0000:0a:00.0: Max hop id (in/out): 8/8
[1012107.743126] thunderbolt 0000:0a:00.0: Max counters: 2
[1012107.743127] thunderbolt 0000:0a:00.0: NFC Credits: 0x800000
[1012107.743160] thunderbolt 0000:0a:00.0: Port 7: 8086:15d3 (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
[1012107.743160] thunderbolt 0000:0a:00.0: Max hop id (in/out): 8/8
[1012107.743160] thunderbolt 0000:0a:00.0: Max counters: 2
[1012107.743161] thunderbolt 0000:0a:00.0: NFC Credits: 0x800000
[1012107.743161] thunderbolt 0000:0a:00.0: 0:8: disabled by eeprom
[1012107.743161] thunderbolt 0000:0a:00.0: 0:9: disabled by eeprom
[1012107.743193] thunderbolt 0000:0a:00.0: Port 10: 8086:15d3 (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0101))
[1012107.743193] thunderbolt 0000:0a:00.0: Max hop id (in/out): 9/9
[1012107.743194] thunderbolt 0000:0a:00.0: Max counters: 2
[1012107.743194] thunderbolt 0000:0a:00.0: NFC Credits: 0x1000000
[1012107.743195] thunderbolt 0000:0a:00.0: 0:b: disabled by eeprom
---
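When comparing logs from the two machines, it can help to reduce a dump like the one above to a short port summary. The following is a rough sketch, assuming each kernel message sits on a single line (lines wrapped by a mail client would need to be rejoined first) and that the port number in the "disabled by eeprom" messages is hexadecimal, as "0:b" suggests. The names PORT_RE, DISABLED_RE and summarize_ports are hypothetical.

```python
import re

# Matches e.g. "Port 6: 8086:15d3 (Revision: 6, TB Version: 1, Type: PCIe (0x100101))"
PORT_RE = re.compile(
    r"Port (?P<port>\d+): (?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4}) "
    r"\(Revision: (?P<rev>\d+), TB Version: (?P<tb>\d+), "
    r"Type: (?P<type>.+?) \((?P<code>0x[0-9a-f]+)\)\)"
)
# Matches e.g. "0:b: disabled by eeprom" (route:port, port assumed to be hex)
DISABLED_RE = re.compile(r"\b(?P<route>\d+):(?P<port>[0-9a-f]+): disabled by eeprom")


def summarize_ports(log_text):
    """Return ({port number: type}, [disabled port numbers]) from driver log text."""
    active, disabled = {}, []
    for line in log_text.splitlines():
        match = PORT_RE.search(line)
        if match:
            active[int(match["port"])] = match["type"]
            continue
        match = DISABLED_RE.search(line)
        if match:
            disabled.append(int(match["port"], 16))
    return active, disabled


# A few lines taken from the log above, with wrapped lines rejoined.
SAMPLE = """\
[1012107.742840] thunderbolt 0000:0a:00.0: Port 0: 8086:15d3 (Revision: 6, TB Version: 1, Type: Port (0x1))
[1012107.743094] thunderbolt 0000:0a:00.0: 0:3: disabled by eeprom
[1012107.743125] thunderbolt 0000:0a:00.0: Port 6: 8086:15d3 (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
[1012107.743193] thunderbolt 0000:0a:00.0: Port 10: 8086:15d3 (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0101))
[1012107.743195] thunderbolt 0000:0a:00.0: 0:b: disabled by eeprom
"""

active_ports, disabled_ports = summarize_ports(SAMPLE)
print("active:", active_ports)
print("disabled:", disabled_ports)
```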
Any assistance appreciated.
I am giving a talk on 10 Gbit+ networking in the home at LCA2018 in Sydney,
and currently I only have benchmarks from the old 4.11 codebase for TB3
networking. I would quite like to have a usable set of configs for the
current release.
Kind regards
-Joel
3 years, 4 months