Thunderbolt dock: Linux kernel does not detect disconnect / reconnect correctly
by Klaus Kusche
Hello,
I use a Dell Precision 7740 notebook with a no-name (StarTech) Thunderbolt dock
(for DisplayPort, Ethernet, and several USB devices).
With the pre-installed Ubuntu, everything including the tb dock
works fine in most cases.
I have now tried to run my Gentoo with a self-configured kernel on that notebook.
Basically, everything works, including all devices on the tb dock.
However, there are three problems:
Problem 1:
==========
Tb hotplug does not work at all:
Tracing udev shows that my kernel generates very few "change" or "remove"
udev events, and it also generates very few "add" events on replugging:
KERNEL[256.166684] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)
KERNEL[256.429400] change /devices/platform/USBC000:00/typec/port1 (typec)
KERNEL[256.429414] change /devices/platform/USBC000:00/typec/port1 (typec)
KERNEL[256.429422] remove /devices/platform/USBC000:00/typec/port1/port1-partner (typec)
KERNEL[256.571147] remove /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3/nvm_non_active1 (nvmem)
KERNEL[256.571156] remove /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3/nvm_active1 (nvmem)
KERNEL[256.571163] remove /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3 (thunderbolt)
KERNEL[269.153578] change /devices/platform/USBC000:00/typec/port1 (typec)
KERNEL[269.153604] change /devices/platform/USBC000:00/typec/port1 (typec)
KERNEL[269.153614] add /devices/platform/USBC000:00/typec/port1/port1-partner (typec)
KERNEL[272.552894] add /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3 (thunderbolt)
KERNEL[272.554033] add /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3/nvm_active1 (nvmem)
KERNEL[272.554079] add /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/domain0/0-0/0-3/nvm_non_active1 (nvmem)
KERNEL[273.033059] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)
As a result, after unplugging tb, "lspci" and "lsusb" still show
all devices on tb (although they are no longer present),
and after replugging, none of these devices work,
because they aren't reinitialized or reloaded.
After removing all the tb pci devices by hand via sysfs
and forcing a pci rescan, everything is working fine again.
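
The manual workaround can be scripted roughly as follows. This is only a
minimal sketch in Python, not a fix: it assumes the standard sysfs files
/sys/bus/pci/devices/<address>/remove and /sys/bus/pci/rescan and must run
as root. The function name and the sysfs_root parameter are hypothetical,
and which PCI device has to be removed is an assumption as well.

```python
from pathlib import Path


def remove_and_rescan(pci_address, sysfs_root="/sys"):
    """Remove one PCI device (with everything below it), then rescan.

    sysfs_root exists only so the function can be tried against a
    scratch directory instead of the real /sys.
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    device_dir = pci / "devices" / pci_address
    if not device_dir.is_dir():
        raise FileNotFoundError(f"no such PCI device: {pci_address}")
    # Writing "1" detaches the device and all devices behind it.
    (device_dir / "remove").write_text("1")
    # Writing "1" asks the kernel to re-enumerate all PCI buses.
    (pci / "rescan").write_text("1")
```

A call might look like remove_and_rescan("0000:03:00.0"). That address is
taken from the udev paths above and is only a guess at a suitable bridge.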
The Ubuntu kernel automatically generates almost 100 udev
"unbind", "remove", and "change" events on unplugging,
removing all the tb usb and pci devices
("lsusb" and "lspci" don't show any tb dock devices afterwards),
and about as many udev "add" events on replugging,
which causes all devices on tb to automatically work after replugging.
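
To compare the two kernels more precisely, the monitor output can be tallied
per action and subsystem. The following is only a minimal sketch in Python,
assuming the trace was captured with "udevadm monitor --kernel" and has the
line format shown above. The names EVENT_RE and tally_events are made up
for illustration.

```python
import re
from collections import Counter

# Matches kernel uevent lines such as:
# KERNEL[256.429422] remove /devices/.../port1-partner (typec)
EVENT_RE = re.compile(
    r"^KERNEL\[(?P<time>\d+\.\d+)\]\s+(?P<action>\S+)\s+"
    r"(?P<path>\S+)\s+\((?P<subsystem>[^)]+)\)\s*$"
)


def tally_events(lines):
    """Count events per (action, subsystem); skip lines that are not events."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[(match["action"], match["subsystem"])] += 1
    return counts
```

Running tally_events over a capture from each kernel and comparing the two
counters should show which subsystems stay silent on the self-built kernel.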
Why doesn't my kernel "get the message" for all the devices on the dock
and the usb when tb is disconnected or reconnected?
What is missing in my kernel (drivers?) or in my userland?
The kernel (5.3.6) is configured for pci hotplug, for usb type c, ...
However, in contrast to Ubuntu, everything is built into the kernel image:
no modules, no initrd, not even a kernel module loader.
Tb is configured to "auth none", so auth should not be the problem.
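
One way to narrow this down would be to compare the two kernel
configurations directly. The sketch below is a rough Python illustration,
not a known answer: the option list is an assumption about what might be
relevant on a 5.3 kernel, and the function names are hypothetical. The
inputs would be the text of the two config files (for example Ubuntu's
/boot/config-<version> and the .config of the self-built kernel).

```python
import re

# Assumed (not confirmed) to be relevant for Thunderbolt hotplug.
ASSUMED_OPTIONS = [
    "CONFIG_THUNDERBOLT",
    "CONFIG_HOTPLUG_PCI",
    "CONFIG_HOTPLUG_PCI_PCIE",
    "CONFIG_HOTPLUG_PCI_ACPI",
    "CONFIG_TYPEC",
    "CONFIG_TYPEC_UCSI",
    "CONFIG_UCSI_ACPI",
]


def enabled_options(config_text):
    """Return the set of options set to y or m in a kernel config."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=([ym])$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_options(my_config, reference_config, options=ASSUMED_OPTIONS):
    """List options enabled in the reference config but not in mine."""
    mine = enabled_options(my_config)
    reference = enabled_options(reference_config)
    return [name for name in options if name in reference and name not in mine]
```

Options built as modules in the reference config count as enabled here,
since a kernel without module support would need them built in instead.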
Problem 2:
==========
DisplayPort over tb does not come back on after being turned off and on
by the kernel or X. This happens every time I switch from the text console
to X or back, on every resolution change, and on unplugging and replugging.
The X server thinks that everything is fine and the display is active
(no error messages in X, xrandr shows the DisplayPort output as "on"),
but the kernel logs the following errors:
Oct 17 08:51:31 lap kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery tried 5 times
Oct 17 08:51:31 lap kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
and there is no DP or HDMI signal arriving at the monitor
(monitor says "no input" and suspends itself).
In many cases, the problem can be solved by
"xset dpms force off ; sleep 2 ; xset dpms force on",
i.e. by power-cycling the output in software.
Sometimes turning the monitor off and back on also helps.
In rare cases, only shutting down and power-cycling everything
brings the monitor back to life.
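
The software power cycle mentioned above can be wrapped in a small helper.
This is a minimal sketch in Python that simply runs the same two xset
commands, assuming xset is installed and DISPLAY points at the running
X server. The function names and the run/sleep parameters are hypothetical
and exist only to make the helper easy to try out.

```python
import subprocess
import time


def link_training_failed(log_lines):
    """Return True if the amdgpu clock recovery failure appears in the log."""
    return any(
        "amdgpu_atombios_dp_link_train" in line and "clock recovery failed" in line
        for line in log_lines
    )


def power_cycle_output(delay_seconds=2, run=subprocess.run, sleep=time.sleep):
    """Force DPMS off, wait, then force it on again."""
    run(["xset", "dpms", "force", "off"], check=True)
    sleep(delay_seconds)
    run(["xset", "dpms", "force", "on"], check=True)
```

This only automates the workaround; it does not address why link training
fails in the first place.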
Again, it works (in most cases, not always) in Ubuntu.
Same question: What is my kernel or userland missing?
Problem 3:
==========
Depending on the power on order (dock + usb hub first or notebook first)
and the moment tb is connected to the notebook (before powering the notebook,
while booting, or after starting X), the kernel sometimes "forgets"
to enumerate the tb dock devices (and the usb devices connected to them):
"lspci" shows only the tb controller
(in some rare cases also the pci devices in the dock),
but "lsusb" shows nothing connected to the dock.
DP over tb usually works, but nothing else.
Again, manually removing pci devices and forcing a rescan helps,
and again, Ubuntu does better (in most cases).
Is there any step-by-step guide for configuring and testing
a kernel and system for Thunderbolt by hand, without using one
of the big Linux distributions?
Many thanks in advance for any help!
--
Prof. Dr. Klaus Kusche
Private address: Rosenberg 41, 07546 Gera, Germany
+49 365 20413058 klaus.kusche(a)computerix.info https://www.computerix.info
Office address: DHGE Gera, Weg der Freundschaft 4, 07546 Gera, Germany
+49 365 4341 306 klaus.kusche(a)dhge.de https://www.dhge.de