Hi Darek,
thanks for keeping me in the loop. I was completely unaware of your
effort to sync SPDK rte_vhost with DPDK rte_vhost. I have a couple of
questions for you.
On 31/1/19 1:09 p.m., Stojaczyk, Dariusz wrote:
> Hi Nikos,
> I'm the SPDK core maintainer responsible for the vhost library.
> I saw your virtio-vhost-user patch series on GerritHub. I know you
> talked about it at an SPDK community meeting over a month ago,
> although I was on holiday at that time.
> I wanted to give you some background on what is currently going on
> around SPDK vhost.
> SPDK currently keeps an internal copy of DPDK's rte_vhost with a
> couple of storage-specific changes. We have tried to upstream those
> changes to DPDK, but they were rejected [1].
Yes, I have noticed a lot of differences. What is the point of
upstreaming the storage-specific changes to DPDK? I thought DPDK was
focused on networking.
> Although they were
> critical to support vhost-scsi or vhost-blk, they also altered how
> vhost-net operated, and that was DPDK's major concern. We kept the
> internal rte_vhost copy but still haven't decided whether to try to
> switch to DPDK's version or to diverge from DPDK completely and
> maintain our own vhost library.
Could you shed some light on this? I would like to know more about the
community’s thoughts on this subject. What is it that discourages you
from diverging from DPDK’s rte_vhost? Why should those two projects be
related at all (not generally speaking, just in the case of vhost)?
> At one point we also put together a
> list of rte_vhost issues - one of which was non-compliance with the
> vhost-user specification that eventually made our vhost-scsi unusable
> with QEMU 2.12+. The amount of "fixes" that rte_vhost required was
> huge. Instead, we tried to create a new, even lower-level vhost
> library in DPDK [2].
I will have a closer look at this.
> The initial API proposal was warmly welcomed [3], but a few
> months later, after a PoC implementation was ready, the whole library
> was rejected as well [4]. [One of the concerns the new library would
> address was creating an abstraction and environment for
> virtio-vhost-user, but apparently the DPDK team didn't find that
> useful at the time.]
> We still have the rte_vhost copy in SPDK and we still haven't decided
> on its future strategy, which is why we were so reluctant to review
> your patches.
No worries.
> Just last week we seem to have finally made some progress, as a DPDK
> patch that would potentially allow SPDK to use DPDK's rte_vhost
> directly [5] was approved for DPDK 19.05. Around the end of February I
> believe SPDK will try to stop using its rte_vhost copy and switch to
> DPDK's rte_vhost with the mentioned patch.
That is great news.
What exactly do you mean by “switch to DPDK’s rte_vhost”? How are you
planning to use DPDK's rte_vhost? Are you going to export the DPDK
rte_vhost public API through SPDK env_dpdk? Can you give me a roadmap
for the upcoming patches?
> After that happens, I would like to ask you to rebase your patches on
> the latest DPDK rte_vhost and resubmit them to DPDK.
Sure.
> I can certainly
> help with upstreaming vfio no-IOMMU support in SPDK and am even
> willing to implement registering non-2MB-aligned memory,
That would be great. I think those two changes are orthogonal to
virtio-vhost-user, though. Let me share my thoughts on each.
vfio no-IOMMU support makes perfect sense to me, and it is quite easy
to support. I used it a lot in the Storage Appliance VM, especially in
cases where I wanted an SPDK vhost-scsi device with a virtio-scsi
storage backend (terminating in a QEMU user space SCSI target) attached
as a vhost SCSI LUN. I originally tried to use a vIOMMU, but I think I
hit a QEMU bug. I reported it on the qemu-devel mailing list here:
http://lists.nongnu.org/archive/html/qemu-devel/2018-10/msg05382.html
but I got no answer. So, vfio no-IOMMU was the only way for me to have
virtio-scsi storage backends. And I think that DPDK already supports
vfio no-IOMMU mode.
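For reference, enabling no-IOMMU mode on the host should just be a
matter of a module parameter (assuming the stock vfio module):

  modprobe vfio enable_unsafe_noiommu_mode=1
  # or, if vfio is already loaded:
  echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode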
Speaking of the 2MB alignment restriction, I know that you are
working in this direction:
https://review.gerrithub.io/c/spdk/spdk/+/427816/1
In the case of vhost, the vhost shared memory is 2MB-aligned given that
the VM’s memory is hugepage-backed. But is this restriction necessary?
I think the whole configuration would still work if the VM’s memory
were backed by a tmpfs file (normal 4KB pages), given that we use vfio
with IOMMU support. The vhost memory regions would be mapped by the
SPDK vhost target and then registered with vfio as DMA-able memory via
the VFIO_IOMMU_MAP_DMA ioctl. vfio would take care of making this
memory DMA-able; this basically involves pinning the memory and
updating the device's IOVA domain to grant access to it.
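In code, I imagine something along these lines (an untested sketch;
container_fd is assumed to be an open /dev/vfio/vfio container with the
type1 IOMMU enabled, and region_fd/size/offset/guest_phys_addr would
come from the VHOST_USER_SET_MEM_TABLE message):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Untested sketch: map one vhost-user memory region and register it
 * with vfio so the device can DMA to it, with no hugepage (2MB
 * alignment) requirement on the backing memory. */
static int register_vhost_region(int container_fd, int region_fd,
                                 uint64_t size, uint64_t offset,
                                 uint64_t guest_phys_addr)
{
        struct vfio_iommu_type1_dma_map dma_map;
        void *vaddr;

        /* Map the region into the vhost target's address space. The
         * backing file can be tmpfs (4KB pages) just as well as
         * hugetlbfs. */
        vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     region_fd, offset);
        if (vaddr == MAP_FAILED)
                return -1;

        memset(&dma_map, 0, sizeof(dma_map));
        dma_map.argsz = sizeof(dma_map);
        dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        dma_map.vaddr = (uintptr_t)vaddr;
        /* Use the guest physical address as the IOVA, so the addresses
         * found in the vrings can be used for DMA as-is. */
        dma_map.iova = guest_phys_addr;
        dma_map.size = size;

        /* vfio pins the pages and programs the device's IOVA domain. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
}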
I haven't tried to implement this. I will have a look at the code. If
you have any pointers on this or have made any progress, I would love
to hear about it.
> but rte_vhost changes belong in DPDK.
Yes, but what about the rest of my patches in the vhost library? Beyond
inserting the virtio-vhost-user transport into rte_vhost, this new
transport has to be exposed somehow to the end user. Currently, I am
using a command-line option in the vhost app, but I think the best
approach would be to choose the vhost-user transport from the RPC calls
for the construction of the vhost controller. Anyway, I guess we are
going to see that later.
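For instance, the controller-construction RPC could take an optional
transport argument. A rough sketch against the existing
construct_vhost_scsi_controller RPC; the "transport" parameter below is
hypothetical, purely to illustrate what I have in mind:

{
  "jsonrpc": "2.0",
  "method": "construct_vhost_scsi_controller",
  "id": 1,
  "params": {
    "ctrlr": "vhost.0",
    "transport": "virtio-vhost-user"
  }
}

A controller constructed without the transport parameter would default
to the existing AF_UNIX transport, so current users would be unaffected.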
> I'm sorry for the previous lack of transparency in this matter.
> D.
Thanks again for the outline above. I would appreciate it if you could
keep me in the loop on any progress with rte_vhost. In the meantime, I
am looking forward to any comments on the vfio no-IOMMU related patches.
Nikos