On Mon, Jan 08, 2018 at 11:09:17AM -0700, Jason Gunthorpe wrote:
> As usual we implement what actually has a consumer. On top of that the
> R/W API is the only core RDMA API that actually does DMA mapping for the
> ULP at the moment.
Well again the same can be said for dma_map_page vs dma_map_sg...
I don't understand this comment.
> For SENDs and everything else dma maps are done by the ULP (I'd like
> to eventually change that, though - e.g. sends through that are
> inline to the workqueue don't need a dma map to start with).
> That's because the initial design was to let the ULPs do the DMA
> mappings, which fundamentally is wrong. I've fixed it for the R/W
> API when adding it, but no one has started work on SENDs and atomics.
Well, you know why it is like this, and it is very complicated to
unwind - the HW driver does not have enough information during CQ
processing to properly do any unmaps, let alone serious error tear
down unmaps, so we'd need a bunch of new APIs developed first, like RW
did. :\
Yes, if it was trivial we would have done it already.
> > And on that topic, does this scheme work with HFI?
>
> No, and I guess we need an opt-out. HFI generally seems to be
> extremely weird.
This series needs some kind of fix so HFI, QIB, rxe, etc don't get
broken, and it shouldn't be 'fixed' at the RDMA level.
I don't think rxe is a problem as it won't show up as a PCI device.
HFI and QIB do show as PCI devices, and could be used for P2P transfers
from the PCI point of view. It's just that they have a layer of
software indirection between their hardware and what is exposed at
the RDMA layer.
So I very much disagree about where to place that workaround - the
RDMA code is exactly the right place.
> > This is why P2P must fit in to the common DMA framework somehow, we
> > rely on these abstractions to work properly and fully in RDMA.
>
> Moving P2P up to common RDMA code isn't going to fix this. For that
> we need to stop pretending that something that isn't DMA can abuse the
> dma mapping framework, and until then opt them out of behavior that
> assumes actual DMA like P2P.
It could, if we had a DMA op for p2p then the drivers that provide
their own ops can implement it appropriately or not at all.
Eg the correct implementation for rxe to support p2p memory is
probably somewhat straightforward.
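
To make the "DMA op for p2p" idea concrete, here is a minimal sketch in
plain C. It is not kernel code: struct toy_dma_ops, its members and
toy_try_map_p2p() are all hypothetical names made up for illustration.
The only point is that an optional callback lets a provider with its own
ops either implement P2P mapping or leave it out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table; these names are illustrative, not kernel symbols. */
struct toy_dma_ops {
    uint64_t (*map_page)(uint64_t host_addr);
    /* Optional: NULL means this provider does not support P2P memory. */
    int (*map_p2p)(uint64_t peer_bus_addr, uint64_t *out);
};

/* Identity mapping stands in for a real host-to-bus translation. */
static uint64_t toy_map_page(uint64_t host_addr)
{
    return host_addr;
}

/* A provider that accepts peer bus addresses as they are. */
static int toy_hw_map_p2p(uint64_t peer_bus_addr, uint64_t *out)
{
    *out = peer_bus_addr;
    return 0;
}

/* Provider with P2P support. */
static const struct toy_dma_ops hw_ops = {
    .map_page = toy_map_page,
    .map_p2p  = toy_hw_map_p2p,
};

/* Provider that opts out by leaving the callback unset. */
static const struct toy_dma_ops sw_ops = {
    .map_page = toy_map_page,
    .map_p2p  = NULL,
};

/* Core helper: returns -1 when the provider has no P2P op. */
static int toy_try_map_p2p(const struct toy_dma_ops *ops,
                           uint64_t peer_bus_addr, uint64_t *out)
{
    if (!ops->map_p2p)
        return -1;
    return ops->map_p2p(peer_bus_addr, out);
}
```

With this shape the opt-out is simply the absence of the callback, so the
caller does not need to know which kind of provider it is talking to.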
But P2P is _not_ a factor of the dma_ops implementation at all,
it is something that happens behind the dma_map implementation.
Think about what the dma mapping routines do:
 (a) translate from host addresses to bus addresses, and
 (b) flush caches (on non-coherent architectures)
Both are obviously not needed for P2P transfers, as they never reach
the host.
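
As a rough illustration of the two steps, here is a toy model in plain C.
It is not the kernel DMA API: struct toy_dev, the fixed bus_offset and
both helpers are assumptions made up for this sketch. A host mapping
translates the address and flushes on a non-coherent device, while the
P2P path passes through an address that is already a bus address and
does neither.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not the kernel DMA API. */
struct toy_dev {
    uint64_t bus_offset;     /* assumed fixed host-to-bus offset */
    bool     coherent;       /* true if caches are coherent with DMA */
    unsigned cache_flushes;  /* counts flushes so the effect is visible */
};

/* Host memory: (a) translate host -> bus, (b) flush if non-coherent. */
static uint64_t toy_map_host(struct toy_dev *dev, uint64_t host_addr)
{
    if (!dev->coherent)
        dev->cache_flushes++;
    return host_addr + dev->bus_offset;
}

/* P2P: the input is already a bus address, so neither step applies. */
static uint64_t toy_map_p2p(struct toy_dev *dev, uint64_t peer_bus_addr)
{
    (void)dev;
    return peer_bus_addr;
}
```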
Very long term the IOMMUs under the ops will need to care about this,
so the wrapper is not an optimal place to put it - but I wouldn't
object if it gets it out of RDMA :)
Unless you have an IOMMU on your PCIe switch and not before/inside
the root complex, that is not correct.