> > I am going to give it a few more days for others to review and
> > comment on this and other aspects of the patch series and then I
> > will resubmit.
> I've been mulling this over for a few days and now I believe I see a
> strategy that's going to simplify this a lot. First, a little bit of
> background.
Thanks for taking a look at this, Ben! I really like your approach and will work on a
respin based on it. I do have a couple of comments inline below...
> The first patch in your series alters spdk_mem_register so that it has
> an optional second argument (paddr). But that doesn't make sense - only
> one of the tables knows what a physical address is. What we really want
> to do is change the code so that the notify function in the code
> managing the va to pa table can correctly identify the pa for the given
> va when the va is a BAR. That function is spdk_vtophys_notify() in
> lib/env_dpdk/vtophys.c.
Great! I did feel there had to be a cleaner way to do this.
> Is the correct translation from va to pa for the BAR present in
> /proc/self/pagemap?
No. I *think* /proc/self/pagemap only contains translations for memory that has struct
page backing, and in upstream Linux that is not true for IO memory like PCIe BARs, so they
do not appear in pagemap. I certainly did check this, but if you can confirm you see the
same I'd appreciate it. Note you can do a simple check by mmap()ing any BAR into a
user-space program and dumping its pagemap entry, so you don't need a CMB-enabled NVMe SSD
to test this part.
> There is a facility for iterating the devices already -
> FOREACH_DEVICE_ON_PCIBUS, which iterates over structures
Perfect!
> I think that's the right way to approach this. I know that when you
> call spdk_mem_register, you have the physical address sitting there, so
> on the surface this seems much less efficient. But it keeps our nice
> delineation between the roles and responsibilities of the software
> stack such that spdk_mem_register is entirely unaffected - just the
> code that is in charge of managing specifically the va to pa map is
> modified (vtophys.c). The CMB is also going to be automatically
> registered with RDMA NICs for NVMe-oF.
I like this demarcation, and the fact that it will tie into the fabrics side of things is
great, as I think this is an obvious next step for CMB/PMR-enabled NVMe SSDs.
> I haven't thought as deeply about the rest of the patches, but they
> look fine to me right now. I think this is going to be pretty neat!
OK. I will give it a few more days to get more feedback, but I will incorporate all of
this into my v2, which I will probably start working on next week. Thanks again for all
the useful input. I agree this will be a nice feature to have!
Stephen