Well, no v5.8-rc8 to line this up for v5.9, so next best is early
integration into -mm before other collisions develop.
Chatted with Justin offline and it currently appears that the missing
NUMA information is due to the platform firmware failing to populate all
the necessary NUMA data in the NFIT.
I'm planning on looking at some bits of this series this week, but some
questions upfront ...
The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) to be used as another volatile memory pool, i.e. the
current Memory Tiering hot topic on linux-mm.
In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory. This series provides
a sysfs-mechanism for the daxctl utility to enable provisioning of
volatile-soft-reserved memory ranges.
The motivations for this facility are:
1/ Allow performance differentiated memory ranges to be split between
kernel-managed and directly-accessed use cases.
2/ Allow physical memory to be provisioned along performance relevant
address boundaries. For example, divide a memory-side cache  along
3/ Parcel out soft-reserved memory to VMs using device-dax as a security
/ permissions boundary. Specifically I have seen people (ab)using
memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
device-dax interface on custom address ranges. A follow-on for the VM
use case is to teach device-dax to dynamically allocate 'struct page' at
runtime to reduce the duplication of 'struct page' space in both the
guest and the host kernel for the same physical pages.
I think I am missing some important pieces. Bear with me.
1. On x86-64, e820 indicates "soft-reserved" memory. This memory is not
automatically used in the buddy during boot, but remains untouched
(similar to pmem). But as it involves ACPI as well, it could also be
used on arm64 (-e820), correct?
2. Soft-reserved memory is volatile RAM with differing performance
characteristics ("performance differentiated memory"). What would be
examples of such memory? Like, memory that is faster than RAM (scratch
pad), or slower (pmem)? Or both? :) Is it a valid use case to use pmem
in a hypervisor to back this memory?
3. There seem to be use cases where "soft-reserved" memory is used via
DAX. What is an example use case? I assume it's *not* to treat it like
PMEM but instead e.g., use it as a fast buffer inside applications or
4. There seem to be use cases where some part of "soft-reserved" memory
is used via DAX, some other is given to the buddy. What is an example
use case? Is this really necessary or only some theoretical use case?
5. The "provisioned along performance relevant address boundaries." part
is unclear to me. Can you give an example of what this would look like
from user space? Like, split that memory in blocks of size X with
alignment Y and give them to separate applications?
6. If you add such memory to the buddy, is there any way the system can
differentiate it from other memory? E.g., via fake/other NUMA nodes?
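To make question 5 concrete, here is a minimal sketch of what "blocks of
size X with alignment Y" could mean arithmetically. It only does address
math in user space; it does not touch sysfs or daxctl, and the function
name split_range and the example addresses are made up for illustration,
not taken from the series.

```python
def split_range(start, size, block_size, align):
    """Carve [start, start + size) into blocks of block_size bytes.

    Every block start is rounded up to a multiple of align, which must be
    a power of two. Returns a list of (block_start, block_size) tuples.
    Hypothetical helper for illustration only.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    end = start + size
    # Round the first candidate start up to the requested alignment.
    cur = (start + align - 1) & ~(align - 1)
    blocks = []
    while cur + block_size <= end:
        blocks.append((cur, block_size))
        # Next block begins at the next aligned address past this block.
        cur = (cur + block_size + align - 1) & ~(align - 1)
    return blocks


GIB = 1 << 30
# Example: a 4 GiB range starting at 4 GiB, split into 1 GiB blocks.
example = split_range(4 * GIB, 4 * GIB, GIB, GIB)
```

With an unaligned start, the space before the first aligned address is
simply skipped, so fewer blocks come out than size / block_size suggests.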
Also, can you give examples of how kmem-added memory is represented in
/proc/iomem for a) pmem and b) soft-reserved memory after this series
(skimming over the patches, I think there is a change for pmem, right?)?
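For comparing the two cases, something like the following sketch could be
used to see which resources end up nested under a given /proc/iomem
entry. It assumes the usual "start-end : name" line format with two
spaces of indentation per nesting level. The SAMPLE text is invented for
illustration and is not output from a kernel running this series; the
helper names parse_iomem and children_of are hypothetical as well.

```python
import re

# One /proc/iomem line: optional indentation, hex range, " : ", name.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        # Assumption: two spaces of indentation per nesting level.
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def children_of(entries, parent_name):
    """Names of all resources nested (at any depth) under parent_name."""
    found = []
    parent_depth = None
    for depth, _start, _end, name in entries:
        if parent_depth is not None and depth > parent_depth:
            found.append(name)
            continue
        parent_depth = depth if name == parent_name else None
    return found


# Invented sample, NOT real output from this series.
SAMPLE = """\
00000000-00000fff : Reserved
100000000-13fffffff : Soft Reserved
  100000000-13fffffff : dax0.0
    100000000-13fffffff : System RAM (kmem)
140000000-17fffffff : System RAM
"""

nested = children_of(parse_iomem(SAMPLE), "Soft Reserved")
```

On a live system the text would come from reading /proc/iomem as root;
unprivileged readers see the addresses zeroed out.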
I am really wondering if it's the right approach to squeeze this into
our pmem/nvdimm infrastructure just because it's easy to do. E.g., man
"ndctl" - 'ndctl - Manage "libnvdimm" subsystem devices (Non-volatile
Memory)' speaks explicitly about non-volatile memory.
Dan Williams (19):
x86/numa: Cleanup configuration dependent command-line options
x86/numa: Add 'nohmat' option
efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
resource: Report parent to walk_iomem_res_desc() callback
mm/memory_hotplug: Introduce default phys_to_target_node() implementation
ACPI: HMAT: Attach a device for each soft-reserved range
device-dax: Drop the dax_region.pfn_flags attribute
device-dax: Move instance creation parameters to 'struct dev_dax_data'
device-dax: Make pgmap optional for instance creation
device-dax: Kill dax_kmem_res
device-dax: Add an allocation interface for device-dax instances
device-dax: Introduce 'seed' devices
drivers/base: Make device_find_child_by_name() compatible with sysfs inputs
device-dax: Add resize support
mm/memremap_pages: Convert to 'struct range'
mm/memremap_pages: Support multiple ranges per invocation
device-dax: Add dis-contiguous resource support
device-dax: Introduce 'mapping' devices
Joao Martins (4):
device-dax: Make align a per-device property
device-dax: Add an 'align' attribute
dax/hmem: Introduce dax_hmem.region_idle parameter
device-dax: Add a range mapping allocation attribute
Documentation/x86/x86_64/boot-options.rst | 4
arch/powerpc/kvm/book3s_hv_uvmem.c | 14
arch/x86/include/asm/numa.h | 8
arch/x86/kernel/e820.c | 16
arch/x86/mm/numa.c | 11
arch/x86/mm/numa_emulation.c | 3
arch/x86/xen/enlighten_pv.c | 2
drivers/acpi/numa/hmat.c | 76 --
drivers/acpi/numa/srat.c | 9
drivers/base/core.c | 2
drivers/dax/Kconfig | 4
drivers/dax/Makefile | 3
drivers/dax/bus.c | 1046 +++++++++++++++++++++++++++--
drivers/dax/bus.h | 28 -
drivers/dax/dax-private.h | 60 +-
drivers/dax/device.c | 134 ++--
drivers/dax/hmem.c | 56 --
drivers/dax/hmem/Makefile | 6
drivers/dax/hmem/device.c | 100 +++
drivers/dax/hmem/hmem.c | 65 ++
drivers/dax/kmem.c | 199 +++---
drivers/dax/pmem/compat.c | 2
drivers/dax/pmem/core.c | 22 -
drivers/firmware/efi/x86_fake_mem.c | 12
drivers/gpu/drm/nouveau/nouveau_dmem.c | 15
drivers/nvdimm/badrange.c | 26 -
drivers/nvdimm/claim.c | 13
drivers/nvdimm/nd.h | 3
drivers/nvdimm/pfn_devs.c | 13
drivers/nvdimm/pmem.c | 27 -
drivers/nvdimm/region.c | 21 -
drivers/pci/p2pdma.c | 12
include/acpi/acpi_numa.h | 14
include/linux/dax.h | 8
include/linux/memory_hotplug.h | 5
include/linux/memremap.h | 11
include/linux/numa.h | 11
include/linux/range.h | 6
kernel/resource.c | 11
lib/test_hmm.c | 15
mm/memory_hotplug.c | 10
mm/memremap.c | 299 +++++---
tools/testing/nvdimm/dax-dev.c | 22 -
tools/testing/nvdimm/test/iomap.c | 2
44 files changed, 1825 insertions(+), 601 deletions(-)
delete mode 100644 drivers/dax/hmem.c
create mode 100644 drivers/dax/hmem/Makefile
create mode 100644 drivers/dax/hmem/device.c
create mode 100644 drivers/dax/hmem/hmem.c
David / dhildenb