On Fri, Aug 21, 2020 at 3:15 AM David Hildenbrand <david(a)redhat.com> wrote:
>> 1. On x86-64, e820 indicates "soft-reserved" memory. This memory is
>> not automatically used in the buddy during boot, but remains untouched
>> (similar to pmem). But as it involves ACPI as well, it could also be
>> used on arm64 (-e820), correct?
> Correct, arm64 also gets the EFI support for enumerating memory this
> way. However, I would clarify that whether soft-reserved is given to
> the buddy allocator by default or not is the kernel's policy choice,
> "buddy-by-default" is ok and is what will happen anyways with older
> kernels on platforms that enumerate a memory range this way.
Is "soft-reserved" then the right terminology for that? It sounds very
x86-64/e820 specific. Maybe a compressed form of "performance
differentiated memory" might be a better fit to expose to user space, no?
No. The EFI "Specific Purpose" bit is an attribute independent of
e820, it's x86-Linux that entangles those together. There is no
requirement for platform firmware to use that designation even for
drastic performance differentiation between ranges, and conversely
there is no requirement that memory *with* that designation has any
performance difference compared to the default memory pool. So it
really is a reservation policy about a memory range to keep out of the
buddy allocator by default.
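That reservation policy can be observed (and even simulated without special
firmware) on a running system. A sketch, assuming a Linux kernel with
CONFIG_EFI_FAKE_MEM and the hmem/dax drivers enabled; the address range below
is arbitrary, and EFI_MEMORY_SP is the 0x40000 attribute bit:

```shell
# Simulate a soft-reserved range at boot by tagging 4G of RAM at offset 9G
# with the EFI "Specific Purpose" attribute (kernel command line):
#   efi_fake_mem=4G@9G:0x40000
#
# After boot, soft-reserved ranges appear in the resource tree instead of
# being handed to the buddy allocator:
grep -i "soft reserved" /proc/iomem
#
# They are typically claimed by the hmem driver and surfaced as device-dax:
ls /sys/bus/dax/devices/
```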
> Both, but note that PMEM is already hard-reserved by default.
> Soft-reserved is about a memory range that, for example, an
> administrator may want to reserve 100% for a weather simulation where
> if even a small amount of memory was stolen for the page cache the
> application may not meet its performance targets. It could also be a
> memory range that is so slow that only applications with higher
> latency tolerances would be prepared to consume it.
> In other words the soft-reserved memory can be used to indicate memory
> that is either too precious, or too slow for general purpose OS
> allocations.
Right, so actually performance-differentiated in any way :)
... or not differentiated at all which is Joao's use case for example.
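The exclusive-reservation use case described above can be acted on from
userspace once the range is exposed as a device-dax instance. A sketch,
assuming the daxctl/numactl tools are installed; the device name dax0.0 and
node id 2 are examples, not values from this thread:

```shell
# Option A: keep the range exclusive by having the application mmap()
# /dev/dax0.0 directly, so no page cache or other allocations land there.
#
# Option B: hotplug the soft-reserved range into the buddy allocator as a
# distinct NUMA node, opting in explicitly rather than by default:
daxctl reconfigure-device --mode=system-ram dax0.0
#
# Then bind the latency-sensitive job to that node only (node id is an
# example and depends on the platform's proximity domains):
numactl --membind=2 ./weather_sim
```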
> NUMA node numbers are how performance-differentiated memory ranges
> are enumerated. The expectation is that all distinct performance
> memory targets have unique ACPI proximity domains and Linux numa node
> numbers as a result.
Makes sense to me (although it's somehow weird, because memory of the
same socket/node would be represented via different NUMA nodes), thanks!
Yes, treating numa ids purely as physical socket identifiers is no longer
a reliable assumption since the introduction of the ACPI HMAT.
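For reference, the node / proximity-domain mapping discussed above can be
inspected on a running system. A sketch using sysfs and numactl; node ids
and the presence of HMAT-derived attributes vary by platform and kernel
config:

```shell
# List all NUMA nodes the kernel knows about, including CPU-less
# memory-only targets:
ls -d /sys/devices/system/node/node*
#
# Where the kernel exposes HMAT data, per-node access characteristics are
# available (MB/s and ns, relative to the best-performing initiator):
cat /sys/devices/system/node/node0/access0/initiators/read_bandwidth
#
# numactl summarizes which nodes have CPUs and which are memory-only:
numactl --hardware
```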