[GIT PULL] libnvdimm for v5.8-rc2
by Dan Williams
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.8-rc2
...to receive a feature (papr_scm health retrieval) and a fix (sysfs
attribute visibility) for v5.8.
Vaibhav explains in the merge commit below why missing v5.8 would be
painful and I agreed to try a -rc2 pull because only cosmetics kept
this out of -rc1 and his initial versions were posted in more than
enough time for v5.8 consideration.
===
These patches are tied to specific features that were committed to
customers in upcoming distro releases (RHEL and SLES) whose time-lines
are tied to 5.8 kernel release.
Being able to track the health of an nvdimm is critical for our
customers that are running workloads leveraging papr-scm nvdimms.
Missing the 5.8 kernel would mean missing the distro timelines and
shifting forward the availability of this feature in distro kernels by
at least 6 months.
===
I notice that these do not have an ack from Michael, but I had been
assuming that he was deferring this to a libnvdimm subsystem decision
ever since v7 back at the end of May where he said "I don't have
strong opinions about the user API, it's really up to the nvdimm
folks." [1]
This pull request includes v13 of the papr_scm set, and it looks good to me.
Please consider pulling, I would not normally have broached asking for
this exception, but Vaibhav made sure to have less than 24 hour turn
around on all final review comments.
This has been in -next all week with no reported issues.
[1]: http://lore.kernel.org/r/875zcigafk.fsf@mpe.ellerman.id.au
---
The following changes since commit b3a9e3b9622ae10064826dccb4f7a52bd88c7407:
Linux 5.8-rc1 (2020-06-14 12:45:04 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.8-rc2
for you to fetch changes up to 9df24eaef86f5d5cb38c77eaa1cfa3eec09ebfe8:
Merge branch 'for-5.8/papr_scm' into libnvdimm-for-next (2020-06-19
14:18:51 -0700)
----------------------------------------------------------------
libnvdimm for 5.8-rc2
- Fix the visibility of the region 'align' attribute. The new unit tests
for region alignment handling caught a corner case where the alignment
cannot be specified if the region is converted from static to dynamic
provisioning at runtime.
- Add support for device health retrieval for the persistent memory
supported by the papr_scm driver. This includes both the standard
sysfs "health flags" that the nfit persistent memory driver publishes
and a mechanism for the ndctl tool to retrieve a health-command payload.
----------------------------------------------------------------
Dan Williams (1):
Merge branch 'for-5.8/papr_scm' into libnvdimm-for-next
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Vishal Verma (1):
nvdimm/region: always show the 'align' attribute
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 132 ++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 420 +++++++++++++++++++++++++-
drivers/nvdimm/region_devs.c | 14 +-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
7 files changed, 618 insertions(+), 23 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
Re: Question on PMEM regions (Linux 4.9 Kernel & above)
by Dan Williams
[ add back linux-nvdimm as others may hit the same issue too and I
want this in the archives ]
On Fri, Jun 19, 2020 at 4:49 PM Ananth, Rajesh <Rajesh.Ananth(a)smartm.com> wrote:
>
> Dan,
>
> Thank you so much for your response. Our PLATFORM is totally NFIT compliant and does not use the Type-12/E820 maps.
Ah, great.
>
> We have 2 NVDIMMs interleaved in the same Memory Channel, each 16 GB in size.
>
> This is what the 4.7.9 Kernel reports for "/proc/iomem":
Can you post the output of:
acpidump -n NFIT
...?
Labels can't create new regions, so there must be a behavior
difference in how these kernels are parsing this NFIT.
>
> 00001000-0009afff : System RAM
> 0009b000-0009ffff : reserved
> 000a0000-000bffff : PCI Bus 0000:00
> 000c0000-000c7fff : Video ROM
> 000c4000-000c7fff : PCI Bus 0000:00
> 000c8000-000c8dff : Adapter ROM
> 000c9000-000c9dff : Adapter ROM
> 000e0000-000fffff : reserved
> 000f0000-000fffff : System ROM
> 00100000-6984ffff : System RAM
> 2e000000-2e7f1922 : Kernel code
> 2e7f1923-2ed448ff : Kernel data
> 2eedb000-2f055fff : Kernel bss
> 69850000-6c1f8fff : reserved
> 6b1dd018-6b1dd018 : APEI ERST
> 6b1dd01c-6b1dd021 : APEI ERST
> 6b1dd028-6b1dd039 : APEI ERST
> 6b1dd040-6b1dd04c : APEI ERST
> 6b1dd050-6b1df04f : APEI ERST
> 6c1f9000-6c322fff : System RAM
> 6c323000-6ce83fff : ACPI Non-volatile Storage
> 6ce84000-6f2fcfff : reserved
> 6f2fd000-6f7fffff : System RAM
> fec00000-fec003ff : IOAPIC 0
> fec01000-fec013ff : IOAPIC 1
> fec08000-fec083ff : IOAPIC 2
> fec10000-fec103ff : IOAPIC 3
> fec18000-fec183ff : IOAPIC 4
> fec20000-fec203ff : IOAPIC 5
> fec28000-fec283ff : IOAPIC 6
> fec30000-fec303ff : IOAPIC 7
> fec38000-fec383ff : IOAPIC 8
> fed00000-fed003ff : HPET 0
> fed00000-fed003ff : PNP0103:00
> fed12000-fed1200f : pnp 00:01
> fed12010-fed1201f : pnp 00:01
> fed1b000-fed1bfff : pnp 00:01
> fed20000-fed44fff : reserved
> fed45000-fed8bfff : pnp 00:01
> fee00000-feefffff : pnp 00:01
> fee00000-fee00fff : Local APIC
> ff000000-ffffffff : reserved
> ff000000-ffffffff : pnp 00:01
> 100000000-407fffffff : System RAM
> 4080000000-487fffffff : Persistent Memory <<<<< PERSISTENT MEMORY
> 4080000000-487fffffff : namespace0.0
> 4880000000-887fffffff : System RAM
>
> The same system configuration under 4.16 Kernel (We just rebooted with a new Kernel):
>
> 00001000-0009afff : System RAM
> 0009b000-0009ffff : Reserved
> 000a0000-000bffff : PCI Bus 0000:00
> 000c0000-000c7fff : Video ROM
> 000c4000-000c7fff : PCI Bus 0000:00
> 000c8000-000c8dff : Adapter ROM
> 000c9000-000c9dff : Adapter ROM
> 000e0000-000fffff : Reserved
> 000f0000-000fffff : System ROM
> 00100000-6984ffff : System RAM
> 69850000-6c1f8fff : Reserved
> 6b1dd018-6b1dd018 : APEI ERST
> 6b1dd01c-6b1dd021 : APEI ERST
> 6b1dd028-6b1dd039 : APEI ERST
> 6b1dd040-6b1dd04c : APEI ERST
> 6b1dd050-6b1df04f : APEI ERST
> 6c1f9000-6c322fff : System RAM
> 6c323000-6ce83fff : ACPI Non-volatile Storage
> 6ce84000-6f2fcfff : Reserved
> 6f2fd000-6f7fffff : System RAM
> 6f800000-8fffffff : Reserved
> 80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
> 90000000-9d7fffff : PCI Bus 0000:00
> fec18000-fec183ff : IOAPIC 4
> fec20000-fec203ff : IOAPIC 5
> fec28000-fec283ff : IOAPIC 6
> fec30000-fec303ff : IOAPIC 7
> fec38000-fec383ff : IOAPIC 8
> fed00000-fed003ff : HPET 0
> fed00000-fed003ff : PNP0103:00
> fed12000-fed1200f : pnp 00:01
> fed12010-fed1201f : pnp 00:01
> fed1b000-fed1bfff : pnp 00:01
> fed20000-fed44fff : Reserved
> fed45000-fed8bfff : pnp 00:01
> fee00000-feefffff : pnp 00:01
> fee00000-fee00fff : Local APIC
> ff000000-ffffffff : Reserved
> ff000000-ffffffff : pnp 00:01
> 100000000-407fffffff : System RAM
> 4080000000-487fffffff : Persistent Memory <<< PERSISTENT MEMORY
> 4080000000-447fffffff : namespace0.0
> 4480000000-487fffffff : namespace1.0
> 4880000000-887fffffff : System RAM
> 4d15000000-4d15c031d0 : Kernel code
> 4d15c031d1-4d16387b7f : Kernel data
> 4d1692d000-4d16a82fff : Kernel bss
>
>
> Thanks,
> Rajesh
>
> -----Original Message-----
> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Sent: Friday, June 19, 2020 4:34 PM
> To: Ananth, Rajesh
> Cc: linux-nvdimm(a)lists.01.org
> Subject: Re: Question on PMEM regions (Linux 4.9 Kernel & above)
>
> SMART Modular Security Checkpoint: External email. Please make sure you trust this source before clicking links or opening attachments.
>
> On Fri, Jun 19, 2020 at 4:18 PM Ananth, Rajesh <Rajesh.Ananth(a)smartm.com> wrote:
> >
> > I have a question on the default REGION creation (unlabeled NVDIMM) on the Interleave Sets. I observe that for a Single Interleave Set, the Linux Kernels earlier than 4.9 create only one "Region0->namespace0.0" (pmem0 for the entire size), but in the later Kernels I observe that for the same Interleave Set it creates "Region0->namespace0.0" and "Region1->namespace1.0" by default (pmem0, pmem1 for half the size of the Interleave set).
> >
> > I don't have any explicit labels created using the ndctl utilities. I just plug in the fresh NVDIMM modules like I always do.
> >
> > I searched for and found the relevant information on that front regarding the nd_pmem driver and the support for multiple pmem namespaces. I am wondering whether there is a way I could -- through Kernel Parameters or something -- get the default behavior the same as it existed before the Kernel 4.9 driver changes.
>
> How is your platform BIOS indicating the persistent memory range? I
> suspect you might be using the non-standard Type-12 memory hack and
> are hitting this issue:
>
> 23446cb66c07 x86/e820: Don't merge consecutive E820_PRAM ranges
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit...
>
> For it to show up as one range the BIOS needs to tell Linux that it is
> one coherent range. You can force the kernel to override the BIOS
> provided memory map with the memmap= parameter. Some details of that
> here:
>
> https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_pa...
Re: Question on PMEM regions (Linux 4.9 Kernel & above)
by Dan Williams
On Fri, Jun 19, 2020 at 4:18 PM Ananth, Rajesh <Rajesh.Ananth(a)smartm.com> wrote:
>
> I have a question on the default REGION creation (unlabeled NVDIMM) on the Interleave Sets. I observe that for a Single Interleave Set, the Linux Kernels earlier than 4.9 create only one "Region0->namespace0.0" (pmem0 for the entire size), but in the later Kernels I observe that for the same Interleave Set it creates "Region0->namespace0.0" and "Region1->namespace1.0" by default (pmem0, pmem1 for half the size of the Interleave set).
>
> I don't have any explicit labels created using the ndctl utilities. I just plug in the fresh NVDIMM modules like I always do.
>
> I searched for and found the relevant information on that front regarding the nd_pmem driver and the support for multiple pmem namespaces. I am wondering whether there is a way I could -- through Kernel Parameters or something -- get the default behavior the same as it existed before the Kernel 4.9 driver changes.
How is your platform BIOS indicating the persistent memory range? I
suspect you might be using the non-standard Type-12 memory hack and
are hitting this issue:
23446cb66c07 x86/e820: Don't merge consecutive E820_PRAM ranges
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit...
For it to show up as one range the BIOS needs to tell Linux that it is
one coherent range. You can force the kernel to override the BIOS
provided memory map with the memmap= parameter. Some details of that
here:
https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_pa...
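As a rough illustration of the memmap= override mentioned above, the sketch below formats a "memmap=<size>!<start>" argument from an inclusive physical address range of the kind printed in /proc/iomem. The helper name format_pmem_memmap() is hypothetical. It assumes the "nn!ss" form of memmap= (mark a range as persistent memory) and that the kernel parses hexadecimal values there; check the kernel-parameters documentation for the kernel in use before relying on either assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any kernel or ndctl API): build a
 * "memmap=<size>!<start>" string for an inclusive physical range.
 * Returns 0 on success, -1 on bad input or if the buffer is too small.
 */
static int format_pmem_memmap(uint64_t start, uint64_t end_inclusive,
			      char *buf, size_t len)
{
	uint64_t size;
	int n;

	/* Reject inverted ranges and the one range whose size overflows. */
	if (end_inclusive < start || end_inclusive - start == UINT64_MAX)
		return -1;
	size = end_inclusive - start + 1;

	n = snprintf(buf, len, "memmap=0x%" PRIx64 "!0x%" PRIx64, size, start);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the "Persistent Memory" range 4080000000-487fffffff reported elsewhere in this thread, the helper produces memmap=0x800000000!0x4080000000, that is 32 GiB starting at 258 GiB.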
Question on PMEM regions (Linux 4.9 Kernel & above)
by Ananth, Rajesh
I have a question on the default REGION creation (unlabeled NVDIMM) on the Interleave Sets. I observe that for a Single Interleave Set, the Linux Kernels earlier than 4.9 create only one "Region0->namespace0.0" (pmem0 for the entire size), but in the later Kernels I observe that for the same Interleave Set it creates "Region0->namespace0.0" and "Region1->namespace1.0" by default (pmem0, pmem1 for half the size of the Interleave set).
I don't have any explicit labels created using the ndctl utilities. I just plug in the fresh NVDIMM modules like I always do.
I searched for and found the relevant information on that front regarding the nd_pmem driver and the support for multiple pmem namespaces. I am wondering whether there is a way I could -- through Kernel Parameters or something -- get the default behavior the same as it existed before the Kernel 4.9 driver changes.
Thanks,
Rajesh
[PATCH][next] nvdimm/region: Use struct_size() in kzalloc()
by Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
This issue was found with the help of Coccinelle and, audited and fixed
manually.
Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
---
drivers/nvdimm/region_devs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..8365fb1a9114 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1063,8 +1063,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_blk_region *ndbr;
ndbr_desc = to_blk_region_desc(ndr_desc);
- ndbr = kzalloc(sizeof(*ndbr) + sizeof(struct nd_mapping)
- * ndr_desc->num_mappings,
+ ndbr = kzalloc(struct_size(ndbr, nd_region.mapping, ndr_desc->num_mappings),
GFP_KERNEL);
if (ndbr) {
nd_region = &ndbr->nd_region;
--
2.27.0
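The idea behind the struct_size() conversion above can be sketched in userspace C: compute "header plus N trailing elements" with overflow checks, and saturate to SIZE_MAX so the allocation fails instead of being undersized. This is only an approximation of the kernel helper under stated assumptions: the names flex_array_bytes(), demo_region and demo_mapping are hypothetical, and the overflow checks rely on the GCC/Clang builtins __builtin_mul_overflow() and __builtin_add_overflow().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a struct that ends in a flexible array. */
struct demo_mapping {
	uint64_t start;
	uint64_t size;
};

struct demo_region {
	size_t num_mappings;
	struct demo_mapping mapping[];
};

/* base + elem * count, saturating to SIZE_MAX on arithmetic overflow. */
static size_t flex_array_bytes(size_t base, size_t elem, size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, count, &bytes) ||
	    __builtin_add_overflow(bytes, base, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a zeroed region with room for n mappings, or return NULL. */
static struct demo_region *demo_region_alloc(size_t n)
{
	size_t bytes = flex_array_bytes(sizeof(struct demo_region),
					sizeof(struct demo_mapping), n);
	struct demo_region *r;

	if (bytes == SIZE_MAX)
		return NULL;
	r = calloc(1, bytes);
	if (r)
		r->num_mappings = n;
	return r;
}
```

The open-coded form "sizeof(*p) + sizeof(elem) * n" silently wraps when n is large; the saturating form turns that case into an allocation failure.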
[PATCH v5 00/10] Support new pmem flush and sync instructions for POWER
by Aneesh Kumar K.V
This patch series enables the usage of new pmem flush and sync instructions on the POWER
architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 also introduces phwsync and plwsync which can be used to establish order of these
writes to persistent storage.
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as a variant of the old ones that old hardware won't differentiate.
On POWER10, pmem devices will be represented by a different device tree compat
string. This ensures that older kernels won't initialize pmem devices on POWER10.
W.r.t userspace we want to make sure applications are enabled to use MAP_SYNC only
if they are using the new instructions. To avoid the wrong usage of MAP_SYNC on
newer hardware, we disable MAP_SYNC by default on newer hardware. The namespace specific
attribute /sys/block/pmem0/dax/sync_fault can be used to enable MAP_SYNC later.
With this:
1) vPMEM continues to work since it is a volatile region. That
doesn't need any flush instructions.
2) pmdk and other user applications get updated to use new instructions
and updated packages are made available to all distributions
3) On newer hardware, the device will appear with a new compat string.
Hence older distributions won't initialize pmem on newer hardware.
4) If we have a newer kernel with an older distro, we use the per
namespace sysfs knob that prevents the usage of MAP_SYNC.
5) Sometime in the future, we mark the CONFIG_ARCH_MAP_SYNC_DISABLE=n
on ppc64 when we are confident that everybody is using the new flush
instruction.
Changes from V4:
* Add namespace specific synchronous fault control.
Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.
Aneesh Kumar K.V (10):
powerpc/pmem: Restrict papr_scm to P8 and above.
powerpc/pmem: Add new instructions for persistent storage and sync
powerpc/pmem: Add flush routines using new pmem store and sync
instruction
libnvdimm/nvdimm/flush: Allow architecture to override the flush
barrier
powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
instruction.
powerpc/pmem: Avoid the barrier in flush routines
powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
flush functions.
libnvdimm/dax: Add a dax flag to control synchronous fault support
powerpc/pmem: Disable synchronous fault by default
powerpc/pmem: Initialize pmem device on newer hardware
arch/powerpc/include/asm/cacheflush.h | 10 ++++
arch/powerpc/include/asm/ppc-opcode.h | 12 ++++
arch/powerpc/lib/pmem.c | 46 ++++++++++++--
arch/powerpc/platforms/Kconfig.cputype | 9 +++
arch/powerpc/platforms/pseries/papr_scm.c | 31 +++++++++-
arch/powerpc/platforms/pseries/pmem.c | 6 ++
drivers/dax/bus.c | 2 +-
drivers/dax/super.c | 73 +++++++++++++++++++++++
drivers/md/dm-writecache.c | 2 +-
drivers/nvdimm/of_pmem.c | 8 +++
drivers/nvdimm/pmem.c | 4 ++
drivers/nvdimm/region_devs.c | 24 ++++++--
include/linux/dax.h | 16 +++++
include/linux/libnvdimm.h | 8 +++
mm/Kconfig | 3 +
15 files changed, 243 insertions(+), 11 deletions(-)
--
2.26.2
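From the application side, the MAP_SYNC behavior discussed in this cover letter is typically consumed by asking for a synchronous mapping and falling back when the kernel or device refuses it. The following is a minimal userspace sketch, not code from this series: map_prefer_sync() is a hypothetical helper, and the fallback #define values are assumed to match the generic Linux uapi headers for libcs that do not expose MAP_SYNC and MAP_SHARED_VALIDATE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed values from the generic Linux uapi headers (older libcs). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: try a MAP_SYNC mapping first; if the kernel or
 * the backing device does not support it, fall back to plain MAP_SHARED.
 * *is_sync tells the caller which flush strategy it has to use.
 */
static void *map_prefer_sync(int fd, size_t len, bool *is_sync)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

	if (addr != MAP_FAILED) {
		*is_sync = true;
		return addr;
	}
	/* EOPNOTSUPP: MAP_SYNC refused; EINVAL: flags unknown to the kernel. */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return MAP_FAILED;

	*is_sync = false;
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

When *is_sync comes back false, the application has to persist data with msync() or fsync() rather than with CPU flush instructions alone.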
[RESEND PATCH v2] monitor: Add epoll timeout for forcing a full dimm health check
by Vaibhav Jain
This patch adds a new command argument to the 'monitor' command namely
'--poll' that triggers a call to notify_dimm_event() at regular
intervals forcing a periodic check of status/events for the nvdimm
objects i.e. bus, dimms, regions or namespaces.
This behavior is useful for dimms that do not support event notifications
when the health status of an nvdimm changes. This is especially
true in case of PAPR-SCM nvdimms as the PHYP hypervisor doesn't provide
any notifications to the guest kernel on a change in nvdimm health
status. In such a case periodic polling is the only way to track
the health of an nvdimm.
The patch updates monitor_event(), adding a timeout value to the
epoll_wait() call. Also, to prevent a single dimm from generating
enough events to starve status checks of other nvdimm objects, a
'fullpoll_ts' time-stamp is added to keep track of when the last full
check of all nvdimm objects happened. If, after epoll_wait() returns,
the 'fullpoll_ts' time-stamp indicates that the last full status check
of nvdimm objects happened more than 'poll-interval' seconds ago, then
a full status check is enforced.
Cc: QI Fuli <qi.fuli(a)jp.fujitsu.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
---
Changelog:
Resend
* None
v1..v2
* Changed the '--check-interval' arg to '--poll' [Dan Williams]
* Update the documentation and patch description of the '--poll' arg
to accurately reflect that it can report status/events for
all nvdimm objects. [Dan Williams]
---
Documentation/ndctl/ndctl-monitor.txt | 4 ++++
ndctl/monitor.c | 31 ++++++++++++++++++++++++---
2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/Documentation/ndctl/ndctl-monitor.txt b/Documentation/ndctl/ndctl-monitor.txt
index 2239f047266d..0b6bb5c416c6 100644
--- a/Documentation/ndctl/ndctl-monitor.txt
+++ b/Documentation/ndctl/ndctl-monitor.txt
@@ -108,6 +108,10 @@ will not work if "--daemon" is specified.
The monitor will attempt to enable the alarm control bits for all
specified events.
+-p::
+--poll=::
+ Poll and report status/event every <n> seconds.
+
-u::
--human::
Output monitor notification as human friendly json format instead
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 1755b87a5eeb..4e9b2236ff3c 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -4,6 +4,7 @@
#include <stdio.h>
#include <json-c/json.h>
#include <libgen.h>
+#include <time.h>
#include <dirent.h>
#include <util/json.h>
#include <util/filter.h>
@@ -33,6 +34,7 @@ static struct monitor {
bool daemon;
bool human;
bool verbose;
+ unsigned int poll_timeout;
unsigned int event_flags;
struct log_ctx ctx;
} monitor;
@@ -322,9 +324,14 @@ static int monitor_event(struct ndctl_ctx *ctx,
struct monitor_filter_arg *mfa)
{
struct epoll_event ev, *events;
- int nfds, epollfd, i, rc = 0;
+ int nfds, epollfd, i, rc = 0, polltimeout = -1;
struct monitor_dimm *mdimm;
char buf;
+ /* last time a full poll happened */
+ struct timespec fullpoll_ts, ts;
+
+ if (monitor.poll_timeout)
+ polltimeout = monitor.poll_timeout * 1000;
events = calloc(mfa->num_dimm, sizeof(struct epoll_event));
if (!events) {
@@ -354,14 +361,30 @@ static int monitor_event(struct ndctl_ctx *ctx,
}
}
+ clock_gettime(CLOCK_BOOTTIME, &fullpoll_ts);
while (1) {
did_fail = 0;
- nfds = epoll_wait(epollfd, events, mfa->num_dimm, -1);
- if (nfds <= 0 && errno != EINTR) {
+ nfds = epoll_wait(epollfd, events, mfa->num_dimm, polltimeout);
+ if (nfds < 0 && errno != EINTR) {
err(&monitor, "epoll_wait error: (%s)\n", strerror(errno));
rc = -errno;
goto out;
}
+
+ /* If needed force a full poll of dimm health */
+ clock_gettime(CLOCK_BOOTTIME, &ts);
+ if ((fullpoll_ts.tv_sec - ts.tv_sec) > monitor.poll_timeout) {
+ nfds = 0;
+ dbg(&monitor, "forcing a full poll\n");
+ }
+
+ /* If we timed out then fill events array with all dimms */
+ if (nfds == 0) {
+ list_for_each(&mfa->dimms, mdimm, list)
+ events[nfds++].data.ptr = mdimm;
+ fullpoll_ts = ts;
+ }
+
for (i = 0; i < nfds; i++) {
mdimm = events[i].data.ptr;
if (util_dimm_event_filter(mdimm, monitor.event_flags)) {
@@ -570,6 +593,8 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx)
"use human friendly output formats"),
OPT_BOOLEAN('v', "verbose", &monitor.verbose,
"emit extra debug messages to log"),
+ OPT_UINTEGER('p', "poll", &monitor.poll_timeout,
+ "poll and report events/status every <n> seconds"),
OPT_END(),
};
const char * const u[] = {
--
2.26.2
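The two decisions described in the patch description (which timeout to hand to epoll_wait(), and when a full poll is overdue) can be isolated into small pure functions. The sketch below is illustrative only: poll_timeout_ms() and full_poll_due() are hypothetical names, not functions from ndctl. It computes elapsed time as "now minus last full poll" and treats a zero interval as "polling disabled"; both are choices of this sketch rather than something taken from the diff.

```c
#include <limits.h>
#include <stdbool.h>
#include <time.h>

/*
 * epoll_wait() takes its timeout in milliseconds; -1 blocks until an
 * event arrives. Clamp large intervals so the conversion cannot overflow.
 */
static int poll_timeout_ms(unsigned int poll_seconds)
{
	if (!poll_seconds)
		return -1;
	if (poll_seconds > (unsigned int)(INT_MAX / 1000))
		return INT_MAX;
	return (int)(poll_seconds * 1000u);
}

/*
 * True when at least poll_seconds have passed since the last full poll.
 * Both timestamps are expected to come from the same monotonic clock,
 * for example clock_gettime(CLOCK_BOOTTIME, ...).
 */
static bool full_poll_due(const struct timespec *last_full,
			  const struct timespec *now,
			  unsigned int poll_seconds)
{
	if (!poll_seconds)
		return false;
	return now->tv_sec - last_full->tv_sec >= (time_t)poll_seconds;
}
```

Keeping the elapsed-time check separate from the epoll loop makes it easy to exercise with fixed timestamps.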
[ndctl PATCH v7 0/5] Add support for reporting papr nvdimm health
by Vaibhav Jain
Changes since v6 [1]:
* Removed a stale comment and assignment from 'add_dimm()'.
* Updated patch description for Patch-1,2 based on review comments on
v6 patch-series.
* Updated links to kernel patch series in patch-2.
[1] https://lore.kernel.org/linux-nvdimm/20200616053029.84731-1-vaibhav@linux...
---
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[2]. The standard defines
a mechanism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[3]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.
The patch-set introduces support for the new PAPR PDSM family
defined at [4] & [5] via a new dimm-ops named
'papr_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops members for handling initialization of dimm specific data
for specific DSM families.
These changes coupled with proposed kernel changes located at Ref[1] should
provide a way for the user to retrieve NVDIMM health status using ndctl for
pseries guests. Below is a sample output using proposed kernel + ndctl
changes:
# ndctl list -DH
[
{
"dev":"nmem0",
"flag_smart_event":true,
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions: one that takes care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.
Patch-2 introduces a probe function for papr nvdimms and assigns
'papr_dimm_ops' defined in 'papr.c' to 'dimm->ops' if
needed. The patch also adds code to parse the dimm flags specific to
papr nvdimms.
Patches-3,4 implement scaffolding to add support for PAPR PDSM
requests and pull in their definitions from the kernel.
Finally Patch-5 adds support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.
References
==========
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200615124407.32596-1-vaibhav@linux...
[5] "ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods"
https://lore.kernel.org/linux-nvdimm/20200615124407.32596-6-vaibhav@linux...
Vaibhav Jain (5):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR nvdimm family
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
papr: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 262 +++++++++++++++++++++++++++++------------
ndctl/lib/libndctl.sym | 5 +
ndctl/lib/papr.c | 224 +++++++++++++++++++++++++++++++++++
ndctl/lib/papr.h | 15 +++
ndctl/lib/papr_pdsm.h | 132 +++++++++++++++++++++
ndctl/lib/private.h | 4 +
ndctl/libndctl.h | 2 +
ndctl/ndctl.h | 1 +
9 files changed, 573 insertions(+), 73 deletions(-)
create mode 100644 ndctl/lib/papr.c
create mode 100644 ndctl/lib/papr.h
create mode 100644 ndctl/lib/papr_pdsm.h
--
2.26.2
[ndctl PATCH v6 0/5] Add support for reporting papr nvdimm health
by Vaibhav Jain
Changes since v5 [1]:
* Removed the patch introducing new dimm-ops 'dimm_init()' &
'dimm_uninit()'. Corresponding code that used the dimm private
initialization is also removed.
* Updated various dimm ops callback to rely on 'struct ndctl_cmd' arg
instead of dimm-private.
* Added ndctl_bus_has_of_node() and ndctl_bus_is_papr_scm() to library
ld version script.
* Simplified probing of new papr compatible nvdimm based on
introduction of new exported library function
ndctl_bus_is_papr_scm().
* Reworked various dimm-ops callbacks based on the updated uapi interface
with papr_scm as defined at [5].
* Introduced a new header 'papr.h' that defines 'struct nd_pkg_papr'
that holds 'struct nd_cmd_pkg gen' and 'struct nd_pkg_pdsm pdsm'
together.
[1] https://lore.kernel.org/linux-nvdimm/20200529220600.225320-1-vaibhav@linu...
---
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[2]. The standard defines
a mechanism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[3]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.
The patch-set introduces support for the new PAPR PDSM family
defined at [4] & [5] via a new dimm-ops named
'papr_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops members for handling initialization of dimm specific data
for specific DSM families.
These changes coupled with proposed kernel changes located at Ref[1] should
provide a way for the user to retrieve NVDIMM health status using ndctl for
pseries guests. Below is a sample output using proposed kernel + ndctl
changes:
# ndctl list -DH
[
{
"dev":"nmem0",
"flag_smart_event":true,
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions: one that takes care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.
Patch-2 introduces a probe function for papr nvdimms and assigns
'papr_dimm_ops' defined in 'papr.c' to 'dimm->ops' if
needed. The patch also adds code to parse the dimm flags specific to
papr nvdimms.
Patches-3,4 implement scaffolding to add support for PAPR PDSM
requests and pull in their definitions from the kernel.
Finally Patch-5 adds support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.
References
==========
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200615124407.32596-1-vaibhav@linux...
[5] "ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods"
https://lore.kernel.org/linux-nvdimm/20200615124407.32596-6-vaibhav@linux...
Vaibhav Jain (5):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR nvdimm family
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
papr: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 264 +++++++++++++++++++++++++++++------------
ndctl/lib/libndctl.sym | 5 +
ndctl/lib/papr.c | 218 ++++++++++++++++++++++++++++++++++
ndctl/lib/papr.h | 15 +++
ndctl/lib/papr_pdsm.h | 132 +++++++++++++++++++++
ndctl/lib/private.h | 4 +
ndctl/libndctl.h | 2 +
ndctl/ndctl.h | 1 +
9 files changed, 569 insertions(+), 73 deletions(-)
create mode 100644 ndctl/lib/papr.c
create mode 100644 ndctl/lib/papr.h
create mode 100644 ndctl/lib/papr_pdsm.h
--
2.26.2
[PATCH v13 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
Changes since v12 [1]:
* Fixed the clang warning regarding 'variable length object' being not at
the end of the 'struct nd_pdsm_cmd_pkg' by introducing a new layout.
* Removed instance of 'struct nd_cmd_pkg hdr' from 'struct
nd_pdsm_cmd_pkg' and renamed the struct to 'struct nd_pkg_pdsm' to
match the libndctl naming convention.
* Introduced 'union nd_pdsm_payload' that is a maximal union of all
possible payload structs and use it instead of having a flexible
'payload' member in 'struct nd_pdsm_cmd_pkg'.
* Introduce pdsm descriptor 'struct pdsm_cmd_desc' and its array
__pdsm_cmd_descriptors[] that holds a payload 'size_[in|out]' and
service function for each pdsm. This is analogous to
'__nd_cmd_dimm_descs[]'.
* Introduce function 'pdsm_cmd_desc()' to fetch the corresponding pdsm
descriptor for each valid pdsm.
* Updated papr_scm_service_pdsm() to use 'pdsm_cmd_desc()' and apply
checks on 'nd_cmd_pkg' payload based on pdsm descriptor members and
finally service the pdsm using the 'service' member of the
descriptor.
* Updated Patch-5 to use the updated 'struct nd_pkg_pdsm'
definition.
[1] https://lore.kernel.org/linux-nvdimm/20200608211026.67573-1-vaibhav@linux...
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of an NVDIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. These health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndctl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndctl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in an emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMs. PDSM
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
that is used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improves debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v13
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 +-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 132 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 420 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 616 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
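The descriptor-table approach described in the v13 changelog (a per-command entry holding 'size_[in|out]' and a service function, looked up before the payload is checked and serviced) can be sketched in plain C as follows. This is a userspace illustration under assumptions, not the kernel code: every demo_* name, the payload layout, and the error codes chosen are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command numbers; MIN and MAX only bracket the valid range. */
enum demo_pdsm { DEMO_PDSM_MIN = 0, DEMO_PDSM_HEALTH, DEMO_PDSM_MAX };

/* Hypothetical payloads; the union is as large as its largest member. */
struct demo_health {
	uint8_t health_state;
	uint8_t shutdown_dirty;
};

union demo_payload {
	struct demo_health health;
	uint8_t buf[64];
};

/* One descriptor per command: expected sizes plus a service routine. */
struct demo_desc {
	uint32_t size_in;
	uint32_t size_out;
	int (*service)(union demo_payload *payload);
};

static int demo_service_health(union demo_payload *payload)
{
	memset(payload, 0, sizeof(*payload));
	payload->health.health_state = 0;	/* placeholder for "ok" */
	return 0;
}

static const struct demo_desc demo_descs[] = {
	[DEMO_PDSM_MIN] = { 0, 0, NULL },
	[DEMO_PDSM_HEALTH] = { 0, sizeof(struct demo_health),
			       demo_service_health },
	[DEMO_PDSM_MAX] = { 0, 0, NULL },
};

/* Unknown commands map to the empty descriptor instead of NULL. */
static const struct demo_desc *demo_desc_lookup(unsigned int cmd)
{
	if (cmd > DEMO_PDSM_MIN && cmd < DEMO_PDSM_MAX)
		return &demo_descs[cmd];
	return &demo_descs[DEMO_PDSM_MIN];
}

/* Validate caller-supplied sizes against the descriptor, then service. */
static int demo_dispatch(unsigned int cmd, size_t in_len, size_t out_len,
			 union demo_payload *payload)
{
	const struct demo_desc *desc = demo_desc_lookup(cmd);

	if (in_len < desc->size_in || out_len < desc->size_out)
		return -EINVAL;
	if (!desc->service)
		return -ENOENT;
	return desc->service(payload);
}
```

Adding a new command in this scheme means adding one enum value, one payload struct in the union, and one table entry; the dispatch path stays unchanged.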