[ndctl PATCH v3 0/6] Add support for reporting papr-scm nvdimm health
by Vaibhav Jain
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[1]. The standard defines
a mechanism (HCALLs) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[2]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to the absence of ACPI/NFIT, a limitation which this patch-set
addresses.
The patch-set introduces support for the new PAPR-SCM PDSM family
defined at [3] via a new dimm-ops named
'papr_scm_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops members for handling initialization of dimm-specific data
for specific DSM families.
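For illustration, a minimal sketch of how the two new members might sit
in libndctl's 'struct ndctl_dimm_ops' (the member names match Patch-3
below; the exact signatures and surrounding fields are illustrative
assumptions):

    struct ndctl_dimm;	/* opaque libndctl handle */

    struct ndctl_dimm_ops {
	    /* ... existing SMART/command callbacks elided ... */

	    /* Set up DSM-family specific private data once the generic
	     * 'struct ndctl_dimm' fields have been populated. */
	    int (*dimm_init)(struct ndctl_dimm *dimm);

	    /* Release whatever dimm_init() allocated. */
	    void (*dimm_uninit)(struct ndctl_dimm *dimm);
    };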
These changes, coupled with the proposed kernel changes located at Ref[4],
should provide a way for the user to retrieve NVDIMM health status using
ndctl for papr_scm guests. Below is a sample output using the proposed
kernel + ndctl changes:
# ndctl list -DH
[
  {
    "dev":"nmem0",
    "flag_smart_event":true,
    "health":{
      "health_state":"fatal",
      "shutdown_state":"dirty"
    }
  }
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions: one that takes care of allocating and initializing
'struct ndctl_dimm', and another that takes care of initializing NFIT
specific dimm attributes.
Patch-2 introduces a probe function for papr_scm nvdimms, assigning
'papr_scm_dimm_ops' defined in 'papr_scm.c' to 'dimm->ops' when
needed. The patch also adds code to parse the dimm flags specific to
papr-scm nvdimms.
Patch-3 introduces new dimm ops 'dimm_init()' & 'dimm_uninit()' to handle
DSM family specific initialization of 'struct ndctl_dimm'.
Patches 4 and 5 implement scaffolding to add support for PAPR_SCM PDSM
requests and pull in their definitions from the kernel.
Finally, Patch-6 adds support for issuing a 'struct ndctl_cmd' to request
dimm health stats from the papr_scm kernel module, handling its result,
and returning the appropriate health status to libndctl for reporting.
Changelog
=========
v2..v3:
* Renamed function add_of_pmem_dimm() to add_papr_dimm() [ Aneesh ]
* Added an instance of 'struct nd_pdsm_cmd_pkg' to 'struct ndctl_cmd'.
* Updated papr_scm_pdsm.h to the proposed version in Ref[3].
v1..v2:
* Reordered and squashed a couple of patches to reduce the patchset size.
* Addressed a few offline review comments from Santosh Sivaraj.
* Updated the uapi header file for papr_scm_pdsm.h.
* Updated the struct names based on the new uapi header.
References
==========
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[3] "[PATCH v7 3/4] ndctl/papr_scm,uapi: Add support for PAPR nvdimm
specific methods"
https://lore.kernel.org/linux-nvdimm/20200508104922.72565-5-vaibhav@linux...
[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200508104922.72565-1-vaibhav@linux...
Vaibhav Jain (6):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR_SCM dimm family
libndctl: Introduce new dimm-ops dimm_init() & dimm_uninit()
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
libndctl,papr_scm: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 260 +++++++++++++++++++++++---------
ndctl/lib/papr_scm.c | 302 ++++++++++++++++++++++++++++++++++++++
ndctl/lib/papr_scm_pdsm.h | 173 ++++++++++++++++++++++
ndctl/lib/private.h | 9 ++
ndctl/libndctl.h | 1 +
ndctl/ndctl.h | 1 +
7 files changed, 677 insertions(+), 70 deletions(-)
create mode 100644 ndctl/lib/papr_scm.c
create mode 100644 ndctl/lib/papr_scm_pdsm.h
--
2.26.2
[PATCH] ACPI: Drop rcu usage for MMIO mappings
by Dan Williams
Recently a performance problem was reported for a process invoking a
non-trivial ASL program. The method call in this case ends up
repetitively triggering a call path like:
acpi_ex_store
acpi_ex_store_object_to_node
acpi_ex_write_data_to_field
acpi_ex_insert_into_field
acpi_ex_write_with_update_rule
acpi_ex_field_datum_io
acpi_ex_access_region
acpi_ev_address_space_dispatch
acpi_ex_system_memory_space_handler
acpi_os_map_cleanup.part.14
_synchronize_rcu_expedited.constprop.89
schedule
The end result of frequent synchronize_rcu_expedited() invocation is
tiny sub-millisecond spurts of execution where the scheduler freely
migrates this apparently sleepy task. The overhead of frequent scheduler
invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a
firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping
lock in the acpi_os_{read,write}_memory() path which ostensibly could be
called from an APEI NMI error interrupt? Neither rcu_read_lock() nor
ioremap() is interrupt safe, so add a WARN_ONCE() to validate that rcu
was not serving as a mechanism to avoid direct calls to ioremap(). Even
the original implementation had a spin_lock_irqsave(), but that is not
NMI safe.
APEI itself already has some concept of avoiding ioremap() from
interrupt context (see erst_exec_move_data()), if the new warning
triggers it means that APEI either needs more instrumentation like that
to pre-emptively fail, or more infrastructure to arrange for pre-mapping
the resources it needs in NMI context.
Cc: <stable@vger.kernel.org>
Fixes: 620242ae8c3d ("ACPI: Maintain a list of ACPI memory mapped I/O remappings")
Cc: Len Brown <lenb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Erik Kaneda <erik.kaneda@intel.com>
Cc: Myron Stowe <myron.stowe@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/acpi/osl.c | 117 +++++++++++++++++++++++++---------------------------
1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 762c5d50b8fe..207528c71e9c 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -214,13 +214,13 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
return pa;
}
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
static struct acpi_ioremap *
acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+ lockdep_assert_held(&acpi_ioremap_lock);
+ list_for_each_entry(map, &acpi_ioremaps, list)
if (map->phys <= phys &&
phys + size <= map->phys + map->size)
return map;
@@ -228,7 +228,6 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
return NULL;
}
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
static void __iomem *
acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size)
{
@@ -263,7 +262,8 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+ lockdep_assert_held(&acpi_ioremap_lock);
+ list_for_each_entry(map, &acpi_ioremaps, list)
if (map->virt <= virt &&
virt + size <= map->virt + map->size)
return map;
@@ -360,7 +360,7 @@ void __iomem __ref
map->size = pg_sz;
map->refcount = 1;
- list_add_tail_rcu(&map->list, &acpi_ioremaps);
+ list_add_tail(&map->list, &acpi_ioremaps);
out:
mutex_unlock(&acpi_ioremap_lock);
@@ -374,20 +374,13 @@ void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
}
EXPORT_SYMBOL_GPL(acpi_os_map_memory);
-/* Must be called with mutex_lock(&acpi_ioremap_lock) */
-static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map)
-{
- unsigned long refcount = --map->refcount;
-
- if (!refcount)
- list_del_rcu(&map->list);
- return refcount;
-}
-
-static void acpi_os_map_cleanup(struct acpi_ioremap *map)
+static void acpi_os_drop_map_ref(struct acpi_ioremap *map)
{
- synchronize_rcu_expedited();
+ lockdep_assert_held(&acpi_ioremap_lock);
+ if (--map->refcount > 0)
+ return;
acpi_unmap(map->phys, map->virt);
+ list_del(&map->list);
kfree(map);
}
@@ -408,7 +401,6 @@ static void acpi_os_map_cleanup(struct acpi_ioremap *map)
void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
- unsigned long refcount;
if (!acpi_permanent_mmap) {
__acpi_unmap_table(virt, size);
@@ -422,11 +414,8 @@ void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size)
WARN(true, PREFIX "%s: bad address %p\n", __func__, virt);
return;
}
- refcount = acpi_os_drop_map_ref(map);
+ acpi_os_drop_map_ref(map);
mutex_unlock(&acpi_ioremap_lock);
-
- if (!refcount)
- acpi_os_map_cleanup(map);
}
EXPORT_SYMBOL_GPL(acpi_os_unmap_iomem);
@@ -461,7 +450,6 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas)
{
u64 addr;
struct acpi_ioremap *map;
- unsigned long refcount;
if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY)
return;
@@ -477,11 +465,8 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas)
mutex_unlock(&acpi_ioremap_lock);
return;
}
- refcount = acpi_os_drop_map_ref(map);
+ acpi_os_drop_map_ref(map);
mutex_unlock(&acpi_ioremap_lock);
-
- if (!refcount)
- acpi_os_map_cleanup(map);
}
EXPORT_SYMBOL(acpi_os_unmap_generic_address);
@@ -700,55 +685,71 @@ int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width)
return 0;
}
+static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr,
+ unsigned int size, bool *did_fallback)
+{
+ void __iomem *virt_addr = NULL;
+
+ if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n"))
+ return NULL;
+
+ /* Try to use a cached mapping and fallback otherwise */
+ *did_fallback = false;
+ mutex_lock(&acpi_ioremap_lock);
+ virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
+ if (virt_addr)
+ return virt_addr;
+ mutex_unlock(&acpi_ioremap_lock);
+
+ virt_addr = acpi_os_ioremap(phys_addr, size);
+ *did_fallback = true;
+
+ return virt_addr;
+}
+
+static void acpi_os_rw_unmap(void __iomem *virt_addr, bool did_fallback)
+{
+ if (did_fallback) {
+ /* in the fallback case no lock is held */
+ iounmap(virt_addr);
+ return;
+ }
+
+ mutex_unlock(&acpi_ioremap_lock);
+}
+
acpi_status
acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width)
{
- void __iomem *virt_addr;
unsigned int size = width / 8;
- bool unmap = false;
+ bool did_fallback = false;
+ void __iomem *virt_addr;
u64 dummy;
int error;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
- rcu_read_unlock();
- virt_addr = acpi_os_ioremap(phys_addr, size);
- if (!virt_addr)
- return AE_BAD_ADDRESS;
- unmap = true;
- }
-
+ virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
+ if (!virt_addr)
+ return AE_BAD_ADDRESS;
if (!value)
value = &dummy;
error = acpi_os_read_iomem(virt_addr, value, width);
BUG_ON(error);
- if (unmap)
- iounmap(virt_addr);
- else
- rcu_read_unlock();
-
+ acpi_os_rw_unmap(virt_addr, did_fallback);
return AE_OK;
}
acpi_status
acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width)
{
- void __iomem *virt_addr;
unsigned int size = width / 8;
- bool unmap = false;
+ bool did_fallback = false;
+ void __iomem *virt_addr;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
- rcu_read_unlock();
- virt_addr = acpi_os_ioremap(phys_addr, size);
- if (!virt_addr)
- return AE_BAD_ADDRESS;
- unmap = true;
- }
+ virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
+ if (!virt_addr)
+ return AE_BAD_ADDRESS;
switch (width) {
case 8:
@@ -767,11 +768,7 @@ acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width)
BUG();
}
- if (unmap)
- iounmap(virt_addr);
- else
- rcu_read_unlock();
-
+ acpi_os_rw_unmap(virt_addr, did_fallback);
return AE_OK;
}
[ndctl RFC-PATCH 0/4] Add support for reporting PAPR NVDIMM Statistics
by Vaibhav Jain
This patch-set proposes addition of new functionality to ndctl and
libndctl, enabling them to report vendor-specific NVDIMM statistics
(dimm-stats) like "Cache Read/Write Hit Count" which aren't S.M.A.R.T.
attributes but still indicate important NVDIMM performance metrics.
Until now these statistics were exposed via vendor-specific tools
like ipmctl[1] in the case of Intel Optane-DC memory. This patch-set,
however, implements a generic abstraction within libndctl to
report such statistics from the ndctl tool.
The patch-set proposes to add a new command line arg '--stats' / '-S'
that reports vendor-specific dimm-stats for an NVDIMM. Below is an
example invocation and output of the proposed changes:
# ndctl list -D --stats
[
  {
    "dev":"nmem0",
    "stats":{
      "Controller Reset Count":2,
      "Controller Reset Elapsed Time":603331,
      "Power-on Seconds":603931,
      "Life Remaining":"100%",
      "Critical Resource Utilization":"0%",
      "Host Load Count":5781028,
      "Host Store Count":8966800,
      "Host Load Duration":975895365,
      "Host Store Duration":716230690,
      "Media Read Count":0,
      "Media Write Count":6313,
      "Media Read Duration":0,
      "Media Write Duration":9679615,
      "Cache Read Hit Count":5781028,
      "Cache Write Hit Count":8442479,
      "Fast Write Count":8969912
    }
  }
]
As a proof of concept the patch-set implements reporting dimm-stats for
PAPR compatible NVDIMMs. The output above is from a PPC64 PSeries
Guest LPAR with an NVDIMM, running on a PowerVM hypervisor.
The patch-set depends on the existing patch-set "[ndctl PATCH v3 0/6]
Add support for reporting papr-scm nvdimm health" available at
Ref[2] and [3]. That patch-set implemented the base infrastructure
needed to support PAPR compliant NVDIMMs in ndctl and libndctl.
For PAPR compliant NVDIMMs we also depend on kernel side changes
published at [4] that add support for the necessary PDSMs to expose
dimm-stats to libndctl via the CMD_CALL ioctl interface.
Structure of the patch-set
==========================
The first patch in the series implements the necessary
infrastructure in libndctl, introducing two new dimm-ops, namely
'new_stats' and 'get_stat', that can be implemented by dimm-providers
to give libndctl access to vendor-specific dimm-stats. The patch
also implements the necessary changes in ndctl/util/json-smart.c to
call these new dimm-ops and generate the json-c objects behind the
output shown above.
The next three patches deal with implementing support for these new
dimm-ops for papr_scm in libndctl. Patch-2 implements dimm-op
'new_stats', which returns a 'struct ndctl_cmd' that can be submitted to
libnvdimm to fetch dimm-stats and copy them to a papr_scm managed
buffer.
Patch-3 implements parsing and clean-up of the dimm-stats fetched from
the kernel.
Finally, Patch-4 implements dimm-op 'get_stat', which gives ndctl
access to each dimm-stat; ndctl then constructs a json object
aggregating them and generates text output from it.
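To make the callback split concrete, here is a rough sketch of the two
dimm-ops (the signatures, and the 'struct ndctl_dimm_stat' name, are
illustrative assumptions; the authoritative definitions are in Patch-1):

    struct ndctl_dimm;
    struct ndctl_cmd;
    struct ndctl_dimm_stat;	/* hypothetical parsed-stat handle */

    struct ndctl_dimm_ops {
	    /* ... existing callbacks elided ... */

	    /* Build a 'struct ndctl_cmd' that, once submitted, has the
	     * kernel fetch all vendor dimm-stats into its buffer. */
	    struct ndctl_cmd *(*new_stats)(struct ndctl_dimm *dimm);

	    /* Iterate over the fetched stats; returns NULL past the
	     * last entry. */
	    struct ndctl_dimm_stat *(*get_stat)(struct ndctl_cmd *cmd,
			    struct ndctl_dimm_stat *prev);
    };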
References
==========
[1] https://docs.pmem.io/ipmctl-user-guide/instrumentation/show-device-perfor...
[2] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v7
[3] https://lore.kernel.org/linux-nvdimm/20200420075556.272174-1-vaibhav@linu...
[4] https://lore.kernel.org/linux-nvdimm/20200518110814.145644-1-vaibhav@linu...
Vaibhav Jain (4):
ndctl,libndctl: Implement new dimm-ops 'new_stats' and 'get_stat'
papr_scm: Add support for fetching dimm-stats
papr_scm: Implement parsing and clean-up for fetched dimm stats
papr_scm: Implement dimm op 'get_stat'
Documentation/ndctl/ndctl-list.txt | 24 +++
ndctl/lib/libndctl.sym | 5 +
ndctl/lib/papr_scm.c | 300 ++++++++++++++++++++++++++++-
ndctl/lib/papr_scm_pdsm.h | 48 +++++
ndctl/lib/private.h | 6 +
ndctl/lib/smart.c | 26 +++
ndctl/libndctl.h | 23 +++
ndctl/list.c | 9 +
ndctl/util/json-smart.c | 73 +++++++
util/json.h | 1 +
10 files changed, 514 insertions(+), 1 deletion(-)
--
2.26.2
[RFC] tools/testing/nvdimm: Add generic test module for non-nfit platforms
by Santosh Sivaraj
We cannot use the ndctl test facility (make check) on platforms that don't
support ACPI. In order to get the ndctl tests working, we need a module
which can emulate NVDIMM devices without relying on ACPI/NFIT, which the
current nfit_test module requires.
In this proposed module, which is mostly a copy of nfit_test.c but without
the ACPI dependencies, two regions are implemented without interleaving or
error injection. Running `make check` with this module passes only three
tests because of the limited implementation.
Please comment if anything can be done differently or better.
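For reference, the intended test flow is the same as with the existing
nfit_test (the exact invocations below are a sketch and may differ):

    # from the top of the kernel tree, build and install the test modules
    make M=tools/testing/nvdimm
    sudo make M=tools/testing/nvdimm modules_install
    sudo modprobe nfit_test

    # then, from an ndctl checkout
    make check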
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
---
tools/testing/nvdimm/config_check.c | 3 +-
tools/testing/nvdimm/test/Kbuild | 4 +
tools/testing/nvdimm/test/ndtest.c | 603 ++++++++++++++++++++++++++++
3 files changed, 609 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/nvdimm/test/ndtest.c
diff --git a/tools/testing/nvdimm/config_check.c b/tools/testing/nvdimm/config_check.c
index cac891028cd1..3e3a5f518864 100644
--- a/tools/testing/nvdimm/config_check.c
+++ b/tools/testing/nvdimm/config_check.c
@@ -12,7 +12,8 @@ void check(void)
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BTT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_PFN));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BLK));
- BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT));
+ if (IS_ENABLED(CONFIG_ACPI_NFIT))
+ BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX));
BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX_PMEM));
}
diff --git a/tools/testing/nvdimm/test/Kbuild b/tools/testing/nvdimm/test/Kbuild
index 75baebf8f4ba..2607da1be2cc 100644
--- a/tools/testing/nvdimm/test/Kbuild
+++ b/tools/testing/nvdimm/test/Kbuild
@@ -5,5 +5,9 @@ ccflags-y += -I$(srctree)/drivers/acpi/nfit/
obj-m += nfit_test.o
obj-m += nfit_test_iomap.o
+ifeq ($(CONFIG_ACPI),y)
nfit_test-y := nfit.o
+else
+nfit_test-y := ndtest.o
+endif
nfit_test_iomap-y := iomap.o
diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c
new file mode 100644
index 000000000000..19a077988cc3
--- /dev/null
+++ b/tools/testing/nvdimm/test/ndtest.c
@@ -0,0 +1,603 @@
+#include <linux/platform_device.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/genalloc.h>
+#include <linux/vmalloc.h>
+#include <linux/dma-mapping.h>
+#include <linux/list_sort.h>
+#include <linux/libnvdimm.h>
+#include <linux/ndctl.h>
+#include <nd-core.h>
+
+#include "../watermark.h"
+#include "nfit_test.h"
+
+struct ndtest_priv {
+ struct platform_device pdev;
+ struct device_node *dn;
+ struct list_head resources;
+ unsigned long config_size;
+ bool is_volatile;
+ void *label_area;
+
+ void (*setup)(struct ndtest_priv *t);
+
+ struct nvdimm_bus_descriptor bus_desc;
+ struct nvdimm_bus *bus;
+ struct nvdimm *nvdimm;
+ struct resource res;
+ struct nd_region *region;
+ struct nd_interleave_set nd_set;
+ struct device *dimm_dev;
+};
+
+enum {
+ DIMM_SIZE = SZ_32M,
+ LABEL_SIZE = SZ_128K,
+ NUM_INSTANCES = 2,
+};
+
+static unsigned long dimm_fail_cmd_flags[6];
+static DEFINE_SPINLOCK(nfit_test_lock);
+static struct ndtest_priv *instances[NUM_INSTANCES];
+static struct class *ndtest_dimm_class;
+static struct gen_pool *ndtest_pool;
+
+#define NFIT_DIMM_HANDLE(node, socket, imc, chan, dimm) \
+ (((node & 0xfff) << 16) | ((socket & 0xf) << 12) \
+ | ((imc & 0xf) << 8) | ((chan & 0xf) << 4) | (dimm & 0xf))
+
+static u32 handle[] = {
+ [0] = NFIT_DIMM_HANDLE(0, 0, 0, 0, 0),
+ [1] = NFIT_DIMM_HANDLE(0, 0, 0, 0, 1),
+};
+
+static int ndtest_config_get(struct ndtest_priv *p, unsigned int buf_len,
+ struct nd_cmd_get_config_data_hdr *hdr)
+{
+ unsigned int len;
+
+ if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE)
+ return -EINVAL;
+
+ hdr->status = 0;
+ len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset);
+ memcpy(hdr->out_buf, p->label_area + hdr->in_offset, len);
+
+ return buf_len - len;
+}
+
+static int ndtest_config_set(struct ndtest_priv *p, unsigned int buf_len,
+ struct nd_cmd_set_config_hdr *hdr)
+{
+ unsigned int len;
+ if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE)
+ return -EINVAL;
+
+ len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset);
+ memcpy(p->label_area + hdr->in_offset, hdr->in_buf, len);
+
+ return buf_len - len;
+}
+
+static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len, int *cmd_rc)
+{
+ struct nd_cmd_get_config_size *size;
+ struct ndtest_priv *p;
+
+ if (!nvdimm)
+ return -EINVAL;
+
+ p = nvdimm_provider_data(nvdimm);
+ switch (cmd) {
+ case ND_CMD_GET_CONFIG_SIZE:
+ size = (struct nd_cmd_get_config_size *) buf;
+ size->status = 0;
+ size->max_xfer = 8;
+ size->config_size = p->config_size;
+ *cmd_rc = 0;
+ break;
+
+ case ND_CMD_GET_CONFIG_DATA:
+ *cmd_rc = ndtest_config_get(p, buf_len, buf);
+ break;
+
+ case ND_CMD_SET_CONFIG_DATA:
+ *cmd_rc = ndtest_config_set(p, buf_len, buf);
+ break;
+ default:
+ dev_dbg(&p->pdev.dev, "invalid command %u\n", cmd);
+ return -EINVAL;
+ }
+
+ /* *cmd_rc was set by the individual command handlers above */
+ return 0;
+}
+
+static int dimm_name_to_id(struct device *dev)
+{
+ int dimm;
+
+ if (sscanf(dev_name(dev), "test_dimm%d", &dimm) != 1)
+ return -ENXIO;
+ return dimm;
+}
+
+static ssize_t handle_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ int dimm = dimm_name_to_id(dev);
+
+ if (dimm < 0)
+ return dimm;
+
+ return sprintf(buf, "%#x\n", handle[dimm]);
+}
+static DEVICE_ATTR_RO(handle);
+
+static ssize_t fail_cmd_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ int dimm = dimm_name_to_id(dev);
+
+ if (dimm < 0)
+ return dimm;
+
+ return sprintf(buf, "%#lx\n", dimm_fail_cmd_flags[dimm]);
+}
+
+static ssize_t fail_cmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t size)
+{
+ int dimm = dimm_name_to_id(dev);
+ unsigned long val;
+ ssize_t rc;
+
+ if (dimm < 0)
+ return dimm;
+
+ rc = kstrtol(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ dimm_fail_cmd_flags[dimm] = val;
+ return size;
+}
+static DEVICE_ATTR_RW(fail_cmd);
+
+static struct attribute *ndtest_test_dimm_attributes[] = {
+ &dev_attr_handle.attr,
+ &dev_attr_fail_cmd.attr,
+ NULL,
+};
+
+static struct attribute_group ndtest_test_dimm_attribute_group = {
+ .attrs = ndtest_test_dimm_attributes,
+};
+
+static const struct attribute_group *ndtest_test_dimm_attribute_groups[] = {
+ &ndtest_test_dimm_attribute_group,
+ NULL,
+};
+
+static void put_dimms(void *data)
+{
+ struct ndtest_priv *p = data;
+
+ if (p->dimm_dev)
+ device_unregister(p->dimm_dev);
+}
+
+static int ndtest_dimm_init(struct ndtest_priv *p)
+{
+ if (devm_add_action_or_reset(&p->pdev.dev, put_dimms, p))
+ return -ENOMEM;
+
+ p->dimm_dev = device_create_with_groups(ndtest_dimm_class, &p->pdev.dev,
+ 0, NULL,
+ ndtest_test_dimm_attribute_groups,
+ "test_dimm%d", p->pdev.id);
+
+ if (!p->dimm_dev)
+ return -ENOMEM;
+
+ return 0;
+}
+
+#define NDTEST_SCM_DIMM_CMD_MASK \
+ ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
+ (1ul << ND_CMD_GET_CONFIG_DATA) | \
+ (1ul << ND_CMD_SET_CONFIG_DATA))
+
+static ssize_t vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+
+ return sprintf(buf, "0x1234567\n");
+}
+static DEVICE_ATTR_RO(vendor);
+
+static struct ndtest_priv *to_ndtest_priv(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+
+ return container_of(pdev, struct ndtest_priv, pdev);
+}
+
+static ssize_t id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct ndtest_priv *p = to_ndtest_priv(dev);
+
+ return sprintf(buf, "ndtest%d\n", p->pdev.id);
+}
+static DEVICE_ATTR_RO(id);
+
+static struct attribute *ndtest_dimm_attributes[] = {
+ &dev_attr_vendor.attr,
+ &dev_attr_id.attr,
+ NULL,
+};
+
+static umode_t ndtest_dimm_attr_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ return a->mode;
+}
+
+static const struct attribute_group ndtest_dimm_attribute_group = {
+ .name = "ndtest",
+ .attrs = ndtest_dimm_attributes,
+ .is_visible = ndtest_dimm_attr_visible,
+};
+
+static const struct attribute_group *ndtest_dimm_attribute_groups[] = {
+ &ndtest_dimm_attribute_group,
+ NULL,
+};
+
+static int ndtest_nvdimm_init(struct ndtest_priv *priv)
+{
+ struct device *dev = &priv->pdev.dev;
+ struct nd_mapping_desc mapping;
+ struct nd_region_desc ndr_desc;
+ unsigned long dimm_flags = 0;
+
+ priv->bus_desc.ndctl = ndtest_ctl;
+ priv->bus_desc.module = THIS_MODULE;
+ priv->bus_desc.provider_name = NULL;
+
+ priv->bus = nvdimm_bus_register(&priv->pdev.dev, &priv->bus_desc);
+ if (!priv->bus) {
+ dev_err(dev, "Error creating nvdimm bus %pOF\n", priv->dn);
+ return -ENXIO;
+ }
+
+ if (priv->pdev.id)
+ set_bit(NDD_ALIASING, &dimm_flags);
+ set_bit(NDD_LABELING, &dimm_flags);
+
+ priv->nvdimm = nvdimm_create(priv->bus, priv,
+ ndtest_dimm_attribute_groups, dimm_flags,
+ NDTEST_SCM_DIMM_CMD_MASK, 0, NULL);
+ if (!priv->nvdimm) {
+ dev_err(dev, "Error creating DIMM object for %pOF\n", priv->dn);
+ goto err;
+ }
+
+ if (nvdimm_bus_check_dimm_count(priv->bus, 1))
+ goto err;
+
+ if (ndtest_dimm_init(priv))
+ goto err;
+
+ /* now add the region */
+ memset(&mapping, 0, sizeof(mapping));
+ mapping.nvdimm = priv->nvdimm;
+ mapping.start = priv->res.start;
+ mapping.size = DIMM_SIZE;
+
+ memset(&ndr_desc, 0, sizeof(ndr_desc));
+ ndr_desc.res = &priv->res;
+ ndr_desc.provider_data = priv;
+ ndr_desc.mapping = &mapping;
+ ndr_desc.num_mappings = 1;
+ ndr_desc.nd_set = &priv->nd_set;
+
+ priv->region = nvdimm_pmem_region_create(priv->bus, &ndr_desc);
+ if (!priv->region) {
+ dev_err(dev, "Error registering region %pR\n", ndr_desc.res);
+ goto err;
+ }
+
+ return 0;
+
+err:
+ nvdimm_bus_unregister(priv->bus);
+ kfree(priv->bus_desc.provider_name);
+ return -ENXIO;
+}
+
+static void ndtest_release(struct device *dev)
+{
+ struct ndtest_priv *p = to_ndtest_priv(dev);
+
+ vfree(p->label_area); /* may be NULL; allocated in ndtest_probe() */
+ kfree(p);
+}
+
+static struct nfit_test_resource *ndtest_resource_lookup(resource_size_t addr)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(instances); i++) {
+ struct nfit_test_resource *n, *nfit_res = NULL;
+ struct ndtest_priv *t = instances[i];
+
+ if (!t)
+ continue;
+ spin_lock(&nfit_test_lock);
+ list_for_each_entry(n, &t->resources, list) {
+ if (addr >= n->res.start && (addr < n->res.start
+ + resource_size(&n->res))) {
+ nfit_res = n;
+ break;
+ } else if (addr >= (unsigned long) n->buf
+ && (addr < (unsigned long) n->buf
+ + resource_size(&n->res))) {
+ nfit_res = n;
+ break;
+ }
+ }
+ spin_unlock(&nfit_test_lock);
+ if (nfit_res)
+ return nfit_res;
+ }
+
+ return NULL;
+}
+
+static void ndtest_release_resource(void *data)
+{
+ struct nfit_test_resource *res = data;
+
+ spin_lock(&nfit_test_lock);
+ list_del(&res->list);
+ spin_unlock(&nfit_test_lock);
+
+ if (resource_size(&res->res) >= DIMM_SIZE)
+ gen_pool_free(ndtest_pool, res->res.start,
+ resource_size(&res->res));
+ vfree(res->buf);
+ kfree(res);
+}
+
+static struct nfit_test_resource *ndtest_get_resource(struct ndtest_priv *p, size_t size)
+{
+ struct nfit_test_resource *res;
+ struct genpool_data_align data = {
+ .align = SZ_128M,
+ };
+ unsigned long buf;
+
+ if (!size)
+ return NULL;
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res)
+ return NULL;
+
+ buf = gen_pool_alloc_algo(ndtest_pool, DIMM_SIZE,
+ gen_pool_first_fit_align, &data);
+ if (!buf) {
+ kfree(res);
+ return NULL;
+ }
+
+ INIT_LIST_HEAD(&res->list);
+ res->dev = &p->pdev.dev;
+
+ res->buf = vmalloc(size);
+ if (!res->buf)
+ goto buf_err;
+
+ res->res.start = buf;
+ res->res.end = buf + size - 1;
+ spin_lock_init(&res->lock);
+ INIT_LIST_HEAD(&res->requests);
+ spin_lock(&nfit_test_lock);
+ list_add(&res->list, &p->resources);
+ spin_unlock(&nfit_test_lock);
+
+ /* on failure the reset action frees the buffer, list entry and res */
+ if (devm_add_action_or_reset(&p->pdev.dev, ndtest_release_resource, res))
+ return NULL;
+
+ return res;
+
+buf_err:
+ /* the pool chunk was allocated with DIMM_SIZE, not 'size' */
+ gen_pool_free(ndtest_pool, buf, DIMM_SIZE);
+ kfree(res);
+
+ return NULL;
+}
+
+#define UUID_NDTEST_BUS "2f10e7a4-9e91-11e4-89d3-123b93f75cba"
+static int ndtest_probe(struct platform_device *pdev)
+{
+ struct nfit_test_resource *res;
+ struct ndtest_priv *p;
+ u64 uuid[2];
+ int rc;
+
+
+ p = to_ndtest_priv(&pdev->dev);
+
+ /* We just need to ensure that set cookies are unique across */
+ uuid_parse(UUID_NDTEST_BUS, (uuid_t *) uuid);
+ /*
+ * cookie1 and cookie2 are not really little endian
+ * we store a little endian representation of the
+ * uuid str so that we can compare this with the label
+ * area cookie irrespective of the endian config with which
+ * the kernel is built.
+ */
+ p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
+ p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+
+ /* setup the resource */
+ res = ndtest_get_resource(p, DIMM_SIZE);
+ if (!res) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ p->res.start = (resource_size_t) res->res.start;
+ p->res.end = res->res.end;
+ p->label_area = vmalloc(LABEL_SIZE);
+ if (!p->label_area) {
+ rc = -ENOMEM;
+ goto err;
+ }
+ sprintf(p->label_area, "label%d", p->pdev.id);
+ p->config_size = LABEL_SIZE;
+ p->res.name = pdev->name;
+ p->res.flags = IORESOURCE_MEM;
+
+ rc = ndtest_nvdimm_init(p);
+ if (rc)
+ goto err;
+
+ platform_set_drvdata(pdev, p);
+
+ return 0;
+
+err:
+ /* the device is owned by ndtest_init(); its release callback
+ * frees 'p', so don't drop references or free it here */
+ return rc;
+}
+
+static int ndtest_remove(struct platform_device *pdev)
+{
+ struct ndtest_priv *p = platform_get_drvdata(pdev);
+
+ /* 'p' is freed by ndtest_release() when its last reference drops */
+ nvdimm_bus_unregister(p->bus);
+
+ return 0;
+}
+
+static const struct platform_device_id ndtest_id[] = {
+ { KBUILD_MODNAME },
+ { },
+};
+
+static struct platform_driver ndtest_driver = {
+ .probe = ndtest_probe,
+ .remove = ndtest_remove,
+ .driver = {
+ .name = KBUILD_MODNAME,
+ },
+ .id_table = ndtest_id,
+};
+
+static __init int ndtest_init(void)
+{
+ int rc, i;
+
+ pmem_test();
+ libnvdimm_test();
+ device_dax_test();
+ dax_pmem_test();
+ dax_pmem_core_test();
+#ifdef CONFIG_DEV_DAX_PMEM_COMPAT
+ dax_pmem_compat_test();
+#endif
+
+ nfit_test_setup(ndtest_resource_lookup, NULL);
+
+ ndtest_dimm_class = class_create(THIS_MODULE, "nfit_test_dimm");
+ if (IS_ERR(ndtest_dimm_class)) {
+ rc = PTR_ERR(ndtest_dimm_class);
+ goto err_register;
+ }
+
+ ndtest_pool = gen_pool_create(ilog2(SZ_4M), NUMA_NO_NODE);
+ if (!ndtest_pool) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ if (gen_pool_add(ndtest_pool, SZ_4G, SZ_4G, NUMA_NO_NODE)) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ /* Each instance can be taken as a bus, which can have multiple dimms */
+ for (i = 0; i < NUM_INSTANCES; ++i) {
+ struct ndtest_priv *priv;
+ struct platform_device *pdev;
+
+ instances[i] = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!instances[i]) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ priv = instances[i];
+ INIT_LIST_HEAD(&priv->resources);
+ pdev = &priv->pdev;
+ pdev->name = KBUILD_MODNAME;
+ pdev->id = i;
+ pdev->dev.release = ndtest_release;
+ rc = platform_device_register(pdev);
+ if (rc) {
+ /* the release callback frees priv; don't touch it again */
+ put_device(&pdev->dev);
+ instances[i] = NULL;
+ goto err_register;
+ }
+ get_device(&pdev->dev);
+ }
+
+ rc = platform_driver_register(&ndtest_driver);
+ if (rc)
+ goto err_register;
+
+ return 0;
+
+ err_register:
+ if (ndtest_pool)
+ gen_pool_destroy(ndtest_pool);
+ if (!IS_ERR_OR_NULL(ndtest_dimm_class))
+ class_destroy(ndtest_dimm_class);
+
+ /* only successfully registered instances remain in the array */
+ for (i = 0; i < NUM_INSTANCES; ++i) {
+ if (instances[i]) {
+ put_device(&instances[i]->pdev.dev);
+ platform_device_unregister(&instances[i]->pdev);
+ }
+ }
+
+ *instances = NULL;
+
+ return rc;
+}
+
+static __exit void ndtest_exit(void)
+{
+ int i;
+
+ if (!*instances)
+ return;
+
+ for (i = 0; i < NUM_INSTANCES; ++i) {
+ if (instances[i]) {
+ /* drop the extra reference taken in ndtest_init(); the
+ * final put via unregister invokes ndtest_release(),
+ * which frees the instance */
+ put_device(&instances[i]->pdev.dev);
+ platform_device_unregister(&instances[i]->pdev);
+ }
+ }
+
+ platform_driver_unregister(&ndtest_driver);
+ gen_pool_destroy(ndtest_pool);
+ class_destroy(ndtest_dimm_class);
+}
+
+module_init(ndtest_init);
+module_exit(ndtest_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("IBM Corporation");
--
2.25.4
How to fake a dax device for debugging purposes?
by Theodore Y. Ts'o
(Please keep me on the cc line since I'm not on the linux-nvdimm list.)
Hi,
I used to fake up a dax-capable device for debugging ext4 by using
instructions similar to the ones that can be found here:
https://docs.pmem.io/persistent-memory/getting-started-guide/creating-dev...
The problem is that with more recent kernels, this is no longer
working for me.
Here are the relevant dmesg lines (from running "gce-xfstests -c dax
launch"):
[ 0.000000] Linux version 5.7.0-rc4-xfstests-00002-g8867a85a3164-dirty (tytso@lambda) (gcc version 9.3.0 (Debian 9.3.0-11), GNU ld (GNU Binutils for Debian) 2.34) #1692 SMP Sun May 10 21:21:14 EDT 2020
[ 0.000000] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 elevator=noop net.ifnames=0 biosdevname=0 console=ttyS0 memmap=4G!9G memmap=9G!14G cmd=maint mem=26624M fstestcfg= fstestset= fstestexc= fstestopt= fstesttyp=ext4 fstestapi=1.5 fsteststr= nfssrv=
...
[ 0.000000] user: [mem 0x0000000240000000-0x000000033fffffff] persistent (type 12)
[ 0.000000] user: [mem 0x0000000340000000-0x000000037fffffff] usable
[ 0.000000] user: [mem 0x0000000380000000-0x00000005bfffffff] persistent (type 12)
[ 0.000000] user: [mem 0x00000005c0000000-0x000000067fffffff] usable
....
[ 3.180904] nd_pmem namespace0.0: unable to guarantee persistence of writes
[ 3.181750] nd_pmem namespace1.0: unable to guarantee persistence of writes
[ 3.188025] pmem0: detected capacity change from 0 to 4294967296
[ 3.189896] pmem1: detected capacity change from 0 to 9663676416
But when I try to mount a file system with: "mount -o dax -t ext4 /dev/pmem0 /mnt" I get:
[ 168.136331] EXT4-fs (pmem0): DAX unsupported by block device.
Looking at drivers/dax/super.c, and changing a bunch of pr_debug to
pr_err, I found the following had triggered.
[ 168.130603] pmem0: error: request queue doesn't support dax
So it looks like drivers/nvdimm/pmem.c is failing to set the QUEUE_FLAG_DAX
flag on its queue, and so in turn that's because pmem->pfn_flags
doesn't have PFN_MAP set. And.... at that point, I'm lost.
How do I make a /dev/pmem0 via the memmap= boot command line options
be mountable as a dax mount file system?
- Ted
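One avenue worth checking (an assumption based on the missing PFN_MAP
above, not a confirmed fix): namespaces backed by memmap= come up in
'raw' mode with no struct page backing, and the fsdax mode that
allocates a page map is what lets the driver set QUEUE_FLAG_DAX.
Something like:

    # reconfigure the raw namespace to fsdax mode; device names assume
    # the dmesg layout above
    ndctl create-namespace -f -e namespace0.0 --mode=fsdax
    mount -o dax -t ext4 /dev/pmem0 /mnt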
remove a few uses of ->queuedata
by Christoph Hellwig
Hi all,
various bio based drivers use queue->queuedata despite already having
set up disk->private_data, which can be used just as easily. This
series cleans them up to only use a single private data pointer.
blk-mq based drivers that have code paths that can't easily get at
the gendisk are unaffected by this series.
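The conversions all have roughly this shape (sketched against pmem as a
hypothetical example; the real hunks are in the individual patches):

    /* before: reach the driver data through the request queue */
    static blk_qc_t pmem_make_request(struct request_queue *q,
		    struct bio *bio)
    {
	    struct pmem_device *pmem = q->queuedata;
	    /* ... */
    }

    /* after: use the gendisk private data that is already set up */
    static blk_qc_t pmem_make_request(struct request_queue *q,
		    struct bio *bio)
    {
	    struct pmem_device *pmem = bio->bi_disk->private_data;
	    /* ... */
    }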
[PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()
by Dan Williams
Changes since v1:
- Rename memcpy_mcsafe() to copy_safe() since the x86-machine-check
specifics have already been de-emphasized in a previous commit and are
further de-emphasized by these changes. (Linus)
- Move copy_safe() out-of-line since it no longer reverts to plain
memcpy (Linus)
- Move copy_safe() to its own stand-alone compilation unit where it no
longer entangles with arch/x86/lib/memcpy_64.S. This also allows perf
to stop tracking ongoing updates to that file due to copy_safe()
updates. (Linus)
- Move the PowerPC implementation over to the new name.
[1]: http://lore.kernel.org/r/158654083112.1572482.8944305411228188871.stgit@d...
---
The primary motivation to go touch memcpy_mcsafe() is that the existing
benefit of doing slow and careful copies is obviated on newer CPUs. That
fact solves the problem of needing to detect machine-check recovery
capability. Now the old "mcsafe_key" opt-in to careful copying can be made
an opt-out from the default fast copy implementation.
The discussion with Linus further made clear that this facility had
already lost its x86-machine-check specificity starting with commit
2c89130a56a ("x86/asm/memcpy_mcsafe: Add write-protection-fault
handling"). The new changes to not require a "careful copy" further
de-emphasizes the role that x86-MCA plays in the implementation to just
one more source of recoverable trap during the operation.
With the above realizations the name "mcsafe" is no longer accurate and
copy_safe() is proposed as its replacement. x86 grows a copy_safe_fast()
implementation as a default implementation that is independent of
detecting the presence of x86-MCA.
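For reference, the calling convention carried over from memcpy_mcsafe()
(a sketch; patch 1 has the authoritative prototypes):

    /* Returns 0 on success, or the number of bytes left uncopied when
     * a recoverable trap (machine check, write-protection fault, ...)
     * fires mid-copy. */
    unsigned long __must_check copy_safe(void *dst, const void *src,
		    size_t cnt);

    /* x86 fast path for CPUs whose machine-check recovery does not
     * need the slow, careful cacheline-by-cacheline loop. */
    unsigned long copy_safe_fast(void *dst, const void *src, size_t cnt);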
---
Dan Williams (2):
copy_safe: Rename memcpy_mcsafe() to copy_safe()
x86/copy_safe: Introduce copy_safe_fast()
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 4
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_safe.S | 4
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_safe.h | 18 ++
arch/x86/include/asm/copy_safe_test.h | 75 +++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ---------
arch/x86/include/asm/string_64.h | 32 ----
arch/x86/include/asm/uaccess_64.h | 21 ---
arch/x86/kernel/cpu/mce/core.c | 9 -
arch/x86/kernel/quirks.c | 10 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_safe.c | 66 ++++++++
arch/x86/lib/copy_safe_64.S | 163 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 --------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 12 +
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 17 +-
include/linux/uio.h | 10 +
lib/Kconfig | 2
lib/iov_iter.c | 36 ++--
tools/arch/x86/include/asm/copy_safe_test.h | 13 ++
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 --------------
tools/objtool/check.c | 5 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ---
tools/testing/nvdimm/test/nfit.c | 49 +++---
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../selftests/powerpc/copyloops/copy_safe.S | 0
36 files changed, 429 insertions(+), 508 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_safe.S} (98%)
create mode 100644 arch/x86/include/asm/copy_safe.h
create mode 100644 arch/x86/include/asm/copy_safe_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_safe.c
create mode 100644 arch/x86/lib/copy_safe_64.S
create mode 100644 tools/arch/x86/include/asm/copy_safe_test.h
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
rename tools/testing/selftests/powerpc/copyloops/{memcpy_mcsafe_64.S => copy_safe.S} (100%)
base-commit: b8dcd632c06b8706d22934f9bf9bf16a42b1ecc7
[PATCH 1/1] ndctl/namespace: Fix disable-namespace accounting relative to seed devices
by Redhairer Li
Seed namespaces are included in "ndctl disable-namespace all". However,
since the user never "creates" them, it is surprising to see
"disable-namespace" report one more namespace than the number that
have been created. Catch attempts to disable a zero-sized namespace:
Before:
{
"dev":"namespace1.0",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1"
}
{
"dev":"namespace1.1",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.1"
}
{
"dev":"namespace1.2",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.2"
}
disabled 4 namespaces
After:
{
"dev":"namespace1.0",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1"
}
{
"dev":"namespace1.3",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.3"
}
{
"dev":"namespace1.1",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.1"
}
disabled 3 namespaces
Signed-off-by: Redhairer Li <redhairer.li@intel.com>
---
ndctl/namespace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/ndctl/namespace.c b/ndctl/namespace.c
index 0550580..1bfe736 100644
--- a/ndctl/namespace.c
+++ b/ndctl/namespace.c
@@ -2034,6 +2034,7 @@ static int do_xaction_namespace(const char *namespace,
struct ndctl_region *region;
const char *ndns_name;
struct ndctl_bus *bus;
+ unsigned long long size;
*processed = 0;
@@ -2134,7 +2135,8 @@ static int do_xaction_namespace(const char *namespace,
switch (action) {
case ACTION_DISABLE:
rc = ndctl_namespace_disable_safe(ndns);
- if (rc == 0)
+ size = ndctl_namespace_get_size(ndns);
+ if (rc == 0 && size != 0)
(*processed)++;
break;
case ACTION_ENABLE:
--
2.20.1.windows.1
[PATCH v7 0/5] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[2]. Until now these stats were never available nor exposed to
user-space tools like 'ndctl'. This is partly due to the PAPR platform
not having support for ACPI and NFIT. Hence 'ndctl' is unable to query
and report the dimm health status, and a user had no way to determine
the current health status of an NVDIMM.
To overcome this limitation, this patch-set updates the papr_scm kernel
module to query and fetch NVDIMM health stats using the hcalls described
in Ref[2]. These health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods (PDSMs) issued by
libndctl.
These changes, coupled with the proposed ndctl changes located at Ref[4],
should provide a way for the user to retrieve NVDIMM health status
using ndctl.
Below is a sample output using the proposed kernel + ndctl for a PAPR
NVDIMM in an emulation environment:
# ndctl list -DH
[
  {
    "dev":"nmem0",
    "health":{
      "health_state":"fatal",
      "shutdown_state":"dirty"
    }
  }
]
Dimm health report output on a pseries guest lpar with vPMEM or
HMS-based NVDIMMs that are in perfectly healthy condition:
# ndctl list -d nmem0 -H
[
  {
    "dev":"nmem0",
    "health":{
      "health_state":"ok",
      "shutdown_state":"clean"
    }
  }
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_scm_pdsm.h'. Support for
more PDSMs will be added in future.
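For context, ND_CMD_CALL payloads travel inside the existing 'struct
nd_cmd_pkg' envelope from include/uapi/linux/ndctl.h; a PDSM request is
that envelope with the family set to the new PAPR value and the PDSM
header placed in the payload:

    struct nd_cmd_pkg {
	    __u64 nd_family;	/* NVDIMM_FAMILY_PAPR_SCM for PDSMs */
	    __u64 nd_command;	/* e.g. PAPR_SCM_PDSM_HEALTH */
	    __u32 nd_size_in;	/* size of input payload */
	    __u32 nd_size_out;	/* size of output payload */
	    __u32 nd_reserved2[9];	/* reserved, must be zero */
	    __u32 nd_fw_size;	/* OUTPUT: size fw wrote to payload */
	    unsigned char nd_payload[];	/* PDSM header + payload */
    };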
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of the hcall
H_SCM_HEALTH. The second patch exports the kernel symbol seq_buf_printf(),
which is used in subsequent patches to generate sysfs attribute content.
The third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via an NVDIMM sysfs
flag.
The fourth patch deals with implementing support for servicing PDSM
commands in the papr_scm module.
Finally, the last patch implements support for servicing the PDSM
'PAPR_SCM_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
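As a quick illustration of the sysfs side (the exact path is an
assumption based on the 'papr/flags' name discussed in the changelog
below, and the flag strings are examples only):

    # on a pseries guest with an unhealthy NVDIMM
    $ cat /sys/bus/nd/devices/nmem0/papr/flags
    flush_fail dirty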
Changelog:
==========
v6..v7:
* Incorporated various review comments from Mpe. Removed the papr_scm.h
header file and moved its contents to papr_scm.c.
* Added a patch to export seq_buf_printf() [Mpe, Steven Rostedt]
* Split function drc_pmem_query_health() into two functions, one that takes
care of caching and concurrency and another that doesn't.
* Fixed a possibly incorrect way of making a local copy of nvdimm health data.
* Renamed some variables as suggested in the previous review.
* Removed unused macros/defines from papr_scm_pdsm.h
* Updated papr_scm_pdsm.h to remove usage of the __KERNEL__ define.
* Updated papr_scm_pdsm.h to remove the redefinition of the __packed macro.
v5..v6:
* Incorporate review comments from Mpe and Dan Williams.
* Changed the usage of term DSM to PDSM as former conflicted with
usage in ACPI context.
* UAPI updates to remove usage of bool and marking the structs
defined as 'packed'.
* Simplified the health-bitmap handling in papr_scm to use u64
instead of __be64 integers.
* Caching of the health information so reading the dimm-flag file
doesn't result in costly hcalls every time.
* Changed dimm-flag 'save_fail' to 'flush_fail'
* Moved the dimm flag file from 'papr_flags' to 'papr/flags'.
* Added a patch to document H_SCM_HEALTH hcall return values.
* Added sysfs ABI documentation for newly introduce dimm-flag
sysfs file 'papr/flags'
v4..v5:
* Fixed a bug in new implementation of papr_scm_ndctl() that was triggering
a false error condition.
v3..v4:
* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
function named papr_scm_service_dsm() to service DSM requests. [Aneesh]
v2..v3:
* Updated the papr_scm_dsm.h header to be more conformant to general kernel
guidelines for UAPI headers. [Aneesh]
* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
include the case when the NVDIMM is unarmed because it is a vPMEM
NVDIMM. [Aneesh]
v1..v2:
* Restructured the patch-set based on review comments on V1 patch-set to
simplify the patch review. Multiple small patches have been combined into
single patches to reduce cross referencing that was needed in earlier
patch-set. Hence most of the patches in this patch-set are now new. [Aneesh]
* Removed the initial work done for fetching NVDIMM performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]
* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.
References:
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[3] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[4] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v7
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf() to external modules
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-scm | 27 ++
Documentation/powerpc/papr_hcalls.rst | 43 ++-
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 173 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 595 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
--
2.26.2
[PATCH v4 0/4] mm/memory_hotplug: Interface to add driver-managed system ram
by David Hildenbrand
I did some more testing on v3 and found issues with unloading the kmem
module, followed by reconfiguring the namespace.
kexec (via kexec_load()) can currently not properly handle memory added via
dax/kmem, and will have similar issues with virtio-mem. kexec-tools will
currently add all memory to the fixed-up initial firmware memmap. In case
of dax/kmem, this means that - in contrast to a proper reboot - how that
persistent memory will be used can no longer be configured by the kexec'd
kernel. In case of virtio-mem it will be harmful, because that memory
might contain inaccessible pieces that require coordination with hypervisor
first.
In both cases, we want to let the driver in the kexec'd kernel handle
detecting and adding the memory, like during an ordinary reboot.
Introduce add_memory_driver_managed(). More on the semantics is in patch
#1.
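For reference, the interface ends up shaped roughly like this (the
mm/memory_hotplug patch has the authoritative version):

    /*
     * Add driver-managed memory: it shows up as "System RAM ($DRIVER)"
     * in /proc/iomem, is flagged IORESOURCE_MEM_DRIVER_MANAGED, and is
     * kept out of the firmware-provided memmap that kexec passes on.
     */
    int add_memory_driver_managed(int nid, u64 start, u64 size,
		    const char *resource_name);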
In the future, we might want to make this behavior configurable for
dax/kmem- either by configuring it in the kernel (which would then also
allow to configure kexec_file_load()) or in kexec-tools by also adding
"System RAM (kmem)" memory from /proc/iomem to the fixed-up initial
firmware memmap.
More on the motivation can be found in [1] and [2].
v3 -> v4:
- "device-dax: Don't leak kernel memory to user space after unloading kmem"
-- Added
- "device-dax: Add memory via add_memory_driver_managed()"
-- kstrdup_const() the resource name to be used for added memory
-- Remember if any hotremove failed / we still have memory added to the
system and conditionally kfree_const().
v2 -> v3:
- Don't use flags for add_memory() and friends, provide
add_memory_driver_managed() instead.
- Flag memory resources via IORESOURCE_MEM_DRIVER_MANAGED and handle them
in kexec.
- Name memory resources "System RAM ($DRIVER)", visible via /proc/iomem
- Added more details to the patch descriptions, especially regarding the
history of /sys/firmware/memmap
- Add a comment to the device-dax change. Dropped Dave's Ack as the
code changed.
v1 -> v2:
- Don't change the resource name
- Rename the flag to MHP_NO_FIRMWARE_MEMMAP to reflect what it is doing
- Rephrase subjects/descriptions
- Use the flag for dax/kmem
[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com
David Hildenbrand (4):
device-dax: Don't leak kernel memory to user space after unloading
kmem
mm/memory_hotplug: Introduce add_memory_driver_managed()
kexec_file: Don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED
device-dax: Add memory via add_memory_driver_managed()
drivers/dax/dax-private.h | 1 +
drivers/dax/kmem.c | 42 ++++++++++++++++++++---
include/linux/ioport.h | 1 +
include/linux/memory_hotplug.h | 2 ++
kernel/kexec_file.c | 5 +++
mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++---
6 files changed, 104 insertions(+), 9 deletions(-)
--
2.25.4