[ndctl PATCH] ndctl, ars: don't invalidate the user-provided command
by Vishal Verma
In ndctl_cmd_ars_in_progress, which expects a successfully completed
ars_status command, we used to invalidate the command by setting its
status to '1', so that the user has to provide a fresh ars_status
command the next time this check is performed.
In hindsight, this is needless and violates the principle of least
surprise. We shouldn't be touching the user's data (command), so remove
this invalidation. If the user uses the same ars_status command again,
we will simply report that ARS is still in progress.
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
ndctl/lib/ars.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/ndctl/lib/ars.c b/ndctl/lib/ars.c
index b199646..48a9819 100644
--- a/ndctl/lib/ars.c
+++ b/ndctl/lib/ars.c
@@ -224,16 +224,7 @@ NDCTL_EXPORT int ndctl_cmd_ars_in_progress(struct ndctl_cmd *cmd)
if (!validate_ars_stat(ctx, cmd))
return 0;
- if (ndctl_cmd_get_firmware_status(cmd) == 1 << 16) {
- /*
- * If in-progress, invalidate the ndctl_cmd, so
- * that if we're called again without a fresh
- * ars_status command, we fail.
- */
- cmd->status = 1;
- return 1;
- }
- return 0;
+ return (ndctl_cmd_get_firmware_status(cmd) == 1 << 16);
}
NDCTL_EXPORT unsigned int ndctl_cmd_ars_num_records(struct ndctl_cmd *ars_stat)
--
2.17.0
2 years, 9 months
[PATCH v10] mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
by Dan Williams
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
be able to rely on the fact that they will get wakeups on dev_pagemap
page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
generic_dax_page_free() as common indicator / infrastructure for dax
filesytems to require. With this change there are no users of the
MEMORY_DEVICE_HOST designation, so remove it.
The HMM sub-system extended dev_pagemap to arrange a callback when a
dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1 it requires an additional branch to
check the page-type at put_page() time. Given put_page() is a hot-path
we do not want to incur that check if HMM is not in use, so a static
branch is used to avoid that overhead when not necessary.
Now, the FS_DAX implementation wants to reuse this mechanism for
receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
static-key into a generic mechanism that either HMM or FS_DAX code paths
can enable.
For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
However, we still need to support FS_DAX in the FS_DAX_LIMITED case
implemented by the s390/dcssblk driver.
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Reported-by: kbuild test robot <lkp(a)intel.com>
Reported-by: Thomas Meyer <thomas(a)m3y3r.de>
Reported-by: Dave Jiang <dave.jiang(a)intel.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Jan Kara <jack(a)suse.cz>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
This patch replaces and consolidates patch 2 [1] and 4 [2] from the v9
series [3] for "dax: fix dma vs truncate/hole-punch".
The original implementation which introduced fs_dax_claim() was broken
in the presence of partitions as filesystems on partitions of a pmem
device would collide when attempting to issue fs_dax_claim().
Instead, since this new page wakeup behavior is a property of
dev_pagemap pages and there is a 1:1 relationship between a pmem device
and its dev_pagemap instance, make the pmem driver own the page wakeup
initialization rather than the filesystem.
This simplifies the implementation considerably. The diffstat for the
series is now:
21 files changed, 546 insertions(+), 277 deletions(-)
...down from:
24 files changed, 730 insertions(+), 297 deletions(-)
The other patches in the series are not included since they did not
change in any meaningful way. Let me know if anyone wants a full resend,
it will otherwise be available in -next shortly. Given the change in
approach I did not carry Reviewed-by tags from patch 2 and 4 to this
patch.
[1]: [PATCH v9 2/9] mm, dax: enable filesystems to trigger dev_pagemap ->page_free callbacks
https://lists.01.org/pipermail/linux-nvdimm/2018-April/015459.html
[2]: [PATCH v9 4/9] mm, dev_pagemap: introduce CONFIG_DEV_PAGEMAP_OPS
https://lists.01.org/pipermail/linux-nvdimm/2018-April/015461.html
[3]: [PATCH v9 0/9] dax: fix dma vs truncate/hole-punch
https://lists.01.org/pipermail/linux-nvdimm/2018-April/015457.html
drivers/dax/super.c | 18 ++++++++---
drivers/nvdimm/pfn_devs.c | 2 -
drivers/nvdimm/pmem.c | 20 +++++++++++++
fs/Kconfig | 1 +
include/linux/memremap.h | 41 ++++++++++----------------
include/linux/mm.h | 71 ++++++++++++++++++++++++++++++++++-----------
kernel/memremap.c | 37 +++++++++++++++++++++--
mm/Kconfig | 5 +++
mm/hmm.c | 13 +-------
mm/swap.c | 3 +-
10 files changed, 143 insertions(+), 68 deletions(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..4928b7fcfb71 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -134,15 +134,21 @@ int __bdev_dax_supported(struct super_block *sb, int blocksize)
* on being able to do (page_address(pfn_to_page())).
*/
WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
+ return 0;
} else if (pfn_t_devmap(pfn)) {
- /* pass */;
- } else {
- pr_debug("VFS (%s): error: dax support not enabled\n",
- sb->s_id);
- return -EOPNOTSUPP;
+ struct dev_pagemap *pgmap;
+
+ pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
+ if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX) {
+ put_dev_pagemap(pgmap);
+ return 0;
+ }
+ put_dev_pagemap(pgmap);
}
- return 0;
+ pr_debug("VFS (%s): error: dax support not enabled\n",
+ sb->s_id);
+ return -EOPNOTSUPP;
}
EXPORT_SYMBOL_GPL(__bdev_dax_supported);
#endif
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 30b08791597d..3f7ad5bc443e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -561,8 +561,6 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
res->start += start_pad;
res->end -= end_trunc;
- pgmap->type = MEMORY_DEVICE_HOST;
-
if (nd_pfn->mode == PFN_MODE_RAM) {
if (offset < SZ_8K)
return -EINVAL;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..e6d94604e9a4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -294,6 +294,22 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
}
+static void pmem_release_pgmap_ops(void *__pgmap)
+{
+ dev_pagemap_put_ops();
+}
+
+static int setup_pagemap_fsdax(struct device *dev, struct dev_pagemap *pgmap)
+{
+ dev_pagemap_get_ops();
+ if (devm_add_action_or_reset(dev, pmem_release_pgmap_ops, pgmap))
+ return -ENOMEM;
+ pgmap->type = MEMORY_DEVICE_FS_DAX;
+ pgmap->page_free = generic_dax_pagefree;
+
+ return 0;
+}
+
static int pmem_attach_disk(struct device *dev,
struct nd_namespace_common *ndns)
{
@@ -353,6 +369,8 @@ static int pmem_attach_disk(struct device *dev,
pmem->pfn_flags = PFN_DEV;
pmem->pgmap.ref = &q->q_usage_counter;
if (is_nd_pfn(dev)) {
+ if (setup_pagemap_fsdax(dev, &pmem->pgmap))
+ return -ENOMEM;
addr = devm_memremap_pages(dev, &pmem->pgmap);
pfn_sb = nd_pfn->pfn_sb;
pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
@@ -364,6 +382,8 @@ static int pmem_attach_disk(struct device *dev,
} else if (pmem_should_map_pages(dev)) {
memcpy(&pmem->pgmap.res, &nsio->res, sizeof(pmem->pgmap.res));
pmem->pgmap.altmap_valid = false;
+ if (setup_pagemap_fsdax(dev, &pmem->pgmap))
+ return -ENOMEM;
addr = devm_memremap_pages(dev, &pmem->pgmap);
pmem->pfn_flags |= PFN_MAP;
memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
diff --git a/fs/Kconfig b/fs/Kconfig
index bc821a86d965..1e050e012eb9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -38,6 +38,7 @@ config FS_DAX
bool "Direct Access (DAX) support"
depends on MMU
depends on !(ARM || MIPS || SPARC)
+ select DEV_PAGEMAP_OPS if (ZONE_DEVICE && !FS_DAX_LIMITED)
select FS_IOMAP
select DAX
help
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 7b4899c06f49..29ea63544c4d 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -1,7 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_MEMREMAP_H_
#define _LINUX_MEMREMAP_H_
-#include <linux/mm.h>
#include <linux/ioport.h>
#include <linux/percpu-refcount.h>
@@ -30,13 +29,6 @@ struct vmem_altmap {
* Specialize ZONE_DEVICE memory into multiple types each having differents
* usage.
*
- * MEMORY_DEVICE_HOST:
- * Persistent device memory (pmem): struct page might be allocated in different
- * memory and architecture might want to perform special actions. It is similar
- * to regular memory, in that the CPU can access it transparently. However,
- * it is likely to have different bandwidth and latency than regular memory.
- * See Documentation/nvdimm/nvdimm.txt for more information.
- *
* MEMORY_DEVICE_PRIVATE:
* Device memory that is not directly addressable by the CPU: CPU can neither
* read nor write private memory. In this case, we do still have struct pages
@@ -53,11 +45,19 @@ struct vmem_altmap {
* driver can hotplug the device memory using ZONE_DEVICE and with that memory
* type. Any page of a process can be migrated to such memory. However no one
* should be allow to pin such memory so that it can always be evicted.
+ *
+ * MEMORY_DEVICE_FS_DAX:
+ * Host memory that has similar access semantics as System RAM i.e. DMA
+ * coherent and supports page pinning. In support of coordinating page
+ * pinning vs other operations MEMORY_DEVICE_FS_DAX arranges for a
+ * wakeup event whenever a page is unpinned and becomes idle. This
+ * wakeup is used to coordinate physical address space management (ex:
+ * fs truncate/hole punch) vs pinned pages (ex: device dma).
*/
enum memory_type {
- MEMORY_DEVICE_HOST = 0,
- MEMORY_DEVICE_PRIVATE,
+ MEMORY_DEVICE_PRIVATE = 1,
MEMORY_DEVICE_PUBLIC,
+ MEMORY_DEVICE_FS_DAX,
};
/*
@@ -123,15 +123,18 @@ struct dev_pagemap {
};
#ifdef CONFIG_ZONE_DEVICE
+void generic_dax_pagefree(struct page *page, void *data);
void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap);
unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
-
-static inline bool is_zone_device_page(const struct page *page);
#else
+static inline void generic_dax_pagefree(struct page *page, void *data)
+{
+}
+
static inline void *devm_memremap_pages(struct device *dev,
struct dev_pagemap *pgmap)
{
@@ -161,20 +164,6 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
}
#endif /* CONFIG_ZONE_DEVICE */
-#if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
-static inline bool is_device_private_page(const struct page *page)
-{
- return is_zone_device_page(page) &&
- page->pgmap->type == MEMORY_DEVICE_PRIVATE;
-}
-
-static inline bool is_device_public_page(const struct page *page)
-{
- return is_zone_device_page(page) &&
- page->pgmap->type == MEMORY_DEVICE_PUBLIC;
-}
-#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-
static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
{
if (pgmap)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ac1f06a4be6..6e19265ee8f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,27 +821,65 @@ static inline bool is_zone_device_page(const struct page *page)
}
#endif
-#if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
-void put_zone_device_private_or_public_page(struct page *page);
-DECLARE_STATIC_KEY_FALSE(device_private_key);
-#define IS_HMM_ENABLED static_branch_unlikely(&device_private_key)
-static inline bool is_device_private_page(const struct page *page);
-static inline bool is_device_public_page(const struct page *page);
-#else /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-static inline void put_zone_device_private_or_public_page(struct page *page)
+#ifdef CONFIG_DEV_PAGEMAP_OPS
+void dev_pagemap_get_ops(void);
+void dev_pagemap_put_ops(void);
+void __put_devmap_managed_page(struct page *page);
+DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
+static inline bool put_devmap_managed_page(struct page *page)
+{
+ if (!static_branch_unlikely(&devmap_managed_key))
+ return false;
+ if (!is_zone_device_page(page))
+ return false;
+ switch (page->pgmap->type) {
+ case MEMORY_DEVICE_PRIVATE:
+ case MEMORY_DEVICE_PUBLIC:
+ case MEMORY_DEVICE_FS_DAX:
+ __put_devmap_managed_page(page);
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
+static inline bool is_device_private_page(const struct page *page)
{
+ return is_zone_device_page(page) &&
+ page->pgmap->type == MEMORY_DEVICE_PRIVATE;
}
-#define IS_HMM_ENABLED 0
+
+static inline bool is_device_public_page(const struct page *page)
+{
+ return is_zone_device_page(page) &&
+ page->pgmap->type == MEMORY_DEVICE_PUBLIC;
+}
+
+#else /* CONFIG_DEV_PAGEMAP_OPS */
+static inline void dev_pagemap_get_ops(void)
+{
+}
+
+static inline void dev_pagemap_put_ops(void)
+{
+}
+
+static inline bool put_devmap_managed_page(struct page *page)
+{
+ return false;
+}
+
static inline bool is_device_private_page(const struct page *page)
{
return false;
}
+
static inline bool is_device_public_page(const struct page *page)
{
return false;
}
-#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-
+#endif /* CONFIG_DEV_PAGEMAP_OPS */
static inline void get_page(struct page *page)
{
@@ -859,16 +897,13 @@ static inline void put_page(struct page *page)
page = compound_head(page);
/*
- * For private device pages we need to catch refcount transition from
- * 2 to 1, when refcount reach one it means the private device page is
- * free and we need to inform the device driver through callback. See
+ * For devmap managed pages we need to catch refcount transition from
+ * 2 to 1, when refcount reach one it means the page is free and we
+ * need to inform the device driver through callback. See
* include/linux/memremap.h and HMM for details.
*/
- if (IS_HMM_ENABLED && unlikely(is_device_private_page(page) ||
- unlikely(is_device_public_page(page)))) {
- put_zone_device_private_or_public_page(page);
+ if (put_devmap_managed_page(page))
return;
- }
if (put_page_testzero(page))
__put_page(page);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 37a9604133f6..3e546c6beb42 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -9,12 +9,19 @@
#include <linux/memory_hotplug.h>
#include <linux/swap.h>
#include <linux/swapops.h>
+#include <linux/wait_bit.h>
static DEFINE_MUTEX(pgmap_lock);
static RADIX_TREE(pgmap_radix, GFP_KERNEL);
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
+void generic_dax_pagefree(struct page *page, void *data)
+{
+ wake_up_var(&page->_refcount);
+}
+EXPORT_SYMBOL_GPL(generic_dax_pagefree);
+
static unsigned long order_at(struct resource *res, unsigned long pgoff)
{
unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
@@ -301,8 +308,30 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
return pgmap;
}
-#if IS_ENABLED(CONFIG_DEVICE_PRIVATE) || IS_ENABLED(CONFIG_DEVICE_PUBLIC)
-void put_zone_device_private_or_public_page(struct page *page)
+#ifdef CONFIG_DEV_PAGEMAP_OPS
+DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
+EXPORT_SYMBOL_GPL(devmap_managed_key);
+static atomic_t devmap_enable;
+
+/*
+ * Toggle the static key for ->page_free() callbacks when dev_pagemap
+ * pages go idle.
+ */
+void dev_pagemap_get_ops(void)
+{
+ if (atomic_inc_return(&devmap_enable) == 1)
+ static_branch_enable(&devmap_managed_key);
+}
+EXPORT_SYMBOL_GPL(dev_pagemap_get_ops);
+
+void dev_pagemap_put_ops(void)
+{
+ if (atomic_dec_and_test(&devmap_enable))
+ static_branch_disable(&devmap_managed_key);
+}
+EXPORT_SYMBOL_GPL(dev_pagemap_put_ops);
+
+void __put_devmap_managed_page(struct page *page)
{
int count = page_ref_dec_return(page);
@@ -322,5 +351,5 @@ void put_zone_device_private_or_public_page(struct page *page)
} else if (!count)
__put_page(page);
}
-EXPORT_SYMBOL(put_zone_device_private_or_public_page);
-#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
+EXPORT_SYMBOL_GPL(__put_devmap_managed_page);
+#endif /* CONFIG_DEV_PAGEMAP_OPS */
diff --git a/mm/Kconfig b/mm/Kconfig
index d5004d82a1d6..bf9d6366bced 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -692,6 +692,9 @@ config ARCH_HAS_HMM
config MIGRATE_VMA_HELPER
bool
+config DEV_PAGEMAP_OPS
+ bool
+
config HMM
bool
select MIGRATE_VMA_HELPER
@@ -712,6 +715,7 @@ config DEVICE_PRIVATE
bool "Unaddressable device memory (GPU memory, ...)"
depends on ARCH_HAS_HMM
select HMM
+ select DEV_PAGEMAP_OPS
help
Allows creation of struct pages to represent unaddressable device
@@ -722,6 +726,7 @@ config DEVICE_PUBLIC
bool "Addressable device memory (like GPU memory)"
depends on ARCH_HAS_HMM
select HMM
+ select DEV_PAGEMAP_OPS
help
Allows creation of struct pages to represent addressable device
diff --git a/mm/hmm.c b/mm/hmm.c
index 486dc394a5a3..de7b6bf77201 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -35,15 +35,6 @@
#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
-#if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
-/*
- * Device private memory see HMM (Documentation/vm/hmm.txt) or hmm.h
- */
-DEFINE_STATIC_KEY_FALSE(device_private_key);
-EXPORT_SYMBOL(device_private_key);
-#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-
-
#if IS_ENABLED(CONFIG_HMM_MIRROR)
static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
@@ -1167,7 +1158,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
resource_size_t addr;
int ret;
- static_branch_enable(&device_private_key);
+ dev_pagemap_get_ops();
devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem),
GFP_KERNEL, dev_to_node(device));
@@ -1261,7 +1252,7 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
if (res->desc != IORES_DESC_DEVICE_PUBLIC_MEMORY)
return ERR_PTR(-EINVAL);
- static_branch_enable(&device_private_key);
+ dev_pagemap_get_ops();
devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem),
GFP_KERNEL, dev_to_node(device));
diff --git a/mm/swap.c b/mm/swap.c
index 3dd518832096..26fc9b5f1b6c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -29,6 +29,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/backing-dev.h>
+#include <linux/memremap.h>
#include <linux/memcontrol.h>
#include <linux/gfp.h>
#include <linux/uio.h>
@@ -743,7 +744,7 @@ void release_pages(struct page **pages, int nr)
flags);
locked_pgdat = NULL;
}
- put_zone_device_private_or_public_page(page);
+ put_devmap_managed_page(page);
continue;
}
2 years, 9 months
[ndctl PATCH v2] ndctl: refactor validation of the ars_status command
by Vishal Verma
The APIs that iterate over the information contained in an ars_atatus
command require a prior, successfully completed ars_status command
struct. We were neglecting to verify that the firmware status too
indicates a success. We were also incorrectly requiring that
ars_status->status be zero, where as a positive status only indicates an
underrun. The underrun is fine as the firmware is not expected to
completely fill the max_ars_out sized buffer.
Refactor this checking to mimic validate_ars_cap() which checks the
firmware status, and fixes the check for the cmd status. Use this for
ndctl_cmd_ars_in_progress as well which had the same (incorrect)
cmd->status check.
Reported-by: Tomasz Rochumski <tomasz.rochumski(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
ndctl/lib/ars.c | 65 +++++++++++++++++++++++++++++++------------------
1 file changed, 41 insertions(+), 24 deletions(-)
diff --git a/ndctl/lib/ars.c b/ndctl/lib/ars.c
index 1ff6cf7..b199646 100644
--- a/ndctl/lib/ars.c
+++ b/ndctl/lib/ars.c
@@ -195,24 +195,44 @@ NDCTL_EXPORT unsigned int ndctl_cmd_ars_cap_get_clear_unit(
return 0;
}
+static bool __validate_ars_stat(struct ndctl_cmd *ars_stat)
+{
+ /*
+ * A positive status indicates an underrun, but that is fine since
+ * the firmware is not expected to completely fill the max_ars_out
+ * sized buffer.
+ */
+ if (ars_stat->type != ND_CMD_ARS_STATUS || ars_stat->status < 0)
+ return false;
+ if ((ndctl_cmd_get_firmware_status(ars_stat) & ARS_STATUS_MASK) != 0)
+ return false;
+ return true;
+}
+
+#define validate_ars_stat(ctx, ars_stat) \
+({ \
+ bool __valid = __validate_ars_stat(ars_stat); \
+ if (!__valid) \
+ dbg(ctx, "expected sucessfully completed ars_stat command\n"); \
+ __valid; \
+})
+
NDCTL_EXPORT int ndctl_cmd_ars_in_progress(struct ndctl_cmd *cmd)
{
struct ndctl_ctx *ctx = ndctl_bus_get_ctx(cmd_to_bus(cmd));
- if (cmd->type == ND_CMD_ARS_STATUS && cmd->status == 0) {
- if (cmd->ars_status->status == 1 << 16) {
- /*
- * If in-progress, invalidate the ndctl_cmd, so
- * that if we're called again without a fresh
- * ars_status command, we fail.
- */
- cmd->status = 1;
- return 1;
- }
+ if (!validate_ars_stat(ctx, cmd))
return 0;
- }
- dbg(ctx, "invalid ars_status\n");
+ if (ndctl_cmd_get_firmware_status(cmd) == 1 << 16) {
+ /*
+ * If in-progress, invalidate the ndctl_cmd, so
+ * that if we're called again without a fresh
+ * ars_status command, we fail.
+ */
+ cmd->status = 1;
+ return 1;
+ }
return 0;
}
@@ -220,11 +240,10 @@ NDCTL_EXPORT unsigned int ndctl_cmd_ars_num_records(struct ndctl_cmd *ars_stat)
{
struct ndctl_ctx *ctx = ndctl_bus_get_ctx(cmd_to_bus(ars_stat));
- if (ars_stat->type == ND_CMD_ARS_STATUS && ars_stat->status == 0)
- return ars_stat->ars_status->num_records;
+ if (!validate_ars_stat(ctx, ars_stat))
+ return 0;
- dbg(ctx, "invalid ars_status\n");
- return 0;
+ return ars_stat->ars_status->num_records;
}
NDCTL_EXPORT unsigned long long ndctl_cmd_ars_get_record_addr(
@@ -237,11 +256,10 @@ NDCTL_EXPORT unsigned long long ndctl_cmd_ars_get_record_addr(
return 0;
}
- if (ars_stat->type == ND_CMD_ARS_STATUS && ars_stat->status == 0)
- return ars_stat->ars_status->records[rec_index].err_address;
+ if (!validate_ars_stat(ctx, ars_stat))
+ return 0;
- dbg(ctx, "invalid ars_status\n");
- return 0;
+ return ars_stat->ars_status->records[rec_index].err_address;
}
NDCTL_EXPORT unsigned long long ndctl_cmd_ars_get_record_len(
@@ -254,11 +272,10 @@ NDCTL_EXPORT unsigned long long ndctl_cmd_ars_get_record_len(
return 0;
}
- if (ars_stat->type == ND_CMD_ARS_STATUS && ars_stat->status == 0)
- return ars_stat->ars_status->records[rec_index].length;
+ if (!validate_ars_stat(ctx, ars_stat))
+ return 0;
- dbg(ctx, "invalid ars_status\n");
- return 0;
+ return ars_stat->ars_status->records[rec_index].length;
}
NDCTL_EXPORT struct ndctl_cmd *ndctl_bus_cmd_new_clear_error(
--
2.17.0
2 years, 9 months
[PATCH] libnvdimm: debug probe times
by Dan Williams
Instrument nvdimm_bus_probe() to emit timestamps for the start and end
of libnvdimm device probing. This is useful for identifying sources of
libnvdimm sub-system initialization latency.
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/nvdimm/bus.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index b9e0d30e317a..27902a8799b1 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -100,6 +100,9 @@ static int nvdimm_bus_probe(struct device *dev)
if (!try_module_get(provider))
return -ENXIO;
+ dev_dbg(&nvdimm_bus->dev, "START: %s.probe(%s)\n",
+ dev->driver->name, dev_name(dev));
+
nvdimm_bus_probe_start(nvdimm_bus);
rc = nd_drv->probe(dev);
if (rc == 0)
@@ -108,7 +111,7 @@ static int nvdimm_bus_probe(struct device *dev)
nd_region_disable(nvdimm_bus, dev);
nvdimm_bus_probe_end(nvdimm_bus);
- dev_dbg(&nvdimm_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
+ dev_dbg(&nvdimm_bus->dev, "END: %s.probe(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
if (rc != 0)
2 years, 9 months
[PATCH] nvdimm: preserve read-only setting for pmem devices
by Robert Elliott
The pmem driver don't honor a forced read-only setting for very long:
$ blockdev --setro /dev/pmem0
$ blockdev --getro /dev/pmem0
1
followed by various commands like these:
$ blockdev --rereadpt /dev/pmem0
or
$ mkfs.ext4 /dev/pmem0
results in this in the kernel serial log:
nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write
with the read-only setting lost:
$ blockdev --getro /dev/pmem0
0
That's from bus.c nvdimm_revalidate_disk(), which always applies the
setting from nd_region (which is initially based on the ACPI NFIT
NVDIMM state flags not_armed bit).
In contrast, commit 20bd1d026aac ("scsi: sd: Keep disk read-only when
re-reading partition") fixed this issue for SCSI devices to preserve
the previous setting if it was set to read-only.
This patch modifies bus.c to preserve any previous read-only setting.
It also eliminates the kernel serial log print except for cases where
read-write is changed to read-only, so it doesn't print read-only to
read-only non-changes.
Signed-off-by: Robert Elliott <elliott(a)hpe.com>
---
drivers/nvdimm/bus.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a64023690cad..87762afa45b0 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -566,14 +566,17 @@ int nvdimm_revalidate_disk(struct gendisk *disk)
{
struct device *dev = disk_to_dev(disk)->parent;
struct nd_region *nd_region = to_nd_region(dev->parent);
- const char *pol = nd_region->ro ? "only" : "write";
+ int disk_ro = get_disk_ro(disk);
- if (nd_region->ro == get_disk_ro(disk))
+ /* upgrade to read-only if the region is read-only
+ * preserve as read-only if the disk is already read-only
+ */
+ if (disk_ro || nd_region->ro == disk_ro)
return 0;
- dev_info(dev, "%s read-%s, marking %s read-%s\n",
- dev_name(&nd_region->dev), pol, disk->disk_name, pol);
- set_disk_ro(disk, nd_region->ro);
+ dev_info(dev, "%s read-only, marking %s read-only\n",
+ dev_name(&nd_region->dev), disk->disk_name);
+ set_disk_ro(disk, 1);
return 0;
--
2.14.3
2 years, 9 months