block I/O accounting improvements v2
by Christoph Hellwig
Hi Jens,
they series contains various improvement for block I/O accounting. The
first bunch of patches switch the bio based drivers to better accounting
helpers compared to the current mess. The end contains a fix and various
performanc improvements. Most of this comes from a series Konstantin
sent a few weeks ago, rebased on changes that landed in your tree since
and my change to always use the percpu version of the disk stats.
Changes since v1:
- add an ifdef CONFIG_BLOCK to work around the sad state of our headers
- add reviewed-by tags to all patches
2 years, 1 month
Re: [PATCH 01/16] block: add disk/bio-based accounting helpers
by Christoph Hellwig
On Mon, May 25, 2020 at 03:28:07PM +0300, Konstantin Khlebnikov wrote:
> I think it would be better to leave this jiffies legacy nonsense in
> callers and pass here request duration in nanoseconds.
jiffies is what the existing interfaces uses. But now that they come
from the start helper fixing this will become easier. I'll do that
as a follow on, as I'd rather not bloat this series even more.
2 years, 1 month
[ndctl PATCH v4 0/6] Add support for reporting papr-scm nvdimm health
by Vaibhav Jain
Changes since v3 [1]:
* Updated 'papr_scm_psdm.h' to recent kernel version that removes
'payload_offset' field from 'struct nd_pdsm_cmd_pkg'.
* Changes to 'papr_scm.c' to remove code that was populating
'payload_offset'.
[1] https://lore.kernel.org/linux-nvdimm/20200519190147.258142-1-vaibhav@linu...
---
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[2]. The standard defines
machenism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[3]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.
The patch-set introduces support for the new PAPR-SCM PDSM family
defined at [4] via a new dimm-ops named
'papr_scm_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops member for handling initialization of dimm specific data
for specific DSM families.
These changes coupled with proposed kernel changes located at Ref[1] should
provide a way for the user to retrieve NVDIMM health status using ndtcl for
papr_scm guests. Below is a sample output using proposed kernel + ndctl
changes:
# ndctl list -DH
[
{
"dev":"nmem0",
"flag_smart_event":true,
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions one that take care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.
Patch-2 introduces probe function of papr_scm nvdimms and assigning
'papr_scm_dimm_ops' defined in 'papr_scm.c' to 'dimm->ops' if
needed. The patch also code to parse the dimm flags specific to
papr-scm nvdimms
Patch-3 introduces new dimm ops 'dimm_init()' & 'dimm_uninit()' to handle
DSM family specific initialization of 'struct ndctl_dimm'.
Patches-4,5 implements scaffolding to add support for PAPR_SCM PDSM
requests and pull in their definitions from the kernel.
Finally Patch-6 add support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.
References
==========
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[4] "ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods"
https://lore.kernel.org/linux-nvdimm/20200527041244.37821-5-vaibhav@linux...
---
Vaibhav Jain (6):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR_SCM dimm family
libndctl: Introduce new dimm-ops dimm_init() & dimm_uninit()
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
libndctl,papr_scm: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 260 +++++++++++++++++++++++---------
ndctl/lib/papr_scm.c | 301 ++++++++++++++++++++++++++++++++++++++
ndctl/lib/papr_scm_pdsm.h | 175 ++++++++++++++++++++++
ndctl/lib/private.h | 9 ++
ndctl/libndctl.h | 1 +
ndctl/ndctl.h | 1 +
7 files changed, 678 insertions(+), 70 deletions(-)
create mode 100644 ndctl/lib/papr_scm.c
create mode 100644 ndctl/lib/papr_scm_pdsm.h
--
2.26.2
2 years, 1 month
[RESEND PATCH v7 0/5] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[2]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[2]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[4]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_scm_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patches deal with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_SCM_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
Changelog:
==========
Resend:
* Added ack from Steven Rostedt on Patch-2 that exports kernel symbol
seq_buf_printf()
v6..v7:
* Incorporate various review comments from Mpe. Removed papr_scm.h
* Added a patch to export seq_buf_printf() [Mpe, Steven Rostedt]
* header file and moved its contents to papr_scm.c.
* Split function drc_pmem_query_health() into two functions, one that takes
care of caching and concurrency and other one that doesn't.
* Fixed a possible incorrect way to make local copy of nvdimm health data.
* Some variable renames changed as suggested in previous review.
* Removed unused macros/defines from papr_scm_pdsm.h
* Updated papr_scm_pdsm.h to remove usage of __KERNEL__ define.
* Updated papr_scm_pdsm.h to remove redefinition of __packed macro.
v5..v6:
* Incorporate review comments from Mpe and Dan Williams.
* Changed the usage of term DSM to PDSM as former conflicted with
usage in ACPI context.
* UAPI updates to remove usage of bool and marking the structs
defined as 'packed'.
* Simplified the health-bitmap handling in papr_scm to use u64
instead of __be64 integers.
* Caching of the health information so reading the dimm-flag file
doesn't result in costly hcalls everytime.
* Changed dimm-flag 'save_fail' to 'flush_fail'
* Moved the dimm flag file from 'papr_flags' to 'papr/flags'.
* Added a patch to document H_SCM_HEALTH hcall return values.
* Added sysfs ABI documentation for newly introduce dimm-flag
sysfs file 'papr/flags'
v4..v5:
* Fixed a bug in new implementation of papr_scm_ndctl() that was triggering
a false error condition.
v3..v4:
* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
function named papr_scm_service_dsm() to serivice DSM requests. [Aneesh]
v2..v3:
* Updated the papr_scm_dsm.h header to be more confimant general kernel
guidelines for UAPI headers. [Aneesh]
* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
include case when the NVDIMM is unarmed because its a vPMEM
NVDIMM. [Aneesh]
v1..v2:
* Restructured the patch-set based on review comments on V1 patch-set to
simplify the patch review. Multiple small patches have been combined into
single patches to reduce cross referencing that was needed in earlier
patch-set. Hence most of the patches in this patch-set as now new. [Aneesh]
* Removed the initial work done for fetch NVDIMM performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]
* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.
References:
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[3] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[4] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v7
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf() to external modules
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-scm | 27 ++
Documentation/powerpc/papr_hcalls.rst | 43 ++-
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 173 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 595 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
--
2.26.2
2 years, 1 month
[PATCH] nvdimm: fixes to maintainter-entry-profile
by Randy Dunlap
From: Randy Dunlap <rdunlap(a)infradead.org>
Fix punctuation and wording in a few places.
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: linux-doc(a)vger.kernel.org
---
Documentation/nvdimm/maintainer-entry-profile.rst | 14 ++++++------
1 file changed, 7 insertions(+), 7 deletions(-)
--- linux-next-20200521.orig/Documentation/nvdimm/maintainer-entry-profile.rst
+++ linux-next-20200521/Documentation/nvdimm/maintainer-entry-profile.rst
@@ -4,15 +4,15 @@ LIBNVDIMM Maintainer Entry Profile
Overview
--------
The libnvdimm subsystem manages persistent memory across multiple
-architectures. The mailing list, is tracked by patchwork here:
+architectures. The mailing list is tracked by patchwork here:
https://patchwork.kernel.org/project/linux-nvdimm/list/
...and that instance is configured to give feedback to submitters on
patch acceptance and upstream merge. Patches are merged to either the
-'libnvdimm-fixes', or 'libnvdimm-for-next' branch. Those branches are
+'libnvdimm-fixes' or 'libnvdimm-for-next' branch. Those branches are
available here:
https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/
-In general patches can be submitted against the latest -rc, however if
+In general patches can be submitted against the latest -rc; however, if
the incoming code change is dependent on other pending changes then the
patch should be based on the libnvdimm-for-next branch. However, since
persistent memory sits at the intersection of storage and memory there
@@ -35,12 +35,12 @@ getting the test environment set up.
ACPI Device Specific Methods (_DSM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before patches enabling for a new _DSM family will be considered it must
+Before patches enabling a new _DSM family will be considered, it must
be assigned a format-interface-code from the NVDIMM Sub-team of the ACPI
Specification Working Group. In general, the stance of the subsystem is
-to push back on the proliferation of NVDIMM command sets, do strongly
+to push back on the proliferation of NVDIMM command sets, so do strongly
consider implementing support for an existing command set. See
-drivers/acpi/nfit/nfit.h for the set of support command sets.
+drivers/acpi/nfit/nfit.h for the set of supported command sets.
Key Cycle Dates
@@ -48,7 +48,7 @@ Key Cycle Dates
New submissions can be sent at any time, but if they intend to hit the
next merge window they should be sent before -rc4, and ideally
stabilized in the libnvdimm-for-next branch by -rc6. Of course if a
-patch set requires more than 2 weeks of review -rc4 is already too late
+patch set requires more than 2 weeks of review, -rc4 is already too late
and some patches may require multiple development cycles to review.
2 years, 1 month
block I/O accounting improvements
by Christoph Hellwig
Hi Jens,
they series contains various improvement for block I/O accounting. The
first bunch of patches switch the bio based drivers to better accounting
helpers compared to the current mess. The end contains a fix and various
performanc improvements. Most of this comes from a series Konstantin
sent a few weeks ago, rebased on changes that landed in your tree since
and my change to always use the percpu version of the disk stats.
2 years, 1 month
[ndctl PATCH 1/2] ndctl/test: Fix region selection in align.sh
by Vishal Verma
Fix the region filtering/selection for the align.sh test. The current jq
filter falls over if two regions match the criteria because array
elements got flattened to the top level.
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
test/align.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/test/align.sh b/test/align.sh
index 0129255..81d1fbc 100755
--- a/test/align.sh
+++ b/test/align.sh
@@ -34,7 +34,7 @@ is_aligned() {
set -e
trap 'err $LINENO cleanup' ERR
-region=$($NDCTL list -R -b ACPI.NFIT | jq -r '.[] | [select(.available_size == .size)] | .[0].dev')
+region=$($NDCTL list -R -b ACPI.NFIT | jq -r '[.[] | select(.available_size == .size)][0] | .dev')
if [ "x$region" = "xnull" ]; then
unset $region
--
2.26.2
2 years, 1 month
[PATCH] nvdimm/region: always show the 'align' attribute
by Vishal Verma
It is possible that a platform that is capable of 'namespace labels'
comes up without the labels properly initialized. In this case, the
region's 'align' attribute is hidden. Howerver, once the user does
initialize he labels, the 'align' attribute still stays hidden, which is
unexpected.
The sysfs_update_group() API is meant to address this, and could be
called during region probe, but it has entanglements with the device
'lockdep_mutex'. Therefore, simply make the 'align' attribute always
visible. It doesn't matter what it says for label-less namespaces, since
it is not possible to change their allocation anyway.
Cc: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
drivers/nvdimm/region_devs.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..4502f9c4708d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -679,18 +679,8 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
return a->mode;
}
- if (a == &dev_attr_align.attr) {
- int i;
-
- for (i = 0; i < nd_region->ndr_mappings; i++) {
- struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- struct nvdimm *nvdimm = nd_mapping->nvdimm;
-
- if (test_bit(NDD_LABELING, &nvdimm->flags))
- return a->mode;
- }
- return 0;
- }
+ if (a == &dev_attr_align.attr)
+ return a->mode;
if (a != &dev_attr_set_cookie.attr
&& a != &dev_attr_available_size.attr)
--
2.26.2
2 years, 1 month
[PATCH v3 0/2] Renovate memcpy_mcsafe with copy_mc_to_{user, kernel}
by Dan Williams
Changes since v2 [1]:
- Replace the non-descriptive copy_safe() with copy_mc_to_user() and
copy_mc_to_kernel() several code organization cleanups resulted from
this rename, which further proves the point that the name deeply
matters for maintainability in this case. (Linus)
- Do not use copy_user_generic() as the generic x86 backend for
copy_mc_to_user() since the #MC handler explicitly wants the exception
to be trapped by a _ASM_EXTABLE_FAULT handler. (Vivek).
- Rename copy_safe_slow() to copy_mc_fragile() to better indicate what
the implementation is handling. copy_safe_fast() is replaced with
copy_mc_generic().
---
The primary motivation to go touch memcpy_mcsafe() is that the existing
benefit of doing slow "handle with care" copies is obviated on newer
CPUs. With that concern lifted it also obviates the need to continue to
update the MCA-recovery capability detection code currently gated by
"mcsafe_key". Now the old "mcsafe_key" opt-in to perform the copy with
concerns for recovery fragility can instead be made an opt-out from the
default fast copy implementation (enable_copy_mc_fragile()).
The discussion with Linus on the first iteration of this patch
identified that memcpy_mcsafe() was misnamed relative to its usage. The
new names copy_mc_to_user() and copy_mc_to_kernel() clearly indicate the
intended use case and lets the architecture organize the implementation
accordingly.
For both powerpc and x86 a copy_mc_generic() implementation is added as
the backend for these interfaces.
Patches are relative to tip/master.
---
Dan Williams (2):
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user,kernel}()
x86/copy_mc: Introduce copy_mc_generic()
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 40 +++--
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_mc_64.S | 4
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_mc_test.h | 75 +++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ---------
arch/x86/include/asm/string_64.h | 32 ----
arch/x86/include/asm/uaccess_64.h | 35 ++--
arch/x86/kernel/cpu/mce/core.c | 8 -
arch/x86/kernel/quirks.c | 9 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_mc.c | 64 ++++++++
arch/x86/lib/copy_mc_64.S | 165 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 --------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 15 +-
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 9 -
include/linux/uaccess.h | 9 +
include/linux/uio.h | 10 +
lib/Kconfig | 7 +
lib/iov_iter.c | 43 +++--
tools/arch/x86/include/asm/copy_safe_test.h | 13 ++
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 --------------
tools/objtool/check.c | 5 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ---
tools/testing/nvdimm/test/nfit.c | 48 +++---
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../testing/selftests/powerpc/copyloops/copy_mc.S | 0
36 files changed, 460 insertions(+), 522 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_mc_64.S} (98%)
create mode 100644 arch/x86/include/asm/copy_mc_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_mc.c
create mode 100644 arch/x86/lib/copy_mc_64.S
create mode 100644 tools/arch/x86/include/asm/copy_safe_test.h
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
rename tools/testing/selftests/powerpc/copyloops/{memcpy_mcsafe_64.S => copy_mc.S} (100%)
base-commit: bba413deb1065f1291cb1f366247513f11215520
2 years, 1 month