[PATCH v2 0/3] Maintainer Entry Profiles
by Dan Williams
Changes since v1 [1]:
- Simplify the profile to a hopefully non-controversial set of
attributes that address the most common sources of contributor
confusion, or maintainer frustration.
- Rename "Subsystem Profile" to "Maintainer Entry Profile". Not every
entry in MAINTAINERS represents a full subsystem. There may be driver
local considerations to communicate to a submitter in addition to wider
subsystem guidelines.
- Delete the old P: tag in MAINTAINERS rather than convert to a new E:
tag (Joe Perches).
[1]: http://lore.kernel.org/r/154225759358.2499188.15268218778137905050.stgit@...
---
At last year's Plumbers Conference I proposed the Maintainer Entry
Profile as a document that a maintainer can provide to set contributor
expectations and provide fodder for a discussion between maintainers
about the merits of different maintainer policies.
For those that did not attend, the goal of the Maintainer Entry Profile,
and the Maintainer Handbook more generally, is to provide a desk
reference for maintainers both new and experienced. The session
introduction was:
The first rule of kernel maintenance is that there are no hard and
fast rules. That state of affairs is both a blessing and a curse. It
has served the community well to be adaptable to the different
people and different problem spaces that inhabit the kernel
community. However, that variability also leads to inconsistent
experiences for contributors, little to no guidance for new
contributors, and unnecessary stress on current maintainers. There
are quite a few people who have been around long enough to make
enough mistakes that they have gained some hard-earned proficiency.
However, if the kernel community expects to keep growing it needs to
be able to both scale the maintainers it has and ramp up new ones
without necessarily letting them make a decade's worth of mistakes to
learn the ropes.
To be clear, the proposed document does not impose or suggest new
rules. Instead it provides an outlet to document the unwritten rules
and policies in effect for each subsystem, and to acknowledge that
each subsystem might decide differently for whatever reason.
---
Dan Williams (3):
MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile
Maintainer Handbook: Maintainer Entry Profile
libnvdimm, MAINTAINERS: Maintainer Entry Profile
Documentation/maintainer/index.rst | 1
.../maintainer/maintainer-entry-profile.rst | 99 ++++++++++++++++++++
Documentation/nvdimm/maintainer-entry-profile.rst | 64 +++++++++++++
MAINTAINERS | 20 ++--
4 files changed, 175 insertions(+), 9 deletions(-)
create mode 100644 Documentation/maintainer/maintainer-entry-profile.rst
create mode 100644 Documentation/nvdimm/maintainer-entry-profile.rst
[PATCH v6 0/6] dax/pmem: Provide a dax operation to zero page range
by Vivek Goyal
Hi,
This is V6 of patches. These patches are also available at.
Changes since V5:
- Dan Williams preferred ->zero_page_range() to only accept PAGE_SIZE
aligned requests and clear poison only on page size aligned zeroing. So
I changed it accordingly.
- Dropped all the modifications which were required to support arbitrary
range zeroing within a page.
- This patch series also fixes an issue: "truncate -s 512 foo.txt"
will now fail if the first sector of the file is poisoned. Currently it
succeeds, although the filesystem expects the whole of the filesystem
block to be free of poison at the end of the operation.
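The alignment rule from the first bullet can be illustrated with a small
userspace check. This is only a sketch under stated assumptions:
zero_range_is_page_aligned() and the 4096-byte SKETCH_PAGE_SIZE are
hypothetical names chosen for illustration and are not taken from the
series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: true when a zeroing request has a non-zero length
 * and both its start offset and its length are multiples of the page size.
 */
static bool zero_range_is_page_aligned(uint64_t offset, uint64_t len)
{
	if (len == 0)
		return false;
	return (offset % SKETCH_PAGE_SIZE) == 0 &&
	       (len % SKETCH_PAGE_SIZE) == 0;
}
```

Under this check, zeroing a whole page at offset 0 is an aligned request,
whereas zeroing only the first 512 bytes of a page is not.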
Christoph, I have dropped your Reviewed-by tag on 1-2 patches because
these patches changed substantially, especially the signature of the
dax zero_page_range() helper.
Thanks
Vivek
Vivek Goyal (6):
pmem: Add functions for reading/writing page to/from pmem
dax, pmem: Add a dax operation zero_page_range
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dm,dax: Add dax zero_page_range operation
dax: Use new dax zero page method for zeroing a page
dax,iomap: Add helper dax_iomap_zero() to zero a range
drivers/dax/super.c | 20 ++++++++
drivers/md/dm-linear.c | 18 +++++++
drivers/md/dm-log-writes.c | 17 ++++++
drivers/md/dm-stripe.c | 23 +++++++++
drivers/md/dm.c | 30 +++++++++++
drivers/nvdimm/pmem.c | 97 ++++++++++++++++++++++-------------
drivers/s390/block/dcssblk.c | 15 ++++++
fs/dax.c | 59 ++++++++++-----------
fs/iomap/buffered-io.c | 9 +---
include/linux/dax.h | 21 +++-----
include/linux/device-mapper.h | 3 ++
11 files changed, 221 insertions(+), 91 deletions(-)
--
2.20.1
[ndctl PATCH] monitor: Add epoll timeout for forcing a full dimm health check
by Vaibhav Jain
This patch adds a new argument to the 'monitor' command, namely
'--check-interval', that triggers a call to notify_dimm_event() at
regular intervals, forcing a periodic check of dimm smart events.
This behavior is useful for dimms that do not support event notifications
when the health status of an nvdimm changes. This is especially
true for PAPR-SCM dimms, as the PHYP hypervisor doesn't provide
any notifications to the guest kernel on a change in nvdimm health
status. In such cases periodic polling is the only way to track
the health of an nvdimm.
The patch updates monitor_event(), adding a timeout value to the
epoll_wait() call. Also, to prevent a single dimm from generating
enough events to starve the health check of other nvdimms, a
'fullpoll_ts' time-stamp is added to keep track of when the last full
health check of all dimms happened. If, after epoll_wait() returns,
the 'fullpoll_ts' time-stamp indicates that the last full dimm health
check happened more than 'check-interval' seconds ago, then a full dimm
health check is enforced.
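The interval logic described above can be sketched in isolation as
follows. This is a minimal illustration, not the code from the patch:
full_poll_due() and epoll_timeout_ms() are hypothetical helpers, and the
sketch assumes that an interval of 0 means periodic checks are disabled.
Elapsed time is computed as the later time-stamp minus the earlier one.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the patch): report whether more than
 * 'interval' seconds have elapsed between 'last' and 'now'. An interval
 * of 0 is treated as "periodic checks disabled".
 */
static bool full_poll_due(const struct timespec *last,
			  const struct timespec *now,
			  unsigned int interval)
{
	if (interval == 0)
		return false;
	/* Elapsed time is the later stamp minus the earlier one. */
	return (now->tv_sec - last->tv_sec) > (time_t)interval;
}

/*
 * Convert the interval in seconds to the millisecond timeout that
 * epoll_wait() expects; -1 makes epoll_wait() block indefinitely.
 */
static int epoll_timeout_ms(unsigned int interval)
{
	return interval ? (int)(interval * 1000) : -1;
}
```

A caller would pass epoll_timeout_ms() as the last argument of
epoll_wait() and consult full_poll_due() after each return to decide
whether to walk the whole dimm list.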
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
---
Documentation/ndctl/ndctl-monitor.txt | 4 ++++
ndctl/monitor.c | 31 ++++++++++++++++++++++++---
2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/Documentation/ndctl/ndctl-monitor.txt b/Documentation/ndctl/ndctl-monitor.txt
index 2239f047266d..14cc59d57157 100644
--- a/Documentation/ndctl/ndctl-monitor.txt
+++ b/Documentation/ndctl/ndctl-monitor.txt
@@ -108,6 +108,10 @@ will not work if "--daemon" is specified.
The monitor will attempt to enable the alarm control bits for all
specified events.
+-i::
+--check-interval=::
+ Force a recheck of dimm health every <n> seconds.
+
-u::
--human::
Output monitor notification as human friendly json format instead
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 1755b87a5eeb..b72c5852524e 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -4,6 +4,7 @@
#include <stdio.h>
#include <json-c/json.h>
#include <libgen.h>
+#include <time.h>
#include <dirent.h>
#include <util/json.h>
#include <util/filter.h>
@@ -33,6 +34,7 @@ static struct monitor {
bool daemon;
bool human;
bool verbose;
+ unsigned int poll_timeout;
unsigned int event_flags;
struct log_ctx ctx;
} monitor;
@@ -322,9 +324,14 @@ static int monitor_event(struct ndctl_ctx *ctx,
struct monitor_filter_arg *mfa)
{
struct epoll_event ev, *events;
- int nfds, epollfd, i, rc = 0;
+ int nfds, epollfd, i, rc = 0, polltimeout = -1;
struct monitor_dimm *mdimm;
char buf;
+ /* last time a full poll happened */
+ struct timespec fullpoll_ts, ts;
+
+ if (monitor.poll_timeout)
+ polltimeout = monitor.poll_timeout * 1000;
events = calloc(mfa->num_dimm, sizeof(struct epoll_event));
if (!events) {
@@ -354,14 +361,30 @@ static int monitor_event(struct ndctl_ctx *ctx,
}
}
+ clock_gettime(CLOCK_BOOTTIME, &fullpoll_ts);
while (1) {
did_fail = 0;
- nfds = epoll_wait(epollfd, events, mfa->num_dimm, -1);
- if (nfds <= 0 && errno != EINTR) {
+ nfds = epoll_wait(epollfd, events, mfa->num_dimm, polltimeout);
+ if (nfds < 0 && errno != EINTR) {
err(&monitor, "epoll_wait error: (%s)\n", strerror(errno));
rc = -errno;
goto out;
}
+
+ /* If needed force a full poll of dimm health */
+ clock_gettime(CLOCK_BOOTTIME, &ts);
+ if ((fullpoll_ts.tv_sec - ts.tv_sec) > monitor.poll_timeout) {
+ nfds = 0;
+ dbg(&monitor, "forcing a full poll\n");
+ }
+
+ /* If we timed out then fill events array with all dimms */
+ if (nfds == 0) {
+ list_for_each(&mfa->dimms, mdimm, list)
+ events[nfds++].data.ptr = mdimm;
+ fullpoll_ts = ts;
+ }
+
for (i = 0; i < nfds; i++) {
mdimm = events[i].data.ptr;
if (util_dimm_event_filter(mdimm, monitor.event_flags)) {
@@ -570,6 +593,8 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx)
"use human friendly output formats"),
OPT_BOOLEAN('v', "verbose", &monitor.verbose,
"emit extra debug messages to log"),
+ OPT_UINTEGER('i', "check-interval", &monitor.poll_timeout,
+ "force a dimm health recheck every <n> seconds"),
OPT_END(),
};
const char * const u[] = {
--
2.24.1
[PATCH] tools/test/nvdimm: Fix out of tree build
by Santosh Sivaraj
Out of tree build using
make M=tools/test/nvdimm O=/tmp/build -C /tmp/build
fails with the following error
make: Entering directory '/tmp/build'
CC [M] tools/testing/nvdimm/test/nfit.o
linux/tools/testing/nvdimm/test/nfit.c:19:10: fatal error: nd-core.h: No such file or directory
19 | #include <nd-core.h>
| ^~~~~~~~~~~
compilation terminated.
That is because the Kbuild file uses $(src), which points to
tools/testing/nvdimm, whereas $(srctree) correctly points to the root
of the Linux source tree.
Reported-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Signed-off-by: Santosh Sivaraj <santosh(a)fossix.org>
---
tools/testing/nvdimm/Kbuild | 4 ++--
tools/testing/nvdimm/test/Kbuild | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 6aca8d5be159..0615fa3d9f7e 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -22,8 +22,8 @@ DRIVERS := ../../../drivers
NVDIMM_SRC := $(DRIVERS)/nvdimm
ACPI_SRC := $(DRIVERS)/acpi/nfit
DAX_SRC := $(DRIVERS)/dax
-ccflags-y := -I$(src)/$(NVDIMM_SRC)/
-ccflags-y += -I$(src)/$(ACPI_SRC)/
+ccflags-y := -I$(srctree)/drivers/nvdimm/
+ccflags-y += -I$(srctree)/drivers/acpi/nfit/
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
diff --git a/tools/testing/nvdimm/test/Kbuild b/tools/testing/nvdimm/test/Kbuild
index fb3c3d7cdb9b..75baebf8f4ba 100644
--- a/tools/testing/nvdimm/test/Kbuild
+++ b/tools/testing/nvdimm/test/Kbuild
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-ccflags-y := -I$(src)/../../../../drivers/nvdimm/
-ccflags-y += -I$(src)/../../../../drivers/acpi/nfit/
+ccflags-y := -I$(srctree)/drivers/nvdimm/
+ccflags-y += -I$(srctree)/drivers/acpi/nfit/
obj-m += nfit_test.o
obj-m += nfit_test_iomap.o
--
2.24.1
[PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
by Aneesh Kumar K.V
Currently, the kernel shows the values below
"persistence_domain":"cpu_cache"
"persistence_domain":"memory_controller"
"persistence_domain":"unknown"
"cpu_cache" indicates no extra instructions are needed to ensure the persistence
of data in the pmem media on power failure.
"memory_controller" indicates platform provided instructions need to be issued
as per the documented sequence to make sure data gets flushed so that it is
guaranteed to be on pmem media in case of system power loss.
Based on the above, use memory_controller for non-volatile regions on ppc64.
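The mapping from region flags to the reported string can be sketched as
below. This is a userspace illustration under stated assumptions, not the
kernel's implementation: persistence_domain_name() is a hypothetical
helper, the bit number 2 for the memory-controller flag matches the
ND_REGION_PERSIST_MEMCTRL value visible in this patch, and the bit number
1 for the cpu-cache flag is an assumption.

```c
/*
 * Bit numbers for region flags. The value 2 mirrors
 * ND_REGION_PERSIST_MEMCTRL from the patch context; the value 1 for the
 * cpu-cache flag is an assumption made for this sketch.
 */
enum {
	SKETCH_PERSIST_CACHE = 1,
	SKETCH_PERSIST_MEMCTRL = 2,
};

/* Hypothetical helper: pick the persistence_domain string for a region. */
static const char *persistence_domain_name(unsigned long flags)
{
	if (flags & (1UL << SKETCH_PERSIST_CACHE))
		return "cpu_cache";
	if (flags & (1UL << SKETCH_PERSIST_MEMCTRL))
		return "memory_controller";
	return "unknown";
}
```

With neither bit set the sketch reports "unknown", which is the case this
patch avoids for non-volatile regions on ppc64 by setting the
memory-controller flag.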
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
---
arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
drivers/nvdimm/of_pmem.c | 4 +++-
include/linux/libnvdimm.h | 1 -
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 7525635a8536..ffcd0d7a867c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
- else
+ else {
+ /*
+ * We need to flush things correctly to guarantee persistence
+ */
+ set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
+ }
if (!p->region) {
dev_err(dev, "Error registering region %pR from %pOF\n",
ndr_desc.res, p->dn);
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 8224d1431ea9..6826a274a1f1 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
if (is_volatile)
region = nvdimm_volatile_region_create(bus, &ndr_desc);
- else
+ else {
+ set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
region = nvdimm_pmem_region_create(bus, &ndr_desc);
+ }
if (!region)
dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0f366706b0aa..771d888a5ed7 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -54,7 +54,6 @@ enum {
/*
* Platform provides mechanisms to automatically flush outstanding
* write data from memory controler to pmem on system power loss.
- * (ADR)
*/
ND_REGION_PERSIST_MEMCTRL = 2,
--
2.24.1
[ndctl PATCH 00/36] Multiple topics / backlog for v68
by Dan Williams
Changes from review:
- Add NDCTL_LIST_LINT to not regress list output (Jeff)
- Add kernel-doc description for ndctl_region_set_align() (Jeff)
---
About half of these have been posted previously, but have been reworked
and revised as they have percolated in my tree relative to other
arriving features. Yes, it is quite a lot to ingest at once, but given
the interdependencies and need to catch up I decided to post it all
together.
The recommendation on how to review is to start with the new tests;
those introduce some new commands, and those new commands introduce some
new library routines. The rest are miscellaneous updates, fixes, and
cleanups.
New tests:
----------
dm.sh: When touching the kernel's core dax infrastructure there are
simple bugs that can be caught by simply firing up a device-mapper dax
configuration.
track-uuid.sh: The kernel bug that led to the kernel garbage
collecting the wrong label required a specific namespace
manipulation sequence. Record and replay that sequence against future
kernel updates.
sub-section.sh: Sub-section support adds machinery to add/remove
namespaces on sub-128M boundaries. Force these section collisions and
validate the kernel tracks the mappings.
align.sh: The kernel has grown the ability to align namespace capacity
provisioning at the region level. Since the sub-section changes it has
stopped creating namespaces with non-zero start_pad, and now it also
needs to enforce up to a 16M minimum alignment on PowerPC. These
alignment requirements / checks need to be balanced against not
regressing currently defined namespaces. Given the kernel will stop
generating problematic configurations by default, a tool was needed to
regression test old configs. The new read-infoblock and write-infoblock
commands are added to support that testing and debug.
New commands:
-------------
ndctl read-infoblock: A complement to read-labels, it can validate and
dump the known infoblock formats in json format. It can also dump the
raw binary infoblock image from a namespace for backup purposes.
ndctl write-infoblock: Generate an fsdax, or devdax infoblock from
passed in parameters. It currently does not support generating btt
infoblocks since that support would also want btt-arena generation
support... maybe in a future update.
New library apis:
-----------------
ndctl_namespace_get_target_node:
ndctl_region_get_target_node: The target_node attribute indicates the
numa node that this memory range would instantiate / join if it were
hot-added via the dax_kmem driver. The target_node is also used to query
platform firmware performance data.
ndctl_region_get_align
ndctl_region_set_align: Support the new alignment settings at the region
level. This augments namespace creation to provision aligned spans of dpa
(device-physical-address).
New command options:
--------------------
ndctl create-namespace: --force: By default ndctl will disallow sub-16M
aligned dax-mode namespace creation to encourage cross-arch
compatibility. However, non-x86 supports 2M aligned namespaces, and
--force will bypass the 16M alignment check.
ndctl create-namespace: --no-autorecover: For debug cases the automatic
cleanup of namespace creation failures destroys some forensic details.
Exit without cleanup when this option is specified.
ndctl list: --configured: The --idle option can be used to enumerate
namespaces that failed to initialize, but the output is cluttered with
seed devices. The --configured option limits the listing to any
namespace that has capacity allocated.
ndctl list: NDCTL_LIST_LINT (environment variable): Allow environments
to opt-in to 'ndctl list' fixes that structurally change the output
format. This prevents ndctl invalidating scripts that are dependent on
the buggy output.
Fixes:
------
- Multiple fixups to create-namespace error reporting
- Require 16M minimum size for any non-raw namespace mode
- Fix destruction of tpm.handle in test/security.sh
- Fix ndctl_namespace_get_resource() to return the updated resource base
after ndctl_namespace_set_size()
- Fix the warning spew from taking the address of a packed structure
member.
---
Dan Williams (36):
ndctl/list: Add 'target_node' to region and namespace verbose listings
ndctl/docs: Fix mailing list sign-up link
ndctl/list: Drop named list objects from verbose listing
daxctl/list: Avoid memory operations without resource data
ndctl/build: Fix distcheck
ndctl/namespace: Fix destroy-namespace accounting relative to seed devices
ndctl/region: Support ndctl_region_{get,set}_align()
ndctl/namespace: Emit better errors on failure
ndctl/namespace: Check for region alignment violations
ndctl/util: Up-level is_power_of_2() and introduce IS_ALIGNED
ndctl/namespace: Validate resource alignment for dax-mode namespaces
ndctl/namespace: Add read-infoblock command
ndctl/test: Update dax-dev to handle multiple e820 ranges
ndctl/namespace: Always zero info-blocks
ndctl/namespace: Disable autorecovery of create-namespace failures
ndctl/build: Fix EXTRA_DIST already defined errors
ndctl/test: Checkout device-mapper + dax operation
ndctl/test: Exercise sub-section sized namespace creation/deletion
ndctl/namespace: Kill off the legacy mode names
ndctl/namespace: Introduce mode-to-name and name-to-mode helpers
ndctl/namespace: Validate namespace size within validate_namespace_options()
ndctl/namespace: Clarify 16M minimum size requirement
ndctl/test: Regression test 'failed to track'
ndctl/dimm: Rework dimm command status reporting
ndctl/dimm: Rework iteration to drop unaligned pointers
ndctl/test: Fix typos / loss of tpm.handle in security test
ndctl/test: Relax dax_pmem_compat requirement
ndctl/namespace: Fix namespace-action vs namespace-mode confusion
ndctl/namespace: Update 'pfn' infoblock definition
ndctl/util: Return 0 for NULL arguments to parse_size64()
ndctl/namespace: Fix read-info-block vs read-infoblock
ndctl/namespace: Parse infoblocks from stdin
ndctl/namespace: Add write-infoblock command
ndctl/list: Add option to list configured + disabled namespaces
ndctl/lib/namespace: Fix resource retrieval after size change
ndctl/test: Regression test misaligned namespaces
CONTRIBUTING.md | 2
Documentation/ndctl/Makefile.am | 4
Documentation/ndctl/ndctl-create-namespace.txt | 17
Documentation/ndctl/ndctl-list.txt | 64 +
Documentation/ndctl/ndctl-read-infoblock.txt | 94 ++
Documentation/ndctl/ndctl-write-infoblock.txt | 132 +++
ndctl/action.h | 2
ndctl/builtin.h | 2
ndctl/check.c | 20
ndctl/lib/ars.c | 34 +
ndctl/lib/hpe1.c | 17
ndctl/lib/hyperv.c | 7
ndctl/lib/intel.c | 56 +
ndctl/lib/libndctl.c | 191 ++++
ndctl/lib/libndctl.sym | 4
ndctl/lib/msft.c | 8
ndctl/lib/nfit.c | 36 +
ndctl/lib/private.h | 16
ndctl/libndctl-nfit.h | 11
ndctl/libndctl.h | 5
ndctl/list.c | 59 +
ndctl/namespace.c | 1050 +++++++++++++++++++++---
ndctl/namespace.h | 61 +
ndctl/ndctl.c | 2
test/Makefile.am | 14
test/align.sh | 118 +++
test/blk-exhaust.sh | 2
test/blk_namespaces.c | 1
test/btt-check.sh | 2
test/btt-errors.sh | 7
test/btt-pad-compat.sh | 5
test/clear.sh | 2
test/core.c | 8
test/create.sh | 2
test/dax-dev.c | 17
test/dax.sh | 2
test/daxctl-devices.sh | 2
test/daxdev-errors.sh | 2
test/device-dax-fio.sh | 2
test/dm.sh | 75 ++
test/dpa-alloc.c | 10
test/dsm-fail.c | 5
test/firmware-update.sh | 2
test/inject-error.sh | 2
test/inject-smart.sh | 2
test/label-compat.sh | 5
test/libndctl.c | 16
test/max_available_extent_ns.sh | 2
test/mmap.sh | 2
test/monitor.sh | 2
test/multi-dax.sh | 2
test/parent-uuid.c | 1
test/pfn-meta-errors.sh | 2
test/pmem-errors.sh | 11
test/rescan-partitions.sh | 2
test/sector-mode.sh | 2
test/security.sh | 6
test/sub-section.sh | 77 ++
test/track-uuid.sh | 40 +
util/filter.c | 46 +
util/filter.h | 3
util/fletcher.h | 1
util/json.c | 18
util/json.h | 1
util/size.c | 2
util/size.h | 8
66 files changed, 2112 insertions(+), 313 deletions(-)
create mode 100644 Documentation/ndctl/ndctl-read-infoblock.txt
create mode 100644 Documentation/ndctl/ndctl-write-infoblock.txt
create mode 100755 test/align.sh
create mode 100755 test/dm.sh
create mode 100755 test/sub-section.sh
create mode 100755 test/track-uuid.sh
base-commit: 56f4a91b51b532fcdb8b44ace422dce48ed27c7d
[PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices
by Alastair D'Silva
From: Alastair D'Silva <alastair(a)d-silva.org>
This series adds support for OpenCAPI Persistent Memory devices, exposing
them as nvdimms so that we can make use of the existing infrastructure.
Alastair D'Silva (27):
powerpc: Add OPAL calls for LPC memory alloc/release
mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called
from drivers
powerpc: Map & release OpenCAPI LPC memory
ocxl: Remove unnecessary externs
ocxl: Address kernel doc errors & warnings
ocxl: Tally up the LPC memory on a link & allow it to be mapped
ocxl: Add functions to map/unmap LPC memory
ocxl: Emit a log message showing how much LPC memory was detected
ocxl: Save the device serial number in ocxl_fn
powerpc: Add driver for OpenCAPI Persistent Memory
powerpc: Enable the OpenCAPI Persistent Memory driver for
powernv_defconfig
powerpc/powernv/pmem: Add register addresses & status values to the
header
powerpc/powernv/pmem: Read the capability registers & wait for device
ready
powerpc/powernv/pmem: Add support for Admin commands
powerpc/powernv/pmem: Add support for near storage commands
powerpc/powernv/pmem: Register a character device for userspace to
interact with
powerpc/powernv/pmem: Implement the Read Error Log command
powerpc/powernv/pmem: Add controller dump IOCTLs
powerpc/powernv/pmem: Add an IOCTL to report controller statistics
powerpc/powernv/pmem: Forward events to userspace
powerpc/powernv/pmem: Add an IOCTL to request controller health & perf
data
powerpc/powernv/pmem: Implement the heartbeat command
powerpc/powernv/pmem: Add debug IOCTLs
powerpc/powernv/pmem: Expose SMART data via ndctl
powerpc/powernv/pmem: Expose the serial number in sysfs
powerpc/powernv/pmem: Expose the firmware version in sysfs
MAINTAINERS: Add myself & nvdimm/ocxl to ocxl
MAINTAINERS | 3 +
arch/powerpc/configs/powernv_defconfig | 5 +
arch/powerpc/include/asm/opal-api.h | 2 +
arch/powerpc/include/asm/opal.h | 3 +
arch/powerpc/include/asm/pnv-ocxl.h | 40 +-
arch/powerpc/platforms/powernv/Kconfig | 3 +
arch/powerpc/platforms/powernv/Makefile | 1 +
arch/powerpc/platforms/powernv/ocxl.c | 43 +
arch/powerpc/platforms/powernv/opal-call.c | 2 +
arch/powerpc/platforms/powernv/pmem/Kconfig | 21 +
arch/powerpc/platforms/powernv/pmem/Makefile | 7 +
arch/powerpc/platforms/powernv/pmem/ocxl.c | 1991 +++++++++++++++++
.../platforms/powernv/pmem/ocxl_internal.c | 213 ++
.../platforms/powernv/pmem/ocxl_internal.h | 254 +++
.../platforms/powernv/pmem/ocxl_sysfs.c | 46 +
drivers/misc/ocxl/config.c | 74 +-
drivers/misc/ocxl/core.c | 61 +
drivers/misc/ocxl/link.c | 53 +
drivers/misc/ocxl/ocxl_internal.h | 45 +-
include/linux/memory_hotplug.h | 5 +
include/misc/ocxl.h | 122 +-
include/uapi/linux/ndctl.h | 1 +
include/uapi/nvdimm/ocxl-pmem.h | 127 ++
mm/memory_hotplug.c | 4 +-
24 files changed, 3029 insertions(+), 97 deletions(-)
create mode 100644 arch/powerpc/platforms/powernv/pmem/Kconfig
create mode 100644 arch/powerpc/platforms/powernv/pmem/Makefile
create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl.c
create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl_internal.c
create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl_internal.h
create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl_sysfs.c
create mode 100644 include/uapi/nvdimm/ocxl-pmem.h
--
2.24.1
[PATCH] block: refactor duplicated macros
by Matteo Croce
The macros PAGE_SECTORS, PAGE_SECTORS_SHIFT and SECTOR_MASK are defined
several times in different flavours across the whole tree.
Define them just once in a common header.
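As an illustration of what the three macros compute, the sketch below
reproduces them in userspace in the form used by the removed per-driver
copies. The PAGE_SHIFT and SECTOR_SHIFT values (4 KiB pages, 512-byte
sectors) are assumptions for the example, and the two helper functions
are hypothetical names, not part of the patch.

```c
/* Assumed values for a userspace illustration: 4 KiB pages, 512-byte sectors. */
#define PAGE_SHIFT		12
#define SECTOR_SHIFT		9

/* The three macros named above, in the form used by the removed copies. */
#define PAGE_SECTORS_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
#define PAGE_SECTORS		(1 << PAGE_SECTORS_SHIFT)
#define SECTOR_MASK		(PAGE_SECTORS - 1)

/* Hypothetical helper: index of the page that contains 'sector'. */
static unsigned long long sector_to_page_index(unsigned long long sector)
{
	return sector >> PAGE_SECTORS_SHIFT;
}

/* Hypothetical helper: byte offset of 'sector' within its page. */
static unsigned long long sector_to_page_offset(unsigned long long sector)
{
	return (sector & SECTOR_MASK) << SECTOR_SHIFT;
}
```

With these assumed values a page holds 8 sectors, so sector 13 falls in
page 1 at byte offset 2560, which is the same index/offset split the zram
hunks in this patch perform.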
Signed-off-by: Matteo Croce <mcroce(a)redhat.com>
---
block/blk-lib.c | 2 +-
drivers/block/brd.c | 3 ---
drivers/block/null_blk_main.c | 4 ----
drivers/block/zram/zram_drv.c | 8 ++++----
drivers/block/zram/zram_drv.h | 2 --
drivers/dax/super.c | 2 +-
drivers/md/bcache/util.h | 2 --
drivers/md/dm-bufio.c | 6 +++---
drivers/md/dm-integrity.c | 10 +++++-----
drivers/md/md.c | 4 ++--
drivers/md/raid1.c | 2 +-
drivers/mmc/core/host.c | 3 ++-
drivers/scsi/xen-scsifront.c | 4 ++--
fs/iomap/buffered-io.c | 2 +-
fs/nfs/blocklayout/blocklayout.h | 2 --
include/linux/blkdev.h | 4 ++++
include/linux/device-mapper.h | 1 -
17 files changed, 26 insertions(+), 35 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..f5e705d307e0 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -260,7 +260,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
*/
static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
{
- sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SIZE / 512);
+ sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SECTORS);
return min(pages, (sector_t)BIO_MAX_PAGES);
}
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 220c5e18aba0..33e2cbe11400 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -25,9 +25,6 @@
#include <linux/uaccess.h>
-#define PAGE_SECTORS_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
-#define PAGE_SECTORS (1 << PAGE_SECTORS_SHIFT)
-
/*
* Each block ramdisk device has a radix_tree brd_pages of pages that stores
* the pages containing the block device's contents. A brd page's ->index is
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 16510795e377..c42af6cf0b97 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -11,10 +11,6 @@
#include <linux/init.h>
#include "null_blk.h"
-#define PAGE_SECTORS_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
-#define PAGE_SECTORS (1 << PAGE_SECTORS_SHIFT)
-#define SECTOR_MASK (PAGE_SECTORS - 1)
-
#define FREE_BATCH 16
#define TICKS_PER_SEC 50ULL
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 1bdb5793842b..6ee59da4a6e2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1548,9 +1548,9 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
struct bio_vec bvec;
struct bvec_iter iter;
- index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+ index = bio->bi_iter.bi_sector >> PAGE_SECTORS_SHIFT;
offset = (bio->bi_iter.bi_sector &
- (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
+ SECTOR_MASK) << SECTOR_SHIFT;
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
@@ -1643,8 +1643,8 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
goto out;
}
- index = sector >> SECTORS_PER_PAGE_SHIFT;
- offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
+ index = sector >> PAGE_SECTORS_SHIFT;
+ offset = (sector & SECTOR_MASK) << SECTOR_SHIFT;
bv.bv_page = page;
bv.bv_len = PAGE_SIZE;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index f2fd46daa760..12309175d55e 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -21,8 +21,6 @@
#include "zcomp.h"
-#define SECTORS_PER_PAGE_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
-#define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT)
#define ZRAM_LOGICAL_BLOCK_SHIFT 12
#define ZRAM_LOGICAL_BLOCK_SIZE (1 << ZRAM_LOGICAL_BLOCK_SHIFT)
#define ZRAM_SECTOR_PER_LOGICAL_BLOCK \
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0aa4b6bc5101..7f7672f72085 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -92,7 +92,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
- last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
+ last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SECTORS;
err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
if (err) {
pr_debug("%s: error: unaligned partition for dax\n",
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index c029f7443190..55196e0f37c3 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -15,8 +15,6 @@
#include "closure.h"
-#define PAGE_SECTORS (PAGE_SIZE / 512)
-
struct closure;
#ifdef CONFIG_BCACHE_DEBUG
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 2d519c223562..f4496ce0d598 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -384,7 +384,7 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
gfp_mask & __GFP_NORETRY) {
*data_mode = DATA_MODE_GET_FREE_PAGES;
return (void *)__get_free_pages(gfp_mask,
- c->sectors_per_block_bits - (PAGE_SHIFT - SECTOR_SHIFT));
+ c->sectors_per_block_bits - PAGE_SECTORS_SHIFT);
}
*data_mode = DATA_MODE_VMALLOC;
@@ -422,7 +422,7 @@ static void free_buffer_data(struct dm_bufio_client *c,
case DATA_MODE_GET_FREE_PAGES:
free_pages((unsigned long)data,
- c->sectors_per_block_bits - (PAGE_SHIFT - SECTOR_SHIFT));
+ c->sectors_per_block_bits - PAGE_SECTORS_SHIFT);
break;
case DATA_MODE_VMALLOC:
@@ -597,7 +597,7 @@ static void use_bio(struct dm_buffer *b, int rw, sector_t sector,
unsigned vec_size, len;
vec_size = b->c->block_size >> PAGE_SHIFT;
- if (unlikely(b->c->sectors_per_block_bits < PAGE_SHIFT - SECTOR_SHIFT))
+ if (unlikely(b->c->sectors_per_block_bits < PAGE_SECTORS_SHIFT))
vec_size += 2;
bio = bio_kmalloc(GFP_NOWAIT | __GFP_NORETRY | __GFP_NOWARN, vec_size);
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index b225b3e445fa..4e60cda465cc 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -652,7 +652,7 @@ static void page_list_location(struct dm_integrity_c *ic, unsigned section, unsi
sector = section * ic->journal_section_sectors + offset;
- *pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+ *pl_index = sector >> PAGE_SECTORS_SHIFT;
*pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
}
@@ -951,7 +951,7 @@ static void rw_journal_sectors(struct dm_integrity_c *ic, int op, int op_flags,
return;
}
- pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+ pl_index = sector >> PAGE_SECTORS_SHIFT;
pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
io_req.bi_op = op;
@@ -1072,7 +1072,7 @@ static void copy_from_journal(struct dm_integrity_c *ic, unsigned section, unsig
sector = section * ic->journal_section_sectors + JOURNAL_BLOCK_SECTORS + offset;
- pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+ pl_index = sector >> PAGE_SECTORS_SHIFT;
pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
io_req.bi_op = REQ_OP_WRITE;
@@ -3343,7 +3343,7 @@ static int create_journal(struct dm_integrity_c *ic, char **error)
ic->commit_ids[3] = cpu_to_le64(0x4444444444444444ULL);
journal_pages = roundup((__u64)ic->journal_sections * ic->journal_section_sectors,
- PAGE_SIZE >> SECTOR_SHIFT) >> (PAGE_SHIFT - SECTOR_SHIFT);
+ PAGE_SIZE >> SECTOR_SHIFT) >> PAGE_SECTORS_SHIFT;
journal_desc_size = journal_pages * sizeof(struct page_list);
if (journal_pages >= totalram_pages() - totalhigh_pages() || journal_desc_size > ULONG_MAX) {
*error = "Journal doesn't fit into memory";
@@ -4075,7 +4075,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
spin_lock_init(&bbs->bio_queue_lock);
sector = i * (BITMAP_BLOCK_SIZE >> SECTOR_SHIFT);
- pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+ pl_index = sector >> PAGE_SECTORS_SHIFT;
pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
bbs->bitmap = lowmem_page_address(ic->journal[pl_index].page) + pl_offset;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 469f551863be..b28f9390608f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1734,7 +1734,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
__le64 *bbp;
int i;
int sectors = le16_to_cpu(sb->bblog_size);
- if (sectors > (PAGE_SIZE / 512))
+ if (sectors > PAGE_SECTORS)
return -EINVAL;
offset = le32_to_cpu(sb->bblog_offset);
if (offset == 0)
@@ -8733,7 +8733,7 @@ void md_do_sync(struct md_thread *thread)
/*
* Tune reconstruction:
*/
- window = 32 * (PAGE_SIZE / 512);
+ window = 32 * PAGE_SECTORS;
pr_debug("md: using %dk window, over a total of %lluk.\n",
window/2, (unsigned long long)max_sectors/2);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cd810e195086..37a0b571903a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2129,7 +2129,7 @@ static void process_checks(struct r1bio *r1_bio)
int vcnt;
/* Fix variable parts of all bios */
- vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
+ vcnt = (r1_bio->sectors + PAGE_SECTORS - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
blk_status_t status;
struct bio *b = r1_bio->bios[i];
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index c8768726d925..4a23fb9d5642 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -18,6 +18,7 @@
#include <linux/export.h>
#include <linux/leds.h>
#include <linux/slab.h>
+#include <linux/blkdev.h>
#include <linux/mmc/host.h>
#include <linux/mmc/card.h>
@@ -427,7 +428,7 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
host->max_req_size = PAGE_SIZE;
host->max_blk_size = 512;
- host->max_blk_count = PAGE_SIZE / 512;
+ host->max_blk_count = PAGE_SECTORS;
host->fixed_drv_type = -EINVAL;
host->ios.power_delay_ms = 10;
diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
index f0068e96a177..e6b29e54d07a 100644
--- a/drivers/scsi/xen-scsifront.c
+++ b/drivers/scsi/xen-scsifront.c
@@ -852,7 +852,7 @@ static int scsifront_probe(struct xenbus_device *dev,
host->max_id = VSCSIIF_MAX_TARGET;
host->max_channel = 0;
host->max_lun = VSCSIIF_MAX_LUN;
- host->max_sectors = (host->sg_tablesize - 1) * PAGE_SIZE / 512;
+ host->max_sectors = (host->sg_tablesize - 1) * PAGE_SECTORS;
host->max_cmd_len = VSCSIIF_MAX_COMMAND_SIZE;
err = scsi_add_host(host, &dev->dev);
@@ -1073,7 +1073,7 @@ static void scsifront_read_backend_params(struct xenbus_device *dev,
host->sg_tablesize, nr_segs);
host->sg_tablesize = nr_segs;
- host->max_sectors = (nr_segs - 1) * PAGE_SIZE / 512;
+ host->max_sectors = (nr_segs - 1) * PAGE_SECTORS;
}
static void scsifront_backend_changed(struct xenbus_device *dev,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7c84c4c027c4..60505fc156c5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -29,7 +29,7 @@ struct iomap_page {
atomic_t read_count;
atomic_t write_count;
spinlock_t uptodate_lock;
- DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+ DECLARE_BITMAP(uptodate, PAGE_SECTORS);
};
static inline struct iomap_page *to_iomap_page(struct page *page)
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 716bc75e9ed2..22407751e0fd 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -40,8 +40,6 @@
#include "../pnfs.h"
#include "../netns.h"
-#define PAGE_CACHE_SECTORS (PAGE_SIZE >> SECTOR_SHIFT)
-#define PAGE_CACHE_SECTOR_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
#define SECTOR_SIZE (1 << SECTOR_SHIFT)
struct pnfs_block_dev;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 053ea4b51988..b3c9be6906a0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -910,6 +910,10 @@ static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
#define SECTOR_SIZE (1 << SECTOR_SHIFT)
#endif
+#define PAGE_SECTORS_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
+#define PAGE_SECTORS (1 << PAGE_SECTORS_SHIFT)
+#define SECTOR_MASK (PAGE_SECTORS - 1)
+
/*
* blk_rq_pos() : the current sector
* blk_rq_bytes() : bytes left in the entire request
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 475668c69dbc..c98a533f8ffa 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -141,7 +141,6 @@ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn);
typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i);
-#define PAGE_SECTORS (PAGE_SIZE / 512)
void dm_error(const char *message);
--
2.24.1
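
To illustrate what the new blkdev.h macros replace, here is a minimal userspace sketch of the sector-to-page arithmetic that the dm-integrity hunks above open-coded. The 4KiB page size, the struct page_loc type, and the helper names sector_to_page_loc() and sectors_to_pages() are assumptions made for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4KiB pages and 512-byte sectors. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define SECTOR_SHIFT		9

/* Same definitions as the include/linux/blkdev.h hunk in the patch. */
#define PAGE_SECTORS_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
#define PAGE_SECTORS		(1 << PAGE_SECTORS_SHIFT)

/* The macro must agree with the expression it replaces in the drivers. */
static_assert(PAGE_SECTORS == PAGE_SIZE / 512,
	      "PAGE_SECTORS must equal PAGE_SIZE / 512");

/* Hypothetical type: where a sector lands in a list of pages. */
struct page_loc {
	uint64_t index;		/* which page holds the sector */
	uint64_t offset;	/* byte offset of the sector within that page */
};

/* Mirrors the pl_index / pl_offset computation seen in dm-integrity. */
static struct page_loc sector_to_page_loc(uint64_t sector)
{
	struct page_loc loc;

	loc.index = sector >> PAGE_SECTORS_SHIFT;
	loc.offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
	return loc;
}

/* Number of pages needed to hold nr_sectors, rounding up. */
static uint64_t sectors_to_pages(uint64_t nr_sectors)
{
	return (nr_sectors + PAGE_SECTORS - 1) >> PAGE_SECTORS_SHIFT;
}
```

Under these assumed values, sector 9 lands in page 1 at byte offset 512, and 9 sectors need 2 pages.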
[PATCH v3 0/5] libnvdimm: Cross-arch compatible namespace alignment
by Dan Williams
Changes since v2 [1]:
- Fix up a missing space in flags_show() (Jeff)
- Prompted by Jeff saying that v2 only worked for him if
memremap_compat_align() returned PAGE_SIZE (which defeats the purpose),
I developed a new ndctl unit test that runs through the possible
legacy configurations that the kernel needs to support. Several changes
fell out as a result:
- Update nd_pfn_validate() to add more -EOPNOTSUPP cases. That error
code indicates "Stop, the pfn looks coherent, but invalid. Do not
proceed with exposing a raw namespace, require the user to
investigate whether the infoblock needs to be rewritten, or the
kernel configuration (like PAGE_SIZE) needs to change."
- Move the validation of fsdax and devdax infoblocks to
nd_pfn_validate() so that the presence of non-zero 'start_pad' and
'end_trunc' can be considered in the alignment validation.
- Fail namespace creation when the base address is misaligned. A
non-zero start_pad prevents dax operation due to the original bug of
->data_offset being base-address relative when it should have been
->start_pad relative. So, reject all base-address-misaligned
namespaces in nd_pfn_init().
[1]: http://lore.kernel.org/r/158155489850.3343782.2687127373754434980.stgit@d...
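
The validation rules in the changelog above can be sketched roughly as follows. This is a simplified userspace model, not the actual nd_pfn_validate() code: struct infoblock_sketch, its fields, and validate_sketch() are hypothetical names, and returning -ENODEV for an unrecognized infoblock is an assumption made here to contrast with the -EOPNOTSUPP case the changelog describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a pfn infoblock and its namespace. */
struct infoblock_sketch {
	bool signature_ok;	/* does this look like an infoblock at all? */
	uint64_t base;		/* namespace base physical address */
	uint64_t size;		/* namespace size in bytes */
	uint32_t start_pad;	/* bytes skipped at the start */
	uint32_t end_trunc;	/* bytes dropped at the end */
};

/*
 * Returns 0 if the mapped range is usable, -ENODEV if there is no
 * infoblock (assumption: raw mode fallback is fine in that case), and
 * -EOPNOTSUPP if the infoblock looks coherent but the range it
 * describes does not satisfy the required alignment.
 */
static int validate_sketch(const struct infoblock_sketch *ib, uint64_t align)
{
	uint64_t start, end;

	assert(align != 0);

	if (!ib->signature_ok)
		return -ENODEV;

	/* start_pad and end_trunc are considered in the alignment check. */
	start = ib->base + ib->start_pad;
	end = ib->base + ib->size - ib->end_trunc;

	if (start % align || end % align)
		return -EOPNOTSUPP;

	return 0;
}
```

The point of the distinct error code is that a caller can tell "nothing here, expose the raw namespace" apart from "something is here, stop and let the user investigate".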
---
Review / merge logistics notes:
Patch "libnvdimm/namespace: Enforce memremap_compat_align()" has
changed enough that it needs to be reviewed again.
Patch "mm/memremap_pages: Introduce memremap_compat_align()" still
needs a PowerPC maintainer ack for the touches to
arch/powerpc/mm/ioremap.c.
---
Aneesh reports that PowerPC requires 16MiB alignment for the address
range passed to devm_memremap_pages(), and Jeff reports that it is
possible to create a misaligned namespace which blocks future namespace
creation in that region. Both of these issues require namespace
alignment to be managed at the region level rather than by padding at
the namespace level, which has been a broken approach to date.
Introduce memremap_compat_align() to indicate the hard requirements of
an arch's memremap_pages() implementation. Use the maximum known
memremap_compat_align() to set the default namespace alignment for
libnvdimm. Consult that alignment when allocating free space. Finally,
allow the default region alignment to be overridden to maintain the same
namespace creation capability as previous kernels.
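
As a rough illustration of consulting an alignment when allocating free space, the sketch below carves an aligned allocation out of a free range. It is a userspace model under stated assumptions: COMPAT_ALIGN is hard-coded to the 16MiB PowerPC figure mentioned above, and struct range_sketch, align_up(), and alloc_aligned() are hypothetical helpers, not the libnvdimm implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 16MiB requirement reported for PowerPC. */
#define COMPAT_ALIGN	(16ULL << 20)

/* Hypothetical half-open address range [start, end). */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

/* Round addr up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Place an allocation of at least 'size' bytes inside 'free' so that
 * both its start and its length are multiples of 'align'. Returns
 * false if the aligned allocation does not fit (or on overflow).
 */
static bool alloc_aligned(struct range_sketch free, uint64_t size,
			  uint64_t align, struct range_sketch *out)
{
	uint64_t start = align_up(free.start, align);
	uint64_t len = align_up(size, align);

	if (start < free.start || len < size)	/* wrapped around */
		return false;
	if (start > free.end || free.end - start < len)
		return false;

	out->start = start;
	out->end = start + len;
	return true;
}
```

With a free range starting 4KiB past a 16MiB boundary, a 1MiB request is placed at the next 16MiB boundary and rounded up to 16MiB, so no later allocation in the region starts misaligned.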
The ndctl unit tests, which have some misaligned namespace assumptions,
are updated to use the alignment override where necessary.
Thanks to Aneesh for early feedback and testing on this improved
alignment handling.
---
Dan Williams (5):
mm/memremap_pages: Introduce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/region: Introduce an 'align' attribute
arch/powerpc/Kconfig | 1
arch/powerpc/mm/ioremap.c | 21 +++++
arch/powerpc/platforms/pseries/papr_scm.c | 2
drivers/acpi/nfit/core.c | 4 +
drivers/nvdimm/dimm.c | 2
drivers/nvdimm/dimm_devs.c | 95 +++++++++++++++++----
drivers/nvdimm/namespace_devs.c | 23 ++++-
drivers/nvdimm/nd.h | 3 -
drivers/nvdimm/pfn_devs.c | 34 ++++++-
drivers/nvdimm/region_devs.c | 132 ++++++++++++++++++++++++++---
include/linux/libnvdimm.h | 2
include/linux/memremap.h | 8 ++
include/linux/mmzone.h | 1
lib/Kconfig | 3 +
mm/memremap.c | 23 +++++
15 files changed, 307 insertions(+), 47 deletions(-)
base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8