[GIT PULL] libnvdimm for 4.6
by Williams, Dan J
Hi Linus, please pull from...
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.6
...to receive the libnvdimm update for 4.6.
This has appeared in -next with no reported issues, and a test merge
with latest master passes the regression tests. Note that this
includes development that was based on the tip/core/resources branch
which you merged yesterday. The post merge diffstat is cleaner than
the 6e2452dff444..libnvdimm-for-4.6 diffstat that "git request-pull"
picked.
arch/x86/include/asm/pmem.h | 5 +
drivers/acpi/nfit.c | 798 +++++++++++++++++++++++++++++++++++++++--------------
drivers/acpi/nfit.h | 30 +-
drivers/nvdimm/blk.c | 18 +-
drivers/nvdimm/btt.c | 19 +-
drivers/nvdimm/bus.c | 131 +++++++--
drivers/nvdimm/core.c | 110 +++++---
drivers/nvdimm/dimm_devs.c | 6 +-
drivers/nvdimm/namespace_devs.c | 7 +
drivers/nvdimm/nd.h | 4 +
drivers/nvdimm/pfn.h | 23 +-
drivers/nvdimm/pfn_devs.c | 61 ++++
drivers/nvdimm/pmem.c | 219 +++++++++++----
drivers/nvdimm/region.c | 12 +
include/linux/ioport.h | 1 +
include/linux/libnvdimm.h | 5 +-
include/linux/nd.h | 7 +
include/linux/pmem.h | 19 ++
include/uapi/linux/ndctl.h | 13 +
kernel/resource.c | 60 +++-
tools/testing/nvdimm/test/nfit.c | 285 ++++++++++++++-----
21 files changed, 1394 insertions(+), 439 deletions(-)
The following changes since commit 6e2452dff4441e3dc24d415c8b2cda8a3ba52116:
nfit: Continue init even if ARS commands are unimplemented (2016-03-04 16:46:13 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.6
for you to fetch changes up to 489011652a2d5555901def04c24d68874e8ba9a1:
Merge branch 'for-4.6/pfn' into libnvdimm-for-next (2016-03-09 17:15:43 -0800)
----------------------------------------------------------------
libnvdimm for 4.6
1/ Asynchronous address range scrub:
Given the capacities of next-generation persistent memory devices, a
scrub operation to find all poison may take tens of seconds. We want
this scrub work to be done asynchronously with the rest of system
initialization, so we move it out of line from the NFIT probing, i.e.
acpi_nfit_add().
2/ Clear poison:
ACPI 6.1 introduces the ability to send "clear error" commands to the
ACPI0012:00 device representing the root of an "nvdimm bus". Similar to
relocating a bad block on a disk, this support clears media errors in
response to a write.
3/ Persistent memory resource tracking:
A persistent memory range may be designated as simply "reserved" by
platform firmware in the efi/e820 memory map. Later, when the NFIT
driver loads, it discovers that the range is "Persistent Memory". The
NFIT bus driver inserts a resource to advertise that "persistent"
attribute in the system resource tree, for /proc/iomem and for
kernel-internal users.
4/ Miscellaneous cleanups and fixes:
Work around section-misaligned pmem ranges when allocating a struct page
memmap, fix handling of the read-only case in the ioctl path, and clean
up block device major number allocation.
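The "clear poison" behavior in item 2/ can be pictured with a small userspace model. This is a sketch under stated assumptions, not libnvdimm code: struct bad_list, model_read() and model_write() are hypothetical, 512-byte sector granularity is assumed, and in this model a write clears only the bad sectors it fully covers (on real hardware the clearing is done by the ACPI "clear error" command).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512ULL
#define MAX_BAD 8

/* Hypothetical bad-sector list standing in for a namespace's badblocks. */
struct bad_list {
	unsigned long long sector[MAX_BAD];
	size_t count;
};

static bool sector_is_bad(const struct bad_list *bl, unsigned long long s)
{
	for (size_t i = 0; i < bl->count; i++)
		if (bl->sector[i] == s)
			return true;
	return false;
}

/* Reads that touch any bad sector fail with -EIO. */
static int model_read(const struct bad_list *bl, unsigned long long off,
		      unsigned long long len)
{
	if (len == 0)
		return 0;
	for (unsigned long long s = off / SECTOR_SIZE;
	     s <= (off + len - 1) / SECTOR_SIZE; s++)
		if (sector_is_bad(bl, s))
			return -EIO;
	return 0;
}

/*
 * A write supplies new data, so every bad sector it fully covers is
 * dropped from the list. Partially covered sectors stay bad here.
 */
static void model_write(struct bad_list *bl, unsigned long long off,
			unsigned long long len)
{
	size_t i = 0;

	while (i < bl->count) {
		unsigned long long start = bl->sector[i] * SECTOR_SIZE;

		if (start >= off && start + SECTOR_SIZE <= off + len)
			bl->sector[i] = bl->sector[--bl->count]; /* remove entry */
		else
			i++;
	}
}
```

Requiring full coverage is a design choice of the sketch: a sub-sector write cannot replace the whole poisoned sector, so the model leaves it marked bad.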
----------------------------------------------------------------
Dan Williams (18):
nfit, tools/testing/nvdimm: add format interface code definitions
nfit, tools/testing/nvdimm: test multiple control regions per-dimm
libnvdimm, nfit: centralize command status translation
libnvdimm: protect nvdimm_{bus|namespace}_add_poison() with nvdimm_bus_lock()
libnvdimm: async notification support
nfit, tools/testing/nvdimm: unify common init for acpi_nfit_desc
nfit, libnvdimm: async region scrub workqueue
nfit: scrub and register regions in a workqueue
nfit: disable userspace initiated ars during scrub
tools/testing/nvdimm: expand ars unit testing
libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces
libnvdimm, pmem: adjust for section collisions with 'System RAM'
libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices
nfit, libnvdimm: clear poison command support
libnvdimm, pmem: fix ia64 build, use PHYS_PFN
libnvdimm, pmem: fix kmap_atomic() leak in error path
libnvdimm, pmem: clear poison on write
Merge branch 'for-4.6/pfn' into libnvdimm-for-next
Jerry Hoemann (2):
libnvdimm: Clean-up access mode check.
libnvdimm: Fix security issue with DSM IOCTL.
NeilBrown (3):
pmem: don't allocate unused major device number
nvdimm/blk: don't allocate unused major device number
nvdimm/btt: don't allocate unused major device number
Toshi Kani (4):
resource: Change __request_region to inherit from immediate parent
resource: Add remove_resource interface
resource: Export insert_resource and remove_resource
ACPI: Change NFIT driver to insert new resource
arch/arm/kernel/setup.c | 6 +-
arch/arm/plat-samsung/pm-check.c | 4 +-
arch/arm64/kernel/setup.c | 6 +-
arch/avr32/kernel/setup.c | 6 +-
arch/ia64/kernel/efi.c | 13 +-
arch/ia64/kernel/setup.c | 6 +-
arch/m32r/kernel/setup.c | 4 +-
arch/mips/kernel/setup.c | 10 +-
arch/parisc/mm/init.c | 6 +-
arch/powerpc/mm/mem.c | 2 +-
arch/s390/kernel/setup.c | 8 +-
arch/score/kernel/setup.c | 2 +-
arch/sh/kernel/setup.c | 8 +-
arch/sparc/mm/init_64.c | 8 +-
arch/tile/kernel/setup.c | 11 +-
arch/unicore32/kernel/setup.c | 6 +-
arch/x86/include/asm/pmem.h | 5 +
arch/x86/kernel/crash.c | 41 +-
arch/x86/kernel/e820.c | 38 +-
arch/x86/kernel/pmem.c | 4 +-
arch/x86/kernel/setup.c | 6 +-
drivers/acpi/acpi_platform.c | 2 +-
drivers/acpi/apei/einj.c | 15 +-
drivers/acpi/nfit.c | 798 +++++++++++++++++++++++++++----------
drivers/acpi/nfit.h | 30 +-
drivers/nvdimm/blk.c | 18 +-
drivers/nvdimm/btt.c | 19 +-
drivers/nvdimm/bus.c | 131 +++++-
drivers/nvdimm/core.c | 110 +++--
drivers/nvdimm/dimm_devs.c | 6 +-
drivers/nvdimm/e820.c | 2 +-
drivers/nvdimm/namespace_devs.c | 7 +
drivers/nvdimm/nd.h | 4 +
drivers/nvdimm/pfn.h | 23 +-
drivers/nvdimm/pfn_devs.c | 61 +++
drivers/nvdimm/pmem.c | 219 +++++++---
drivers/nvdimm/region.c | 12 +
drivers/parisc/eisa_enumerator.c | 4 +-
drivers/rapidio/rio.c | 8 +-
drivers/sh/superhyway/superhyway.c | 2 +-
drivers/xen/balloon.c | 2 +-
include/linux/ioport.h | 34 +-
include/linux/libnvdimm.h | 5 +-
include/linux/mm.h | 3 +-
include/linux/nd.h | 7 +
include/linux/pmem.h | 19 +
include/uapi/linux/ndctl.h | 13 +
kernel/kexec_core.c | 8 +-
kernel/kexec_file.c | 8 +-
kernel/memremap.c | 13 +-
kernel/resource.c | 149 +++++--
mm/memory_hotplug.c | 2 +-
tools/testing/nvdimm/test/nfit.c | 285 ++++++++++---
53 files changed, 1623 insertions(+), 596 deletions(-)
6 years, 3 months
[ndctl PATCH] ndctl: fix definition of conditional functionality
by Dan Williams
Resolve HAVE_NDCTL_ARS and HAVE_NDCTL_CLEAR_ERROR in libndctl.h from the
results of configure.
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
configure.ac | 21 +++++++++++++++++++++
lib/ndctl/libndctl.h.in | 20 +++++++++++---------
2 files changed, 32 insertions(+), 9 deletions(-)
rename lib/ndctl/{libndctl.h => libndctl.h.in} (98%)
diff --git a/configure.ac b/configure.ac
index 7e006641b197..1ab376885215 100644
--- a/configure.ac
+++ b/configure.ac
@@ -144,6 +144,27 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
)
AM_CONDITIONAL([ENABLE_CLEAR_ERROR], [test "x$enable_clear_err" = "xyes"])
+AC_CONFIG_COMMANDS([gen-libndctl.h],
+ [[
+ if test "x$enable_ars" = "xyes"; then
+ enable_ars=1
+ else
+ enable_ars=0
+ fi
+ if test "x$enable_clear_err" = "xyes"; then
+ enable_clear_err=1
+ else
+ enable_clear_err=0
+ fi
+ sed -e s/HAVE_NDCTL_ARS/$enable_ars/ \
+ -e s/HAVE_NDCTL_CLEAR_ERROR/$enable_clear_err/ \
+ < lib/ndctl/libndctl.h.in > lib/ndctl/libndctl.h
+ ]],
+ [[
+ enable_ars=$enable_ars
+ enable_clear_err=$enable_clear_err
+ ]])
+
AC_CHECK_HEADERS_ONCE([linux/version.h])
AC_CHECK_FUNCS([ \
diff --git a/lib/ndctl/libndctl.h b/lib/ndctl/libndctl.h.in
similarity index 98%
rename from lib/ndctl/libndctl.h
rename to lib/ndctl/libndctl.h.in
index 456288f66aee..b16f26dae535 100644
--- a/lib/ndctl/libndctl.h
+++ b/lib/ndctl/libndctl.h.in
@@ -148,7 +148,8 @@ int ndctl_dimm_disable(struct ndctl_dimm *dimm);
int ndctl_dimm_enable(struct ndctl_dimm *dimm);
struct ndctl_cmd;
-#ifdef HAVE_NDCTL_ARS
+#define HAS_ARS HAVE_NDCTL_ARS
+#if HAS_ARS == 1
struct ndctl_cmd *ndctl_bus_cmd_new_ars_cap(struct ndctl_bus *bus,
unsigned long long address, unsigned long long len);
struct ndctl_cmd *ndctl_bus_cmd_new_ars_start(struct ndctl_cmd *ars_cap, int type);
@@ -167,18 +168,18 @@ unsigned long long ndctl_cmd_ars_get_record_addr(struct ndctl_cmd *ars_stat,
unsigned long long ndctl_cmd_ars_get_record_len(struct ndctl_cmd *ars_stat,
unsigned int rec_index);
-#ifdef HAVE_NDCTL_CLEAR_ERROR
+#define HAS_CLEAR_ERROR HAVE_NDCTL_CLEAR_ERROR
+#if HAS_CLEAR_ERROR == 1
/*
- * clear_error requires ars_cap, so we require HAVE_NDCTL_ARS to export the
- * clear_error capability
+ * clear_error requires ars_cap, so we require HAS_ARS to export
+ * the clear_error capability
*/
struct ndctl_cmd *ndctl_bus_cmd_new_clear_error(unsigned long long address,
unsigned long long len, struct ndctl_cmd *ars_cap);
unsigned long long ndctl_cmd_clear_error_get_cleared(
struct ndctl_cmd *clear_err);
-#define HAS_CLEAR_ERROR 1
#endif
-#else /* HAVE_NDCTL_ARS */
+#else /* HAS_ARS == 0 */
static inline struct ndctl_cmd *ndctl_bus_cmd_new_ars_cap(struct ndctl_bus *bus,
unsigned long long address, unsigned long long len)
{
@@ -202,7 +203,7 @@ static inline unsigned int ndctl_cmd_ars_cap_get_size(struct ndctl_cmd *ars_cap)
return 0;
}
-
+struct ndctl_range;
static inline int ndctl_cmd_ars_cap_get_range(struct ndctl_cmd *ars_cap,
struct ndctl_range *range)
{
@@ -230,9 +231,10 @@ static inline unsigned long long ndctl_cmd_ars_get_record_len(
{
return 0;
}
-#endif /* HAVE_NDCTL_ARS */
+#define HAS_CLEAR_ERROR 0
+#endif /* HAS_ARS */
-#ifndef HAS_CLEAR_ERROR
+#if HAS_CLEAR_ERROR == 0
static inline struct ndctl_cmd *ndctl_bus_cmd_new_clear_error(
unsigned long long address, unsigned long long len,
struct ndctl_cmd *ars_cap)
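The switch from `#ifdef` to `#if ... == 1` in the header change above matters because the sed step in AC_CONFIG_COMMANDS always defines the macros, to either 0 or 1, and a macro defined as 0 still satisfies `#ifdef`. The sketch below assumes configure enabled ARS but not clear-error; the two bool constants and app_can_use_ars() are hypothetical and exist only to show how each test evaluates.

```c
#include <stdbool.h>

/*
 * What the generated libndctl.h might contain after the sed step,
 * assuming enable_ars=1 and enable_clear_err=0 (values assumed here).
 */
#define HAS_ARS 1
#define HAS_CLEAR_ERROR 0

/* A macro defined to 0 is still "defined", so #ifdef takes this branch. */
#ifdef HAS_CLEAR_ERROR
static const bool ifdef_sees_clear_error = true;
#else
static const bool ifdef_sees_clear_error = false;
#endif

/* Comparing the value gives the intended answer. */
#if HAS_CLEAR_ERROR == 1
static const bool if_sees_clear_error = true;
#else
static const bool if_sees_clear_error = false;
#endif

/* A library consumer can branch on the baked-in configure result. */
static bool app_can_use_ars(void)
{
	return HAS_ARS == 1;
}
```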
[RFC] [PATCH 0/12] DAX page fault locking
by Jan Kara
Hello,
this is my first attempt at a DAX page fault locking rewrite. It is mostly
just a dump of the current status of my git tree so that people can see
where I'm going before the weekend. It should be complete, but so far it is
only compile tested.
The basic idea is that we use a bit in an exceptional radix tree entry as
a lock bit and use it similarly to how page lock is used for normal faults.
That way we fix races between hole instantiation and read faults of the
same index. For now I have disabled PMD faults, since the page fault
locking issues there are even worse. I think we will need something like
Matthew's multi-order radix tree to fix the races for PMD faults, but I
want to have a look at those once locking for normal pages is working.
In the end I have decided to implement the bit locking directly in the DAX
code. Originally I was thinking we could provide something generic directly
in the radix tree code but the functions DAX needs are rather specific.
Maybe someone else will have a good idea how to distill some generally useful
functions out of what I've implemented for DAX but for now I didn't bother
with that.
Honza
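The idea of using one bit of a tagged entry as a lock can be sketched in userspace C11 as follows. This is not Jan's implementation: ENTRY_LOCK_BIT, entry_t and the helper names are hypothetical, the payload is assumed to be stored above the lock bit, and contended lockers spin here, whereas in the kernel the entry would live in a radix tree slot and waiters would sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the lock, the payload sits above it. */
#define ENTRY_LOCK_BIT ((uintptr_t)1 << 0)

typedef _Atomic uintptr_t entry_t;

static void entry_init(entry_t *e, uintptr_t value)
{
	atomic_store(e, value << 1);
}

/* Try to take the lock; fails if another fault already holds it. */
static bool entry_trylock(entry_t *e)
{
	uintptr_t old = atomic_load(e);

	while (!(old & ENTRY_LOCK_BIT)) {
		/* On failure, old is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(e, &old, old | ENTRY_LOCK_BIT))
			return true;
	}
	return false;
}

/* Spin until acquired (a kernel version would sleep on a wait queue). */
static void entry_lock(entry_t *e)
{
	while (!entry_trylock(e))
		;
}

static void entry_unlock(entry_t *e)
{
	atomic_fetch_and(e, ~ENTRY_LOCK_BIT);
}

/* Read the payload (e.g. a sector number) without the lock bit. */
static uintptr_t entry_value(entry_t *e)
{
	return atomic_load(e) >> 1;
}
```

Keeping the lock inside the entry means no separate lock object has to be allocated per file index, which is the property that makes the approach attractive for DAX, where there is no struct page to lock.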
[PATCH 0/4] Remove un-needed 'major' registration when alloc_disk(0) is used.
by NeilBrown
When alloc_disk(0) is used, the ->major number is ignored and
irrelevant. Yet several drivers register a major number anyway.
This series of patches removes the pointless registrations. The pmem
driver also does this, but a patch has already been sent for that
driver.
Note that I am not in a position to test these beyond simple compile
testing.
Thanks,
NeilBrown
---
NeilBrown (4):
nvdimm/blk: don't allocate unused major device number
nvdimm/btt: don't allocate unused major device number
memstick: don't allocate unused major for ms_block
NVMe: don't allocate unused nvme_major
drivers/memstick/core/ms_block.c | 17 ++---------------
drivers/nvdimm/blk.c | 18 +-----------------
drivers/nvdimm/btt.c | 19 ++-----------------
drivers/nvme/host/core.c | 16 +---------------
4 files changed, 6 insertions(+), 64 deletions(-)
[ndctl PATCH v2] ndctl: Grab kernel version from utsname()
by Johannes Thumshirn
Grab the kernel version used for tests dynamically via uname() instead of
hardcoding the kernel version of the build host.
Otherwise, tests are skipped whenever the build host's kernel headers report
an older version than the tests require, even if the running kernel is new
enough.
flodin:~ # ./ndctl test
__ndctl_test_attempt: skip test_libndctl:1950 requires: 4.2.0 current: 4.1.0
test-libndctl: SKIP
__ndctl_test_attempt: skip test_dpa_alloc:300 requires: 4.2.0 current: 4.1.0
test-dpa-alloc: SKIP
__ndctl_test_attempt: skip test_parent_uuid:230 requires: 4.3.0 current: 4.1.0
test-parent-uuid: SKIP
attempted: 3 skipped: 3
Signed-off-by: Johannes Thumshirn <jthumshirn(a)suse.de>
---
test/core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
Changes to v1:
Use utsname.release, which is obviously correct.
On my test system utsname.release = "4.4.4-default".
The only failure mode I can imagine is sscanf() failing and the
uninitialized stack contents of a, b, c happening to be greater than
KERNEL_VERSION(4, 3, 0).
--- a/test/core.c
+++ b/test/core.c
@@ -1,4 +1,5 @@
#include <linux/version.h>
+#include <sys/utsname.h>
#include <stdlib.h>
#include <stdio.h>
#include <test.h>
@@ -11,6 +12,18 @@ struct ndctl_test {
int skip;
};
+static unsigned int get_system_kver(void)
+{
+ struct utsname utsname;
+ int a, b, c;
+
+ uname(&utsname);
+
+ sscanf(utsname.release, "%d.%d.%d", &a, &b, &c);
+
+ return KERNEL_VERSION(a,b,c);
+}
+
struct ndctl_test *ndctl_test_new(unsigned int kver)
{
struct ndctl_test *test = calloc(1, sizeof(*test));
@@ -19,7 +32,7 @@ struct ndctl_test *ndctl_test_new(unsign
return NULL;
if (!kver)
- test->kver = LINUX_VERSION_CODE;
+ test->kver = get_system_kver();
else
test->kver = kver;
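A slightly more defensive variant of get_system_kver() could zero-initialize the components and check the return values of uname() and sscanf(), which closes the stack-garbage case mentioned in the v2 notes. This is a sketch, not the applied patch: parse_kver() and KVER() are hypothetical names, and KVER() mirrors the (a << 16) + (b << 8) + c layout used by KERNEL_VERSION().

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Same bit layout as KERNEL_VERSION() for components below 256. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Parse "major.minor[.patch][-suffix]", e.g. "4.4.4-default".
 * Returns 0 when fewer than two numbers can be read, so a parse
 * failure never yields uninitialized values.
 */
static unsigned int parse_kver(const char *release)
{
	int a = 0, b = 0, c = 0;

	if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
		return 0;
	return KVER(a, b, c);
}

static unsigned int get_system_kver(void)
{
	struct utsname uts;

	if (uname(&uts) < 0)
		return 0;
	return parse_kver(uts.release);
}
```

Returning 0 on failure means a caller comparing against a required version would skip tests rather than run them against an unknown kernel.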
[ndctl PATCH] ndctl: Bail out with unknown command in case of an unknown command
by Johannes Thumshirn
handle_internal_command() only returns to main() if it doesn't find the
specified command, and in that case it does not set errno.
When running an unknown command you get the following error message:
$ ./ndctl asd
Failed to run command 'asd': Success
instead of the more appropriate:
$ ./ndctl asd
Unknown command: 'asd'
Signed-off-by: Johannes Thumshirn <jthumshirn(a)suse.de>
---
ndctl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/ndctl.c b/ndctl.c
index ded1588..18b5dc6 100644
--- a/ndctl.c
+++ b/ndctl.c
@@ -159,8 +159,7 @@ int main(int argc, const char **argv)
goto out;
}
handle_internal_command(argc, argv);
- fprintf(stderr, "Failed to run command '%s': %s\n",
- argv[0], strerror(errno));
+ fprintf(stderr, "Unknown command: '%s'\n", argv[0]);
out:
return 1;
}
--
2.7.2
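The patch relies on handle_internal_command() returning only when no command matched. A dispatcher that makes the "not found" case explicit could look like the sketch below; struct cmd_struct, cmd_list(), find_command() and run_command() are hypothetical names, not ndctl's actual code. Nothing on the lookup path is a failing system call, which is why errno has nothing useful to report there.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command table entry. */
struct cmd_struct {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

static int cmd_list(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

static const struct cmd_struct commands[] = {
	{ "list", cmd_list },
};

/* Returns the matching entry, or NULL when the command is unknown. */
static const struct cmd_struct *find_command(const char *name)
{
	for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
		if (strcmp(commands[i].name, name) == 0)
			return &commands[i];
	return NULL;
}

/* Report unknown commands by name only; errno is not consulted. */
static int run_command(int argc, const char **argv)
{
	const struct cmd_struct *cmd = find_command(argv[0]);

	if (!cmd) {
		fprintf(stderr, "Unknown command: '%s'\n", argv[0]);
		return 1;
	}
	return cmd->fn(argc, argv);
}
```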
[PATCH 0/3] Make pfn_t suitable for placing in the radix tree
by Matthew Wilcox
I did some experimenting with converting the DAX radix tree from storing
sector numbers to storing pfn_t. While we're not ready to do that
conversion yet, these pieces make sense to at least get reviewed now,
and maybe get upstream.
I think the first patch is worthwhile all by itself as a stepping stone to
making SG lists contain PFNs instead of pages.
Matthew Wilcox (3):
pfn_t: Change the encoding
pfn_t: Support for huge PFNs
pfn_t: New functions pfn_t_add and pfn_t_cmp
include/linux/pfn_t.h | 72 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 61 insertions(+), 11 deletions(-)
--
2.7.0
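As a rough picture of what helpers such as pfn_t_add and pfn_t_cmp have to respect, here is a flags-in-the-top-bits encoding. The flag names, bit positions and the pfn_sketch_t type are hypothetical and are not the encoding proposed in "pfn_t: Change the encoding"; the semantics of pfn_add() and pfn_cmp() are guesses from the patch titles.

```c
#include <stdint.h>

/* Hypothetical layout: top 4 bits are flags, the rest is the frame number. */
#define PFN_FLAG_SHIFT 60
#define PFN_FLAGS_MASK (~(uint64_t)0 << PFN_FLAG_SHIFT)
#define PFN_FLAG_DEV ((uint64_t)1 << 62)
#define PFN_FLAG_MAP ((uint64_t)1 << 63)

typedef struct {
	uint64_t val;
} pfn_sketch_t;

static pfn_sketch_t pfn_make(uint64_t pfn, uint64_t flags)
{
	pfn_sketch_t p = { (pfn & ~PFN_FLAGS_MASK) | (flags & PFN_FLAGS_MASK) };

	return p;
}

static uint64_t pfn_of(pfn_sketch_t p)
{
	return p.val & ~PFN_FLAGS_MASK;
}

/* Advance the frame number while keeping the flag bits intact. */
static pfn_sketch_t pfn_add(pfn_sketch_t p, uint64_t n)
{
	return pfn_make(pfn_of(p) + n, p.val);
}

/* Order by frame number only; flags do not take part in the comparison. */
static int pfn_cmp(pfn_sketch_t a, pfn_sketch_t b)
{
	uint64_t x = pfn_of(a), y = pfn_of(b);

	return (x > y) - (x < y);
}
```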
[RFC v6 0/8] nvdimm: Add an IOCTL pass thru for DSM calls
by Jerry Hoemann
The NVDIMM code in the kernel supports an IOCTL interface to user
space based upon the Intel Example DSM:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This interface cannot be used by other NVDIMM DSMs that support
incompatible functions.
This patch set adds a generic "passthru" IOCTL interface which
is not tied to a particular DSM.
A new _IOC_NR, ND_CMD_CALL_DSM (value 10), is added for the pass-thru call.
The new data structure nd_cmd_pkg serves as a wrapper for the
passthru calls. This wrapper supplies the data that the kernel
needs to make the _DSM call.
Unlike the definitions of the _DSM functions themselves, the nd_cmd_pkg
provides the calling information (input/output sizes) in a uniform
manner, making the kernel's marshaling of the arguments
straightforward.
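To make the "uniform calling information" point concrete, here is a sketch of such an envelope. The struct layout, field names and helpers are hypothetical; the actual nd_cmd_pkg layout is defined by the patch set in include/uapi/linux/ndctl.h. The point is only that a fixed header carrying the sizes lets generic code locate the input and output areas without knowing the specific _DSM function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed header: which function, and how big each direction is. */
struct pkg_hdr {
	uint64_t func;     /* _DSM function index */
	uint32_t size_in;  /* bytes of input payload */
	uint32_t size_out; /* bytes reserved for output payload */
};

struct pkg {
	struct pkg_hdr h;
	unsigned char buf[]; /* input bytes, followed by room for output */
};

/* Allocate one envelope large enough for both directions. */
static struct pkg *pkg_new(uint64_t func, const void *in, uint32_t size_in,
			   uint32_t size_out)
{
	struct pkg *p = calloc(1, sizeof(*p) + size_in + size_out);

	if (!p)
		return NULL;
	p->h.func = func;
	p->h.size_in = size_in;
	p->h.size_out = size_out;
	if (size_in)
		memcpy(p->buf, in, size_in);
	return p;
}

/* The output area always starts right after the input payload. */
static unsigned char *pkg_out(struct pkg *p)
{
	return p->buf + p->h.size_in;
}

static size_t pkg_total_size(const struct pkg *p)
{
	return sizeof(*p) + p->h.size_in + p->h.size_out;
}
```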
This shifts the marshaling burden from the kernel to the user
space application while still permitting the kernel to internally
call _DSM functions.
The kernel functions __nd_ioctl and acpi_nfit_ctl were modified
to accommodate ND_CMD_CALL_DSM.
Changes in version 6:
---------------------
Built against
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git
libnvdimm-pending
0. Patches "Clean-up access mode check" and "Fix security issue with DSM IOCTL"
are already in the above libnvdimm-pending branch, so they are omitted here.
1. Incorporated changes from Dan's RFC patch set
https://lists.01.org/pipermail/linux-nvdimm/2016-January/004049.html
2. Dan asked me to abstract out the DSM aspects from the nd_cmd_dsmcall_pkg.
This became nd_cmd_pkg. UUIDs are no longer passed in from
user applications.
3. To accommodate multiple UUIDs, added the table cmd_type_tbl, which is used
to determine the UUID for the ACPI object by calling function 0 for
each UUID in the table until one succeeds.
This table also provides a MASK field that the kernel can use
to exclude functions from being called.
This table can be thought of as a list of "acceptable" DSMs.
4. The cmd_type_tbl is also used by acpi_nfit_ctl to map the
external handle of calls to the internal handle (UUID).
Note: the code only validates that the requested type of call is one in
cmd_type_tbl; it is not necessarily the one found during
acpi_nfit_add_dimm. The ACPI spec appears to allow, and firmware
does implement, multiple UUIDs per object.
In the case where the type is in the table but the UUID isn't supported
by the underlying firmware, the firmware shall return an error when
called.
This allows for use of a secondary DSM on an object. This could
be considered a feature or a defect; it can be tightened
up if needed.
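The probing described in items 3 and 4 can be sketched as a loop over a table of candidate UUIDs. Everything here is hypothetical (struct dsm_family, probe_dsm_family(), the placeholder UUID strings and example_query()); only the rule that _DSM function 0 returns a bitmap of supported functions, with bit 0 set when any function besides 0 is available, comes from the ACPI specification. The firmware query is passed in as a callback so the logic can be exercised without ACPI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: a DSM UUID and the functions callers may use. */
struct dsm_family {
	const char *uuid;
	uint64_t allowed_mask;
};

/* Stand-in for evaluating _DSM function 0; returns the supported bitmap. */
typedef uint64_t (*dsm_query_fn)(const char *uuid);

/*
 * Return the index of the first family whose function-0 query reports
 * support, or -1. *usable gets firmware support filtered by the mask.
 */
static int probe_dsm_family(const struct dsm_family *tbl, size_t n,
			    dsm_query_fn query, uint64_t *usable)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t supported = query(tbl[i].uuid);

		if (supported & 1) { /* bit 0: other functions exist */
			*usable = supported & tbl[i].allowed_mask;
			return (int)i;
		}
	}
	return -1;
}

/* Example firmware that implements only the second (placeholder) UUID. */
static uint64_t example_query(const char *uuid)
{
	return strcmp(uuid, "uuid-b") == 0 ? 0x1f : 0;
}

static const struct dsm_family example_tbl[] = {
	{ "uuid-a", ~(uint64_t)0 },
	{ "uuid-b", 0x0e }, /* permit functions 1..3 only */
};
```

With example_tbl, the first entry fails the probe, the second succeeds, and the mask trims firmware support 0x1f down to 0x0e.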
Changes in version 5:
---------------------
0. Fixed the commit message for drivers/acpi/utils.c.
Changes in version 4:
---------------------
0. Added a patch to correct the parameter types passed to acpi_evaluate_dsm.
ACPI defines the arguments rev and func as 64-bit quantities, and the ioctl
exports rev and func to user space. We want those to match the ACPI spec.
Also modified acpi_evaluate_dsm_typed and acpi_check_dsm, which had a
similar issue.
1. nd_cmd_dsmcall_pkg: rearranged a reserved field and rounded the total
size up to a 16-byte boundary.
2. Created a standalone patch for the pre-existing security issue related
to "read only" IOCTL calls.
3. Added a patch to increase the envelope size of the IOCTL. This is needed
to read in the wrapper and learn the remaining size to copy in.
Note: in_env and out_env are statically sized based upon this change.
4. Moved the copy-in code to the table-driven nd_cmd_desc.
Note: the last 40 lines or so of acpi_nfit_ctl will not return _DSM
data unless the size of the user space buffer equals
out_obj->buffer.length.
The semantic we want in the pass thru case is to return as much
of the _DSM data as the user space buffer would accommodate.
Hence, in acpi_nfit_ctl I have retained the line:
memcpy(pkg->dsm_buf + pkg->h.dsm_in,
out_obj->buffer.pointer,
min(pkg->h.dsm_size, pkg->h.dsm_out));
and the early return from the function.
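The "return as much as the user buffer can hold" semantic is a bounded copy. The sketch below uses the hypothetical name copy_dsm_output() and, as an added design choice not described in the cover letter, also reports the full size the firmware produced so a caller can detect truncation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy min(fw_len, dst_len) bytes of firmware output and report the
 * full firmware size through *fw_total. Returns the bytes copied.
 */
static size_t copy_dsm_output(void *dst, size_t dst_len,
			      const void *fw_out, size_t fw_len,
			      size_t *fw_total)
{
	size_t n = fw_len < dst_len ? fw_len : dst_len;

	memcpy(dst, fw_out, n);
	*fw_total = fw_len;
	return n;
}
```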
Changes in version 3:
---------------------
1. Changed name ND_CMD_PASSTHRU to ND_CMD_CALL_DSM.
2. Value of ND_CMD_CALL_DSM is 10, not 100.
3. Changed name of nd_passthru_pkg to nd_cmd_dsmcall_pkg.
4. Removed separate functions for handling ND_CMD_CALL_DSM.
Moved functionality to __nd_ioctl and acpi_nfit_ctl proper.
The resultant code looks very different from prior versions.
5. BUGFIX: __nd_ioctl: Change the if read_only switch to use
_IOC_NR cmd (not ioctl_cmd) for better protection.
Do we want to make a stand alone patch for this issue?
Changes in version 2:
---------------------
1. Cleanup access mode check in nd_ioctl and nvdimm_ioctl.
2. Change name of ndn_pkg to nd_passthru_pkg
3. Adjust sizes in nd_passthru_pkg. DSM integers are 64 bit.
4. No new ioctl type, instead tunnel into the existing number space.
5. Push the determination of the ioctl cmd type down one function level.
6. Re-work diagnostic print/dump messages in pass-thru functions.
Jerry Hoemann (8):
ACPI / util: Fix acpi_evaluate_dsm() argument type
nvdimm: Add wrapper for IOCTL pass thru
nvdimm: Increase max envelope size for IOCTL
nvdimm: Add UUIDs
nvdimm: data structure changes for dsm calls
libnvdimm: advertise 'call_dsm' support
nvdimm: Add IOCTL pass thru functions
tools/testing/nvdimm: 'call_dsm' support
drivers/acpi/nfit.c | 149 +++++++++++++++++++++++++++++++++------
drivers/acpi/nfit.h | 5 ++
drivers/acpi/utils.c | 4 +-
drivers/nvdimm/bus.c | 45 +++++++++++-
drivers/nvdimm/core.c | 3 +
drivers/nvdimm/dimm_devs.c | 8 ++-
include/acpi/acpi_bus.h | 6 +-
include/linux/libnvdimm.h | 4 +-
include/uapi/linux/ndctl.h | 22 ++++++
tools/testing/nvdimm/test/nfit.c | 13 +++-
10 files changed, 226 insertions(+), 33 deletions(-)
--
1.7.11.3
[PATCH] x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem()
by Dan Williams
Update the definition of memcpy_from_pmem() to return 0 on success or
-EIO on error. Implement x86::arch_memcpy_from_pmem() with memcpy_mcsafe().
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Andrew, now that all the prerequisites for this patch are in -next
(tip/core/ras, tip/x86/asm, nvdimm/libnvdimm-for-next) may I ask you to
carry it in -mm?
Alternatively I can do an octopus merge and post a branch, but that
seems messy/risky for me to be merging 3 branches that are still subject
to a merge window disposition.
arch/x86/include/asm/pmem.h | 9 +++++++++
drivers/nvdimm/pmem.c | 4 ++--
include/linux/pmem.h | 14 ++++++++------
3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index bf8b35d2035a..4df3820535c6 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -47,6 +47,15 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
BUG();
}
+static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
+ size_t n)
+{
+ if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
+ return memcpy_mcsafe(dst, (void __force *) src, n) ? 0 : -EIO;
+ memcpy(dst, (void __force *) src, n);
+ return 0;
+}
+
/**
* arch_wmb_pmem - synchronize writes to persistent memory
*
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index adc387236fe7..2022d08c60ce 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -98,7 +98,7 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
if (unlikely(bad_pmem))
rc = -EIO;
else {
- memcpy_from_pmem(mem + off, pmem_addr, len);
+ rc = memcpy_from_pmem(mem + off, pmem_addr, len);
flush_dcache_page(page);
}
} else {
@@ -295,7 +295,7 @@ static int pmem_rw_bytes(struct nd_namespace_common *ndns,
if (unlikely(is_bad_pmem(&pmem->bb, offset / 512, sz_align)))
return -EIO;
- memcpy_from_pmem(buf, pmem->virt_addr + offset, size);
+ return memcpy_from_pmem(buf, pmem->virt_addr + offset, size);
} else {
memcpy_to_pmem(pmem->virt_addr + offset, buf, size);
wmb_pmem();
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 3ec5309e29f3..c46c5cf6538e 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -66,14 +66,16 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
#endif
/*
- * Architectures that define ARCH_HAS_PMEM_API must provide
- * implementations for arch_memcpy_to_pmem(), arch_wmb_pmem(),
- * arch_copy_from_iter_pmem(), arch_clear_pmem(), arch_wb_cache_pmem()
- * and arch_has_wmb_pmem().
+ * memcpy_from_pmem - read from persistent memory with error handling
+ * @dst: destination buffer
+ * @src: source buffer
+ *
+ * Returns 0 on success, -EIO on failure.
*/
-static inline void memcpy_from_pmem(void *dst, void __pmem const *src, size_t size)
+static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
+ size_t size)
{
- memcpy(dst, (void __force const *) src, size);
+ return arch_memcpy_from_pmem(dst, src, size);
}
static inline bool arch_has_pmem_api(void)
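The convention in the x86 hunk above, where a successful memcpy_mcsafe() maps to 0 and a failed one to -EIO, can be exercised in userspace with a mock. mock_memcpy_mcsafe(), poison_addr and model_memcpy_from_pmem() are hypothetical stand-ins that simulate a single poisoned byte. Unlike this mock, a real machine-check-safe copy may stop partway through, so callers should treat the destination as undefined on error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulated poisoned byte; NULL means the media is clean. */
static const unsigned char *poison_addr;

/* Mock copy: returns true on success, false if it would hit poison. */
static bool mock_memcpy_mcsafe(void *dst, const void *src, size_t n)
{
	const unsigned char *s = src;

	if (poison_addr && poison_addr >= s && poison_addr < s + n)
		return false;
	memcpy(dst, src, n);
	return true;
}

/* Same mapping as the patch: 0 on success, -EIO on a failed read. */
static int model_memcpy_from_pmem(void *dst, const void *src, size_t n)
{
	return mock_memcpy_mcsafe(dst, src, n) ? 0 : -EIO;
}
```

Propagating the int return, as the pmem_do_bvec() and pmem_rw_bytes() hunks do, is what lets a media error reach the block layer as an I/O error instead of silently returning bad data.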
[RFC 0/2] New MAP_PMEM_AWARE mmap flag
by Boaz Harrosh
Hi all
Recent DAX code fixed the cache-line flushing (cl_flush), i.e. the
durability, of mmap access to persistent memory from applications. It
uses the per-inode radix tree to track the indexes of a file that were
page-faulted for write. Then, at m/fsync time, it cl_flushes these pages
and cleans the radix tree for the next round.
Sigh, that is life; for legacy applications this is the price we must
pay. But NV-aware applications, like the nvml library, pay this extra
price even if they never end up calling m/fsync. For these applications
the extra resources, and especially the extra radix-tree locking per
page fault, cost a lot: roughly 3x.
What we propose here is a way for those applications to enjoy the
boost without sacrificing any correctness for legacy applications.
Concurrent access by legacy apps and NV-aware apps, even to the same
file or the same page, will work correctly.
We do that by defining a new mmap flag that is set by the NV-aware
app. This flag is carried by the VMA. In the DAX code we bypass any
radix-tree handling of the page if this flag is set: pages accessed
*without* this flag will be added to the radix tree, those accessed
with it will not.
At m/fsync time, if the radix tree is empty, nothing will happen.
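The proposed bookkeeping can be modeled in a few lines. This is a sketch under stated assumptions, not the RFC's kernel code: MODEL_MAP_PMEM_AWARE and its value, struct inode_model, struct vma_model, write_fault() and fsync_model() are hypothetical, and a fixed-size bool array stands in for the per-inode radix tree.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAP_PMEM_AWARE 0x1u /* flag value assumed for the sketch */
#define MAX_PAGES 64

struct inode_model {
	bool dirty[MAX_PAGES]; /* indexes that need a cache flush */
};

struct vma_model {
	unsigned int flags;
	struct inode_model *inode;
};

/* Legacy mappings record the faulted index; flagged mappings skip it. */
static void write_fault(struct vma_model *vma, size_t index)
{
	if (vma->flags & MODEL_MAP_PMEM_AWARE)
		return; /* the app flushes its own stores */
	if (index < MAX_PAGES)
		vma->inode->dirty[index] = true;
}

/* m/fsync: flush and clear every recorded index; returns how many. */
static size_t fsync_model(struct inode_model *inode)
{
	size_t flushed = 0;

	for (size_t i = 0; i < MAX_PAGES; i++) {
		if (inode->dirty[i]) {
			/* a real kernel would write back cache lines here */
			inode->dirty[i] = false;
			flushed++;
		}
	}
	return flushed;
}
```

Because a legacy fault on a page still records the index even if an NV-aware mapping of the same file exists, fsync keeps its guarantee for the legacy user, which matches the correctness claim in the cover letter.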
These are very simple, non-intrusive patches with minimal risk (I think).
They are based on v4.5-rc5. If you need a rebase on any other tree please
say.
Please consider this new flag for those of us people who specialize in
persistent-memory setups and want to extract any possible mileage out
of our systems.
Also attached, for reference, is a third patch to the nvml library that
uses the new flag. Which brings me to the issue of persistent_memcpy /
persistent_flush: currently this library is x86_64-only, using the movnt
instructions. The gcc compiler should have a per-arch facility for durable
memory accesses, so that applications can be portable across systems.
Please advise?
list of patches:
[RFC 1/2] mmap: Define a new MAP_PMEM_AWARE mmap flag
[RFC 2/2] REVIEWME: dax: Support MAP_PMEM_AWARE for optimal
Two Kernel patches
[RFC 1/1] util: add pmem-aware flag to mmap
A patch for the nvml library
Thanks
Boaz