[PATCH v3 0/7] dax: I/O path enhancements
by Ross Zwisler
The goal of this series is to enhance the DAX I/O path so that all operations
that store data (I/O writes, zeroing blocks, punching holes, etc.) properly
synchronize the stores to media using the PMEM API. This ensures that the data
DAX is writing is durable on media before the operation completes.
Patches 1-4 are a few random cleanups.
Changes from v2:
- Introduce copy_from_iter_pmem() as part of the PMEM API. Keep the use of
__arch_wmb_cache_pmem() internal to the implementation of the PMEM API. (Dan)
Ross Zwisler (7):
brd: make rd_size static
pmem, x86: move x86 PMEM API to new pmem.h header
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: clean up conditional pmem includes
pmem: add copy_from_iter_pmem() and clear_pmem()
dax: update I/O path to do proper PMEM flushing
pmem, dax: have direct_access use __pmem annotation
Documentation/filesystems/Locking | 3 +-
MAINTAINERS | 1 +
arch/powerpc/sysdev/axonram.c | 7 +-
arch/x86/include/asm/cacheflush.h | 71 ------------------
arch/x86/include/asm/pmem.h | 152 ++++++++++++++++++++++++++++++++++++++
drivers/block/brd.c | 6 +-
drivers/nvdimm/pmem.c | 4 +-
drivers/s390/block/dcssblk.c | 10 ++-
fs/block_dev.c | 2 +-
fs/dax.c | 68 ++++++++++-------
include/linux/blkdev.h | 8 +-
include/linux/pmem.h | 78 +++++++++++++++----
12 files changed, 282 insertions(+), 128 deletions(-)
create mode 100644 arch/x86/include/asm/pmem.h
--
2.1.0
6 years, 10 months
[PATCH v2 0/7] dax: I/O path enhancements
by Ross Zwisler
The goal of this series is to enhance the DAX I/O path so that all operations
that store data (I/O writes, zeroing blocks, punching holes, etc.) properly
synchronize the stores to media using the PMEM API. This ensures that the data
DAX is writing is durable on media before the operation completes.
Patches 1-4 are a few random cleanups.
Changes from v1:
- Removed patches to PMEM for the "read flush" _DSM flag. These are different
enough that they deserve their own series, and they have a separate baseline
which is currently moving (Dan's memremap() series).
- Added clear_pmem() PMEM API to zero DAX memory and flush it in one call.
(Dave)
- Open coded flushing in arch_wb_cache_pmem() instead of adding a generic
clwb_flush_range(). This allowed me to avoid having extra memory barriers
and instead rely completely on arch_wmb_pmem() for ordering. (Dave)
- Moved the arch implementation of the PMEM API into its own arch header
(Christoph).
Ross Zwisler (7):
brd: make rd_size static
pmem, x86: move x86 PMEM API to new pmem.h header
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: clean up conditional pmem includes
pmem: add wb_cache_pmem() and clear_pmem()
dax: update I/O path to do proper PMEM flushing
pmem, dax: have direct_access use __pmem annotation
Documentation/filesystems/Locking | 3 +-
MAINTAINERS | 1 +
arch/powerpc/sysdev/axonram.c | 7 ++-
arch/x86/include/asm/cacheflush.h | 71 ----------------------
arch/x86/include/asm/pmem.h | 123 ++++++++++++++++++++++++++++++++++++++
drivers/block/brd.c | 6 +-
drivers/nvdimm/pmem.c | 4 +-
drivers/s390/block/dcssblk.c | 10 ++--
fs/block_dev.c | 2 +-
fs/dax.c | 73 ++++++++++++++--------
include/linux/blkdev.h | 8 +--
include/linux/pmem.h | 66 ++++++++++++++++----
12 files changed, 247 insertions(+), 127 deletions(-)
create mode 100644 arch/x86/include/asm/pmem.h
--
2.1.0
[PATCH] nd_blk: add support for "read flush" DSM flag
by Ross Zwisler
Add support for the "read flush" _DSM flag, as outlined in the DSM spec:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read. This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM. We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.
By supporting the "read flush" flag we can also change the ND BLK
aperture mapping from write-combining to write-back via memremap().
In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range(). This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cacheflush.h | 2 ++
drivers/acpi/Kconfig | 1 +
drivers/acpi/nfit.c | 55 ++++++++++++++++++++++-----------------
drivers/acpi/nfit.h | 16 ++++++++----
lib/Kconfig | 3 +++
6 files changed, 49 insertions(+), 29 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a1a5d..5d4980e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -28,6 +28,7 @@ config X86
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_PMEM_API
+ select ARCH_HAS_MMIO_FLUSH
select ARCH_HAS_SG_CHAIN
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 9bf3ea1..7f3104f 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -89,6 +89,8 @@ int set_pages_rw(struct page *page, int numpages);
void clflush_cache_range(void *addr, unsigned int size);
+#define mmio_flush_range(addr, size) clflush_cache_range(addr, size)
+
#ifdef CONFIG_DEBUG_RODATA
void mark_rodata_ro(void);
extern const int rodata_test_data;
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 114cf48..4baeb85 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -410,6 +410,7 @@ config ACPI_NFIT
tristate "ACPI NVDIMM Firmware Interface Table (NFIT)"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
+ depends on ARCH_HAS_MMIO_FLUSH
select LIBNVDIMM
help
Infrastructure to probe ACPI 6 compliant platforms for
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 628a42c..816c778 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1032,7 +1032,7 @@ static u64 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
if (mmio->num_lines)
offset = to_interleave_offset(offset, mmio);
- return readq(mmio->base + offset);
+ return readq(mmio->addr.base + offset);
}
static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
@@ -1057,11 +1057,11 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
if (mmio->num_lines)
offset = to_interleave_offset(offset, mmio);
- writeq(cmd, mmio->base + offset);
+ writeq(cmd, mmio->addr.base + offset);
wmb_blk(nfit_blk);
if (nfit_blk->dimm_flags & ND_BLK_DCR_LATCH)
- readq(mmio->base + offset);
+ readq(mmio->addr.base + offset);
}
static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
@@ -1093,11 +1093,16 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
}
if (rw)
- memcpy_to_pmem(mmio->aperture + offset,
+ memcpy_to_pmem(mmio->addr.aperture + offset,
iobuf + copied, c);
- else
+ else {
+ if (nfit_blk->dimm_flags & ND_BLK_READ_FLUSH)
+ mmio_flush_range((void __force *)
+ mmio->addr.aperture + offset, c);
+
memcpy_from_pmem(iobuf + copied,
- mmio->aperture + offset, c);
+ mmio->addr.aperture + offset, c);
+ }
copied += c;
len -= c;
@@ -1144,7 +1149,10 @@ static void nfit_spa_mapping_release(struct kref *kref)
WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
dev_dbg(acpi_desc->dev, "%s: SPA%d\n", __func__, spa->range_index);
- iounmap(spa_map->iomem);
+ if (spa_map->type == SPA_MAP_APERTURE)
+ memunmap((void __force *)spa_map->addr.aperture);
+ else
+ iounmap(spa_map->addr.base);
release_mem_region(spa->address, spa->length);
list_del(&spa_map->list);
kfree(spa_map);
@@ -1190,7 +1198,7 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
spa_map = find_spa_mapping(acpi_desc, spa);
if (spa_map) {
kref_get(&spa_map->kref);
- return spa_map->iomem;
+ return spa_map->addr.base;
}
spa_map = kzalloc(sizeof(*spa_map), GFP_KERNEL);
@@ -1206,20 +1214,19 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
if (!res)
goto err_mem;
- if (type == SPA_MAP_APERTURE) {
- /*
- * TODO: memremap_pmem() support, but that requires cache
- * flushing when the aperture is moved.
- */
- spa_map->iomem = ioremap_wc(start, n);
- } else
- spa_map->iomem = ioremap_nocache(start, n);
+ spa_map->type = type;
+ if (type == SPA_MAP_APERTURE)
+ spa_map->addr.aperture = (void __pmem *)memremap(start, n,
+ MEMREMAP_WB);
+ else
+ spa_map->addr.base = ioremap_nocache(start, n);
+
- if (!spa_map->iomem)
+ if (!spa_map->addr.base)
goto err_map;
list_add_tail(&spa_map->list, &acpi_desc->spa_maps);
- return spa_map->iomem;
+ return spa_map->addr.base;
err_map:
release_mem_region(start, n);
@@ -1282,7 +1289,7 @@ static int acpi_nfit_blk_get_flags(struct nvdimm_bus_descriptor *nd_desc,
nfit_blk->dimm_flags = flags.flags;
else if (rc == -ENOTTY) {
/* fall back to a conservative default */
- nfit_blk->dimm_flags = ND_BLK_DCR_LATCH;
+ nfit_blk->dimm_flags = ND_BLK_DCR_LATCH | ND_BLK_READ_FLUSH;
rc = 0;
} else
rc = -ENXIO;
@@ -1322,9 +1329,9 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
/* map block aperture memory */
nfit_blk->bdw_offset = nfit_mem->bdw->offset;
mmio = &nfit_blk->mmio[BDW];
- mmio->base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw,
+ mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw,
SPA_MAP_APERTURE);
- if (!mmio->base) {
+ if (!mmio->addr.base) {
dev_dbg(dev, "%s: %s failed to map bdw\n", __func__,
nvdimm_name(nvdimm));
return -ENOMEM;
@@ -1345,9 +1352,9 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
nfit_blk->cmd_offset = nfit_mem->dcr->command_offset;
nfit_blk->stat_offset = nfit_mem->dcr->status_offset;
mmio = &nfit_blk->mmio[DCR];
- mmio->base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr,
+ mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr,
SPA_MAP_CONTROL);
- if (!mmio->base) {
+ if (!mmio->addr.base) {
dev_dbg(dev, "%s: %s failed to map dcr\n", __func__,
nvdimm_name(nvdimm));
return -ENOMEM;
@@ -1414,7 +1421,7 @@ static void acpi_nfit_blk_region_disable(struct nvdimm_bus *nvdimm_bus,
for (i = 0; i < 2; i++) {
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[i];
- if (mmio->base)
+ if (mmio->addr.base)
nfit_spa_unmap(acpi_desc, mmio->spa);
}
nd_blk_region_set_provider_data(ndbr, NULL);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 79b6d83..fd7d41a 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -41,6 +41,7 @@ enum nfit_uuids {
};
enum {
+ ND_BLK_READ_FLUSH = 1,
ND_BLK_DCR_LATCH = 2,
};
@@ -116,12 +117,16 @@ enum nd_blk_mmio_selector {
DCR,
};
+struct nd_blk_addr {
+ union {
+ void __iomem *base;
+ void __pmem *aperture;
+ };
+};
+
struct nfit_blk {
struct nfit_blk_mmio {
- union {
- void __iomem *base;
- void __pmem *aperture;
- };
+ struct nd_blk_addr addr;
u64 size;
u64 base_offset;
u32 line_size;
@@ -148,7 +153,8 @@ struct nfit_spa_mapping {
struct acpi_nfit_system_address *spa;
struct list_head list;
struct kref kref;
- void __iomem *iomem;
+ enum spa_map_type type;
+ struct nd_blk_addr addr;
};
static inline struct nfit_spa_mapping *to_spa_map(struct kref *kref)
diff --git a/lib/Kconfig b/lib/Kconfig
index 3a2ef67..a938a39 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -531,4 +531,7 @@ config ARCH_HAS_SG_CHAIN
config ARCH_HAS_PMEM_API
bool
+config ARCH_HAS_MMIO_FLUSH
+ bool
+
endmenu
--
2.1.0
[PATCH v5 0/8] memremap for 4.3
by Dan Williams
Changes since v4 [1]:
1/ Squashed the pmem memremap conversion into one patch and dropped the
boilerplate for looking up a mapping-type by range. The architecture
now optionally defines ARCH_MEMREMAP_PMEM flags to override the
default. (Christoph)
2/ Fixed memunmap_pmem() to be devm based to match memremap_pmem()
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-August/001728.html
---
While developing the pmem driver we noticed that the __iomem annotation
on the return value from ioremap_cache() was being mishandled by several
callers. We also observed that all of the call sites expected to be
able to treat the return value from ioremap_cache() as a normal
(non-__iomem) pointer to memory.
See also, the LWN write up: https://lwn.net/Articles/653585/
---
Christoph Hellwig (2):
devres: add devm_memremap
pmem: switch to devm_ allocations
Dan Williams (6):
mm: enhance region_is_ram() to region_intersects()
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
arch: introduce memremap()
visorbus: switch from ioremap_cache to memremap
pmem: convert to generic memremap
arch/arm/mach-clps711x/board-cdb89712.c | 2
arch/arm/mach-shmobile/pm-rcar.c | 2
arch/ia64/include/asm/io.h | 1
arch/ia64/kernel/cyclone.c | 2
arch/powerpc/kernel/pci_of_scan.c | 2
arch/sh/include/asm/io.h | 1
arch/sparc/kernel/pci.c | 3 -
arch/x86/include/asm/io.h | 6 -
arch/xtensa/include/asm/io.h | 1
drivers/isdn/icn/icn.h | 2
drivers/mtd/devices/slram.c | 2
drivers/mtd/nand/diskonchip.c | 2
drivers/mtd/onenand/generic.c | 2
drivers/nvdimm/pmem.c | 36 ++----
drivers/pci/probe.c | 3 -
drivers/pnp/manager.c | 2
drivers/scsi/aic94xx/aic94xx_init.c | 7 -
drivers/scsi/arcmsr/arcmsr_hba.c | 5 -
drivers/scsi/mvsas/mv_init.c | 15 +--
drivers/scsi/sun3x_esp.c | 2
drivers/staging/comedi/drivers/ii_pci20kc.c | 1
drivers/staging/unisys/visorbus/visorchannel.c | 16 ++-
drivers/staging/unisys/visorbus/visorchipset.c | 17 ++-
drivers/tty/serial/8250/8250_core.c | 2
drivers/video/fbdev/ocfb.c | 1
drivers/video/fbdev/s1d13xxxfb.c | 3 -
drivers/video/fbdev/stifb.c | 1
include/linux/io-mapping.h | 2
include/linux/io.h | 13 ++
include/linux/mm.h | 9 +-
include/linux/mtd/map.h | 2
include/linux/pmem.h | 36 ++----
include/video/vga.h | 2
kernel/Makefile | 2
kernel/memremap.c | 137 ++++++++++++++++++++++++
kernel/resource.c | 61 ++++++-----
lib/devres.c | 13 +-
lib/pci_iomap.c | 7 -
tools/testing/nvdimm/Kbuild | 4 -
tools/testing/nvdimm/test/iomap.c | 46 ++++++--
40 files changed, 309 insertions(+), 164 deletions(-)
create mode 100644 kernel/memremap.c
RFC: prepare for struct scatterlist entries without page backing
by Christoph Hellwig
Dan Williams started to look into addressing I/O to and from
Persistent Memory in his series from June:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
I've started looking into DMA mapping of these SGLs specifically instead
of the map_pfn method in there. In addition to supporting NVDIMM backed
I/O I also suspect this would be highly useful for media drivers that
go through nasty hoops to be able to DMA from/to their ioremapped regions,
with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
being a prime example for the unsafe hacks currently used.
It turns out most DMA mapping implementations can handle SGLs without
page structures with some fairly simple mechanical work. Most of it
is just about consistently using sg_phys. For implementations that
need to flush caches we need a new helper that skips these cache
flushes if an entry doesn't have a kernel virtual address.
However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
to operate mostly on virtual addresses. It's a fairly odd concept
that I don't fully grasp, so I'll need some help with those if we want
to bring this forward.
Additionally, this series skips ARM entirely for now. The reason is
that most arm implementations of the .map_sg operation just iterate
over all entries and call ->map_page for it, which means we'd need
to convert those to a ->map_pfn similar to Dan's previous approach.
[ndctl PATCH] ndctl: add a unit test for parent_uuid verification
by Vishal Verma
BTT autodetect should correctly check for the UUID of the parent
namespace, and enable the BTT accordingly. This unit test checks for
both cases, when a BTT should be correctly autodetected and enabled
(matching parent_uuid), and when it shouldn't have been enabled (updated
parent_uuid for the namespace).
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
Makefile.am | 9 +-
builtin-test.c | 6 ++
lib/test-parent-uuid.c | 282 +++++++++++++++++++++++++++++++++++++++++++++++++
test-parent-uuid.h | 4 +
4 files changed, 298 insertions(+), 3 deletions(-)
create mode 100644 lib/test-parent-uuid.c
create mode 100644 test-parent-uuid.h
diff --git a/Makefile.am b/Makefile.am
index 070192f..c4fa423 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -60,7 +60,7 @@ ndctl_SOURCES = ndctl.c \
util/wrapper.c
if ENABLE_TEST
-ndctl_SOURCES += lib/test-libndctl.c lib/test-dpa-alloc.c
+ndctl_SOURCES += lib/test-libndctl.c lib/test-dpa-alloc.c lib/test-parent-uuid.c
endif
if ENABLE_DESTRUCTIVE
@@ -99,8 +99,8 @@ pkgconfig_DATA = lib/libndctl.pc
EXTRA_DIST += lib/libndctl.pc.in
CLEANFILES += lib/libndctl.pc
-TESTS = lib/test-libndctl lib/test-dpa-alloc
-check_PROGRAMS = lib/test-libndctl lib/test-dpa-alloc
+TESTS = lib/test-libndctl lib/test-dpa-alloc lib/test-parent-uuid
+check_PROGRAMS = lib/test-libndctl lib/test-dpa-alloc lib/test-parent-uuid
if ENABLE_DESTRUCTIVE
TESTS += lib/test-blk-ns lib/test-pmem-ns
@@ -118,3 +118,6 @@ lib_test_pmem_ns_LDADD = lib/libndctl.la -lkmod
lib_test_dpa_alloc_SOURCES = lib/test-dpa-alloc.c
lib_test_dpa_alloc_LDADD = lib/libndctl.la -luuid -lkmod
+
+lib_test_parent_uuid_SOURCES = lib/test-parent-uuid.c
+lib_test_parent_uuid_LDADD = lib/libndctl.la -luuid -lkmod
diff --git a/builtin-test.c b/builtin-test.c
index b739924..73a24e0 100644
--- a/builtin-test.c
+++ b/builtin-test.c
@@ -2,6 +2,7 @@
#include <syslog.h>
#include <test-libndctl.h>
#include <test-dpa-alloc.h>
+#include <test-parent-uuid.h>
#include <util/parse-options.h>
int cmd_test(int argc, const char **argv)
@@ -32,5 +33,10 @@ int cmd_test(int argc, const char **argv)
rc = test_dpa_alloc(loglevel);
fprintf(stderr, "test-dpa-alloc: %s\n", rc ? "FAIL" : "PASS");
+ if (rc)
+ return rc;
+
+ rc = test_parent_uuid(loglevel);
+ fprintf(stderr, "test-parent-uuid: %s\n", rc ? "FAIL" : "PASS");
return rc;
}
diff --git a/lib/test-parent-uuid.c b/lib/test-parent-uuid.c
new file mode 100644
index 0000000..46e0060
--- /dev/null
+++ b/lib/test-parent-uuid.c
@@ -0,0 +1,282 @@
+/*
+ * blk_namespaces: tests functionality of multiple block namespaces
+ *
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#include <stdio.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <ctype.h>
+#include <errno.h>
+#include <unistd.h>
+#include <limits.h>
+#include <syslog.h>
+#include <libkmod.h>
+#include <uuid/uuid.h>
+#include <test-parent-uuid.h>
+
+#include <ndctl/libndctl.h>
+
+#ifdef HAVE_NDCTL_H
+#include <linux/ndctl.h>
+#else
+#include <ndctl.h>
+#endif
+
+
+static const char *NFIT_TEST_MODULE = "nfit_test";
+static const char *PROVIDER = "nfit_test.0";
+
+static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
+ const char *provider)
+{
+ struct ndctl_bus *bus;
+
+ ndctl_bus_foreach(ctx, bus)
+ if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
+ return bus;
+
+ return NULL;
+}
+
+static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
+{
+ struct ndctl_btt *btt;
+
+ ndctl_btt_foreach(region, btt)
+ if (!ndctl_btt_is_enabled(btt)
+ && !ndctl_btt_is_configured(btt))
+ return btt;
+ return NULL;
+}
+
+static struct ndctl_namespace *create_blk_namespace(int region_fraction,
+ struct ndctl_region *region, unsigned long long req_size,
+ uuid_t uuid)
+{
+ struct ndctl_namespace *ndns, *seed_ns = NULL;
+ unsigned long long size;
+
+ ndctl_namespace_foreach(region, ndns)
+ if (ndctl_namespace_get_size(ndns) == 0) {
+ seed_ns = ndns;
+ break;
+ }
+
+ if (!seed_ns)
+ return NULL;
+
+ size = ndctl_region_get_size(region)/region_fraction;
+ if (req_size)
+ size = req_size;
+
+ if (ndctl_namespace_set_uuid(seed_ns, uuid) < 0)
+ return NULL;
+
+ if (ndctl_namespace_set_size(seed_ns, size) < 0)
+ return NULL;
+
+ if (ndctl_namespace_set_sector_size(seed_ns, 512) < 0)
+ return NULL;
+
+ if (ndctl_namespace_enable(seed_ns) < 0)
+ return NULL;
+
+ return seed_ns;
+}
+
+static int disable_blk_namespace(struct ndctl_namespace *ndns)
+{
+ if (ndctl_namespace_disable(ndns) < 0)
+ return -ENODEV;
+
+ if (ndctl_namespace_delete(ndns) < 0)
+ return -ENODEV;
+
+ return 0;
+}
+
+static struct ndctl_btt *check_valid_btt(struct ndctl_region *region,
+ struct ndctl_namespace *ndns, uuid_t btt_uuid)
+{
+ struct ndctl_btt *btt = NULL;
+ ndctl_btt_foreach(region, btt) {
+ struct ndctl_namespace *btt_ndns;
+ uuid_t uu;
+
+ ndctl_btt_get_uuid(btt, uu);
+ if (uuid_compare(uu, btt_uuid) != 0)
+ continue;
+ if (!ndctl_btt_is_enabled(btt))
+ continue;
+ btt_ndns = ndctl_btt_get_namespace(btt);
+ if (strcmp(ndctl_namespace_get_devname(btt_ndns),
+ ndctl_namespace_get_devname(ndns)) != 0)
+ continue;
+ return btt;
+ }
+ return NULL;
+}
+
+
+static int do_test(struct ndctl_ctx *ctx)
+{
+ int rc;
+ struct ndctl_bus *bus;
+ struct ndctl_btt *btt, *found = NULL;
+ struct ndctl_region *region, *blk_region;
+ struct ndctl_namespace *ndns;
+ unsigned long long ns_size = 18874368;
+ uuid_t uuid = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16};
+ uuid_t btt_uuid;
+
+ bus = get_bus_by_provider(ctx, PROVIDER);
+ if (!bus) {
+ fprintf(stderr, "failed to find NFIT-provider: %s\n", PROVIDER);
+ rc = -ENODEV;
+ goto err_nobus;
+ }
+
+ ndctl_region_foreach(bus, region)
+ if (ndctl_region_get_nstype(region) == ND_DEVICE_NAMESPACE_BLK) {
+ blk_region = region;
+ break;
+ }
+
+ if (!blk_region) {
+ fprintf(stderr, "failed to find block region\n");
+ rc = -ENODEV;
+ goto err_cleanup;
+ }
+
+ /* create a blk namespace */
+ ndns = create_blk_namespace(1, blk_region, ns_size, uuid);
+ if (!ndns) {
+ fprintf(stderr, "failed to create block namespace\n");
+ goto err_cleanup;
+ }
+
+ /* create a btt for this namespace */
+ uuid_generate(btt_uuid);
+ btt = get_idle_btt(region);
+ if (!btt)
+ return -ENXIO;
+
+ ndctl_btt_set_uuid(btt, btt_uuid);
+ ndctl_btt_set_sector_size(btt, 512);
+ ndctl_btt_set_namespace(btt, ndns);
+ ndctl_namespace_disable(ndns);
+ rc = ndctl_btt_enable(btt);
+ if (rc) {
+ fprintf(stderr, "failed to create btt 0\n");
+ goto err_cleanup;
+ }
+
+ /* disable the btt */
+ ndctl_btt_delete(btt);
+
+ /* re-create the namespace - this should auto-enable the btt */
+ disable_blk_namespace(ndns);
+ ndns = create_blk_namespace(1, blk_region, ns_size, uuid);
+ if (!ndns) {
+ fprintf(stderr, "failed to re-create block namespace\n");
+ goto err_cleanup;
+ }
+
+ /* Verify btt was auto-created */
+ found = check_valid_btt(blk_region, ndns, btt_uuid);
+ if (!found) {
+ rc = -ENXIO;
+ goto err_cleanup;
+ }
+ btt = found;
+
+ /*disable the btt and namespace again */
+ ndctl_btt_delete(btt);
+ disable_blk_namespace(ndns);
+
+ /* recreate the namespace with a different uuid */
+ uuid_generate(uuid);
+ ndns = create_blk_namespace(1, blk_region, ns_size, uuid);
+ if (!ndns) {
+ fprintf(stderr, "failed to re-create block namespace\n");
+ goto err_cleanup;
+ }
+
+ /* make sure there is no btt on this namespace */
+ found = check_valid_btt(blk_region, ndns, btt_uuid);
+ if (found) {
+ fprintf(stderr, "found a stale btt\n");
+ rc = -ENXIO;
+ goto err_cleanup;
+ }
+
+err_cleanup:
+ ndctl_btt_foreach(blk_region, btt)
+ ndctl_btt_delete(btt);
+
+ ndctl_namespace_foreach(blk_region, ndns)
+ if (ndctl_namespace_get_size(ndns) != 0)
+ disable_blk_namespace(ndns);
+ ndctl_region_foreach(bus, region)
+ ndctl_region_disable_invalidate(region);
+
+
+ err_nobus:
+ ndctl_unref(ctx);
+ return rc;
+}
+
+int test_parent_uuid(int loglevel)
+{
+ struct ndctl_ctx *ctx;
+ struct kmod_module *mod;
+ struct kmod_ctx *kmod_ctx;
+ int err, result = EXIT_FAILURE;
+
+ err = ndctl_new(&ctx);
+ if (err < 0)
+ exit(EXIT_FAILURE);
+
+ ndctl_set_log_priority(ctx, loglevel);
+
+ kmod_ctx = kmod_new(NULL, NULL);
+ if (!kmod_ctx)
+ goto err_kmod;
+
+ err = kmod_module_new_from_name(kmod_ctx, NFIT_TEST_MODULE, &mod);
+ if (err < 0)
+ goto err_module;
+
+ err = kmod_module_probe_insert_module(mod, KMOD_PROBE_APPLY_BLACKLIST,
+ NULL, NULL, NULL, NULL);
+ if (err < 0)
+ goto err_module;
+
+ err = do_test(ctx);
+ if (err == 0)
+ result = EXIT_SUCCESS;
+ kmod_module_remove_module(mod, 0);
+
+err_module:
+ kmod_unref(kmod_ctx);
+err_kmod:
+ ndctl_unref(ctx);
+ return result;
+}
+
+int __attribute__((weak)) main(int argc, char *argv[])
+{
+ return test_parent_uuid(LOG_DEBUG);
+}
diff --git a/test-parent-uuid.h b/test-parent-uuid.h
new file mode 100644
index 0000000..57a5ff7
--- /dev/null
+++ b/test-parent-uuid.h
@@ -0,0 +1,4 @@
+#ifndef __TEST_PARENT_UUID__
+#define __TEST_PARENT_UUID__
+int test_parent_uuid(int loglevel);
+#endif
--
2.4.3
[PATCH v5 0/5] introduce __pfn_t for unmapped pfn I/O and DAX lifetime
by Dan Williams
Changes since v4 [1]:
1/ Allow up to PAGE_SHIFT bits in PFN_ flags. Previously the __pfn_t
value was a union with a 'struct page *', but now __pfn_t_to_page()
internally does a pfn_to_page() instead of type-punning the value.
(Linus, Matthew)
2/ Move the definition to include/linux/mm.h and squash the
kmap_atomic_pfn_t() definition into the same patch. (Christoph)
3/ Kill dax_get_pfn(). Now replaced with dax_map_bh() (Matthew)
4/ The scatterlist cleanup patches are moved to their own series being
carried by Christoph.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-June/001094.html
---
We want persistent memory to have 4 modes of access:
1/ Block device: persistent memory treated as a ram disk (done)
2/ DAX: userspace mmap (done)
3/ Kernel "page-less". (this series)
4/ Kernel and userspace references to page-mapped persistent memory
(future series)
The "kernel 'page-less'" case leverages the fact that a 'struct page'
object is not necessarily required for describing a DMA transfer from a
device to a persistent memory address. A pfn will do, but code needs to
be careful to not perform a pfn_to_page() operation on unmapped
persistent memory. The __pfn_t type enforces that safety and
kmap_atomic_pfn_t() covers cases where the I/O stack needs to touch the
buffer on its way to the low-level-device-driver (i.e. current usages of
kmap_atomic() in the block-layer).
A subsequent patch series will add struct page coverage for persistent,
"device", memory.
We also use kmap_atomic_pfn_t() to solve races of pmem driver unbind vs
usage in DAX. rcu_read_lock() protects the driver from unbinding while a
mapping is held.
---
Christoph Hellwig (1):
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
Dan Williams (4):
allow mapping page-less memremaped areas into KVA
dax: drop size parameter to ->direct_access()
dax: fix mapping lifetime handling, convert to __pfn_t + kmap_atomic_pfn_t()
scatterlist: convert to __pfn_t
arch/arm/include/asm/memory.h | 6 --
arch/arm64/include/asm/memory.h | 6 --
arch/powerpc/platforms/Kconfig | 1
arch/powerpc/sysdev/axonram.c | 24 +++++--
arch/unicore32/include/asm/memory.h | 6 --
drivers/block/brd.c | 9 +--
drivers/nvdimm/Kconfig | 1
drivers/nvdimm/pmem.c | 24 ++++---
drivers/s390/block/Kconfig | 1
drivers/s390/block/dcssblk.c | 23 ++++++-
fs/Kconfig | 1
fs/block_dev.c | 4 +
fs/dax.c | 79 +++++++++++++++++-------
include/asm-generic/memory_model.h | 6 ++
include/linux/blkdev.h | 7 +-
include/linux/kmap_pfn.h | 31 +++++++++
include/linux/mm.h | 78 +++++++++++++++++++++++
include/linux/scatterlist.h | 111 +++++++++++++++++++++++----------
mm/Kconfig | 3 +
mm/Makefile | 1
mm/kmap_pfn.c | 117 +++++++++++++++++++++++++++++++++++
samples/kfifo/dma-example.c | 8 +-
22 files changed, 435 insertions(+), 112 deletions(-)
create mode 100644 include/linux/kmap_pfn.h
create mode 100644 mm/kmap_pfn.c
[PATCH v4 00/10] memremap for 4.3
by Dan Williams
Changes since v3: [1]
1/ Include devm_memremap() support (Christoph)
2/ Rebase the series to defer the removal of ioremap_cache() and drop
any of the ioremap_cache()-to-memremap() conversions that have yet to be
acked by the appropriate maintainer. This avoids any potential for
bisection breakage during the 4.3 merge and the cleanup can be done for
4.4. (Christoph)
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-July/001649.html
---
While developing the pmem driver we noticed that the __iomem annotation
on the return value from ioremap_cache() was being mishandled by several
callers. We also observed that all of the call sites expected to be
able to treat the return value from ioremap_cache() as normal
(non-__iomem) pointer to memory.
See also, the LWN write up: https://lwn.net/Articles/653585/
This has passed a 0day run and will appear in libnvdimm-for-next
shortly.
---
Christoph Hellwig (2):
devres: add devm_memremap
pmem: switch to devm_ allocations
Dan Williams (8):
mm: enhance region_is_ram() to region_intersects()
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
arch: introduce memremap()
visorbus: switch from ioremap_cache to memremap
libnvdimm, pmem: push call to ioremap_cache out of line
pmem: switch from ioremap_wt to memremap
pmem: convert to generic memremap
arch/arm/mach-clps711x/board-cdb89712.c | 2
arch/arm/mach-shmobile/pm-rcar.c | 2
arch/ia64/include/asm/io.h | 1
arch/ia64/kernel/cyclone.c | 2
arch/powerpc/kernel/pci_of_scan.c | 2
arch/sh/include/asm/io.h | 1
arch/sparc/kernel/pci.c | 3 -
arch/x86/include/asm/io.h | 7 -
arch/x86/mm/ioremap.c | 10 ++
arch/xtensa/include/asm/io.h | 1
drivers/isdn/icn/icn.h | 2
drivers/mtd/devices/slram.c | 2
drivers/mtd/nand/diskonchip.c | 2
drivers/mtd/onenand/generic.c | 2
drivers/nvdimm/pmem.c | 36 ++----
drivers/pci/probe.c | 3 -
drivers/pnp/manager.c | 2
drivers/scsi/aic94xx/aic94xx_init.c | 7 -
drivers/scsi/arcmsr/arcmsr_hba.c | 5 -
drivers/scsi/mvsas/mv_init.c | 15 +--
drivers/scsi/sun3x_esp.c | 2
drivers/staging/comedi/drivers/ii_pci20kc.c | 1
drivers/staging/unisys/visorbus/visorchannel.c | 16 ++-
drivers/staging/unisys/visorbus/visorchipset.c | 17 ++-
drivers/tty/serial/8250/8250_core.c | 2
drivers/video/fbdev/ocfb.c | 1
drivers/video/fbdev/s1d13xxxfb.c | 3 -
drivers/video/fbdev/stifb.c | 1
include/linux/io-mapping.h | 2
include/linux/io.h | 13 ++
include/linux/mm.h | 9 +-
include/linux/mtd/map.h | 2
include/linux/pmem.h | 30 +++--
include/video/vga.h | 2
kernel/Makefile | 2
kernel/memremap.c | 138 ++++++++++++++++++++++++
kernel/resource.c | 61 ++++++-----
lib/devres.c | 13 +-
lib/pci_iomap.c | 7 -
tools/testing/nvdimm/Kbuild | 4 -
tools/testing/nvdimm/test/iomap.c | 46 ++++++--
41 files changed, 323 insertions(+), 156 deletions(-)
create mode 100644 kernel/memremap.c
[PATCH v4 0/9] introduce __pfn_t, evacuate struct page from sgls
by Dan Williams
Introduce __pfn_t which:
1/ Allows kernel internal DAX mappings to adhere to the lifetime of the
the underlying block device. In general, it enables a mechanism to
allow any device driver to advertise "device memory" (CONFIG_DEV_PFN)
to other parts of the kernel.
2/ Replaces usage of struct page in struct scatterlist. A scatterlist
need only carry enough information to generate a dma address, and
removing struct page from scatterlists is a precursor to allowing DMA to
device memory. Some dma mapping implementations are not ready for a
scatterlist-pfn to reference unmapped device memory; those
implementations are disabled by CONFIG_DEV_PFN=y.
Changes since v3 [1]:
1/ Drop the bio_vec conversion of struct page to __pfn_t for now. Wait
until there's a hierarchical block driver that would make use of direct
dma to pmem. (Christoph)
2/ Reorder the patch set to put the dax fixes first.
3/ Unconditionally convert struct scatterlist to use a pfn. Strictly
speaking the scatterlist conversion could also be deferred until we have
a driver that attempts dma to pmem, but struct scatterlist really has no
valid reason to carry a struct page. (Christoph)
4/ Rebased on block.git/for-next
---
Dan Williams (9):
introduce __pfn_t for scatterlists and pmem
x86: support kmap_atomic_pfn_t() for persistent memory
dax: drop size parameter to ->direct_access()
dax: fix mapping lifetime handling, convert to __pfn_t + kmap_atomic_pfn_t()
dma-mapping: allow archs to optionally specify a ->map_pfn() operation
scatterlist: use sg_phys()
scatterlist: cleanup sg_chain() and sg_unmark_end()
scatterlist: convert to __pfn_t
x86: convert dma_map_ops to support mapping a __pfn_t.
arch/Kconfig | 6 +
arch/arm/mm/dma-mapping.c | 2
arch/microblaze/kernel/dma.c | 2
arch/powerpc/sysdev/axonram.c | 26 ++++--
arch/x86/Kconfig | 7 ++
arch/x86/kernel/amd_gart_64.c | 22 ++++-
arch/x86/kernel/pci-nommu.c | 22 ++++-
arch/x86/kernel/pci-swiotlb.c | 4 +
arch/x86/pci/sta2x11-fixup.c | 4 +
block/blk-merge.c | 2
drivers/block/brd.c | 9 --
drivers/block/pmem.c | 16 +++
drivers/crypto/omap-sham.c | 2
drivers/dma/imx-dma.c | 8 --
drivers/dma/ste_dma40.c | 5 -
drivers/iommu/amd_iommu.c | 21 +++--
drivers/iommu/intel-iommu.c | 26 ++++--
drivers/iommu/iommu.c | 2
drivers/mmc/card/queue.c | 4 -
drivers/pci/Kconfig | 2
drivers/s390/block/dcssblk.c | 26 +++++-
drivers/staging/android/ion/ion_chunk_heap.c | 4 -
fs/block_dev.c | 4 -
fs/dax.c | 62 +++++++++++--
include/asm-generic/dma-mapping-common.h | 30 +++++++
include/asm-generic/memory_model.h | 1
include/asm-generic/pfn.h | 120 ++++++++++++++++++++++++++
include/crypto/scatterwalk.h | 9 --
include/linux/blkdev.h | 7 +-
include/linux/dma-debug.h | 23 ++++-
include/linux/dma-mapping.h | 8 ++
include/linux/highmem.h | 23 +++++
include/linux/mm.h | 1
include/linux/scatterlist.h | 103 ++++++++++++++++------
include/linux/swiotlb.h | 4 +
init/Kconfig | 13 +++
lib/dma-debug.c | 10 +-
lib/swiotlb.c | 20 +++-
mm/Makefile | 1
mm/pfn.c | 98 +++++++++++++++++++++
samples/kfifo/dma-example.c | 8 +-
41 files changed, 626 insertions(+), 141 deletions(-)
create mode 100644 include/asm-generic/pfn.h
create mode 100644 mm/pfn.c