[RFC v3 00/19] kunit: introduce KUnit, the Linux kernel unit testing framework
by Brendan Higgins
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: from invocation to completion,
KUnit can run several dozen tests in under a second. Currently, KUnit's
own test suite runs in under a second from the initial invocation
(build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, running
tests on common infrastructure, mocking, spying, and much more.
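For a flavor of the API, here is a minimal sketch of a test case and
suite. It is written against the KUnit macros as they later landed in
mainline (KUNIT_EXPECT_EQ, KUNIT_CASE, kunit_test_suite), so details may
differ slightly from this RFC; the example function and suite names are
illustrative only:

#include <kunit/test.h>

static void example_add_test(struct kunit *test)
{
	/* An expectation records a failure but lets the case continue. */
	KUNIT_EXPECT_EQ(test, 3, 1 + 2);
}

static struct kunit_case example_test_cases[] = {
	KUNIT_CASE(example_add_test),
	{}
};

static struct kunit_suite example_test_suite = {
	.name = "example",
	.test_cases = example_test_cases,
};
kunit_test_suite(example_test_suite);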
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they can easily exercise all code paths, solving the classic problem of
error handling code that is difficult to reach.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well-tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space, which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience, I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally, for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/4.19/v3
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/4.19/v3 branch.
## Changes Since Last Version
- Changed namespace prefix from `test_*` to `kunit_*` as requested by
Shuah.
- Started converting/cleaning up the device tree unittest to use KUnit.
- Started adding KUnit expectations with custom messages.
--
2.20.0.rc0.387.gc7a69e6b6c-goog
[mm PATCH v6 0/7] Deferred page init improvements
by Alexander Duyck
This patchset is essentially a refactor of the page initialization logic
that is meant to provide for better code reuse while providing a
significant improvement in deferred page initialization performance.
In my testing on an x86_64 system with 384GB of RAM and 3TB of persistent
memory per node, I have seen the following. In the case of regular memory
initialization, the deferred init time decreased from 3.75s to 1.06s on
average. For persistent memory, the initialization time dropped from
24.17s to 19.12s on average. This amounts to a 253% improvement in
deferred memory initialization performance and a 26% improvement in
persistent memory initialization performance.
I have called out the improvement observed with each patch.
Note: This patch set is meant as a replacement for the v5 set that is
already in the MM tree.
I had considered just doing incremental changes, but Pavel had suggested
at the time that I submit it as a whole set. However, that was almost
three weeks ago, so if incremental changes are preferred, let me know and
I can submit the changes as incremental updates.
I apologize for the delay in submitting this follow-on set. I had been
trying to address the DAX PageReserved bit issue at the same time, but
that is taking more time than I anticipated, so I decided to push this
out before the code sits too much longer.
Commit bf416078f1d83 ("mm/page_alloc.c: memory hotplug: free pages as
higher order") causes issues with the revert of patch 7. It was
necessary to replace all instances of __free_pages_boot_core with
__free_pages_core.
v1->v2:
Fixed build issue on PowerPC due to page struct size being 56
Added new patch that removed __SetPageReserved call for hotplug
v2->v3:
Rebased on latest linux-next
Removed patch that had removed __SetPageReserved call from init
Added patch that folded __SetPageReserved into set_page_links
Tweaked __init_pageblock to use start_pfn to get section_nr instead of pfn
v3->v4:
Updated patch description and comments for mm_zero_struct_page patch
Replaced "default" with "case 64"
Removed #ifndef mm_zero_struct_page
Fixed typo in comment that omitted "_from" in kerneldoc for iterator
Added Reviewed-by for patches reviewed by Pavel
Added Acked-by from Michal Hocko
Added deferred init times for patches that affect init performance
Swapped patches 5 & 6, pulled some code/comments from 4 into 5
v4->v5:
Updated Acks/Reviewed-by
Rebased on latest linux-next
Split core bits of zone iterator patch from MAX_ORDER_NR_PAGES init
v5->v6:
Rebased on linux-next with previous v5 reverted
Drop the "This patch" or "This change" from patch desriptions.
Cleaned up patch descriptions for patches 3 & 4
Fixed kerneldoc for __next_mem_pfn_range_in_zone
Updated several Reviewed-by, and incorporated suggestions from Pavel
Added __init_single_page_nolru to patch 5 to consolidate code
Refactored iterator in patch 7 and fixed several issues
---
Alexander Duyck (7):
mm: Use mm_zero_struct_page from SPARC on all 64b architectures
mm: Drop meminit_pfn_in_nid as it is redundant
mm: Implement new zone specific memblock iterator
mm: Initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections
mm: Move hot-plug specific memory init into separate functions and optimize
mm: Add reserved flag setting to set_page_links
mm: Use common iterator for deferred_init_pages and deferred_free_pages
arch/sparc/include/asm/pgtable_64.h | 30 --
include/linux/memblock.h | 41 +++
include/linux/mm.h | 50 +++
mm/memblock.c | 64 ++++
mm/page_alloc.c | 571 +++++++++++++++++++++--------------
5 files changed, 498 insertions(+), 258 deletions(-)
--
[PATCH v3 1/2] nfit, mce: only handle uncorrectable machine checks
by Vishal Verma
The mce handler for 'nfit' devices is called for memory errors on a
Non-Volatile DIMM, and adds the error location to a 'badblocks' list.
This list is used by the various NVDIMM drivers to avoid consuming known
poison locations during IO.
The mce handler gets called for both corrected and uncorrectable errors.
Until now, both kinds of errors have been added to the badblocks list.
However, a corrected memory error indicates that the problem has already
been fixed by hardware, and the resulting interrupt is merely a
notification to Linux. As far as future accesses to that location are
concerned, it is perfectly fine to use, and thus doesn't need to be
included in the badblocks list.
Add a check in the nfit mce handler to filter out corrected mce events,
and only process uncorrectable errors.
Reported-by: Omar Avelar <omar.avelar@intel.com>
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Cc: stable@vger.kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
drivers/acpi/nfit/mce.c | 4 ++--
3 files changed, 5 insertions(+), 3 deletions(-)
v3: Unchanged from v2
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3a17107594c8..3111b3cee2ee 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -216,6 +216,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
int mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
+bool mce_is_correctable(struct mce *m);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 953b3ce92dcc..27015948bc41 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -534,7 +534,7 @@ bool mce_is_memory_error(struct mce *m)
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);
-static bool mce_is_correctable(struct mce *m)
+bool mce_is_correctable(struct mce *m)
{
if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
return false;
@@ -544,6 +544,7 @@ static bool mce_is_correctable(struct mce *m)
return true;
}
+EXPORT_SYMBOL_GPL(mce_is_correctable);
static bool cec_add_mce(struct mce *m)
{
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index e9626bf6ca29..7a51707f87e9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,8 +25,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
struct acpi_nfit_desc *acpi_desc;
struct nfit_spa *nfit_spa;
- /* We only care about memory errors */
- if (!mce_is_memory_error(mce))
+ /* We only care about uncorrectable memory errors */
+ if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
return NOTIFY_DONE;
/*
--
2.17.1
[PATCH 0/8] Introduce a device-dax bus-based device-model
by Dan Williams
Prompted by the review of "[PATCH 0/9] Allow persistent memory to be
used like normal RAM" [1], introduce a new bus / device-driver model
for device-dax.
Currently, device-dax instances result from attaching an nvdimm namespace
device to the dax_pmem driver. These instances are registered with the
/sys/class/dax sub-system. With the expectation that platforms will
describe performance-differentiated memory [2] for ranges other than
persistent memory (pmem), a new device model is needed.
Arrange for dax_pmem to be one of potentially several drivers that know
how to discover differentiated memory and register a device instance on
the dax bus. The expectation is that, by default, this device is
consumed by the typical device-dax driver, which will expose the range
through a /dev/daxX.Y character device. Optionally, other drivers can
consume the dax device instance. For example, the kmem driver [1] can
attach to a device-dax instance to hot-add the related memory range
to the core page allocator.
Going forward, provider drivers outside of dax_pmem can be created to
register other memories with unique performance properties.
Since /sys/class/dax is a released ABI, a compat driver is provided so
that distros can opt in to the new bus-based ABI. The /sys/class/dax
interface is then deprecated and scheduled for removal.
[1]: https://lkml.org/lkml/2018/10/23/9
[2]: Section 5.2.27 Heterogeneous Memory Attribute Table (HMAT)
http://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf
---
Dan Williams (8):
device-dax: Kill dax_region ida
device-dax: Kill dax_region base
device-dax: Remove multi-resource infrastructure
device-dax: Start defining a dax bus model
device-dax: Introduce bus + driver model
device-dax: Move resource pinning+mapping into the common driver
device-dax: Add support for a dax override driver
device-dax: Add /sys/class/dax backwards compatibility
Documentation/ABI/obsolete/sysfs-class-dax | 22 +
drivers/dax/Kconfig | 12 +
drivers/dax/Makefile | 5
drivers/dax/bus.c | 449 ++++++++++++++++++++++++++++
drivers/dax/bus.h | 60 ++++
drivers/dax/dax-private.h | 30 +-
drivers/dax/dax.h | 18 -
drivers/dax/device-dax.h | 25 --
drivers/dax/device.c | 365 +++++------------------
drivers/dax/pmem.c | 161 ----------
drivers/dax/pmem/Makefile | 7
drivers/dax/pmem/compat.c | 73 +++++
drivers/dax/pmem/core.c | 69 ++++
drivers/dax/pmem/pmem.c | 40 ++
drivers/dax/super.c | 41 ++-
tools/testing/nvdimm/Kbuild | 7
tools/testing/nvdimm/dax-dev.c | 16 -
17 files changed, 880 insertions(+), 520 deletions(-)
create mode 100644 Documentation/ABI/obsolete/sysfs-class-dax
create mode 100644 drivers/dax/bus.c
create mode 100644 drivers/dax/bus.h
delete mode 100644 drivers/dax/dax.h
delete mode 100644 drivers/dax/device-dax.h
delete mode 100644 drivers/dax/pmem.c
create mode 100644 drivers/dax/pmem/Makefile
create mode 100644 drivers/dax/pmem/compat.c
create mode 100644 drivers/dax/pmem/core.c
create mode 100644 drivers/dax/pmem/pmem.c
[PATCH 1/4] libndctl: Use the supported_alignment attribute
by Oliver O'Halloran
Newer kernels provide the "supported_alignments" sysfs attribute that
indicates what alignments can be used with a PFN or DAX namespace. This
patch adds the plumbing inside libndctl to allow users to query this
information using:
ndctl_{dax|pfn}_get_supported_alignment(), and
ndctl_{dax|pfn}_get_num_alignments()
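As a usage illustration, a hypothetical caller (not part of this patch)
could combine the two accessors to enumerate every alignment a pfn
device supports; only the functions added by this patch are assumed:

/* Hypothetical example: print the alignments a pfn device supports. */
#include <stdio.h>
#include <ndctl/libndctl.h>

static void print_alignments(struct ndctl_pfn *pfn)
{
	unsigned int i, num = ndctl_pfn_get_num_alignments(pfn);

	for (i = 0; i < num; i++)
		printf("supported alignment: %d\n",
				ndctl_pfn_get_supported_alignment(pfn, i));
}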
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
ndctl/lib/libndctl.c | 40 ++++++++++++++++++++++++++++++++++++++++
ndctl/lib/libndctl.sym | 7 +++++++
ndctl/libndctl.h | 6 ++++++
3 files changed, 53 insertions(+)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 0c3a35e5bcc9..4d0e58a22953 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -31,6 +31,7 @@
#include <ccan/build_assert/build_assert.h>
#include <ndctl.h>
+#include <util/size.h>
#include <util/sysfs.h>
#include <ndctl/libndctl.h>
#include <ndctl/namespace.h>
@@ -237,6 +238,7 @@ struct ndctl_pfn {
int buf_len;
uuid_t uuid;
int id, generation;
+ struct ndctl_lbasize alignments;
};
struct ndctl_dax {
@@ -4781,6 +4783,18 @@ static void *__add_pfn(struct ndctl_pfn *pfn, const char *pfn_base)
else
pfn->size = strtoull(buf, NULL, 0);
+ /*
+ * If the kernel doesn't provide the supported_alignments sysfs
+ * attribute then it's safe to assume that we are running on x86
+ * which will always support 2MB and 4KB alignments.
+ */
+ sprintf(path, "%s/supported_alignments", pfn_base);
+ if (sysfs_read_attr(ctx, path, buf) < 0)
+ sprintf(buf, "%d %d", SZ_4K, SZ_2M);
+
+ if (parse_lbasize_supported(ctx, pfn_base, buf, &pfn->alignments) < 0)
+ goto err_read;
+
free(path);
return pfn;
@@ -5015,6 +5029,22 @@ NDCTL_EXPORT int ndctl_pfn_set_align(struct ndctl_pfn *pfn, unsigned long align)
return 0;
}
+NDCTL_EXPORT unsigned int ndctl_pfn_get_num_alignments(struct ndctl_pfn *pfn)
+{
+ return pfn->alignments.num;
+}
+
+NDCTL_EXPORT int ndctl_pfn_get_supported_alignment(struct ndctl_pfn *pfn, int i)
+{
+ if (pfn->alignments.num == 0)
+ return 0;
+
+ if (i < 0 || i >= pfn->alignments.num)
+ return UINT_MAX;
+ else
+ return pfn->alignments.supported[i];
+}
+
NDCTL_EXPORT int ndctl_pfn_set_namespace(struct ndctl_pfn *pfn,
struct ndctl_namespace *ndns)
{
@@ -5237,6 +5267,16 @@ NDCTL_EXPORT unsigned long ndctl_dax_get_align(struct ndctl_dax *dax)
return ndctl_pfn_get_align(&dax->pfn);
}
+NDCTL_EXPORT unsigned int ndctl_dax_get_num_alignments(struct ndctl_dax *dax)
+{
+ return ndctl_pfn_get_num_alignments(&dax->pfn);
+}
+
+NDCTL_EXPORT int ndctl_dax_get_supported_alignment(struct ndctl_dax *dax, int i)
+{
+ return ndctl_pfn_get_supported_alignment(&dax->pfn, i);
+}
+
NDCTL_EXPORT int ndctl_dax_has_align(struct ndctl_dax *dax)
{
return ndctl_pfn_has_align(&dax->pfn);
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index 6c4c8b4dfb8e..0103c1b71a1d 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -385,3 +385,10 @@ global:
ndctl_namespace_get_next_badblock;
ndctl_dimm_get_dirty_shutdown;
} LIBNDCTL_17;
+
+LIBNDCTL_19 {
+ ndctl_pfn_get_supported_alignment;
+ ndctl_pfn_get_num_alignments;
+ ndctl_dax_get_supported_alignment;
+ ndctl_dax_get_num_alignments;
+} LIBNDCTL_18;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index 62cef9e82da3..4ff25c0a4783 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -681,6 +681,12 @@ enum ND_FW_STATUS ndctl_cmd_fw_xlat_firmware_status(struct ndctl_cmd *cmd);
struct ndctl_cmd *ndctl_dimm_cmd_new_ack_shutdown_count(struct ndctl_dimm *dimm);
int ndctl_dimm_fw_update_supported(struct ndctl_dimm *dimm);
+unsigned int ndctl_pfn_get_num_alignments(struct ndctl_pfn *pfn);
+int ndctl_pfn_get_supported_alignment(struct ndctl_pfn *pfn, int i);
+
+unsigned int ndctl_dax_get_num_alignments(struct ndctl_dax *dax);
+int ndctl_dax_get_supported_alignment(struct ndctl_dax *dax, int i);
+
#ifdef __cplusplus
} /* extern "C" */
#endif
--
2.17.2
[PATCH V2 1/1] device-dax: check for vma range while dax_mmap.
by Zhang Yi
This patch prevents a user from mapping a vma range that is larger
than the dax device's physical resource.
When qemu maps the dax device as the backend for a virtual nvdimm, the
v-nvdimm label area is defined at the end of the mapped range. Mapping
a size that exceeds the range of the device dax will trigger a fault
in qemu.
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
drivers/dax/device.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 108c37f..6fe8c30 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -177,6 +177,33 @@ static const struct attribute_group *dax_attribute_groups[] = {
NULL,
};
+static int check_vma_range(struct dev_dax *dev_dax, struct vm_area_struct *vma,
+ const char *func)
+{
+ struct device *dev = &dev_dax->dev;
+ struct resource *res;
+ unsigned long size;
+ int ret, i;
+
+ if (!dax_alive(dev_dax->dax_dev))
+ return -ENXIO;
+
+ size = vma->vm_end - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
+ ret = -EINVAL;
+ for (i = 0; i < dev_dax->num_resources; i++) {
+ res = &dev_dax->res[i];
+ if (size > resource_size(res)) {
+ dev_info_ratelimited(dev,
+ "%s: %s: fail, vma range overflow\n",
+ current->comm, func);
+ ret = -EINVAL;
+ continue;
+ } else
+ return 0;
+ }
+ return ret;
+}
+
static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
const char *func)
{
@@ -469,6 +496,8 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
*/
id = dax_read_lock();
rc = check_vma(dev_dax, vma, __func__);
+ if (!rc)
+ rc = check_vma_range(dev_dax, vma, __func__);
dax_read_unlock(id);
if (rc)
return rc;
--
2.7.4
Snapshot target and DAX-capable devices
by Jan Kara
Hi,
I've been analyzing why fstest generic/081 fails when the backing device is
capable of DAX. The problem boils down to the failure of:
lvm vgcreate -f vg0 /dev/pmem0
lvm lvcreate -L 128M -n lv0 vg0
lvm lvcreate -s -L 4M -n snap0 vg0/lv0
The last command fails like:
device-mapper: reload ioctl on (253:0) failed: Invalid argument
Failed to lock logical volume vg0/lv0.
Aborting. Manual intervention required.
The core of the problem is that volume vg0/lv0 is originally of the
DM_TYPE_DAX_BIO_BASED type, but when the snapshot gets created, we try to
switch it to DM_TYPE_BIO_BASED because the device stops supporting DAX.
The problem seems to be introduced by Ross' commit dbc626597 ("dm: prevent
DAX mounts if not supported").
The question is whether and how this should be fixed. The current
inability to create snapshots of DAX-capable devices looks weird, and the
cryptic failure makes it even worse (it took me quite a while to
understand what was failing and why). OTOH, I see the rationale behind
Ross' change as well.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
[PATCH 1/2] tools/testing/nvdimm: Align test resources to 128M
by Dan Williams
In preparation for libnvdimm growing new restrictions to detect section
conflicts between persistent memory regions, enable nfit_test to
allocate aligned resources. Use a gen_pool to allocate nfit_test's fake
resources in an address space separate from their virtual translation.
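For reference, the core allocation pattern the patch adopts looks roughly
like the sketch below. The gen_pool calls are the kernel's real genalloc
API as used in the diff; the example_* names are illustrative only:

#include <linux/genalloc.h>
#include <linux/log2.h>
#include <linux/numa.h>
#include <linux/sizes.h>

static struct gen_pool *example_pool;

/* Carve out a private address space to hand out 128M-aligned ranges. */
static int example_pool_init(unsigned long base, size_t size)
{
	example_pool = gen_pool_create(ilog2(SZ_4M), NUMA_NO_NODE);
	if (!example_pool)
		return -ENOMEM;
	return gen_pool_add(example_pool, base, size, NUMA_NO_NODE);
}

/* Allocate 'size' bytes at a 128M-aligned address; returns 0 on failure. */
static unsigned long example_alloc_aligned(size_t size)
{
	struct genpool_data_align data = { .align = SZ_128M };

	return gen_pool_alloc_algo(example_pool, size,
			gen_pool_first_fit_align, &data);
}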
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
tools/testing/nvdimm/test/nfit.c | 36 ++++++++++++++++++++++++++++++++++--
1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 01ec04bf91b5..ca4e61c864d5 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -15,6 +15,7 @@
#include <linux/dma-mapping.h>
#include <linux/workqueue.h>
#include <linux/libnvdimm.h>
+#include <linux/genalloc.h>
#include <linux/vmalloc.h>
#include <linux/device.h>
#include <linux/module.h>
@@ -215,6 +216,8 @@ struct nfit_test {
static struct workqueue_struct *nfit_wq;
+static struct gen_pool *nfit_pool;
+
static struct nfit_test *to_nfit_test(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
@@ -1132,6 +1135,9 @@ static void release_nfit_res(void *data)
list_del(&nfit_res->list);
spin_unlock(&nfit_test_lock);
+ if (resource_size(&nfit_res->res) >= DIMM_SIZE)
+ gen_pool_free(nfit_pool, nfit_res->res.start,
+ resource_size(&nfit_res->res));
vfree(nfit_res->buf);
kfree(nfit_res);
}
@@ -1144,7 +1150,7 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
GFP_KERNEL);
int rc;
- if (!buf || !nfit_res)
+ if (!buf || !nfit_res || !*dma)
goto err;
rc = devm_add_action(dev, release_nfit_res, nfit_res);
if (rc)
@@ -1164,6 +1170,8 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
return nfit_res->buf;
err:
+ if (*dma && size >= DIMM_SIZE)
+ gen_pool_free(nfit_pool, *dma, size);
if (buf)
vfree(buf);
kfree(nfit_res);
@@ -1172,9 +1180,16 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
static void *test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma)
{
+ struct genpool_data_align data = {
+ .align = SZ_128M,
+ };
void *buf = vmalloc(size);
- *dma = (unsigned long) buf;
+ if (size >= DIMM_SIZE)
+ *dma = gen_pool_alloc_algo(nfit_pool, size,
+ gen_pool_first_fit_align, &data);
+ else
+ *dma = (unsigned long) buf;
return __test_alloc(t, size, dma, buf);
}
@@ -2839,6 +2854,18 @@ static __init int nfit_test_init(void)
goto err_register;
}
+ nfit_pool = gen_pool_create(ilog2(SZ_4M), NUMA_NO_NODE);
+ if (!nfit_pool) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ if (gen_pool_add(nfit_pool, VMALLOC_START,
+ VMALLOC_END + 1 - VMALLOC_START, NUMA_NO_NODE)) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
for (i = 0; i < NUM_NFITS; i++) {
struct nfit_test *nfit_test;
struct platform_device *pdev;
@@ -2894,6 +2921,9 @@ static __init int nfit_test_init(void)
return 0;
err_register:
+ if (nfit_pool)
+ gen_pool_destroy(nfit_pool);
+
destroy_workqueue(nfit_wq);
for (i = 0; i < NUM_NFITS; i++)
if (instances[i])
@@ -2917,6 +2947,8 @@ static __exit void nfit_test_exit(void)
platform_driver_unregister(&nfit_test_driver);
nfit_test_teardown();
+ gen_pool_destroy(nfit_pool);
+
for (i = 0; i < NUM_NFITS; i++)
put_device(&instances[i]->pdev.dev);
class_destroy(nfit_test_dimm);
[PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
by Dan Williams
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.
In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case, the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry, it is possible that the pfn
used to do the lookup is misaligned relative to the value stored in the
Xarray.
When creating the 'unlocked' entry be sure to align it to the expected
size as reflected by the DAX_PMD flag. Otherwise, future lookups become
confused by finding a 'pte' aligned value at an index that should return
a 'pmd' aligned value. This mismatch results in failure signatures like
the following:
WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
RIP: 0010:dax_insert_entry+0x2b2/0x2d0
[..]
Call Trace:
dax_iomap_pte_fault.isra.41+0x791/0xde0
ext4_dax_huge_fault+0x16f/0x1f0
? up_read+0x1c/0xa0
__do_fault+0x1f/0x160
__handle_mm_fault+0x1033/0x1490
handle_mm_fault+0x18b/0x3d0
...and potential corruption of nearby page state, as housekeeping
routines like dax_disassociate_entry() may overshoot their expected
bounds by starting at the wrong page.
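To make the alignment concrete: on x86-64, PMD_ORDER is
PMD_SHIFT - PAGE_SHIFT = 21 - 12 = 9, so a pmd entry spans 512 pages and
the mask clears the low 9 pfn bits. A small standalone sketch, with the
x86-64 shift values hardcoded purely for illustration:

#include <stdio.h>

/* Illustrative x86-64 values: PAGE_SHIFT = 12, PMD_SHIFT = 21 */
#define PMD_ORDER	(21 - 12)
#define PMD_ORDER_MASK	(~((1UL << PMD_ORDER) - 1))

int main(void)
{
	unsigned long pfn = 0x12345;	/* a subpage pfn inside a pmd entry */

	/* 0x12345 & ~0x1ff == 0x12200, the pfn of the 2MiB head page */
	printf("%#lx -> %#lx\n", pfn, pfn & PMD_ORDER_MASK);
	return 0;
}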
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/dax.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 3f592dc18d67..6c5f8f345b1a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -59,6 +59,7 @@ static inline unsigned int pe_order(enum page_entry_size pe_size)
/* The order of a PMD entry */
#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define PMD_ORDER_MASK ~((1UL << PMD_ORDER) - 1)
static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
@@ -93,9 +94,13 @@ static unsigned long dax_to_pfn(void *entry)
return xa_to_value(entry) >> DAX_SHIFT;
}
-static void *dax_make_entry(pfn_t pfn, unsigned long flags)
+static void *dax_make_entry(pfn_t pfn_t, unsigned long flags)
{
- return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT));
+ unsigned long pfn = pfn_t_to_pfn(pfn_t);
+
+ if (flags & DAX_PMD)
+ pfn &= PMD_ORDER_MASK;
+ return xa_mk_value(flags | (pfn << DAX_SHIFT));
}
static bool dax_is_locked(void *entry)
[PATCH 0/2] kvm: Use huge pages for DAX-backed files
by Barret Rhoden
This patch series depends on dax pages not being PageReserved. Once
that is in place, these changes will let KVM use huge pages with
dax-backed files. Without the PageReserved change, KVM and DAX still
work with these patches, just without huge pages, which is the
current situation.
RFC/discussion thread:
https://lore.kernel.org/lkml/20181029210716.212159-1-brho@google.com/
Barret Rhoden (2):
mm: make dev_pagemap_mapping_shift() externally visible
kvm: Use huge pages for DAX-backed files
arch/x86/kvm/mmu.c | 34 ++++++++++++++++++++++++++++++++--
include/linux/mm.h | 3 +++
mm/memory-failure.c | 38 +++-----------------------------------
mm/util.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 72 insertions(+), 37 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog