Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
by hch@infradead.org
On Thu, Jan 25, 2018 at 09:08:48AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2018 at 11:02:16PM -0800, Dan Williams wrote:
>
> > No, in 3 dimensions since there is a need to support non-ODP RDMA
> > hardware, hypervisors want to coordinate DMA for guests, and non-RDMA
> > hardware also pins memory indefinitely like V4L2. So it's bigger than
> > RDMA, but that will likely be the first consumer of this 'longterm
> > pin' mechanism.
>
> BTW, did you look at VFIO? I think it should also have this problem
> right?
VFIO seems to have the same issue. In practice I don't think people
use file system backed pages for vfio, so it's not as urgent.
4 years, 5 months
[PATCH] device-dax: Fix trailing semicolon
by Luis de Bethencourt
The trailing semicolon is an empty statement that does nothing.
Remove it.
Signed-off-by: Luis de Bethencourt <luisbg(a)kernel.org>
---
Hi Dan,
After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches
suggested I fix it treewide [0].
Best regards
Luis
[0] http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-Ja...
[1] http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-Ja...
drivers/dax/device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 7b0bf825c4e7..2137dbc29877 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -133,7 +133,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 	dax_region->base = addr;
 	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
 		kfree(dax_region);
-		return NULL;;
+		return NULL;
 	}

 	kref_get(&dax_region->kref);
--
2.15.1
4 years, 5 months
Re: KVM "fake DAX" flushing interface - discussion
by Xiao Guangrong
On 11/22/2017 02:19 AM, Rik van Riel wrote:
> We can go with the "best" interface for what
> could be a relatively slow flush (fsync on a
> file on ssd/disk on the host), which requires
> that the flushing task wait on completion
> asynchronously.
I'd like to clarify the "wait on completion asynchronously"
interface and its relation to KVM async page fault a bit more.
The current design of async page fault only works on RAM, not
MMIO; i.e., if the page fault is caused by accessing the
device memory of an emulated device, it has to go to
userspace (QEMU), which emulates the operation in the vCPU's
thread.
As I mentioned before, the memory region used for the vNVDIMM
flush interface should be MMIO, and considering support
on other hypervisors, we had better build this async
mechanism into the flush interface design itself rather
than depend on KVM async page fault.
4 years, 5 months
[fstests PATCH 1/2] shared/272: don't use data journaling with DAX
by Ross Zwisler
shared/272 fails with kernels v4.15-rc1 and beyond when you are mounted
with DAX:
shared/272 [failed, exit status 1] - output mismatch (see /root/project/xfstests/results//shared/272.out.bad)
--- tests/shared/272.out 2015-12-05 13:12:17.038257578 -0700
+++ /root/project/xfstests/results//shared/272.out.bad 2018-01-17 15:37:18.581631116 -0700
@@ -1,3 +1,3 @@
QA output created by 272
Switch data journalling mode. Silence is golden.
-Check filesystem
+/usr/bin/chattr: Device or resource busy while setting flags on /mnt/xfstests_scratch/file.1
...
(Run 'diff -u tests/shared/272.out /root/project/xfstests/results//shared/272.out.bad' to see the entire diff)
This is expected. The following kernel commit:
commit e9072d859df3 ("ext4: prevent data corruption with journaling + DAX")
makes "chattr +j", which is attempting to turn on data journaling, return
-EBUSY if the ext4 DAX mount option is in use. This was done to prevent
the data corruption shown in xfstest ext4/030, added by this xfstests
commit:
commit 750a24e99e48 ("ext4: test for DAX + journaling corruption")
So, just skip shared/272 if the DAX mount option is in use.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
---
tests/shared/272 | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/shared/272 b/tests/shared/272
index 7023b657..0c9763df 100755
--- a/tests/shared/272
+++ b/tests/shared/272
@@ -83,6 +83,7 @@ chattr_opt: $chattr_opt" >>$seqres.full
_supported_fs ext3 ext4
_supported_os Linux
_require_scratch
+_exclude_scratch_mount_option dax
rm -f $seqres.full
_scratch_mkfs_sized $((64 * 1024 * 1024)) >> $seqres.full 2>&1
--
2.14.3
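For readers unfamiliar with what "chattr +j" does under the hood: it sets the journal-data inode flag through the FS_IOC_SETFLAGS ioctl, and that is the request which comes back with EBUSY on a DAX mount, per the commit message above. The following is a minimal C sketch of that call sequence, assuming a Linux system with <linux/fs.h>; the helper name try_enable_data_journaling() is hypothetical and is not part of xfstests or e2fsprogs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Hypothetical helper (illustration only): try to set the journal-data
 * flag on `path`, roughly what "chattr +j" requests.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_enable_data_journaling(const char *path)
{
	int flags = 0;
	int ret = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Read the current inode flags, then add the journal-data bit. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		ret = -errno;
	} else {
		flags |= FS_JOURNAL_DATA_FL;
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
			ret = -errno;	/* expected to be -EBUSY on ext4 + DAX */
	}

	close(fd);
	return ret;
}
```

A caller could compare the return value against -EBUSY to detect the DAX case, which is the condition the test now avoids by excluding the dax mount option.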
4 years, 5 months
[ndctl PATCH] ndctl, test: fix stale json in btt-pad-compat.sh
by Vishal Verma
We weren't using the updated results of any but the first of the ndctl
create-namespace commands. This could potentially result in the test
being unreliable.
Use the json being emitted by the create-namespace commands to get the
device etc. for future operations. Also do a 'reset' before attempting
the old format restoration test.
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
test/btt-pad-compat.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Dan - If it is easier, I can send a patch for just adding the final
version of this test as a standalone patch. So far there have been three
patches (including this) that touch this test. This one is a minor
bugfix, so it could also be squashed with the first patch where this is
introduced.
diff --git a/test/btt-pad-compat.sh b/test/btt-pad-compat.sh
index d10efe3..129401b 100755
--- a/test/btt-pad-compat.sh
+++ b/test/btt-pad-compat.sh
@@ -144,14 +144,14 @@ copy_xxd_img()
create_oldfmt_ns()
{
# create null-uuid namespace
- $ndctl create-namespace -b "$bus" -t pmem -m raw -l 4096 -u 00000000-0000-0000-0000-000000000000
+ json=$($ndctl create-namespace -b "$bus" -t pmem -m raw -l 4096 -u 00000000-0000-0000-0000-000000000000)
eval "$(echo "$json" | sed -e "$json2var")"
[ -n "$dev" ] || err "$LINENO" 2
[ -n "$size" ] || err "$LINENO" 2
[ $size -gt 0 ] || err "$LINENO" 2
# reconfig it to sector mode
- $ndctl create-namespace -b "$bus" -e $dev -m sector --force
+ json=$($ndctl create-namespace -b "$bus" -e $dev -m sector --force)
eval "$(echo "$json" | sed -e "$json2var")"
[ -n "$dev" ] || err "$LINENO" 2
[ -n "$size" ] || err "$LINENO" 2
@@ -185,6 +185,7 @@ do_tests()
verify_idx 0 1
# do the same with an old format namespace
+ reset
create_oldfmt_ns
verify_idx 0 2
--
2.14.3
4 years, 5 months
[PATCH] NVDIMM: Reduced-the-ND_MIN_NAMESPACE_SIZE-from-4MB-to-4KB
by Cheng-mean Liu (SOCCER)
In a VM environment with emulated NVDIMM devices, there are scenarios
where much smaller NVDIMM devices are desired. For example, we might use
a single enumerated NVDIMM DAX device to represent each container layer,
which in some cases could be just a few KB in size. The current
ND_MIN_NAMESPACE_SIZE is 4MB. To avoid wasting address space and
inefficient zero padding to meet this 4MB minimum, the proposed change
reduces it to 4KB, a single page, which is a size that works on all
platforms.
Two patches are included in this request:
1. A patch for Linux kernel changes
2. A patch for ndctl project to keep it in sync with the Linux kernel header file
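As a quick sanity check of the two hex constants in the patches below, here is a small C sketch. The OLD_/NEW_ names and the padding_needed() helper are illustrative assumptions and do not appear in the kernel or in ndctl; only the two values are taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patches; the OLD_/NEW_ prefixes are illustrative. */
enum {
	OLD_ND_MIN_NAMESPACE_SIZE = 0x00400000,	/* 4 MiB */
	NEW_ND_MIN_NAMESPACE_SIZE = 0x00001000,	/* 4 KiB */
};

static_assert(OLD_ND_MIN_NAMESPACE_SIZE == 4 * 1024 * 1024,
	      "old minimum is 4 MiB");
static_assert(NEW_ND_MIN_NAMESPACE_SIZE == 4 * 1024,
	      "new minimum is 4 KiB");

/*
 * Hypothetical helper: bytes of zero padding needed to bring a payload
 * of `payload` bytes up to the minimum namespace size `min_size`.
 */
static uint64_t padding_needed(uint64_t payload, uint64_t min_size)
{
	return payload < min_size ? min_size - payload : 0;
}
```

Under these assumptions, an 8 KiB container layer would need 4186112 bytes of padding with the old minimum and none with the proposed one.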
From 29e173c32661d976cda073438979991167ee13fc Mon Sep 17 00:00:00 2001
From: Cheng-mean Liu <soccerl(a)microsoft.com>
Date: Thu, 11 Jan 2018 10:06:13 -0800
Subject: [PATCH] reduced the ND_MIN_NAMESPACE_SIZE from 4MB to 4KB
Signed-off-by: Cheng-mean Liu <soccerl(a)microsoft.com>
---
include/uapi/linux/ndctl.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 3f03567631cb..e63c201ed1ef 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -263,7 +263,7 @@ enum nd_driver_flags {
 };

 enum {
-	ND_MIN_NAMESPACE_SIZE = 0x00400000,
+	ND_MIN_NAMESPACE_SIZE = 0x00001000,
 };

 enum ars_masks {
--
2.11.0
From 2bf3e2bbfae81ab50d141571414c0e6556bc0e0c Mon Sep 17 00:00:00 2001
From: Cheng-mean Liu <soccerl(a)microsoft.com>
Date: Thu, 11 Jan 2018 10:02:52 -0800
Subject: [PATCH] reduced the ND_MIN_NAMESPACE_SIZE from 4MB to 4KB
Signed-off-by: Cheng-mean Liu <soccerl(a)microsoft.com>
---
ndctl/ndctl.h | 2 +-
test/dpa-alloc.c | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/ndctl/ndctl.h b/ndctl/ndctl.h
index 5e6905c..8c14d90 100644
--- a/ndctl/ndctl.h
+++ b/ndctl/ndctl.h
@@ -263,7 +263,7 @@ enum nd_driver_flags {
 };

 enum {
-	ND_MIN_NAMESPACE_SIZE = 0x00400000,
+	ND_MIN_NAMESPACE_SIZE = 0x00001000,
 };

 enum ars_masks {
diff --git a/test/dpa-alloc.c b/test/dpa-alloc.c
index d13cf5d..ba3deed 100644
--- a/test/dpa-alloc.c
+++ b/test/dpa-alloc.c
@@ -237,6 +237,12 @@ static int do_test(struct ndctl_ctx *ctx, struct ndctl_test *test)
uuid_unparse(namespaces[i].uuid, uuid_str);
size = ndctl_namespace_get_size(victim);
+
+ rc = ndctl_namespace_disable_invalidate(victim);
+ if (rc) {
+ fprintf(stderr, "failed to disable %s\n", uuid_str);
+ return rc;
+ }
rc = ndctl_namespace_delete(victim);
if (rc) {
fprintf(stderr, "failed to delete %s\n", uuid_str);
--
2.11.0
4 years, 5 months
[patch] btt: fix uninitialized err_lock
by Jeff Moyer
Hi,
When a sector mode namespace is initially created, the arena's err_lock
is not initialized. If, on the other hand, the namespace already
exists, the mutex is initialized. To fix the issue, I moved the mutex
initialization into alloc_arena, which is called by both
discover_arenas and create_arenas.
This was discovered on an older kernel where mutex_trylock checks the
count to determine whether the lock is held. Because the data structure
is kzalloc-d, that count was 0 (held), and I/O to the device would hang
forever waiting for the lock to be released (see btt_write_pg, for
example). Current kernels have a different mutex implementation that
checks for a non-null owner, and so this doesn't show up as a problem.
If that lock were ever contended, it might cause issues, but you'd have
to be really unlucky, I think.
Signed-off-by: Jeff Moyer <jmoyer(a)redhat.com>
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index e949e33..5860f99 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -630,6 +630,7 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size,
 		return NULL;
 	arena->nd_btt = btt->nd_btt;
 	arena->sector_size = btt->sector_size;
+	mutex_init(&arena->err_lock);

 	if (!size)
 		return arena;
@@ -758,7 +759,6 @@ static int discover_arenas(struct btt *btt)
 		arena->external_lba_start = cur_nlba;
 		parse_arena_meta(arena, super, cur_off);

-		mutex_init(&arena->err_lock);
 		ret = btt_freelist_init(arena);
 		if (ret)
 			goto out;
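The failure mode described in the message above can be modelled in userspace. This is only a toy sketch, assuming the older counter convention where 1 means unlocked and 0 means locked; the toy_* names are hypothetical, the code is single-threaded and non-atomic, and it is not the kernel's mutex implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy counter-based lock: 1 = unlocked, 0 = locked (assumed convention). */
struct toy_mutex {
	int count;
};

static void toy_mutex_init(struct toy_mutex *m)
{
	m->count = 1;
}

static bool toy_mutex_trylock(struct toy_mutex *m)
{
	if (m->count == 1) {
		m->count = 0;
		return true;
	}
	return false;	/* looks held, including when never initialized */
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->count == 0);	/* must currently be held */
	m->count = 1;
}

struct toy_arena {
	struct toy_mutex err_lock;
};

/*
 * Stand-in for alloc_arena(): the allocation is zeroed (like kzalloc), so
 * without the init call the lock starts at count 0, which reads as "held".
 */
static struct toy_arena *toy_alloc_arena(bool init_lock)
{
	struct toy_arena *arena = calloc(1, sizeof(*arena));

	if (!arena)
		return NULL;
	if (init_lock)
		toy_mutex_init(&arena->err_lock);
	return arena;
}
```

With init_lock set to false, the first toy_mutex_trylock() on the new arena fails, which corresponds to the hang seen on the older kernel; with it set to true, the lock can be taken and released normally.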
4 years, 5 months
revamp vmem_altmap / dev_pagemap handling V3
by Christoph Hellwig
Hi all,
this series started with two patches from Logan, now in the
middle of the series, that kill the memremap-internal pgmap structure
and redo the devm_memremap_pages interface to be better suited
for future PCI P2P uses. I reviewed them and noticed that there
isn't really any good reason to keep struct vmem_altmap either,
and that a lot of these alternative device page map accesses should
be better abstracted out instead of being sprinkled all over the
mm code. But when we got the RCU warnings in V1 I went for yet
another approach: struct vmem_altmap is kept for now,
but passed explicitly through the memory hotplug code instead of
having to do unprotected lookups through the radix tree. The
end result is that only the get_user_pages path ever looks up
struct dev_pagemap, and struct vmem_altmap is now always embedded
into struct dev_pagemap and explicitly passed where needed.
Please review carefully, this has only been tested with my legacy
e820 NVDIMM system.
Changes since V2:
- properly pass altmap from devm_memremap_pages through add_pages
(Dan Williams)
- small changelog updates
- a comment typo fix
- dropped the patch to just rely on the radix_tree_insert return value
- initialize pgmap->type
4 years, 5 months