[PATCH 1/1] ndctl: fix memory leak in libndctl
by Lukasz Plewa
From: Łukasz Plewa <lukasz.plewa(a)intel.com>
Leak found by the PMDK test suite:
==29103== 48 bytes in 4 blocks are definitely lost in loss record 3 of 3
==29103== at 0x4C31C15: realloc (vg_replace_malloc.c:785)
==29103== by 0x5B76F74: parse_lbasize_supported.isra.11 (libndctl.c:4378)
==29103== by 0x5B78967: __add_pfn (libndctl.c:4830)
==29103== by 0x5B7FBEC: add_dax (libndctl.c:4882)
==29103== by 0x5B71959: __sysfs_device_parse (sysfs.c:118)
==29103== by 0x5B78683: device_parse (libndctl.c:725)
==29103== by 0x5B78683: daxs_init (libndctl.c:3833)
==29103== by 0x5B7FB6F: ndctl_dax_get_first (libndctl.c:5248)
==29103== by 0x5B80000: ndctl_namespace_get_dax (libndctl.c:3543)
==29103== by 0x4E4CA7E: os_dimm_region_namespace (os_dimm_ndctl.c:131)
==29103== by 0x4E4CD0D: os_dimm_interleave_set (os_dimm_ndctl.c:194)
==29103== by 0x4E4CE92: os_dimm_uid (os_dimm_ndctl.c:230)
==29103== by 0x4E5F632: shutdown_state_add_part (shutdown_state.c:112)
Ref: pmem/issues#1020
Reported-by: Grzegorz Brzeziński <grzegorz.brzezinski(a)intel.com>
Signed-off-by: Łukasz Plewa <lukasz.plewa(a)intel.com>
---
ndctl/lib/libndctl.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index c9e2875..6de5463 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -518,6 +518,7 @@ static void __free_pfn(struct ndctl_pfn *pfn, struct list_head *head, void *to_f
free(pfn->pfn_path);
free(pfn->pfn_buf);
free(pfn->bdev);
+ free(pfn->alignments.supported);
free(to_free);
}
--
2.17.1
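
The fix above follows the usual ownership rule: any member that a parse
routine grows with realloc() has to be released in the object's destructor.
A minimal sketch of that pattern, using hypothetical stand-in types
(demo_pfn, demo_alignments) and helper names that are NOT the real libndctl
structures or API:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real libndctl types. */
struct demo_alignments {
	unsigned long *supported; /* grown with realloc() while parsing */
	int num;
};

struct demo_pfn {
	char *pfn_path;
	struct demo_alignments alignments;
};

/* Append one value; returns 0 on success, -1 on allocation failure. */
static int demo_add_alignment(struct demo_pfn *pfn, unsigned long align)
{
	unsigned long *tmp;

	tmp = realloc(pfn->alignments.supported,
			sizeof(*tmp) * (pfn->alignments.num + 1));
	if (!tmp)
		return -1; /* the old buffer is still owned by pfn */
	tmp[pfn->alignments.num++] = align;
	pfn->alignments.supported = tmp;
	return 0;
}

/* Allocate a zeroed object that owns a private copy of path. */
static struct demo_pfn *demo_pfn_new(const char *path)
{
	struct demo_pfn *pfn = calloc(1, sizeof(*pfn));
	size_t len = strlen(path) + 1;

	if (!pfn)
		return NULL;
	pfn->pfn_path = malloc(len);
	if (!pfn->pfn_path) {
		free(pfn);
		return NULL;
	}
	memcpy(pfn->pfn_path, path, len);
	return pfn;
}

/* Destructor: release every owned member, then the object itself. */
static void demo_pfn_free(struct demo_pfn *pfn)
{
	if (!pfn)
		return;
	free(pfn->pfn_path);
	free(pfn->alignments.supported); /* the kind of free() the patch adds */
	free(pfn);
}
```

Running a create/append/free cycle of this sketch under valgrind with the
second free() removed should report the same kind of "definitely lost"
record as the trace above.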
3 years, 5 months
[ndctl PATCH v2 0/4] add the support for NVDIMM_FAMILY_HYPERV
by Dexuan Cui
NVDIMM_FAMILY_HYPERV has been supported on this branch of the kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=li...
Now, let's add the ndctl part as well.
Patches 0001 and 0002 were posted on Feb 5; this is just a resend.
Patches 0003 and 0004 are a split version of the single patch I posted on Feb 14.
In v2, I split the single patch into 2 separate patches for easier review, and I
also added an explicit warning if the user specifies unsupported events for
"ndctl monitor". Thanks to Qi Fuli for the suggestion, and thanks to
Johannes Thumshirn for reviewing the patch!
Please review the new patchset. Thanks!
Dexuan Cui (4):
libndctl: add support for NVDIMM_FAMILY_HYPERV's _DSM Function 1
libndctl: NVDIMM_FAMILY_HYPERV: add .smart_get_shutdown_count
(Function 2)
ndctl, lib: implement ndctl_dimm_get_cmd_family()
ndctl, monitor: support NVDIMM_FAMILY_HYPERV
ndctl/lib/Makefile.am | 1 +
ndctl/lib/hyperv.c | 177 +++++++++++++++++++++++++++++++++++++++++
ndctl/lib/hyperv.h | 58 ++++++++++++++
ndctl/lib/libndctl.c | 7 ++
ndctl/lib/libndctl.sym | 1 +
ndctl/lib/private.h | 3 +
ndctl/libndctl.h | 1 +
ndctl/monitor.c | 42 ++++++++--
ndctl/ndctl.h | 1 +
9 files changed, 284 insertions(+), 7 deletions(-)
create mode 100644 ndctl/lib/hyperv.c
create mode 100644 ndctl/lib/hyperv.h
--
2.19.1
3 years, 5 months
[PATCH v2 0/6] nfit/ars: Improve polling and short-ARS execution
by Dan Williams
Changes since v1: [1]
* Fix the root poll interval support to avoid an infinite loop condition
when the polling is faster than the ARS completion.
* Move the introduction of scrub_flags earlier in the series and
introduce ARS_POLL to fix the above issue.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2019-February/019964.html
---
Here is a small pile of updates to better coordinate the Linux ARS state
machine with platform-BIOS implementations. Specifically, take advantage
of opportunities to run short-ARS whenever the ARS interface is found to
be idle at init, always run short-ARS even if no_init_ars is specified,
allow root to reset the exponential backoff polling interval for ARS
completion, and protect the kernel against the consumption of stale ARS
results.
---
Dan Williams (6):
nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
nfit/ars: Attempt short-ARS even in the no_init_ars case
nfit/ars: Remove ars_start_flags
nfit/ars: Introduce scrub_flags
nfit/ars: Allow root to busy-poll the ARS state machine
nfit/ars: Avoid stale ARS results
drivers/acpi/nfit/core.c | 70 ++++++++++++++++++++++++++++++++--------------
drivers/acpi/nfit/nfit.h | 11 +++++--
2 files changed, 57 insertions(+), 24 deletions(-)
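
To illustrate the "exponential backoff polling interval" that root can
reset, here is a minimal userspace sketch. Everything in it is an
assumption made for illustration: the struct, the function names, the
bounds, and the choice to consume the reset request exactly once are
hypothetical and are not taken from drivers/acpi/nfit/core.c.

```c
#include <stdbool.h>

/* Hypothetical bounds in seconds; the real driver's values may differ. */
#define DEMO_POLL_MIN 1u
#define DEMO_POLL_MAX (30u * 60u)

struct demo_scrub {
	unsigned int tmo; /* interval to wait before the next poll */
	bool poll_reset;  /* set when an administrator requests a reset */
};

static void demo_scrub_init(struct demo_scrub *s)
{
	s->tmo = DEMO_POLL_MIN;
	s->poll_reset = false;
}

/*
 * Called each time a poll finds the scrub still busy. Returns how long to
 * wait before polling again, then doubles the stored interval up to
 * DEMO_POLL_MAX. A pending reset request is consumed once, so a single
 * request cannot pin the poller at the minimum interval forever.
 */
static unsigned int demo_next_poll(struct demo_scrub *s)
{
	unsigned int wait;

	if (s->poll_reset) {
		s->poll_reset = false;
		s->tmo = DEMO_POLL_MIN;
	}
	wait = s->tmo;
	if (s->tmo > DEMO_POLL_MAX / 2)
		s->tmo = DEMO_POLL_MAX; /* saturate instead of overflowing */
	else
		s->tmo *= 2;
	return wait;
}
```

With these assumed bounds, successive busy polls wait 1, 2, 4, 8, ...
seconds until the interval saturates at DEMO_POLL_MAX; setting poll_reset
restarts the sequence at 1.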
3 years, 5 months
Fw: Why only devdax guarantees guest data persistence ?
by bipin.tomar@yahoo.com
I haven't heard from anyone on qemu-devel. There are a lot of vNVDIMM experts here, and I'm hoping that someone can shed some light on this.
-BT
----- Forwarded Message -----
From: bipin.tomar(a)yahoo.com <bipin.tomar(a)yahoo.com>
To: qemu-devel(a)nongnu.org <qemu-devel(a)nongnu.org>
Sent: Friday, February 15, 2019, 12:09:31 PM EST
Subject: Why only devdax guarantees guest data persistence ?
Text from "docs/nvdimm.txt" says:
Guest Data Persistence
----------------------
Though QEMU supports multiple types of vNVDIMM backends on Linux,
currently the only one that can guarantee the guest write persistence
is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
which all guest access do not involve any host-side kernel cache.
I think "host-side kernel cache" here implies "page cache". Why does fsdax NOT have the same persistence guarantees as devdax for vNVDIMM?
Both modes avoid the page cache, so why is devdax explicitly called out?
-BT
3 years, 5 months
[LSF/MM TOPIC] Standardizing semantics around the per-file DAX flag
by Theodore Y. Ts'o
There's been a long-term disagreement about how the per-file DAX flag
should work.
* Should it exist at all?
* What happens when the DAX flag is cleared?
* Should it be not allowed and return an error?
(Or maybe only if the file is otherwise opened anywhere in the system?)
* Should it only take effect when the file system is unmounted,
or when the inode drops out of the inode cache?
* Should we remove the flag entirely and make it be something the
system automagically infers?
I had hoped consensus would be achieved before the ext4 per-file DAX
flag lands, but it hasn't for a *long* time. The DAX flag
is "experimental", which technically means it could be removed ---
although I suspect at this point, it would break some userspace, so
our options about how to adjust the semantics of the flag are probably
constrained.
- Ted
3 years, 5 months
[ndctl PATCH] ndctl, monitor: support NVDIMM_FAMILY_HYPERV
by Dexuan Cui
Currently "ndctl monitor" fails for NVDIMM_FAMILY_HYPERV due to
"no smart support".
Actually NVDIMM_FAMILY_HYPERV doesn't use ND_CMD_SMART to get the health
info. Instead, it uses ND_CMD_CALL, so the checking here can't apply,
and NVDIMM_FAMILY_HYPERV doesn't support threshold alarms.
Let's skip the unnecessary checking for NVDIMM_FAMILY_HYPERV.
With the patch, when an error happens, we can log it with such a message:
{"timestamp":"1550209474.683237420","pid":3874,"event":
{"dimm-spares-remaining":false,"dimm-media-temperature":false,
"dimm-controller-temperature":false,"dimm-health-state":true,
"dimm-unclean-shutdown":false},"dimm":{"dev":"nmem1",
"id":"04d5-01-1701-01000000","handle":1,"phys_id":0,"health":
{"health_state":"critical","shutdown_count":8}}}
Here the meaningful info is:
"health": {"health_state":"critical","shutdown_count":8}
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
ndctl/lib/libndctl.c | 5 +++++
ndctl/lib/libndctl.sym | 1 +
ndctl/libndctl.h | 1 +
ndctl/monitor.c | 33 ++++++++++++++++++++++++++-------
4 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 48bdb27..1186579 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -1550,6 +1550,11 @@ NDCTL_EXPORT struct ndctl_dimm *ndctl_dimm_get_next(struct ndctl_dimm *dimm)
return list_next(&bus->dimms, dimm, list);
}
+NDCTL_EXPORT unsigned long ndctl_dimm_get_cmd_family(struct ndctl_dimm *dimm)
+{
+ return dimm->cmd_family;
+}
+
NDCTL_EXPORT unsigned int ndctl_dimm_get_handle(struct ndctl_dimm *dimm)
{
return dimm->handle;
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index cb9f769..470e895 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -38,6 +38,7 @@ global:
ndctl_bus_wait_probe;
ndctl_dimm_get_first;
ndctl_dimm_get_next;
+ ndctl_dimm_get_cmd_family;
ndctl_dimm_get_handle;
ndctl_dimm_get_phys_id;
ndctl_dimm_get_vendor;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index 0debdb6..cb5a8fc 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -145,6 +145,7 @@ struct ndctl_dimm *ndctl_dimm_get_next(struct ndctl_dimm *dimm);
for (dimm = ndctl_dimm_get_first(bus); \
dimm != NULL; \
dimm = ndctl_dimm_get_next(dimm))
+unsigned long ndctl_dimm_get_cmd_family(struct ndctl_dimm *dimm);
unsigned int ndctl_dimm_get_handle(struct ndctl_dimm *dimm);
unsigned short ndctl_dimm_get_phys_id(struct ndctl_dimm *dimm);
unsigned short ndctl_dimm_get_vendor(struct ndctl_dimm *dimm);
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 43b2abe..6adc305 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -265,31 +265,50 @@ static bool filter_region(struct ndctl_region *region,
return true;
}
-static void filter_dimm(struct ndctl_dimm *dimm, struct util_filter_ctx *fctx)
+static bool ndctl_dimm_test_and_enable_notification(struct ndctl_dimm *dimm)
{
- struct monitor_dimm *mdimm;
- struct monitor_filter_arg *mfa = fctx->monitor;
const char *name = ndctl_dimm_get_devname(dimm);
+ /*
+ * Hyper-V Virtual NVDIMM doesn't use ND_CMD_SMART to get the health
+ * info. Instead, it uses ND_CMD_CALL, so the checking here can't
+ * apply, and it doesn't support threshold alarms.
+ */
+ if (ndctl_dimm_get_cmd_family(dimm) == NVDIMM_FAMILY_HYPERV)
+ return true;
+
if (!ndctl_dimm_is_cmd_supported(dimm, ND_CMD_SMART)) {
err(&monitor, "%s: no smart support\n", name);
- return;
+ return false;
}
if (!ndctl_dimm_is_cmd_supported(dimm, ND_CMD_SMART_THRESHOLD)) {
err(&monitor, "%s: no smart threshold support\n", name);
- return;
+ return false;
}
if (!ndctl_dimm_is_flag_supported(dimm, ND_SMART_ALARM_VALID)) {
err(&monitor, "%s: smart alarm invalid\n", name);
- return;
+ return false;
}
if (enable_dimm_supported_threshold_alarms(dimm)) {
err(&monitor, "%s: enable supported threshold alarms failed\n", name);
- return;
+ return false;
}
+ return true;
+}
+
+static void filter_dimm(struct ndctl_dimm *dimm, struct util_filter_ctx *fctx)
+{
+ struct monitor_dimm *mdimm;
+ struct monitor_filter_arg *mfa = fctx->monitor;
+ const char *name = ndctl_dimm_get_devname(dimm);
+
+
+ if (!ndctl_dimm_test_and_enable_notification(dimm))
+ return;
+
mdimm = calloc(1, sizeof(struct monitor_dimm));
if (!mdimm) {
err(&monitor, "%s: calloc for monitor dimm failed\n", name);
--
2.19.1
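
The control flow of the new helper can be modelled outside ndctl as
follows. This is only a sketch under stated assumptions: demo_dimm,
demo_family and demo_can_monitor() are hypothetical names that stand in
for the libndctl calls used in the patch, and the capability bits are
plain booleans here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a DIMM's capabilities; not the libndctl API. */
enum demo_family { DEMO_FAMILY_INTEL, DEMO_FAMILY_HYPERV };

struct demo_dimm {
	const char *name;
	enum demo_family family;
	bool has_smart;           /* models ND_CMD_SMART support */
	bool has_smart_threshold; /* models ND_CMD_SMART_THRESHOLD support */
};

/* Return true if the DIMM should be added to the monitor list. */
static bool demo_can_monitor(const struct demo_dimm *dimm)
{
	/*
	 * A family that reports health through a different command path
	 * is accepted up front, before any SMART-specific check runs.
	 */
	if (dimm->family == DEMO_FAMILY_HYPERV)
		return true;

	if (!dimm->has_smart) {
		fprintf(stderr, "%s: no smart support\n", dimm->name);
		return false;
	}
	if (!dimm->has_smart_threshold) {
		fprintf(stderr, "%s: no smart threshold support\n", dimm->name);
		return false;
	}
	return true;
}
```

Returning a bool from the check, instead of returning early from the
caller, is what lets filter_dimm() in the patch keep a single allocation
path for every family.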
3 years, 5 months
[LSF/MM TOPIC] Software RAID Support for NV-DIMM
by Johannes Thumshirn
(This is a joint proposal with Hannes Reinecke)
Servers with NV-DIMMs are slowly emerging in data centers, but one key feature
for the reliability of these systems hasn't been addressed up to now: data
redundancy.
While it would be best to solve this issue in the memory controller of the CPU
itself, I don't see this coming in the next few years. This puts the burden on
us, the OS, to create the redundant copies of data for the users.
If we leave DAX support aside, Linux's software RAID implementations (MD,
device-mapper and BTRFS RAID) already work on top of pmem devices, but they
are incompatible with DAX.
In this session Hannes and I would like to discuss possible ways in which we,
as an operating system, can mitigate these issues for our users.
Byte,
Johannes
--
Johannes Thumshirn SUSE Labs Filesystems
jthumshirn(a)suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
3 years, 5 months
Is nvdimm driver RT compatible?
by Liu, Yongxin
Hi experts,
Could anyone tell me whether the Linux nvdimm driver is RT compatible?
When I was testing PMEM performance with fio using the following command, I got the call trace below.
# fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100 -group_reporting -name=mytest
BUG: scheduling while atomic: fio/2514/0x00000002
Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp
crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit pcc_cpufreq wmi libnvdimm acpi_pad
acpi_power_meter
Preemption disabled at:
[<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
CPU: 44 PID: 2514 Comm: fio Tainted: G W 4.18.20-rt8-preempt-rt #1
Call Trace:
dump_stack+0x4f/0x6a
? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
__schedule_bug.cold.17+0x38/0x55
__schedule+0x484/0x6c0
? _raw_spin_lock+0x17/0x40
schedule+0x3d/0xe0
rt_spin_lock_slowlock_locked+0x118/0x2a0
rt_spin_lock_slowlock+0x57/0x90
rt_spin_lock+0x52/0x60
btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
btt_make_request+0x1b1/0x320 [nd_btt]
generic_make_request+0x1dc/0x3f0
submit_bio+0x49/0x140
nd_region_acquire_lane() disables preemption with get_cpu() which causes "scheduling while atomic" spews on RT.
Is it safe to replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() in nd_region_acquire_lane()/nd_region_release_lane()?
After this replacement, the code protected by nd_region_acquire_lane()/nd_region_release_lane() would become preemptible.
So is this code reentrant?
Thanks,
Yongxin
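
For readers following the question, the lane selection can be modelled in
plain userspace C. This is a sketch under an assumption that is not stated
in the message above: that when a region has fewer lanes than CPUs the
lane is chosen as cpu % num_lanes and guarded by a per-lane lock, and that
otherwise each CPU uses its own lane with no lock. The function name and
signature are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model; names are illustrative, not kernel API.
 * Precondition: num_lanes > 0.
 *
 * Returns the lane a task running on 'cpu' would use and reports through
 * 'needs_lock' whether that lane is shared with other CPUs.
 */
static unsigned int demo_lane_for_cpu(unsigned int cpu, unsigned int num_lanes,
		unsigned int nr_cpus, bool *needs_lock)
{
	if (num_lanes < nr_cpus) {
		/* Several CPUs map to one lane, so a lock must serialize them. */
		*needs_lock = true;
		return cpu % num_lanes;
	}
	/*
	 * One lane per CPU: exclusivity depends on the task staying on
	 * this CPU, without being preempted, until the lane is released.
	 */
	*needs_lock = false;
	return cpu;
}
```

Under that assumption, the lockless branch is the one to look at when
making the section preemptible: nothing but the disabled preemption keeps
a second task on the same CPU from picking the same lane.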
3 years, 5 months
[PATCH] ndctl: Generalized make-git-snapshot.sh
by ira.weiny@intel.com
From: Ira Weiny <ira.weiny(a)intel.com>
make-git-snapshot.sh assumed a fixed location for the git tree.
Furthermore, it assumed the user has an rpmbuild directory structure
set up.
Enhance the script to figure out where it has been cloned and to create
the rpmbuild directory if the user does not already have it.
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
---
make-git-snapshot.sh | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/make-git-snapshot.sh b/make-git-snapshot.sh
index 142419d623fe..26e29bd7953d 100755
--- a/make-git-snapshot.sh
+++ b/make-git-snapshot.sh
@@ -2,10 +2,18 @@
set -e
NAME=ndctl
-REFDIR="$HOME/git/ndctl" # for faster cloning, if available
+
+pushd `dirname $0`
+REFDIR=`pwd`
+popd
+
UPSTREAM=$REFDIR #TODO update once we have a public upstream
OUTDIR=$HOME/rpmbuild/SOURCES
+if [ ! -d $OUTDIR ]; then
+ mkdir -p $OUTDIR
+fi
+
[ -n "$1" ] && HEAD="$1" || HEAD="HEAD"
WORKDIR="$(mktemp -d --tmpdir "$NAME.XXXXXXXXXX")"
@@ -14,7 +22,7 @@ trap 'rm -rf $WORKDIR' exit
[ -d "$REFDIR" ] && REFERENCE="--reference $REFDIR"
git clone $REFERENCE "$UPSTREAM" "$WORKDIR"
-VERSION=$(./git-version)
+VERSION=$($REFDIR/git-version)
DIRNAME="ndctl-${VERSION}"
git archive --remote="$WORKDIR" --format=tar --prefix="$DIRNAME/" HEAD | gzip > $OUTDIR/"ndctl-${VERSION}.tar.gz"
--
2.20.1
3 years, 5 months