On Sun, Jul 23, 2017 at 10:50 PM, Yasunori Goto <y-goto(a)jp.fujitsu.com> wrote:
Hi,
> > Another approach could be to integrate NVDIMM event
> > monitoring into some other utility, like the rasdaemon. I'm interested in
> > your thoughts.
>
> Though I'm not sure which (existing or new) utility is appropriate yet.
> I prefer this way. So, I'll think about it.
My co-worker and I investigated the notification/monitoring feature for
over-threshold events. Here is our current understanding.
a) rasdaemon
It is a good tool for machine check errors, and if a machine check occurs on
an NVDIMM, I suppose it will work for NVDIMMs as well as traditional RAM.
However, it may not fit the purpose of notification/monitoring of threshold events.
My concern with rasdaemon is that its heuristics are built for
off-lining volatile system-ram, not managing persistent media errors.
b) smartmontools (https://www.smartmontools.org/)
This tool may fit the purpose of notification/monitoring of the health of NVDIMMs.
However, it may be a bit troublesome for the following reasons.
- smartd seems to check the SMART values of each device periodically
with ioctl() (in other words, "polling").
Probably, other devices do not have a notification interface like
"ndctl_dimm_get_health_eventfd() and poll()/select()".
- smartmontools supports many OSs (Windows, darwin, xxxBSDs, os2(!)).
I'm not sure other OSs have a notification interface similar to Linux's.
So it may need to poll NVDIMMs too, as it does for other devices.
One of the explicit goals of ndctl vs smartmontools is trying to make
sure that vendor-specific details don't leak into the output data
format. ndctl is also built to leverage the Linux specific
capabilities of the libnvdimm sub-system vs some
lowest-common-denominator implementation that results from trying to
be cross-OS compatible with an abstraction layer.
c) udev
Udev can launch any program if a matching udev rule is created.
However, there is currently no uevent for the over-threshold event.
In addition, I'm not sure that udev fits this type of event notification.
There are some drivers that use uevents for logging, but I prefer
poll(2)-capable sysfs files.
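
To illustrate what waiting on a poll(2)-capable sysfs file looks like, here
is a minimal C sketch. It assumes the usual sysfs convention: the current
value is read first, and a later change wakes the waiter with POLLPRI and
POLLERR. The function name wait_for_attr_change() is hypothetical and is not
part of ndctl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for a change notification on an already-open sysfs attribute.
 * Returns 1 if the kernel signalled a change, 0 on timeout, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int wait_for_attr_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	char buf[256];
	int rc;

	assert(fd >= 0);

	/* Read the current value first so that only later changes wake us. */
	if (pread(fd, buf, sizeof(buf), 0) < 0)
		return -1;

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc <= 0)
		return rc;	/* 0: timeout, -1: poll() failed */

	return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}
```

In a monitor, fd would be a descriptor for the attribute of interest, for
example the one returned by ndctl_dimm_get_health_eventfd(), and a negative
timeout_ms would block until the next event.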
d) make a new tiny daemon in ndctl tree
This may be the simpler way.
It can use ndctl_dimm_get_health_eventfd() and poll()/select().
However, ndctl may be included in the kernel source,
and I don't know whether the kernel includes other daemon tools or not.
The kernel does include a few daemons in the tools/ directory, so I
don't see this being a problem. Now, that said, Linus still has the
prerogative to not pull ndctl into the kernel for 4.14, but at this
point I feel it is more likely than not that the next version of ndctl
will be v4.14-rc1 instead of v58.
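
As a rough sketch of the event loop such a tiny daemon might be built
around, the C code below waits on one descriptor per DIMM and calls a
callback for each one that fired. The struct monitored_dimm type, the
health_cb callback type, and both function names are hypothetical. In a real
daemon each fd would presumably come from ndctl_dimm_get_health_eventfd() and
the event mask would be POLLPRI; the mask is a parameter here so the loop can
be exercised with ordinary pipes.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DIMMS 64

/* Hypothetical per-DIMM record: a label plus its health event fd. */
struct monitored_dimm {
	const char *name;
	int fd;
};

typedef void (*health_cb)(const struct monitored_dimm *dimm, void *ctx);

/* Example callback: log the DIMM name and bump a counter passed via ctx. */
void log_health_event(const struct monitored_dimm *dimm, void *ctx)
{
	int *count = ctx;

	fprintf(stderr, "health event on %s\n", dimm->name);
	if (count)
		(*count)++;
}

/*
 * Block until at least one fd fires or the timeout expires, then invoke
 * cb once per DIMM that fired. Returns the number of DIMMs reported,
 * 0 on timeout, or -1 on error.
 */
int wait_for_health_events(const struct monitored_dimm *dimms, size_t n,
			   short events, int timeout_ms,
			   health_cb cb, void *ctx)
{
	struct pollfd pfds[MAX_DIMMS];
	int rc, fired = 0;
	size_t i;

	assert(dimms != NULL && cb != NULL);
	if (n == 0 || n > MAX_DIMMS)
		return -1;

	for (i = 0; i < n; i++) {
		pfds[i].fd = dimms[i].fd;
		pfds[i].events = events;
		pfds[i].revents = 0;
	}

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(pfds, n, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc <= 0)
		return rc;

	for (i = 0; i < n; i++) {
		if (pfds[i].revents & (events | POLLERR)) {
			cb(&dimms[i], ctx);
			fired++;
		}
	}
	return fired;
}
```

A daemon would call wait_for_health_events() in a loop with a negative
timeout, re-reading each attribute that fired before waiting again.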
For now, though, I feel like selecting d).
Any thoughts?
I'm also in favor of d).