Hi,
> Another approach could be to integrate NVDIMM event
> monitoring into some other utility, like the rasdaemon. I'm interested in
> your thoughts.
I prefer this approach, though I'm not yet sure which utility (existing
or new) is appropriate. I'll think about it.
I looked into how to provide notification/monitoring of over-threshold
events together with my co-worker. Here is our current understanding.
a) rasdaemon
It is a good tool for machine check errors, and if a machine check occurs on
an NVDIMM, I suppose it will work for NVDIMMs as well as for traditional RAM.
However, it may not fit the purpose of notification/monitoring of threshold
events.
b) smartmontools (https://www.smartmontools.org/)
This tool may fit the purpose of notification/monitoring of NVDIMM health.
However, it may be a bit troublesome for the following reasons.
- smartd seems to check the SMART values of each device periodically
with ioctl() (in other words, by polling).
Probably, other devices do not have a notification interface
like ndctl_dimm_get_health_eventfd() combined with
poll()/select().
- smartmontools supports many OSes (Windows, Darwin, the BSDs, OS/2(!)).
I'm not sure whether other OSes have a notification interface similar to
the one on Linux, so it may need to poll NVDIMMs just like other devices.
c) udev
udev can launch any program if a rule for it is written in udev.rules.
However, there is currently no uevent for the over-threshold event.
In addition, I'm not sure that udev fits this type of event notification.
d) make a new tiny daemon in the ndctl tree
This may be the simpler way.
It can use ndctl_dimm_get_health_eventfd() and poll()/select().
However, ndctl may be included in the kernel source,
and I don't know whether the kernel tree includes other daemon tools or not.
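
To make d) a little more concrete, here is a rough sketch of the wait step
such a daemon could be built around. It is only an illustration under some
assumptions: wait_for_health_event(), make_fake_health_fd() and
fake_trigger() are hypothetical names, and an eventfd(2) descriptor stands
in for the fd that a real daemon would get from
ndctl_dimm_get_health_eventfd(), so the sketch builds without libndctl.
The poll() event mask is left as a parameter because the mask that fits the
real descriptor may differ from the POLLIN used with the stand-in.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block until `fd` reports one of `events`, or until timeout_ms elapses
 * (a negative timeout waits forever, as with poll(2)).
 * Returns 1 if an event arrived, 0 on timeout, -1 on error.
 */
int wait_for_health_event(int fd, short events, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = events };
	int rc;

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;

	/* Anything other than a requested event (e.g. POLLNVAL) is an error. */
	return (pfd.revents & events) ? 1 : -1;
}

/*
 * Hypothetical stand-in (assumption): a real daemon would obtain this
 * descriptor from ndctl_dimm_get_health_eventfd() for each DIMM.
 */
int make_fake_health_fd(void)
{
	return eventfd(0, EFD_NONBLOCK);
}

/* Hypothetical helper: simulate an over-threshold notification. */
int fake_trigger(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

A daemon would call wait_for_health_event() in a loop with a negative
timeout and, on each wakeup, re-read the DIMM's health values and log or
report them. The difference from the smartd model in b) is that nothing
runs between events.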
For now, though, I'm leaning toward d).
Any thoughts?
Thanks,
---
Yasunori Goto