On Fri, Nov 6, 2015 at 9:35 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> On Fri, 6 Nov 2015, Dan Williams wrote:
> > On Fri, Nov 6, 2015 at 12:06 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> > > Just for the record. Such a flush mechanism with
> > >
> > >     on_each_cpu()
> > >        wbinvd()
> > >     ...
> > >
> > > will make that stuff completely unusable on Real-Time systems. We've
> > > been there with the big hammer approach of the intel graphics
> > > driver.
> >
> > Noted. This means RT systems either need to disable DAX or avoid
> > fsync. Yes, this is a wart, but not an unexpected one in a first
> > generation persistent memory platform.
>
> And it's not just only RT. The folks who are aiming for 100%
> undisturbed user space (NOHZ_FULL) will be massively unhappy about
> that as well.
>
> Is it really required to do that on all cpus?

I believe it is, but I'll double check.

I assume the folks that want undisturbed userspace are ok with the
mitigation to modify their application to flush by individual cache
lines if they want to use DAX without fsync. At least until the
platform can provide a cheaper fsync implementation.

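For illustration, flushing by individual cache lines from userspace could
look roughly like the sketch below. It assumes x86 with SSE2 and 64-byte
cache lines; flush_range() and CACHE_LINE_SIZE are hypothetical names, and
whether clflush plus a fence is enough for durability depends on the
platform.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush(); also pulls in _mm_sfence() */
#define HAVE_CLFLUSH 1
#else
#define HAVE_CLFLUSH 0   /* non-x86 build: only the line walk is done */
#endif

/* Assumption: 64-byte cache lines. Real code should query the CPU. */
#define CACHE_LINE_SIZE 64u

/*
 * Flush every cache line that overlaps [addr, addr + len).
 * Returns the number of cache lines visited.
 */
size_t flush_range(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    /* Round the start down to a cache line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE_SIZE) {
#if HAVE_CLFLUSH
        _mm_clflush((const void *)p);
#endif
        lines++;
    }

#if HAVE_CLFLUSH
    _mm_sfence();   /* fence after the flushes, before any later stores */
#endif
    return lines;
}
```

An application would call flush_range() on each region of the DAX mapping
it has just written, instead of calling fsync. The cost scales with the
bytes the application itself dirtied and stays on the calling CPU.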
The option to drive cache flushing from the radix tree is at least
interruptible, but it might be long running depending on how much
virtual address space is dirty. Altogether, the options in the
current generation are:

1/ wbinvd driven: quick flush O(size of cache), but long interrupt-off latency
2/ radix driven: long flush O(size of dirty range), but at least preemptible
3/ DAX without calling fsync: userspace takes direct responsibility
   for cache management of DAX mappings
4/ DAX disabled: fsync has the standard page cache writeback latency

We could potentially argue about 1 vs 2 ad nauseam, but I wonder if
there is room to punt it to a configuration option or make it
dynamic? My stance is to do 1 with the hope of riding options 3 and 4
until the platform happens to provide a better alternative.
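
As a rough sketch of what "make it dynamic" might mean, the selection
between options 1 and 2 could be a small policy function. Everything
here is hypothetical: the names, the latency_sensitive knob, and the
heuristic of comparing the dirty range against the cache size are
assumptions for illustration, not an existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical labels for options 1 and 2 above. */
enum flush_policy {
    FLUSH_WBINVD,   /* option 1: whole-cache flush, interrupts off */
    FLUSH_RADIX     /* option 2: walk the dirty ranges, preemptible */
};

/* Hypothetical configuration; not a real kernel structure. */
struct flush_config {
    int latency_sensitive;   /* e.g. set on RT or NOHZ_FULL systems */
    size_t cache_bytes;      /* assumed total CPU cache size */
};

/*
 * Pick a flush strategy for one fsync call.
 * Assumed heuristic: wbinvd costs O(size of cache) and the range walk
 * costs O(size of dirty range), so prefer the range walk unless the
 * dirty range is larger than the cache. Latency-sensitive systems
 * never get wbinvd.
 */
enum flush_policy choose_flush_policy(const struct flush_config *cfg,
                                      size_t dirty_bytes)
{
    if (cfg->latency_sensitive)
        return FLUSH_RADIX;
    if (dirty_bytes > cfg->cache_bytes)
        return FLUSH_WBINVD;
    return FLUSH_RADIX;
}
```

A static configuration option would amount to fixing the return value
at build or boot time rather than deciding per call.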