On 09/02/2015 06:19 AM, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>> Which means applications that should "just work" without
>> modification on DAX are now subtly broken and don't actually
>> guarantee data is safe after a crash. That's a pretty nasty
>> landmine, and goes against *everything* we've claimed about using
>> DAX with existing applications.
>> That's wrong, and needs fixing.
> I agree that we need to fix fsync as well, and that the fsync solution could
> be used to implement msync if we choose to go that route. I think we might
> want to consider keeping the msync and fsync implementations separate, though,
> for two reasons.
> 1) The current msync implementation is much more efficient than what will be
> needed for fsync. Fsync will need to call into the filesystem, traverse all
> the blocks, get kernel virtual addresses from those and then call
> wb_cache_pmem() on those kernel addresses.
I was thinking about this some more, and no, this is not what we need to do,
because of the virtual-based-cache ARCHs. And what we do for these systems
will also work for physical-based-cache ARCHs.
What we need to do is dig into the mapping structure and pick up the current
VMA on the call to fsync. Then just flush that one on that virtual address
(since it is current in the context of the fsync syscall).
And of course, as I wrote, we must call fsync on vm_operations->close
before the VMA mapping goes away. Then an fsync after unmap is a no-op.
> I think this is a necessary evil
> for fsync since you don't have a VMA, but for msync we do and we can just
> flush using the user addresses without any fs lookups.
Right, see above.
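
To make the two paths concrete from the application side, here is a minimal userspace sketch (not kernel code, and not taken from any patch in this thread). The helper name persist_via_mmap() and its use_fsync flag are made up for illustration. It stores through a MAP_SHARED mapping and then asks for durability either with msync() on the user address range or with fsync() on the file descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy msg into a shared file mapping and request
 * durability with msync() on the mapped range (use_fsync == 0) or with
 * fsync() on the descriptor (use_fsync != 0).
 * Returns 0 on success, -1 on any failure.
 */
int persist_via_mmap(const char *path, const char *msg, int use_fsync)
{
    size_t len = strlen(msg) + 1;
    char *p;
    int ret = -1;
    int fd;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Size the file so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    /* Plain CPU stores into the mapping; no write() call is involved. */
    memcpy(p, msg, len);

    if (use_fsync)
        ret = fsync(fd);              /* the kernel only gets an fd */
    else
        ret = msync(p, len, MS_SYNC); /* the kernel gets the user range */

    if (munmap(p, len) != 0)
        ret = -1;
out_close:
    if (close(fd) != 0)
        ret = -1;
    return ret;
}
```

An existing application that stores through the mapping and then only calls fsync(fd) corresponds to use_fsync = 1, which is the case Dave is worried about on DAX.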
> 2) I believe that the near-term fsync code will rely on struct pages for
> PMEM, which I believe are possible but optional as of Dan's last patch set:
> I believe that this means that if we don't have struct pages for PMEM (because
> ZONE_DEVICE et al. are turned off) fsync won't work. It'd be nice not to lose
> msync as well.
Please see above; it can be made to work. Actually, what we do is the
traversal-kernel-ptr thing, and the fsync-on-unmap. And it works; we have heavy
persistence testing and it is all very good.
So no, without pages it can all work very well. There is only the sync problem
that I intend to fix soon; it is only a matter of keeping a dax-dirty inode list.
So no, this is not an excuse.
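
As a rough illustration of the dax-dirty inode list idea, here is a toy userspace model. Everything in it (struct toy_inode, toy_mark_dirty(), toy_sync_all()) is a hypothetical name invented for this sketch, with no locking and no real cache flushing; it only shows the bookkeeping: link an inode once when it is first dirtied, then walk and empty the list on sync.

```c
#include <stddef.h>

/* Toy stand-in for an inode; all names here are hypothetical. */
struct toy_inode {
    int dirty;                    /* has unflushed stores */
    int on_list;                  /* already linked on the dirty list */
    struct toy_inode *next_dirty; /* singly linked dirty list */
};

struct toy_dirty_list {
    struct toy_inode *head;
};

/* Called when the inode is written through a mapping: link it only once. */
void toy_mark_dirty(struct toy_dirty_list *l, struct toy_inode *ino)
{
    ino->dirty = 1;
    if (!ino->on_list) {
        ino->next_dirty = l->head;
        l->head = ino;
        ino->on_list = 1;
    }
}

/*
 * Stand-in for sync: "flush" every listed inode, empty the list and
 * return the number of inodes that were flushed.
 */
int toy_sync_all(struct toy_dirty_list *l)
{
    int flushed = 0;
    struct toy_inode *ino = l->head;

    while (ino) {
        struct toy_inode *next = ino->next_dirty;

        /* A real implementation would write back the inode's data here. */
        ino->dirty = 0;
        ino->on_list = 0;
        ino->next_dirty = NULL;
        flushed++;
        ino = next;
    }
    l->head = NULL;
    return flushed;
}
```

With such a list, a second sync with nothing newly dirtied walks an empty list and does no work.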