On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro(a)zeniv.linux.org.uk> wrote:
On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
> However, for raw block devices and for XFS with a real-time device, the
> value in inode->i_sb->s_bdev is not correct. With the code as it is
> currently written, an fsync or msync to a DAX enabled raw block device will
> cause a NULL pointer dereference kernel BUG. For this to work correctly we
> need to ask the block device or filesystem what struct block_device is
> appropriate for our inode.
> To that end, add a get_bdev(struct inode *) entry point to struct
> super_operations. If this function pointer is non-NULL, this notifies DAX
> that it needs to use it to look up the correct block_device. If
> i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
Umm... It assumes that bdev will stay pinned for as long as inode is
referenced, presumably? If so, that needs to be documented (and verified
for existing fs instances). In principle, multi-disk fs might want to
support things like "silently move the inodes backed by that disk to other
Dan, this is exactly the kind of thing I'm talking about WRT the
weirder device models and directly calling bdev_direct_access().
Filesystems don't have the monogamous relationship with a device that
is implicitly assumed in DAX. You have to ask the filesystem what the
relationship is and what it is migrating to, and allow the filesystem
to update DAX when the relationship is changing. As we start to see
systems with many DIMMs and tens of TiB of pmem, this is going to be an
even bigger deal, as load balancing, wear leveling, and fault tolerance
concerns are inevitably driven by the filesystem.
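
For reference, here is a rough userspace sketch of the lookup that Ross's
patch description proposes: use the get_bdev() hook from struct
super_operations when the filesystem provides one, otherwise fall back to
inode->i_sb->s_bdev. The struct definitions are cut-down stand-ins rather
than the kernel's, and dax_get_bdev(), demo_get_bdev() and the
i_private_bdev field are hypothetical names made up for illustration. They
are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct block_device { const char *name; };
struct inode;

struct super_operations {
	/* Proposed hook: return the block device backing this inode. */
	struct block_device *(*get_bdev)(struct inode *inode);
};

struct super_block {
	const struct super_operations *s_op;
	struct block_device *s_bdev;	/* not always the right device */
};

struct inode {
	struct super_block *i_sb;
	struct block_device *i_private_bdev;	/* hypothetical per-inode device */
};

/* Hypothetical helper: ask the filesystem first, else use s_bdev. */
static struct block_device *dax_get_bdev(struct inode *inode)
{
	const struct super_operations *ops;

	assert(inode != NULL && inode->i_sb != NULL);
	ops = inode->i_sb->s_op;
	if (ops && ops->get_bdev)
		return ops->get_bdev(inode);
	return inode->i_sb->s_bdev;
}

/* Hypothetical hook for a filesystem that tracks a device per inode. */
static struct block_device *demo_get_bdev(struct inode *inode)
{
	return inode->i_private_bdev;
}
```

Note that this only covers the lookup. It says nothing about how long the
returned block_device stays valid, which is Al's pinning question, and it
gives the filesystem no way to tell DAX that the relationship has changed.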