On Fri, Jan 29, 2016 at 3:34 PM, Ross Zwisler wrote:
On Fri, Jan 29, 2016 at 11:28:15AM -0700, Ross Zwisler wrote:
> On Thu, Jan 28, 2016 at 01:38:58PM -0800, Christoph Hellwig wrote:
> > On Thu, Jan 28, 2016 at 12:35:04PM -0700, Ross Zwisler wrote:
> > > There are a number of places in dax.c that look up the struct block_device
> > > associated with an inode. Previously this was done by just using
> > > inode->i_sb->s_bdev. This is correct for inodes that exist within
> > > filesystems supported by DAX (ext2, ext4 & XFS), but when running DAX
> > > against raw block devices this value is NULL. This causes NULL pointer
> > > dereferences when these block_device pointers are used.
> > It's also wrong for an XFS file system with a RT device..
> > > +#define DAX_BDEV(inode) (S_ISBLK(inode->i_mode) ? I_BDEV(inode) \
> > > + : inode->i_sb->s_bdev)
> > .. but this isn't going to fix it. You must use a bdev returned by
> > get_blocks or a similar file system method.
> I guess I need to go off and understand if we can have DAX mappings on such a
> device. If we can, we may have a problem - we can get the block_device from
> get_block() in the I/O path and the various fault paths, but we don't have
> access to get_block() when flushing via dax_writeback_mapping_range(). We
> avoid needing it in the normal case by storing the sector results from get_block() in
> the radix tree.
> /me is off to play with RT devices...
Well, RT devices are completely broken as far as I can see. I've reported the
breakage to the XFS list. Anything I do that triggers a RT block allocation
in XFS causes a lockdep splat + a kernel BUG - I've tried regular pwrite(),
xfs_rtcp and mmap() + write to address. Not a new bug either - happens just
the same with v4.4. Happens with both PMEM and BRD, and has no relationship
to whether I'm using DAX or not.
Does it work for this patch to go in as-is since it fixes an immediate OOPS
with raw block devices + DAX, and when RT devices are alive again I'll figure
out how to make them work too?
Can we step back and be clear about which lookups should be coming
from get_blocks()? Which ones are critical vs. ones we just
opportunistically look up for a debug print?
Right now xfs and ext4 are basically disagreeing on whether
get_blocks() reliably sets bh->b_bdev, and checking for a raw
block-device inode in dax_clear_blocks() does not make sense. So this
all seems a bit confused.