On Mon, 2015-11-23 at 12:53 -0800, Dan Williams wrote:
On Mon, Nov 23, 2015 at 12:04 PM, Toshi Kani
<toshi.kani(a)hpe.com> wrote:
> The following oops was observed when mmap() with MAP_POPULATE
> pre-faulted pmd mappings of a DAX file. follow_trans_huge_pmd()
> expects that a target address has a struct page.
>
> BUG: unable to handle kernel paging request at ffffea0012220000
> follow_trans_huge_pmd+0xba/0x390
> follow_page_mask+0x33d/0x420
> __get_user_pages+0xdc/0x800
> populate_vma_page_range+0xb5/0xe0
> __mm_populate+0xc5/0x150
> vm_mmap_pgoff+0xd5/0xe0
> SyS_mmap_pgoff+0x1c1/0x290
> SyS_mmap+0x1b/0x30
>
> Fix it by making the PMD pre-fault handling consistent with PTE.
> After pre-faulted in faultin_page(), follow_page_mask() calls
> follow_trans_huge_pmd(), which is changed to call follow_pfn_pmd()
> for VM_PFNMAP or VM_MIXEDMAP. follow_pfn_pmd() handles FOLL_TOUCH
> and returns with -EEXIST.
As of 4.4-rc2, DAX pmd mappings are disabled, so we have time to do
something more comprehensive in 4.5.
Yes, I noticed during my testing that I could not use pmd...
> Reported-by: Mauricio Porto <mauricio.porto(a)hpe.com>
> Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
> Cc: Andrew Morton <akpm(a)linux-foundation.org>
> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
> Cc: Matthew Wilcox <willy(a)linux.intel.com>
> Cc: Dan Williams <dan.j.williams(a)intel.com>
> Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
> ---
> mm/huge_memory.c | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d5b8920..f56e034 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
[..]
> @@ -1288,6 +1315,13 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
> goto out;
>
> + /* pfn map does not have a struct page */
> + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) {
> + ret = follow_pfn_pmd(vma, addr, pmd, flags);
> + page = ERR_PTR(ret);
> + goto out;
> + }
> +
> page = pmd_page(*pmd);
> VM_BUG_ON_PAGE(!PageHead(page), page);
> if (flags & FOLL_TOUCH) {
I think it is already problematic that dax pmd mappings are getting
confused with transparent huge pages.
We had the same issue with dax pte mappings [1], and this change extends the
pfn map handling to pmd. So, this problem is not specific to pmd.
[1] https://lkml.org/lkml/2015/6/23/181
They're more closely related to
hugetlbfs pmd mappings in that they are mapping an explicit
allocation. I have some pending patches to address this dax-pmd vs
hugetlb-pmd vs thp-pmd classification that I will post shortly.
Not sure which way is better, but I am certainly interested in your changes.
By the way, I'm collecting DAX pmd regression tests [1]; is this just
a simple crash upon using MAP_POPULATE?
[1]: https://github.com/pmem/ndctl/blob/master/lib/test-dax-pmd.c
Yes, this issue is easy to reproduce with MAP_POPULATE. In case it helps,
attached are the tests I used for testing the patches. Sorry, the code is messy
since it was only intended for my internal use...
- The test was originally written for the pte change [1], and comments in
test.sh (e.g. mlock fail, ok) reflect the results without the pte change.
- For the pmd test, I modified test-mmap.c to call posix_memalign() before
mmap(). After free(), the 2MB-aligned address obtained from posix_memalign()
can be reused as the mmap() hint, which keeps the mmap'd address aligned on 2MB.
- I created test file(s) with dd (i.e. all blocks written) in my test.
- The other infinite-loop issue (fixed by my other patch) was found by the test
case with option "-LMSr".
Thanks,
-Toshi