On Fri, Jul 13, 2018 at 05:28:05PM -0700, Dan Williams wrote:
On Fri, Jul 13, 2018 at 1:52 AM, Naoya Horiguchi
<n-horiguchi(a)ah.jp.nec.com> wrote:
> On Wed, Jul 04, 2018 at 02:41:06PM -0700, Dan Williams wrote:
...
>> +
>> + /*
>> + * Use this flag as an indication that the dax page has been
>> + * remapped UC to prevent speculative consumption of poison.
>> + */
>> + SetPageHWPoison(page);
>
> The number of hwpoison pages is maintained by num_poisoned_pages,
> so you can call num_poisoned_pages_inc()?
I don't think we want these pages accounted in num_poisoned_pages().
We have the badblocks infrastructure in libnvdimm to track how many
errors and where they are located, and since they can be repaired via
driver actions I think we should track them separately.
OK.
> Related to this, I'm interested in whether/how
unpoison_page() works
> on a hwpoisoned dev_pagemap page.
unpoison_page() is only triggered via freeing pages to the page
allocator, and that never happens for dev_pagemap / ZONE_DEVICE pages.
sorry, my bad comment.
I meant unpoison_memory() in mm/memory-failure.c, which is triggered
via debugfs:hwpoison/unpoison-pfn. This interface looks like below
int unpoison_memory(unsigned long pfn)
{
struct page *page;
struct page *p;
int freeit = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
if (!pfn_valid(pfn))
return -ENXIO;
p = pfn_to_page(pfn);
page = compound_head(p);
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned
%#lx\n",
pfn, &unpoison_rs);
return 0;
}
...
so I think that we can add is_zone_device_page() check at the beginning
of this function to call hwpoison_clear() introduced in patch 13/13?
Otherwise maybe compound_head() will cause some critical issue like
general protection fault.
Thanks,
Naoya Horiguchi