On Wed 19-07-17 10:26:45, Ross Zwisler wrote:
On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > flow, and instead rely on the page fault itself to make the PTE dirty and
> > writeable. The following description from the patch adding the
> > vm_insert_mixed_mkwrite() call explains this a little more:
> > ***
> > To be able to use the common 4k zero page in DAX we need to have our PTE
> > fault path look more like our PMD fault path where a PTE entry can be
> > marked as dirty and writeable as it is first inserted, rather than
> > waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > distinguish between these two cases in do_wp_page():
> > case 1: 4k zero page => writable DAX storage
> > case 2: read-only DAX storage => writeable DAX storage
> > This distinction is made by via vm_normal_page(). vm_normal_page()
> > returns false for the common 4k zero page, though, just as it does for
> > DAX ptes. Instead of special casing the DAX + 4k zero page case, we will
> > simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> > sequence, and get rid of dax_pfn_mkwrite() completely.
> > This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> > and allow us to pass in a 'mkwrite' flag. If 'mkwrite' is
> > insert_pfn() will do the work that was previously done by wp_page_reuse()
> > as part of the dax_pfn_mkwrite() call path.
> > ***
> Hum, thinking about this in context of this patch... So what if we have
> allocated storage, a process faults it read-only, we map it to page tables
> writeprotected. Then the process writes through mmap to the area - the code
> in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.
> Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> writeable but radix tree entry stays clean - bug. Am I missing something?
I don't think we ever end up with a writeable PTE but with a clean radix tree
entry. When we get the write fault we do a full fault through
dax_iomap_pte_fault() and dax_insert_mapping().
dax_insert_mapping() sets up the dirty radix tree entry via
dax_insert_mapping_entry() before it does anything with the page tables via
So, this mkwrite fault path is exactly the path we would have taken if the
initial read to real storage hadn't happened, and we end up in the same end
state - with a dirty DAX radix tree entry and a writeable PTE.
Ah sorry, I have missed that it is not that you would not have
->pfn_mkwrite() handler - you still have it but it is the same as standard
fault handler now. So maybe can you rephrase the changelog a bit saying
that: "We get rid of dax_pfn_mkwrite() helper and use dax_iomap_fault() to
handle write-protection faults instead." Thanks!
Jan Kara <jack(a)suse.com>
SUSE Labs, CR