[PATCH v2 00/25] replace ioremap_{cache|wt} with memremap
by Dan Williams
Changes since v1 [1]:
1/ Drop the attempt at unifying ioremap() prototypes, just focus on
converting ioremap_cache and ioremap_wt over to memremap (Christoph)
2/ Drop the unrelated cleanups to use %pa in __ioremap_caller (Thomas)
3/ Add support for memremap() attempts on "System RAM" to simply return
the kernel virtual address for that range. ARM depends on this
functionality in ioremap_cache() and ACPI was open coding a similar
solution. (Mark)
4/ Split the conversions of ioremap_{cache|wt} into separate patches per
driver / arch.
5/ Fix bisection breakage and other reports from 0day-kbuild
---
While developing the pmem driver we noticed that the __iomem annotation
on the return value from ioremap_cache() was being mishandled by several
callers. We also observed that all of the call sites expected to be
able to treat the return value from ioremap_cache() as a normal
(non-__iomem) pointer to memory.
This patchset takes the opportunity to clean up the above confusion as
well as a few issues with the ioremap_{cache|wt} interface, including:
1/ Eliminating the possibility of function prototypes differing between
architectures by defining a central memremap() prototype that takes
flags to determine the mapping type.
2/ Returning NULL rather than falling back silently to a different
mapping-type. This allows drivers to be stricter about the
mapping-type fallbacks that are permissible.
[1]: http://marc.info/?l=linux-arm-kernel&m=143735199029255&w=2
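For reference, a minimal sketch of what a converted call site looks like
under the new interface. It assumes the memremap() prototype and flag
names as they ended up upstream (declared in include/linux/io.h, e.g.
MEMREMAP_WB); the exact names in this v2 posting may differ, so treat it
as an illustration rather than code from the patches:

	#include <linux/io.h>

	static void *map_fw_table(resource_size_t phys, size_t len)
	{
		void *addr;

		/* was: void __iomem *addr = ioremap_cache(phys, len); */
		addr = memremap(phys, len, MEMREMAP_WB);
		if (!addr)	/* no silent fallback to another mapping type */
			return NULL;

		/* plain pointer: no __iomem annotation, no sparse casts; for
		 * "System RAM" ranges this is simply the existing kernel
		 * virtual address for the range */
		return addr;
	}
	/* teardown is memunmap(addr) rather than iounmap() */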
---
Dan Williams (22):
mm: enhance region_is_ram() to distinguish 'unknown' vs 'mixed'
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
intel_iommu: fix leaked ioremap mapping
arch: introduce memremap()
arm: switch from ioremap_cache to memremap
x86: switch from ioremap_cache to memremap
gma500: switch from acpi_os_ioremap to ioremap
i915: switch from acpi_os_ioremap to ioremap
acpi: switch from ioremap_cache to memremap
toshiba laptop: replace ioremap_cache with ioremap
memconsole: fix __iomem mishandling, switch to memremap
visorbus: switch from ioremap_cache to memremap
intel-iommu: switch from ioremap_cache to memremap
libnvdimm, pmem: switch from ioremap_cache to memremap
pxa2xx-flash: switch from ioremap_cache to memremap
sfi: switch from ioremap_cache to memremap
fbdev: switch from ioremap_wt to memremap
pmem: switch from ioremap_wt to memremap
arch: remove ioremap_cache, replace with arch_memremap
arch: remove ioremap_wt, replace with arch_memremap
pmem: convert to generic memremap
Toshi Kani (3):
mm, x86: Fix warning in ioremap RAM check
mm, x86: Remove region_is_ram() call from ioremap
mm: Fix bugs in region_is_ram()
arch/arc/include/asm/io.h | 1
arch/arm/Kconfig | 1
arch/arm/include/asm/io.h | 13 +++-
arch/arm/include/asm/xen/page.h | 4 +
arch/arm/mach-clps711x/board-cdb89712.c | 2 -
arch/arm/mach-shmobile/pm-rcar.c | 2 -
arch/arm/mm/ioremap.c | 12 +++-
arch/arm/mm/nommu.c | 11 ++-
arch/arm64/Kconfig | 1
arch/arm64/include/asm/acpi.h | 10 +--
arch/arm64/include/asm/dmi.h | 8 +--
arch/arm64/include/asm/io.h | 8 ++-
arch/arm64/kernel/efi.c | 9 ++-
arch/arm64/kernel/smp_spin_table.c | 19 +++---
arch/arm64/mm/ioremap.c | 20 ++----
arch/avr32/include/asm/io.h | 1
arch/frv/Kconfig | 1
arch/frv/include/asm/io.h | 17 ++---
arch/frv/mm/kmap.c | 6 ++
arch/ia64/Kconfig | 1
arch/ia64/include/asm/io.h | 11 +++
arch/ia64/kernel/cyclone.c | 2 -
arch/m32r/include/asm/io.h | 1
arch/m68k/Kconfig | 1
arch/m68k/include/asm/io_mm.h | 14 +---
arch/m68k/include/asm/io_no.h | 12 ++--
arch/m68k/include/asm/raw_io.h | 4 +
arch/m68k/mm/kmap.c | 17 +++++
arch/m68k/mm/sun3kmap.c | 6 ++
arch/metag/include/asm/io.h | 3 -
arch/microblaze/include/asm/io.h | 1
arch/mn10300/include/asm/io.h | 1
arch/nios2/include/asm/io.h | 1
arch/powerpc/kernel/pci_of_scan.c | 2 -
arch/s390/include/asm/io.h | 1
arch/sh/Kconfig | 1
arch/sh/include/asm/io.h | 20 ++++--
arch/sh/mm/ioremap.c | 10 +++
arch/sparc/include/asm/io_32.h | 1
arch/sparc/include/asm/io_64.h | 1
arch/sparc/kernel/pci.c | 3 -
arch/tile/include/asm/io.h | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/efi.h | 3 +
arch/x86/include/asm/io.h | 17 +++--
arch/x86/kernel/crash_dump_64.c | 6 +-
arch/x86/kernel/kdebugfs.c | 8 +--
arch/x86/kernel/ksysfs.c | 28 ++++-----
arch/x86/mm/ioremap.c | 76 ++++++++++--------------
arch/xtensa/Kconfig | 1
arch/xtensa/include/asm/io.h | 9 ++-
drivers/acpi/apei/einj.c | 9 ++-
drivers/acpi/apei/erst.c | 6 +-
drivers/acpi/nvs.c | 6 +-
drivers/acpi/osl.c | 70 ++++++----------------
drivers/char/toshiba.c | 2 -
drivers/firmware/google/memconsole.c | 7 +-
drivers/gpu/drm/gma500/opregion.c | 2 -
drivers/gpu/drm/i915/intel_opregion.c | 2 -
drivers/iommu/intel-iommu.c | 10 ++-
drivers/iommu/intel_irq_remapping.c | 4 +
drivers/isdn/icn/icn.h | 2 -
drivers/mtd/devices/slram.c | 2 -
drivers/mtd/maps/pxa2xx-flash.c | 4 +
drivers/mtd/nand/diskonchip.c | 2 -
drivers/mtd/onenand/generic.c | 2 -
drivers/nvdimm/Kconfig | 2 -
drivers/pci/probe.c | 3 -
drivers/pnp/manager.c | 2 -
drivers/scsi/aic94xx/aic94xx_init.c | 7 --
drivers/scsi/arcmsr/arcmsr_hba.c | 5 --
drivers/scsi/mvsas/mv_init.c | 15 +----
drivers/scsi/sun3x_esp.c | 2 -
drivers/sfi/sfi_core.c | 4 +
drivers/staging/comedi/drivers/ii_pci20kc.c | 1
drivers/staging/unisys/visorbus/visorchannel.c | 16 +++--
drivers/staging/unisys/visorbus/visorchipset.c | 17 +++--
drivers/tty/serial/8250/8250_core.c | 2 -
drivers/video/fbdev/Kconfig | 2 -
drivers/video/fbdev/amifb.c | 5 +-
drivers/video/fbdev/atafb.c | 5 +-
drivers/video/fbdev/hpfb.c | 6 +-
drivers/video/fbdev/ocfb.c | 1
drivers/video/fbdev/s1d13xxxfb.c | 3 -
drivers/video/fbdev/stifb.c | 1
include/acpi/acpi_io.h | 6 +-
include/asm-generic/io.h | 8 ---
include/asm-generic/iomap.h | 4 -
include/linux/io-mapping.h | 2 -
include/linux/io.h | 9 +++
include/linux/mtd/map.h | 2 -
include/linux/pmem.h | 26 +++++---
include/video/vga.h | 2 -
kernel/Makefile | 2 +
kernel/memremap.c | 74 +++++++++++++++++++++++
kernel/resource.c | 43 +++++++-------
lib/Kconfig | 5 +-
lib/devres.c | 13 +---
lib/pci_iomap.c | 7 +-
tools/testing/nvdimm/Kbuild | 4 +
tools/testing/nvdimm/test/iomap.c | 34 ++++++++---
101 files changed, 482 insertions(+), 398 deletions(-)
create mode 100644 kernel/memremap.c
[PATCH v4 0/8] Support for transparent PUD pages for DAX files
by Matthew Wilcox
We have customer demand to use 1GB pages to map DAX files. Unlike the 2MB
page support, the Linux MM does not currently support PUD pages, so I have
attempted to add support for the necessary pieces for DAX huge PUD pages.
Filesystems still need work to allocate 1GB pages. With ext4, I can
only get 16MB of contiguous space, although it is aligned. With XFS,
I can get 80MB less than 1GB, and it's not aligned. The XFS problem
may be due to the small amount of RAM in my test machine.
This patch set is against something approximately current -mm. I'd like
to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
The conversion of pmd_fault & pud_fault to huge_fault is thanks to
Dave's poking, and Kirill spotted a couple of problems in the MM code.
Version 2 of the patch set is about 200 lines smaller (1016 insertions,
23 deletions in v1).
I've done some light testing using a program to mmap a block device
with DAX enabled, calling mincore() and examining /proc/smaps and
/proc/pagemap.
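For reference, a minimal version of the kind of test described above might
look like the following; /dev/pmem0 and the 1GB mapping size are
placeholders, not details taken from the posting:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 1UL << 30;			/* aim for a PUD-sized mapping */
		int fd = open("/dev/pmem0", O_RDWR);	/* DAX-capable device */
		if (fd < 0)
			return 1;

		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		memset(p, 0, len);			/* fault the range in */

		unsigned char *vec = malloc(len / 4096);
		if (vec && mincore(p, len, vec) == 0)
			printf("first page resident: %d\n", vec[0] & 1);

		/* /proc/self/smaps and /proc/<pid>/pagemap can then be
		 * inspected to see whether a 1GB (PUD) mapping was used */
		return 0;
	}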
v4: Updated to current mmotm
Converted pud_trans_huge_lock to the same calling conventions as
pmd_trans_huge_lock.
Fill in vm_fault ->gfp_flags and ->pgoff, at Jan Kara's suggestion
Replace use of page table lock with pud_lock in __pud_alloc (cosmetic)
Fix compilation problems with various config settings
Convert dax_pmd_fault and dax_pud_fault to take a vm_fault instead of
individual pieces
Add copy_huge_pud() and follow_devmap_pud() so fork() should now work
Fix typo of PMD for PUD
v3: Rebased against current mmotm
v2: Reduced churn in filesystems by switching to ->huge_fault interface
Addressed concerns from Kirill
Matthew Wilcox (8):
mm: Convert an open-coded VM_BUG_ON_VMA
mm,fs,dax: Change ->pmd_fault to ->huge_fault
mm: Add support for PUD-sized transparent hugepages
mincore: Add support for PUDs
procfs: Add support for PUDs to smaps, clear_refs and pagemap
x86: Add support for PUD-sized transparent hugepages
dax: Support for transparent PUD pages
ext4: Support for PUD-sized transparent huge pages
Documentation/filesystems/dax.txt | 12 +-
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/paravirt.h | 11 ++
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/include/asm/pgtable-2level.h | 19 +++
arch/x86/include/asm/pgtable-3level.h | 31 ++++
arch/x86/include/asm/pgtable.h | 134 +++++++++++++++
arch/x86/include/asm/pgtable_64.h | 13 ++
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/pgtable.c | 31 ++++
fs/block_dev.c | 10 +-
fs/dax.c | 295 +++++++++++++++++++++++++---------
fs/ext2/file.c | 27 +---
fs/ext4/file.c | 60 +++----
fs/proc/task_mmu.c | 109 +++++++++++++
fs/xfs/xfs_file.c | 25 ++-
fs/xfs/xfs_trace.h | 2 +-
include/asm-generic/pgtable.h | 74 ++++++++-
include/asm-generic/tlb.h | 14 ++
include/linux/dax.h | 17 --
include/linux/huge_mm.h | 78 ++++++++-
include/linux/mm.h | 48 +++++-
include/linux/mmu_notifier.h | 14 ++
include/linux/pfn_t.h | 8 +
mm/gup.c | 7 +
mm/huge_memory.c | 246 ++++++++++++++++++++++++++++
mm/memory.c | 135 ++++++++++++++--
mm/mincore.c | 13 ++
mm/pagewalk.c | 19 ++-
mm/pgtable-generic.c | 14 ++
31 files changed, 1261 insertions(+), 212 deletions(-)
--
2.7.0.rc3
[PATCH] ext2, ext4: Fix issue with missing journal entry
by Ross Zwisler
As it is currently written, ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation, so it doesn't create
a journal entry. For a read that creates a zero page to cover a hole,
followed by a write that actually allocates storage, this is incorrect. The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.
Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().
Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
---
fs/ext2/file.c | 19 +------------------
fs/ext4/file.c | 19 ++-----------------
2 files changed, 3 insertions(+), 35 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 2c88d68..c1400b1 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -80,23 +80,6 @@ static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
return ret;
}
-static int ext2_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
- struct inode *inode = file_inode(vma->vm_file);
- struct ext2_inode_info *ei = EXT2_I(inode);
- int ret;
-
- sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
- down_read(&ei->dax_sem);
-
- ret = __dax_mkwrite(vma, vmf, ext2_get_block, NULL);
-
- up_read(&ei->dax_sem);
- sb_end_pagefault(inode->i_sb);
- return ret;
-}
-
static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
{
@@ -124,7 +107,7 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
static const struct vm_operations_struct ext2_dax_vm_ops = {
.fault = ext2_dax_fault,
.pmd_fault = ext2_dax_pmd_fault,
- .page_mkwrite = ext2_dax_mkwrite,
+ .page_mkwrite = ext2_dax_fault,
.pfn_mkwrite = ext2_dax_pfn_mkwrite,
};
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 1126436..d2e8500 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -262,23 +262,8 @@ static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
return result;
}
-static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
- int err;
- struct inode *inode = file_inode(vma->vm_file);
-
- sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
- down_read(&EXT4_I(inode)->i_mmap_sem);
- err = __dax_mkwrite(vma, vmf, ext4_dax_mmap_get_block, NULL);
- up_read(&EXT4_I(inode)->i_mmap_sem);
- sb_end_pagefault(inode->i_sb);
-
- return err;
-}
-
/*
- * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_mkwrite()
+ * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_fault()
* handler we check for races agaist truncate. Note that since we cycle through
* i_mmap_sem, we are sure that also any hole punching that began before we
* were called is finished by now and so if it included part of the file we
@@ -311,7 +296,7 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
static const struct vm_operations_struct ext4_dax_vm_ops = {
.fault = ext4_dax_fault,
.pmd_fault = ext4_dax_pmd_fault,
- .page_mkwrite = ext4_dax_mkwrite,
+ .page_mkwrite = ext4_dax_fault,
.pfn_mkwrite = ext4_dax_pfn_mkwrite,
};
#else
--
2.5.0
[PATCH 0/6] DAX cleanups
by Matthew Wilcox
Very little exciting in here. This is all based on the PUD support code
that I just sent, mostly addressing things that came up during review
of the PUD code but weren't really justifiable as being mixed into the
adding of PUD support.
Matthew Wilcox (6):
dax: Use vmf->gfp_mask
dax: Remove unnecessary rechecking of i_size
dax: Use vmf->pgoff in fault handlers
dax: Use PAGE_CACHE_SIZE where appropriate
dax: Factor dax_insert_pmd_mapping out of dax_pmd_fault
dax: Factor dax_insert_pud_mapping out of dax_pud_fault
fs/dax.c | 395 ++++++++++++++++++++++++++-------------------------------------
1 file changed, 164 insertions(+), 231 deletions(-)
--
2.7.0.rc3
[PATCH v2 0/2] Expose known poison in SPA ranges to the block layer
by Vishal Verma
v2:
- Move poison list walking from pmem to core (Dan)
- If the pmem namespace starts at an offset, account for that (Dan)
- Fix a bug in extended status checking for ars_status
- Remove a duplicate include in pmem.c (only introduced in v1)
- When doing an ars_status, don't error out if an ARS has not yet
been performed.
- When checking if ARS is supported, also check the extended status
and make sure ARS for persistent memory is supported (as opposed to
just volatile memory)
- Print a dev_err message if find_poison fails
- Collapse patches 2 and 3 into a single patch
This series does a few things:
- Retrieve all known poison in the system physical address (SPA) space
using ARS (Address Range Scrub) commands to firmware
- Store this poison in a new 'nd_poison' structure
- In pmem, consume the poison list and expose the ranges as bad sectors
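For illustration only, a poison-list entry needs little more than the SPA
range plus list linkage. The structure name comes from the description
above; the field names below are assumptions, not copied from the patches:

	/* one ARS-discovered poison range, linked into a list that the
	 * pmem driver later walks to populate its bad-sector ranges */
	struct nd_poison {
		u64 start;		/* system physical address of the range */
		u64 length;		/* length of the poisoned range in bytes */
		struct list_head list;
	};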
This depends on the badblocks series sent out previously.
A tree with the latest revisions of both the badblocks patchset and this
can be found at:
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err...
Vishal Verma (2):
nfit_test: Enable DSMs for all test NFITs
libnvdimm: Add a poison list and export badblocks
drivers/acpi/nfit.c | 203 +++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/core.c | 187 ++++++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 3 +
drivers/nvdimm/nd.h | 6 ++
drivers/nvdimm/pmem.c | 6 ++
include/linux/libnvdimm.h | 1 +
tools/testing/nvdimm/test/nfit.c | 9 ++
7 files changed, 415 insertions(+)
--
2.5.0
[PATCH 1/2] block: fix pfn_mkwrite() DAX fault handler
by Ross Zwisler
Previously the pfn_mkwrite() fault handler for raw block devices called
blkdev_dax_fault() -> __dax_fault() to do a full DAX page fault. Really
what the pfn_mkwrite() fault handler needs to do is call dax_pfn_mkwrite()
to make sure that the radix tree entry for the given PTE is marked as dirty
so that a follow-up fsync or msync call will flush it durably to media.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Fixes: 5a023cdba50c ("block: enable dax for raw block devices")
---
fs/block_dev.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7b9cd49..fa0507a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1730,6 +1730,12 @@ static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
return __dax_fault(vma, vmf, blkdev_get_block, NULL);
}
+static int blkdev_dax_pfn_mkwrite(struct vm_area_struct *vma,
+ struct vm_fault *vmf)
+{
+ return dax_pfn_mkwrite(vma, vmf);
+}
+
static int blkdev_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags)
{
@@ -1761,7 +1767,7 @@ static const struct vm_operations_struct blkdev_dax_vm_ops = {
.close = blkdev_vm_close,
.fault = blkdev_dax_fault,
.pmd_fault = blkdev_dax_pmd_fault,
- .pfn_mkwrite = blkdev_dax_fault,
+ .pfn_mkwrite = blkdev_dax_pfn_mkwrite,
};
static const struct vm_operations_struct blkdev_default_vm_ops = {
--
2.5.0
[PATCH v6 0/7] DAX fsync/msync support
by Ross Zwisler
Changes since v5 [1]:
1) Merged with Dan's changes to fs/dax.c that were staged in -mm and -next.
2) Store sectors in the address_space radix tree for DAX entries instead of
addresses. This allows us to get the addresses from the block driver
via dax_map_atomic() during fsync/msync so that we can protect against
races with block device removal. (Dan)
3) Reordered things a bit in dax_writeback_one() so we clear the
PAGECACHE_TAG_TOWRITE tag even if the radix tree entry is corrupt. This
prevents us from getting into an infinite loop where we don't proceed far
enough in dax_writeback_one() to clear that flag, but
dax_writeback_mapping_range() will keep finding that entry via
find_get_entries_tag().
4) Changed the ordering of the radix tree insertion so that it happens
before the page insertion into the page tables. This ensures that we don't
end up in a case where the page table insertion succeeds and the radix tree
insertion fails which could give us a writeable PTE that has no
corresponding radix tree entry.
5) Got rid of the 'nrdax' variable in struct address_space and renamed
'nrshadows' to 'nrexceptional' so that it can be used for both DAX and
shadow exceptional entries. We explicitly prevent shadow entries from
being added to radix trees for DAX mappings, so the single counter can
safely be reused for both purposes. (Jan)
6) Updated all my WARN_ON() calls so I use the return value to know whether
I've hit an error. (Andrew)
This series applies cleanly and was tested against next-20151223.
A working tree can be found here:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsy...
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-December/003588.html
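The user-visible behaviour the series provides is that fsync()/msync() on a
DAX mapping flushes mmap'd stores all the way to media. A minimal userspace
exercise of that path looks like the following (the file path is a
placeholder and must sit on a DAX-mounted ext2/ext4/xfs filesystem):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/mnt/dax/testfile", O_RDWR);
		if (fd < 0)
			return 1;

		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		strcpy(p, "persist me");	/* dirty the DAX mapping */

		/* with this series, MS_SYNC finds the dirty DAX radix tree
		 * entries and writes the cache lines back to media */
		return msync(p, 4096, MS_SYNC);
	}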
Ross Zwisler (7):
pmem: add wb_cache_pmem() to the PMEM API
dax: support dirty DAX entries in radix tree
mm: add find_get_entries_tag()
dax: add support for fsync/msync
ext2: call dax_pfn_mkwrite() for DAX fsync/msync
ext4: call dax_pfn_mkwrite() for DAX fsync/msync
xfs: call dax_pfn_mkwrite() for DAX fsync/msync
arch/x86/include/asm/pmem.h | 11 +--
fs/block_dev.c | 2 +-
fs/dax.c | 196 ++++++++++++++++++++++++++++++++++++++++++--
fs/ext2/file.c | 4 +-
fs/ext4/file.c | 4 +-
fs/inode.c | 2 +-
fs/xfs/xfs_file.c | 7 +-
include/linux/dax.h | 7 ++
include/linux/fs.h | 3 +-
include/linux/pagemap.h | 3 +
include/linux/pmem.h | 22 ++++-
include/linux/radix-tree.h | 9 ++
mm/filemap.c | 91 ++++++++++++++++++--
mm/truncate.c | 69 +++++++++-------
mm/vmscan.c | 9 +-
mm/workingset.c | 4 +-
16 files changed, 384 insertions(+), 59 deletions(-)
--
2.6.3
[PATCH v8 0/9] DAX fsync/msync support
by Ross Zwisler
Changes since v7 [1]:
1) Update patch 1 so that we initialize bh->b_bdev before passing it to
get_block() instead of working around the fact that it could still be NULL
after get_block() completes. (Dan)
2) Add a check to dax_radix_entry() so that we WARN_ON_ONCE() and exit
gracefully if we find a page cache entry still in the radix tree when
trying to insert a DAX entry.
This series replaces v7 in the MM tree and in the "akpm" branch of the next
tree. A working tree can be found here:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsy...
[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-January/003886.html
Ross Zwisler (9):
dax: fix NULL pointer dereference in __dax_dbg()
dax: fix conversion of holes to PMDs
pmem: add wb_cache_pmem() to the PMEM API
dax: support dirty DAX entries in radix tree
mm: add find_get_entries_tag()
dax: add support for fsync/msync
ext2: call dax_pfn_mkwrite() for DAX fsync/msync
ext4: call dax_pfn_mkwrite() for DAX fsync/msync
xfs: call dax_pfn_mkwrite() for DAX fsync/msync
arch/x86/include/asm/pmem.h | 11 +--
fs/block_dev.c | 2 +-
fs/dax.c | 215 ++++++++++++++++++++++++++++++++++++++++----
fs/ext2/file.c | 4 +-
fs/ext4/file.c | 4 +-
fs/inode.c | 2 +-
fs/xfs/xfs_file.c | 7 +-
include/linux/dax.h | 7 ++
include/linux/fs.h | 3 +-
include/linux/pagemap.h | 3 +
include/linux/pmem.h | 22 ++++-
include/linux/radix-tree.h | 9 ++
mm/filemap.c | 91 +++++++++++++++++--
mm/truncate.c | 69 +++++++-------
mm/vmscan.c | 9 +-
mm/workingset.c | 4 +-
16 files changed, 393 insertions(+), 69 deletions(-)
--
2.5.0
[PATCH v8 0/3] Machine check recovery when kernel accesses poison
by Tony Luck
This series is initially targeted at the folks doing filesystems
on top of NVDIMMs. They really want to be able to return -EIO
when there is a h/w error (just like spinning rust, and SSD does).
I plan to use the same infrastructure to write a machine check aware
"copy_from_user()" that will SIGBUS the calling application when a
syscall touches poison in user space (just like we do when the application
touches the poison itself).
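As the V5-V6 notes below describe, __mcsafe_copy() reports the trap number
and the count of remaining bytes instead of following memcpy() semantics. A
caller that wants the -EIO behaviour described above would use it roughly
as follows; the struct, field, and prototype details are inferred from the
changelog, not copied from the patches:

	/* inferred shape: trap number (0 on success, X86_TRAP_MC when poison
	 * was consumed) plus the number of bytes that were not copied */
	struct mcsafe_ret {
		u64 trapnr;
		u64 remain;
	};

	struct mcsafe_ret __mcsafe_copy(void *dst, const void *src, size_t len);

	static int read_from_pmem(void *dst, const void *pmem_src, size_t len)
	{
		struct mcsafe_ret ret = __mcsafe_copy(dst, pmem_src, len);

		if (ret.trapnr)		/* machine check consumed: report -EIO */
			return -EIO;
		return 0;
	}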
Changes V7-V8
Boris: Would be so much cleaner if we added a new field to the exception table
instead of squeezing bits into the fixup field. New field added
Tony: Documentation needs to be updated. Done
Changes V6-V7:
Boris: Why add/subtract 0x20000000? Added better comment provided by Andy
Boris: Churn. Part2 changes things only introduced in part1.
Merged parts 1&2 into one patch.
Ingo: Missing my sign off on part1. Added.
Changes V5-V6
Andy: Provoked massive re-write by providing what is now part1 of this
patch series. This frees up two bits in the exception table
fixup field that can be used to tag exception table entries
as different "classes". This means we don't need my separate
exception table for machine checks. Also avoids duplicating
fixup actions for #PF and #MC cases that were in version 5.
Andy: Use C99 array initializers to tie the various class fixup
functions back to the definitions of each class. Also give the
functions meaningful names (not fixup_class0() etc.).
Boris: Cleaned up my lousy assembly code removing many spurious 'l'
modifiers on instructions.
Boris: Provided some helper functions for the machine check severity
calculation that make the code more readable.
Boris: Have __mcsafe_copy() return a structure with the 'remaining bytes'
in a separate field from the fault indicator. Boris had suggested
Linux -EFAULT/-EINVAL ... but I thought it made more sense to return
the exception number (X86_TRAP_MC, etc.) This finally kills off
BIT(63) which has been controversial throughout all the early versions
of this patch series.
Changes V4-V5
Tony: Extended __mcsafe_copy() to have fixup entries for both machine
check and page fault.
Changes V3-V4:
Andy: Simplify fixup_mcexception() by dropping used-once local variable
Andy: "Reviewed-by" tag added to part1
Boris: Moved new functions to memcpy_64.S and declaration to asm/string_64.h
Boris: Changed name s/mcsafe_memcpy/__mcsafe_copy/ to make it clear that this
is an internal function and that the return value doesn't follow memcpy() semantics.
Boris: "Reviewed-by" tag added to parts 1&2
Changes V2-V3:
Andy: Don't hack "regs->ax = BIT(63) | addr;" in the machine check
handler. Now have better fixup code that computes the number
of remaining bytes (just like page-fault fixup).
Andy: #define for BIT(63). Done, plus couple of extra macros using it.
Boris: Don't clutter up generic code (like mm/extable.c) with this.
I moved everything under arch/x86 (the asm-generic change is
a more generic #define).
Boris: Dependencies for CONFIG_MCE_KERNEL_RECOVERY are too generic.
I made it a real menu item with default "n". Dan Williams
will use "select MCE_KERNEL_RECOVERY" from his persistent
filesystem code.
Boris: Simplify conditionals in mce.c by moving tolerant/kill_it
checks earlier, with a skip to end if they aren't set.
Boris: Miscellaneous grammar/punctuation. Fixed.
Boris: Don't leak spurious __start_mcextable symbols into kernels
that didn't configure MCE_KERNEL_RECOVERY. Done.
Tony: New code doesn't belong in user_copy_64.S/uaccess*.h. Moved
to new .S/.h files
Elliott: Caching behavior non-optimal. Could use movntdqa, vmovntdqa
or vmovntdqa on source addresses. I didn't fix this yet. Think
of the current mcsafe_memcpy() as the first of several functions.
This one is useful for small copies (meta-data) where the overhead
of saving SSE/AVX state isn't justified.
Changes V1->V2:
0-day: Reported build errors and warnings on 32-bit systems. Fixed
0-day: Reported bloat to tinyconfig. Fixed
Boris: Suggestions to use extra macros to reduce code duplication in _ASM_*EXTABLE. Done
Boris: Re-write "tolerant==3" check to reduce indentation level. See below.
Andy: Check IP is valid before searching kernel exception tables. Done.
Andy: Explain use of BIT(63) on return value from mcsafe_memcpy(). Done (added decode macros).
Andy: Untangle mess of code in tail of do_machine_check() to make it
clear what is going on (e.g. that we only enter the ist_begin_non_atomic()
if we were called from user code, not from kernel!). Done.
Tony Luck (3):
x86: Expand exception table to allow new handling options
x86, mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception
table entries
x86, mce: Add __mcsafe_copy()
Documentation/x86/exception-tables.txt | 34 ++++++++
arch/x86/include/asm/asm.h | 44 ++++++----
arch/x86/include/asm/string_64.h | 8 ++
arch/x86/include/asm/uaccess.h | 13 +--
arch/x86/kernel/cpu/mcheck/mce-severity.c | 32 ++++++-
arch/x86/kernel/cpu/mcheck/mce.c | 71 ++++++++--------
arch/x86/kernel/kprobes/core.c | 2 +-
arch/x86/kernel/traps.c | 6 +-
arch/x86/kernel/x8664_ksyms_64.c | 2 +
arch/x86/lib/memcpy_64.S | 133 ++++++++++++++++++++++++++++++
arch/x86/mm/extable.c | 84 ++++++++++++-------
arch/x86/mm/fault.c | 2 +-
scripts/sortextable.c | 30 +++++++
13 files changed, 370 insertions(+), 91 deletions(-)
--
2.1.4
[PATCH 0/2] Fix BTT data corruptions after crash
by Toshi Kani
Data corruption issues were observed in tests which initiated a system
crash/reset while accessing BTT devices. This problem is reproducible.
The BTT driver calls pmem_rw_bytes() to update data in pmem devices.
This interface calls __copy_user_nocache(), which uses non-temporal
stores so that the stores to pmem are persistent.
__copy_user_nocache() uses non-temporal stores when a request size is
8 bytes or larger (and is aligned by 8 bytes). The BTT driver updates
the BTT map table, which entry size is 4 bytes. Therefore, updates to
the map table entries remain cached, and are not written to pmem after
a crash. Since the BTT driver makes previous blocks free and uses them
for subsequent writes, the map table ends up pointing to blocks allocated
for other LBAs after a crash.
Patch 1 extends __copy_user_nocache() to use non-temporal store for
4 byte copy. This patch fixes the BTT data corruption issue.
Patch 2 changes arch_memcpy_to_pmem() to flush processor caches when
a request is not naturally aligned or is less than 4 bytes. This is a
defensive change.
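Conceptually, the defensive change in patch 2 is: after the non-temporal
copy, flush the destination range whenever the request is too small or too
misaligned for non-temporal stores to have covered it. A rough sketch, with
illustrative names rather than the actual patch:

	/* memcpy() stands in here for the __copy_user_nocache()-based copy */
	static void memcpy_to_pmem_sketch(void *dst, const void *src, size_t n)
	{
		memcpy(dst, src, n);

		/* small or unaligned writes may have gone through the cache;
		 * flush them so they are persistent after a crash */
		if (n < 4 || !IS_ALIGNED((unsigned long)dst, 4))
			clflush_cache_range(dst, n);
	}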
---
Toshi Kani (2):
1/2 x86/lib/copy_user_64.S: Handle 4-byte uncached copy
2/2 pmem: Flush cache on unaligned request
---
arch/x86/include/asm/pmem.h | 11 +++++++++++
arch/x86/lib/copy_user_64.S | 44 +++++++++++++++++++++++++++++++++-----------
2 files changed, 44 insertions(+), 11 deletions(-)