[PATCH v2 00/25] replace ioremap_{cache|wt} with memremap
by Dan Williams
Changes since v1 [1]:
1/ Drop the attempt at unifying ioremap() prototypes, just focus on
converting ioremap_cache and ioremap_wt over to memremap (Christoph)
2/ Drop the unrelated cleanups to use %pa in __ioremap_caller (Thomas)
3/ Add support for memremap() attempts on "System RAM" to simply return
the kernel virtual address for that range. ARM depends on this
functionality in ioremap_cache() and ACPI was open coding a similar
solution. (Mark)
4/ Split the conversions of ioremap_{cache|wt} into separate patches per
driver / arch.
5/ Fix bisection breakage and other reports from 0day-kbuild
---
While developing the pmem driver we noticed that the __iomem annotation
on the return value from ioremap_cache() was being mishandled by several
callers. We also observed that all of the call sites expected to be
able to treat the return value from ioremap_cache() as a normal
(non-__iomem) pointer to memory.
This patchset takes the opportunity to clean up the above confusion as
well as a few issues with the ioremap_{cache|wt} interface, including:
1/ Eliminating the possibility of function prototypes differing between
architectures by defining a central memremap() prototype that takes
flags to determine the mapping type.
2/ Returning NULL rather than falling back silently to a different
mapping-type. This allows drivers to be stricter about the
mapping-type fallbacks that are permissible.
[1]: http://marc.info/?l=linux-arm-kernel&m=143735199029255&w=2
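As an illustration of points 1/ and 2/ above, here is a minimal userspace
sketch of a flag-driven remap call that returns NULL instead of silently
falling back to another mapping type. The sketch_* names, the flag values
and the toy "arch" structure are assumptions made for this example only;
they are not the prototypes or flags introduced by the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names for this sketch only. */
enum sketch_memremap_flags {
	SKETCH_MEMREMAP_CACHE = 1 << 0,	/* write-back cacheable mapping */
	SKETCH_MEMREMAP_WT    = 1 << 1,	/* write-through mapping */
};

/* Toy model of what one architecture is able to provide. */
struct sketch_arch {
	unsigned long supported;	/* mask of mapping types implemented */
	unsigned char *backing;		/* stands in for the mapped memory */
};

/*
 * One central prototype for every architecture. The caller lists the
 * mapping types it can tolerate in 'flags'. If the architecture offers
 * none of them, the call fails with NULL rather than handing back a
 * mapping of some other type. The result is a plain pointer, not an
 * __iomem cookie.
 */
void *sketch_memremap(const struct sketch_arch *arch, size_t offset,
		      unsigned long flags)
{
	assert(arch != NULL);

	if (!(flags & arch->supported))
		return NULL;	/* no silent fallback */

	return arch->backing + offset;
}
```

A caller that can live with either mapping type passes both flags; a
caller that strictly needs write-through passes only that flag and must
handle a NULL return.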
---
Dan Williams (22):
mm: enhance region_is_ram() to distinguish 'unknown' vs 'mixed'
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
intel_iommu: fix leaked ioremap mapping
arch: introduce memremap()
arm: switch from ioremap_cache to memremap
x86: switch from ioremap_cache to memremap
gma500: switch from acpi_os_ioremap to ioremap
i915: switch from acpi_os_ioremap to ioremap
acpi: switch from ioremap_cache to memremap
toshiba laptop: replace ioremap_cache with ioremap
memconsole: fix __iomem mishandling, switch to memremap
visorbus: switch from ioremap_cache to memremap
intel-iommu: switch from ioremap_cache to memremap
libnvdimm, pmem: switch from ioremap_cache to memremap
pxa2xx-flash: switch from ioremap_cache to memremap
sfi: switch from ioremap_cache to memremap
fbdev: switch from ioremap_wt to memremap
pmem: switch from ioremap_wt to memremap
arch: remove ioremap_cache, replace with arch_memremap
arch: remove ioremap_wt, replace with arch_memremap
pmem: convert to generic memremap
Toshi Kani (3):
mm, x86: Fix warning in ioremap RAM check
mm, x86: Remove region_is_ram() call from ioremap
mm: Fix bugs in region_is_ram()
arch/arc/include/asm/io.h | 1
arch/arm/Kconfig | 1
arch/arm/include/asm/io.h | 13 +++-
arch/arm/include/asm/xen/page.h | 4 +
arch/arm/mach-clps711x/board-cdb89712.c | 2 -
arch/arm/mach-shmobile/pm-rcar.c | 2 -
arch/arm/mm/ioremap.c | 12 +++-
arch/arm/mm/nommu.c | 11 ++-
arch/arm64/Kconfig | 1
arch/arm64/include/asm/acpi.h | 10 +--
arch/arm64/include/asm/dmi.h | 8 +--
arch/arm64/include/asm/io.h | 8 ++-
arch/arm64/kernel/efi.c | 9 ++-
arch/arm64/kernel/smp_spin_table.c | 19 +++---
arch/arm64/mm/ioremap.c | 20 ++----
arch/avr32/include/asm/io.h | 1
arch/frv/Kconfig | 1
arch/frv/include/asm/io.h | 17 ++---
arch/frv/mm/kmap.c | 6 ++
arch/ia64/Kconfig | 1
arch/ia64/include/asm/io.h | 11 +++
arch/ia64/kernel/cyclone.c | 2 -
arch/m32r/include/asm/io.h | 1
arch/m68k/Kconfig | 1
arch/m68k/include/asm/io_mm.h | 14 +---
arch/m68k/include/asm/io_no.h | 12 ++--
arch/m68k/include/asm/raw_io.h | 4 +
arch/m68k/mm/kmap.c | 17 +++++
arch/m68k/mm/sun3kmap.c | 6 ++
arch/metag/include/asm/io.h | 3 -
arch/microblaze/include/asm/io.h | 1
arch/mn10300/include/asm/io.h | 1
arch/nios2/include/asm/io.h | 1
arch/powerpc/kernel/pci_of_scan.c | 2 -
arch/s390/include/asm/io.h | 1
arch/sh/Kconfig | 1
arch/sh/include/asm/io.h | 20 ++++--
arch/sh/mm/ioremap.c | 10 +++
arch/sparc/include/asm/io_32.h | 1
arch/sparc/include/asm/io_64.h | 1
arch/sparc/kernel/pci.c | 3 -
arch/tile/include/asm/io.h | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/efi.h | 3 +
arch/x86/include/asm/io.h | 17 +++--
arch/x86/kernel/crash_dump_64.c | 6 +-
arch/x86/kernel/kdebugfs.c | 8 +--
arch/x86/kernel/ksysfs.c | 28 ++++-----
arch/x86/mm/ioremap.c | 76 ++++++++++--------------
arch/xtensa/Kconfig | 1
arch/xtensa/include/asm/io.h | 9 ++-
drivers/acpi/apei/einj.c | 9 ++-
drivers/acpi/apei/erst.c | 6 +-
drivers/acpi/nvs.c | 6 +-
drivers/acpi/osl.c | 70 ++++++----------------
drivers/char/toshiba.c | 2 -
drivers/firmware/google/memconsole.c | 7 +-
drivers/gpu/drm/gma500/opregion.c | 2 -
drivers/gpu/drm/i915/intel_opregion.c | 2 -
drivers/iommu/intel-iommu.c | 10 ++-
drivers/iommu/intel_irq_remapping.c | 4 +
drivers/isdn/icn/icn.h | 2 -
drivers/mtd/devices/slram.c | 2 -
drivers/mtd/maps/pxa2xx-flash.c | 4 +
drivers/mtd/nand/diskonchip.c | 2 -
drivers/mtd/onenand/generic.c | 2 -
drivers/nvdimm/Kconfig | 2 -
drivers/pci/probe.c | 3 -
drivers/pnp/manager.c | 2 -
drivers/scsi/aic94xx/aic94xx_init.c | 7 --
drivers/scsi/arcmsr/arcmsr_hba.c | 5 --
drivers/scsi/mvsas/mv_init.c | 15 +----
drivers/scsi/sun3x_esp.c | 2 -
drivers/sfi/sfi_core.c | 4 +
drivers/staging/comedi/drivers/ii_pci20kc.c | 1
drivers/staging/unisys/visorbus/visorchannel.c | 16 +++--
drivers/staging/unisys/visorbus/visorchipset.c | 17 +++--
drivers/tty/serial/8250/8250_core.c | 2 -
drivers/video/fbdev/Kconfig | 2 -
drivers/video/fbdev/amifb.c | 5 +-
drivers/video/fbdev/atafb.c | 5 +-
drivers/video/fbdev/hpfb.c | 6 +-
drivers/video/fbdev/ocfb.c | 1
drivers/video/fbdev/s1d13xxxfb.c | 3 -
drivers/video/fbdev/stifb.c | 1
include/acpi/acpi_io.h | 6 +-
include/asm-generic/io.h | 8 ---
include/asm-generic/iomap.h | 4 -
include/linux/io-mapping.h | 2 -
include/linux/io.h | 9 +++
include/linux/mtd/map.h | 2 -
include/linux/pmem.h | 26 +++++---
include/video/vga.h | 2 -
kernel/Makefile | 2 +
kernel/memremap.c | 74 +++++++++++++++++++++++
kernel/resource.c | 43 +++++++-------
lib/Kconfig | 5 +-
lib/devres.c | 13 +---
lib/pci_iomap.c | 7 +-
tools/testing/nvdimm/Kbuild | 4 +
tools/testing/nvdimm/test/iomap.c | 34 ++++++++---
101 files changed, 482 insertions(+), 398 deletions(-)
create mode 100644 kernel/memremap.c
4 years, 9 months
[PATCH v1 00/10] uuid: convert users to generic UUID API
by Andy Shevchenko
There are a few functions here and there, along with type definitions, that provide
UUID API. This series consolidates everything under one hood and converts
current users.
This has been tested for a while internally, however it doesn't mean we covered
all possible cases (especially accuracy of UUID constants after conversion).
So, please test this as much as you can and provide your tag. We appreciate the
effort.
Andy Shevchenko (10):
lib/vsprintf: simplify UUID printing
lib/uuid: move generate_random_uuid() to uuid.c
lib/uuid: introduce few more generic helpers for UUID
lib/uuid: remove FSF address
ACPI: switch to use generic UUID API
device property: switch to use UUID API
sysctl: drop away useless label
sysctl: use generic UUID library
efi: redefine type, constant, macro from generic code
efivars: use generic UUID library
drivers/acpi/acpi_extlog.c | 8 +-
drivers/acpi/bus.c | 29 +------
drivers/acpi/nfit.c | 34 ++++----
drivers/acpi/nfit.h | 3 +-
drivers/acpi/property.c | 18 ++---
drivers/acpi/utils.c | 4 +-
drivers/char/random.c | 21 +----
drivers/char/tpm/tpm_crb.c | 9 +--
drivers/char/tpm/tpm_ppi.c | 20 ++---
drivers/gpu/drm/i915/intel_acpi.c | 14 ++--
drivers/gpu/drm/nouveau/nouveau_acpi.c | 20 +++--
drivers/gpu/drm/nouveau/nvkm/subdev/mxm/base.c | 9 +--
drivers/hid/i2c-hid/i2c-hid.c | 9 +--
drivers/iommu/dmar.c | 11 ++-
drivers/pci/pci-acpi.c | 11 ++-
drivers/pci/pci-label.c | 4 +-
drivers/thermal/int340x_thermal/int3400_thermal.c | 6 +-
drivers/usb/host/xhci-pci.c | 9 +--
fs/btrfs/volumes.c | 2 +-
fs/efivarfs/inode.c | 40 +---------
fs/ext4/ioctl.c | 1 +
fs/f2fs/file.c | 2 +-
fs/reiserfs/objectid.c | 2 +-
fs/ubifs/sb.c | 2 +-
include/acpi/acpi_bus.h | 10 ++-
include/linux/acpi.h | 2 +-
include/linux/efi.h | 14 +---
include/linux/pci-acpi.h | 2 +-
include/linux/random.h | 1 -
include/linux/uuid.h | 21 +++--
include/uapi/linux/uuid.h | 4 -
kernel/sysctl_binary.c | 30 +++----
lib/uuid.c | 96 +++++++++++++++++++++--
lib/vsprintf.c | 21 ++---
sound/soc/intel/skylake/skl-nhlt.c | 7 +-
35 files changed, 237 insertions(+), 259 deletions(-)
--
2.7.0
4 years, 9 months
[PATCH v4 0/8] Support for transparent PUD pages for DAX files
by Matthew Wilcox
We have customer demand to use 1GB pages to map DAX files. Unlike the 2MB
page support, the Linux MM does not currently support PUD pages, so I have
attempted to add support for the necessary pieces for DAX huge PUD pages.
Filesystems still need work to allocate 1GB pages. With ext4, I can
only get 16MB of contiguous space, although it is aligned. With XFS,
I can get 80MB less than 1GB, and it's not aligned. The XFS problem
may be due to the small amount of RAM in my test machine.
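To make the allocation constraint concrete, here is a small sketch of the
check an extent has to pass before it could be mapped with a single PUD
entry. It assumes x86-64 with 4-level paging, where a PUD entry covers
1GB (shift of 30); the sketch_* names are hypothetical and the check is
deliberately simplified to extents that start on a PUD boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64, 4-level paging, so one PUD entry maps 1GB. */
#define SKETCH_PUD_SHIFT 30
#define SKETCH_PUD_SIZE  (UINT64_C(1) << SKETCH_PUD_SHIFT)

/*
 * Return true if an extent starting at physical address 'phys_start'
 * and spanning 'len' bytes can back at least one PUD-sized mapping
 * beginning at its first byte: the start must be 1GB aligned and the
 * extent must be at least 1GB long.
 */
bool sketch_extent_fits_pud(uint64_t phys_start, uint64_t len)
{
	assert(SKETCH_PUD_SIZE == UINT64_C(0x40000000));

	if (phys_start & (SKETCH_PUD_SIZE - 1))
		return false;	/* not aligned to a PUD boundary */

	return len >= SKETCH_PUD_SIZE;
}
```

Under this check, both results described above fail: an aligned 16MB
extent is too short, and an extent 80MB short of 1GB is too short as
well as unaligned.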
This patch set is against something approximately current -mm. I'd like
to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
The conversion of pmd_fault & pud_fault to huge_fault is thanks to
Dave's poking, and Kirill spotted a couple of problems in the MM code.
Version 2 of the patch set is about 200 lines smaller (1016 insertions,
23 deletions in v1).
I've done some light testing using a program to mmap a block device
with DAX enabled, calling mincore() and examining /proc/smaps and
/proc/pagemap.
v4: Updated to current mmotm
Converted pud_trans_huge_lock to the same calling conventions as
pmd_trans_huge_lock.
Fill in vm_fault ->gfp_flags and ->pgoff, at Jan Kara's suggestion
Replace use of page table lock with pud_lock in __pud_alloc (cosmetic)
Fix compilation problems with various config settings
Convert dax_pmd_fault and dax_pud_fault to take a vm_fault instead of
individual pieces
Add copy_huge_pud() and follow_devmap_pud() so fork() should now work
Fix typo of PMD for PUD
v3: Rebased against current mmotm
v2: Reduced churn in filesystems by switching to ->huge_fault interface
Addressed concerns from Kirill
Matthew Wilcox (8):
mm: Convert an open-coded VM_BUG_ON_VMA
mm,fs,dax: Change ->pmd_fault to ->huge_fault
mm: Add support for PUD-sized transparent hugepages
mincore: Add support for PUDs
procfs: Add support for PUDs to smaps, clear_refs and pagemap
x86: Add support for PUD-sized transparent hugepages
dax: Support for transparent PUD pages
ext4: Support for PUD-sized transparent huge pages
Documentation/filesystems/dax.txt | 12 +-
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/paravirt.h | 11 ++
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/include/asm/pgtable-2level.h | 19 +++
arch/x86/include/asm/pgtable-3level.h | 31 ++++
arch/x86/include/asm/pgtable.h | 134 +++++++++++++++
arch/x86/include/asm/pgtable_64.h | 13 ++
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/pgtable.c | 31 ++++
fs/block_dev.c | 10 +-
fs/dax.c | 295 +++++++++++++++++++++++++---------
fs/ext2/file.c | 27 +---
fs/ext4/file.c | 60 +++----
fs/proc/task_mmu.c | 109 +++++++++++++
fs/xfs/xfs_file.c | 25 ++-
fs/xfs/xfs_trace.h | 2 +-
include/asm-generic/pgtable.h | 74 ++++++++-
include/asm-generic/tlb.h | 14 ++
include/linux/dax.h | 17 --
include/linux/huge_mm.h | 78 ++++++++-
include/linux/mm.h | 48 +++++-
include/linux/mmu_notifier.h | 14 ++
include/linux/pfn_t.h | 8 +
mm/gup.c | 7 +
mm/huge_memory.c | 246 ++++++++++++++++++++++++++++
mm/memory.c | 135 ++++++++++++++--
mm/mincore.c | 13 ++
mm/pagewalk.c | 19 ++-
mm/pgtable-generic.c | 14 ++
31 files changed, 1261 insertions(+), 212 deletions(-)
--
2.7.0.rc3
4 years, 9 months
[RFC 0/2] New MAP_PMEM_AWARE mmap flag
by Boaz Harrosh
Hi all
Recent DAX code fixed the cl_flushing, i.e. the durability, of mmap access
to direct persistent memory from applications. It uses the per-inode
radix-tree to track the indexes of a file that were page-faulted for
write. Then at m/fsync time it cl_flushes these pages and cleans
the radix-tree for the next round.
Sigh, that is life; for legacy applications this is the price we must
pay. But NV-aware applications, like the nvml library, pay this extra
price even if they never actually call m/fsync. For these
applications the extra resources, and especially the extra radix locking
per page-fault, cost a lot, like x3 a lot.
What we propose here is a way for those applications to enjoy the
boost and still not sacrifice any correctness of legacy applications.
Any concurrent access from legacy apps vs nv-aware apps even to the same
file / same page, will work correctly.
We do that by defining a new mmap flag that is set by the nv-aware
app. This flag is carried by the VMA. In the dax code we bypass any
radix handling of the page if this flag is set. Pages accessed *without*
this flag will be added to the radix-tree; those accessed with it will not.
At m/fsync time if the radix tree is then empty nothing will happen.
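The bypass described above can be modelled in a few lines of userspace C.
This is only a sketch of the idea: a bool array stands in for the
per-inode radix-tree, a bool in a toy VMA stands in for the proposed
mmap flag, and every sketch_* name is hypothetical rather than taken
from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 8

/* Toy file: the bool array stands in for the per-inode radix-tree. */
struct sketch_file {
	bool dirty[SKETCH_NPAGES];
};

/* Toy VMA: carries the flag the nv-aware app passed to mmap. */
struct sketch_vma {
	bool pmem_aware;
};

/*
 * Write fault. A legacy mapping records the page so that a later
 * m/fsync can flush it. A pmem-aware mapping skips the tracking
 * (and, in the real kernel, the locking that goes with it).
 */
void sketch_write_fault(struct sketch_file *f, const struct sketch_vma *vma,
			size_t pgoff)
{
	assert(pgoff < SKETCH_NPAGES);

	if (vma->pmem_aware)
		return;		/* the app flushes its own stores */

	f->dirty[pgoff] = true;
}

/* m/fsync: flush and forget every tracked page; returns the count. */
int sketch_fsync(struct sketch_file *f)
{
	int flushed = 0;

	for (size_t i = 0; i < SKETCH_NPAGES; i++) {
		if (f->dirty[i]) {
			f->dirty[i] = false;	/* a cache-line flush would go here */
			flushed++;
		}
	}
	return flushed;
}
```

With two mappings of the same file, only the pages faulted through the
legacy mapping are flushed at fsync time, and a file touched only through
pmem-aware mappings makes fsync a no-op in this model.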
These are very simple, non-intrusive patches with minimal risk. (I think)
They are based on v4.5-rc5. If you need a rebase on any other tree please
say.
Please consider this new flag for those of us who specialize in
persistent-memory setups and want to extract any possible mileage out
of our systems.
Also attached for reference is a 3rd patch, to the nvml library, to use
the new flag. This brings me to the issue of persistent_memcpy / persistent_flush.
Currently this library is for x86_64 only, using the movnt instructions. The gcc
compiler should have a per-ARCH facility for durable memory accesses, so that
applications can be portable across systems.
Please advise?
list of patches:
[RFC 1/2] mmap: Define a new MAP_PMEM_AWARE mmap flag
[RFC 2/2] REVIEWME: dax: Support MAP_PMEM_AWARE for optimal
Two Kernel patches
[RFC 1/1] util: add pmem-aware flag to mmap
A patch for the nvml library
Thanks
Boaz
4 years, 10 months
acpi_nfit_find_poison() question
by Linda Knippers
Hi Vishal,
I was looking at acpi_nfit_find_poison() and if I'm reading this
right, I think it's throwing away some ARS results and re-running
an ARS unnecessarily. More comments below...
-- ljk
> static int acpi_nfit_find_poison(struct acpi_nfit_desc *acpi_desc,
> struct nd_region_desc *ndr_desc)
> {
> struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
> struct nvdimm_bus *nvdimm_bus = acpi_desc->nvdimm_bus;
> struct nd_cmd_ars_status *ars_status = NULL;
> struct nd_cmd_ars_start *ars_start = NULL;
> struct nd_cmd_ars_cap *ars_cap = NULL;
> u64 start, len, cur, remaining;
> int rc;
>
> ars_cap = kzalloc(sizeof(*ars_cap), GFP_KERNEL);
> if (!ars_cap)
> return -ENOMEM;
>
> start = ndr_desc->res->start;
> len = ndr_desc->res->end - ndr_desc->res->start + 1;
>
> rc = ars_get_cap(nd_desc, ars_cap, start, len);
> if (rc)
> goto out;
>
> /*
> * If ARS is unsupported, or if the 'Persistent Memory Scrub' flag in
> * extended status is not set, skip this but continue initialization
> */
> if ((ars_cap->status & 0xffff) ||
> !(ars_cap->status >> 16 & ND_ARS_PERSISTENT)) {
> dev_warn(acpi_desc->dev,
> "ARS unsupported (status: 0x%x), won't create an error list\n",
> ars_cap->status);
> goto out;
> }
>
> /*
> * Check if a full-range ARS has been run. If so, use those results
> * without having to start a new ARS.
> */
> ars_status = kzalloc(ars_cap->max_ars_out + sizeof(*ars_status),
> GFP_KERNEL);
> if (!ars_status) {
> rc = -ENOMEM;
> goto out;
> }
>
> rc = ars_get_status(nd_desc, ars_status);
> if (rc)
> goto out;
>
> if (ars_status->address <= start &&
> (ars_status->address + ars_status->length >= start + len)) {
> rc = ars_status_process_records(nvdimm_bus, ars_status, start);
> goto out;
> }
The above code will process the records if the ARS ran to completion but
not if the ARS overflowed. It won't process partial results because it's
checking both the start and the length against the total range.
>
> /*
> * ARS_STATUS can overflow if the number of poison entries found is
> * greater than the maximum buffer size (ars_cap->max_ars_out)
> * To detect overflow, check if the length field of ars_status
> * is less than the length we supplied. If so, process the
> * error entries we got, adjust the start point, and start again
> */
This comment seems like the right idea but that's not what it's doing.
> ars_start = kzalloc(sizeof(*ars_start), GFP_KERNEL);
> if (!ars_start)
> return -ENOMEM;
>
> cur = start;
> remaining = len;
If we get here, we're starting over at the beginning, losing the
previous results. Shouldn't we process the previous results and
then enter this loop using
cur = ars_status->address + ars_status->length;
remaining = len - ars_status->length;
?
Or restructure the loop so that the existing results, if any, are
processed before doing an ars_do_start()? Or did I miss something?
> do {
> u64 done, end;
>
> rc = ars_do_start(nd_desc, ars_start, cur, remaining);
> if (rc)
> goto out;
>
> rc = ars_get_status(nd_desc, ars_status);
> if (rc)
> goto out;
>
> rc = ars_status_process_records(nvdimm_bus, ars_status, cur);
> if (rc)
> goto out;
>
> end = min(cur + remaining,
> ars_status->address + ars_status->length);
> done = end - cur;
> cur += done;
> remaining -= done;
> } while (remaining);
>
> out:
> kfree(ars_cap);
> kfree(ars_start);
> kfree(ars_status);
> return rc;
> }
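The resume arithmetic suggested in the comments above can be sketched in
isolation. This is a userspace model, not a patch: the sketch_* types are
hypothetical stand-ins for the address and length fields of the status
buffer, and it computes the remaining length from the end of the region
rather than as len - length, so that it also holds when the earlier scrub
began below the region's start. The caller is assumed to process the
records from the earlier status before resuming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the address/length fields of an earlier ARS status. */
struct sketch_ars_status {
	uint64_t address;
	uint64_t length;
};

/* Where the scrub loop should begin, and how much is left to cover. */
struct sketch_resume {
	uint64_t cur;
	uint64_t remaining;
};

/*
 * Given the region [start, start + len) and the range covered by an
 * earlier (possibly overflowed) scrub, return the point to resume from.
 * If the earlier results do not cover the beginning of the region, the
 * whole region still has to be scrubbed. Address overflow is ignored
 * in this sketch.
 */
struct sketch_resume sketch_ars_resume(uint64_t start, uint64_t len,
				       const struct sketch_ars_status *st)
{
	struct sketch_resume r = { start, len };
	uint64_t end = start + len;
	uint64_t covered_end;

	assert(st != NULL);
	covered_end = st->address + st->length;

	if (st->address > start || covered_end <= start)
		return r;	/* nothing usable: start from the beginning */

	if (covered_end > end)
		covered_end = end;

	r.cur = covered_end;
	r.remaining = end - covered_end;
	return r;
}
```

A result with remaining == 0 corresponds to the full-range case the
function already handles; a non-zero remaining is the overflow case,
where the loop would start at cur instead of at the region's start.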
4 years, 10 months
[PATCH 0/8] nfit, libnvdimm: async address range scrub
by Dan Williams
Given the capacities of next generation persistent memory devices, a
scrub operation to find all poison may take tens of seconds. We want
this scrub work to be done asynchronously with the rest of system
initialization, so we move it out of line from the NFIT probing, i.e.
acpi_nfit_add().
However, we may want to synchronously wait for that scrubbing to
complete before we probe any pmem devices. Consider the case where
consuming poison triggers a machine check and a reboot. That event will
trigger platform firmware to initiate a scrub. The kernel should
complete any firmware initiated scrubs as those likely indicate the
presence of known poison.
When errors are not present and platform firmware did not initiate
scrubbing, we still scrub, but asynchronously. This trades off a risk
of hitting new unknown poison ranges against making the data available
faster after loading the driver.
This async scrub capability is also useful in the future when we
integrate Tony Luck's mcsafe_copy() (or whatever it is
eventually called). After a machine check recovery event we can scrub
the pmem namespace to see if there are any other latent errors and
otherwise update the 'badblocks' list with the new entries.
This passes the libndctl unit test suite, with some minor updates to
account for the fact that when "modprobe nfit_test" returns, not all
regions are registered.
---
Dan Williams (8):
libnvdimm, nfit: centralize command status translation
libnvdimm: protect nvdimm_{bus|namespace}_add_poison() with nvdimm_bus_lock()
libnvdimm: async notification support
nfit, tools/testing/nvdimm: unify common init for acpi_nfit_desc
nfit, libnvdimm: async region scrub workqueue
nfit: scrub and register regions in a workqueue
nfit: disable userspace initiated ars during scrub
tools/testing/nvdimm: expand ars unit testing
drivers/acpi/nfit.c | 761 +++++++++++++++++++++++++++-----------
drivers/acpi/nfit.h | 24 +
drivers/nvdimm/bus.c | 46 ++
drivers/nvdimm/core.c | 110 ++++-
drivers/nvdimm/dimm_devs.c | 6
drivers/nvdimm/nd.h | 2
drivers/nvdimm/pmem.c | 15 +
drivers/nvdimm/region.c | 12 +
include/linux/libnvdimm.h | 5
include/linux/nd.h | 7
tools/testing/nvdimm/test/nfit.c | 133 +++++--
11 files changed, 809 insertions(+), 312 deletions(-)
4 years, 10 months
[PATCH v2 0/3] ACPI 6.1 update for NFIT Control Region Structure
by Toshi Kani
ACPI 6.1, Table 5-133, updates the NVDIMM Control Region Structure
as follows.
- Valid Fields, Manufacturing Location, and Manufacturing Date
are added from the reserved range. No change in the structure size.
- IDs defined as SPD values are arrays of bytes. The spec
clarified that they need to be represented as arrays of bytes
as well.
Patch 1 changes 'struct acpi_nfit_control_region' and the NFIT driver to
comply with ACPI 6.1.
Patch 2 adds a new sysfs file "id" to show the NVDIMM ID defined in ACPI 6.1.
Patch 3 changes the nfit test driver.
link: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
---
v2:
- Remove 'mfg_location' and 'mfg_date'. (Dan Williams)
- Rename 'unique_id' to 'id' and make this change as a separate patch.
(Dan Williams)
---
Toshi Kani (3):
1/3 ACPI/NFIT: Update Control Region Structure to comply ACPI 6.1
2/3 ACPI/NFIT: Add NVDIMM ID "id" under sysfs
3/3 nfit_test: Update SPD ID init handlings
---
drivers/acpi/nfit.c | 41 ++++++++++++++++++++-----
include/acpi/actbl1.h | 24 +++++++++------
tools/testing/nvdimm/test/nfit.c | 64 ++++++++++++++++++++++++----------------
3 files changed, 88 insertions(+), 41 deletions(-)
4 years, 10 months