[PATCH v3 0/2] Support ACPI 6.1 update in NFIT Control Region Structure
by Toshi Kani
ACPI 6.1, Table 5-133, updates NVDIMM Control Region Structure as
follows.
- Valid Fields, Manufacturing Location, and Manufacturing Date
are added from reserved range. No change in the structure size.
- IDs (SPD values) are stored as arrays of bytes (i.e. big-endian
format). The spec clarifies that they need to be represented
as arrays of bytes as well.
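As a rough illustration of the byte-array point above, here is a minimal userspace sketch of decoding a 2-byte ID stored most-significant byte first. The helper names id_from_bytes() and format_id() are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode a 2-byte ID stored as an array of bytes. */
static uint16_t id_from_bytes(const uint8_t id[2])
{
	/* Byte 0 is the most significant byte (big-endian order). */
	return (uint16_t)((id[0] << 8) | id[1]);
}

/* Format the ID as a fixed-width hex string such as "0x8086". */
static int format_id(char *buf, size_t len, const uint8_t id[2])
{
	assert(len >= 7); /* "0x" + 4 hex digits + NUL */
	return snprintf(buf, len, "0x%04x", id_from_bytes(id));
}
```

Decoding byte by byte gives the same result on little-endian and big-endian hosts, which is the reason for treating these fields as byte arrays rather than native integers.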
Patch 1 changes the NFIT driver to comply with ACPI 6.1.
Patch 2 adds a new sysfs file "id" to show NVDIMM ID defined in ACPI 6.1.
The patch-set applies on linux-pm.git acpica.
link: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
---
v3:
- Need to coordinate with ACPICA update (Bob Moore, Dan Williams)
- Integrate with ACPICA changes in struct acpi_nfit_control_region.
(commit 138a95547ab0)
v2:
- Remove 'mfg_location' and 'mfg_date'. (Dan Williams)
- Rename 'unique_id' to 'id' and make this change as a separate patch.
(Dan Williams)
---
Toshi Kani (2):
1/2 acpi/nfit: Update nfit driver to comply with ACPI 6.1
2/2 acpi/nfit: Add sysfs "id" for NVDIMM ID
---
drivers/acpi/nfit.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
2 years, 7 months
[RFC v3] [PATCH 0/18] DAX page fault locking
by Jan Kara
Hello,
this is my third attempt at DAX page fault locking rewrite. The patch set has
passed xfstests both with and without DAX mount option on ext4 and xfs for
me and also additional page fault beating using the new page fault stress
tests I have added to xfstests. So I'd be grateful if you guys could have a
closer look at the patches so that they can be merged. Thanks.
Changes since v2:
- lot of additional ext4 fixes and cleanups
- make PMD page faults depend on CONFIG_BROKEN instead of #if 0
- fixed page reference leak when replacing hole page with a pfn
- added some reviewed-by tags
- rebased on top of current Linus' tree
Changes since v1:
- handle wakeups of exclusive waiters properly
- fix cow fault races
- other minor stuff
General description
The basic idea is that we use a bit in an exceptional radix tree entry as
a lock bit and use it similarly to how page lock is used for normal faults.
That way we fix races between hole instantiation and read faults of the
same index. For now I have disabled PMD faults since there the issues with
page fault locking are even worse. Now that Matthew's multi-order radix tree
has landed, I can have a look into using that for proper locking of PMD faults
but first I want normal pages sorted out.
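To make the lock-bit idea concrete, here is a minimal userspace sketch using C11 atomics. It assumes bit 0 of an entry word is free to act as the lock bit. The names ENTRY_LOCK, entry_trylock() and entry_unlock() are hypothetical, and the sketch leaves out the sleeping and wakeup of waiters that a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 of the entry word serves as the lock bit. */
#define ENTRY_LOCK ((uintptr_t)1)

/* Try to take the entry lock; returns true if this caller now owns it. */
static bool entry_trylock(_Atomic uintptr_t *slot)
{
	uintptr_t old = atomic_fetch_or_explicit(slot, ENTRY_LOCK,
						 memory_order_acquire);
	return (old & ENTRY_LOCK) == 0;
}

/* Release the lock, leaving the remaining bits of the entry untouched. */
static void entry_unlock(_Atomic uintptr_t *slot)
{
	uintptr_t old = atomic_fetch_and_explicit(slot, ~ENTRY_LOCK,
						  memory_order_release);
	assert(old & ENTRY_LOCK); /* unlocking an unlocked entry is a bug */
	(void)old;
}
```

Two faults on the same index would both call entry_trylock() on the same slot, and only one of them would proceed until the other side calls entry_unlock().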
In the end I have decided to implement the bit locking directly in the DAX
code. Originally I was thinking we could provide something generic directly
in the radix tree code but the functions DAX needs are rather specific.
Maybe someone else will have a good idea how to distill some generally useful
functions out of what I've implemented for DAX but for now I didn't bother
with that.
Honza
4 years, 8 months
[PATCH v4 0/7] dax: handling media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the zeroout path from dax
entirely, making zeroout operations always go through the driver
(The motivation is that if a backing device has media errors,
and we create a sparse file on it, we don't want the initial
zeroing to happen via dax, we want to give the block driver a
chance to clear the errors).
The fifth patch changes how DAX IO is re-routed as direct IO.
We add a new iocb flag for DAX to distinguish it from actual
direct IO, and if we're in O_DIRECT, use the regular direct_IO
path instead of DAX. This gives us an opportunity to do recovery
by doing O_DIRECT writes that will go through the driver to clear
errors from bad sectors.
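The routing rule in the fifth patch can be sketched roughly as below. The flag names and the pick_io_path() helper are hypothetical stand-ins, not the identifiers used in the series; the only point is the priority order, where O_DIRECT wins over DAX.

```c
#include <assert.h>

/* Hypothetical flag bits, standing in for the real iocb flags. */
#define SKETCH_IOCB_DIRECT (1u << 0) /* file was opened with O_DIRECT */
#define SKETCH_IOCB_DAX    (1u << 1) /* IO targets a DAX inode */

static_assert((SKETCH_IOCB_DIRECT & SKETCH_IOCB_DAX) == 0,
	      "flags must not overlap");

enum io_path { IO_PATH_BUFFERED, IO_PATH_DAX, IO_PATH_DIRECT };

/* Pick the IO path; O_DIRECT takes priority so the driver sees the write. */
static enum io_path pick_io_path(unsigned int flags)
{
	if (flags & SKETCH_IOCB_DIRECT)
		return IO_PATH_DIRECT; /* goes through the block driver */
	if (flags & SKETCH_IOCB_DAX)
		return IO_PATH_DAX;
	return IO_PATH_BUFFERED;
}
```

With this ordering, an O_DIRECT write to a DAX file still reaches the driver, which is what gives it the chance to clear a bad sector.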
Patch 6 reduces our calls to clear_pmem from dax in the
truncate/hole-punch cases. We check if the range being truncated
is sector aligned/sized, and if so, send blkdev_issue_zeroout
instead of clear_pmem so that errors can be handled better by
the driver.
Patch 7 fixes a redundant comment in DAX and is mostly unrelated
to the rest of this series.
This series also depends on/is based on Jan Kara's DAX Locking
fixes series [1].
[1]: http://www.spinics.net/lists/linux-mm/msg105819.html
v4:
- Remove the dax->direct_IO fallbacks entirely. Instead, go through
the usual direct_IO path when we're in O_DIRECT, and use dax_IO
for other, non O_DIRECT IO. (Dan, Christoph)
v3:
- Wrapper-ize the direct_IO fallback again and make an exception
for -EIOCBQUEUED (Jeff, Dan)
- Reduce clear_pmem usage in DAX to the minimum
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Matthew Wilcox (1):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
Vishal Verma (3):
fs: prioritize and separate direct_io from dax_io
dax: for truncate/hole-punch, do zeroing through the driver if
possible
dax: fix a comment in dax_zero_page_range and dax_truncate_page
arch/powerpc/sysdev/axonram.c | 10 +++---
block/ioctl.c | 9 -----
drivers/block/brd.c | 9 ++---
drivers/block/loop.c | 2 +-
drivers/nvdimm/pmem.c | 17 +++++++---
drivers/s390/block/dcssblk.c | 12 +++----
fs/block_dev.c | 19 ++++++++---
fs/dax.c | 78 +++++++++++++++----------------------------
fs/ext2/inode.c | 23 ++++++++-----
fs/ext4/file.c | 2 +-
fs/ext4/inode.c | 19 +++++++----
fs/xfs/xfs_aops.c | 20 +++++++----
fs/xfs/xfs_bmap_util.c | 15 +++------
fs/xfs/xfs_file.c | 4 +--
include/linux/blkdev.h | 3 +-
include/linux/dax.h | 1 -
include/linux/fs.h | 15 +++++++--
mm/filemap.c | 4 +--
18 files changed, 134 insertions(+), 128 deletions(-)
--
2.5.5
4 years, 8 months
[PATCH v11 0/5] libnvdimm, nfit: dimm command marshaling
by Dan Williams
Jerry and I have been working towards a way to support the ACPI DSM
command set needed by HPE DIMMs. The HPE command sets differ
from the original Intel-defined command set already upstream.
Ideally the kernel would only implement a single standard command
format, however the standard is not yet available and devices
implementing an alternate command set are already shipping.
This rework of Jerry's initial patches [1] aims to support shipping
devices while encouraging future / follow-on command definitions to wait
for the standardization process to complete by:
1/ Requiring public documentation of commands
2/ Providing a mechanism to disable vendor-specific functionality
See patch 2 for more details. This patch passes the existing nvdimm
unit tests, but I have yet to extend the tests to target this new
mechanism.
Changes since v10: [1]
1/ Rewrote the commit message for the patch that introduces ND_CMD_CALL
2/ Replace 'nfit_cmd_family_tbl' with nfit_mem->family to clean up some
lookup code.
3/ Squash and reorganize the 7 patches into a smaller set. Commit
8467ba4fc94a from my for-4.7/dsm branch [2] was also squashed.
4/ Add sysfs attributes for the dimm family and DSM function-supported
mask.
5/ Add a module parameter to disable vendor specific commands
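As a loose illustration of items 4/ and 5/, the sketch below tests a function number against a function-supported bitmask and applies a global "disable vendor-specific" switch. All names here are hypothetical, and how the driver actually classifies a command as vendor-specific is not shown in this cover letter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static_assert(sizeof(unsigned long) * CHAR_BIT >= 32,
	      "mask must hold at least 32 function bits");

/* True if bit 'func' is set in a function-supported mask. */
static bool dsm_func_supported(unsigned long dsm_mask, unsigned int func)
{
	if (func >= sizeof(dsm_mask) * CHAR_BIT)
		return false;
	return (dsm_mask >> func) & 1ul;
}

/* Hypothetical policy: vendor-specific functions can be switched off. */
static bool dsm_func_allowed(unsigned long dsm_mask, unsigned int func,
			     bool is_vendor_specific,
			     bool disable_vendor_specific)
{
	if (is_vendor_specific && disable_vendor_specific)
		return false;
	return dsm_func_supported(dsm_mask, func);
}
```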
[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-April/005484.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=fo...
---
Dan Williams (5):
nfit, libnvdimm: clarify "commands" vs "_DSMs"
nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
nfit: disable vendor specific commands
tools/testing/nvdimm: ND_CMD_CALL support
nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
drivers/acpi/nfit.c | 145 +++++++++++++++++++++++++++++++++-----
drivers/acpi/nfit.h | 18 ++++-
drivers/nvdimm/bus.c | 47 +++++++++++-
drivers/nvdimm/core.c | 2 -
drivers/nvdimm/dimm_devs.c | 18 +++--
drivers/nvdimm/nd-core.h | 2 -
include/linux/libnvdimm.h | 5 +
include/uapi/linux/ndctl.h | 42 +++++++++++
tools/testing/nvdimm/test/nfit.c | 46 ++++++++----
9 files changed, 275 insertions(+), 50 deletions(-)
4 years, 8 months
[PATCH v2 0/5] dax: handling of media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax, we want to give the
block driver a chance to clear the errors).
One pending item is addressing clear_pmem usages in dax.c. clear_pmem is
'unsafe' as it attempts to simply memcpy, and does not go through the driver.
We have a few options of solving this:
1. Remove all usages of clear_pmem that are not sector-aligned. For the
ones that are aligned, replace them with a bio submission that goes
through the driver to clear errors.
2. Export from the block layer, either an API to zero sub-sector ranges,
or in general, clear errors in a range. The dax attempts to clear_pmem
can then use either of these and not be hit by media errors.
I'll send out a v3 with a crack at option 1, but I wanted to get these
changes (especially the ones in xfs) out for review.
The fifth patch changes all the callers of dax_do_io to check for
EIO, and fallback to direct_IO as needed. This forces the IO to
go through the block driver, and can attempt to clear the error.
v2:
- Use blkdev_issue_zeroout in xfs instead of sb_issue_zeroout (Christoph)
- Un-wrapper-ize dax_do_io and leave the fallback to direct_IO to callers
(Christoph)
- Rebase to v4.6-rc1 (fixup a couple of conflicts in ext4 and xfs)
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Vishal Verma (2):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: handle media errors in dax_do_io
arch/powerpc/sysdev/axonram.c | 10 +++++-----
block/ioctl.c | 9 ---------
drivers/block/brd.c | 9 +++++----
drivers/nvdimm/pmem.c | 17 +++++++++++++----
drivers/s390/block/dcssblk.c | 12 ++++++------
fs/block_dev.c | 19 +++++++++++++++----
fs/dax.c | 36 ++----------------------------------
fs/ext2/inode.c | 29 ++++++++++++++++++-----------
fs/ext4/indirect.c | 18 +++++++++++++-----
fs/ext4/inode.c | 21 ++++++++++++++-------
fs/xfs/xfs_aops.c | 14 ++++++++++++--
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/blkdev.h | 3 +--
include/linux/dax.h | 1 -
14 files changed, 108 insertions(+), 105 deletions(-)
--
2.5.5
4 years, 8 months
[PATCH 0/3] Add alignment check for DAX mount
by Toshi Kani
When a partition is not aligned to 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata updates.
Add alignment check to ext4, ext2, and xfs.
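The kind of check being added can be sketched in userspace as follows. This assumes the partition start is known in 512-byte sectors; the helper name partition_is_dax_aligned() is hypothetical and is not the code in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9     /* assumed 512-byte sectors */
#define DAX_ALIGN    4096u /* the 4KB requirement described above */

static_assert((DAX_ALIGN & (DAX_ALIGN - 1)) == 0,
	      "alignment must be a power of two");

/* True when the partition's starting byte offset is a multiple of 4KB. */
static bool partition_is_dax_aligned(uint64_t start_sector)
{
	uint64_t start_bytes = start_sector << SECTOR_SHIFT;

	return (start_bytes & (DAX_ALIGN - 1)) == 0;
}
```

A partition starting at sector 2048 passes, while one starting at the legacy sector 63 does not, so a mount-time check like this could reject the DAX mount up front instead of letting later IO fail.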
---
Toshi Kani (3):
1/3 ext4: Add alignment check for DAX mount
2/3 ext2: Add alignment check for DAX mount
3/3 xfs: Add alignment check for DAX mount
---
fs/ext2/super.c | 6 ++++++
fs/ext4/super.c | 6 ++++++
fs/xfs/xfs_super.c | 6 ++++++
3 files changed, 18 insertions(+)
4 years, 8 months
acpi_nfit_query_poison() broken on non-ARS systems
by Linda Knippers
I tested Vishal's original ARS patches and they worked correctly on my test
system, which doesn't support ARS. By worked correctly, I mean that they
properly looked at the status of the Query ARS Capabilities function and saw
that it indicated that the function is not supported. The original code
skipped the rest of the ARS processing.
The code I tested has been re-worked a number of times and now it no longer
looks at the status of the Query function. It assumes that if the DSM worked
at all, it must be ok and happily uses fields that it shouldn't be looking at.
If you look here:
> static int acpi_nfit_query_poison(struct acpi_nfit_desc *acpi_desc,
> struct nfit_spa *nfit_spa)
> {
> struct acpi_nfit_system_address *spa = nfit_spa->spa;
> int rc;
>
> if (!nfit_spa->max_ars) {
On a platform that doesn't support ARS, max_ars will always be zero
so you'll keep executing this code when it seems like you're trying
to avoid that.
> struct nd_cmd_ars_cap ars_cap;
>
> memset(&ars_cap, 0, sizeof(ars_cap));
> rc = ars_get_cap(acpi_desc, &ars_cap, nfit_spa);
> if (rc < 0)
> return rc;
The call succeeds so this return isn't taken, but then the code assumes
everything is good. It should check ars_cap.status to see if the function
is supported or if there was an error and return something appropriate.
In a previous version of the code, that's what acpi_nfit_find_poison() did.
Instead, this function continues, using data that's not right and making
more calls that also aren't supported.
> nfit_spa->max_ars = ars_cap.max_ars_out;
> nfit_spa->clear_err_unit = ars_cap.clear_err_unit;
Since we don't bail out, the above values are zero but not actually checked
by subsequent users.
> /* check that the supported scrub types match the spa type */
> if (nfit_spa_type(spa) == NFIT_SPA_VOLATILE &&
> ((ars_cap.status >> 16) & ND_ARS_VOLATILE) == 0)
> return -ENOTTY;
> else if (nfit_spa_type(spa) == NFIT_SPA_PM &&
> ((ars_cap.status >> 16) & ND_ARS_PERSISTENT) == 0)
> return -ENOTTY;
> }
>
> if (ars_status_alloc(acpi_desc, nfit_spa->max_ars))
> return -ENOMEM;
>
> rc = ars_get_status(acpi_desc);
> if (rc < 0 && rc != -ENOSPC)
> return rc;
>
> if (ars_status_process_records(acpi_desc->nvdimm_bus,
> acpi_desc->ars_status))
> return -ENOMEM;
>
> return 0;
> }
I don't know if you want to change how this function works or change how
ars_get_cap works but something needs to change. You actually do check the
status in xlat_status() if the call was initiated from user space but you don't
check when it's initiated from within the kernel.
For reference, below is the dmesg output from my system running the latest
upstream kernel.
-- ljk
[ 36.807155] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_cap input length: 16
[ 36.807159] ars_cap00000000: 80000000 00000004 00000000 00000002
................
[ 36.807336] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_cap output length: 16
[ 36.807338] ars_cap00000000: 00000001 00000000 00000000 00000000
................
[ 36.807427] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start input length: 24
[ 36.807430] ars_start00000000: 80000000 00000004 00000000 00000002
................
[ 36.807431] ars_start00000010: 00000002 00000000 ........
[ 36.807513] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start output length: 4
[ 36.807515] ars_start00000000: 00000001 ....
[ 36.807516] nfit ACPI0012:00: acpi_nfit_ctl:bus output object underflow cmd:
ars_start field: 1
[ 36.807518] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start input length: 24
[ 36.807520] ars_start00000000: 80000000 00000006 00000000 00000002
................
[ 36.807521] ars_start00000010: 00000002 00000000 ........
[ 36.807593] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start output length: 4
[ 36.807594] ars_start00000000: 00000001 ....
[ 36.807596] nfit ACPI0012:00: acpi_nfit_ctl:bus output object underflow cmd:
ars_start field: 1
[ 36.807597] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start input length: 24
[ 36.807599] ars_start00000000: 80000000 0000000c 00000000 00000002
................
[ 36.807600] ars_start00000010: 00000002 00000000 ........
[ 36.807670] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start output length: 4
[ 36.807671] ars_start00000000: 00000001 ....
[ 36.807673] nfit ACPI0012:00: acpi_nfit_ctl:bus output object underflow cmd:
ars_start field: 1
[ 36.807674] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start input length: 24
[ 36.807676] ars_start00000000: 80000000 0000000e 00000000 00000002
................
[ 36.807677] ars_start00000010: 00000002 00000000 ........
[ 36.807747] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: ars_start output length: 4
[ 36.807748] ars_start00000000: 00000001 ....
[ 36.807749] nfit ACPI0012:00: acpi_nfit_ctl:bus output object underflow cmd:
ars_start field: 1
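A minimal sketch of the missing check, written as standalone C. It assumes the low 16 bits of the status word carry the status code (zero meaning success) and the high 16 bits carry the supported scrub types, as the quoted code's ">> 16" suggests. The flag values are illustrative stand-ins for ND_ARS_VOLATILE and ND_ARS_PERSISTENT, and ars_cap_usable() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 16 bits = status code, high 16 bits = extended status. */
#define ARS_STATUS_MASK    0xffffu
/* Illustrative stand-ins for ND_ARS_VOLATILE / ND_ARS_PERSISTENT. */
#define ARS_EXT_VOLATILE   (1u << 0)
#define ARS_EXT_PERSISTENT (1u << 1)

static_assert((ARS_EXT_VOLATILE & ARS_EXT_PERSISTENT) == 0,
	      "scrub type flags must not overlap");

/* True only if the query succeeded and the wanted scrub type is offered. */
static bool ars_cap_usable(uint32_t status, uint32_t wanted_type)
{
	if ((status & ARS_STATUS_MASK) != 0)
		return false; /* non-zero code: unsupported or failed */
	return ((status >> 16) & wanted_type) != 0;
}
```

With the ars_cap output from the log above (first dword 00000001), a check like this would bail out before any ars_start call is attempted.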
4 years, 8 months
[PATCH] nfit: export subsystem ids as attributes
by Dan Williams
Similar to pci-sysfs, export the subsystem information available in the
NFIT. ACPI 6.1 clarifies that this data is copied as an array of bytes
from the DIMM SPD data.
Reported-by: Ryon Jensen <ryon.jensen(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/acpi/nfit.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 5a7199db2e06..0a1ba3d2e39a 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -847,6 +847,34 @@ static ssize_t format_show(struct device *dev,
}
static DEVICE_ATTR_RO(format);
+static ssize_t subsystem_vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "0x%04x\n", be16_to_cpu(dcr->subsystem_vendor_id));
+}
+static DEVICE_ATTR_RO(subsystem_vendor);
+
+static ssize_t subsystem_rev_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "0x%04x\n",
+ be16_to_cpu(dcr->subsystem_revision_id));
+}
+static DEVICE_ATTR_RO(subsystem_rev_id);
+
+static ssize_t subsystem_device_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "0x%04x\n", be16_to_cpu(dcr->subsystem_device_id));
+}
+static DEVICE_ATTR_RO(subsystem_device);
+
static ssize_t serial_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -893,9 +921,12 @@ static struct attribute *acpi_nfit_dimm_attributes[] = {
&dev_attr_phys_id.attr,
&dev_attr_vendor.attr,
&dev_attr_device.attr,
+ &dev_attr_rev_id.attr,
+ &dev_attr_subsystem_vendor.attr,
+ &dev_attr_subsystem_device.attr,
+ &dev_attr_subsystem_rev_id.attr,
&dev_attr_format.attr,
&dev_attr_serial.attr,
- &dev_attr_rev_id.attr,
&dev_attr_flags.attr,
&dev_attr_id.attr,
NULL,
4 years, 8 months
[PATCH] libnvdimm, pfn: fix memmap reservation sizing
by Dan Williams
When configuring a pfn-device instance to allocate the memmap array it
needs to account for the fact that vmemmap_populate_hugepages()
allocates struct page blocks in HPAGE_SIZE chunks. We need to align the
reserved area size to 2MB otherwise arch_add_memory() runs out of memory
while establishing the memmap:
WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
[..]
Call Trace:
[<ffffffff8148bdb3>] dump_stack+0x85/0xc2
[<ffffffff810a749b>] __warn+0xcb/0xf0
[<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
[<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
[<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
[<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
[<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
[<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
[<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
[<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
[..]
ndbus0: nd_pmem.probe(pfn3.0) = -12
nd_pmem: probe of pfn3.0 failed with error -12
libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
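The sizing rule can be tried out in userspace with the sketch below. It assumes HPAGE_SIZE is 2MB and that the 64 in the patch is the size of struct page in bytes. The helpers align_up() and pfn_data_offset() are hypothetical names that mirror the PFN_MODE_PMEM branch of the patched calculation.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ull
#define SZ_8K 0x2000ull
#define SZ_2M 0x200000ull      /* assumed HPAGE_SIZE (2MB huge page) */
#define PAGE_STRUCT_SIZE 64ull /* the "64" in the patch: bytes per page */

/* Round x up to the next multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Data offset for the pmem-backed memmap case, per the patched formula. */
static uint64_t pfn_data_offset(uint64_t start, uint64_t npfns, uint64_t align)
{
	/* Reserve the memmap in whole 2MB chunks before applying 'align'. */
	uint64_t memmap_size = align_up(PAGE_STRUCT_SIZE * npfns, SZ_2M);

	return align_up(start + SZ_8K + memmap_size, align) - start;
}
```

For example, with npfns = 32769 the raw memmap is 64 bytes over 2MB, so the reservation is rounded up to 4MB before the 8K info block and the requested alignment are added.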
Reported-by: Namratha Kothapalli <namratha.n.kothapalli(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/nvdimm/pmem.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f798899338ed..5101f3ab4f29 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -397,10 +397,17 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
*/
start += start_pad;
npfns = (pmem->size - start_pad - end_trunc - SZ_8K) / SZ_4K;
- if (nd_pfn->mode == PFN_MODE_PMEM)
- offset = ALIGN(start + SZ_8K + 64 * npfns, nd_pfn->align)
+ if (nd_pfn->mode == PFN_MODE_PMEM) {
+ unsigned long memmap_size;
+
+ /*
+ * vmemmap_populate_hugepages() allocates the memmap array in
+ * HPAGE_SIZE chunks.
+ */
+ memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
+ offset = ALIGN(start + SZ_8K + memmap_size, nd_pfn->align)
- start;
- else if (nd_pfn->mode == PFN_MODE_RAM)
+ } else if (nd_pfn->mode == PFN_MODE_RAM)
offset = ALIGN(start + SZ_8K, nd_pfn->align) - start;
else
goto err;
4 years, 8 months