Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
by Matthew Wilcox
On Mon, May 07, 2018 at 10:50:21PM +0800, Huaisheng Ye wrote:
> Traditionally, NVDIMMs are treated by the mm (memory management) subsystem
> as a DEVICE zone, which is a virtual zone whose start and end pfn are both
> equal to 0. mm does not manage NVDIMM directly as it does DRAM; the kernel
> uses the corresponding drivers, located in drivers/nvdimm/ and
> drivers/acpi/nfit, and the fs layer to implement NVDIMM memory alloc and
> free on top of the memory hot plug implementation.
You probably want to let linux-nvdimm know about this patch set.
Adding to the cc. Also, I only received patch 0 and 4. What happened
to 1-3,5 and 6?
> With the current kernel, many of mm’s classical features, such as the buddy
> system, the swap mechanism and the page cache, cannot be applied to NVDIMM.
> What we are doing is expanding the kernel mm’s capability so that it can
> handle NVDIMM like DRAM. Furthermore, we let mm treat DRAM and NVDIMM
> separately, which means mm can put only the critical pages into the NVDIMM
> zone; for this we created a new zone type, the NVM zone. That is to say,
> traditional (or normal) pages are stored in the DRAM zones, namely
> Normal, DMA32 and DMA. The critical pages, which we hope
> can be recovered after a power failure or system crash, are made
> persistent by storing them in the NVM zone.
>
> We installed two NVDIMMs, each with 125GB of storage capacity, in a Lenovo
> Thinksystem product as the development platform. With these
> patches below, mm can create NVM zones for NVDIMMs.
>
> Here is dmesg info,
> Initmem setup node 0 [mem 0x0000000000001000-0x000000237fffffff]
> On node 0 totalpages: 36879666
> DMA zone: 64 pages used for memmap
> DMA zone: 23 pages reserved
> DMA zone: 3999 pages, LIFO batch:0
> mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096
> DMA32 zone: 10935 pages used for memmap
> DMA32 zone: 699795 pages, LIFO batch:31
> mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
> Normal zone: 53248 pages used for memmap
> Normal zone: 3407872 pages, LIFO batch:31
> mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
> NVM zone: 512000 pages used for memmap
> NVM zone: 32768000 pages, LIFO batch:31
> mminit::memmap_init Initialising map node 0 zone 3 pfns 4456448 -> 37224448
> Initmem setup node 1 [mem 0x0000002380000000-0x00000046bfffffff]
> On node 1 totalpages: 36962304
> Normal zone: 65536 pages used for memmap
> Normal zone: 4194304 pages, LIFO batch:31
> mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
> NVM zone: 512000 pages used for memmap
> NVM zone: 32768000 pages, LIFO batch:31
> mminit::memmap_init Initialising map node 1 zone 3 pfns 41418752 -> 74186752
>
> This comes from /proc/zoneinfo
> Node 0, zone NVM
> pages free 32768000
> min 15244
> low 48012
> high 80780
> spanned 32768000
> present 32768000
> managed 32768000
> protection: (0, 0, 0, 0, 0, 0)
> nr_free_pages 32768000
> Node 1, zone NVM
> pages free 32768000
> min 15244
> low 48012
> high 80780
> spanned 32768000
> present 32768000
> managed 32768000
>
> Huaisheng Ye (6):
> mm/memblock: Expand definition of flags to support NVDIMM
> mm/page_alloc.c: get pfn range with flags of memblock
> mm, zone_type: create ZONE_NVM and fill into GFP_ZONE_TABLE
> arch/x86/kernel: mark NVDIMM regions from e820_table
> mm: get zone spanned pages separately for DRAM and NVDIMM
> arch/x86/mm: create page table mapping for DRAM and NVDIMM both
>
> arch/x86/include/asm/e820/api.h | 3 +++
> arch/x86/kernel/e820.c | 20 +++++++++++++-
> arch/x86/kernel/setup.c | 8 ++++++
> arch/x86/mm/init_64.c | 16 +++++++++++
> include/linux/gfp.h | 57 ++++++++++++++++++++++++++++++++++++---
> include/linux/memblock.h | 19 +++++++++++++
> include/linux/mm.h | 4 +++
> include/linux/mmzone.h | 3 +++
> mm/Kconfig | 16 +++++++++++
> mm/memblock.c | 46 +++++++++++++++++++++++++++----
> mm/nobootmem.c | 5 ++--
> mm/page_alloc.c | 60 ++++++++++++++++++++++++++++++++++++++++-
> 12 files changed, 245 insertions(+), 12 deletions(-)
>
> --
> 1.8.3.1
>
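The page counts in the quoted dmesg output can be cross-checked with a little arithmetic. The sketch below is not part of the patch set; it assumes 4 KiB base pages and a 64-byte struct page (neither is stated in the thread, though both are common on x86_64), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the thread: 4 KiB base pages and a
 * 64-byte struct page. */
#define PAGE_SIZE_BYTES   4096ULL
#define STRUCT_PAGE_BYTES 64ULL

/* Capacity of a zone in GiB (rounded down), given its page count. */
uint64_t zone_size_gib(uint64_t pages)
{
    return (pages * PAGE_SIZE_BYTES) >> 30;
}

/* Pages needed for the memmap (one struct page per page), rounded up. */
uint64_t memmap_pages(uint64_t spanned_pages)
{
    assert(spanned_pages > 0);
    return (spanned_pages * STRUCT_PAGE_BYTES + PAGE_SIZE_BYTES - 1)
           / PAGE_SIZE_BYTES;
}
```

Under those assumptions, zone_size_gib(32768000) gives 125, matching the stated per-NVDIMM capacity, and memmap_pages(32768000) gives 512000, matching the "pages used for memmap" line for each NVM zone. The pfn range 4456448 -> 37224448 also spans exactly 32768000 pages.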
4 years, 1 month
[PATCH 1/3] nvdimm: fix typo in label-size definition
by Ross Zwisler
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Fixes: commit da6789c27c2e ("nvdimm: add a macro for property "label-size"")
Cc: Haozhong Zhang <haozhong.zhang(a)intel.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Stefan Hajnoczi <stefanha(a)redhat.com>
---
hw/mem/nvdimm.c | 2 +-
include/hw/mem/nvdimm.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index acb656b672..4087aca25e 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -89,7 +89,7 @@ static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
static void nvdimm_init(Object *obj)
{
- object_property_add(obj, NVDIMM_LABLE_SIZE_PROP, "int",
+ object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
nvdimm_get_label_size, nvdimm_set_label_size, NULL,
NULL, NULL);
object_property_add_bool(obj, NVDIMM_UNARMED_PROP,
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 7fd87c4e1c..74c60332e1 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -48,7 +48,7 @@
#define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
TYPE_NVDIMM)
-#define NVDIMM_LABLE_SIZE_PROP "label-size"
+#define NVDIMM_LABEL_SIZE_PROP "label-size"
#define NVDIMM_UNARMED_PROP "unarmed"
struct NVDIMMDevice {
--
2.14.3
4 years, 1 month
[PATCH v9 0/9] dax: fix dma vs truncate/hole-punch
by Dan Williams
Changes since v8 [1]:
* Rebase on v4.17-rc2
* Fix get_user_pages_fast() for ZONE_DEVICE pages to revalidate the pte,
pmd, pud after taking references (Jan)
* Kill dax_layout_lock(). With get_user_pages_fast() for ZONE_DEVICE
fixed we can then rely on the {pte,pmd}_lock to synchronize
dax_layout_busy_page() vs new page references (Jan)
* Hold the iolock over repeated invocations of dax_layout_busy_page() to
enable truncate/hole-punch to make forward progress in the presence of
a constant stream of new direct-I/O requests (Jan).
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-March/015058.html
---
Background:
get_user_pages() in the filesystem pins file backed memory pages for
access by devices performing dma. However, it only pins the memory pages
not the page-to-file offset association. If a file is truncated the
pages are mapped out of the file and dma may continue indefinitely into
a page that is owned by a device driver. This breaks coherency of the
file vs dma, but the assumption is that if userspace wants the
file-space truncated it does not matter what data is inbound from the
device, it is not relevant anymore. The only expectation is that dma can
safely continue while the filesystem reallocates the block(s).
Problem:
This expectation that dma can safely continue while the filesystem
changes the block map is broken by dax. With dax the target dma page
*is* the filesystem block. The model of leaving the page pinned for dma,
but truncating the file block out of the file, means that the filesystem
is free to reallocate a block under active dma to another file and now
the expected data-incoherency situation has turned into active
data-corruption.
Solution:
Defer all filesystem operations (fallocate(), truncate()) on a dax mode
file while any page/block in the file is under active dma. This solution
assumes that dma is transient. Cases where dma operations are known to
not be transient, like RDMA, have been explicitly disabled via
commits like 5f1d43de5416 "IB/core: disable memory registration of
filesystem-dax vmas".
The dax_layout_busy_page() routine is called by filesystems with a lock
held against mm faults (i_mmap_lock) to find pinned / busy dax pages.
The process of looking up a busy page invalidates all mappings
to trigger any subsequent get_user_pages() to block on i_mmap_lock.
The filesystem continues to call dax_layout_busy_page() until it finally
returns no more active pages. This approach assumes that the page
pinning is transient, if that assumption is violated the system would
have likely hung from the uncompleted I/O.
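The retry loop described above can be pictured with a toy userspace model. This is only a sketch of the control flow, not the kernel implementation: the toy_* names and the pin-count array are hypothetical, and locking, mapping invalidation and real waiting are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: each entry is the number of
 * outstanding DMA pins on one page of a dax file. */
struct toy_file {
    int *pins;
    size_t npages;
};

/* Model of the "find a busy page" step: index of the first pinned
 * page, or -1 when no page is pinned. */
long toy_busy_page(const struct toy_file *f)
{
    for (size_t i = 0; i < f->npages; i++)
        if (f->pins[i] > 0)
            return (long)i;
    return -1;
}

/* Stand-in for waiting on DMA completion: one pin is dropped per wait. */
void toy_wait_for_dma(struct toy_file *f, long idx)
{
    assert(idx >= 0 && (size_t)idx < f->npages);
    f->pins[idx]--;
}

/* Keep asking for a busy page until none is left; only then may the
 * truncate proceed. Returns the number of waits performed. */
int toy_break_layouts(struct toy_file *f)
{
    int waits = 0;
    long idx;

    while ((idx = toy_busy_page(f)) >= 0) {
        toy_wait_for_dma(f, idx);
        waits++;
    }
    return waits;
}
```

The loop terminates only because every pin is eventually dropped, which mirrors the cover letter's assumption that dma is transient. A pin that never goes away would spin this model forever.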
---
Dan Williams (9):
dax, dm: introduce ->fs_{claim,release}() dax_device infrastructure
mm, dax: enable filesystems to trigger dev_pagemap ->page_free callbacks
memremap: split devm_memremap_pages() and memremap() infrastructure
mm, dev_pagemap: introduce CONFIG_DEV_PAGEMAP_OPS
mm: fix __gup_device_huge vs unmap
mm, fs, dax: handle layout changes to pinned dax mappings
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
xfs: prepare xfs_break_layouts() for another layout type
xfs, dax: introduce xfs_break_dax_layouts()
drivers/dax/super.c | 99 ++++++++++++++++++++--
drivers/md/dm.c | 57 +++++++++++++
drivers/nvdimm/pmem.c | 3 -
fs/Kconfig | 2
fs/dax.c | 97 +++++++++++++++++++++
fs/ext2/super.c | 6 +
fs/ext4/super.c | 6 +
fs/xfs/xfs_file.c | 72 +++++++++++++++-
fs/xfs/xfs_inode.h | 16 ++++
fs/xfs/xfs_ioctl.c | 8 --
fs/xfs/xfs_iops.c | 16 ++--
fs/xfs/xfs_pnfs.c | 16 ++--
fs/xfs/xfs_pnfs.h | 6 +
fs/xfs/xfs_super.c | 20 ++--
include/linux/dax.h | 71 +++++++++++++++-
include/linux/memremap.h | 25 ++----
include/linux/mm.h | 71 ++++++++++++----
kernel/Makefile | 3 -
kernel/iomem.c | 167 +++++++++++++++++++++++++++++++++++++
kernel/memremap.c | 208 ++++++----------------------------------------
mm/Kconfig | 5 +
mm/gup.c | 37 ++++++--
mm/hmm.c | 13 ---
mm/swap.c | 3 -
24 files changed, 730 insertions(+), 297 deletions(-)
create mode 100644 kernel/iomem.c
4 years, 1 month
use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
by Dan Williams
Ingo, Thomas, Al, any concerns with this series?
On Thu, May 3, 2018 at 5:06 PM, Dan Williams <dan.j.williams(a)intel.com> wrote:
> Changes since v2 [1]:
>
> * Fix source address increment in mcsafe_handle_tail() (Mika)
>
> * Extend the unit test to inject simulated write faults and validate
> that data is properly transferred.
>
> * Rename MCSAFE_DEBUG to MCSAFE_TEST
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015583.html
>
> ---
>
> Currently memcpy_mcsafe() is only deployed in the pmem driver when
> reading through a /dev/pmemX block device. However, a filesystem in dax
> mode mounted on a /dev/pmemX block device will bypass the block layer
> and the driver for reads. The filesystem-dax (fsdax) read case uses
> dax_direct_access() and copy_to_iter() to bypass the block layer.
>
> The result of the bypass is that the kernel treats machine checks during
> read as system fatal (reboot) when they could simply be flagged as an
> I/O error, similar to performing reads through the pmem driver. Prevent
> this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> path.
>
> The main differences between this copy_to_user_mcsafe() and
> copy_user_generic_unrolled() are:
>
> * Typical tail/residue handling after a fault retries the copy
> byte-by-byte until the fault happens again. Re-triggering machine
> checks is potentially fatal so the implementation uses source alignment
> and poison alignment assumptions to avoid re-triggering machine
> checks.
>
> * SMAP coordination is handled external to the assembly with
> __uaccess_begin() and __uaccess_end().
>
> * ITER_KVEC and ITER_BVEC can now end prematurely with an error.
>
> The new MCSAFE_TEST facility is proposed as a way to unit test the
> exception handling without requiring an ACPI EINJ capable platform.
>
> ---
>
> Dan Williams (9):
> x86, memcpy_mcsafe: remove loop unrolling
> x86, memcpy_mcsafe: add labels for write fault handling
> x86, memcpy_mcsafe: return bytes remaining
> x86, memcpy_mcsafe: add write-protection-fault handling
> x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
> dax: introduce a ->copy_to_iter dax operation
> dax: report bytes remaining in dax_iomap_actor()
> pmem: switch to copy_to_iter_mcsafe()
> x86, nfit_test: unit test for memcpy_mcsafe()
>
>
> arch/x86/Kconfig | 1
> arch/x86/Kconfig.debug | 3 +
> arch/x86/include/asm/mcsafe_test.h | 75 ++++++++++++++++++++++++
> arch/x86/include/asm/string_64.h | 10 ++-
> arch/x86/include/asm/uaccess_64.h | 14 +++++
> arch/x86/lib/memcpy_64.S | 112 +++++++++++++++++-------------------
> arch/x86/lib/usercopy_64.c | 21 +++++++
> drivers/dax/super.c | 10 +++
> drivers/md/dm-linear.c | 16 +++++
> drivers/md/dm-log-writes.c | 15 +++++
> drivers/md/dm-stripe.c | 21 +++++++
> drivers/md/dm.c | 25 ++++++++
> drivers/nvdimm/claim.c | 3 +
> drivers/nvdimm/pmem.c | 13 +++-
> drivers/s390/block/dcssblk.c | 7 ++
> fs/dax.c | 21 ++++---
> include/linux/dax.h | 5 ++
> include/linux/device-mapper.h | 5 +-
> include/linux/string.h | 4 +
> include/linux/uio.h | 15 +++++
> lib/iov_iter.c | 61 ++++++++++++++++++++
> tools/testing/nvdimm/test/nfit.c | 104 +++++++++++++++++++++++++++++++++
> 22 files changed, 482 insertions(+), 79 deletions(-)
> create mode 100644 arch/x86/include/asm/mcsafe_test.h
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm(a)lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
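The "return bytes remaining" contract mentioned in the quoted series can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the x86 assembly from the patches: the poisoned location is passed in up front (real hardware reports it via a machine check), poison is assumed to cover one whole 64-byte line, the source is assumed to start line-aligned, and toy_copy_mcsafe is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LINE 64  /* assumed poison granularity: one cache line */

/* Toy model: copy up to the poisoned line and report how many bytes
 * were NOT copied. poison_line is the index of the poisoned 64-byte
 * line in src, or (size_t)-1 if there is none. */
size_t toy_copy_mcsafe(void *dst, const void *src, size_t len,
                       size_t poison_line)
{
    size_t safe = len;

    assert(dst && src);
    if (poison_line != (size_t)-1) {
        size_t poison_off = poison_line * TOY_LINE;

        /* Stop at the line boundary; the poisoned line is never read,
         * so there is no byte-by-byte retry into the bad region. */
        if (poison_off < len)
            safe = poison_off;
    }
    memcpy(dst, src, safe);
    return len - safe;  /* 0 means the whole copy succeeded */
}
```

A caller in this model treats a non-zero return as a short read and can surface it as an I/O error, which is the behavior the cover letter contrasts with a fatal machine check.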
4 years, 1 month
[PATCH v6 0/4] ndctl: convert actions to use util_filter_walk
by Dave Jiang
util_filter_walk() does the looping through bus/dimm/region/namespace
that a lot of the operations in ndctl use. Convert them to the common
code to reduce maintenance of individual versions of the same code.
In this series we are converting namespace, region, and dimm actions.
---
v6:
- removed unintended changes (Vishal)
- added common function filter_bus_passhthrough() (Vishal, Dan)
v5:
- fix behavior regression in filter_namespace (Dan)
- fix segfault caused by no namespace for create_namespace actions.
v4:
- change struct names to be less confusing. (Dan)
v3:
- fixed some corner cases in namespace patch.
- changed param renaming to reduce change for util_filter_params. (Dan)
- Adding conversion to region
- Adding conversion to dimm
v2:
- split out the conversion of util_filter_params to make things more
readable (Dan).
- Not pass in mode as util_filter_params and put back the mode check in
util_filter_walk() (Dan).
Dave Jiang (4):
ndctl: convert namespace actions to use util_filter_params
ndctl: convert namespace actions to use util_filter_walk()
ndctl: convert region actions to use util_filter_walk()
ndctl: convert dimm actions to use util_filter_walk()
ndctl/dimm.c | 78 +++++++++++++---------
ndctl/namespace.c | 189 +++++++++++++++++++++++++++++------------------------
ndctl/region.c | 54 +++++++++------
util/filter.c | 11 +++
util/filter.h | 25 +++++++
5 files changed, 212 insertions(+), 145 deletions(-)
--
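The general shape of a shared filter walk with per-action callbacks might look like the sketch below. It is purely illustrative: the toy_* types, the single region filter and the callback signature are hypothetical and are not ndctl's util_filter_walk() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; these are not ndctl's structures. */
struct toy_namespace { const char *name; };
struct toy_region { const char *name; struct toy_namespace *ns; size_t nns; };
struct toy_bus { const char *name; struct toy_region *regions; size_t nregions; };

struct toy_filter_ctx {
    const char *region_filter;  /* NULL matches every region */
    /* Per-action callback: return 0 to continue, non-zero to stop. */
    int (*on_namespace)(struct toy_namespace *ns, void *data);
    void *data;
};

/* One shared loop over bus/region/namespace; each action only supplies
 * a callback instead of re-implementing the nesting. */
int toy_filter_walk(struct toy_bus *buses, size_t nbuses,
                    struct toy_filter_ctx *ctx)
{
    assert(ctx && ctx->on_namespace);
    for (size_t b = 0; b < nbuses; b++) {
        for (size_t r = 0; r < buses[b].nregions; r++) {
            struct toy_region *region = &buses[b].regions[r];

            if (ctx->region_filter &&
                strcmp(ctx->region_filter, region->name) != 0)
                continue;
            for (size_t n = 0; n < region->nns; n++) {
                int rc = ctx->on_namespace(&region->ns[n], ctx->data);

                if (rc)
                    return rc;
            }
        }
    }
    return 0;
}

/* Example action: count the namespaces that pass the filter. */
int toy_count_ns(struct toy_namespace *ns, void *data)
{
    (void)ns;
    (*(int *)data)++;
    return 0;
}
```

With this shape, a new action only needs its own callback and context data, which is the kind of maintenance saving the cover letter describes.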
4 years, 1 month
[PATCH v5 0/4] ndctl: convert actions to use util_filter_walk
by Dave Jiang
util_filter_walk() does the looping through bus/dimm/region/namespace
that a lot of the operations in ndctl use. Convert them to the common
code to reduce maintenance of individual versions of the same code.
In this series we are converting namespace, region, and dimm actions.
---
v5:
- fix behavior regression in filter_namespace (Dan)
- fix segfault caused by no namespace for create_namespace actions.
v4:
- change struct names to be less confusing. (Dan)
v3:
- fixed some corner cases in namespace patch.
- changed param renaming to reduce change for util_filter_params. (Dan)
- Adding conversion to region
- Adding conversion to dimm
v2:
- split out the conversion of util_filter_params to make things more
readable (Dan).
- Not pass in mode as util_filter_params and put back the mode check in
util_filter_walk() (Dan).
Dave Jiang (4):
ndctl: convert namespace actions to use util_filter_params
ndctl: convert namespace actions to use util_filter_walk()
ndctl: convert region actions to use util_filter_walk()
ndctl: convert dimm actions to use util_filter_walk()
ndctl/dimm.c | 83 ++++++++++++++---------
ndctl/namespace.c | 194 +++++++++++++++++++++++++++++------------------------
ndctl/region.c | 59 ++++++++++------
test/btt-check.sh | 2 -
util/filter.c | 5 +
util/filter.h | 23 ++++++
6 files changed, 220 insertions(+), 146 deletions(-)
--
4 years, 1 month