[PATCH] libnvdimm, pmem: fix a possible OOB access when read and write pmem
by Li RongQing
If offset is not zero and length is bigger than PAGE_SIZE,
this will cause to out of boundary access to a page memory
Fixes: 98cc093cba1e "(block, THP: make block_device_operations.rw_page support THP)"
Co-developed-by: Liang ZhiCheng <liangzhicheng(a)baidu.com>
Signed-off-by: Liang ZhiCheng <liangzhicheng(a)baidu.com>
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
---
drivers/nvdimm/pmem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..0279eb1da3ef 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,13 +113,13 @@ static void write_pmem(void *pmem_addr, struct page *page,
while (len) {
mem = kmap_atomic(page);
- chunk = min_t(unsigned int, len, PAGE_SIZE);
+ chunk = min_t(unsigned int, len, PAGE_SIZE - off);
memcpy_flushcache(pmem_addr, mem + off, chunk);
kunmap_atomic(mem);
len -= chunk;
off = 0;
page++;
- pmem_addr += PAGE_SIZE;
+ pmem_addr += chunk;
}
}
@@ -132,7 +132,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
while (len) {
mem = kmap_atomic(page);
- chunk = min_t(unsigned int, len, PAGE_SIZE);
+ chunk = min_t(unsigned int, len, PAGE_SIZE - off);
rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
kunmap_atomic(mem);
if (rem)
@@ -140,7 +140,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
len -= chunk;
off = 0;
page++;
- pmem_addr += PAGE_SIZE;
+ pmem_addr += chunk;
}
return BLK_STS_OK;
}
--
2.16.2
3 years, 4 months
[PATCH v3 0/3] kvm: Use huge pages for DAX-backed files
by Barret Rhoden
This patch series depends on DAX pages not being PageReserved. Once
that is in place, these changes will let KVM use huge pages with
DAX-backed files.
>From previous discussions[1], it sounds like DAX might not need to keep
the PageReserved bit, but that it hadn't been sorted out yet.
Without the PageReserved change, KVM and DAX still work with these
patches, simply without huge pages - which is the current situation.
If you want to test the huge-page functionality as if DAX pages weren't
PageReserved for KVM, this hack does the trick:
------------------
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c44985375e7f..ee539eec1fb8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -153,6 +153,10 @@ __weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
+ // XXX hack
+ if (is_zone_device_page(pfn_to_page(pfn)))
+ return false;
+
if (pfn_valid(pfn))
return PageReserved(pfn_to_page(pfn));
------------------
Perhaps if we are going to leave DAX pages marked PageReserved, then I
can make that hack into a proper commit and have KVM alone treat DAX
pages as if they are not reserved.
v2 -> v3:
v2: https://lore.kernel.org/lkml/20181114215155.259978-1-brho@google.com/
- Updated Acks/Reviewed-by
- Rebased onto linux-next
v1 -> v2:
https://lore.kernel.org/lkml/20181109203921.178363-1-brho@google.com/
- Updated Acks/Reviewed-by
- Minor touchups
- Added patch to remove redundant PageReserved() check
- Rebased onto linux-next
RFC/discussion thread:
https://lore.kernel.org/lkml/20181029210716.212159-1-brho@google.com/
[1] https://lore.kernel.org/lkml/ee8cc068-903c-d87e-f418-ade46786249e@redhat....
Barret Rhoden (3):
mm: make dev_pagemap_mapping_shift() externally visible
kvm: Use huge pages for DAX-backed files
kvm: remove redundant PageReserved() check
arch/x86/kvm/mmu.c | 33 +++++++++++++++++++++++++++++++--
include/linux/mm.h | 3 +++
mm/memory-failure.c | 38 +++-----------------------------------
mm/util.c | 34 ++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 8 ++------
5 files changed, 73 insertions(+), 43 deletions(-)
--
2.21.0.392.gf8f6787159e-goog
3 years, 4 months
[PATCH v4 0/5] virtio pmem driver
by Pankaj Gupta
This patch series has implementation for "virtio pmem".
"virtio pmem" is fake persistent memory(nvdimm) in guest
which allows to bypass the guest page cache. This also
implements a VIRTIO based asynchronous flush mechanism.
Sharing guest kernel driver in this patchset with the
changes suggested in v3. Tested with Qemu side device
emulation [6] for virtio-pmem.
We have incorporated all the suggestions in V3. Documented
the impact of possible page cache side channel attacks with
suggested countermeasures.
Details of project idea for 'virtio pmem' flushing interface
is shared [3] & [4].
Implementation is divided into two parts:
New virtio pmem guest driver and qemu code changes for new
virtio pmem paravirtualized device.
1. Guest virtio-pmem kernel driver
---------------------------------
- Reads persistent memory range from paravirt device and
registers with 'nvdimm_bus'.
- 'nvdimm/pmem' driver uses this information to allocate
persistent memory region and setup filesystem operations
to the allocated memory.
- virtio pmem driver implements asynchronous flushing
interface to flush from guest to host.
2. Qemu virtio-pmem device
---------------------------------
- Creates virtio pmem device and exposes a memory range to
KVM guest.
- At host side this is file backed memory which acts as
persistent memory.
- Qemu side flush uses aio thread pool API's and virtio
for asynchronous guest multi request handling.
David Hildenbrand CCed also posted a modified version[7] of
qemu virtio-pmem code based on updated Qemu memory device API.
Virtio-pmem security implications and countermeasures:
-----------------------------------------------------
In previous posting of kernel driver, there was discussion [9]
on possible implications of page cache side channel attacks with
virtio pmem. After thorough analysis of details of known side
channel attacks, below are the suggestions:
- Depends entirely on how host backing image file is mapped
into guest address space.
- virtio-pmem device emulation, by default shared mapping is used
to map host backing file. It is recommended to use separate
backing file at host side for every guest. This will prevent
any possibility of executing common code from multiple guests
and any chance of inferring guest local data based based on
execution time.
- If backing file is required to be shared among multiple guests
it is recommended to don't support host page cache eviction
commands from the guest driver. This will avoid any possibility
of inferring guest local data or host data from another guest.
- Proposed device specification [8] for virtio-pmem device with
details of possible security implications and suggested
countermeasures for device emulation.
Virtio-pmem errors handling:
----------------------------------------
Checked behaviour of virtio-pmem for below types of errors
Need suggestions on expected behaviour for handling these errors?
- Hardware Errors: Uncorrectable recoverable Errors:
a] virtio-pmem:
- As per current logic if error page belongs to Qemu process,
host MCE handler isolates(hwpoison) that page and send SIGBUS.
Qemu SIGBUS handler injects exception to KVM guest.
- KVM guest then isolates the page and send SIGBUS to guest
userspace process which has mapped the page.
b] Existing implementation for ACPI pmem driver:
- Handles such errors with MCE notifier and creates a list
of bad blocks. Read/direct access DAX operation return EIO
if accessed memory page fall in bad block list.
- It also starts backgound scrubbing.
- Similar functionality can be reused in virtio-pmem with MCE
notifier but without scrubbing(no ACPI/ARS)? Need inputs to
confirm if this behaviour is ok or needs any change?
Changes from PATCH v3: [1]
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC
flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications &
countermeasures - [Dave Chinner, Michael]
Changes from PATCH v2: [2]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
- Use name 'virtio pmem' in place of 'fake dax'
Changes from PATCH v1:
- 0-day build test for build dependency on libnvdimm
Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text
- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow
Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c
Changes from RFC v2:
- Add flush function in the nd_region in place of switching
on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan
Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent
memory and other operations instead of creating an entirely
new block driver.
- Use VIRTIO driver to register memory information with
nvdimm_bus and create region_type accordingly.
- Call VIRTIO flush from existing pmem driver.
Pankaj Gupta (5):
libnvdimm: nd_region flush callback support
virtio-pmem: Add virtio-pmem guest driver
libnvdimm: add nd_region buffered dax_dev flag
ext4: disable map_sync for virtio pmem
xfs: disable map_sync for virtio pmem
[1] https://lkml.org/lkml/2019/1/9/471
[2] https://lkml.org/lkml/2018/10/13/117
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel&m=153572228719237&w=2
[7] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
3 years, 4 months
[PATCH 5.0 057/246] mm/resource: Return real error codes from walk failures
by Greg Kroah-Hartman
5.0-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 5cd401ace914dc68556c6d2fcae0c349444d5f86 ]
walk_system_ram_range() can return an error code either becuase
*it* failed, or because the 'func' that it calls returned an
error. The memory hotplug does the following:
ret = walk_system_ram_range(..., func);
if (ret)
return ret;
and 'ret' makes it out to userspace, eventually. The problem
s, walk_system_ram_range() failues that result from *it* failing
(as opposed to 'func') return -1. That leads to a very odd
-EPERM (-1) return code out to userspace.
Make walk_system_ram_range() return -EINVAL for internal
failures to keep userspace less confused.
This return code is compatible with all the callers that I
audited.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Bjorn Helgaas <bhelgaas(a)google.com>
Acked-by: Michael Ellerman <mpe(a)ellerman.id.au> (powerpc)
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ross Zwisler <zwisler(a)kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Fengguang Wu <fengguang.wu(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Yaowei Bai <baiyaowei(a)cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: Keith Busch <keith.busch(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/resource.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 915c02e8e5dd..ca7ed5158cff 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -382,7 +382,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
int (*func)(struct resource *, void *))
{
struct resource res;
- int ret = -1;
+ int ret = -EINVAL;
while (start < end &&
!find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
@@ -462,7 +462,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
unsigned long flags;
struct resource res;
unsigned long pfn, end_pfn;
- int ret = -1;
+ int ret = -EINVAL;
start = (u64) start_pfn << PAGE_SHIFT;
end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
--
2.19.1
3 years, 4 months
[ndctl PATCH] monitor: removing the requirement of default config
by QI Fuli
Removing the requirement of default configuration file.
If "-c, --config-file" option is not specified, default configuration
file should be dispensable.
Link: https://github.com/pmem/ndctl/issues/83
Signed-off-by: QI Fuli <qi.fuli(a)jp.fujitsu.com>
---
ndctl/monitor.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 346a6df..6829a6b 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -484,8 +484,11 @@ static int read_config_file(struct ndctl_ctx *ctx, struct monitor *_monitor,
f = fopen(config_file, "r");
if (!f) {
- err(&monitor, "config-file: %s cannot be opened\n", config_file);
- rc = -errno;
+ if (_monitor->config_file) {
+ err(&monitor, "config-file: %s cannot be opened\n",
+ config_file);
+ rc = -errno;
+ }
goto out;
}
--
2.21.0.rc1
3 years, 4 months
[PATCH] mm: Fix modifying of page protection by insert_pfn_pmd()
by Aneesh Kumar K.V
With some architectures like ppc64, set_pmd_at() cannot cope with
a situation where there is already some (different) valid entry present.
Use pmdp_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PMD entries.
This is similar to
commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()")
We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support
pud pfn entries now.
CC: stable(a)vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
---
mm/huge_memory.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdcd0455..f7dca413c4b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -755,6 +755,20 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
spinlock_t *ptl;
ptl = pmd_lock(mm, pmd);
+ if (!pmd_none(*pmd)) {
+ if (write) {
+ if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
+ goto out_unlock;
+ }
+ entry = pmd_mkyoung(*pmd);
+ entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
+ update_mmu_cache_pmd(vma, addr, pmd);
+ }
+ goto out_unlock;
+ }
+
entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pmd_mkdevmap(entry);
@@ -770,6 +784,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
set_pmd_at(mm, addr, pmd, entry);
update_mmu_cache_pmd(vma, addr, pmd);
+out_unlock:
spin_unlock(ptl);
}
@@ -821,6 +836,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
spinlock_t *ptl;
ptl = pud_lock(mm, pud);
+ if (!pud_none(*pud)) {
+ if (write) {
+ if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pud(*pud));
+ goto out_unlock;
+ }
+ entry = pud_mkyoung(*pud);
+ entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+ if (pudp_set_access_flags(vma, addr, pud, entry, 1))
+ update_mmu_cache_pud(vma, addr, pud);
+ }
+ goto out_unlock;
+ }
+
entry = pud_mkhuge(pfn_t_pud(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pud_mkdevmap(entry);
@@ -830,6 +859,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
}
set_pud_at(mm, addr, pud, entry);
update_mmu_cache_pud(vma, addr, pud);
+
+out_unlock:
spin_unlock(ptl);
}
--
2.20.1
3 years, 4 months