[PATCH] acpi/nfit: Use kobj_to_dev() instead
by Wang Qing
Use kobj_to_dev() instead of container_of()
Signed-off-by: Wang Qing <wangqing(a)vivo.com>
---
drivers/acpi/nfit/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index fa4500f..3bb350b
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1382,7 +1382,7 @@ static bool ars_supported(struct nvdimm_bus *nvdimm_bus)
static umode_t nfit_visible(struct kobject *kobj, struct attribute *a, int n)
{
- struct device *dev = container_of(kobj, struct device, kobj);
+ struct device *dev = kobj_to_dev(kobj);
struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
if (a == &dev_attr_scrub.attr && !ars_supported(nvdimm_bus))
@@ -1667,7 +1667,7 @@ static struct attribute *acpi_nfit_dimm_attributes[] = {
static umode_t acpi_nfit_dimm_attr_visible(struct kobject *kobj,
struct attribute *a, int n)
{
- struct device *dev = container_of(kobj, struct device, kobj);
+ struct device *dev = kobj_to_dev(kobj);
struct nvdimm *nvdimm = to_nvdimm(dev);
struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
--
2.7.4
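
For readers unfamiliar with the helper, the following is a minimal user-space sketch of what the conversion amounts to. The `mock_` types and macro are hypothetical stand-ins, not the kernel definitions; the sketch only assumes that kobj_to_dev() is a thin wrapper around the same container_of() arithmetic, so the patch changes readability rather than behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kobject and struct device. */
struct mock_kobject {
	const char *name;
};

struct mock_device {
	int id;
	struct mock_kobject kobj;	/* embedded, as in the kernel's struct device */
};

/* Same pointer arithmetic as container_of(), without the kernel's type checks. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Names the intent: "give me the device that embeds this kobject". */
static inline struct mock_device *mock_kobj_to_dev(struct mock_kobject *kobj)
{
	assert(kobj != NULL);
	return mock_container_of(kobj, struct mock_device, kobj);
}
```

Passing `&dev.kobj` to `mock_kobj_to_dev()` yields `&dev` again, which is exactly what the open-coded container_of() call did.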
1 year, 9 months
[PATCH v4 0/6] mm: introduce memfd_secret system call to create "secret" memory areas
by Mike Rapoport
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an implementation of "secret" mappings backed by a file descriptor.
v4 changes:
* rebase on v5.9-rc1
* Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
* Make secret mappings exclusive by default and only require flags to
memfd_secret() system call for uncached mappings, thanks again Kirill :)
v3 changes:
* Squash kernel-parameters.txt update into the commit that added the
command line option.
* Make uncached mode explicitly selectable by architectures. For now enable
it only on x86.
v2 changes:
* Follow Michael's suggestion and name the new system call 'memfd_secret'
* Add kernel-parameters documentation about the boot option
* Fix i386-tinyconfig regression reported by the kbuild bot.
CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
while still making it available unconditionally on architectures that
support SET_DIRECT_MAP.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will have the desired protection bits set in the user page
table. For instance, the current implementation allows uncached mappings.
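
A rough userspace sketch of the flow described above (create the descriptor, size it, mmap() it) might look like the following. It is only an illustration under assumptions: it relies on the libc headers defining SYS_memfd_secret, which is not the case on systems that predate the call, and it passes 0 for flags to ask for the default mode. The helper name secret_roundtrip() is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to create a secret mapping, copy msg into it and read it back.
 * Returns 0 on success or a negative errno value on failure
 * (-ENOSYS when the headers do not know the system call).
 */
static int secret_roundtrip(const char *msg)
{
	assert(msg != NULL);
#ifdef SYS_memfd_secret
	long page = sysconf(_SC_PAGESIZE);
	size_t len = page > 0 ? (size_t)page : 4096;
	size_t msg_len = strlen(msg) + 1;
	int fd, err;
	char *p;

	if (msg_len > len)
		return -EINVAL;

	/* flags == 0: default protection mode (assumption for this sketch) */
	fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return -errno;

	/* The new descriptor is empty; give it one page. */
	if (ftruncate(fd, (off_t)len) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	/* mmap() of the descriptor is what creates the "secret" mapping. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		close(fd);
		return -err;
	}

	memcpy(p, msg, msg_len);
	err = memcmp(p, msg, msg_len) ? -EIO : 0;

	munmap(p, len);
	close(fd);
	return err;
#else
	return -ENOSYS;
#endif
}
```

A caller would treat any negative return (for example -ENOSYS on a kernel without the feature) as "fall back to ordinary memory".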
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
Additionally, the secret mappings may be used as a means to protect guest
memory in a virtual machine host.
To demonstrate secret memory usage we've created a userspace library [1]
that does two things. First, it acts as a preloader for openssl that
redirects all the OPENSSL_malloc calls to secret memory, so any secret keys
are automatically protected this way. Second, it exposes the API to users
who need it. We anticipate that a lot of the use cases would be like the
openssl one: many toolkits that deal with secret keys already have special
handling for the memory to try to give them greater protection, so this
would simply be pluggable into the toolkits without any need for user
application modification.
I hesitated over whether to continue using new flags to memfd_create() or to
add a new system call, and decided on a new system call once I started
looking into the man pages update. There would have been two completely
independent descriptions and I think it would have been very confusing.
Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may also be used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
As the fragmentation of the direct map was one of the major concerns raised
during the previous postings, I've added an amortizing cache of PMD-size
pages to each file descriptor and an ability to reserve large chunks of the
physical memory at boot time and then use this memory as an allocation pool
for the secret memory areas.
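
The amortizing idea can be illustrated with a toy model. This is not the code from mm/secretmem.c; the `MOCK_` constants assume 4 KiB base pages and 2 MiB PMD mappings, and the pool structure is invented for the example. The point is only that the direct map is split once per PMD-size chunk rather than once per page.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SHIFT		12	/* 4 KiB base pages (assumed) */
#define MOCK_PMD_SHIFT		21	/* 2 MiB PMD mappings (assumed) */
#define MOCK_PMD_PAGE_ORDER	(MOCK_PMD_SHIFT - MOCK_PAGE_SHIFT)
#define MOCK_PAGES_PER_PMD	(1UL << MOCK_PMD_PAGE_ORDER)

/* Hypothetical per-file-descriptor cache of base pages. */
struct mock_secret_pool {
	unsigned long free_pages;	/* base pages left in the current chunk */
	unsigned long chunks_taken;	/* PMD-size chunks taken out of the direct map */
};

/* Hand out one base page, taking a new PMD-size chunk only when the cache is empty. */
static void mock_pool_alloc_page(struct mock_secret_pool *pool)
{
	assert(pool != NULL);
	if (pool->free_pages == 0) {
		pool->chunks_taken++;
		pool->free_pages = MOCK_PAGES_PER_PMD;
	}
	pool->free_pages--;
}
```

With these assumed sizes, 1024 single-page allocations touch the direct map twice instead of 1024 times.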
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org/
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
Mike Rapoport (6):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
mm: introduce memfd_secret system call to create "secret" memory areas
arch, mm: wire up memfd_secret system call were relevant
mm: secretmem: use PMD-size pages to amortize direct map fragmentation
mm: secretmem: add ability to reserve memory at boot
arch/Kconfig | 7 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/riscv/include/asm/unistd.h | 1 +
arch/x86/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
include/uapi/linux/secretmem.h | 8 +
kernel/sys_ni.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/internal.h | 3 +
mm/mmap.c | 5 +-
mm/secretmem.c | 451 +++++++++++++++++++++++++
20 files changed, 501 insertions(+), 12 deletions(-)
create mode 100644 include/uapi/linux/secretmem.h
create mode 100644 mm/secretmem.c
--
2.26.2
1 year, 9 months
[PATCH 1/3] ndctl/namespace: Skip seed namespaces when processing all namespaces.
by Michal Suchanek
The seed namespaces are exposed by the kernel but most operations are
not valid on seed namespaces.
When processing all namespaces the user gets confusing errors from ndctl
trying to process seed namespaces. The kernel does not provide any way
to tell that a namespace is a seed namespace, but skipping namespaces with
zero size and zero UUID is a good heuristic.
The user can still specify the namespace by name directly in case
processing it is desirable.
Fixes: #41
Link: https://patchwork.kernel.org/patch/11473645/
Reviewed-by: Santosh S <santosh(a)fossix.org>
Tested-by: Harish Sriram <harish(a)linux.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek(a)suse.de>
---
ndctl/namespace.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/ndctl/namespace.c b/ndctl/namespace.c
index e734248c9752..3fabe4799d75 100644
--- a/ndctl/namespace.c
+++ b/ndctl/namespace.c
@@ -2171,9 +2171,19 @@ static int do_xaction_namespace(const char *namespace,
ndctl_namespace_foreach_safe(region, ndns, _n) {
ndns_name = ndctl_namespace_get_devname(ndns);
- if (strcmp(namespace, "all") != 0
- && strcmp(namespace, ndns_name) != 0)
- continue;
+ if (strcmp(namespace, "all") == 0) {
+ static const uuid_t zero_uuid;
+ uuid_t uuid;
+
+ ndctl_namespace_get_uuid(ndns, uuid);
+ if (!ndctl_namespace_get_size(ndns) &&
+ !memcmp(uuid, zero_uuid, sizeof(uuid_t)))
+ continue;
+ } else {
+ if (strcmp(namespace, ndns_name) != 0)
+ continue;
+ }
+
switch (action) {
case ACTION_DISABLE:
rc = ndctl_namespace_disable_safe(ndns);
--
2.28.0
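
The selection rule introduced by the hunk above can be exercised in isolation with a small stand-alone sketch. The `mock_` types and helpers below are hypothetical and do not use libndctl; they only mirror the logic of the patch: "all" skips namespaces with zero size and an all-zero UUID, while an explicit name still matches them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned char mock_uuid_t[16];

/* Hypothetical stand-in for the fields the patch reads from a namespace. */
struct mock_namespace {
	const char *devname;
	unsigned long long size;
	mock_uuid_t uuid;
};

/* Heuristic from the patch: zero size and all-zero UUID means "seed". */
static bool mock_is_seed(const struct mock_namespace *ns)
{
	static const mock_uuid_t zero_uuid;	/* zero-initialized */

	assert(ns != NULL);
	return ns->size == 0 &&
	       memcmp(ns->uuid, zero_uuid, sizeof(zero_uuid)) == 0;
}

/* "all" skips seed namespaces; any other selector must match the name exactly. */
static bool mock_should_process(const char *selector,
				const struct mock_namespace *ns)
{
	if (strcmp(selector, "all") == 0)
		return !mock_is_seed(ns);
	return strcmp(selector, ns->devname) == 0;
}
```

Note that a namespace with zero size but a non-zero UUID is still processed under "all", since both conditions must hold for it to be treated as a seed.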
1 year, 10 months
[PATCH v4 1/2] memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
by Roger Pau Monne
This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also
being used by non DAX devices.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau(a)citrix.com>
---
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: xen-devel(a)lists.xenproject.org
Cc: linux-mm(a)kvack.org
---
drivers/dax/device.c | 2 +-
include/linux/memremap.h | 9 ++++-----
mm/memremap.c | 2 +-
3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 4c0af2eb7e19..1e89513f3c59 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -429,7 +429,7 @@ int dev_dax_probe(struct device *dev)
return -EBUSY;
}
- dev_dax->pgmap.type = MEMORY_DEVICE_DEVDAX;
+ dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC;
addr = devm_memremap_pages(dev, &dev_dax->pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 5f5b2df06e61..e5862746751b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -46,11 +46,10 @@ struct vmem_altmap {
* wakeup is used to coordinate physical address space management (ex:
* fs truncate/hole punch) vs pinned pages (ex: device dma).
*
- * MEMORY_DEVICE_DEVDAX:
+ * MEMORY_DEVICE_GENERIC:
* Host memory that has similar access semantics as System RAM i.e. DMA
- * coherent and supports page pinning. In contrast to
- * MEMORY_DEVICE_FS_DAX, this memory is access via a device-dax
- * character device.
+ * coherent and supports page pinning. This is for example used by DAX devices
+ * that expose memory using a character device.
*
* MEMORY_DEVICE_PCI_P2PDMA:
* Device memory residing in a PCI BAR intended for use with Peer-to-Peer
@@ -60,7 +59,7 @@ enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
MEMORY_DEVICE_FS_DAX,
- MEMORY_DEVICE_DEVDAX,
+ MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
};
diff --git a/mm/memremap.c b/mm/memremap.c
index 03e38b7a38f1..006dace60b1a 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -216,7 +216,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
return ERR_PTR(-EINVAL);
}
break;
- case MEMORY_DEVICE_DEVDAX:
+ case MEMORY_DEVICE_GENERIC:
need_devmap_managed = false;
break;
case MEMORY_DEVICE_PCI_P2PDMA:
--
2.28.0
1 year, 10 months
[bug report] device-dax: add dis-contiguous resource support
by Dan Carpenter
Hello Dan Williams,
This is a semi-automatic email about new static checker warnings.
The patch 454c727769f5: "device-dax: add dis-contiguous resource
support" from Aug 26, 2020, leads to the following Smatch complaint:
drivers/dax/bus.c:788 alloc_dev_dax_range()
error: we previously assumed 'alloc' could be null (see line 772)
drivers/dax/bus.c
771 alloc = __request_region(res, start, size, dev_name(dev), 0);
772 if (!alloc && !dev_dax->nr_range) {
^^
This should probably be a ||?
773 /*
774 * If we adjusted an existing @ranges leave it alone,
775 * but if this was an empty set of ranges nothing else
776 * will release @ranges, so do it now.
777 */
778 kfree(ranges);
779 return -ENOMEM;
780 }
781
782 for (i = 0; i < dev_dax->nr_range; i++)
783 pgoff += PHYS_PFN(range_len(&ranges[i].range));
784 dev_dax->ranges = ranges;
785 ranges[dev_dax->nr_range++] = (struct dev_dax_range) {
786 .pgoff = pgoff,
787 .range = {
788 .start = alloc->start,
^^^^^^^^^^^^
Dereferences.
789 .end = alloc->end,
790 },
regards,
dan carpenter
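
To make the warning concrete, here is a small stand-alone model of the check. It is a sketch with hypothetical `mock_` names, not the driver code, and the second helper shows just one possible restructuring rather than the fix that was or will be applied: it always bails out when the allocation failed and frees the ranges only if the set was empty.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the resource returned by the allocation. */
struct mock_res {
	unsigned long start, end;
};

/* Mirrors the reported condition: bails out only when BOTH tests hold. */
static bool mock_reported_check_bails(const struct mock_res *alloc, int nr_range)
{
	return !alloc && !nr_range;
}

/*
 * One possible restructuring: a failed allocation always returns -ENOMEM,
 * and *free_ranges tells the caller whether the (empty) set must be freed.
 */
static int mock_checked_alloc(const struct mock_res *alloc, int nr_range,
			      bool *free_ranges)
{
	assert(free_ranges != NULL);
	*free_ranges = false;
	if (!alloc) {
		*free_ranges = (nr_range == 0);
		return -ENOMEM;
	}
	return 0;
}
```

With `alloc == NULL` and `nr_range == 1` the reported condition does not bail out, which is the path on which line 788 would dereference a NULL pointer.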
1 year, 10 months
[PATCH] dax: Use kobj_to_dev() instead of container_of()
by Tian Tao
Use kobj_to_dev() instead of container_of()
Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com>
---
drivers/dax/bus.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 092112b..9464b56 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -474,7 +474,7 @@ static DEVICE_ATTR_WO(delete);
static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a,
int n)
{
- struct device *dev = container_of(kobj, struct device, kobj);
+ struct device *dev = kobj_to_dev(kobj);
struct dax_region *dax_region = dev_get_drvdata(dev);
if (is_static(dax_region))
@@ -1225,7 +1225,7 @@ static DEVICE_ATTR_RO(numa_node);
static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
{
- struct device *dev = container_of(kobj, struct device, kobj);
+ struct device *dev = kobj_to_dev(kobj);
struct dev_dax *dev_dax = to_dev_dax(dev);
struct dax_region *dax_region = dev_dax->region;
--
2.7.4
1 year, 10 months
[PATCH v3 00/18] virtiofs: Add DAX support
by Vivek Goyal
Hi All,
This is V3 of the patches. I posted V2 here:
https://lore.kernel.org/linux-fsdevel/20200807195526.426056-1-vgoyal@redh...
I have taken care of the comments from V2. Changes from V2 are:
- Rebased patches on top of 5.9-rc1
- Renamed a couple of functions to get rid of the iomap prefix. (Dave Chinner)
- Modified truncate/punch_hole paths to serialize with the dax fault
path. For now did this only for dax paths. Maybe the non-dax path
can benefit from this too, but that is an option for a different
day. (Dave Chinner)
- Took care of comments by Jan Kara in dax_layout_busy_page_range()
implementation patch.
- Dropped one of the patches which forced sync release in
fuse_file_put() path for DAX files. It was redundant now as virtiofs
already sets fs_context->destroy which forces sync release. (Miklos)
- Took care of some of the errors flagged by checkpatch.pl.
Description from previous post
------------------------------
This patch series adds DAX support to the virtiofs filesystem. This allows
bypassing the guest page cache and mapping the host page cache directly
into the guest address space.
When a page of a file is needed, the guest sends a request to map that page
(in the host page cache) into the qemu address space. Inside the guest this
is a physical memory range controlled by the virtiofs device. The guest
directly maps this physical address range using DAX and hence gets
access to file data on the host.
This can speed things up considerably in many situations. It can also
result in substantial memory savings, as file data does not have
to be copied into the guest and is accessed directly from the host page
cache.
Most of the changes are limited to fuse/virtiofs. There are a couple
of changes needed in the generic dax infrastructure and a couple of changes
in virtio to be able to access the shared memory region.
Thanks
Vivek
Sebastien Boeuf (3):
virtio: Add get_shm_region method
virtio: Implement get_shm_region for PCI transport
virtio: Implement get_shm_region for MMIO transport
Stefan Hajnoczi (2):
virtio_fs, dax: Set up virtio_fs dax_device
fuse,dax: add DAX mmap support
Vivek Goyal (13):
dax: Modify bdev_dax_pgoff() to handle NULL bdev
dax: Create a range version of dax_layout_busy_page()
virtiofs: Provide a helper function for virtqueue initialization
fuse: Get rid of no_mount_options
fuse,virtiofs: Add a mount option to enable dax
fuse,virtiofs: Keep a list of free dax memory ranges
fuse: implement FUSE_INIT map_alignment field
fuse: Introduce setupmapping/removemapping commands
fuse, dax: Implement dax read/write operations
fuse,virtiofs: Define dax address space operations
fuse, dax: Serialize truncate/punch_hole and dax fault path
fuse,virtiofs: Maintain a list of busy elements
fuse,virtiofs: Add logic to free up a memory range
drivers/dax/super.c | 3 +-
drivers/virtio/virtio_mmio.c | 31 +
drivers/virtio/virtio_pci_modern.c | 95 +++
fs/dax.c | 29 +-
fs/fuse/dir.c | 32 +-
fs/fuse/file.c | 1198 +++++++++++++++++++++++++++-
fs/fuse/fuse_i.h | 114 ++-
fs/fuse/inode.c | 146 +++-
fs/fuse/virtio_fs.c | 279 ++++++-
include/linux/dax.h | 6 +
include/linux/virtio_config.h | 17 +
include/uapi/linux/fuse.h | 34 +-
include/uapi/linux/virtio_fs.h | 3 +
include/uapi/linux/virtio_mmio.h | 11 +
include/uapi/linux/virtio_pci.h | 11 +-
15 files changed, 1933 insertions(+), 76 deletions(-)
Cc: Jan Kara <jack(a)suse.cz>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Vishal L Verma <vishal.l.verma(a)intel.com>
--
2.25.4
1 year, 10 months
[PATCH 1/2] ndctl/namespace: skip zero namespaces when processing all namespaces.
by Michal Suchanek
The kernel always creates a zero-length namespace with UUID 0 in each
region.
When processing all namespaces the user gets confusing errors from ndctl
trying to process this namespace. Skip it.
The user can still specify the namespace by name directly in case
processing it is desirable.
Fixes: #41
Reviewed-by: Santosh S <santosh(a)fossix.org>
Tested-by: Harish Sriram <harish(a)linux.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek(a)suse.de>
---
ndctl/namespace.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/ndctl/namespace.c b/ndctl/namespace.c
index e734248c9752..3fabe4799d75 100644
--- a/ndctl/namespace.c
+++ b/ndctl/namespace.c
@@ -2171,9 +2171,19 @@ static int do_xaction_namespace(const char *namespace,
ndctl_namespace_foreach_safe(region, ndns, _n) {
ndns_name = ndctl_namespace_get_devname(ndns);
- if (strcmp(namespace, "all") != 0
- && strcmp(namespace, ndns_name) != 0)
- continue;
+ if (strcmp(namespace, "all") == 0) {
+ static const uuid_t zero_uuid;
+ uuid_t uuid;
+
+ ndctl_namespace_get_uuid(ndns, uuid);
+ if (!ndctl_namespace_get_size(ndns) &&
+ !memcmp(uuid, zero_uuid, sizeof(uuid_t)))
+ continue;
+ } else {
+ if (strcmp(namespace, ndns_name) != 0)
+ continue;
+ }
+
switch (action) {
case ACTION_DISABLE:
rc = ndctl_namespace_disable_safe(ndns);
--
2.26.2
1 year, 10 months
[PATCH 0/9] THP iomap patches for 5.10
by Matthew Wilcox (Oracle)
These patches are carefully plucked from the THP series. I would like
them to hit 5.10 to make the THP patchset merge easier. Some of these
are just generic improvements that make sense on their own terms, but
the overall intent is to support THPs in iomap.
I'll send another patch series later today with the changes to
iomap which don't pay their own way until we actually have THPs in the
page cache. I would like those to be reviewed with an eye to merging
them into 5.11.
Matthew Wilcox (Oracle) (9):
iomap: Fix misplaced page flushing
fs: Introduce i_blocks_per_page
iomap: Use kzalloc to allocate iomap_page
iomap: Use bitmap ops to set uptodate bits
iomap: Support arbitrarily many blocks per page
iomap: Convert read_count to byte count
iomap: Convert write_count to byte count
iomap: Convert iomap_write_end types
iomap: Change calling convention for zeroing
fs/dax.c | 13 ++--
fs/iomap/buffered-io.c | 145 ++++++++++++++++------------------------
fs/jfs/jfs_metapage.c | 2 +-
fs/xfs/xfs_aops.c | 2 +-
include/linux/dax.h | 3 +-
include/linux/pagemap.h | 16 +++++
6 files changed, 83 insertions(+), 98 deletions(-)
--
2.28.0
1 year, 10 months
[PATCH v3 0/7] bugfix and optimize for drivers/nvdimm
by Zhen Lei
v2 --> v3:
1. Fix spelling error of patch 1 subject: memmory --> memory
2. Add "Reviewed-by: Oliver O'Halloran <oohall(a)gmail.com>" into patch 1
3. Rewrite patch descriptions of Patch 1, 3, 4
4. Add 3 new trivial patches (5-7) for things I found yesterday.
5. Unify all "subsystem" names to "libnvdimm:"
v1 --> v2:
1. Add Fixes for Patch 1-2
2. Slightly change the subject and description of Patch 1
3. Add a new trivial Patch 4 for something I found yesterday.
v1:
I found a memory leak while studying the drivers/nvdimm code today. I also
added a sanity check for priv->bus_desc.provider_name, because strdup()
may fail. Patch 3 is a trivial source code optimization.
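
The kind of check described here can be sketched in user space as follows. The `mock_` structure and functions are hypothetical and use the C library strdup() rather than any kernel helper; the sketch only shows the pattern of testing the duplicated string before use and releasing it again on teardown so it does not leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the descriptor that owns provider_name. */
struct mock_bus_desc {
	char *provider_name;
};

/* Duplicate the name and report -ENOMEM instead of carrying a NULL pointer. */
static int mock_probe(struct mock_bus_desc *desc, const char *name)
{
	assert(desc != NULL && name != NULL);
	desc->provider_name = strdup(name);
	if (!desc->provider_name)
		return -ENOMEM;
	return 0;
}

/* Release the duplicated name on the teardown path. */
static void mock_remove(struct mock_bus_desc *desc)
{
	assert(desc != NULL);
	free(desc->provider_name);
	desc->provider_name = NULL;
}
```

Every successful mock_probe() needs a matching mock_remove(), including on later error paths of a real probe routine.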
Zhen Lei (7):
libnvdimm: fix memory leaks in of_pmem.c
libnvdimm: add sanity check for provider_name in
of_pmem_region_probe()
libnvdimm: simplify walk_to_nvdimm_bus()
libnvdimm: reduce an unnecessary if branch in nd_region_create()
libnvdimm: reduce an unnecessary if branch in nd_region_activate()
libnvdimm: make sure EXPORT_SYMBOL_GPL(nvdimm_flush) close to its
function
libnvdimm: slightly simplify available_slots_show()
drivers/nvdimm/bus.c | 7 +++----
drivers/nvdimm/dimm_devs.c | 5 ++---
drivers/nvdimm/of_pmem.c | 7 +++++++
drivers/nvdimm/region_devs.c | 13 ++++---------
4 files changed, 16 insertions(+), 16 deletions(-)
--
1.8.3
1 year, 10 months