Another proposal for DAX fault locking
by Jan Kara
Hello,
I was thinking about current issues with DAX fault locking [1] (data
corruption due to racing faults allocating blocks) and also races which
currently don't allow us to clear dirty tags in the radix tree due to races
between faults and cache flushing [2]. Both of these exist because we don't
have an equivalent of page lock available for DAX. While we have a
reasonable solution available for problem [1], so far I'm not aware of a
decent solution for [2]. After briefly discussing the issue with Mel he had
a bright idea that we could use hashed locks to deal with [2] (and I think
we can solve [1] with them as well). So my proposal looks as follows:
DAX will have an array of mutexes (the array can be made per device but
initially a global one should be OK). We will use mutexes in the array as a
replacement for page lock - we will use hashfn(mapping, index) to get
particular mutex protecting our offset in the mapping. On fault / page
mkwrite, we'll grab the mutex similarly to page lock and release it once we
are done updating page tables. This deals with races in [1]. When flushing
caches we grab the mutex before clearing writeable bit in page tables
and clearing dirty bit in the radix tree and drop it after we have flushed
caches for the pfn. This deals with races in [2].
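As a rough illustration only, the hashed-lock idea above could be modelled in userspace C as follows. Everything here is an assumption made for the sketch: the table size, the hash mixing, the names dax_lock_slot() / dax_entry_lock() / dax_entry_unlock(), and the use of pthread mutexes in place of kernel mutexes. It is not the eventual kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical table size; the proposal does not specify one. */
#define DAX_LOCK_SLOTS 256

static pthread_mutex_t dax_locks[DAX_LOCK_SLOTS];
static pthread_once_t dax_locks_once = PTHREAD_ONCE_INIT;

static void dax_locks_init(void)
{
	for (int i = 0; i < DAX_LOCK_SLOTS; i++)
		pthread_mutex_init(&dax_locks[i], NULL);
}

/* Stand-in for hashfn(mapping, index): mix the pointer and offset into a slot. */
static unsigned int dax_lock_slot(const void *mapping, uint64_t index)
{
	uint64_t h = (uint64_t)(uintptr_t)mapping ^ (index * 0x9E3779B97F4A7C15ULL);

	h ^= h >> 32;
	return (unsigned int)(h % DAX_LOCK_SLOTS);
}

/* Take the mutex covering (mapping, index), as a fault or flush would. */
static pthread_mutex_t *dax_entry_lock(const void *mapping, uint64_t index)
{
	unsigned int slot = dax_lock_slot(mapping, index);

	assert(slot < DAX_LOCK_SLOTS);
	pthread_once(&dax_locks_once, dax_locks_init);
	pthread_mutex_lock(&dax_locks[slot]);
	return &dax_locks[slot];
}

/* Drop the mutex once page tables are updated or caches are flushed. */
static void dax_entry_unlock(pthread_mutex_t *m)
{
	pthread_mutex_unlock(m);
}
```

Both the fault path and the cache-flushing path would bracket their work with dax_entry_lock() / dax_entry_unlock() for the same (mapping, index), which is what serializes them. Unrelated offsets may share a mutex, so a holder must not try to take a second entry lock.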
Thoughts?
Honza
[1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
[2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR
6 years, 6 months
[PATCH v2 0/2] Expose known poison in SPA ranges to the block layer
by Vishal Verma
v2:
- Move poison list walking from pmem to core (Dan)
- If the pmem namespace starts at an offset, account for that (Dan)
- Fix a bug in extended status checking for ars_status
- Remove a duplicate include in pmem.c (only introduced in v1)
- When doing an ars_status, don't error out if an ARS has not yet
been performed.
- When checking if ARS is supported, also check the extended status
and make sure ARS for persistent memory is supported (as opposed to
just volatile memory)
- Print a dev_err message if find_poison fails
- Collapse patches 2 and 3 into a single patch
This series does a few things:
- Retrieve all known poison in the system physical address (SPA) space
using ARS (Address Range Scrub) commands to firmware
- Store this poison in a new 'nd_poison' structure
- In pmem, consume the poison list and expose the ranges as bad sectors
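To make the offset handling concrete, here is a small standalone sketch of how one poisoned SPA range might be translated into a bad-sector range for a namespace that starts at an offset. The function name poison_to_sectors(), the struct sector_range type and the 512-byte sector size are assumptions for illustration, not code from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumed 512-byte sectors */

struct sector_range {
	uint64_t first;	/* first bad sector, relative to the namespace start */
	uint64_t count;	/* number of bad sectors */
};

/*
 * Clip the poison range [p_start, p_start + p_len) to the namespace
 * [ns_start, ns_start + ns_len) and express the overlap in sectors
 * relative to the namespace start. Returns 1 on overlap, 0 otherwise.
 */
static int poison_to_sectors(uint64_t ns_start, uint64_t ns_len,
			     uint64_t p_start, uint64_t p_len,
			     struct sector_range *out)
{
	uint64_t ns_end = ns_start + ns_len;
	uint64_t p_end = p_start + p_len;
	uint64_t lo = p_start > ns_start ? p_start : ns_start;
	uint64_t hi = p_end < ns_end ? p_end : ns_end;

	assert(out != NULL);
	if (p_len == 0 || ns_len == 0 || lo >= hi)
		return 0;

	/* Account for the namespace offset before converting to sectors. */
	lo -= ns_start;
	hi -= ns_start;

	out->first = lo >> SECTOR_SHIFT;
	/* Round the end up so a partially poisoned sector is still covered. */
	out->count = ((hi + (1ULL << SECTOR_SHIFT) - 1) >> SECTOR_SHIFT) - out->first;
	return 1;
}
```

A caller would walk the poison list, call this once per entry, and hand each resulting range to the badblocks code.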
This depends on the badblocks series sent out previously.
A tree with the latest revisions of both the badblocks patchset and this
can be found at:
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err...
Vishal Verma (2):
nfit_test: Enable DSMs for all test NFITs
libnvdimm: Add a poison list and export badblocks
drivers/acpi/nfit.c | 203 +++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/core.c | 187 ++++++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 3 +
drivers/nvdimm/nd.h | 6 ++
drivers/nvdimm/pmem.c | 6 ++
include/linux/libnvdimm.h | 1 +
tools/testing/nvdimm/test/nfit.c | 9 ++
7 files changed, 415 insertions(+)
--
2.5.0
[PATCH v2 0/2] DAX bdev fixes - move flushing calls to FS
by Ross Zwisler
During testing of raw block devices + DAX I noticed that the struct
block_device that we were using for DAX operations was incorrect. For the
fault handlers, etc. we can just get the correct bdev via get_block(),
which is passed in as a function pointer, but for the *sync code and for
sector zeroing we don't have access to get_block(). This is also an issue
for XFS real-time devices, whenever we get those working.
Patch one of this series fixes the DAX sector zeroing code by explicitly
passing in a valid struct block_device.
Patch two of this series fixes DAX *sync support by moving calls to
dax_writeback_mapping_range() out of filemap_write_and_wait_range() and
into the filesystem/block device ->writepages function so that it can
supply us with a valid block device. This also fixes DAX code to properly
flush caches in response to sync(2).
Thanks to Jan Kara for his initial draft of patch 2:
https://lkml.org/lkml/2016/2/9/485
Here are the changes that I've made to that patch:
1) For DAX mappings, only return after calling
dax_writeback_mapping_range() if we encountered an error. In the non-error
case we still need to write back normal pages, else we lose metadata
updates.
2) In dax_writeback_mapping_range(), move the new check for
if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
above the i_blkbits check. In my testing I found cases where
dax_writeback_mapping_range() was called for inodes with i_blkbits !=
PAGE_SHIFT - I'm assuming these are internal metadata inodes? They have no
exceptional DAX entries to flush, so we have no work to do, but if we
return error from the i_blkbits check we will fail the overall writeback
operation. Please let me know if it seems wrong for us to be seeing inodes
set to use DAX but with i_blkbits != PAGE_SHIFT and I'll get more info.
3) In filemap_write_and_wait() and filemap_write_and_wait_range(), continue
the writeback in the case that DAX is enabled but we only have a nonzero
mapping->nrpages. As with 1) and 2), I believe this is necessary to
properly write back metadata changes. If this sounds wrong, please let me
know and I'll get more info.
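The check ordering described in change 2) can be modelled in a few lines of standalone C. This is a sketch under assumptions: struct model_inode, model_dax_writeback() and the -EIO return value are hypothetical stand-ins, and only the order of the two checks is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Minimal stand-in for the inode/mapping fields the checks look at. */
struct model_inode {
	unsigned int i_blkbits;
	unsigned long nrexceptional;
};

static int model_dax_writeback(const struct model_inode *inode,
			       enum model_sync_mode mode)
{
	assert(inode != NULL);

	/* No exceptional entries, or not a data-integrity sync: nothing to do. */
	if (!inode->nrexceptional || mode != MODEL_WB_SYNC_ALL)
		return 0;

	/* Only now reject block sizes DAX cannot handle (error code assumed). */
	if (inode->i_blkbits != MODEL_PAGE_SHIFT)
		return -EIO;

	/* The real code would flush the dirty DAX entries here. */
	return 0;
}
```

With the checks in this order, an inode with i_blkbits != PAGE_SHIFT but no exceptional entries returns success instead of failing the whole writeback.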
A working tree can be found here:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsy...
Ross Zwisler (2):
dax: supply DAX clearing code with correct bdev
dax: move writeback calls into the filesystems
fs/block_dev.c | 16 +++++++++++++++-
fs/dax.c | 22 ++++++++++++----------
fs/ext2/inode.c | 17 +++++++++++++++--
fs/ext4/inode.c | 7 +++++++
fs/xfs/xfs_aops.c | 11 ++++++++++-
fs/xfs/xfs_aops.h | 1 +
fs/xfs/xfs_bmap_util.c | 3 ++-
include/linux/dax.h | 8 +++++---
mm/filemap.c | 12 ++++--------
9 files changed, 71 insertions(+), 26 deletions(-)
--
2.5.0
Re: [PATCH] kvm: do not SetPageDirty from kvm_set_pfn_dirty for file mappings
by Maxim Patlasov
On 02/12/2016 05:48 AM, Dmitry Monakhov wrote:
> Maxim Patlasov <mpatlasov(a)virtuozzo.com> writes:
>
>> The patch solves the following problem: file-system-specific routines
>> involved in the ordinary writeback process hit the BUG_ON in page_buffers()
>> because a page goes to writeback without buffer-heads attached.
>>
>> The way kvm_set_pfn_dirty calls SetPageDirty works only for anon
>> mappings. For file mappings it is incorrect: page_mkwrite must be
>> called there. It's not easy to add a page_mkwrite call to kvm_set_pfn_dirty
>> because there is no universal way to find the vma by pfn. But
>> SetPageDirty can simply be skipped in those cases. The justification
>> follows.
> Confirmed. I've hit that BUG_ON
> [ 4442.219121] ------------[ cut here ]------------
> [ 4442.219188] kernel BUG at fs/ext4/inode.c:2285!
> <...>
>
>> When a guest modifies the content of a page with a file mapping, kvm
>> makes the page dirty via the following call-path:
>>
>> vmx_handle_exit ->
>> handle_ept_violation ->
>> __get_user_pages ->
>> page_mkwrite ->
>> SetPageDirty
>>
>> From then on, the page is dirty from both the guest and the host point of
>> view. Then the host performs writeback and marks the page write-protected,
>> so any further write from the guest triggers the call-path above again.
> Please elaborate on the exact call-path which marks the host page.
wb_workfn ->
wb_do_writeback ->
wb_writeback ->
__writeback_inodes_wb ->
writeback_sb_inodes ->
__writeback_single_inode ->
do_writepages ->
ext4_writepages ->
mpage_prepare_extent_to_map ->
mpage_process_page_bufs ->
mpage_submit_page ->
clear_page_dirty_for_io ->
page_mkclean ->
rmap_walk ->
rmap_walk_file ->
page_mkclean_one ->
pte_wrprotect ->
pte_clear_flags(pte, _PAGE_RW)
Thanks,
Maxim
>> So, for file mappings, it's not possible to have new data written to a page
>> inside the guest without a corresponding SetPageDirty on the host.
>>
>> This makes the explicit SetPageDirty in kvm_set_pfn_dirty redundant.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov(a)virtuozzo.com>
>> ---
>> virt/kvm/kvm_main.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index a11cfd2..5a7d3fa 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1582,7 +1582,8 @@ void kvm_set_pfn_dirty(kvm_pfn_t pfn)
>> if (!kvm_is_reserved_pfn(pfn)) {
>> struct page *page = pfn_to_page(pfn);
>>
>> - if (!PageReserved(page))
>> + if (!PageReserved(page) &&
>> + (!page->mapping || PageAnon(page)))
>> SetPageDirty(page);
>> }
>> }
>>
[PATCH v11 0/4] Machine check recovery when kernel accesses poison
by Tony Luck
This series is initially targeted at the folks doing filesystems
on top of NVDIMMs. They really want to be able to return -EIO
when there is a h/w error (just like spinning rust and SSDs do).
I plan to use the same infrastructure to write a machine check aware
"copy_from_user()" that will SIGBUS the calling application when a
syscall touches poison in user space (just like we do when the application
touches the poison itself).
I've dropped off the "reviewed-by" tags that I collected back prior to
adding the new field to the exception table. Please send new ones
if you can.
Changes
V10->V11
Boris: Optimize for aligned case in __mcsafe_copy()
Boris: Add whitespace and comments to __mcsafe_copy() for readability
Boris: Move Xeon E7 check to Intel quirks
Boris: Simpler description for mce=recovery command line option
V9->V10
Andy: Commit comment in part 2 is stale - refers to "EXTABLE_CLASS_FAULT"
Boris: Part1 - Numerous spelling, grammar, etc. fixes
Boris: Part2 - No longer need #include <linux/module.h> (in either file).
V8->V9
Boris: Create a synthetic cpu capability for machine check recovery.
Changes V7-V8
Boris: Would be so much cleaner if we added a new field to the exception table
instead of squeezing bits into the fixup field. New field added
Tony: Documentation needs to be updated. Done
Changes V6-V7:
Boris: Why add/subtract 0x20000000? Added better comment provided by Andy
Boris: Churn. Part2 changes things only introduced in part1.
Merged parts 1&2 into one patch.
Ingo: Missing my sign off on part1. Added.
Changes V5-V6
Andy: Provoked massive re-write by providing what is now part1 of this
patch series. This frees up two bits in the exception table
fixup field that can be used to tag exception table entries
as different "classes". This means we don't need my separate
exception table for machine checks. Also avoids duplicating
fixup actions for #PF and #MC cases that were in version 5.
Andy: Use C99 array initializers to tie the various class fixup
functions back to the definitions of each class. Also give the
functions meaningful names (not fixup_class0() etc.).
Boris: Cleaned up my lousy assembly code removing many spurious 'l'
modifiers on instructions.
Boris: Provided some helper functions for the machine check severity
calculation that make the code more readable.
Boris: Have __mcsafe_copy() return a structure with the 'remaining bytes'
in a separate field from the fault indicator. Boris had suggested
Linux -EFAULT/-EINVAL ... but I thought it made more sense to return
the exception number (X86_TRAP_MC, etc.) This finally kills off
BIT(63) which has been controversial throughout all the early versions
of this patch series.
Changes V4-V5
Tony: Extended __mcsafe_copy() to have fixup entries for both machine
check and page fault.
Changes V3-V4:
Andy: Simplify fixup_mcexception() by dropping used-once local variable
Andy: "Reviewed-by" tag added to part1
Boris: Moved new functions to memcpy_64.S and declaration to asm/string_64.h
Boris: Changed name s/mcsafe_memcpy/__mcsafe_copy/ to make it clear that this
is an internal function and that return value doesn't follow memcpy() semantics.
Boris: "Reviewed-by" tag added to parts 1&2
Changes V2-V3:
Andy: Don't hack "regs->ax = BIT(63) | addr;" in the machine check
handler. Now have better fixup code that computes the number
of remaining bytes (just like page-fault fixup).
Andy: #define for BIT(63). Done, plus couple of extra macros using it.
Boris: Don't clutter up generic code (like mm/extable.c) with this.
I moved everything under arch/x86 (the asm-generic change is
a more generic #define).
Boris: Dependencies for CONFIG_MCE_KERNEL_RECOVERY are too generic.
I made it a real menu item with default "n". Dan Williams
will use "select MCE_KERNEL_RECOVERY" from his persistent
filesystem code.
Boris: Simplify conditionals in mce.c by moving tolerant/kill_it
checks earlier, with a skip to end if they aren't set.
Boris: Miscellaneous grammar/punctuation. Fixed.
Boris: Don't leak spurious __start_mcextable symbols into kernels
that didn't configure MCE_KERNEL_RECOVERY. Done.
Tony: New code doesn't belong in user_copy_64.S/uaccess*.h. Moved
to new .S/.h files
Elliott: Caching behavior non-optimal. Could use movntdqa, vmovntdqa
or vmovntdqa on source addresses. I didn't fix this yet. Think
of the current mcsafe_memcpy() as the first of several functions.
This one is useful for small copies (meta-data) where the overhead
of saving SSE/AVX state isn't justified.
Changes V1->V2:
0-day: Reported build errors and warnings on 32-bit systems. Fixed
0-day: Reported bloat to tinyconfig. Fixed
Boris: Suggestions to use extra macros to reduce code duplication in _ASM_*EXTABLE. Done
Boris: Re-write "tolerant==3" check to reduce indentation level. See below.
Andy: Check IP is valid before searching kernel exception tables. Done.
Andy: Explain use of BIT(63) on return value from mcsafe_memcpy(). Done (added decode macros).
Andy: Untangle mess of code in tail of do_machine_check() to make it
clear what is going on (e.g. that we only enter the ist_begin_non_atomic()
if we were called from user code, not from kernel!). Done.
Tony Luck (4):
x86: Expand exception table to allow new handling options
x86, mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception
table entries
x86, mce: Add __mcsafe_copy()
x86: Create a new synthetic cpu capability for machine check recovery
Documentation/x86/exception-tables.txt | 35 +++++++
Documentation/x86/x86_64/boot-options.txt | 2 +
arch/x86/include/asm/asm.h | 40 ++++----
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/include/asm/mce.h | 1 +
arch/x86/include/asm/string_64.h | 8 ++
arch/x86/include/asm/uaccess.h | 16 ++--
arch/x86/kernel/cpu/mcheck/mce-severity.c | 22 ++++-
arch/x86/kernel/cpu/mcheck/mce.c | 83 +++++++++-------
arch/x86/kernel/kprobes/core.c | 2 +-
arch/x86/kernel/traps.c | 6 +-
arch/x86/kernel/x8664_ksyms_64.c | 2 +
arch/x86/lib/memcpy_64.S | 151 ++++++++++++++++++++++++++++++
arch/x86/mm/extable.c | 100 ++++++++++++++------
arch/x86/mm/fault.c | 2 +-
scripts/sortextable.c | 32 +++++++
16 files changed, 410 insertions(+), 93 deletions(-)
--
2.5.0
Re: Regarding installing pmem on ubuntu 15.10.
by Ross Zwisler
On Wed, Feb 10, 2016 at 09:53:53PM -0800, NAGA VENKATA SAI INDUBHASKAR JUPUDI wrote:
> Hello Sir,
>
> I'm a student from University of California, Santa Cruz. I'm trying to
> install pmem device on ubuntu 15.10 following the guideline mentioned in:
>
> https://nvdimm.wiki.kernel.org/
>
>
> But I'm stuck at one point. After configuring the correct kernel parameters,
> do I need to do something more in order to get a pmem device? When
> I try to run
>
> fdisk -l /dev/pmem0
>
> I get an error: no such folder.
>
> I'm stuck at this point (highlighted in red)
>
> [image: Inline image 1]
>
> Can you please suggest what I am doing wrong here? I apologize for the
> inconvenience caused.
>
>
> Thanks & Regards,
> NAGA VENKATA SAI INDUBHASKAR JUPUDI,
> Department of Computer Engineering,
> University of California, Santa Cruz.
Well, the implicit steps between "set up the correct kernel config" and "use
fdisk" are "compile a kernel with the new config, reboot using that newly
compiled kernel".
Also, make sure you are setting up the memmap kernel param correctly:
https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_pa...
[PATCH] kvm: do not SetPageDirty from kvm_set_pfn_dirty for file mappings
by Maxim Patlasov
The patch solves the following problem: file-system-specific routines
involved in the ordinary writeback process hit the BUG_ON in page_buffers()
because a page goes to writeback without buffer-heads attached.
The way kvm_set_pfn_dirty calls SetPageDirty works only for anon
mappings. For file mappings it is incorrect: page_mkwrite must be
called there. It's not easy to add a page_mkwrite call to kvm_set_pfn_dirty
because there is no universal way to find the vma by pfn. But
SetPageDirty can simply be skipped in those cases. The justification
follows.
When a guest modifies the content of a page with a file mapping, kvm
makes the page dirty via the following call-path:
vmx_handle_exit ->
handle_ept_violation ->
__get_user_pages ->
page_mkwrite ->
SetPageDirty
From then on, the page is dirty from both the guest and the host point of
view. Then the host performs writeback and marks the page write-protected,
so any further write from the guest triggers the call-path above again.
So, for file mappings, it's not possible to have new data written to a page
inside the guest without a corresponding SetPageDirty on the host.
This makes the explicit SetPageDirty in kvm_set_pfn_dirty redundant.
Signed-off-by: Maxim Patlasov <mpatlasov(a)virtuozzo.com>
---
virt/kvm/kvm_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a11cfd2..5a7d3fa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1582,7 +1582,8 @@ void kvm_set_pfn_dirty(kvm_pfn_t pfn)
if (!kvm_is_reserved_pfn(pfn)) {
struct page *page = pfn_to_page(pfn);
- if (!PageReserved(page))
+ if (!PageReserved(page) &&
+ (!page->mapping || PageAnon(page)))
SetPageDirty(page);
}
}
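The condition the patch introduces can be read as a small predicate. The standalone model below is only a sketch: struct model_page and should_set_dirty() are hypothetical names, and the three booleans stand in for PageReserved(page), page->mapping != NULL and PageAnon(page).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state the patched check inspects. */
struct model_page {
	bool reserved;		/* PageReserved(page) */
	bool has_mapping;	/* page->mapping != NULL */
	bool anon;		/* PageAnon(page) */
};

/* Mirrors: !PageReserved(page) && (!page->mapping || PageAnon(page)) */
static bool should_set_dirty(const struct model_page *p)
{
	assert(p != NULL);
	return !p->reserved && (!p->has_mapping || p->anon);
}
```

Under this predicate a file-backed page (has a mapping, not anon) is never dirtied from kvm_set_pfn_dirty, while anon pages and pages without a mapping behave as before.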
[PATCH v10 0/4] Machine check recovery when kernel accesses poison
by Tony Luck
This series is initially targeted at the folks doing filesystems
on top of NVDIMMs. They really want to be able to return -EIO
when there is a h/w error (just like spinning rust and SSDs do).
I plan to use the same infrastructure to write a machine check aware
"copy_from_user()" that will SIGBUS the calling application when a
syscall touches poison in user space (just like we do when the application
touches the poison itself).
With this series applied Dan Williams can write:
static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src, size_t n)
{
	if (static_cpu_has(X86_FEATURE_MCRECOVERY)) {
		struct mcsafe_ret ret;

		ret = __mcsafe_copy(dst, (void __force *) src, n);
		if (ret.remain)
			return -EIO;
		return 0;
	}
	memcpy(dst, (void __force *) src, n);
	return 0;
}
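For readers without the series applied, the return convention used above (a fault indicator plus a separate 'remaining bytes' field) can be modelled in userspace. This is a sketch under assumptions: model_mcsafe_copy(), struct model_mcsafe_ret and the trapnr field name are hypothetical, and the poison_off parameter merely simulates where a machine check would hit. Only the 'remain' field name and the idea of returning the exception number come from this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TRAP_MC 18	/* x86 machine check exception vector */

struct model_mcsafe_ret {
	unsigned long trapnr;	/* 0 on success, else the exception number */
	size_t remain;		/* bytes not copied */
};

/*
 * Copy n bytes, pretending the source is poisoned at byte offset
 * poison_off. Pass an offset >= n to simulate a clean copy.
 */
static struct model_mcsafe_ret model_mcsafe_copy(void *dst, const void *src,
						 size_t n, size_t poison_off)
{
	struct model_mcsafe_ret ret = { 0, 0 };
	size_t ok = n;

	assert(dst != NULL && src != NULL);
	if (poison_off < n) {
		ok = poison_off;
		ret.trapnr = MODEL_TRAP_MC;
		ret.remain = n - poison_off;
	}
	memcpy(dst, src, ok);
	return ret;
}
```

A caller only needs to test ret.remain, exactly as arch_memcpy_from_pmem() does, and can map a non-zero value to -EIO.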
I've dropped off the "reviewed-by" tags that I collected back prior to
adding the new field to the exception table. Please send new ones
if you can.
Changes
V9-V10
Andy: Commit comment in part 2 is stale - refers to "EXTABLE_CLASS_FAULT"
Boris: Part1 - Numerous spelling, grammar, etc. fixes
Boris: Part2 - No longer need #include <linux/module.h> (in either file).
V8->V9
Boris: Create a synthetic cpu capability for machine check recovery.
Changes V7-V8
Boris: Would be so much cleaner if we added a new field to the exception table
instead of squeezing bits into the fixup field. New field added
Tony: Documentation needs to be updated. Done
Changes V6-V7:
Boris: Why add/subtract 0x20000000? Added better comment provided by Andy
Boris: Churn. Part2 changes things only introduced in part1.
Merged parts 1&2 into one patch.
Ingo: Missing my sign off on part1. Added.
Changes V5-V6
Andy: Provoked massive re-write by providing what is now part1 of this
patch series. This frees up two bits in the exception table
fixup field that can be used to tag exception table entries
as different "classes". This means we don't need my separate
exception table for machine checks. Also avoids duplicating
fixup actions for #PF and #MC cases that were in version 5.
Andy: Use C99 array initializers to tie the various class fixup
functions back to the definitions of each class. Also give the
functions meaningful names (not fixup_class0() etc.).
Boris: Cleaned up my lousy assembly code removing many spurious 'l'
modifiers on instructions.
Boris: Provided some helper functions for the machine check severity
calculation that make the code more readable.
Boris: Have __mcsafe_copy() return a structure with the 'remaining bytes'
in a separate field from the fault indicator. Boris had suggested
Linux -EFAULT/-EINVAL ... but I thought it made more sense to return
the exception number (X86_TRAP_MC, etc.) This finally kills off
BIT(63) which has been controversial throughout all the early versions
of this patch series.
Changes V4-V5
Tony: Extended __mcsafe_copy() to have fixup entries for both machine
check and page fault.
Changes V3-V4:
Andy: Simplify fixup_mcexception() by dropping used-once local variable
Andy: "Reviewed-by" tag added to part1
Boris: Moved new functions to memcpy_64.S and declaration to asm/string_64.h
Boris: Changed name s/mcsafe_memcpy/__mcsafe_copy/ to make it clear that this
is an internal function and that return value doesn't follow memcpy() semantics.
Boris: "Reviewed-by" tag added to parts 1&2
Changes V2-V3:
Andy: Don't hack "regs->ax = BIT(63) | addr;" in the machine check
handler. Now have better fixup code that computes the number
of remaining bytes (just like page-fault fixup).
Andy: #define for BIT(63). Done, plus couple of extra macros using it.
Boris: Don't clutter up generic code (like mm/extable.c) with this.
I moved everything under arch/x86 (the asm-generic change is
a more generic #define).
Boris: Dependencies for CONFIG_MCE_KERNEL_RECOVERY are too generic.
I made it a real menu item with default "n". Dan Williams
will use "select MCE_KERNEL_RECOVERY" from his persistent
filesystem code.
Boris: Simplify conditionals in mce.c by moving tolerant/kill_it
checks earlier, with a skip to end if they aren't set.
Boris: Miscellaneous grammar/punctuation. Fixed.
Boris: Don't leak spurious __start_mcextable symbols into kernels
that didn't configure MCE_KERNEL_RECOVERY. Done.
Tony: New code doesn't belong in user_copy_64.S/uaccess*.h. Moved
to new .S/.h files
Elliott: Caching behavior non-optimal. Could use movntdqa, vmovntdqa
or vmovntdqa on source addresses. I didn't fix this yet. Think
of the current mcsafe_memcpy() as the first of several functions.
This one is useful for small copies (meta-data) where the overhead
of saving SSE/AVX state isn't justified.
Changes V1->V2:
0-day: Reported build errors and warnings on 32-bit systems. Fixed
0-day: Reported bloat to tinyconfig. Fixed
Boris: Suggestions to use extra macros to reduce code duplication in _ASM_*EXTABLE. Done
Boris: Re-write "tolerant==3" check to reduce indentation level. See below.
Andy: Check IP is valid before searching kernel exception tables. Done.
Andy: Explain use of BIT(63) on return value from mcsafe_memcpy(). Done (added decode macros).
Andy: Untangle mess of code in tail of do_machine_check() to make it
clear what is going on (e.g. that we only enter the ist_begin_non_atomic()
if we were called from user code, not from kernel!). Done.
Tony Luck (4):
x86: Expand exception table to allow new handling options
x86, mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception
table entries
x86, mce: Add __mcsafe_copy()
x86: Create a new synthetic cpu capability for machine check recovery
Documentation/x86/exception-tables.txt | 35 ++++++++
Documentation/x86/x86_64/boot-options.txt | 4 +
arch/x86/include/asm/asm.h | 40 +++++----
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/include/asm/mce.h | 1 +
arch/x86/include/asm/string_64.h | 8 ++
arch/x86/include/asm/uaccess.h | 16 ++--
arch/x86/kernel/cpu/mcheck/mce-severity.c | 22 ++++-
arch/x86/kernel/cpu/mcheck/mce.c | 81 ++++++++++--------
arch/x86/kernel/kprobes/core.c | 2 +-
arch/x86/kernel/traps.c | 6 +-
arch/x86/kernel/x8664_ksyms_64.c | 2 +
arch/x86/lib/memcpy_64.S | 134 ++++++++++++++++++++++++++++++
arch/x86/mm/extable.c | 100 +++++++++++++++-------
arch/x86/mm/fault.c | 2 +-
scripts/sortextable.c | 32 +++++++
16 files changed, 393 insertions(+), 93 deletions(-)
--
2.5.0
Re: Another proposal for DAX fault locking
by Jan Kara
On Wed 10-02-16 15:29:34, Dmitry Monakhov wrote:
> Jan Kara <jack(a)suse.cz> writes:
>
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could use hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> >
> > Thoughts?
> Agreed, only a small note:
> Hashed locks have a side effect for batch locking due to collisions.
> Sometimes we want to lock several pages/entries (migration/defragmentation),
> so we will end up with a deadlock due to a hash collision.
Yeah, but at least for the purposes we want the locks for, locking just one
'page' is enough. If we ever needed to lock more 'pages', we would have to
choose a different locking scheme.
Honza
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR
[ndctl PATCH] ndctl: rework release collateral
by Dan Williams
Prompted by a request to have the spec file reference the github tarball
URL, rework spec file generation:
1/ Use noinst_SCRIPTS to do the token replacements in ndctl.spec.in
using sed, kill contrib/genspec and kill the %lname and %dname variables
2/ Use the LIBNDCTL_CURRENT variable to name the sles library package
3/ Unify the git snapshot naming with github archive naming. Split git-version
off from git-version-gen as a helper for make-git-snapshot.sh.
4/ Specify --disable-silent-rules to configure
Reported-by: Ralf Corsepius <rc040203(a)freenet.de>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Documentation/Makefile.am | 2 ++
Makefile.am | 17 +++++++++++++
contrib/Makefile | 39 -------------------------------
contrib/genspec.c | 53 ------------------------------------------
contrib/rpmbuild.sh | 4 ---
git-version | 16 ++++---------
git-version-gen | 35 ++--------------------------
make-git-snapshot.sh | 16 +++----------
ndctl.spec.in | 35 ++++++++++++----------------
rpmbuild.sh | 5 ++++
sles/header | 0
11 files changed, 51 insertions(+), 171 deletions(-)
delete mode 100644 contrib/Makefile
delete mode 100644 contrib/genspec.c
delete mode 100755 contrib/rpmbuild.sh
copy git-version-gen => git-version (73%)
rename contrib/make-git-snapshot.sh => make-git-snapshot.sh (55%)
rename contrib/ndctl.spec.in => ndctl.spec.in (74%)
create mode 100755 rpmbuild.sh
rename contrib/sles/header => sles/header (100%)
diff --git a/Documentation/Makefile.am b/Documentation/Makefile.am
index a168cec100c8..3b3336918516 100644
--- a/Documentation/Makefile.am
+++ b/Documentation/Makefile.am
@@ -8,6 +8,8 @@ man1_MANS = \
ndctl-create-namespace.1 \
ndctl-list.1
+CLEANFILES = $(man1_MANS)
+
XML_DEPS = \
$(top_srcdir)/version.m4 \
Makefile \
diff --git a/Makefile.am b/Makefile.am
index 1c6a4b07f8f1..61dce040131c 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -49,6 +49,23 @@ LIBNDCTL_CURRENT=6
LIBNDCTL_REVISION=0
LIBNDCTL_AGE=0
+noinst_SCRIPTS = rhel/ndctl.spec sles/ndctl.spec
+CLEANFILES += $(noinst_SCRIPTS)
+
+do_rhel_subst = sed -e 's,VERSION,$(VERSION),g' \
+ -e 's,DNAME,ndctl-devel,g' \
+ -e 's,LNAME,libndctl,g'
+
+do_sles_subst = sed -e 's,VERSION,$(VERSION),g' \
+ -e 's,DNAME,libndctl-devel,g' \
+ -e 's,LNAME,libndctl$(LIBNDCTL_CURRENT),g'
+
+rhel/ndctl.spec: ndctl.spec.in Makefile.am
+ $(AM_V_GEN)$(MKDIR_P) rhel; $(do_rhel_subst) < $< > $@
+
+sles/ndctl.spec: ndctl.spec.in Makefile.am
+ $(AM_V_GEN)$(MKDIR_P) sles; cat sles/header $< | $(do_sles_subst) > $@
+
pkginclude_HEADERS = lib/ndctl/libndctl.h
lib_LTLIBRARIES = lib/libndctl.la
diff --git a/contrib/Makefile b/contrib/Makefile
deleted file mode 100644
index 766e186d0389..000000000000
--- a/contrib/Makefile
+++ /dev/null
@@ -1,39 +0,0 @@
-CC=gcc
-CFLAGS=-c -Wall
-LDFLAGS=
-SRCS=genspec.c
-OBJS=$(SRCS:.c=.o)
-PROG=genspec
-SPEC_IN=ndctl.spec.in
-RHEL_SPEC=rhel/$(SPEC_IN:.in=)
-SLES_SPEC=sles/$(SPEC_IN:.in=)
-SLES_IN=sles/header
-COMMIT_ID=git log --pretty=format:"%h" -n 1
-
-all: $(RHEL_SPEC) $(SLES_SPEC)
-
-$(RHEL_SPEC) : $(SPEC_IN) $(PROG)
- @mkdir -p rhel
- cat $(SPEC_IN) | $(dir $(PROG))$(PROG) `$(COMMIT_ID)` rhel > $@
-
-$(SLES_SPEC) : $(SLES_IN) $(SPEC_IN) $(PROG)
- @mkdir -p sles
- cat $(SLES_IN) $(SPEC_IN) | $(dir $(PROG))$(PROG) `$(COMMIT_ID)` sles > $@
-
-$(PROG) : $(OBJS) Makefile
- $(CC) $(LDFLAGS) $(OBJS) -o $@
-
-.c.o:
- $(CC) $(CFLAGS) $< -o $@
-
-clean:
- rm $(OBJS) $(PROG) $(RHEL_SPEC) $(SLES_SPEC)
- @rmdir rhel sles
-
-depend: .depend
-
-.depend: $(SRCS)
- rm -f $@ > /dev/null 2>&1
- $(CC) $(CFLAGS) -MM $^ -MF $@
-
-include .depend
diff --git a/contrib/genspec.c b/contrib/genspec.c
deleted file mode 100644
index 829f648f0a2f..000000000000
--- a/contrib/genspec.c
+++ /dev/null
@@ -1,53 +0,0 @@
-#include <stdio.h>
-#include <string.h>
-#include "../config.h"
-
-static char *lname[] = {
- "ndctl-libs", "libndctl3",
-};
-
-static char *dname[] = {
- "ndctl-devel", "libndctl-devel",
-};
-
-static int license[] = {
- 1, 0,
-};
-
-int main(int argc, char **argv)
-{
- const char *commit = argv[1];
- char buf[1024];
- int os;
-
- if (argc != 3) {
- fprintf(stderr, "commit id and OS must be specified\n");
- return 1;
- }
-
- if (strcmp(argv[2], "rhel") == 0)
- os = 0;
- else if (strcmp(argv[2], "sles") == 0)
- os = 1;
- else
- return 1;
-
- while (fgets(buf, sizeof(buf), stdin)) {
- if (strncmp("Version:", buf, 8) == 0)
- fprintf(stdout, "Version: %s\n", &VERSION[1]);
- else if (strncmp("%global gitcommit", buf, 17) == 0)
- fprintf(stdout, "%%global gitcommit %s\n", commit);
- else if (strncmp("%define lname", buf, 12) == 0)
- fprintf(stdout, "%%define lname %s\n", lname[os]);
- else if (strncmp("%define dname", buf, 12) == 0)
- fprintf(stdout, "%%define dname %s\n", dname[os]);
- else if (strncmp("%license", buf, 8) == 0 && !license[os])
- fprintf(stdout, "%%doc %s", &buf[8]);
- else if (strncmp("echo \"\" > version", buf, 17) == 0)
- fprintf(stdout, "echo \"%s\" > version\n", VERSION);
- else
- fprintf(stdout, "%s", buf);
- }
-
- return 0;
-}
diff --git a/contrib/rpmbuild.sh b/contrib/rpmbuild.sh
deleted file mode 100755
index 97def14bd965..000000000000
--- a/contrib/rpmbuild.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-$(dirname $0)/make-git-snapshot.sh
-make -C $(dirname $0)
-rpmbuild -bb $(dirname $0)/rhel/ndctl.spec
diff --git a/git-version-gen b/git-version
similarity index 73%
copy from git-version-gen
copy to git-version
index 9cd2ebf1407a..3b5217752924 100755
--- a/git-version-gen
+++ b/git-version
@@ -9,7 +9,6 @@ dirty() {
fi
}
-GVF=version.m4
DEF_VER=v50
LF='
@@ -34,14 +33,9 @@ EOF
VN="$(dirty ${DEF_VER}.git$COMMIT)"
fi
-if test -r $GVF; then
- VC=$(sed -e 's/m4_define(\[GIT_VERSION], \[//' <$GVF)
- VC=$(echo $VC | sed -e 's/\])//')
-else
- VC=unset
+# drop the leading 'v' from the version so it's a pure number
+if [ "${VN#v}" != "$VN" ]; then
+ VN=${VN#v}
fi
-test "$VN" = "$VC" || {
- echo >&2 "GIT_VERSION = $VN"
- echo "m4_define([GIT_VERSION], [$VN])" >$GVF
- exit 0
-}
+
+echo $VN
diff --git a/git-version-gen b/git-version-gen
index 9cd2ebf1407a..e913a3674eef 100755
--- a/git-version-gen
+++ b/git-version-gen
@@ -1,38 +1,6 @@
#!/bin/sh
-dirty() {
- git update-index -q --refresh
- if test -z "$(git diff-index --name-only HEAD --)"; then
- echo "$1"
- else
- echo "${1}.dirty"
- fi
-}
-
GVF=version.m4
-DEF_VER=v50
-
-LF='
-'
-
-# First see if there is a version file (included in release tarballs),
-# then try git-describe, then default.
-if test -f version; then
- VN=$(cat version) || VN="$DEF_VER"
-elif test -d ${GIT_DIR:-.git} -o -f .git &&
- VN=$(git describe --match "v[0-9]*" --abbrev=7 HEAD 2>/dev/null) &&
- case "$VN" in
- *$LF*) (exit 1) ;;
- v[0-9]*)
- VN="$(dirty $VN)"
- esac; then
- VN=$(echo "$VN" | sed -e 's/-/./g');
-else
- read COMMIT COMMIT_SUBJECT <<EOF
- $(git log --oneline --abbrev=8 -n1 HEAD)
-EOF
- VN="$(dirty ${DEF_VER}.git$COMMIT)"
-fi
if test -r $GVF; then
VC=$(sed -e 's/m4_define(\[GIT_VERSION], \[//' <$GVF)
@@ -40,6 +8,9 @@ if test -r $GVF; then
else
VC=unset
fi
+
+VN=$(./git-version)
+
test "$VN" = "$VC" || {
echo >&2 "GIT_VERSION = $VN"
echo "m4_define([GIT_VERSION], [$VN])" >$GVF
diff --git a/contrib/make-git-snapshot.sh b/make-git-snapshot.sh
similarity index 55%
rename from contrib/make-git-snapshot.sh
rename to make-git-snapshot.sh
index e031223c740f..2825ac4321cd 100755
--- a/contrib/make-git-snapshot.sh
+++ b/make-git-snapshot.sh
@@ -14,16 +14,8 @@ trap 'rm -rf $WORKDIR' exit
[ -d "$REFDIR" ] && REFERENCE="--reference $REFDIR"
git clone $REFERENCE "$UPSTREAM" "$WORKDIR"
-pushd "$WORKDIR" > /dev/null
-git branch to-archive $HEAD
-read COMMIT_SHORTID COMMIT_TITLE <<EOGIT
-$(git log to-archive^..to-archive --pretty='format:%h %s')
-EOGIT
-popd > /dev/null
+VERSION=$(./git-version)
+DIRNAME="ndctl-${VERSION}"
+git archive --remote="$WORKDIR" --format=tar --prefix="$DIRNAME/" HEAD | gzip > $OUTDIR/"v${VERSION}.tar.gz"
-echo "Making git snapshot using commit: $COMMIT_SHORTID $COMMIT_TITLE"
-
-DIRNAME="$NAME-git$COMMIT_SHORTID"
-git archive --remote="$WORKDIR" --format=tar --prefix="$DIRNAME/" to-archive | xz -9 > $OUTDIR/"$DIRNAME.tar.xz"
-
-echo "Written $OUTDIR/$DIRNAME.tar.xz"
+echo "Written $OUTDIR/v${VERSION}.tar.gz"
diff --git a/contrib/ndctl.spec.in b/ndctl.spec.in
similarity index 74%
rename from contrib/ndctl.spec.in
rename to ndctl.spec.in
index da0c9cee947a..e24b31cc54f0 100644
--- a/contrib/ndctl.spec.in
+++ b/ndctl.spec.in
@@ -1,16 +1,11 @@
-%global gitcommit
-%define lname
-%define dname
-
Name: ndctl
-Version:
+Version: VERSION
Release: 1%{?dist}
Summary: Manage "libnvdimm" subsystem devices (Non-volatile Memory)
License: GPLv2
Group: Hardware/Other
Url: https://github.com/pmem/ndctl
-# Snapshot tarball can be created using: ./make-git-shapshot.sh [gitcommit]
-Source0: %{name}-git%{gitcommit}.tar.xz
+Source0: https://github.com/pmem/ndctl/archive/v%{version}.tar.gz
BuildRequires: autoconf
BuildRequires: asciidoc
@@ -26,36 +21,36 @@ BuildRequires: pkgconfig(json-c)
%description
Utility library for managing the "libnvdimm" subsystem. The "libnvdimm"
subsystem defines a kernel device model and control message interface for
-platform NVDIMM resources like those defined by the ACPI 6.0 NFIT (NVDIMM
+platform NVDIMM resources like those defined by the ACPI 6+ NFIT (NVDIMM
Firmware Interface Table).
-%package -n %dname
+%package -n DNAME
Summary: Development files for libndctl
License: LGPLv2
Group: Development/Libraries/Other
-Requires: %{lname}%{?_isa} = %{version}-%{release}
+Requires: LNAME%{?_isa} = %{version}-%{release}
-%description -n %dname
+%description -n DNAME
The %{name}-devel package contains libraries and header files for
developing applications that use %{name}.
-%package -n %lname
+%package -n LNAME
Summary: Management library for "libnvdimm" subsystem devices (Non-volatile Memory)
License: LGPLv2
Group: System/Libraries
-%description -n %lname
+%description -n LNAME
Libraries for %{name}.
%prep
-%setup -q %{?gitcommit:-n %{name}-git%{gitcommit}}
+%setup -q v%{version}
%build
-echo "" > version
+echo "VERSION" > version
./autogen.sh
-%configure --disable-static --enable-local
+%configure --disable-static --enable-local --disable-silent-rules
make %{?_smp_mflags}
%install
@@ -65,9 +60,9 @@ find $RPM_BUILD_ROOT -name '*.la' -exec rm -f {} ';'
%check
make check
-%post -n %lname -p /sbin/ldconfig
+%post -n LNAME -p /sbin/ldconfig
-%postun -n %lname -p /sbin/ldconfig
+%postun -n LNAME -p /sbin/ldconfig
%files
%defattr(-,root,root)
@@ -75,13 +70,13 @@ make check
%{_bindir}/ndctl
%{_mandir}/man1/*
-%files -n %lname
+%files -n LNAME
%defattr(-,root,root)
%doc README.md
%license COPYING licenses/BSD-MIT licenses/CC0
%{_libdir}/libndctl.so.*
-%files -n %dname
+%files -n DNAME
%defattr(-,root,root)
%license COPYING
%{_includedir}/ndctl/
diff --git a/rpmbuild.sh b/rpmbuild.sh
new file mode 100755
index 000000000000..4535b4654409
--- /dev/null
+++ b/rpmbuild.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+pushd $(dirname $0) >/dev/null
+./make-git-snapshot.sh
+popd > /dev/null
+rpmbuild -ba $(dirname $0)/rhel/ndctl.spec
diff --git a/contrib/sles/header b/sles/header
similarity index 100%
rename from contrib/sles/header
rename to sles/header
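
A minimal sketch of the leading-'v' stripping that git-version performs,
written with POSIX parameter expansion so it behaves the same under any
/bin/sh. The helper name strip_v and the sample version strings are
hypothetical and only for illustration; they are not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): remove one leading 'v'
# from a version string using only POSIX parameter expansion.
strip_v() {
	# ${1#v} removes the shortest leading match of the pattern "v";
	# a string that does not start with 'v' is returned unchanged.
	printf '%s\n' "${1#v}"
}

strip_v "v50"                    # prints 50
strip_v "v50.3.gabc1234.dirty"   # prints 50.3.gabc1234.dirty
strip_v "50"                     # prints 50
```

Because the expansion is a no-op when there is no leading 'v', the result
can be used directly for the spec file's Version: field and for the
v${VERSION}.tar.gz tarball name without a separate test.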
6 years, 6 months