[PATCH v9 0/4] iommu/smmu-v3: Workaround for hisilicon 161010801 erratum(reserve HW MSI)
by Shameer Kolothum
On certain HiSilicon platforms (hip06/hip07) the GIC ITS and PCIe RC
deviates from the standard implementation and this breaks PCIe MSI
functionality when SMMU is enabled.
The HiSilicon erratum 161010801 describes this limitation of certain
HiSilicon platforms to support the SMMU mappings for MSI transactions.
On these platforms GICv3 ITS translator is presented with the deviceID
by extending the MSI payload data to 64 bits to include the deviceID.
Hence, the PCIe controller on this platforms has to differentiate the MSI
payload against other DMA payload and has to modify the MSI payload.
This basically makes it difficult for this platforms to have a SMMU
translation for MSI.
This patch implements an ACPI and DT based quirk to reserve the hw msi
regions in the smmu-v3 driver which means these address regions will
not be translated and will be excluded from iova allocations.
To implement this quirk, the following changes are incorporated:
1. Added a generic helper function to IORT code to retrieve the
associated ITS base address from a device IORT node.
2. Added a generic helper function to of iommu code to retrieve the
associated msi controller base address from for a PCI RC
msi-mapping and also platform device msi-parent.
3. Blacklisted HiSilicon PCIe controllers on DT based hip06/hip07
platforms when SMMUv3 is enabled as there is no DT based solution
for this as of now.
Changelog:
v8 --> v9
-Thanks to Marc, fixed IORT helper function to reserve the ITS
translater region only.
-Removed the DT support for MSI reservation and blacklisted
HiSilicon PCIe controllers on DT based systems when SMMUv3 is
enabled.
v7 --> v8
Addressed comments from Rob and Lorenzo:
-Modified to use DT compatible string for errata.
-Changed logic to retrieve the msi-parent for DT case.
v6 --> v7
Addressed request from Will to add DT support for the erratum:
- added bt binding
- add of_iommu_msi_get_resv_regions()
New arm64 silicon errata entry
Rename iort_iommu_{its->msi}_get_resv_regions
v5 --> v6
Addressed comments from Robin and Lorenzo:
-No change to patch#1 .
-Reverted v5 patch#2 as this might break the platforms where this quirk
is not applicable. Provided a generic function in iommu code and added
back the quirk implementation in SMMU v3 driver(patch#3)
v4 --> v5
Addressed comments from Robin and Lorenzo:
-Added a comment to make it clear that, for now, only straightforward
HW topologies are handled while reserving ITS regions(patch #1).
v3 --> v4
Rebased on 4.13-rc1.
Addressed comments from Robin, Will and Lorenzo:
-As suggested by Robin, moved the ITS msi reservation into
iommu_dma_get_resv_regions().
-Added its_count != resv region failure case(patch #1).
v2 --> v3
Addressed comments from Lorenzo and Robin:
-Removed dev_is_pci() check in smmuV3 driver.
-Don't treat device not having an ITS mapping as an error in
iort helper function.
v1 --> v2
-patch 2/2: Invoke iort helper fn based on fwnode type(acpi).
RFCv2 -->PATCH
-Incorporated Lorenzo's review comments.
RFC v1 --> RFC v2
Based on Robin's review comments,
-Removed the generic erratum framework.
-Using IORT/MADT tables to retrieve the ITS base addr instead of vendor specific CSRT table.
Shameer Kolothum (4):
ACPI/IORT: Add msi address regions reservation helper
iommu/dma: Add a helper function to reserve HW MSI address regions for
IOMMU drivers
iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801
PCI: hisi: blacklist hip06/hip07 controllers behind SMMUv3
drivers/acpi/arm64/iort.c | 97 ++++++++++++++++++++++++++++++++++++++--
drivers/iommu/arm-smmu-v3.c | 27 ++++++++---
drivers/iommu/dma-iommu.c | 20 +++++++++
drivers/irqchip/irq-gic-v3-its.c | 3 +-
drivers/pci/dwc/pcie-hisi.c | 12 +++++
include/linux/acpi_iort.h | 7 ++-
include/linux/dma-iommu.h | 7 +++
7 files changed, 163 insertions(+), 10 deletions(-)
--
1.9.1
4 years, 7 months
[pm:bleeding-edge 87/88] include/linux/export.h:67:20: error: redefinition of '__kstrtab_acpi_subsys_freeze_noirq'
by kbuild test robot
tree: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge
head: 9b3e83b2d7070805ab1890f18d0b3f2cdf177786
commit: e1a07b46a16732e00be7d1be004858fbbc524740 [87/88] ACPI / PM: Take SMART_SUSPEND driver flag into account
config: i386-randconfig-x074-201743 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout e1a07b46a16732e00be7d1be004858fbbc524740
# save the attached .config to linux build tree
make ARCH=i386
All error/warnings (new ones prefixed by >>):
In file included from include/linux/linkage.h:6:0,
from include/linux/kernel.h:6,
from include/linux/list.h:8,
from include/linux/resource_ext.h:17,
from include/linux/acpi.h:26,
from drivers/acpi/device_pm.c:21:
>> include/linux/export.h:67:20: error: redefinition of '__kstrtab_acpi_subsys_freeze_noirq'
static const char __kstrtab_##sym[] \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
>> drivers/acpi/device_pm.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:67:20: note: previous definition of '__kstrtab_acpi_subsys_freeze_noirq' was here
static const char __kstrtab_##sym[] \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers/acpi/device_pm.c:1124:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:70:36: error: redefinition of '__ksymtab_acpi_subsys_freeze_noirq'
static const struct kernel_symbol __ksymtab_##sym \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
>> drivers/acpi/device_pm.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:70:36: note: previous definition of '__ksymtab_acpi_subsys_freeze_noirq' was here
static const struct kernel_symbol __ksymtab_##sym \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers/acpi/device_pm.c:1124:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
--
In file included from include/linux/linkage.h:6:0,
from include/linux/kernel.h:6,
from include/linux/list.h:8,
from include/linux/resource_ext.h:17,
from include/linux/acpi.h:26,
from drivers//acpi/device_pm.c:21:
>> include/linux/export.h:67:20: error: redefinition of '__kstrtab_acpi_subsys_freeze_noirq'
static const char __kstrtab_##sym[] \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers//acpi/device_pm.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:67:20: note: previous definition of '__kstrtab_acpi_subsys_freeze_noirq' was here
static const char __kstrtab_##sym[] \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers//acpi/device_pm.c:1124:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:70:36: error: redefinition of '__ksymtab_acpi_subsys_freeze_noirq'
static const struct kernel_symbol __ksymtab_##sym \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers//acpi/device_pm.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
include/linux/export.h:70:36: note: previous definition of '__ksymtab_acpi_subsys_freeze_noirq' was here
static const struct kernel_symbol __ksymtab_##sym \
^
include/linux/export.h:100:25: note: in expansion of macro '___EXPORT_SYMBOL'
#define __EXPORT_SYMBOL ___EXPORT_SYMBOL
^~~~~~~~~~~~~~~~
include/linux/export.h:107:2: note: in expansion of macro '__EXPORT_SYMBOL'
__EXPORT_SYMBOL(sym, "_gpl")
^~~~~~~~~~~~~~~
drivers//acpi/device_pm.c:1124:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
^~~~~~~~~~~~~~~~~
vim +/__kstrtab_acpi_subsys_freeze_noirq +67 include/linux/export.h
f5016932 Paul Gortmaker 2011-05-23 62
f5016932 Paul Gortmaker 2011-05-23 63 /* For every exported symbol, place a struct in the __ksymtab section */
f2355416 Nicolas Pitre 2016-01-22 64 #define ___EXPORT_SYMBOL(sym, sec) \
f5016932 Paul Gortmaker 2011-05-23 65 extern typeof(sym) sym; \
f5016932 Paul Gortmaker 2011-05-23 66 __CRC_SYMBOL(sym, sec) \
f5016932 Paul Gortmaker 2011-05-23 @67 static const char __kstrtab_##sym[] \
f5016932 Paul Gortmaker 2011-05-23 68 __attribute__((section("__ksymtab_strings"), aligned(1))) \
b92021b0 Rusty Russell 2013-03-15 69 = VMLINUX_SYMBOL_STR(sym); \
b67067f1 Nicholas Piggin 2016-08-24 70 static const struct kernel_symbol __ksymtab_##sym \
f5016932 Paul Gortmaker 2011-05-23 71 __used \
b67067f1 Nicholas Piggin 2016-08-24 72 __attribute__((section("___ksymtab" sec "+" #sym), used)) \
f5016932 Paul Gortmaker 2011-05-23 73 = { (unsigned long)&sym, __kstrtab_##sym }
f5016932 Paul Gortmaker 2011-05-23 74
:::::: The code at line 67 was first introduced by commit
:::::: f50169324df4ad942e544386d136216c8617636a module.h: split out the EXPORT_SYMBOL into export.h
:::::: TO: Paul Gortmaker <paul.gortmaker(a)windriver.com>
:::::: CC: Paul Gortmaker <paul.gortmaker(a)windriver.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
4 years, 8 months
Re: [Devel] [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type
by gengdongjiu
>
> On Wed, Oct 18, 2017 at 08:27:00PM +0800, gengdongjiu wrote:
> > For this patch(the first one), whether it can be firstly applied?
>
> Sure:
>
> Reviewed-by: Borislav Petkov <bp(a)suse.de>
Thanks Boris.
Hi Rafael,
If you think this patch is no problem, you can firstly apply this one. Boris have reviewed it. The first one is not related with the second. For the second one, I may need discuss more with James. Thanks so much.
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by gengdongjiu
HI James,
Thanks for the mail, for your some question, I think that is how to use NOTIFY_SEI, this patch just add a API to other user calling(such as KVM), because KVM needs to use that.
>
> Hi Dongjiu Geng,
>
> On 17/10/17 09:02, Dongjiu Geng wrote:
> > ARMv8.2 requires implementation of the RAS extension, in this
> > extension it adds SEI(SError Interrupt) notification type, this patch
> > adds new GHES error source SEI handling functions.
>
> This paragraph is merging two things that aren't related.
> The 'ARM v8.2 architecture extensions' have some RAS bits, which if your CPU implements v8.2 are required.
>
> ACPIv6.1 added NOTIFY_SEI as a notification type for ARMv8 systems.
>
> This patch adds a GHES function for NOTIFY_SEI. Please leave the CPU RAS extensions out of it.
>
>
> > Because this error source parsing and handling methods are similar
> > with the SEA. So share some SEA handling functions with the SEI
> >
> > Expose one API ghes_notify_abort() to external users. External modules
> > can call this exposed API to parse and handle the SEA or SEI.
>
> This series doesn't add a caller/user for this new API, so why do we need to do this now?
In RAS virtualization patch, for SEI, firstly I does not call APEI GHES driver, you suggest me calling APEI code. So I add NOTIFY_SEI support.
>
> (I still haven't had a usable answer for 'what does your firmware do when SError is masked', but I'll go beat that drum on the other thread).
>
>
> More important for the APEI code is: How do SEA and SEI interact?
The SEA and SEI can interact each other, in some case, it should be allow nesting
maybe we need to consider a better way. But I think the method should be implemented in another place, not in this pace.
That is say, it is implemented in the caller or user.
this patch just add a NOTIFY_SEI support, so that other user can call such as KVM.
>
> As far as I can see they can both interrupt each other, which isn't something the single in_nmi() path in APEI can handle. I thinks we should
> fix this first.
> (I'll try and polish my RFC that had a stab at that...)
Your fixing method use some lock? but I think it is not a good method. For example, when we handling SEI, then happen SEA, I think we need to handle the SEA immediately instead of waiting SEI finish handling.
>
>
> SEA gets away with a lot of things because its synchronous. SEI isn't. Xie XiuQi pointed to the memory_failure_queue() code. We can use
> this directly from SEA, but not SEI. (what happens if an SError arrives while we are queueing memory_failure work from an IRQ).
From my understand, xiexiuqi's code mainly solve
memory_failure() execution later than SEA/SEI handling, memory_failure() execution is in a workqueue process context,not in a SEA/SEI context. They are in different contex.
For example, after the SEA/SEI handing finished, the memory_failure() workqueue does not start running。so he add a flag to make sure this workqueue can be timely scheduled.
For your question: "what happens if an SError arrives while we are queueing memory_failure work from an IRQ",
I think SEA have the same issue, For example, when we are queueing memory_failure work from an IRQ, it happen SEA.
Do you mean xiexiuqi's patches can solve this problem?
>
> The one that scares me is the trace-point reporting stuff. What happens if an SError arrives while we are enabling a trace point? (these are
> static-keys right?)
>
>
> I don't think we can just plumb SEI in like this and be done with it.
> (I'm looking at teasing out the estatus cache code from being x86:NMI only. This way we solve the same 'cant do this from NMI context'
> with the same code'.)
Do you mean like this for caller/user to make sure it is from NMI context? Or do something in the GHES driver code to make sure it is in NMI context.
.............................
if (IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
if (interrupts_enabled(regs))
nmi_enter();
ret = ghes_notify_seI();
if (interrupts_enabled(regs))
nmi_exit();
}
..........................
>
>
> Thanks,
>
> James
>
>
>
> boring nits below:
>
> > Note: For the SEI(SError Interrupt), it is asynchronous external
> > abort, the error address recorded by firmware may be not accurate.
> > If not accurate, EL3 firmware needs to identify the address to a
> > invalid value.
>
> This paragraph keeps cropping up. Who expects an address with an SError?
> We don't get one for IRQs, but that never needs stating.
I do not check more code about the IRQ. Do you mean there is no address for IRQ?
If so, why you have question "what happens if an SError arrives while we are queueing memory_failure work from an IRQ"?
Only when have address, the memory_failure work can be called.
>
>
> > Cc: Borislav Petkov <bp(a)suse.de>
> > Cc: James Morse <james.morse(a)arm.com>
> > Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
> > Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
>
> > Tested-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
>
> (It's expected you test your own code)
>
>
>
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index
> > 2509e4f..c98c1b3 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -585,7 +585,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> > if (interrupts_enabled(regs))
> > nmi_enter();
> >
> > - ret = ghes_notify_sea();
> > + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
> >
> > if (interrupts_enabled(regs))
> > nmi_exit();
> > @@ -682,7 +682,7 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> > int ret = -ENOENT;
> >
> > if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> > - ret = ghes_notify_sea();
> > + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
> >
> > return ret;
> > }
> > diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> > index de14d49..47fcb0c 100644
> > --- a/drivers/acpi/apei/Kconfig
> > +++ b/drivers/acpi/apei/Kconfig
> > @@ -54,6 +54,21 @@ config ACPI_APEI_SEA
> > option allows the OS to look for such hardware error record, and
> > take appropriate action.
> >
> > +config ACPI_APEI_SEI
> > + bool "APEI Asynchronous SError Interrupt logging/recovering support"
> > + depends on ARM64 && ACPI_APEI_GHES
> > + default y
> > + help
> > + This option should be enabled if the system supports
> > + firmware first handling of SEI (asynchronous SError interrupt).
> > +
> > + SEI happens with asynchronous external abort for errors on device
> > + memory reads on ARMv8 systems. If a system supports firmware first
> > + handling of SEI, the platform analyzes and handles hardware error
> > + notifications from SEI, and it may then form a HW error record for
> > + the OS to parse and handle. This option allows the OS to look for
> > + such hardware error record, and take appropriate action.
> > +
> > config ACPI_APEI_MEMORY_FAILURE
> > bool "APEI memory error recovering support"
> > depends on ACPI_APEI && MEMORY_FAILURE diff --git
> > a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 3eee30a..24b4233 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -815,43 +815,67 @@ static struct notifier_block ghes_notifier_hed =
> > {
> >
> > #ifdef CONFIG_ACPI_APEI_SEA
> > static LIST_HEAD(ghes_sea);
> > +#endif
> > +
> > +#ifdef CONFIG_ACPI_APEI_SEI
> > +static LIST_HEAD(ghes_sei);
> > +#endif
> >
> > +#if defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ACPI_APEI_SEI)
> > /*
> > - * Return 0 only if one of the SEA error sources successfully
> > reported an error
> > - * record sent from the firmware.
> > + * Return 0 only if one of the SEA or SEI error sources successfully
> > + * reported an error record sent from the firmware.
> > */
> > -int ghes_notify_sea(void)
> > +int ghes_notify_abort(u8 type)
> > {
> > struct ghes *ghes;
> > + struct list_head *head = NULL;
> > int ret = -ENOENT;
> >
> > - rcu_read_lock();
> > - list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> > - if (!ghes_proc(ghes))
> > - ret = 0;
>
> > + if (type == ACPI_HEST_NOTIFY_SEA)
> > + head = &ghes_sea;
> > + else if (type == ACPI_HEST_NOTIFY_SEI)
> > + head = &ghes_sei;
>
> Surely if I only have one of CONFIG_ACPI_APEI_SE{A,I} this can't be compiled.
>
>
> > +
> > + if (head) {
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(ghes, head, list) {
> > + if (!ghes_proc(ghes))
> > + ret = 0;
> > + }
> > + rcu_read_unlock();
> > }
> > - rcu_read_unlock();
> > return ret;
> > }
> >
> > -static void ghes_sea_add(struct ghes *ghes)
> > +static void ghes_abort_add(struct ghes *ghes)
> > {
> > - mutex_lock(&ghes_list_mutex);
> > - list_add_rcu(&ghes->list, &ghes_sea);
> > - mutex_unlock(&ghes_list_mutex);
> > + struct list_head *head = NULL;
> > + u8 notify_type = ghes->generic->notify.type;
> > +
>
> > + if (notify_type == ACPI_HEST_NOTIFY_SEA)
> > + head = &ghes_sea;
> > + else if (notify_type == ACPI_HEST_NOTIFY_SEI)
> > + head = &ghes_sei;
>
> And here.
>
>
> > +
> > + if (head) {
> > + mutex_lock(&ghes_list_mutex);
> > + list_add_rcu(&ghes->list, head);
> > + mutex_unlock(&ghes_list_mutex);
> > + }
> > }
> >
> > -static void ghes_sea_remove(struct ghes *ghes)
> > +static void ghes_abort_remove(struct ghes *ghes)
> > {
> > mutex_lock(&ghes_list_mutex);
> > list_del_rcu(&ghes->list);
> > mutex_unlock(&ghes_list_mutex);
> > synchronize_rcu();
> > }
> > -#else /* CONFIG_ACPI_APEI_SEA */
> > -static inline void ghes_sea_add(struct ghes *ghes) { } -static inline
> > void ghes_sea_remove(struct ghes *ghes) { } -#endif /*
> > CONFIG_ACPI_APEI_SEA */
> > +#else
> > +static inline void ghes_abort_add(struct ghes *ghes) { } static
> > +inline void ghes_abort_remove(struct ghes *ghes) { } #endif
> >
> > #ifdef CONFIG_HAVE_ACPI_APEI_NMI
> > /*
> > @@ -1084,6 +1108,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
> > goto err;
> > }
> > break;
> > + case ACPI_HEST_NOTIFY_SEI:
> > + if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
> > + pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
> > + generic->header.source_id);
> > + goto err;
> > + }
> > + break;
> > case ACPI_HEST_NOTIFY_NMI:
> > if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
> > pr_warn(GHES_PFX "Generic hardware error source: %d notified via
> > NMI interrupt is not supported!\n", @@ -1153,7 +1184,8 @@ static int ghes_probe(struct platform_device *ghes_dev)
> > break;
> >
> > case ACPI_HEST_NOTIFY_SEA:
> > - ghes_sea_add(ghes);
> > + case ACPI_HEST_NOTIFY_SEI:
> > + ghes_abort_add(ghes);
> > break;
> > case ACPI_HEST_NOTIFY_NMI:
> > ghes_nmi_add(ghes);
> > @@ -1206,7 +1238,8 @@ static int ghes_remove(struct platform_device *ghes_dev)
> > break;
> >
> > case ACPI_HEST_NOTIFY_SEA:
> > - ghes_sea_remove(ghes);
> > + case ACPI_HEST_NOTIFY_SEI:
> > + ghes_abort_remove(ghes);
> > break;
> > case ACPI_HEST_NOTIFY_NMI:
> > ghes_nmi_remove(ghes);
> > diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index
> > 9061c5c..ec6f4ba 100644
> > --- a/include/acpi/ghes.h
> > +++ b/include/acpi/ghes.h
> > @@ -118,6 +118,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
> > (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> > section = acpi_hest_get_next(section))
> >
> > -int ghes_notify_sea(void);
> > +int ghes_notify_abort(u8 type);
> >
> > #endif /* GHES_H */
> >
4 years, 8 months
Re: [Devel] [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type
by gengdongjiu
On 2017/10/18 18:17, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 11:04:28AM +0800, gengdongjiu wrote:
>> ARM does not have ACPI_HEST_NOTIFY_NMI notification, which should only
>> used by x86. In the code, I see those guards are never used.
> Yeah, I see it now.
Borislav/Rafael,
For this patch(the first one), whether it can be firstly applied?
This patch is code clean and not related with the second one.
For the second, I may discuss more with James.
Thanks so much in advance.
>
> Thx.
>
> --
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by James Morse
Hi Dongjiu Geng,
On 17/10/17 09:02, Dongjiu Geng wrote:
> ARMv8.2 requires implementation of the RAS extension, in
> this extension it adds SEI(SError Interrupt) notification
> type, this patch adds new GHES error source SEI handling
> functions.
This paragraph is merging two things that aren't related.
The 'ARM v8.2 architecture extensions' have some RAS bits, which if your CPU
implements v8.2 are required.
ACPIv6.1 added NOTIFY_SEI as a notification type for ARMv8 systems.
This patch adds a GHES function for NOTIFY_SEI. Please leave the CPU RAS
extensions out of it.
> Because this error source parsing and handling
> methods are similar with the SEA. So share some SEA handling
> functions with the SEI
>
> Expose one API ghes_notify_abort() to external users. External
> modules can call this exposed API to parse and handle the
> SEA or SEI.
This series doesn't add a caller/user for this new API, so why do we need to do
this now?
(I still haven't had a usable answer for 'what does your firmware do when SError
is masked', but I'll go beat that drum on the other thread).
More important for the APEI code is: How do SEA and SEI interact?
As far as I can see they can both interrupt each other, which isn't something
the single in_nmi() path in APEI can handle. I thinks we should fix this first.
(I'll try and polish my RFC that had a stab at that...)
SEA gets away with a lot of things because its synchronous. SEI isn't. Xie XiuQi
pointed to the memory_failure_queue() code. We can use this directly from SEA,
but not SEI. (what happens if an SError arrives while we are queueing
memory_failure work from an IRQ).
The one that scares me is the trace-point reporting stuff. What happens if an
SError arrives while we are enabling a trace point? (these are static-keys right?)
I don't think we can just plumb SEI in like this and be done with it.
(I'm looking at teasing out the estatus cache code from being x86:NMI only. This
way we solve the same 'cant do this from NMI context' with the same code'.)
Thanks,
James
boring nits below:
> Note: For the SEI(SError Interrupt), it is asynchronous external
> abort, the error address recorded by firmware may be not accurate.
> If not accurate, EL3 firmware needs to identify the address to a
> invalid value.
This paragraph keeps cropping up. Who expects an address with an SError?
We don't get one for IRQs, but that never needs stating.
> Cc: Borislav Petkov <bp(a)suse.de>
> Cc: James Morse <james.morse(a)arm.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
> Tested-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
(It's expected you test your own code)
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 2509e4f..c98c1b3 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -585,7 +585,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> if (interrupts_enabled(regs))
> nmi_enter();
>
> - ret = ghes_notify_sea();
> + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>
> if (interrupts_enabled(regs))
> nmi_exit();
> @@ -682,7 +682,7 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> int ret = -ENOENT;
>
> if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> - ret = ghes_notify_sea();
> + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>
> return ret;
> }
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index de14d49..47fcb0c 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -54,6 +54,21 @@ config ACPI_APEI_SEA
> option allows the OS to look for such hardware error record, and
> take appropriate action.
>
> +config ACPI_APEI_SEI
> + bool "APEI Asynchronous SError Interrupt logging/recovering support"
> + depends on ARM64 && ACPI_APEI_GHES
> + default y
> + help
> + This option should be enabled if the system supports
> + firmware first handling of SEI (asynchronous SError interrupt).
> +
> + SEI happens with asynchronous external abort for errors on device
> + memory reads on ARMv8 systems. If a system supports firmware first
> + handling of SEI, the platform analyzes and handles hardware error
> + notifications from SEI, and it may then form a HW error record for
> + the OS to parse and handle. This option allows the OS to look for
> + such hardware error record, and take appropriate action.
> +
> config ACPI_APEI_MEMORY_FAILURE
> bool "APEI memory error recovering support"
> depends on ACPI_APEI && MEMORY_FAILURE
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3eee30a..24b4233 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -815,43 +815,67 @@ static struct notifier_block ghes_notifier_hed = {
>
> #ifdef CONFIG_ACPI_APEI_SEA
> static LIST_HEAD(ghes_sea);
> +#endif
> +
> +#ifdef CONFIG_ACPI_APEI_SEI
> +static LIST_HEAD(ghes_sei);
> +#endif
>
> +#if defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ACPI_APEI_SEI)
> /*
> - * Return 0 only if one of the SEA error sources successfully reported an error
> - * record sent from the firmware.
> + * Return 0 only if one of the SEA or SEI error sources successfully
> + * reported an error record sent from the firmware.
> */
> -int ghes_notify_sea(void)
> +int ghes_notify_abort(u8 type)
> {
> struct ghes *ghes;
> + struct list_head *head = NULL;
> int ret = -ENOENT;
>
> - rcu_read_lock();
> - list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> - if (!ghes_proc(ghes))
> - ret = 0;
> + if (type == ACPI_HEST_NOTIFY_SEA)
> + head = &ghes_sea;
> + else if (type == ACPI_HEST_NOTIFY_SEI)
> + head = &ghes_sei;
Surely if I only have one of CONFIG_ACPI_APEI_SE{A,I} this can't be compiled.
> +
> + if (head) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(ghes, head, list) {
> + if (!ghes_proc(ghes))
> + ret = 0;
> + }
> + rcu_read_unlock();
> }
> - rcu_read_unlock();
> return ret;
> }
>
> -static void ghes_sea_add(struct ghes *ghes)
> +static void ghes_abort_add(struct ghes *ghes)
> {
> - mutex_lock(&ghes_list_mutex);
> - list_add_rcu(&ghes->list, &ghes_sea);
> - mutex_unlock(&ghes_list_mutex);
> + struct list_head *head = NULL;
> + u8 notify_type = ghes->generic->notify.type;
> +
> + if (notify_type == ACPI_HEST_NOTIFY_SEA)
> + head = &ghes_sea;
> + else if (notify_type == ACPI_HEST_NOTIFY_SEI)
> + head = &ghes_sei;
And here.
> +
> + if (head) {
> + mutex_lock(&ghes_list_mutex);
> + list_add_rcu(&ghes->list, head);
> + mutex_unlock(&ghes_list_mutex);
> + }
> }
>
> -static void ghes_sea_remove(struct ghes *ghes)
> +static void ghes_abort_remove(struct ghes *ghes)
> {
> mutex_lock(&ghes_list_mutex);
> list_del_rcu(&ghes->list);
> mutex_unlock(&ghes_list_mutex);
> synchronize_rcu();
> }
> -#else /* CONFIG_ACPI_APEI_SEA */
> -static inline void ghes_sea_add(struct ghes *ghes) { }
> -static inline void ghes_sea_remove(struct ghes *ghes) { }
> -#endif /* CONFIG_ACPI_APEI_SEA */
> +#else
> +static inline void ghes_abort_add(struct ghes *ghes) { }
> +static inline void ghes_abort_remove(struct ghes *ghes) { }
> +#endif
>
> #ifdef CONFIG_HAVE_ACPI_APEI_NMI
> /*
> @@ -1084,6 +1108,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
> goto err;
> }
> break;
> + case ACPI_HEST_NOTIFY_SEI:
> + if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
> + pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
> + generic->header.source_id);
> + goto err;
> + }
> + break;
> case ACPI_HEST_NOTIFY_NMI:
> if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
> pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1153,7 +1184,8 @@ static int ghes_probe(struct platform_device *ghes_dev)
> break;
>
> case ACPI_HEST_NOTIFY_SEA:
> - ghes_sea_add(ghes);
> + case ACPI_HEST_NOTIFY_SEI:
> + ghes_abort_add(ghes);
> break;
> case ACPI_HEST_NOTIFY_NMI:
> ghes_nmi_add(ghes);
> @@ -1206,7 +1238,8 @@ static int ghes_remove(struct platform_device *ghes_dev)
> break;
>
> case ACPI_HEST_NOTIFY_SEA:
> - ghes_sea_remove(ghes);
> + case ACPI_HEST_NOTIFY_SEI:
> + ghes_abort_remove(ghes);
> break;
> case ACPI_HEST_NOTIFY_NMI:
> ghes_nmi_remove(ghes);
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 9061c5c..ec6f4ba 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -118,6 +118,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
> (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> section = acpi_hest_get_next(section))
>
> -int ghes_notify_sea(void);
> +int ghes_notify_abort(u8 type);
>
> #endif /* GHES_H */
>
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by gengdongjiu
On 2017/10/18 18:04, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 10:44:48AM +0100, James Morse wrote:
>> What should we call this thing?
> My only pet peeve is having abbreviations everywhere and nothing
> explaining them.
>
> So whatever you guys decide upon and as long as there's an explanation
> what those things mean and you stick with that name, is perfectly fine
> with me.
surely, we will, thanks Borislav's reminder.
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by James Morse
Hi Borislav!
On 18/10/17 10:25, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 05:17:27PM +0800, gengdongjiu wrote:
>> Thanks Borislav, can I write it as asynchronous exception or
>> asynchronous abort?
>
> WTF?!
Yup.
> The thing is abbreviated as "SEI" and apparently means "System Error
> Interrupt". Nothing else.
ARM has 'external abort', which are either synchronous or asynchronous, both are
delivered as different types of exception.
Asynchronous external abort is treated as a special kind of interrupt, 'SError
Interrupt', (where SError stands for System Error, but its rarely written like
that). 'SEI' is a relatively new abbreviation for SError interrupt.
What should we call this thing? In the ACPI code I'd prefer 'SEI' as that is
what the ACPI spec calls it. Here we are talking about an GHES notification.
But in the arm64 arch code this should be called SError Interrupt as this is
what the ARM-ARM calls it. This code cares about exception routing and interrupt
masking.
But, I don't really care.
Thanks,
James
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by gengdongjiu
On 2017/10/18 17:06, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 01:00:44PM +0800, gengdongjiu wrote:
>> SError is System Error, which is a asynchronous exception in the Internal CPU.
>>
>> In the ARM RAS Extension, there are mainly two type abort for CPU:
>> SEA(Synchronous External Abort)
>> SEI(SError Interrupt)
> And you're not writing it out as "System Error" because ...?
Thanks Borislav, can I write it as asynchronous exception or asynchronous abort?
4 years, 8 months
Re: [Devel] [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8
by gengdongjiu
Hi Borislav,
On 2017/10/18 1:06, Borislav Petkov wrote:
> On Tue, Oct 17, 2017 at 04:02:21PM +0800, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension, in
>> this extension it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions. Because this error source parsing and handling
>> methods are similar with the SEA. So share some SEA handling
>> functions with the SEI
>>
>> Expose one API ghes_notify_abort() to external users. External
>> modules can call this exposed API to parse and handle the
>> SEA or SEI.
>>
>> Note: For the SEI(SError Interrupt), it is asynchronous external
>> abort, the error address recorded by firmware may be not accurate.
>> If not accurate, EL3 firmware needs to identify the address to a
>> invalid value.
>>
>> Cc: Borislav Petkov <bp(a)suse.de>
>> Cc: James Morse <james.morse(a)arm.com>
>> Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
>> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
>> Tested-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
>> ---
>> arch/arm64/mm/fault.c | 4 +--
>> drivers/acpi/apei/Kconfig | 15 ++++++++++
>> drivers/acpi/apei/ghes.c | 71 ++++++++++++++++++++++++++++++++++-------------
>> include/acpi/ghes.h | 2 +-
>> 4 files changed, 70 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 2509e4f..c98c1b3 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -585,7 +585,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>> if (interrupts_enabled(regs))
>> nmi_enter();
>>
>> - ret = ghes_notify_sea();
>> + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>
>> if (interrupts_enabled(regs))
>> nmi_exit();
>> @@ -682,7 +682,7 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>> int ret = -ENOENT;
>>
>> if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> - ret = ghes_notify_sea();
>> + ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>
>> return ret;
>> }
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index de14d49..47fcb0c 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -54,6 +54,21 @@ config ACPI_APEI_SEA
>> option allows the OS to look for such hardware error record, and
>> take appropriate action.
>>
>> +config ACPI_APEI_SEI
>> + bool "APEI Asynchronous SError Interrupt logging/recovering support"
>
> What is "SError" ?
SError is System Error, which is a asynchronous exception in the Internal CPU.
In the ARM RAS Extension, there are mainly two type abort for CPU:
SEA(Synchronous External Abort)
SEI(SError Interrupt)
>
>> + depends on ARM64 && ACPI_APEI_GHES
>> + default y
>> + help
>> + This option should be enabled if the system supports
>> + firmware first handling of SEI (asynchronous SError interrupt).
>> +
>> + SEI happens with asynchronous external abort for errors on device
>> + memory reads on ARMv8 systems. If a system supports firmware first
>> + handling of SEI, the platform analyzes and handles hardware error
>> + notifications from SEI, and it may then form a HW error record for
>> + the OS to parse and handle. This option allows the OS to look for
>> + such hardware error record, and take appropriate action.
>> +
>> config ACPI_APEI_MEMORY_FAILURE
>> bool "APEI memory error recovering support"
>> depends on ACPI_APEI && MEMORY_FAILURE
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 3eee30a..24b4233 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -815,43 +815,67 @@ static struct notifier_block ghes_notifier_hed = {
>>
>> #ifdef CONFIG_ACPI_APEI_SEA
>> static LIST_HEAD(ghes_sea);
>> +#endif
>> +
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +#endif
>>
>> +#if defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ACPI_APEI_SEI)
>> /*
>> - * Return 0 only if one of the SEA error sources successfully reported an error
>> - * record sent from the firmware.
>> + * Return 0 only if one of the SEA or SEI error sources successfully
>> + * reported an error record sent from the firmware.
>> */
>> -int ghes_notify_sea(void)
>> +int ghes_notify_abort(u8 type)
>
> Adding "abort" everywhere makes it worse: what does this function now
> do, notify or abort or both?
This function is used to notify APEI driver to parse APEI table and do some
recovery, such as calling memrory_failure() to identify the address to a
poisoned memory and deliver SIGBUS to related application.
>
> Ditto for the remaining ones. Please think of a better name.
Ok, thanks for your good suggestion, I will consider to use a better name.
>
>
> ghes_notify_sei() sounds much better to me, for example. And then you
> can add a whole set of *_sei() functions similar to the *_sea() ones and
> then you don't have to do all that checking of the type but simply call
> the proper function set. They're not huge so that the duplication of
> code should be minimal.
Thanks for your suggestion again. I will flowing that and add a whole set
of *sei() functions.
>
4 years, 8 months