On Fri, Aug 9, 2019 at 9:53 AM Harris, James R <james.r.harris(a)intel.com> wrote:
On 8/9/19, 7:57 AM, "SPDK on behalf of Chuck Tuffli"
<spdk-bounces(a)lists.01.org on behalf of ctuffli(a)gmail.com> wrote:
Yep, this was a bug in my code, but I'm curious as to why this breaks
SPDK. Essentially, the new module leveraged code that is itself
multi-threaded and that appears to be confusing/corrupting the SPDK
execution (BTW, having libunwind as an option helped immensely).
Moving the module's cleanup to be asynchronous fixed the double call
to module_fini. I vaguely remember seeing something in the
documentation or comments that suggested sleeping in certain contexts
was 'bad'. Any hypothesis as to what may have gone wrong when mixing
SPDK threads and other POSIX threads? TIA
Hi Chuck,
Can you explain more about the bug that was in your code? Specifically,
what might some of your non-SPDK threads have been doing while your
synchronous module_fini function was running?
The bugs were:
a) not setting spdk_bdev_module.async_fini = true
b) not returning 1 from spdk_bdev_fn_table.destruct() to (evidently)
indicate that it is async
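
To make (b) concrete, here is a minimal sketch of the "return 1 now,
report completion later" contract. It is a toy model, not SPDK code:
struct toy_bdev and every toy_* function are hypothetical stand-ins.
In a real module, the completion step would, as far as I know, be a call
to spdk_bdev_destruct_done(), with a matching "finish done" call for an
async module_fini; check the names against your SPDK version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bdev with helper-thread work to drain. */
struct toy_bdev {
    bool destruct_pending;  /* destruct returned 1, completion not yet reported */
    bool destroyed;         /* teardown fully finished */
    int  polls_remaining;   /* models helper-thread work still outstanding */
};

/* Models a destruct callback: 0 = finished now, 1 = will finish later. */
static int toy_destruct(struct toy_bdev *b)
{
    assert(b != NULL);
    if (b->polls_remaining == 0) {
        b->destroyed = true;
        return 0;
    }
    b->destruct_pending = true;
    return 1;
}

/* Models the "destruct done" notification back to the framework. */
static void toy_destruct_done(struct toy_bdev *b)
{
    assert(b->destruct_pending);
    b->destruct_pending = false;
    b->destroyed = true;
}

/* Models a poller: called repeatedly, never blocks.
 * Returns true on the call that completes the teardown. */
static bool toy_poll(struct toy_bdev *b)
{
    if (!b->destruct_pending) {
        return false;
    }
    if (--b->polls_remaining > 0) {
        return false;
    }
    toy_destruct_done(b);
    return true;
}
```

The point of the model is that toy_destruct() returns immediately in both
cases; the waiting happens across repeated toy_poll() calls rather than
inside the callback.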
From some tracing, it appears that a few pthread operations caused Linux
to reschedule the calling thread in both the bdev destruct and the
module_fini:
- pthread_cond_broadcast()
- pthread_cond_signal()
- pthread_cond_wait()
Obviously, there are other things happening during cleanup (e.g.
pthread mutex lock/unlock), but these seemed like good suspects given
that marking the code as asynchronous seemed to help. Could this explain
what I was seeing?
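
For reference, one non-blocking alternative to waiting on a condition
variable from a polling thread is to check a completion flag. This is
only a sketch under my own assumptions: struct cleanup_state and both
functions are hypothetical names, and it does not by itself confirm that
the pthread_cond_*() calls were the cause.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared between a helper pthread and a polling thread. */
struct cleanup_state {
    atomic_bool helper_done;
};

/* Called by the helper thread once its part of the cleanup is finished. */
static void helper_mark_done(struct cleanup_state *s)
{
    assert(s != NULL);
    atomic_store_explicit(&s->helper_done, true, memory_order_release);
}

/* Called repeatedly from the polling thread. Unlike pthread_cond_wait(),
 * this returns immediately whether or not the helper has finished. */
static bool cleanup_poll(struct cleanup_state *s)
{
    assert(s != NULL);
    return atomic_load_explicit(&s->helper_done, memory_order_acquire);
}
```

The polling side would call cleanup_poll() on each pass and only report
completion once it returns true.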
--chuck