Hi Ben,
I revised a version and submitted it again.
Best Regards
Ziye Yang
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Walker, Benjamin
Sent: Monday, January 7, 2019 10:18 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Some thoughts for the code in rdma.c
Hi Ziye,
The latest version of your patch (
) no
longer removes the refcnt, which is good. You've caught a real bug here with getting a
disconnect async event prior to assigning a qpair to a poll group. I made one suggestion
regarding the implementation, but good catch on this one.
Thanks,
Ben
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Yang, Ziye
Sent: Monday, December 24, 2018 7:19 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Some thoughts for the code in rdma.c
Hi Sasha,
You can see this info:
https://github.com/spdk/spdk/issues/537
Best Regards
Ziye Yang
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Sasha
Kotchubievsky
Sent: Monday, December 24, 2018 2:13 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c
Hi,
If refcount mechanism doesn't work correct, it should be fixed. I
can't comment on that right now and should do some homework before.
It would be very appropriated if you elaborate, what happens in fio
at receiving "ctl+c" in target. I suppose, "ctrl+c" sends SIGINT to
the process.
So, if target receives SIGINT, fio crashes. That's correct ?
Best regards
Sasha
On 12/24/2018 7:49 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> The problem I described is stately in:
>
https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
> It is not for the performance but for the functionality.
> The first issue is: what I described in the patch : The thread send
> by the qpair-
>group->thread is NULL.
> The second issue is: The current usage of inc/dec_refcnt is not exactly correct.
For spdk_nvmf_qpair_disconnect can be also be directly called in the
upper layer (nvmf, for example, if the qpair cannot be added into the polling group).
However I found that the inc_refcnt is only called by the same thread,
but the dec_refcnt can be called by other threads. In the current
usage, it seems that we suggest that there will be qpair destroy from
the host side. However, it can be actively called by the NVMe-oF
target, however we lack of the matched inc_refcnt.
>
> Thanks.
>
>
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Sasha
> Kotchubievsky
> Sent: Monday, December 24, 2018 1:40 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> We can't ignore those events.
> We have 3 sources of events that triggers QP closing:
> - rdmacm events
> - driver's anync events
> - completions with error status
>
> Disconnect event from rdmacm is an indication of graceful shutdown
> from
initiator. Driver's events and completions with error are indication of errors.
Target can get only disconnect event, or combination of errors and
disconnect (see my example below).
> From performance perspective, "data path" (CQ polling) and
> "control path" (async events from driver) shouldn't be mixed in the
> same thread. Fetching async events involves kernel and should be
> separated from "data path". So, some coordination between threads is
> needed (refcount, for example)
>
> It's not clear for me, which problem you're dealing with. Maybe it
> can be
solved using current approach.
>
> Best regards
> Sasha
>
>
> On 12/24/2018 3:27 AM, Yang, Ziye wrote:
>> Hi Sasha,
>>
>> You mentioned that"
>>
>> " NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE
from target side and only then generate disconnect event in librdmacm.
Target gets 2 events: completion with error code and disconnect event
(doesn't matter in which order)".
>>
>> So my question is that: Could we ignore one of this event? (I mean
>> do nothing
for one event.) Since if there is qpair error, we will finally destroy the qpair.
>>
>> Thanks.
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Sasha
>> Kotchubievsky
>> Sent: Sunday, December 23, 2018 10:09 PM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> Hi Ziye,
>>
>> Which problem do you solve: performance, stability or remove code
complexity? Can you elaborate?
>>
>> If you ask me, the refcout, removed by your patch, is needed
>> because qp can
be disconnected in response for different events served in different
threads. I believe, some synchronization is required.
>>
>> For example, NVME-OF initiator closes QP in the middle of
RDMA_READ/RDMA_WRITE from target side and only then generate
disconnect event in librdmacm. Target gets 2 events: completion with
error code and disconnect event (doesn't matter in which order). In
that case, qp disconnect in target can be triggered by two events
coming from different sources and possible handled in different threads.
>>
>> Best regards
>>
>> Sasha
>>
>> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>>> Hi Ben,
>>>
>>> According to my knowledge, I think that the inc/dec_refcnt is
>>> still not
needed. I know that the qpair management related with the group (there
is a thread which manages the rdma_cm event and another is for
polling). The reason that caused this issue is that: we use the
spdk_send_msg to add the qpair to the group during the connection, but
we return the connection ready to the host side early. So there is a
window, that the host sends the disconnect, and there will be no group
binding to the qpair. My idea is that we do this checks in
spdk_nvmf_qpair_disconnect. If the qpair is not adding to the group, there will be two
reasons:
>>>
>>> 1 In the group adding operation, there is an issue, so the qpair
>>> is freed. So
we do not need to handle the disconnect and free the qpair again.
>>> 2 The qpair is still not added to the group.
>>>
>>> So I suggest if the qpair is not added in the case 2, we should
>>> deny the disconnect operation or delay such operation. (This is
>>> related with exceptional case, I think this handling is still reasonable.
>>> For normal operation, it will not do the connect, and disconnect
>>> with nothing operation. So deny the disconnect request this time
>>> is OK, and the initiator can do the disconnect later, and we can
>>> make sure the disconnect will only be accepted that the qpair is
>>> binding to the group. And I do not think that this strategy to
>>> handle the exceptional case will have any side effect.)
>>>
>>> BTW, only checking the qpair->group is not sufficient, but we can
>>> have
other state of the qpair (which is already defined in struct
spdk_nvmf_qpair_state).
>>>
>>> I would like to still post some patches to remove the
>>> inc/def_refcnt (may be
as test), and let those patches tested by our validation team to see
whether this really can solve this issue.
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>> -----Original Message-----
>>> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Walker,
>>> Benjamin
>>> Sent: Friday, December 21, 2018 1:16 AM
>>> To: spdk(a)lists.01.org
>>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>>
>>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>>> Hi all,
>>>>
>>>> I would like to discuss the following functions in
>>>> /lib/nvmf/rdma.c
>>>>
>>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is
necessary.
>>>> This function is used in nvmf_rdma_disconnect. And there is a
>>>> description as
>>>> follows:
>>>>
>>>> /* Read the group out of the qpair. This is
>>>> normally set and accessed only from
>>>> * the thread that created the group. Here,
>>>> we're not on that thread necessarily.
>>>> * The data member qpair->group begins it's life
>>>> as NULL and then is assigned to
>>>> * a pointer and never changes. So fortunately
>>>> reading this and checking for
>>>> * non-NULL is thread safe in the x86_64 memory
model.
>>>> */
>>>>
>>>> But for group adding for the qpair, it is related
>>>> with this
>>>> function: spdk_nvmf_poll_group_add, if the group is not ready,
>>>> we can just call spdk_nvmf_qpair_disconnect in that function. And
>>>> the spdk_nvmf_qpair_disconnect should have the strategy to
>>>> prevent the second time entering if the qpair is not destroyed due to
async behavior.
>>> There are two threads involved here typically - a management
>>> thread that is
polling the RDMA CM Event channel and for IBV async events (which are
independent things), and the poll group thread that owns the qpair.
>>>
>>> The RDMA CM Event channel notifies the target of two relevant
>>> events for a qpair
>>> - connect and disconnect. The IBV async event mechanism
>>> independently
notifies the target of IBV_EVENT_QP_FATAL conditions. The target's
response to both an RDMA_CM_EVENT_DISCONNECT and to an
IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are
scenarios where we could legitimately get both events for the same
qpair. The processing of both of these events involves passing a
message to the poll group thread, which is an asynchronous process and
may take an arbitrarily long time. The refcnt coordinates between
those two events, such that the first event does not destroy the qpair while there is
another event in- flight still.
>>>
>>> The qpair itself, after the initial set up phase, is also managed
>>> by the poll
group thread. If the user destroys a poll group, or manipulates the
poll group to add or remove a different qpair, or the qpair gets a
completion entry that it begins processing, then the qpair data
structure is being modified by the poll group thread. You can't
disconnect it on the management thread while this is happening or you
risk corrupting some of the data structures - in particular the poll group's qpair
list.
>>>
>>> Checking that the qpair hasn't been assigned to a poll group yet
>>> (by looking
for qpair.group set to NULL) isn't a sufficient protection because a
message to add the qpair to the poll group may already be in-flight. I
made some suggestions to use the refcnt to deal with this scenario on
your most recent
patch:
>>>
>>>
https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>>
>>>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>>>
>>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>>
>>>>
>>>> Generally, my idea is that, we should use
>>>> spdk_nvmf_qpair_disconnect directly in every transport, and do
>>>> not use two many async messaging for doing this because I do not
>>>> think those
are necessary.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>> Best Regards
>>>> Ziye Yang
>>>>
>>>> _______________________________________________
>>>> SPDK mailing list
>>>> SPDK(a)lists.01.org
>>>>
https://lists.01.org/mailman/listinfo/spdk
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>>
https://lists.01.org/mailman/listinfo/spdk
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>>
https://lists.01.org/mailman/listinfo/spdk
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>>
https://lists.01.org/mailman/listinfo/spdk
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>>
https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
>
https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
>
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org