Hi Wenhua,
Thanks. It would be best if you could find an easy way to reproduce your issue and then file an issue on GitHub, so the community can help you.
Sent from my iPad
On Aug 30, 2020, at 2:05 PM, Wenhua Liu <liuw(a)vmware.com> wrote:
Hi Ziye,
I tested the patch you provided. It does not help. The problem still exists.
Thanks,
-Wenhua
On 8/26/20, 10:09 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
Hi Wenhua,
Thanks for your continued verification. So there is likely an issue with the zero-copy
support in the SPDK posix socket implementation on the target side.
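For context, the posix module's zero-copy path follows the kernel's MSG_ZEROCOPY pattern: opt in via SO_ZEROCOPY, send with MSG_ZEROCOPY, then reap completion notifications from the socket error queue. A minimal standalone sketch of that pattern (simplified from the kernel's msg_zerocopy documentation, not the actual SPDK code):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

static int
send_zcopy(int fd, const void *buf, size_t len)
{
	int one = 1;

	/* Opt in once per socket. */
	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) != 0) {
		return -errno;
	}
	/* The kernel pins buf; it must stay valid until the completion arrives. */
	if (send(fd, buf, len, MSG_ZEROCOPY) < 0) {
		return -errno;
	}
	return 0;
}

/* Completions arrive on the error queue, not in the normal data stream. */
static void
reap_zcopy_completions(int fd)
{
	char control[128];
	struct msghdr msg = {0};
	struct cmsghdr *cm;
	struct sock_extended_err *serr;

	msg.msg_control = control;
	msg.msg_controllen = sizeof(control);
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0) {
		return;
	}
	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
		serr = (struct sock_extended_err *)CMSG_DATA(cm);
		if (serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
			/* Sends numbered [ee_info, ee_data] completed; buffers reusable. */
			printf("zero-copy sends %u..%u done\n", serr->ee_info, serr->ee_data);
		}
	}
}

A bug anywhere in this completion handling could plausibly surface as the failed _sock_flush you observed.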
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Thursday, August 27, 2020 1:05 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK
Hi Ziye,
I have verified that after disabling zero copy, the problem is gone. The following is the
change I made to disable it.
~/spdk$ git diff module/sock/posix/posix.c
diff --git a/module/sock/posix/posix.c b/module/sock/posix/posix.c
index 4eb1bf106..7b77289bb 100644
--- a/module/sock/posix/posix.c
+++ b/module/sock/posix/posix.c
@@ -53,9 +53,9 @@
 #define MIN_SO_SNDBUF_SIZE (2 * 1024 * 1024)
 #define IOV_BATCH_SIZE 64

-#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
-#define SPDK_ZEROCOPY
-#endif
+//#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
+//#define SPDK_ZEROCOPY
+//#endif

 struct spdk_posix_sock {
 	struct spdk_sock base;
~/spdk$
With this change, I did VM power-on and shutdown 8 times and did not see a single
"Connection Reset by Peer" issue. Without the change, I did VM power-on and
shutdown 4 times, and every time I saw at least one "Connection Reset by Peer" error
on every IO queue (4 IO queues in total).
Thanks,
-Wenhua
On 8/25/20, 9:51 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:
I did not check errno. The only thing I knew is that _sock_flush returns -1, which is
the return value of sendmsg.
Thanks,
-Wenhua
On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
Hi Wenhua,
What is the errno value when sendmsg returns -1 with the posix socket
implementation?
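If it's not easy to capture, a small wrapper would do it; here is a hypothetical debug helper (sendmsg_logged is my name for it, not an SPDK API) that could stand in for the raw sendmsg() call in _sock_flush():

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static ssize_t
sendmsg_logged(int fd, const struct msghdr *msg, int flags)
{
	ssize_t rc = sendmsg(fd, msg, flags);

	if (rc < 0) {
		/* errno tells us why the kernel rejected the write. */
		fprintf(stderr, "sendmsg(fd=%d, flags=0x%x) failed: errno %d (%s)\n",
			fd, flags, errno, strerror(errno));
	}
	return rc;
}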
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Wednesday, August 26, 2020 12:27 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK
Hi Ziye,
Back in April/May, I used SPDK 20.01 (the first release that supported FUSED
operations) in a VM and ran into this issue once in a while.
Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07
and started seeing this issue consistently. Maybe a change on our side makes the issue
easier to reproduce.
I spent a lot of time debugging this issue and found in the wire data that the FIN
flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set
when closing a TCP connection. With this information, I found it's the function
nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is
called, I wanted to print a stack trace but could not find a way, so I sent an email to the
SPDK community asking for a solution. Later I used another way to figure out the call path,
which points to where the problem happens.
I noticed the zero-copy feature earlier and tried disabling it, but it did not help (I
can try again to confirm). I started wondering if my VM itself had a problem, so I set up
another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new
target. Since I could not figure out how sendmsg was failing, and I noticed there is a
uring-based socket implementation, I wanted to give it a try, which is why I asked you.
I will let you know whether disabling zero copy helps.
Thanks,
-Wenhua
On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com>
wrote:
Hi Wenhua,
Did you reproduce the issue you mentioned in your last email with the same VM
environment (OS) and the same SPDK version? You mentioned that there is no issue with uring,
but there is an issue with posix on the same SPDK version. Can you reproduce the issue with
the latest version on the SPDK master branch?
I think the main difference between uring and posix currently is that the posix
implementation uses the zero-copy feature. Could you do an experiment and disable
the zero-copy feature manually in posix.c as shown below? Then we can first
rule out an issue with the zero-copy feature on the target side. Thanks.
#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
//#define SPDK_ZEROCOPY
#endif
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Wednesday, August 26, 2020 8:20 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK
Hi Ziye,
I'm using Ubuntu 20.04.1. The Linux kernel version seems to be 5.4.44:
~/spdk$ cat /proc/version_signature
Ubuntu 5.4.0-42.46-generic 5.4.44
~/spdk$
I downloaded, built, and installed liburing from source:
git clone
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
After switching to the uring sock implementation, the "connection reset
by peer" problem is gone. I powered on and shut down my testing VM and did not
see a single "connection reset by peer" issue. Before this, every time I
powered on my testing VM, multiple "connection reset by peer" failures happened.
Actually, I had this issue back in April/May. At that time, I could not
identify/correlate how the issue happened and did not drill down. This time, the issue
happened so frequently that it helped me dig out more information.
In summary, it seems the posix sock implementation may have some problem.
I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might
also be related to our initiator implementation.
Thanks,
-Wenhua
On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com>
wrote:
Hi Wenhua,
You need to compile SPDK with the --with-uring option. And you need to:
1. Download liburing and install it yourself (a rough sketch follows below).
2. Check your kernel version. The uring socket implementation depends on
the kernel (> 5.4.3).
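Roughly, something like this should work (assuming liburing from github.com/axboe/liburing and a standard SPDK build tree; adjust paths for your setup):
git clone https://github.com/axboe/liburing.git
cd liburing && ./configure && make && sudo make install
cd ../spdk && ./configure --with-uring && make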
What's your kernel version in the VM?
Thanks.
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Monday, August 24, 2020 3:19 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK
Hi Ziye,
I'm using SPDK NVMe-oF target.
I used some other way and figured out the following call path:
posix_sock_group_impl_poll
-> _sock_flush <------------------ failed
-> spdk_sock_abort_requests
-> _pdu_write_done
-> nvmf_tcp_qpair_disconnect
-> spdk_nvmf_qpair_disconnect
-> _nvmf_qpair_destroy
-> spdk_nvmf_poll_group_remove
-> nvmf_transport_poll_group_remove
-> nvmf_tcp_poll_group_remove
-> spdk_sock_group_remove_sock
-> posix_sock_group_impl_remove_sock
-> spdk_sock_abort_requests
-> _nvmf_ctrlr_free_from_qpair
-> _nvmf_transport_qpair_fini
-> nvmf_transport_qpair_fini
-> nvmf_tcp_close_qpair
-> spdk_sock_close
_sock_flush calls sendmsg to write the data to the socket; it's sendmsg that
fails with return value -1. I captured wire data, and in Wireshark I can see the READ
command has been received by the target as a TCP packet. In response to this TCP packet,
a TCP packet with the FIN flag set is sent to the initiator. The FIN closes the socket
connection.
I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs
inside another VM. I'm going to try another SPDK target that runs on a physical
machine.
By the way, I noticed there is a uring-based sock implementation. How
do I switch to it? It seems the default is the posix sock implementation.
Thanks,
-Wenhua
On 8/23/20, 9:55 PM, "Yang, Ziye"
<ziye.yang(a)intel.com> wrote:
Hi Wenhua,
Which SPDK applications are you using?
1. The SPDK NVMe-oF target on the target side?
2. SPDK NVMe perf or others?
nvmf_tcp_close_qpair will be called in the following possible
cases (not all listed) for the TCP transport, but spdk_nvmf_qpair_disconnect
is always the entry point.
1. qpair is not in a polling group:
   spdk_nvmf_qpair_disconnect
     nvmf_transport_qpair_fini
2. spdk_nvmf_qpair_disconnect
     ....
     _nvmf_qpair_destroy
       nvmf_transport_qpair_fini
         ..
         nvmf_tcp_close_qpair
3. spdk_nvmf_qpair_disconnect
     ....
     _nvmf_qpair_destroy
       _nvmf_ctrlr_free_from_qpair
         _nvmf_transport_qpair_fini
           ..
           nvmf_tcp_close_qpair
spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect
in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:
(1) _pdu_write_done (if there is a write error);
(2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within
30s after the target sends c2h_term_req);
(3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
(4) nvmf_tcp_sock_cb (TCP PDU handling issue).
Also, in lib/nvmf/ctrlr.c, the target side has a timer poller,
nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call
spdk_nvmf_qpair_disconnect on the related polling group associated with the controller.
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Saturday, August 22, 2020 3:15 PM
To: Storage Performance Development Kit
<spdk(a)lists.01.org>
Subject: [SPDK] Print backtrace in SPDK
Hi,
Does anyone know if there is a function in SPDK that prints a
backtrace?
I ran into a "Connection Reset by Peer" issue on the host side when
testing NVMe/TCP. I identified that it's because some queue pairs are closed unexpectedly by
calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called.
I thought that if the backtrace could be printed when this function is called, it might
help me find the root cause.
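For reference, this is the kind of helper I'm after; a standalone sketch using glibc's execinfo (compile with -rdynamic to get symbol names; as far as I can tell this is not an existing SPDK API):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void
print_backtrace(void)
{
	void *frames[64];
	int i, n;
	char **symbols;

	/* Collect up to 64 return addresses from the current call stack. */
	n = backtrace(frames, 64);
	symbols = backtrace_symbols(frames, n);
	if (symbols == NULL) {
		return;
	}
	for (i = 0; i < n; i++) {
		fprintf(stderr, "  %s\n", symbols[i]);
	}
	free(symbols);
}

I would then call print_backtrace() at the top of nvmf_tcp_close_qpair to see who invokes it.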
Thanks,
-Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org