[MPTCP][PATCH v2 mptcp-next 00/10] ADD_ADDR: ports support
by Geliang Tang
v2:
- change mptcp_out_options's port field to CPU byte order.
- keep mptcp_options_received's port field in CPU byte order.
- add two new patches to simplify ADD_ADDR suboption writing.
- update the mptcp_add_addr_len helper to compute the length by adding up sizes.
- add more commit messages.
v1:
This series is the first version of ADD_ADDR ports support. I have solved
the listener problem which I mentioned at the meeting on 15th of October
by adding a new listening socket from the userspace (see patch 8). Up to
now this patchset works well.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/54
Geliang Tang (10):
mptcp: unify ADD_ADDR and echo suboptions writing
mptcp: unify ADD_ADDR and ADD_ADDR6 suboptions writing
mptcp: add port support for ADD_ADDR suboption writing
mptcp: use adding up size when get ADD_ADDR length
mptcp: add the outgoing ADD_ADDR port support
mptcp: send out dedicated packet for ADD_ADDR using port
mptcp: add port parameter for mptcp_pm_announce_addr
mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlink
selftests: mptcp: add port argument for pm_nl_ctl
selftests: mptcp: add testcases for ADD_ADDR with port
include/net/mptcp.h | 1 +
net/mptcp/options.c | 89 +++++++++++--------
net/mptcp/pm.c | 14 +--
net/mptcp/pm_netlink.c | 28 ++++--
net/mptcp/protocol.h | 36 +++++---
.../testing/selftests/net/mptcp/mptcp_join.sh | 26 +++++-
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 18 ++++
7 files changed, 153 insertions(+), 59 deletions(-)
--
2.26.2
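For readers following the "adding up size" change to mptcp_add_addr_len: the sketch below is not the kernel helper, only a userspace illustration of the idea. It sums the fixed header, the address, the optional port and the truncated HMAC that RFC 8684 leaves out of echoes. The function and macro names are made up for this example, and any alignment padding the kernel may account for is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Sizes in octets, following the ADD_ADDR layout in RFC 8684.
 * The macro names are illustrative, not the kernel's.
 */
#define ADD_ADDR_BASE_LEN	4	/* kind, length, subtype/flags, address ID */
#define ADD_ADDR_IPV4_LEN	4
#define ADD_ADDR_IPV6_LEN	16
#define ADD_ADDR_PORT_LEN	2
#define ADD_ADDR_HMAC_LEN	8	/* truncated HMAC, absent in echoes */

/* Hypothetical helper: build the option length by adding up each part. */
static unsigned int add_addr_len(int family, bool echo, bool port)
{
	unsigned int len = ADD_ADDR_BASE_LEN;

	assert(family == AF_INET || family == AF_INET6);

	len += (family == AF_INET6) ? ADD_ADDR_IPV6_LEN : ADD_ADDR_IPV4_LEN;
	if (port)
		len += ADD_ADDR_PORT_LEN;
	if (!echo)
		len += ADD_ADDR_HMAC_LEN;

	return len;
}
```

With these sizes an IPv4 ADD_ADDR carrying a port is 18 octets and its echo is 10; the IPv6 equivalents are 30 and 22.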
1 year, 7 months
[PATCH net-next 0/6] mptcp: Miscellaneous MPTCP fixes
by Mat Martineau
This is a collection of small fixup and minor enhancement patches that
have accumulated in the MPTCP tree while net-next was closed. These are
prerequisites for larger changes we have queued up.
Patch 1 refines receive buffer autotuning.
Patches 2 and 4 are some minor locking and refactoring changes.
Patch 3 improves GRO and RX coalescing with MPTCP skbs.
Patches 5 and 6 add a sysctl for tuning ADD_ADDR retransmission timeout
and corresponding test code.
Florian Westphal (3):
mptcp: adjust mptcp receive buffer limit if subflow has larger one
mptcp: use _fast lock version in __mptcp_move_skbs
mptcp: split mptcp_clean_una function
Geliang Tang (2):
mptcp: add a new sysctl add_addr_timeout
selftests: mptcp: add ADD_ADDR timeout test case
Paolo Abeni (1):
tcp: propagate MPTCP skb extensions on xmit splits
include/net/mptcp.h | 21 ++++-
net/ipv4/tcp_output.c | 3 +
net/mptcp/ctrl.c | 14 +++
net/mptcp/pm_netlink.c | 8 +-
net/mptcp/protocol.c | 67 +++++++++----
net/mptcp/protocol.h | 1 +
tools/testing/selftests/net/mptcp/config | 10 ++
.../testing/selftests/net/mptcp/mptcp_join.sh | 94 ++++++++++++++-----
8 files changed, 171 insertions(+), 47 deletions(-)
base-commit: 1fb74191988fd1cc340c4b2fdaf4c47d2a7d1d17
--
2.29.2
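A small userspace sketch for checking the ADD_ADDR timeout sysctl from patch 5. It assumes the knob appears as /proc/sys/net/mptcp/add_addr_timeout and holds a plain integer; both the path and the fallback value used below are assumptions for illustration, and the patch itself is the reference. read_sysctl_long() is a hypothetical helper name.

```c
#include <stdio.h>

/* Read a single integer from a sysctl file.
 * Returns 'fallback' if the file is missing or does not start with a number.
 */
static long read_sysctl_long(const char *path, long fallback)
{
	FILE *f = fopen(path, "r");
	long val;

	if (!f)
		return fallback;
	if (fscanf(f, "%ld", &val) != 1)
		val = fallback;
	fclose(f);

	return val;
}
```

A caller could use read_sysctl_long("/proc/sys/net/mptcp/add_addr_timeout", 120) and would get the fallback on kernels without the sysctl.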
[MPTCP][PATCH mptcp-next 0/8] ADD_ADDR: ports support
by Geliang Tang
This series is the first version of ADD_ADDR ports support. I have solved
the listener problem which I mentioned at the meeting on 15th of October
by adding a new listening socket from the userspace (see patch 8). Up to
now this patchset works well.
TODO:
I added 2 octets of padding to the ADD_ADDR port suboption for alignment (see
patch 1). We need to drop this padding.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/54
Geliang Tang (8):
mptcp: add ADD_ADDR port support for writing options
mptcp: add the outgoing ADD_ADDR port support
mptcp: send out ack for ADD_ADDR with port
mptcp: add port argument for mptcp_pm_announce_addr
mptcp: add the incoming ADD_ADDR port support
mptcp: add ADD_ADDR port support for netlink
selftests: mptcp: add ADD_ADDR port support for pm_nl_ctl
selftests: mptcp: add testcases for ADD_ADDR with port
include/net/mptcp.h | 1 +
net/mptcp/options.c | 80 +++++++++++++++----
net/mptcp/pm.c | 14 ++--
net/mptcp/pm_netlink.c | 28 +++++--
net/mptcp/protocol.h | 31 ++++---
.../testing/selftests/net/mptcp/mptcp_join.sh | 26 +++++-
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 18 +++++
7 files changed, 162 insertions(+), 36 deletions(-)
--
2.26.2
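To illustrate the padding mentioned in the TODO: a 2-octet port leaves the suboption 2 octets short of a 32-bit boundary. The sketch below is a userspace illustration, not the code from patch 1. It writes a port kept in CPU byte order in network byte order and then pads to the next 4-octet boundary. Padding with TCP no-op octets is an assumption made here, and put_port_padded() and OPT_PAD_NOP are hypothetical names.

```c
#include <arpa/inet.h>	/* htons */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_PAD_NOP	1	/* TCP no-op option kind, used here as padding */

/* Hypothetical helper: append a port held in CPU byte order to an option
 * buffer in network byte order, then pad to a 4-octet boundary.
 * Returns the offset just past the padding.
 */
static size_t put_port_padded(uint8_t *buf, size_t off, uint16_t port_cpu)
{
	uint16_t port_be = htons(port_cpu);

	memcpy(buf + off, &port_be, sizeof(port_be));
	off += sizeof(port_be);

	while (off % 4)
		buf[off++] = OPT_PAD_NOP;

	return off;
}
```

Writing port 8080 at offset 8 stores 0x1f 0x90 followed by two padding octets, so the next option starts at offset 12.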
[Weekly meetings] MoM - 29th of October 2020
by Matthieu Baerts
Hello everyone,
Today, we just had our 122nd meeting with Mat and Ossama (Intel OTC),
Christoph (Apple), Paolo and Florian (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Accepted patches:
- The list of accepted patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=3
netdev (if mptcp ML is in cc) (/):
/
our repo (by: Florian Westphal, Geliang Tang, Matthieu Baerts,
Paolo Abeni):
1389109 [v3,mptcp-next] Squash to "mptcp: send out dedicated ADD_ADDR
packet"
1387031 [mptcp-next] Squash to "selftests: mptcp: add link failure test
case"
1386433 [mptcp-next,v3] mptcp: track window announced to peer
1385912 [mptcp-next] Squash to "selftests: mptcp: add link failure test
case"
1385644 [net-next] Squash-to: "mptcp: move page frag allocation in
mptcp_send...
1385380 [v5,mptcp-next,3/3] selftests: mptcp: add ADD_ADDR IPv6 test cases
1385379 [v5,mptcp-next,2/3] mptcp: send out dedicated ADD_ADDR packet
1385378 [v5,mptcp-next,1/3] mptcp: change add_addr_signal type
1385365 Squash to "selftests: mptcp: add ADD_ADDR timeout test case" v2
1385069 [net-next] Squash-to: "mptcp: refactor shutdown and close"
1384990 [net-next] mptcp: keep unaccepted MPC subflow into join list
1384773 [net] mptcp: add missing memory scheduling in the rx path
Pending patches:
- The list of pending patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=*
netdev (if mptcp ML is in cc) (by: Paolo Abeni):
[net] mptcp: add missing memory scheduling in the rx path
our repo (by: Florian Westphal, Geliang Tang, Paolo Abeni):
1370700: RFC: [RFC,2/4] tcp: move selected mptcp helpers to tcp.h/mptcp.h
1370701: RFC: [RFC,3/4] mptcp: add mptcp reset option support
1370702: RFC: [RFC,4/4] tcp: parse tcp options contained in reset packets:
- WIP
1375893: Under Review: [RFC,mptpcp-next] mptcp: add ooo prune support:
- Not sure if it will be needed
- best is certainly to wait for refactoring from Paolo
- keep it in mind for later → RFC
1387845: RFC: MPTCP stream performances:
- non trivial problem
- perf unstable in export branch
- receiver is not sending ACKs when it should
- also happening on net-next but problem less visible due to
different way we enqueue packets on the send side
- after we moved skb from subflow to mptcp, TCP window is (not
updated?). No hook so far in MPTCP side to solve this
- Better to read Paolo's report in the ML, there are more details
than here :)
- Paolo is working on a less hackish way to avoid dupacks (covering
the same TCP seq but having different MPTCP options):
- but performance is impacted → from 30Gbps to 22Gbps
- Paolo will share the patch
- Maybe a solution: remove the workqueue for the receive part.
Patch would be bigger.
1389827: New: [mptcp-next] mptcp: add mptcp_pm_should_add_signal_echo
helper:
- New helper (nice) but unclear why it is alone.
- is it to be squashed earlier? (yes it is → Done)
- is it to be applied before another one already in the export branch?
1389851: New: [mptcp-next,1/8] mptcp: add ADD_ADDR port support for
writing options
1389852: New: [mptcp-next,2/8] mptcp: add the outgoing ADD_ADDR port support
1389853: New: [mptcp-next,3/8] mptcp: send out ack for ADD_ADDR with port
1389854: New: [mptcp-next,4/8] mptcp: add port argument for
mptcp_pm_announce_addr
1389855: New: [mptcp-next,5/8] mptcp: add the incoming ADD_ADDR port support
1389856: New: [mptcp-next,6/8] mptcp: add ADD_ADDR port support for netlink
1389857: New: [mptcp-next,7/8] selftests: mptcp: add ADD_ADDR port
support for pm_nl_ctl
1389858: New: [mptcp-next,8/8] selftests: mptcp: add testcases for
ADD_ADDR with port:
- to be reviewed
- seems the cover-letter was not sent to the ML (resend by Geliang)
Issues on Github:
https://github.com/multipath-tcp/mptcp_net-next/issues/
Recently opened (latest from last week: 101)
104 [syzkaller] general protection fault in skb_release_data [bug]
[syzkaller]:
- still no reproducer
103 [syzkaller] WARNING in inet_csk_listen_stop [bug] [syzkaller]:
- TODO
Bugs (opened, flagged as "bug" and assigned)
94 Packetdrill: after a received DATA_FIN, no new packets can be
treated [bug] @dcaratti:
- WIP
85 Packetdrill: multiple timeout reported by the CI [bug] @matttbe:
- WIP
Bugs (opened and flagged as "bug" and not assigned)
104 [syzkaller] general protection fault in skb_release_data [bug]
[syzkaller]
103 [syzkaller] WARNING in inet_csk_listen_stop [bug] [syzkaller]
99 simult_flows selftest is unstable: remaining sockets in
TIME-WAIT state [bug]:
- can be closed, should be rare to have that
- normal to have the sockets in TIME-WAIT but we exceeded
expected time
70 [syzkaller] WARNING in mptcp_reset_timer [bug] [syzkaller]
65 clearing properly the status in listen() [bug]
56 msk connection state set without msk lock [bug]
In Progress (opened and assigned)
96 Python: add support for IPPROTO_MPTCP [enhancement] @matttbe
76 [gs]etsockopt per subflow: BPF [enhancement] @matttbe
54 ADD_ADDR: ports support [enhancement] @geliangtang
43 [syzkaller] Change syzkaller to exercise MPTCP inet_diag
interface [enhancement] [syzkaller] @cpaasch
Recently closed (since last week)
102 mptcp_pm_create_subflow_or_signal_addr: suspicious
rcu_dereference_check() usage [bug] @geliangtang
98 dss_ssn_specified_client packetdrill test fails (timeout) [bug]
@dcaratti
55 ADD_ADDR: IPv6 support [enhancement] @geliangtang
31 Allow MPTCP + SYN_COOKIES [enhancement]
FYI: Current Roadmap:
- Bugs: https://github.com/multipath-tcp/mptcp_net-next/projects/2
- Current merge window (5.11):
https://github.com/multipath-tcp/mptcp_net-next/projects/6
- For later: https://github.com/multipath-tcp/mptcp_net-next/projects/4
Commits to send to net-next:
- can be sent all together
- *Mat* can send those: (thanks!)
f98f5b9c8df3 selftests: mptcp: add ADD_ADDR timeout test case
54636b15a670 mptcp: add a new sysctl add_addr_timeout
d1d7fa643c41 mptcp: split mptcp_clean_una function
14dd19c7f04d tcp: propagate MPTCP skb extensions on xmit splits
4d4469b0b558 mptcp: use _fast lock version in __mptcp_move_skbs
dbb10e76de38 mptcp: adjust mptcp receive buffer limit if subflow has
larger one
Some issues with Fedora + NM + net-next:
- root cause is outside network apparently
- hopefully the next RC should be OK
Force apps to use MPTCP:
- package LD_PRELOAD script?
- but not easy for services (e.g. launched by systemd)
- BPF cgroup? No socket groups for the moment
- sysctl per NS? Might be a first solution but still people might
want to select only one app...
- *Mat* is going to share an RFC and we can discuss on the ML
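For context on the LD_PRELOAD idea, here is a rough sketch of the protocol rewrite such a socket() wrapper could apply, plus a helper that asks for MPTCP and falls back to plain TCP. This is not an existing tool's code: mptcp_protocol_for() and open_stream_socket() are hypothetical names, and the wrapper itself (dlsym() plumbing) is left out.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value used by Linux */
#endif

/* Decide which protocol a wrapped socket() call should really use:
 * TCP stream sockets over IPv4/IPv6 are switched to MPTCP, everything
 * else is passed through untouched.
 */
static int mptcp_protocol_for(int domain, int type, int protocol)
{
	int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

	if ((domain == AF_INET || domain == AF_INET6) &&
	    base_type == SOCK_STREAM &&
	    (protocol == 0 || protocol == IPPROTO_TCP))
		return IPPROTO_MPTCP;

	return protocol;
}

/* Try MPTCP first; fall back to TCP if the kernel refuses. */
static int open_stream_socket(int domain)
{
	int fd = socket(domain, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		fd = socket(domain, SOCK_STREAM, IPPROTO_TCP);

	return fd;
}
```

As noted above, this kind of per-process override does not help much for services started by systemd, which is why the sysctl and BPF options are also on the table.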
Extra tests:
- news about Syzkaller? (Christoph):
- running the latest syzkaller without extra modifications
- news about interop with mptcp.org? (Christoph):
- /
- news about Intel's kbuild? (Mat):
- still a bit more mptcp_join.sh failures
- but still no more details
- packetdrill (Davide):
- /
- CI (Matth):
- /
Next meeting:
- We propose to have the next meeting on Thursday, the 6th of November.
- Usual UTC time: 16:00 UTC (8am PST, 5pm CET, Midnight CST)
- /!\ it looks like some regions change time next week-end, please
reply to this email if this is an issue for you!
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20201106
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you on Thursday,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
[MPTCP][PATCH v2 mptcp-next] Squash to "mptcp: change add_addr_signal type"
by Geliang Tang
This patch added a new helper mptcp_pm_should_add_signal_echo.
Reviewed-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang(a)gmail.com>
---
v2:
- make it a squash-to patch.
- change 'Subject' from 'mptcp: add mptcp_pm_should_add_signal_echo helper'
to 'Squash to "mptcp: change add_addr_signal type"'
This patch will conflict with 'mptcp: send out dedicated ADD_ADDR packet':
"""
<<<<<<< HEAD
static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
{
	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
=======
static inline bool mptcp_pm_should_add_signal_ipv6(struct mptcp_sock *msk)
{
	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_IPV6);
>>>>>>> 20761fc28b53... mptcp: send out dedicated ADD_ADDR packet
}
"""
Please resolve the conflict like this:
"""
static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
{
	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
}
static inline bool mptcp_pm_should_add_signal_ipv6(struct mptcp_sock *msk)
{
	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_IPV6);
}
"""
---
net/mptcp/pm.c | 2 +-
net/mptcp/protocol.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index bc6619670d37..c2c12f02a263 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -186,7 +186,7 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
if (!mptcp_pm_should_add_signal(msk))
goto out_unlock;
- *echo = READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
+ *echo = mptcp_pm_should_add_signal_echo(msk);
if (remaining < mptcp_add_addr_len(msk->pm.local.family, *echo))
goto out_unlock;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 977e74dc45eb..77eae8addc91 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -524,6 +524,11 @@ static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_SIGNAL);
}
+static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
+{
+ return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
+}
+
static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)
{
return READ_ONCE(msk->pm.rm_addr_signal);
--
2.26.2
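For readers less familiar with the add_addr_signal bitmask, a userspace sketch of the pattern these helpers follow: one flag word, one bit per pending action, one small predicate per bit. The bit positions and the pm_state type below are assumptions for illustration, and the kernel's READ_ONCE() is dropped because this toy version is single-threaded.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(nr)	(1U << (nr))

/* Bit positions inside the flag word; the numeric values are assumed. */
enum add_addr_signal_bit {
	ADD_ADDR_SIGNAL,
	ADD_ADDR_ECHO,
	ADD_ADDR_IPV6,
};

/* Stand-in for the path manager state that holds the flag word. */
struct pm_state {
	uint8_t add_addr_signal;
};

static bool should_add_signal(const struct pm_state *pm)
{
	return pm->add_addr_signal & BIT(ADD_ADDR_SIGNAL);
}

static bool should_add_signal_echo(const struct pm_state *pm)
{
	return pm->add_addr_signal & BIT(ADD_ADDR_ECHO);
}

static bool should_add_signal_ipv6(const struct pm_state *pm)
{
	return pm->add_addr_signal & BIT(ADD_ADDR_IPV6);
}
```

Keeping one predicate per bit means callers such as mptcp_pm_add_addr_signal() do not have to repeat the mask expression.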
[MPTCP][PATCH mptcp-next] mptcp: add mptcp_pm_should_add_signal_echo helper
by Geliang Tang
This patch added a new helper mptcp_pm_should_add_signal_echo.
Signed-off-by: Geliang Tang <geliangtang(a)gmail.com>
---
net/mptcp/pm.c | 2 +-
net/mptcp/protocol.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 83f59a428560..75c5040e8d5d 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -198,7 +198,7 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
if (!mptcp_pm_should_add_signal(msk))
goto out_unlock;
- *echo = READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
+ *echo = mptcp_pm_should_add_signal_echo(msk);
if (remaining < mptcp_add_addr_len(msk->pm.local.family, *echo))
goto out_unlock;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7a5c16f176c0..d29c6a4749eb 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -529,6 +529,11 @@ static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_SIGNAL);
}
+static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
+{
+ return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
+}
+
static inline bool mptcp_pm_should_add_signal_ipv6(struct mptcp_sock *msk)
{
return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_IPV6);
--
2.26.2
[PATCH net] mptcp: add missing memory scheduling in the rx path
by Paolo Abeni
When moving the skbs from the subflow into the msk receive
queue, we must schedule there the required amount of memory.
Try to borrow the required memory from the subflow, if needed,
so that we leverage the existing TCP heuristic.
Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/protocol.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 185dacb39781..e7419fd15d84 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -274,6 +274,15 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
skb_ext_reset(skb);
skb_orphan(skb);
+ /* try to fetch required memory from subflow */
+ if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
+ if (ssk->sk_forward_alloc < skb->truesize)
+ goto drop;
+ __sk_mem_reclaim(ssk, skb->truesize);
+ if (!sk_rmem_schedule(sk, skb, skb->truesize))
+ goto drop;
+ }
+
/* the skb map_seq accounts for the skb offset:
* mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
* value
@@ -301,6 +310,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
* will retransmit as needed, if needed.
*/
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
+drop:
mptcp_drop(sk, skb);
return false;
}
--
2.26.2
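The control flow added above can be modelled in userspace as follows. This is a toy model only: the toy_* types and functions are invented stand-ins for the socket memory accounting, with a single shared pool, and they do not reproduce the real behaviour of sk_rmem_schedule() or __sk_mem_reclaim().

```c
#include <stdbool.h>

/* Toy shared memory pool and per-socket reservation. */
struct toy_pool {
	int free;
};

struct toy_sock {
	int forward_alloc;	/* memory already reserved for this socket */
	struct toy_pool *pool;
};

/* Reserve enough memory for 'size' bytes, taking the shortfall from the pool. */
static bool toy_rmem_schedule(struct toy_sock *sk, int size)
{
	int need = size - sk->forward_alloc;

	if (need <= 0)
		return true;
	if (sk->pool->free < need)
		return false;
	sk->pool->free -= need;
	sk->forward_alloc += need;
	return true;
}

/* Hand part of a socket's reservation back to the pool. */
static void toy_mem_reclaim(struct toy_sock *sk, int amount)
{
	sk->forward_alloc -= amount;
	sk->pool->free += amount;
}

/* Same shape as the patch: schedule on the msk, else borrow from the
 * subflow and retry. Returns false when the skb would be dropped.
 */
static bool account_moved_skb(struct toy_sock *msk, struct toy_sock *ssk,
			      int truesize)
{
	if (toy_rmem_schedule(msk, truesize))
		return true;
	if (ssk->forward_alloc < truesize)
		return false;
	toy_mem_reclaim(ssk, truesize);
	return toy_rmem_schedule(msk, truesize);
}
```

The borrow step only helps when the subflow has already reserved at least the skb's truesize; otherwise the skb is dropped and the peer is expected to retransmit.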
[RFC PATCH] mptcp: refine MPTCP-level ack scheduling
by Paolo Abeni
Sending timely MPTCP-level acks is somewhat difficult when the insertion
into the msk receive queue is performed by the worker.
It needs TCP-level dup-ack to notify the MPTCP-level
ack_seq increase, as both the TCP-level ack seq and the
rcv window are unchanged.
We can actually avoid processing incoming data with the
worker, and let the subflow or recvmsg() send acks as needed.
When recvmsg() moves the skbs inside the msk receive queue,
the msk space is still unchanged, so tcp_cleanup_rbuf() could
end up skipping TCP-level ack generation. Anyway, when
__mptcp_move_skbs() is invoked, a known amount of bytes is
going to be consumed soon: we update the rcv wnd computation, taking
them into account.
Additionally we need to explicitly trigger tcp_cleanup_rbuf()
when recvmsg() consumes a significant amount of the receive buffer.
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 74 +++++++++++++++++++-------------------------
net/mptcp/protocol.h | 7 +++++
net/mptcp/subflow.c | 4 +--
4 files changed, 40 insertions(+), 46 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 248e3930c0cb..8a59b3e44599 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -530,6 +530,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
opts->ext_copy.ack64 = 0;
}
opts->ext_copy.use_ack = 1;
+ WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
/* Add kind/length/subtype/flag overhead if mapping is not populated */
if (dss_size == 0)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a6bd06c724d5..0a618cd26df7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -405,7 +405,7 @@ static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
}
-static void mptcp_send_ack(struct mptcp_sock *msk)
+static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
{
struct mptcp_subflow_context *subflow;
@@ -413,8 +413,13 @@ static void mptcp_send_ack(struct mptcp_sock *msk)
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
lock_sock(ssk);
- tcp_send_ack(ssk);
+ if (force)
+ tcp_send_ack(ssk);
+ else
+ tcp_cleanup_rbuf(ssk, 1);
release_sock(ssk);
+ if (!force)
+ break;
}
}
@@ -466,7 +471,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
ret = true;
mptcp_set_timeout(sk, NULL);
- mptcp_send_ack(msk);
+ mptcp_send_ack(msk, true);
if (sock_flag(sk, SOCK_DEAD))
return ret;
@@ -489,7 +494,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
unsigned int moved = 0;
bool more_data_avail;
struct tcp_sock *tp;
- u32 old_copied_seq;
bool done = false;
int sk_rbuf;
@@ -506,7 +510,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
pr_debug("msk=%p ssk=%p", msk, ssk);
tp = tcp_sk(ssk);
- old_copied_seq = tp->copied_seq;
do {
u32 map_remaining, offset;
u32 seq = tp->copied_seq;
@@ -571,10 +574,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
}
} while (more_data_avail);
- *bytes += moved;
- if (tp->copied_seq != old_copied_seq)
- tcp_cleanup_rbuf(ssk, 1);
-
+ *bytes = moved;
return done;
}
@@ -678,19 +678,8 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
goto wake;
- if (move_skbs_to_msk(msk, ssk))
- goto wake;
-
- /* mptcp socket is owned, release_cb should retry */
- if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
- &sk->sk_tsq_flags)) {
- sock_hold(sk);
+ move_skbs_to_msk(msk, ssk);
- /* need to try again, its possible release_cb() has already
- * been called after the test_and_set_bit() above.
- */
- move_skbs_to_msk(msk, ssk);
- }
wake:
if (wake)
sk->sk_data_ready(sk);
@@ -1522,9 +1511,9 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
msk->rcvq_space.time = mstamp;
}
-static bool __mptcp_move_skbs(struct mptcp_sock *msk)
+static bool __mptcp_move_skbs(struct mptcp_sock *msk, unsigned int rcv)
{
- unsigned int moved = 0;
+ unsigned int moved_total = 0;
bool done;
/* avoid looping forever below on racing close */
@@ -1534,6 +1523,7 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
__mptcp_flush_join_list(msk);
do {
struct sock *ssk = mptcp_subflow_recv_lookup(msk);
+ unsigned int moved;
bool slowpath;
if (!ssk)
@@ -1541,12 +1531,17 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
slowpath = lock_sock_fast(ssk);
done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+ moved_total += moved;
+ if (moved_total && rcv) {
+ WRITE_ONCE(msk->rmem_pending, min(rcv, moved_total));
+ tcp_cleanup_rbuf(ssk, 1);
+ WRITE_ONCE(msk->rmem_pending, 0);
+ }
unlock_sock_fast(ssk, slowpath);
} while (!done);
- if (mptcp_ofo_queue(msk) || moved > 0) {
- if (!mptcp_check_data_fin((struct sock *)msk))
- mptcp_send_ack(msk);
+ if (mptcp_ofo_queue(msk) || moved_total > 0) {
+ mptcp_check_data_fin((struct sock *)msk);
return true;
}
return false;
@@ -1570,8 +1565,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
__mptcp_flush_join_list(msk);
- while (len > (size_t)copied) {
- int bytes_read;
+ for (;;) {
+ int bytes_read, old_space;
bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
if (unlikely(bytes_read < 0)) {
@@ -1583,9 +1578,14 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
copied += bytes_read;
if (skb_queue_empty(&sk->sk_receive_queue) &&
- __mptcp_move_skbs(msk))
+ __mptcp_move_skbs(msk, len - copied))
continue;
+ /* be sure to advertize window change */
+ old_space = READ_ONCE(msk->old_wspace);
+ if ((tcp_space(sk) - old_space) >= old_space)
+ mptcp_send_ack(msk, false);
+
/* only the master socket status is relevant here. The exit
* conditions mirror closely tcp_recvmsg()
*/
@@ -1638,7 +1638,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
/* .. race-breaker: ssk might have gotten new data
* after last __mptcp_move_skbs() returned false.
*/
- if (unlikely(__mptcp_move_skbs(msk)))
+ if (unlikely(__mptcp_move_skbs(msk, 0)))
set_bit(MPTCP_DATA_READY, &msk->flags);
} else if (unlikely(!test_bit(MPTCP_DATA_READY, &msk->flags))) {
/* data to read but mptcp_wait_data() cleared DATA_READY */
@@ -1870,7 +1870,6 @@ static void mptcp_worker(struct work_struct *work)
if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
__mptcp_close_subflow(msk);
- __mptcp_move_skbs(msk);
if (mptcp_send_head(sk))
mptcp_push_pending(sk, 0);
@@ -2491,8 +2490,7 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
return -EOPNOTSUPP;
}
-#define MPTCP_DEFERRED_ALL (TCPF_DELACK_TIMER_DEFERRED | \
- TCPF_WRITE_TIMER_DEFERRED)
+#define MPTCP_DEFERRED_ALL (TCPF_WRITE_TIMER_DEFERRED)
/* this is very alike tcp_release_cb() but we must handle differently a
* different set of events
@@ -2510,16 +2508,6 @@ static void mptcp_release_cb(struct sock *sk)
sock_release_ownership(sk);
- if (flags & TCPF_DELACK_TIMER_DEFERRED) {
- struct mptcp_sock *msk = mptcp_sk(sk);
- struct sock *ssk;
-
- ssk = mptcp_subflow_recv_lookup(msk);
- if (!ssk || sk->sk_state == TCP_CLOSE ||
- !schedule_work(&msk->work))
- __sock_put(sk);
- }
-
if (flags & TCPF_WRITE_TIMER_DEFERRED) {
mptcp_retransmit_handler(sk);
__sock_put(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7a5c16f176c0..fe24e480689e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -221,10 +221,12 @@ struct mptcp_sock {
u64 rcv_data_fin_seq;
struct sock *last_snd;
int snd_burst;
+ int old_wspace;
atomic64_t snd_una;
atomic64_t wnd_end;
unsigned long timer_ival;
u32 token;
+ int rmem_pending;
unsigned long flags;
bool can_ack;
bool fully_established;
@@ -259,6 +261,11 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
return (struct mptcp_sock *)sk;
}
+static inline int __mptcp_space(const struct sock *sk)
+{
+ return tcp_space(sk) + READ_ONCE(mptcp_sk(sk)->rmem_pending);
+}
+
static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
{
const struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1e9a72af67dc..dcfb5c65924e 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -850,8 +850,6 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
sk_eat_skb(ssk, skb);
if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len)
subflow->map_valid = 0;
- if (incr)
- tcp_cleanup_rbuf(ssk, incr);
}
static bool subflow_check_data_avail(struct sock *ssk)
@@ -973,7 +971,7 @@ void mptcp_space(const struct sock *ssk, int *space, int *full_space)
const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
const struct sock *sk = subflow->conn;
- *space = tcp_space(sk);
+ *space = __mptcp_space(sk);
*full_space = tcp_full_space(sk);
}
--
2.26.2
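The two window-related pieces of this RFC can be restated as a small userspace model. The toy_msk type and its free_space field are stand-ins invented here for the msk and tcp_space(); only the arithmetic is taken from the diff above.

```c
#include <stdbool.h>

struct toy_msk {
	int free_space;		/* stand-in for tcp_space(sk) */
	int rmem_pending;	/* bytes recvmsg() is about to consume */
	int old_wspace;		/* space recorded when the last ack was built */
};

/* Space reported to the subflows: current free space plus the bytes
 * that are known to be consumed soon.
 */
static int toy_mptcp_space(const struct toy_msk *msk)
{
	return msk->free_space + msk->rmem_pending;
}

/* Same comparison as in mptcp_recvmsg(): worth an ack once the free
 * space has grown by at least the previously recorded amount.
 */
static bool should_advertise_window(const struct toy_msk *msk)
{
	return (msk->free_space - msk->old_wspace) >= msk->old_wspace;
}
```

In other words, the check fires when the free space has at least doubled since the last ack recorded old_wspace.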
[selftests] f2ff7f11f9: WARNING:suspicious_RCU_usage
by kernel test robot
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: f2ff7f11f9a74842245db52d685bf9bc7ac2c4b1 ("selftests: mptcp: add ADD_ADDR IPv6 test cases")
https://github.com/multipath-tcp/mptcp_net-next.git export
in testcase: kernel-selftests
version: kernel-selftests-x86_64-b5a583fb-1_20201015
with following parameters:
group: kselftests-mptcp
ucode: 0xdc
test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
on test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 28G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <lkp(a)intel.com>
[ 229.193156] WARNING: suspicious RCU usage
[ 229.197723] 5.9.0-13449-gf2ff7f11f9a7 #1 Tainted: G I
[ 229.204734] -----------------------------
[ 229.209277] include/net/sock.h:1915 suspicious rcu_dereference_check() usage!
[ 229.216990]
[ 229.216990] other info that might help us debug this:
[ 229.216990]
[ 229.226621]
[ 229.226621] rcu_scheduler_active = 2, debug_locks = 1
[ 229.234252] 3 locks held by kworker/2:1/64:
[ 229.239016] #0: ffff888100054938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1be/0x580
[ 229.249063] #1: ffffc9000029fe58 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: process_one_work+0x1be/0x580
[ 229.259745] #2: ffff888750f14c60 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp_worker+0x47/0x900
[ 229.268913]
[ 229.268913] stack backtrace:
[ 229.274409] CPU: 2 PID: 64 Comm: kworker/2:1 Tainted: G I 5.9.0-13449-gf2ff7f11f9a7 #1
[ 229.284150] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016
[ 229.292076] Workqueue: events mptcp_worker
[ 229.296740] Call Trace:
[ 229.299765] dump_stack+0x8d/0xb5
[ 229.303631] __sk_dst_check+0xa7/0xe0
[ 229.307860] inet6_csk_route_socket+0x1a5/0x440
[ 229.312963] ? inet6_csk_xmit+0x58/0x240
[ 229.317401] inet6_csk_xmit+0x58/0x240
[ 229.321703] __tcp_transmit_skb+0x571/0xc80
[ 229.326421] mptcp_pm_check_send_dedicated_add_addr_packet+0x4c/0x80
[ 229.333929] mptcp_pm_create_subflow_or_signal_addr+0x659/0x700
[ 229.340370] mptcp_worker+0x68a/0x900
[ 229.344553] process_one_work+0x23e/0x580
[ 229.349134] worker_thread+0x50/0x3c0
[ 229.353324] ? process_one_work+0x580/0x580
[ 229.358076] kthread+0x133/0x180
[ 229.361883] ? kthread_park+0xa0/0xa0
[ 229.366100] ret_from_fork+0x22/0x30
[ 234.629864] # 19 unused signal address IPv6 syn[ ok ] - synack[ ok ] - ack[ ok ]
[ 234.629869]
[ 234.646748] # add[fail] got 0 ADD_ADDR[s] expected 1
[ 234.646753]
[ 234.663869] # - echo [fail] got 0 ADD_ADDR echo[s] expected 1
[ 234.663874]
[ 234.672854] # Server ns stats
[ 234.672858]
[ 234.680940] # MPTcpExtMPCapableSYNRX 1 0.0
[ 234.680945]
[ 234.691230] # MPTcpExtMPCapableACKRX 1 0.0
[ 234.691234]
[ 234.701766] # MPTcpExtMPTCPRetrans 5 0.0
[ 234.701770]
[ 234.712131] # MPTcpExtDuplicateData 1 0.0
[ 234.712135]
[ 234.721565] # Client ns stats
[ 234.721569]
[ 234.728499] # MPTcpExtMPTCPRetrans 1 0.0
[ 234.728503]
[ 234.739086] # MPTcpExtDuplicateData 5 0.0
[ 234.739090]
[ 234.840939] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[ 234.878710] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 234.916540] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth3: link becomes ready
[ 234.954915] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth4: link becomes ready
[ 235.841647] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 241.248714] # 20 single address IPv6 syn[ ok ] - synack[ ok ] - ack[ ok ]
[ 241.248720]
[ 241.274168] # add[ ok ] - echo [ ok ]
[ 241.274174]
[ 241.397805] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[ 241.435917] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 241.474418] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth3: link becomes ready
[ 241.512504] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth4: link becomes ready
[ 242.433629] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 247.824119] # 21 signal address, ADD_ADDR6 timeout syn[ ok ] - synack[ ok ] - ack[ ok ]
[ 247.824125]
[ 247.850751] # add[ ok ] - echo [ ok ]
[ 247.850756]
[ 247.976931] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[ 248.014492] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 248.052311] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth3: link becomes ready
[ 248.090896] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth4: link becomes ready
[ 248.961779] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 254.389366] # 22 remove single address IPv6 syn[ ok ] - synack[ ok ] - ack[ ok ]
[ 254.389372]
[ 254.414626] # add[ ok ] - echo [ ok ]
[ 254.414631]
[ 254.439555] # rm [ ok ] - sf [ ok ]
[ 254.439560]
[ 254.563953] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[ 254.601725] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 254.639744] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth3: link becomes ready
[ 254.677730] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth4: link becomes ready
[ 255.553706] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 260.988434] # 23 remove subflow and signal IPv6 syn[ ok ] - synack[ ok ] - ack[ ok ]
[ 260.988440]
[ 261.014261] # add[ ok ] - echo [ ok ]
[ 261.014267]
[ 261.038447] # rm [ ok ] - sf [ ok ]
[ 261.038452]
[ 261.162474] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[ 261.200463] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 261.238952] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth3: link becomes ready
[ 261.276707] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth4: link becomes ready
[ 262.145701] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
Thanks,
lkp