Bugzilla – Bug 1216858
python-eventlet tests fail with kernel 6.6
Last modified: 2023-11-27 14:00:21 UTC
Created attachment 870621 [details] BAD strace -rT -e trace=network,/epoll Currently at: https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:H/python-eventlet/standard/x86_64 > =================================== FAILURES =================================== > _______________________ TestGreenSocket.test_full_duplex _______________________ > > self = <tests.greenio_test.TestGreenSocket testMethod=test_full_duplex> > > def test_full_duplex(self): > ... > > large_evt.wait() > > tests/greenio_test.py:424: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > eventlet/greenthread.py:181: in wait > return self._exit_event.wait() > eventlet/event.py:125: in wait > result = hub.switch() ... > E tests.TestIsTakingTooLong: 1 > > eventlet/hubs/hub.py:313: TestIsTakingTooLong to this commit. With the commit, the test takes > 1.5 s. Without the commit it takes only < 300 ms. And they set timeout to 1 s. The reduced self-stadning test case: #!/usr/bin/python3 import eventlet from eventlet.green import select, socket, time, ssl def bufsized(sock, size=1): """ Resize both send and receive buffers on a socket. Useful for testing trampoline. Returns the socket. >>> import socket >>> sock = bufsized(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) """ sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size) sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size) return sock def min_buf_size(): """Return the minimum buffer size that the platform supports.""" test_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) test_sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1) return test_sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF) def test_full_duplex(): large_data = b'*' * 10 * min_buf_size() listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM) listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) listener.bind(('127.0.0.1', 0)) listener.listen(50) bufsized(listener) def send_large(sock): sock.sendall(large_data) def read_large(sock): result = sock.recv(len(large_data)) while len(result) < len(large_data): result += sock.recv(len(large_data)) assert result == large_data def server(): (sock, addr) = listener.accept() sock = bufsized(sock) send_large_coro = eventlet.spawn(send_large, sock) eventlet.sleep(0) result = sock.recv(10) expected = b'hello world' while len(result) < len(expected): result += sock.recv(10) assert result == expected send_large_coro.wait() server_evt = eventlet.spawn(server) client = socket.socket(socket.AF_INET, socket.SOCK_STREAM) client.connect(('127.0.0.1', listener.getsockname()[1])) bufsized(client) large_evt = eventlet.spawn(read_large, client) eventlet.sleep(0) client.sendall(b'hello world') server_evt.wait() large_evt.wait() client.close() test_full_duplex() ===================================== I bisected it to: commit dfa2f0483360d4d6f2324405464c9f281156bd87 Author: Eric Dumazet <edumazet@google.com> Date: Mon Jul 17 15:29:17 2023 +0000 tcp: get rid of sysctl_tcp_adv_win_scale
Created attachment 870622 [details] GOOD strace -rT -e trace=network,/epoll
Upstream report: https://lore.kernel.org/all/738bb6a1-4e99-4113-9345-48eea11e2108@kernel.org/ Eric says: Retracting TCP windows has always been problematic. If we really want to be very gentle, this could add more logic, shorter timer events for pathological cases like that, I am not sure this is really worth it, especially if dealing with one million TCP sockets in this state.
How about my suggestion to set SO_SNDBUF and SO_RCVBUF before listen() and connect()? Would that be an acceptable solution or does the test depend on setting the buffer sizes this late?
This seems to go the way the test is broken: It seems the test had some expectations. Setting a small (1 byte) RCVBUF/SNDBUF, and yet expecting to send 46080 bytes fast enough was not reasonable. It might have relied on the fact that tcp sendmsg() can cook large GSO packets, even if sk->sk_sndbuf is small. With tight memory settings, it is possible TCP has to resort on RTO timers (200ms by default) to recover from dropped packets. (In reply to Michal Kubeček from comment #3) > How about my suggestion to set SO_SNDBUF and SO_RCVBUF before listen() and > connect()? Would that be an acceptable solution or does the test depend on > setting the buffer sizes this late? And this ^^
For the record, actual value of SO_SNDBUF is 4608, as seen in the strace output. I would assume SO_RCVBUF will also be something like that.
How can I help with the needinfo to me? Not clear to me. I think we should have this conversation with eventlet upstream? I can disable the test if it is turning out to be problematic
I think we (I picked you likely beacuse of maintainership of the pkd) can do not much about it: https://github.com/eventlet/eventlet/issues/821