Even after referenced patch there is still possible recursive callbacks
from evbuffer_drain(bev_input), i.e.:
wm_transfer() -> evbuffer_drain() -> wm_transfer()
inc(ctx->get)
But if we will increment ctx->get before drain that we will not add more
data to buffer.
Refs: 54c6fe3c ("regress_ssl: make ssl/bufferevent_wm_filter more fault-tolerance")
CI: https://ci.appveyor.com/project/nmathewson/libevent/build/job/f0rv299i71wnuxdq#L2546
Due to inplace callbacks (i.e. no BEV_OPT_DEFER_CALLBACKS) we cannot be
sure that wm_transfer() will not be called recursively and indeed it
still happens sometimes, and the referenced patch increase amount of
this times, especially for linux/poll.
Fixes: 66304a23cf748714159c988e78f35401c5352827 ("Fix
ssl/bufferevent_wm_filter when bev does not reach watermark on break")
EVHTTP_CON_READ_ON_WRITE_ERROR works only if an error already read from
the socket, but if we already got EPIPE on write we cannot read from the
socket anymore, and win32 does not guarantee that read will happens
before (although it happens from time to time).
In the referenced patch I just replaced callback with not expecting 417,
but like I already wrote, this is not always true (i.e. it is flacky).
Fixes: 3b581693ac1967f7f8d98491cb772a1b415eb4cd ("test/http:
read_on_write_error: fix it for win32")
For the ssl/bufferevent_wm* we have next configuration:
- payload_len = 1024
- wm_high = 5120
- limit = 40960
- to_read = 512
In this test we expect that with high watermark installed to "wm_high"
we will read "limit" bytes by reading "to_read" at a time, but adding
"payload_len" at a time (this "to_read"/"payload_len" limits is
installed to finally overflow watermark).
Once we read "limit" bytes we break, by disable EV_READ and reset
callbacks. Although this will not work if when we want to break we do
not reach watermark, this is because watermarks installs evbuffer
callback for the input buffer and if the watermark does not reached it
will enable EV_READ while be_openssl_enable() will read from the
underlying buffer (in case the openssl bufferevent created via
bufferevent_openssl_filter_new()) and call callback again (until it will
reach watermark or read al from the underlying buffer -- this is why it
stops in our caes).
And this is exactly what happened in win32, you can see this in the
following logs:
- win32 before:
OK C:\vagrant\test\regress_ssl.c:829: wm_transfer-client(00DC2750): in: 4608, out: 0, got: 40960
OK C:\vagrant\test\regress_ssl.c:834: wm_transfer-client(00DC2750): break
OK C:\vagrant\test\regress_ssl.c:829: wm_transfer-client(00DC2750): in: 4608, out: 0, got: 41472
OK C:\vagrant\test\regress_ssl.c:834: wm_transfer-client(00DC2750): break
OK C:\vagrant\test\regress_ssl.c:829: wm_transfer-client(00DC2750): in: 4608, out: 0, got: 41984
OK C:\vagrant\test\regress_ssl.c:834: wm_transfer-client(00DC2750): break
OK C:\vagrant\test\regress_ssl.c:829: wm_transfer-client(00DC2750): in: 4608, out: 0, got: 42496
OK C:\vagrant\test\regress_ssl.c:834: wm_transfer-client(00DC2750): break
- win32 after:
OK C:\vagrant\test\regress_ssl.c:821: wm_transfer-client(00FC26F0): break
OK C:\vagrant\test\regress_ssl.c:836: wm_transfer-client(00FC26F0): in: 4800, out: 0, got: 40960
- linux before:
OK ../test/regress_ssl.c:829: wm_transfer-client(0x55555566f5e0): in: 5120, out: 0, got: 40960
OK ../test/regress_ssl.c:834: wm_transfer-client(0x55555566f5e0): break
- linux after:
OK ../test/regress_ssl.c:821: wm_transfer-client(0x55555566f5e0): break
OK ../test/regress_ssl.c:836: wm_transfer-client(0x55555566f5e0): in: 5120, out: 0, got: 40960
(As you can see in linux case we already reach watermark hence it passed
before).
So fix the issue by breaking before draining.
But during fixing this I was thinking is this right? I.e. reading from
the be_openssl_enable(), maybe we should force deferred callbacks at
least?
Next tests uses fds without O_NONBLOCK flag
- main/free_active_base
- main/many_events
- et/et (has some other bits cleaned up by using TT_* flags and test
setup/cleanup callbacks)
And hence they will fail in debug mode (EVENT_DEBUG_MODE=):
Assertion flags & O_NONBLOCK failed in event_debug_assert_socket_nonblocking_
And use it in places where event_debug() should be called (since it
requires access to "event_debug_logging_mask_" and in win32 it is
tricky).
One of this places that is covered by this patch is the test for
event_debug().
One tricky bit is reply to the BIO_C_GET_FD command, since otherwise it
will try to close(0) and accepted bev in ssl/bufferevent_connect_sleep
will leak. Other seems more or less trivial.
This was done to make sure that for at least generic cases does not
leak (tricky cases was listed here nmathewson/Libevent#83).
And this will allow run ssl/.. with --no-fork
- http/basic_trailing_space -- covers cases when there is trailing space
after the request line (nginx handles this)
- http/simple_nonconformant -- covers non RFC3986 conformant URIs
Between start (setting "started_at") and comparing the time when
timeouts triggered with the start (test_timeval_diff_eq), there is too
much various things that can introduce extra delays and eventually could
fail the test on machine with shortage of CPU.
And this is exactly what happend on:
- travis-ci
- #262
Here is a simple reproducer that I came up with for this issue:
docker run --cpus=0.01 -e LD_LIBRARY_PATH=$PWD/lib -e PATH=/usr/bin:/bin:$PWD/bin -v $PWD:$PWD --rm -it debian:testing regress --no-fork --verbose bufferevent/bufferevent_timeout
Under limited CPU (see reproducer) the test almost always has problems
with that "write_timeout_at" exceed default timeval diff tolerance
(test_timeval_diff_eq() has 50 tolerance), i.e.:
FAIL ../test/regress_bufferevent.c:1040: assert(labs(timeval_msec_diff(((&started_at)), ((&res1.write_timeout_at))) - (100)) <= 50): 101 vs 50
But under some setup write timeout can even not triggered, and the
reason for this is that we write to the bufferevent 1024*1024 bytes, and
hence if evbuffer_write_iovec() will has some delay after writev() and
not send more then one vector at a time [1], it is pretty simple to
trigger, i.e.:
FAIL ../test/regress_bufferevent.c:1040: assert(labs(timeval_msec_diff(((&started_at)), ((&res1.write_timeout_at))) - (100)) <= 50): 1540155888478 vs 50
[1]: https://gist.github.com/azat/b72773dfe7549fed865d439e03de05c1
So this patch just send static small payload for all cases (plus a few
more asserts added).
The outcome of this patch is that all regression tests passed on
travis-ci for linux box [2]. While before it fails almost always [3].
Also reproducer with CPU limiting via docker also survive some
iterations (and strictly speaking it should has less CPU then travis-ci
workers I guess).
[2]: https://travis-ci.org/azat/libevent/builds/444391481
[3]: https://travis-ci.org/libevent/libevent/builds/444336505
There is only one test that uses service name getaddrinfo_async, which
manually check whether it works or not, other should not assume that it
is available and works.
There was already an attempt to overcome some possible limitations, like
lack of "http" in /etc/services in
d6bafbbeb27ff3943d6f3b6783bcded76384c31e ("test/dns: replace servname
since solaris does not have "http"")
When there is no /etc/services file evdns_getaddrinfo() will fail (with
service="ssh") and hence it will go to then "end" label with locked
rp.lock which in case of debug locking checks will bail with:
[err] ../evthread.c:220: Assertion lock->count == 0 failed in debug_lock_free
So add rp.locked flag, and unlock the lock before freeing it if it is in
locked state.
And here is how you can reproduce the issue:
$ docker run -e LD_LIBRARY_PATH=$PWD/lib -e PATH=/usr/bin:/bin:$PWD/bin -v $PWD:$PWD --rm -it debian:testing regress dns/getaddrinfo_race_gotresolve
(since debian:testing does not have /etc/services)
avoid warnings with any modern C99 compiler due to implicit function
declaration for pthread_create, as shown by the following :
test/regress_dns.c:2226:2: warning: implicit declaration of function
'pthread_create' is invalid in C99 [-Wimplicit-function-declaration]
THREAD_START(thread[0], race_base_run, &rp);
^
test/regress_thread.h:35:2: note: expanded from macro 'THREAD_START'
pthread_create(&(threadvar), NULL, fn, arg)
^
test/regress_dns.c:2226:2: warning: this function declaration is not a prototype
[-Wstrict-prototypes]
test/regress_thread.h:35:2: note: expanded from macro 'THREAD_START'
pthread_create(&(threadvar), NULL, fn, arg)
^
$ clang --version
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Closes: #686 (cherry-picked)
LibreSSL 2.7 implements OpenSSL 1.1 API except for BIO_get_init()
See also: https://bugs.freebsd.org/226900
Signed-off-by: Bernard Spil <brnrd@FreeBSD.org>
Closes: #617 (cherry-pick)
This is important, as otherwise clients can easily exhaust the file
descriptors available on a libevent HTTP server, which can cause
problems in other code which does not handle EMFILE well: for example,
see https://github.com/bitcoin/bitcoin/issues/11368Closes: #578 (patch cherry picked)
This is the second hunk of the first patch
5ff8eb26371c4dc56f384b2de35bea2d87814779 ("Fix crashing http server when
callback do not reply in place")
Fixes: #567
General http callback looks like:
static void http_cb(struct evhttp_request *req, void *arg)
{
evhttp_send_reply(req, HTTP_OK, "Everything is fine", NULL);
}
And they will work fine becuase in this case http will write request
first, and during write preparation it will disable *read callback* (in
evhttp_write_buffer()), but if we don't reply immediately, for example:
static void http_cb(struct evhttp_request *req, void *arg)
{
return;
}
This will leave connection in incorrect state, and if another request
will be written to the same connection libevent will abort with:
[err] ../http.c: illegal connection state 7
Because it thinks that read for now is not possible, since there were no
write.
Fix this by disabling EV_READ entirely. We couldn't just reset callbacks
because this will leave EOF detection, which we don't need, since user
hasn't replied to callback yet.
Reported-by: Cory Fields <cory@coryfields.com>
In function ‘send_a_byte_cb’:
test/regress.c:1853:2: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result [-Wunused-result]
(void) write(*sockp, "A", 1);