As you can see, right now the linux workers have zero failed tests, while the osx
workers have 18 failed tests:
[bufferevent_connect_hostname_emfile FAILED]
[bufferevent_pair_release_lock FAILED]
[bufferevent_timeout FAILED]
[bufferevent_timeout_filter FAILED]
[bufferevent_timeout_pair FAILED]
[common_timeout FAILED]
[del_wait FAILED]
[immediatesignal FAILED]
[loopexit FAILED]
[loopexit_multiple FAILED]
[monotonic_res FAILED]
[no_events FAILED]
[persistent_active_timeout FAILED]
[persistent_timeout_jump FAILED]
[signal_switchbase FAILED]
[signal_while_processing FAILED]
[simpletimeout FAILED]
[usleep FAILED]
This patch should remove the time-related failures from this list
(though maybe not all of them).
AFAIR there is a shortage of osx workers on travis-ci, IOW builds that
require them can spend a fair amount of time just waiting in the queue.
Plus, AFAICS, linux workers can run multiple jobs in parallel (4-5), while
osx workers cannot.
Hence, since we allow failures for osx anyway (and right now they have a lot of
failed tests), let's mark the build result based on the tests under linux only.
In a nutshell, this will reduce the build time from 5 hours to 20-30
minutes.
Between the start (setting "started_at") and the comparison of the time when
the timeouts triggered with that start (test_timeval_diff_eq), there are too
many things that can introduce extra delays and eventually
fail the test on a machine with a shortage of CPU.
And this is exactly what happened on:
- travis-ci
- #262
Here is a simple reproducer that I came up with for this issue:
docker run --cpus=0.01 -e LD_LIBRARY_PATH=$PWD/lib -e PATH=/usr/bin:/bin:$PWD/bin -v $PWD:$PWD --rm -it debian:testing regress --no-fork --verbose bufferevent/bufferevent_timeout
Under limited CPU (see the reproducer) the test almost always fails because
"write_timeout_at" exceeds the default timeval diff tolerance
(test_timeval_diff_eq() allows 50 ms), i.e.:
FAIL ../test/regress_bufferevent.c:1040: assert(labs(timeval_msec_diff(((&started_at)), ((&res1.write_timeout_at))) - (100)) <= 50): 101 vs 50
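To make the failing assertion concrete, here is a minimal sketch of the tolerance check it performs. The helpers msec_diff() and diff_within_tolerance() are hypothetical stand-ins for timeval_msec_diff() and test_timeval_diff_eq(), not libevent's code; the numbers mirror the log above (expected 100 ms, tolerance 50 ms, observed 201 ms gives the "101 vs 50" failure).

```c
#include <assert.h>
#include <stdlib.h>    /* labs() */
#include <sys/time.h>  /* struct timeval */

/* Hypothetical stand-in for timeval_msec_diff():
 * milliseconds elapsed from *start to *end (truncated). */
static long msec_diff(const struct timeval *start, const struct timeval *end)
{
	long sec = (long)(end->tv_sec - start->tv_sec);
	long usec = (long)(end->tv_usec - start->tv_usec);
	return sec * 1000 + usec / 1000;
}

/* Hypothetical stand-in for the test_timeval_diff_eq() check: the
 * measured interval must be within tolerance_ms of expected_ms. Any
 * scheduling delay on a CPU-starved machine is added straight to the
 * measured interval, which is why the test is fragile. */
static int diff_within_tolerance(const struct timeval *start,
    const struct timeval *end, long expected_ms, long tolerance_ms)
{
	return labs(msec_diff(start, end) - expected_ms) <= tolerance_ms;
}
```

With a 100 ms timeout, anything later than 150 ms after "started_at" fails, no matter whether the delay came from libevent or from the test machine.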
But under some setups the write timeout may not trigger at all. The
reason is that we write 1024*1024 bytes to the bufferevent, and
hence if evbuffer_write_iovec() has some delay after writev() and
does not send more than one vector at a time [1], it is pretty simple to
trigger, i.e.:
FAIL ../test/regress_bufferevent.c:1040: assert(labs(timeval_msec_diff(((&started_at)), ((&res1.write_timeout_at))) - (100)) <= 50): 1540155888478 vs 50
[1]: https://gist.github.com/azat/b72773dfe7549fed865d439e03de05c1
So this patch just sends a small static payload in all cases (plus a few
more asserts).
The outcome of this patch is that all regression tests passed on
travis-ci for the linux box [2], while before they failed almost always [3].
The reproducer with CPU limiting via docker also survives several
iterations (and strictly speaking it should have less CPU than the travis-ci
workers, I guess).
[2]: https://travis-ci.org/azat/libevent/builds/444391481
[3]: https://travis-ci.org/libevent/libevent/builds/444336505
Otherwise we would try to open a NULL filename and get EFAULT, and would
get the same return code from evdns_base_resolv_conf_parse_impl() anyway.
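A minimal sketch of the early-return guard described above. parse_conf_file() and CONF_ERR_OPEN are hypothetical names; the real code lives in evdns.c and its error code may differ. The point is only that checking for NULL up front yields the same "cannot open" result without ever calling open(NULL, ...).

```c
#include <assert.h>
#include <fcntl.h>   /* open() */
#include <unistd.h>  /* close() */

/* Hypothetical error code standing in for the parser's "cannot open" result. */
#define CONF_ERR_OPEN 1

static int parse_conf_file(const char *filename)
{
	int fd;

	/* open(NULL, ...) would just fail with EFAULT and lead to the same
	 * return code anyway, so bail out before touching the filesystem. */
	if (filename == NULL)
		return CONF_ERR_OPEN;

	fd = open(filename, O_RDONLY);
	if (fd < 0)
		return CONF_ERR_OPEN;

	/* ... parsing of the file contents would happen here ... */
	close(fd);
	return 0;
}
```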
Closes: #680 (cherry-picked)
That was a bad idea, because all this patch did was use extra CPU
time on travis-ci workers, and the build eventually failed anyway.
It also increased the execution time of one matrix entry (x3).
This reverts commit 7004ee8e869faabe3a284fb3d7eb859fb50ef90c.
* regress-dns-fixes:
Do not rely on getservbyname() for most of the dns regression tests
Turn off dns/getaddrinfo_race_gotresolve by default
Fix an error for debug locking in dns/getaddrinfo_race_gotresolve
There is only one test that uses a service name, getaddrinfo_async, which
manually checks whether it works or not; the others should not assume that it
is available and works.
There was already an attempt to overcome some possible limitations, like
the lack of "http" in /etc/services, in
d6bafbbeb27ff3943d6f3b6783bcded76384c31e ("test/dns: replace servname
since solaris does not have "http"")
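One way to avoid depending on /etc/services is to pass a numeric port. The sketch below uses the standard POSIX getaddrinfo() rather than evdns_getaddrinfo(), so it only illustrates the idea: with AI_NUMERICSERV the service string is never looked up by name, so "22" works everywhere while "ssh" is rejected. resolve_numeric_port() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>  /* ntohs() */

/* Resolve a numeric host and numeric port without consulting
 * /etc/services (AI_NUMERICSERV) or DNS (AI_NUMERICHOST).
 * Returns the port in host byte order, or -1 on failure. */
static int resolve_numeric_port(const char *host, const char *port)
{
	struct addrinfo hints, *res = NULL;
	int out = -1;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

	if (getaddrinfo(host, port, &hints, &res) != 0)
		return -1;
	if (res->ai_family == AF_INET) {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)res->ai_addr;
		out = ntohs(sin->sin_port);
	}
	freeaddrinfo(res);
	return out;
}
```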
When there is no /etc/services file, evdns_getaddrinfo() will fail (with
service="ssh"), and hence the test will jump to the "end" label with
rp.lock still held, which, with debug locking checks enabled, will bail out with:
[err] ../evthread.c:220: Assertion lock->count == 0 failed in debug_lock_free
So add an rp.locked flag, and unlock the lock before freeing it if it is
still held.
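A minimal sketch of the "locked" flag pattern, using plain pthread mutexes instead of libevent's evthread locks. struct race_state and its helpers are hypothetical; the real test state is the rp struct in test/regress_dns.c. The cleanup path releases the lock only if an error jumped there while it was held, so destroying it no longer trips the "lock->count == 0" assertion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the test state: a lock plus a flag
 * recording whether it is currently held. */
struct race_state {
	pthread_mutex_t lock;
	int locked;
};

static void race_lock(struct race_state *rp)
{
	pthread_mutex_lock(&rp->lock);
	rp->locked = 1;
}

static void race_unlock(struct race_state *rp)
{
	rp->locked = 0;
	pthread_mutex_unlock(&rp->lock);
}

/* The "end" label: unlock first if we got here with the lock held,
 * then destroy it. Returns the result of pthread_mutex_destroy(). */
static int race_cleanup(struct race_state *rp)
{
	if (rp->locked)
		race_unlock(rp);
	return pthread_mutex_destroy(&rp->lock);
}
```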
And here is how you can reproduce the issue:
$ docker run -e LD_LIBRARY_PATH=$PWD/lib -e PATH=/usr/bin:/bin:$PWD/bin -v $PWD:$PWD --rm -it debian:testing regress dns/getaddrinfo_race_gotresolve
(since debian:testing does not have /etc/services)
Despite the presence of 'sys/queue.h' in some C library implementations
(e.g. uclibc), the 'LIST_HEAD' macro can be missing. This fix defines this
macro in the same manner as was done previously for 'TAILQ_'.
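A sketch of such a fallback, assuming the BSD layout of LIST_HEAD (a struct with a single lh_first pointer). It is only defined when nothing has defined the macro already; this sketch does not include <sys/queue.h> itself, and struct item / item_list are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback used only when <sys/queue.h> did not provide LIST_HEAD;
 * the layout follows the BSD definition. */
#ifndef LIST_HEAD
#define LIST_HEAD(name, type)					\
	struct name {						\
		struct type *lh_first;	/* first element */	\
	}
#endif

/* Illustrative element type. */
struct item {
	int value;
	struct item *next;
};

/* Declares "struct item_list" with a single lh_first member. */
LIST_HEAD(item_list, item);
```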
Fixes: #539
Closes: #639 (cherry-picked)
Backport: 2.1.9
* be-wm-overrun-v2:
Fix hangs due to watermarks overruns in bufferevents implementations
test: cover watermarks (with some corner cases) in ssl bufferevent
Fixes: #690
Some implementations of bufferevents (for example openssl) can overrun
the read high watermark.
If after this the user callback does not drain enough data, reading will be
suspended (i.e. the callback will not be run again).
This is not the expected behaviour, as one may guess, since in this case
the data will never be read. Hence, once we detect that the watermark is
still exceeded (even after calling the user callback), we schedule the
callback again.
This could also be fixed in the openssl bufferevent implementation (by
strictly limiting how much data is added to the read buffer according to
the read high watermark), but since this data is already available (and in
memory) there is no point in doing so.
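A simplified, single-threaded model of the fix. struct fake_bev, user_read_cb() and deliver_read() are hypothetical; a plain loop stands in for rescheduling the callback through the event loop. Reading is treated as suspended while the input length is at or above the high watermark, so without rescheduling a callback that drains too little leaves data stranded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bufferevent's input side after an overrun. */
struct fake_bev {
	size_t input_len;     /* bytes currently buffered */
	size_t wm_read_high;  /* read high watermark (0 = unlimited) */
	size_t drain_per_cb;  /* how much the user callback consumes per run */
	int cb_runs;          /* how many times the user callback ran */
};

static void user_read_cb(struct fake_bev *bev)
{
	size_t n = bev->drain_per_cb < bev->input_len ?
	    bev->drain_per_cb : bev->input_len;
	bev->input_len -= n;
	bev->cb_runs++;
}

/* reschedule == 0: old behaviour, one run, then reading stays suspended.
 * reschedule != 0: re-run while the watermark is still exceeded.
 * The drain_per_cb > 0 guard keeps this model from spinning forever. */
static void deliver_read(struct fake_bev *bev, int reschedule)
{
	do {
		user_read_cb(bev);
	} while (reschedule && bev->wm_read_high &&
	    bev->input_len >= bev->wm_read_high && bev->drain_per_cb > 0);
}
```

Starting from 1000 buffered bytes with a 100-byte watermark and a callback that drains 300 bytes, the old behaviour leaves 700 bytes unread, while rescheduling drains everything in four runs.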
Avoid warnings from any modern C99 compiler about the implicit
declaration of pthread_create, as shown below:
test/regress_dns.c:2226:2: warning: implicit declaration of function
'pthread_create' is invalid in C99 [-Wimplicit-function-declaration]
THREAD_START(thread[0], race_base_run, &rp);
^
test/regress_thread.h:35:2: note: expanded from macro 'THREAD_START'
pthread_create(&(threadvar), NULL, fn, arg)
^
test/regress_dns.c:2226:2: warning: this function declaration is not a prototype
[-Wstrict-prototypes]
test/regress_thread.h:35:2: note: expanded from macro 'THREAD_START'
pthread_create(&(threadvar), NULL, fn, arg)
^
$ clang --version
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
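A minimal sketch, assuming the fix amounts to making <pthread.h> visible wherever THREAD_START() is expanded, so pthread_create() has a proper prototype. Only THREAD_START() is taken from the warning above; set_flag() is a hypothetical thread function. On older glibc this needs linking with -pthread.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */
#include <pthread.h>  /* prototype for pthread_create()/pthread_join() */

/* Same expansion as shown in the warning from test/regress_thread.h. */
#define THREAD_START(threadvar, fn, arg) \
	pthread_create(&(threadvar), NULL, fn, arg)

/* Hypothetical thread body: sets an int flag to 1. */
static void *set_flag(void *arg)
{
	*(int *)arg = 1;
	return NULL;
}
```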
Closes: #686 (cherry-picked)
This will help in investigating win32 failures...
P.S. It looks like there is no better way, since
"artifacts.*.publish_on_failure" has not been implemented yet (even though it was
requested in 2015).
According to evdns.c, the result is not freed by libevent after
the callback runs:
evdns_getaddrinfo_gotresolve()
{
...
data->user_cb(0, data->pending_result, data->user_data);
data->pending_result = NULL;
...
}
To reproduce, build with -fsanitize=address, add -g to the getopt
list in dns-example.c like in the current commit and run
dns-example -g google.com
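The ownership rule implied by the excerpt above, as a self-contained model. struct result, struct request_data, library_complete() and my_cb() are hypothetical; with the real API the callback would release the result with evutil_freeaddrinfo(). Because the library clears pending_result right after invoking the callback, nothing else will ever free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the pattern in evdns_getaddrinfo_gotresolve(). */
struct result { int value; };

typedef void (*user_cb_fn)(int err, struct result *res, void *arg);

struct request_data {
	struct result *pending_result;
	user_cb_fn user_cb;
	void *user_data;
};

static void library_complete(struct request_data *data)
{
	data->user_cb(0, data->pending_result, data->user_data);
	data->pending_result = NULL;  /* library forgets it: callback owns it */
}

/* A correct callback: use the result, then free it (with libevent this
 * would be evutil_freeaddrinfo()). */
static void my_cb(int err, struct result *res, void *arg)
{
	if (err == 0 && res != NULL) {
		*(int *)arg = res->value;
		free(res);
	}
}
```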
Closes: #681 # cherry-picked
evdns_getaddrinfo() starts two parallel requests, for the A and AAAA records.
But if the request is created from a thread other than the dns_base's, the request for the A record is
started immediately and may result in calling free_getaddrinfo_request() from
evdns_getaddrinfo_gotresolve() because `other_req' doesn't exist yet.
After that, the request for the AAAA record starts and finishes, and evdns_getaddrinfo_gotresolve()
is called again for a structure that has already been freed.
This commit adds locking to the evdns_getaddrinfo() function.
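A simplified model of why the lock helps, not libevent's code: a counter of outstanding sub-requests replaces the `other_req' check, and worker threads stand in for the base's thread completing requests. struct gai_request, base_lock, subrequest_done(), resolver_thread() and start_getaddrinfo() are all hypothetical. Both sub-requests are registered while holding the lock the completion path also takes, so the first completion can never see a half-initialized request and free it early.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: one request fans out into A and AAAA sub-requests;
 * the shared state is released when the last one completes. */
struct gai_request {
	int outstanding;  /* sub-requests started but not yet completed */
	int freed;        /* set instead of calling free(), so it can be checked */
};

static pthread_mutex_t base_lock = PTHREAD_MUTEX_INITIALIZER;

/* Completion path, running on another thread. */
static void subrequest_done(struct gai_request *req)
{
	pthread_mutex_lock(&base_lock);
	if (--req->outstanding == 0)
		req->freed = 1;  /* stands in for free_getaddrinfo_request() */
	pthread_mutex_unlock(&base_lock);
}

static void *resolver_thread(void *arg)
{
	subrequest_done(arg);  /* completes "immediately" */
	return NULL;
}

/* Start both sub-requests under the lock. Returns 0 on success. */
static int start_getaddrinfo(struct gai_request *req, pthread_t th[2])
{
	int i, rc = 0;

	pthread_mutex_lock(&base_lock);
	req->outstanding = 0;
	req->freed = 0;
	for (i = 0; i < 2; i++) {
		req->outstanding++;
		if (pthread_create(&th[i], NULL, resolver_thread, req) != 0) {
			req->outstanding--;
			rc = -1;
			break;
		}
	}
	pthread_mutex_unlock(&base_lock);
	return rc;
}
```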
First of all, __func__ is not a macro, it is a predefined identifier (a char[] array), so the code
that we had before in cmake was incorrect, i.e.:
#if defined (__func__)
#define EVENT____func__ __func__
#elif defined(__FUNCTION__)
#define EVENT____func__ __FUNCTION__
#else
#define EVENT____func__ __FILE__
#endif
So just detect whether we have __func__/__FUNCTION__ in configure/cmake
before the build, define EVENT__HAVE___func__/EVENT__HAVE___FUNCTION__,
and use the latter to choose what should be used as __func__ (if it is
not present).
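A sketch of how the detected macros can be used. The EVENT__HAVE___func__ define is hard-coded here as an assumption (a C99 compiler); in a real build it would come from the configure/cmake checks. func_is_macro() shows why the old `#if defined (__func__)` test never succeeded: the preprocessor does not know about __func__ at all.

```c
#include <assert.h>
#include <string.h>

/* Assumption: normally set by configure/cmake after a compile test. */
#define EVENT__HAVE___func__ 1

/* Provide a replacement for __func__ only when the compiler lacks it. */
#ifndef EVENT__HAVE___func__
# ifdef EVENT__HAVE___FUNCTION__
#  define __func__ __FUNCTION__
# else
#  define __func__ __FILE__
# endif
#endif

/* __func__ is a predefined identifier, not a macro, so this returns 0
 * even though __func__ is usable. */
static int func_is_macro(void)
{
#if defined(__func__)
	return 1;
#else
	return 0;
#endif
}

static const char *current_function_name(void)
{
	return __func__;  /* has static storage, safe to return */
}
```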
Closes: #644
* official/pr/671:
Capitalise project names consistently in README.md
Indent configure flag section to make markdown format them as code
Use https for resources that support it
Rewords awkward sentences in README.md
Fix typos in README.md
Otherwise the logs show a bunch of pointless progress, which is
really only useful for interactive sessions.
Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
Closes: #651 (cherry-picked)
The script make-event-config.sed was mangling all the symbols by
prefixing them with "EVENT__". The problem here is that some
symbols aren't for local consumption within libevent, but rather
influence other system header files (ex: __USE_FILE_OFFSET64 is
used by dozens of header files including <sys/sendfile.h>).
As a workaround, only symbols starting with a capital letter
(with the exception of STDC_HEADERS, which must also be left
untouched) will be mangled.
Future contributors will need to be aware of this distinction.
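The rule the sed script now applies can be summarized as a predicate. should_mangle() is a hypothetical C rendering for illustration only; the actual logic lives in make-event-config.sed.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the symbol would get the "EVENT__" prefix: it must start
 * with an uppercase ASCII letter and must not be STDC_HEADERS. Symbols
 * starting with an underscore (e.g. __USE_FILE_OFFSET64) are left alone
 * because system headers depend on them. */
static int should_mangle(const char *sym)
{
	if (sym[0] < 'A' || sym[0] > 'Z')
		return 0;
	if (strcmp(sym, "STDC_HEADERS") == 0)
		return 0;
	return 1;
}
```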