Before this patch http server don't knows when client disconnected until it
will try to write to it, IOW to detect is client still alive you need to write
something to client socket, however it is not convenient since it requires to
store all clients somewhere and poll them periodically, and I don't see any
regressions if we will leave EV_READ always (like libevhtp do), since we
already reset read callback in evhttp_write_buffer() (see
http/write_during_read).
Also since we don't disable EV_READ anymore we don't need some enable EV_READ,
so we will reduce number of epoll_ctl() calls.
Covered-by: http/terminate_chunked_oneshot
Covered-by: http/write_during_read
Fixes: #78
Before this patch every time we are retrying our request we resolve
domain, but we could optimize this (since UDP is slow) by using cached
conn_address value, so do this.
In http the only case when when we could store it is when we already
connected, *but* if we are doing request using domain name, then we need
to do request to nameserver to get IP address, and this is handled by
bufferevent.
So when we have IP address (from nameserver) and don't have connection
to this IP address, we could already cache it to avoid extra DNS
requests (since UDP is slow), and we can't do this from http layer, only
from bufferevent.
This will fix some invalid read/write:
==556== Invalid read of size 8
==556== at 0x4E4EEC6: event_queue_remove_timeout (minheap-internal.h:178)
==556== by 0x4E508AA: event_del_nolock_ (event.c:2764)
==556== by 0x4E53535: event_base_loop (event.c:3088)
==556== by 0x406FCFA: dispatch (libcrawl.c:271)
==556== by 0x402863: main (crawler.c:49)
==556== Address 0x68a3f18 is 152 bytes inside a block of size 400 free'd
==556== at 0x4C29C97: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==556== by 0x406F140: renew (libcrawl.c:625)
==556== by 0x4E6CDE9: evhttp_connection_cb_cleanup (http.c:1331)
==556== by 0x4E6E2B2: evhttp_connection_cb (http.c:1424)
==556== by 0x4E4DF2D: bufferevent_writecb (bufferevent_sock.c:310)
==556== by 0x4E52D1D: event_process_active_single_queue (event.c:1584)
==556== by 0x4E53676: event_base_loop (event.c:1676)
==556== by 0x406FCFA: dispatch (libcrawl.c:271)
==556== by 0x402863: main (crawler.c:49)
But this one because of some invalid write before (I guess).
It is 100% reproduced during massive crawling (because this process
has many different servers), but after spending some time for trying to
reproduce this using some simple tests/utils I gave up for a few days (I
have a lot of work to do), but I'm sending this patch as a reminder.
Just in case, I've tried next tests:
- mixing timeouts/retries
- shutdown http server and return it back
- slow dns server for first request
- sleep before accept
- hacking libevent sources to change the behaviour of http layer (so it
will go into that function which I'm insterested in).
This patch provides the ability to receive a callback on the completion of a
request. The callback takes place immediately before the request's resources
are released.
evhttp_write_buffer() used by evhttp_send_reply_chunk() can take callback
executed when (part of) the buffer has been written. Using this callback to
schedule the next chunk avoids buffering large amounts of data in memory.
Basically tcp final handshake looks like this:
(C - client, S - server)
ACK[C] - FIN/ACK[S] - FIN/ACK[S] - ACK [C]
However there are servers, that didn't close connection like this,
while it is still _considered_ as valid, and using libevent http layer
we can do requests to such servers.
Modified handshake:
(C - client, S - server)
ACK[C] - RST/ACK[S] - RST/ACK[S]
And in this case we can't extract IP address from socket, because it is
already closed, and getpeername() will return: "transport endpoint is not connected".
So we need to store address that we are connecting to, after we know it,
and that is what this patch do.
I have reproduced it, however it have some extra packages.
(I will try to fix it)
https://github.com/azat/nfq-examples/blob/master/nfqnl_rst_fin.c
The evhttp_send_reply method invokes evhttp_write_buffer with a
callback that may release the underlying request object and
bufferevent upon completion. This cleanup callback is invoked by the
underlying bufferevent's write callback. Improperly enabling write
events before referencing the bufferevent could lead to use after free
and memory corruption.
It is useful to know why you callback called with NULL (i.e. it failed),
for example if you set max_body with evhttp_connection_set_max_body_size()
you must know that it failed because of body was longer than this size.
(Commit message tweaked by Nick)
This patch add check in evhttp_decode_uri_internal() that next 2 symbols
are exists in array of chars for decoding, if don't have two next 2
symbols don't try to decode '%FF'
Before this patch socket created before domain was resolved, and it
always create with AF_INET (ipv4), but we must create socket only after
domain was resolved to understad which protocol family have domain
address.
Thank to Patrick Pelletier, who found this bug.
This is needed to be able to read the response code line especially
when acting as an http client using evhttp_make_request.
(patched by nickm to make the return value const)
According to RFC2616:
All linear white space, including folding, has the same semantics
as SP. A recipient MAY replace any linear white space with a single
SP before interpreting the field value or forwarding the message
downstream.
If I understand the C standard correctly, you can't actually point
at a position immediately _before_ the start of an object; only at the
position immediately after.
According to J.2 in the standard, in its big list of undefined behavior:
"The behavior is undefined in the following circumstances:
...
— Addition or subtraction of a pointer into, or just beyond, an
array object and an integer type produces a result that does not
point into, or just beyond, the same array object (6.5.6)."
So we've got to fix rtrim to not do that. Also, make it unit tested,
and give it an evutil_*_ name.