When a nameserver is down, we periodically try sending a "probe"
message to that nameserver to see if it has come back up. If a
nameserver comes up, we cancel any pending probe messages.
Cancelling a probe message while handling the probe's response would
result in an access-after-free or a double-free, so when we notice that
we're about to call a nameserver up because of having received a probe
response from it, we need to check whether the current response is the
response from the probe.
There was a case where we didn't do that, though: when the resolver
gave us an unusual error response to our request that it resolve
google.com. This is pretty rare, but apparently it can happen with
some weird caching nameservers -- the one on the MikroTik router, for
example. Without this patch, we would crash with a NULL pointer
dereference.
Thanks to Hannes Sowa for finding this issue and helping me track it
down.
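As a rough illustration of the check described above, here is a
minimal sketch. The struct layouts and the nameserver_mark_up() helper
are hypothetical stand-ins invented for this example, not the
resolver's real types: the point is only that a pending probe is freed
when the nameserver comes up unless it is the very request whose
response we are handling.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the resolver's internal types. */
struct request {
    int id;
};

struct nameserver {
    int up;                 /* nonzero if we believe the server is up */
    struct request *probe;  /* pending probe request, or NULL */
};

/*
 * Mark ns as up. `current` is the request whose response is being
 * handled right now (it may be the probe itself).
 * Returns 1 if a pending probe was cancelled here, 0 otherwise.
 */
static int nameserver_mark_up(struct nameserver *ns, struct request *current)
{
    int cancelled = 0;

    ns->up = 1;
    if (ns->probe != NULL && ns->probe != current) {
        /* Safe to cancel: nobody else is in the middle of using it. */
        free(ns->probe);
        cancelled = 1;
    }
    /*
     * Either way the nameserver no longer owns a pending probe. If the
     * probe is `current`, the caller frees it once handling is done.
     */
    ns->probe = NULL;
    return cancelled;
}
```

In this sketch the caller keeps ownership of `current`, so it is freed
exactly once no matter which response brought the nameserver back up.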
We were doing this because of (correct) reports that NetBSD gives an
EBADF when you try to add the write side of a pipe for which the
read side has been closed. But on most kqueue platforms, that
doesn't happen, and on *all* kqueue platforms, reporting a
nonexistent fd (which we usually have if we have seen EBADF) as
readable tends to give programs a case of the vapors.
Nicholas Marriott wrote the original patch here; I did the comment
fixes.
This is not a perfect fix, but it's much, much better than the
current buggy behavior, which could lead to filtering SSL
connections that just stopped reading.
Based on ideas by Maseeb Abdul Qadir and Mark Ellzey.
We used to use the needs_reinit flag in struct eventop to indicate
whether an event backend had shared state across a fork(), and
therefore would require us to construct a new event backend. But
when we realized that the signal notification fds and the thread
notification fds would always be shared across forks, we stopped
looking at it.
This patch restores the old behavior so that poll, select, and
win32select don't need to do a linear scan over all pending
fds/signals when they do a reinit. Their life is hard enough
already.
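To make the idea concrete, here is a small sketch of a reinit path
that consults the flag. The struct and function names below are
simplified, hypothetical stand-ins rather than the real eventop or
event_base definitions, and the counters exist only so the behavior is
observable. The notification fds are rebuilt unconditionally, while
the backend is rebuilt only when it says it needs it.

```c
#include <stddef.h>

/* Hypothetical, cut-down version of a backend descriptor. */
struct backend_ops {
    const char *name;
    int needs_reinit;  /* nonzero if backend state is shared across fork() */
};

/* Hypothetical, cut-down event base. */
struct base {
    const struct backend_ops *ops;
    int notify_rebuilds;   /* times signal/thread notification was rebuilt */
    int backend_rebuilds;  /* times the backend itself was rebuilt */
};

static void base_reinit(struct base *b)
{
    /* Notification fds are always shared across forks: always redo them. */
    b->notify_rebuilds++;

    /* Only backends that share state with the parent get rebuilt. */
    if (b->ops->needs_reinit)
        b->backend_rebuilds++;
}
```

A backend with needs_reinit set to 0 skips the rebuild entirely, which
is what saves the linear scan over pending fds and signals.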
Previously, event_reinit required a bunch of really dubious hacks,
and violated a lot of abstraction barriers to mess around with lists
of internal events and "pretend" to re-add them.
The new (and fairly well commented!) implementation tries to be much
smarter, by isolating the changes as much as possible to the backend
state, and minimizing the number of abstraction violations.
Specifically, we now use event_del() to remove events we want to
remove, rather than futzing around with queues in event_reinit().
To avoid bogus calls to evsel->del(), we temporarily replace evsel
with a null-object stub.
Also, we now push the responsibility for calling evsel->add() down
into the evmap code, so that we don't actually need to unlink and
re-link all of our events.
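The null-object trick mentioned above can be sketched as follows. All
of the types and function names here are hypothetical simplifications
made up for this example (the real struct eventop has more members and
different signatures); remove_event() stands in for event_del(), and
the call counter is only there to show that the real backend is never
touched while the stub is installed.

```c
#include <stddef.h>

/* Hypothetical, cut-down backend vtable; not the real struct eventop. */
struct backend_ops {
    int (*add)(void *state, int fd);
    int (*del)(void *state, int fd);
};

struct base {
    const struct backend_ops *evsel;
    void *state;
};

/* A "real" backend that counts how often del() reaches it. */
static int real_del_calls = 0;
static int real_add(void *state, int fd) { (void)state; (void)fd; return 0; }
static int real_del(void *state, int fd)
{
    (void)state; (void)fd;
    real_del_calls++;
    return 0;
}
static const struct backend_ops real_ops = { real_add, real_del };

/* Null-object stub: every operation succeeds and does nothing. */
static int nil_add(void *state, int fd) { (void)state; (void)fd; return 0; }
static int nil_del(void *state, int fd) { (void)state; (void)fd; return 0; }
static const struct backend_ops nil_ops = { nil_add, nil_del };

/* Stand-in for event_del(): always dispatches through base->evsel. */
static int remove_event(struct base *b, int fd)
{
    return b->evsel->del(b->state, fd);
}

/* During reinit, remove an event without touching the real backend. */
static int reinit_remove(struct base *b, int fd)
{
    const struct backend_ops *saved = b->evsel;
    int r;

    b->evsel = &nil_ops;   /* temporarily swap in the stub */
    r = remove_event(b, fd);
    b->evsel = saved;      /* restore the real backend */
    return r;
}
```

The appeal of this shape is that the ordinary removal path stays
unchanged; only the pointer it dispatches through is swapped for the
duration of the reinit.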