Back when deferred_cb stuff had its own queue, the queue was always
executed, but we never ran more than 16 callbacks per iteration.
That made for two problems:
1: Because deferred_cb stuff would always run, and had no priority,
it could cause priority inversion.
2: It didn't respect the max_dispatch_interval code.
Then, when I refactored deferred_cb to be a special case of
event_callback, that solved the above issues, but made for two more
issues:
3: Because deferred_cb stuff would always get the default priority,
it could cause low-priority bufferevents to get too much priority.
4: With code like bufferevent_pair, it's easy to get into a
situation where two deferreds keep adding one another, preventing
the event loop from ever actually scanning for more events.
This commit fixes the above by giving deferreds a better notion of
priorities, and by limiting the number of deferreds that can be
added to the _current_ loop iteration's active queues. (Extra
deferreds are put into the active_later state.)
That isn't an all-purpose priority inversion solution, of course: for
that, you may need to mess around with max_dispatch_interval.
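As a rough illustration of the limiting idea described above, here is a
toy model in C. It is not libevent's code: the toy_* names and the cap
of 4 are made up for the example, and priorities are left out. Deferreds
are admitted to the current iteration's active queue until the cap is
hit; anything beyond that waits in an active_later queue.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEFERREDS 4   /* hypothetical per-iteration cap */
#define TOY_QUEUE_CAP 32      /* fixed capacity, to keep the toy simple */

struct toy_queue {
    int ids[TOY_QUEUE_CAP];
    size_t len;
};

struct toy_base {
    struct toy_queue active;        /* runs during the current iteration */
    struct toy_queue active_later;  /* promoted when the next one starts */
    size_t n_deferreds_queued;      /* deferreds admitted this iteration */
};

static void toy_push(struct toy_queue *q, int id)
{
    assert(q->len < TOY_QUEUE_CAP);
    q->ids[q->len++] = id;
}

/* Schedule a deferred. Returns 1 if it was admitted to the current
 * iteration, 0 if it was postponed to the next one. */
static int toy_schedule_deferred(struct toy_base *b, int id)
{
    if (b->n_deferreds_queued < TOY_MAX_DEFERREDS) {
        b->n_deferreds_queued++;
        toy_push(&b->active, id);
        return 1;
    }
    toy_push(&b->active_later, id);
    return 0;
}

/* Start a new iteration: reset the counter and promote whatever was
 * postponed. (Simplification: promoted entries are not counted
 * against the new iteration's cap.) */
static void toy_begin_iteration(struct toy_base *b)
{
    size_t i;
    b->n_deferreds_queued = 0;
    for (i = 0; i < b->active_later.len; i++)
        toy_push(&b->active, b->active_later.ids[i]);
    b->active_later.len = 0;
}
```

With this shape, two deferreds that keep adding one another can fill at
most the cap's worth of slots before the loop gets a chance to scan for
events again.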
An event or event callback can now be in an additional state: "active
later". When an event is in this state, it will become active the
next time we run through the event loop. This lets us do what we
wanted to with deferred callbacks: make a type of active thing that
avoids the kind of infinite circular regress that starves other events
or exhausts the stack. It improves on deferred callbacks by respecting
priorities, and by having a non-kludgy way to avoid event starvation.
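To make the "active later" state concrete, here is a small sketch in C.
It is a toy, not libevent's implementation: the toy_* names, the
three-state enum, and the shape of the loop are all assumptions made for
the example. Two callbacks re-arm one another every time they run, but
because re-arming only puts the peer into the "later" state, each pass
through the loop still gets to poll once.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_IDLE, TOY_ACTIVE, TOY_ACTIVE_LATER };

struct toy_cb {
    enum toy_state state;
    struct toy_cb *peer;  /* callback to re-arm when this one runs */
    int runs;             /* how many times this callback has run */
};

/* Ask for cb to become active on the next pass through the loop. */
static void toy_activate_later(struct toy_cb *cb)
{
    if (cb->state == TOY_IDLE)
        cb->state = TOY_ACTIVE_LATER;
}

/* One pass through the toy loop. Returns the number of callbacks run. */
static int toy_loop_once(struct toy_cb **cbs, size_t n, int *polls)
{
    size_t i;
    int ran = 0;

    assert(polls != NULL);

    /* 1. Promote everything that was waiting in "active later". */
    for (i = 0; i < n; i++)
        if (cbs[i]->state == TOY_ACTIVE_LATER)
            cbs[i]->state = TOY_ACTIVE;

    /* 2. Stand-in for scanning for new events (select, epoll, ...). */
    (*polls)++;

    /* 3. Run what is active now. Anything re-armed here lands in
     *    "active later" and has to wait for the next pass. */
    for (i = 0; i < n; i++) {
        if (cbs[i]->state != TOY_ACTIVE)
            continue;
        cbs[i]->state = TOY_IDLE;
        cbs[i]->runs++;
        ran++;
        if (cbs[i]->peer != NULL)
            toy_activate_later(cbs[i]->peer);
    }
    return ran;
}
```

If step 3 made the peer active immediately instead, a pair like this
could keep the inner loop busy forever and step 2 would never run again.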
The epoll interface ordinarily gives us one-millisecond
precision, so on Linux it makes perfect sense to use the
CLOCK_MONOTONIC_COARSE timer. But when the user has set the new
PRECISE_TIMER flag for an event_base (either by the
EVENT_BASE_FLAG_PRECISE_TIMER flag, or by the EVENT_PRECISE_TIMER
environment variable), they presumably want finer granularity.
On not-too-old Linuxes, we can achieve this using the timerfd
mechanism, which accepts nanosecond granularity and understands
POSIX clocks. It's a little more expensive than just calling
epoll_wait(), so we won't do it by default.