The "min_share" logic, which was designed to prevent piles of
extremely small writes when running up against a group rate limit,
could lead to confusing behavior if you ever set a min_share less
than your burst rate. If that happened, then as soon as your group
rate limit was exhausted, you'd stop reading/writing, and never
start again, since the amount readable/writeable would never
actually hit min_share.
We now cap min_share at the rate per tick.
Found by George Kadianakis
Someday, when networks are far faster and people frequently want a
burst value greater than 2GB per tick, this will seem very forsightful
indeed.
For now, it breaks ABI, but not source. Fixes bug 3092096.
Previously, whenever writing was disabled on a bufferevent_filter (or
a filtering SSL bufferevent), we would stop writing on the underlying
bufferevent. This would make for trouble, though, since if you
implemented common patterns like "stop writing once data X has been
flushed", your bufferevent filter would disable the underlying
bufferevent after the data was flushed to the underlying bufferevent,
but before actually having it written to the network.
Now, we have filters leave their underlying bufferevents enabled for
reading and writing for reading and writing immediately. They are not
disabled, unless the user wants to disable them, which is now allowed.
To handle the case where we want to choke reading on the underlying
bufferevent because the filter no longer wants to read, we use
bufferevent_suspend_read(). This is analogous to the way that we use
bufferevent_suspend_write() to suspend writing on a filtering
bufferevent when the underlying bufferevent's output buffer has hit
its high watermark.
Remember, the code
int is_less_than(int a, unsigned b) {
return a < b;
}
is buggy, since the C integer promotion rules basically turn it into
int is_less_than(int a, unsigned b) {
return ((unsigned)a) < b;
}
and we really want something closer to
int is_less_than(int a, unsigned b) {
return a < 0 || ((unsigned)a) < b;
}
.
Suggested by an example from Ralph Castain
The EVLOCK_LOCK and EVLOCK_UNLOCK macros already check to make sure
that the lock is present before messing with it, so there's no point
in checking the lock before calling them.
A good compiler should be able to simplify code like
if (lock) {
if (lock)
acquire(lock);
}
, but why count on it?
Once again, there's no reason to keep these lists in the order we were
keeping them in. Every time we needed to traverse them for any important
purpose, we started at a random point anyway.
The different bufferevent implementations had different behavior for
their timeouts. Some of them kept re-triggering the timeouts
indefinitely; some disabled the event immediately the first time a
timeout triggered. Some of them made the timeouts only count when
the bufferevent was actively trying to read or write; some did not.
The new behavior is modeled after old socket bufferevents, since
they were here first and their behavior is relatively sane.
Basically, each timeout disables the bufferevent's corresponding
read or write operation when it fires. Timeouts are stopped
whenever we suspend writing or reading, and reset whenever we
unsuspend writing or reading. Calling bufferevent_enable resets a
timeout, as does changing the timeout value.
This should end the family of bugs where we call bufferevent_free()
while a pending callback is holding a reference on the bufferevent,
and the callback tries to invoke the user callbacks before it releases
its own final reference.
This means that bufferevent_decref() is now a separate function from
bufferevent_free().
When we're doing a lookup in preparation for doing a connect, we
might have an unconnected socket on hand, and mustn't actually do
any reading or writing with it.
Most of these should be unable to fail, since adding a timeout
generally always works. Still, it's better not to try to be "too
smart for our own good here."
There are some remaining event_add() calls that I didn't add checks
for; I've marked those with "XXXX" comments.
The fairness algorithms are not the best, not every bufferevent type
is supported, and some of the locking tricks here are simply absurd.
Still, this code should be a good first step.
There are lots of things we do internally in bufferevents to indicate
"the user would like this operation to happen, but we aren't going to
try until some other condition goes away." Our logic here has gotten
entirely too complicated.
This patch tries to fix that by adding the idea of 'suspend flags' for
read and write. To say "don't bother reading or writing until
condition X no longer holds," bufferevent_suspend_read/write(bev,
BEV_SUSPEND_X). When X no longer holds, call
bufferevent_unsuspend_read/write(bev, BEV_SUSPEND_X).
Right now, only the read-watermark logic uses this.
Previously, our default lock model kind of assumed that every lock was
potentially a read-write lock. This was a poor choice, since
read-write locks are far more expensive than regular locks, and so the
lock API should only use them when we can actually take advantage of
them. Neither our pthreads or win32 lock implementation provided rw
locks.
Now that we have a way (not currently used!) to indicate that we
really want a read-write lock, we shouldn't actually say "lock this
for reading" or "lock this for writing" unless we mean it.
The entry points are evutil_getaddrinfo and evdns_getaddrinfo respectively.
There are fairly extensive unit tests.
I believe this code conforms to RFC3493 pretty closely, but there are
probably more issues. It should get tested on more platforms.
This code means we can dump the well-intentioned but weirdly-implemented
bufferevent_evdns and evutil_resolve code.
svn:r1537
This is code by Chris Davis, with changes to get the unit tests failing less aggressively.
The unit tests for this code do not completely pass yet; Chris is looking into that. If they aren't passing by the next release, I'll turn off this code.
svn:r1499
This function, bufferevent_socket_connect_hostname() can either use
evdns to do the resolve, or use a new function (evutil_resolve) that
uses getaddrinfo or gethostbyname, like http.c does now.
This function is meant to eventually replace the hostname resolution mess in
http.c.
svn:r1496
This code adds a new Bufferevent type that is only compiled when the
openssl library is present. It supports using an SSL object and an
event alert mechanism, which can either be an fd or an underlying
bufferevent.
There is still more work to do: the unit tests are incomplete, and we
need to support flush and shutdown much better. Sometimes events are
generated needlessly: this will hose performance.
There's a new encrypting proxy in sample/le-proxy.c.
This code has only been tested on OSX, and nowhere else.
svn:r1382
Rationale: we hold a lock on the bufferevent when its callbacks are
executing, so we need to release the lock afterwards. But the
callback might free the bufferevent, so unless we're holding a
reference on the bufferevent, the lock might not be there for us to
release.
svn:r1347