-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Tie connection counter to the future/channel #1379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@stepancheg Great piece of work, thanks! Did it fix your issue? Could you please answer/address comments? |
Instead of manually tracking where connection counter should be released, just do it when future becomes done or connection becomes closed. I believe, code is safer against leaks this way.
9bd9b95
to
f5a4cb2
Compare
Don't know yet. Probably there was not issue in latest alpha. We used alpha1 before. So, now we are running our program with alpha9.
Sorry, what comments? BTW, I've rebased the patch against master. |
f5a4cb2
to
f79efc3
Compare
Updated patch to remove now unused |
Patched version seem not to leak counter, but it hard to say with 100% confidence after just an hour of running. |
channelManager.releaseChannelLock(prevKey); | ||
releasePartitionKeyLock(); | ||
|
||
throw new AssertionError("unreachable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new IllegalStateException("Trying to acquire partition lock twice. Please report.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new IllegalStateException("Trying to acquire partition lock concurrently. Please report.");
"Twice" is not proper word, because function is allowed to be called twice.
@@ -56,6 +58,7 @@ | |||
|
|||
private final long start = unpreciseMillisTime(); | |||
private final ChannelPoolPartitioning connectionPoolPartitioning; | |||
private final ChannelManager channelManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to encapsulate all the connection limits code into a dedicated object. ChannelManager is too much of a god object. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like ConnectionSemaphore
. Implementing it now.
@@ -265,16 +265,10 @@ private Channel getOpenChannel(NettyResponseFuture<?> future, Request request, P | |||
|
|||
Object partitionKey = future.getPartitionKey(); | |||
|
|||
// we disable channelPreemption when performing next requests | |||
final boolean acquireChannelLock = !performingNextRequest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're losing something here, but it think it's fine and it didn't make sense in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I overlooked it, and I agree, that it did something strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original author's intent was to not require an extra semaphore when being redirected to another host. IMHO, too specific to be worth supporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original author's intent was to not require an extra semaphore when being redirected to another host.
And that is actually wrong IMHO. For example, because of that behaviour you can run off your operating system file descriptor limit which could be configured to be almost equal to connection count limit.
f79efc3
to
103518c
Compare
Updated the patch with you requests. |
private final boolean maxConnectionsPerHostEnabled; | ||
private final ConcurrentHashMap<Object, NonBlockingSemaphore> freeChannelsPerHost = new ConcurrentHashMap<>(); | ||
|
||
private final ConnectionSemaphore connectionSemaphore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here :)
|
||
private AsyncHttpClientHandler wsHandler; | ||
|
||
public ChannelManager(final AsyncHttpClientConfig config, Timer nettyTimer) { | ||
|
||
this.config = config; | ||
|
||
connectionSemaphore = new ConnectionSemaphore(config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@stepancheg How did your tests went? Is the semaphore leak fixed? If so, I'm a bit frustrated that we don't know how the leak happened :) Are we sure that was just the semaphores leaking and not the connections too? Thanks a lot! |
Well, counters seems to be correct for last 8 hours. It means that either problem is fixed or system/network load changed. And at the same time I see incorrect semaphore values on different servers (with ligher load) on alpha9, on these servers probably semaphore is released extra time.
I think there were 10s of places where leak could happen. Because it is hard to manually manager locks/ref counters. The idea of the patch is to write code in a way that you don't have to check and recheck every line for corner cases and exception safety.
I think so. We've checked previosly with
OK, give me a couple of minutes. |
103518c
to
ae13225
Compare
Updated a patch, removed dead code. |
Great! I'll merge, try to backport on 2.0 too, and will release both 2.1-alpha and 2.0. |
I'd avoid backporting to 2.0, it's too dangerous IMO. I'd prefer if you released it as alpha, and people (in particular, our project) could perform more tests with this change. And even if this patch fixes issues, some behaiour could be changed, and users could rely on old behavior. |
I've just released 2.1.0-alpha10, it should be up on central in 20 min. Thanks a lot! |
Motivation: #1379 introduced ConnectionSemaphore to deal with connection limits. When no such limit is set, which is the default, we could bypass all this code. Modification: Don’t create ConnectionSemaphore when no limit is set and bypass all related code when it’s null. Result: More simple code path and possibly better performance
Instead of manually tracking where connection counter should be
released, just do it when future becomes done or connection becomes
closed.
I believe, code is safer against leaks this way.