Skip to content

Tie connection counter to the future/channel #1379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 30, 2017

Conversation

stepancheg
Copy link
Contributor

Instead of manually tracking where connection counter should be
released, just do it when future becomes done or connection becomes
closed.

I believe, code is safer against leaks this way.

@slandelle
Copy link
Contributor

@stepancheg Great piece of work, thanks! Did it fix your issue? Could you please answer/address comments?

Instead of manually tracking where connection counter should be
released, just do it when future becomes done or connection becomes
closed.

I believe, code is safer against leaks this way.
@stepancheg
Copy link
Contributor Author

Did it fix your issue?

Don't know yet. Probably there was not issue in latest alpha. We used alpha1 before.

So, now we are running our program with alpha9.

Could you please answer/address comments?

Sorry, what comments?

BTW, I've rebased the patch against master.

@stepancheg
Copy link
Contributor Author

No, seems like alpha9 is also leaking:

2017-03-29_2137

And funny thing is that on another server it seems to leak in different direction: number of permits + number of open connection exceeds configured max connections.

Now going to try patched version.

@stepancheg
Copy link
Contributor Author

Updated patch to remove now unused partitionKey channel attribute.

@stepancheg
Copy link
Contributor Author

Patched version seem not to leak counter, but it hard to say with 100% confidence after just an hour of running.

channelManager.releaseChannelLock(prevKey);
releasePartitionKeyLock();

throw new AssertionError("unreachable");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new IllegalStateException("Trying to acquire partition lock twice. Please report.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new IllegalStateException("Trying to acquire partition lock concurrently. Please report."); "Twice" is not proper word, because function is allowed to be called twice.

@@ -56,6 +58,7 @@

private final long start = unpreciseMillisTime();
private final ChannelPoolPartitioning connectionPoolPartitioning;
private final ChannelManager channelManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to encapsulate all the connection limits code into a dedicated object. ChannelManager is too much of a god object. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like ConnectionSemaphore. Implementing it now.

@@ -265,16 +265,10 @@ private Channel getOpenChannel(NettyResponseFuture<?> future, Request request, P

Object partitionKey = future.getPartitionKey();

// we disable channelPreemption when performing next requests
final boolean acquireChannelLock = !performingNextRequest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're losing something here, but it think it's fine and it didn't make sense in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I overlooked it, and I agree, that it did something strange.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original author's intent was to not require an extra semaphore when being redirected to another host. IMHO, too specific to be worth supporting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original author's intent was to not require an extra semaphore when being redirected to another host.

And that is actually wrong IMHO. For example, because of that behaviour you can run off your operating system file descriptor limit which could be configured to be almost equal to connection count limit.

@stepancheg
Copy link
Contributor Author

Updated the patch with you requests.

private final boolean maxConnectionsPerHostEnabled;
private final ConcurrentHashMap<Object, NonBlockingSemaphore> freeChannelsPerHost = new ConcurrentHashMap<>();

private final ConnectionSemaphore connectionSemaphore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be here :)


private AsyncHttpClientHandler wsHandler;

public ChannelManager(final AsyncHttpClientConfig config, Timer nettyTimer) {

this.config = config;

connectionSemaphore = new ConnectionSemaphore(config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@slandelle
Copy link
Contributor

@stepancheg How did your tests went? Is the semaphore leak fixed? If so, I'm a bit frustrated that we don't know how the leak happened :) Are we sure that was just the semaphores leaking and not the connections too?
There's just some dead code to be removed, and then I can merge when you want.

Thanks a lot!

@stepancheg
Copy link
Contributor Author

How did your tests went? Is the semaphore leak fixed?

Well, counters seems to be correct for last 8 hours. It means that either problem is fixed or system/network load changed.

And at the same time I see incorrect semaphore values on different servers (with ligher load) on alpha9, on these servers probably semaphore is released extra time.

2017-03-30_1114

I'm a bit frustrated that we don't know how the leak happened

I think there were 10s of places where leak could happen. Because it is hard to manually manager locks/ref counters. The idea of the patch is to write code in a way that you don't have to check and recheck every line for corner cases and exception safety.

Are we sure that was just the semaphores leaking and not the connections too?

I think so. We've checked previosly with lsof, there were expected number of open connections.

There's just some dead code to be removed, and then I can merge when you want.

OK, give me a couple of minutes.

@stepancheg
Copy link
Contributor Author

Updated a patch, removed dead code.

@slandelle
Copy link
Contributor

Great! I'll merge, try to backport on 2.0 too, and will release both 2.1-alpha and 2.0.

@slandelle slandelle merged commit f240845 into AsyncHttpClient:master Mar 30, 2017
@slandelle slandelle added this to the 2.0.32 milestone Mar 30, 2017
@stepancheg
Copy link
Contributor Author

I'd avoid backporting to 2.0, it's too dangerous IMO. I'd prefer if you released it as alpha, and people (in particular, our project) could perform more tests with this change. And even if this patch fixes issues, some behaiour could be changed, and users could rely on old behavior.

@stepancheg stepancheg deleted the channel-lock branch March 30, 2017 08:50
@slandelle
Copy link
Contributor

I've just released 2.1.0-alpha10, it should be up on central in 20 min.
I've backported your work in a branch: https://github.com/AsyncHttpClient/async-http-client/tree/2.0-fix1377
Once the fix is confirmed and stable, I still think we should release it. I don't think there's any behavior change.

Thanks a lot!

slandelle added a commit that referenced this pull request Oct 3, 2017
Motivation:

#1379 introduced ConnectionSemaphore to deal with connection limits.
When no such limit is set, which is the default, we could bypass all
this code.

Modification:

Don’t create ConnectionSemaphore when no limit is set and bypass all
related code when it’s null.

Result:

More simple code path and possibly better performance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants