Description
Problem
We have multiple VMs running AspNetCore behind a load balancer. Clients reuse TCP connections as much as possible, so VMs get "pinned" to a subset of clients. We often reach states where one or two VMs are using more than 90% of their CPU while most other VMs are below 5%. This causes noticeable service degradation and recurs regularly.
Proposal
One proposal is to provide a setting that caps the number of requests a single TCP connection can serve.
Another is to allow closing a connection programmatically. In this case, the HttpContext or another class should expose metrics such as the number of requests served on that connection, although we can get a good estimate from the request payload itself (e.g., the number of requests a given IP address has performed), or we can simply roll dice and close connections on 0.1% of requests.
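A minimal sketch of the first proposal as app-level middleware, assuming we can count requests per connection via HttpContext.Connection.Id (the dictionary, the class name, and the limit of 1000 are illustrative, not an existing Kestrel feature):

```csharp
using System.Collections.Concurrent;

// Illustrative middleware: after a connection has served N requests,
// add "Connection: close" so Kestrel closes the HTTP/1.1 connection
// once the current response completes.
public class ConnectionRecyclingMiddleware
{
    private const int MaxRequestsPerConnection = 1000; // hypothetical limit
    private static readonly ConcurrentDictionary<string, int> Counts = new();
    private readonly RequestDelegate _next;

    public ConnectionRecyclingMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        int count = Counts.AddOrUpdate(context.Connection.Id, 1, (_, c) => c + 1);
        if (count >= MaxRequestsPerConnection)
        {
            // Hint the client to stop reusing this connection.
            context.Response.Headers["Connection"] = "close";
            Counts.TryRemove(context.Connection.Id, out _);
        }
        await _next(context);
    }
}
```

Registered with app.UseMiddleware&lt;ConnectionRecyclingMiddleware&gt;(). A real implementation would also need to evict counter entries when connections close early, otherwise the dictionary grows without bound.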
In either case, the developer should be able to specify whether the connection is closed immediately or gracefully. For immediate termination of the TCP connection, it's fine for ASP.NET to throw exceptions from I/O operations of previously existing requests.
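For what it's worth, Kestrel already exposes connection-level features that seem to approximate both modes; a sketch using the 0.1% dice roll from the proposal above (my understanding of these features, not a verified recipe):

```csharp
using Microsoft.AspNetCore.Connections.Features;

app.Use(async (context, next) =>
{
    // Roll the dice: gracefully recycle roughly 0.1% of connections.
    if (Random.Shared.NextDouble() < 0.001)
    {
        // Graceful: as I understand it, Kestrel reacts by sending
        // Connection: close (HTTP/1.1) or GOAWAY (HTTP/2) and lets
        // in-flight requests complete.
        context.Features.Get<IConnectionLifetimeNotificationFeature>()?.RequestClose();
    }
    await next();
});

// Immediate termination would instead use:
// context.Features.Get<IConnectionLifetimeFeature>()?.Abort();
```

If RequestClose() already covers the graceful case, the remaining ask is the 503-until-drained behavior described below.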
For graceful termination, the service must only mark the connection for closing and stop processing new requests; requests that existed before the close was initiated should be satisfied. In HTTP/1.1, the server can automatically return 503 responses for any request that arrives after the close is initiated, until the last pre-existing request has been answered, after which the server can close the TCP socket. The server should also add a Connection: close header as a hint for clients to stop sending new requests. In HTTP/2 or HTTP/3, the server should send a GOAWAY frame and ignore new streams, as dictated by the RFC.
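The HTTP/1.1 behavior described above could be approximated in middleware by checking the ConnectionClosedRequested token, assuming RequestClose() was invoked earlier on the same connection (a sketch, not tested):

```csharp
using Microsoft.AspNetCore.Connections.Features;

app.Use(async (context, next) =>
{
    var lifetime = context.Features.Get<IConnectionLifetimeNotificationFeature>();
    if (lifetime is not null && lifetime.ConnectionClosedRequested.IsCancellationRequested)
    {
        // Close was already initiated on this connection: refuse new
        // requests with 503 and re-assert the close hint.
        context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable;
        context.Response.Headers["Connection"] = "close";
        return;
    }
    await next();
});
```

Having the server do this automatically, rather than in every application, is the substance of this request.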
Workaround
The only workaround we know of is to restart the AspNetCore application (or the entire VM) periodically. However, this is cumbersome to implement because we must restart only the overloaded nodes, avoid restarting more than one node at a time, and avoid restarting two distinct nodes too close together. If you have another workaround, please let us know!
Thanks
Fernando Colombo