-
Notifications
You must be signed in to change notification settings - Fork 332
perf: optimized request headers #1335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor issues with the list but LGTM otherwise.
Maybe could we use this list instead of the one coming from ChatGPT? https://en.wikipedia.org/wiki/List_of_HTTP_header_fields We may even automate updating the list 😅 |
There is also https://www.iana.org/assignments/http-fields/http-fields.xhtml |
I added a script that scrapes all headers from https://github.com/pichillilorenzo/known-http-header-db, but after noticing it hasn't been updated in 3 years, I removed it again. I don't think these headers change very often, maybe PHP will even get a request object before these need to update. I think it's fine to update manually since this is only an optimization. Instead, I added a test to at least verify that the entered headers are correctly transformed. |
Very strange, cgo apparenty has issues with tests in internal modules. |
cgi.go
Outdated
} | ||
|
||
zendHeader := C.frankenphp_init_persistent_string(toUnsafeChar(commonKey), C.size_t(len(commonKey))) | ||
thread.knownHeaderKeys[key] = zendHeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we populate this cache when booting the thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think that might be unnecessary work (more so with dynamic threads). Due to having trusted variable keys, it should already save time on the first registration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the cache would directly contain the zend_strings
once (and not once per thread), but I have not yet found a way to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually found a way to only register them once by explicitly adding the 'interned string' flag on the main thread. I'll test if this is safe, it might also be useful for the known $_SERVER keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$_SERVER
keys and common headers are now registered at the start of the main thread. I was only able to do this by explicitly marking the zend_strings as permanent and interned.
This seems to be slightly faster since we don't do any ref-counting. Since these strings stay alive for the entire process, they also don't have to be explicitly released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome
76e662e
to
dd66a3c
Compare
Thanks! |
I noticed that an average browser request actually has quite a few request headers. A request to this github page when logged in will have ~15 headers totaling ~2kb in data.
Header keys
Currently there's an optimisation in place that uses otter to cache header keys. After looking at flamegraphs, I noticed that in this case caching is actually not much more efficient than just concatenating the string.
This PR gets rid of the otter dependency and instead caches 80 of the most common request headers directly as a
zend_string
(by explicitly writing them into a map).Cookie Header
Additionally, this PR removes the cookie sanitizing on the go side. I noticed that other SAPIs 1 2 3 will also just forward the cookie header and let PHP sanitize it.
When testing with Nginx/FPM I also noticed that nginx will even forward null bytes in the cookie, leading to a truncated cookie. I'm pretty sure this will behave the same way on Caddy/FPM, though I haven't explicitly tested there.
Sanitizing the cookie on the go side is actually pretty slow and leads to a slightly different behaviour. Slightly malformed cookies like this would be sanitized differently by GO and PHP:
Cookie:
key =value
GO trims the key on both sides:
key=value
PHP replaces the space with
_
:key_=value
PHP will request the cookie from the Sapi everytime it discovers the Cookie header in the request, even if the Cookie is not explicitly read by the PHP script.
Flamegraphs
Header registration with the 15 github headers:

registering headers consumes ~8.5% of the request, sanitizing the cookie consumes ~9.5%
Header registration this PR:
registering headers consumes ~4.5% of the request, forwarding the cookie header consumes ~0.5%