Skip to content

perf: optimized request headers #1335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jan 27, 2025
Merged

perf: optimized request headers #1335

merged 34 commits into from
Jan 27, 2025

Conversation

AlliBalliBaba
Copy link
Contributor

I noticed that an average browser request actually has quite a few request headers. A request to this github page when logged in will have ~15 headers totaling ~2kb in data.

Header keys

Currently there's an optimisation in place that uses otter to cache header keys. After looking at flamegraphs, I noticed that in this case caching is actually not much more efficient than just concatenating the string.
This PR gets rid of the otter dependency and instead caches 80 of the most common request headers directly as a zend_string (by explicitly writing them into a map).

Cookie Header

Additionally, this PR removes the cookie sanitizing on the go side. I noticed that other SAPIs 1 2 3 will also just forward the cookie header and let PHP sanitize it.
When testing with Nginx/FPM I also noticed that nginx will even forward null bytes in the cookie, leading to a truncated cookie. I'm pretty sure this will behave the same way on Caddy/FPM, though I haven't explicitly tested there.

Sanitizing the cookie on the go side is actually pretty slow and leads to a slightly different behaviour. Slightly malformed cookies like this would be sanitized differently by GO and PHP:

Cookie: key =value
GO trims the key on both sides: key=value
PHP replaces the space with _: key_=value

PHP will request the cookie from the Sapi everytime it discovers the Cookie header in the request, even if the Cookie is not explicitly read by the PHP script.

Flamegraphs

Header registration with the 15 github headers:
registering headers consumes ~8.5% of the request, sanitizing the cookie consumes ~9.5%
flame-inefficient-headers

Header registration this PR:
registering headers consumes ~4.5% of the request, forwarding the cookie header consumes ~0.5%

flame-efficient-headers

@AlliBalliBaba AlliBalliBaba marked this pull request as ready for review January 18, 2025 10:21
Copy link
Member

@withinboredom withinboredom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor issues with the list but LGTM otherwise.

@dunglas
Copy link
Member

dunglas commented Jan 18, 2025

Maybe could we use this list instead of the one coming from ChatGPT? https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

We may even automate updating the list 😅

@dunglas
Copy link
Member

dunglas commented Jan 19, 2025

There is also https://www.iana.org/assignments/http-fields/http-fields.xhtml
It's easy to parse as it's available in XML, but it's only the official headers, and it's not specified if it's a request or a response header.

@AlliBalliBaba
Copy link
Contributor Author

I added a script that scrapes all headers from https://github.com/pichillilorenzo/known-http-header-db, but after noticing it hasn't been updated in 3 years, I removed it again. I don't think these headers change very often, maybe PHP will even get a request object before these need to update. I think it's fine to update manually since this is only an optimization.

Instead, I added a test to at least verify that the entered headers are correctly transformed.

@AlliBalliBaba
Copy link
Contributor Author

Very strange, cgo apparenty has issues with tests in internal modules.

cgi.go Outdated
}

zendHeader := C.frankenphp_init_persistent_string(toUnsafeChar(commonKey), C.size_t(len(commonKey)))
thread.knownHeaderKeys[key] = zendHeader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we populate this cache when booting the thread?

Copy link
Contributor Author

@AlliBalliBaba AlliBalliBaba Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think that might be unnecessary work (more so with dynamic threads). Due to having trusted variable keys, it should already save time on the first registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the cache would directly contain the zend_strings once (and not once per thread), but I have not yet found a way to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually found a way to only register them once by explicitly adding the 'interned string' flag on the main thread. I'll test if this is safe, it might also be useful for the known $_SERVER keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$_SERVER keys and common headers are now registered at the start of the main thread. I was only able to do this by explicitly marking the zend_strings as permanent and interned.

This seems to be slightly faster since we don't do any ref-counting. Since these strings stay alive for the entire process, they also don't have to be explicitly released.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@Alliballibaba2 Alliballibaba2 force-pushed the perf/optimized-request-headers branch from 76e662e to dd66a3c Compare January 24, 2025 15:08
@dunglas dunglas merged commit dd250e3 into main Jan 27, 2025
55 checks passed
@dunglas dunglas deleted the perf/optimized-request-headers branch January 27, 2025 20:48
@dunglas
Copy link
Member

dunglas commented Jan 27, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants