Skip to content

Reverted AsyncResponseStream buffer to cbuf which is considerably faster than StreamString #148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 8, 2025

Conversation

mathieucarbou
Copy link
Member

@mathieucarbou mathieucarbou commented Apr 7, 2025

This PR fixes an issue discovered in OpenDTU (see: tbnobody/OpenDTU#2535).

The replacement of cbuf by StringStream is definitely slower (3-4 times). Even calling setTimeout(0) to inhibit the timed read with readBytes() does not help speeding up.

This PR reverts the code and also provides a MRE helping isolate the problem / reproduce it.

The use of StringStream was causing several issues like:

  • throttling within slow response (poll events)
  • TWDT could trigger because of the slow StringStream

Note: this issue is only visible for big payloads: this OpenDTU endpoints uses a buffer of 40k.

@mathieucarbou mathieucarbou self-assigned this Apr 7, 2025
@mathieucarbou mathieucarbou marked this pull request as draft April 7, 2025 22:28
@mathieucarbou mathieucarbou changed the title Fix: AsyncResponseStream could cause a 1 sec delay... Reverted AsyncResponseStream buffer to cbuf which is considerably faster than StreamString Apr 7, 2025
@mathieucarbou mathieucarbou marked this pull request as ready for review April 7, 2025 22:53
@mathieucarbou mathieucarbou force-pushed the timeout branch 2 times, most recently from 66919c5 to e218213 Compare April 8, 2025 06:33
@me-no-dev
Copy link
Member

Any idea why StreamString is so slow?

@mathieucarbou
Copy link
Member Author

mathieucarbou commented Apr 8, 2025

autocannon -c 1 -d 30 http://192.168.4.1/

with cbuf

image

with StreamString

  • CRASH if we not disable TWDT with -D CONFIG_ASYNC_TCP_USE_WDT=0
  • lot of logs: [ 29448][D][AsyncTCP.cpp:173] _get_async_event(): coalescing polls, network congestion or async callbacks might be too slow!
image

setting a timeout with setTImeout(0) to avoid a 1 sec wait time does not help.

Analysis:

The perf of StreamString is really bad because of several things:

  1. virtual size_t readBytes(char *buffer, size_t length); is not overridden, and the default implementation is doing a timed read and calls several times read()
    2.read() calls each time String::remove(0, 1), which is doing a memmove() of the nearly complete buffer for each read.

So if we have a bufferSize of 40k like in the OpenDTU project, reading causes 40960 calls to read(), each call doing a memmove of the remaining buffer.

That's super slow.

So this PR reverts to cbuff until StreamString perf is fixed in Arduino Core.

About StreamString inefficiency with large buffers:

The problem is the inefficiency between StreamString and String operations.

  • First, StreamString should override readBytes() method to avoid calling read() repeatedly.
  • Optionally the class should be modified for better perf with an flag preventing String mutations when read starts and also keep an index of the current position. This will prevent concurrent read / write and will avoid all the memory moves.

There is a tradeoff to make:

  1. having the String in memory reduced after each read to really match the remaining bytes (slower - uses more heap because of copies)
  2. or keep the String in memory as-is, immutable, and add indexes in StreamString to iterate over the String (faster and no move or copy - uses less heap but a steady mount of heap matching the initial String size)

@mathieucarbou mathieucarbou merged commit 1bf9172 into main Apr 8, 2025
30 checks passed
@mathieucarbou mathieucarbou deleted the timeout branch April 8, 2025 12:30
mathieucarbou added a commit to mathieucarbou/arduino-esp32 that referenced this pull request Apr 8, 2025
… buffers

Using `StreamString` on large buffers causes large movements of memory data when reading which can cause the TWDT to trigger, heap fragmentation or crash due to allocation failures.

`StreamString` is definitely slower than cbuf.h for large buffers.

Explanation of the issue with a small MRE: ESP32Async/ESPAsyncWebServer#148 (comment)

This issue was discovered in the OpenDTU project : tbnobody/OpenDTU#2535
mathieucarbou added a commit to mathieucarbou/arduino-esp32 that referenced this pull request Apr 8, 2025
Using `StreamString` on large buffers causes large movements of memory data when reading which can cause the TWDT to trigger, heap fragmentation or crash due to allocation failures.

`StreamString` is definitely slower than cbuf.h for large buffers.

Explanation of the issue with a small MRE: ESP32Async/ESPAsyncWebServer#148 (comment)

This issue was discovered in the OpenDTU project : tbnobody/OpenDTU#2535
mathieucarbou added a commit that referenced this pull request Apr 8, 2025
mathieucarbou added a commit that referenced this pull request Apr 8, 2025
Urgent fix of previous PR #148: Arduino Core 2 has a different implementation of size()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants