-
-
Notifications
You must be signed in to change notification settings - Fork 547
feat: make port mapping checks timeout configurable #3176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make port mapping checks timeout configurable #3176
Conversation
✅ Deploy Preview for testcontainers-go ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Need to investigate more in depth, as all tests pass locally, but on CI, they hang for ever (30m, GitHub workers default timeout). It only happens for the socat module, rootless Docker, and Ryuk disabled, so I guess there is something related to the ports that is causing the "deadlock"?? |
I think I spotted a bug?? in |
Probably related magiconair/properties#79 |
@@ -141,6 +146,11 @@ func read() Config { | |||
config.RyukConnectionTimeout = timeout | |||
} | |||
|
|||
testcontainersPortMappingTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_PORT_MAPPING_TIMEOUT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is used in defaultReadinessHook
. What if this hook will be extended in future and will include more checks than just check of the port mapping? "Port mapping" (in the name of environment variable, in the name of local variable and in the name of field of Config
struct) sounds like exposure of implementation details of "default readiness hook".
What if we use different name (I'm not good in naming), like:
testcontainersPortMappingTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_PORT_MAPPING_TIMEOUT") | |
testcontainersDefaultReadinessHookTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_DEFAULT_READINESS_HOOK_TIMEOUT") |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new, but this check seems to duplicate what wait already handles, can we just remove this check totally allow wait definitions to handle it. We could have containers waiting for ports which the user doesn't care about making things slower than they might be?
@@ -46,6 +46,7 @@ jobs: | |||
env: | |||
TESTCONTAINERS_RYUK_DISABLED: "${{ inputs.ryuk-disabled }}" | |||
RYUK_CONNECTION_TIMEOUT: "${{ inputs.project-directory == 'modules/compose' && '5m' || '60s' }}" | |||
TESTCONTAINERS_PORT_MAPPING_TIMEOUT: "5s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: In the standard case 5s might sound generous but if a system is underload this could cause issues, what lead to choice of 5s as the default?
expected := Config{} | ||
|
||
assert.Equal(t, expected, config) | ||
// Group 1: Basic environment setup tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: extract all tests to at most level deep as the extra level doesn't seem to be only making it harder to identify the tests.
require.Equal(t, Config{}, config) | ||
}) | ||
|
||
t.Run("HOME-does-not-contain-TC-props-file", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: use more concise names, for example the important thing for this test is no props file so I would go with something like "no-props-file".
t.Run("properties-file", func(t *testing.T) { | ||
// Group 3.1: Docker host configuration | ||
t.Run("docker-host", func(t *testing.T) { | ||
tests := []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: avoid table driven tests as it prevents tests being run individually directly from in editor.
// it will be ready when all the exposed ports are mapped, checking every 50ms, up to 1s, | ||
// and failing if all the exposed ports are not mapped in the interval defined by the | ||
// [config.TestcontainersPortMappingTimeout] config. | ||
func checkPortsMapped(ctx context.Context, inspectRawContainer func(context.Context) (*container.InspectResponse, error), exposedPorts []string, timeout time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: this seams to duplication functionality from the wait package, could you clarify why we need it as I'm not familiar with this part of the code base, could be legacy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check internally with the Java folks, as this check was suggested by them a few months ago. I'm fine with totally removing it, as you said, the wait strategies are here for doing that job
Hi @mdelapenya,
If you talk about https://github.com/testcontainers/testcontainers-go/actions/runs/15347196816/job/43186220491?pr=3176:
then please note that this PR is not going to fix this issue, because it is not related to timeout value (which is made configurable in this PR). Refer to the 2nd note in #3165 (comment) for the root cause of this test failure. Thank you. |
@mabrarov the CI builds I mentioned:
|
Hi @stevenh and @mdelapenya,
I like this idea, but please note that not all parts (mostly modules) of Testcontainers for Go can be ready for this, because some existing code can expect that port mapping completes immediately after container start. If we remove this readiness hook, then we risk to break such code. Here is a quick & dirty research for such code (code which inspects mapped ports after container started without making multiple tries): #2670 (comment). Thank you. |
Hi @mdelapenya,
All of these failures seem to be caused by issue which I mentioned. Feel free to check my comment:
The failed tests in CI behave same way. The only difference is larger timeout (if I get it right). Thank you. |
Thanks @mabrarov, those modules in the comment don't have a proper wait strategy, and need to wait for the listening port, so we must fix them (and any other module in the same situation). If we finally remove this check, we must make sure we inform client code that they must create a robust wait strategy, else container startup will be flaky, and to be fair, this should already be expected 😅 In fact, this was added in #2606 as a moat for users not defining consistent wait strategies. |
I think the larger timeout is caused because the CI does not have a testcontainers.properties and our config code fails to define the new timeout (we need to fix it, as mentioned in the follow-up section of the PR), as a result, there is no timeout, and the retry hangs forever. This is what I observe from those builds. In any case, after these conversations, I'm more interested in removing the code, and enforce defining best practices with the wait strategies. A validation warning, maybe? |
Hi @mdelapenya,
Yes, and the users may need Update: fixed linked to PR which should be #3165 |
Mmm why not adding a new wait strategy for all the mapped ports, using your code as buildng block? It would satisfy the readinessCheck, and could be optional or even required (added as a wait strategy in the |
On the face that sounds like a good idea, but might not be the best solution, why? Containers could be exposing lots of ports, and the user might not care about all of them. As we know expose is a specific edge case for them to be usable is something different so I would suggest that being explicit about the ones you care about would result in better behaviour. Sure the wait for all might be convenient but result in lazy application and slower start up times. Thooughts? |
Hi @mdelapenya,
Do you plan to revisit this PR and instead of making
? It is not clear for me what we are going to do with #3165 in that case (what will be the order of PRs, what is the future of #3165 in general). Thank you. |
We have two options:
|
#3165 should be merged when ready, that could be good building block for containers that need to know the port details for container configuration, that's an edge case in my mind, but still solid core change. The more standard workflow is waiting for a listening port. It might be useful to have a "for all exposed" option for port wait strategies, but I think we should explore the impact of that before adding. |
I like option 1 as it removes the duplication and simplifies the configration for all cases. It might be a breaking change, but should only be for modules which have aren't correctly waiting for their required ports, so in the long run will result in more stability. One could argue its not a breaking change, it's just one which will expose bugs in other areas, so not something to shy away from IMO. |
Hi, If my opinion is needed / helpful, then I vote for option 1 from #3176 (comment). I was surprised by Thank you. |
FYI @mdelapenya, I opened #3177 for the issue described in #3176 (comment) (in #3165 (comment)). Thank you. |
Closing, we are going to remove the internal check for the exposed ports:
For that, )'m closing this one and will send a new one removing the check. |
What does this PR do?
This PR adds a new setting to the properties file:
tc.port.mapping.timeout
, with a default value of 5s.Users can configure it via properties, or with the
TESTCONTAINERS_PORT_MAPPING_TIMEOUT
env var.This new property is used in the check for exposed ports.
I took the chance to refactor the existing config tests, something flying around my head for months. I think they are more readable now.
Why is it important?
Before these changes, the library always used 5s to check for all the exposed ports, and depending on the underlying hardware it could lead to flakiness.
Now, users can set the env var, or the property in the ~/.testcontainers.properties file to change that timeout.
Related issues
Follow-ups
Make our config.Read methods raise an error in case the testcontainers properties file contains any, informing user about that. At the moment, we are swallowing the error, and applying the "default" config from env, which is not enough, as we should use what's defined at the config struct's defaults definition.