feat: make port mapping checks timeout configurable #3176


Closed

Conversation

@mdelapenya (Member) commented May 30, 2025

  • feat: make exposing ports configurable by config
  • chore: apply configuration to the port mapping check
  • chore: refactor checkPortsMapped to make it testable

What does this PR do?

This PR adds a new setting to the properties file: tc.port.mapping.timeout, with a default value of 5s.

Users can configure it via properties, or with the TESTCONTAINERS_PORT_MAPPING_TIMEOUT env var.

This new property is used in the check for exposed ports.

I took the chance to refactor the existing config tests, something that had been flying around my head for months. I think they are more readable now.

Why is it important?

Before these changes, the library always used a fixed 5s timeout to check all the exposed ports, which, depending on the underlying hardware, could lead to flakiness.

Now, users can set the env var, or the property in the ~/.testcontainers.properties file to change that timeout.

Related issues

Follow-ups

Make our config.Read methods raise an error when the Testcontainers properties file contains one, informing the user about it. At the moment we are swallowing the error and applying the "default" config from the environment, which is not enough: we should use what's defined in the config struct's defaults.

@mdelapenya mdelapenya requested a review from a team as a code owner May 30, 2025 05:23
@mdelapenya mdelapenya added the feature New functionality or new behaviors on the existing one label May 30, 2025
@mdelapenya mdelapenya self-assigned this May 30, 2025
@mdelapenya mdelapenya requested a review from stevenh May 30, 2025 05:23

netlify bot commented May 30, 2025

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 613b0ab
🔍 Latest deploy log https://app.netlify.com/projects/testcontainers-go/deploys/6839a9df5960b000080cb66c
😎 Deploy Preview https://deploy-preview-3176--testcontainers-go.netlify.app

@mdelapenya (Member Author)

Need to investigate more in depth: all tests pass locally, but on CI they hang forever (30m, the GitHub workers' default timeout). It only happens for the socat module, rootless Docker, and Ryuk disabled, so I guess there is something related to the ports that is causing the "deadlock"??

@mdelapenya (Member Author)

Need to investigate more in depth: all tests pass locally, but on CI they hang forever (30m, the GitHub workers' default timeout).

I think I spotted a bug(?) in magiconair/properties: when the properties struct defines a default value but Decode fails, it assigns a zero value instead of that default. @stevenh it could also be our own implementation reading the props, which is not handling errors. If you agree, I'll address that in a different PR.

@mdelapenya (Member Author)

I think I spotted a bug(?) in magiconair/properties: when the properties struct defines a default value but Decode fails, it assigns a zero value instead of that default.

Probably related magiconair/properties#79

@@ -141,6 +146,11 @@ func read() Config {
config.RyukConnectionTimeout = timeout
}

testcontainersPortMappingTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_PORT_MAPPING_TIMEOUT")
Contributor

This value is used in defaultReadinessHook. What if this hook is extended in the future to include more checks than just the port mapping check? "Port mapping" (in the name of the environment variable, the local variable, and the Config struct field) sounds like an exposure of implementation details of the "default readiness hook".

What if we use a different name (I'm not good at naming), like:

Suggested change
testcontainersPortMappingTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_PORT_MAPPING_TIMEOUT")
testcontainersDefaultReadinessHookTimeoutEnv := readTestcontainersEnv("TESTCONTAINERS_DEFAULT_READINESS_HOOK_TIMEOUT")

?

@stevenh (Contributor) left a comment

Not new, but this check seems to duplicate what wait already handles; can we just remove this check entirely and let the wait definitions handle it? We could have containers waiting for ports the user doesn't care about, making things slower than they need to be.

@@ -46,6 +46,7 @@ jobs:
env:
TESTCONTAINERS_RYUK_DISABLED: "${{ inputs.ryuk-disabled }}"
RYUK_CONNECTION_TIMEOUT: "${{ inputs.project-directory == 'modules/compose' && '5m' || '60s' }}"
TESTCONTAINERS_PORT_MAPPING_TIMEOUT: "5s"
Contributor

question: in the standard case 5s might sound generous, but if a system is under load this could cause issues. What led to the choice of 5s as the default?

expected := Config{}

assert.Equal(t, expected, config)
// Group 1: Basic environment setup tests
Contributor

suggestion: extract all tests to at most one level deep, as the extra level only seems to make it harder to identify the tests.

require.Equal(t, Config{}, config)
})

t.Run("HOME-does-not-contain-TC-props-file", func(t *testing.T) {
Contributor

suggestion: use more concise names; for example, the important thing for this test is that there is no props file, so I would go with something like "no-props-file".

t.Run("properties-file", func(t *testing.T) {
// Group 3.1: Docker host configuration
t.Run("docker-host", func(t *testing.T) {
tests := []struct {
Contributor

suggestion: avoid table-driven tests, as they prevent tests from being run individually directly from the editor.

// it will be ready when all the exposed ports are mapped, checking every 50ms, up to 1s,
// and failing if all the exposed ports are not mapped in the interval defined by the
// [config.TestcontainersPortMappingTimeout] config.
func checkPortsMapped(ctx context.Context, inspectRawContainer func(context.Context) (*container.InspectResponse, error), exposedPorts []string, timeout time.Duration) error {
Contributor

question: this seems to duplicate functionality from the wait package. Could you clarify why we need it? I'm not familiar with this part of the code base; could it be legacy?

Member Author

Let me check internally with the Java folks, as this check was suggested by them a few months ago. I'm fine with removing it entirely; as you said, the wait strategies are there to do that job.

@mabrarov (Contributor)

Hi @mdelapenya,

Need to investigate more in depth: all tests pass locally, but on CI they hang forever (30m, the GitHub workers' default timeout). It only happens for the socat module, rootless Docker, and Ryuk disabled, so I guess there is something related to the ports that is causing the "deadlock"??

If you talk about https://github.com/testcontainers/testcontainers-go/actions/runs/15347196816/job/43186220491?pr=3176:

     reuse_test.go:39: 
        	Error Trace:	/home/runner/work/testcontainers-go/testcontainers-go/reuse_test.go:39
        	Error:      	Received unexpected error:
        	            	create container: all exposed ports, [8080/tcp], were not mapped in 5s: port 8080/tcp is not mapped yet
        	Test:       	TestGenericContainer_stop_start_withReuse
2025/05/30 13:09:28 🐳 Stopping container: fcb600a0f955
2025/05/30 13:09:28 ✅ Container stopped: fcb600a0f955
2025/05/30 13:09:28 🐳 Terminating container: fcb600a0f955
2025/05/30 13:09:28 🚫 Container terminated: fcb600a0f955

=== FAIL: . TestGenericContainer_stop_start_withReuse (re-run 5) (14.98s)

then please note that this PR is not going to fix this issue, because the issue is not related to the timeout value (which is what this PR makes configurable). Refer to the 2nd note in #3165 (comment) for the root cause of this test failure.

Thank you.

@mabrarov (Contributor)

Hi @stevenh and @mdelapenya,

Not new, but this check seems to duplicate what wait already handles, can we just remove this check totally allow wait definitions to handle it. We could have containers waiting for ports which the user doesn't care about making things slower than they might be?

I like this idea, but please note that not all parts (mostly modules) of Testcontainers for Go may be ready for this, because some existing code can expect that port mapping completes immediately after container start. If we remove this readiness hook, we risk breaking such code.

Here is a quick & dirty research for such code (code which inspects mapped ports after container started without making multiple tries): #2670 (comment).

Thank you.

@mabrarov (Contributor)

Hi @mdelapenya,

the CI builds I mentioned:

* socat build: hangs https://github.com/testcontainers/testcontainers-go/actions/runs/15339849072/job/43173511093

* rootless build: hangs https://github.com/testcontainers/testcontainers-go/actions/runs/15339849072/job/43173511077

* reaper disabled: hangs https://github.com/testcontainers/testcontainers-go/actions/runs/15339849072/job/43173511093

All of these failures seem to be caused by the issue I mentioned. Feel free to check my comment:

2025/05/27 20:28:09 🐳 Creating container for image nginx:alpine
2025/05/27 20:28:09 🐳 Creating container for image testcontainers/ryuk:0.11.0
2025/05/27 20:28:09 ✅ Container created: 931b882451ca
2025/05/27 20:28:09 🐳 Starting container: 931b882451ca
2025/05/27 20:28:09 ✅ Container started: 931b882451ca
2025/05/27 20:28:09 ⏳ Waiting for container id 931b882451ca image: testcontainers/ryuk:0.11.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms skipInternalCheck:false}
2025/05/27 20:28:09 🔔 Container is ready: 931b882451ca
2025/05/27 20:28:09 ✅ Container created: cf711b2a8ec9
2025/05/27 20:28:09 🐳 Starting container: cf711b2a8ec9
2025/05/27 20:28:09 ✅ Container started: cf711b2a8ec9
2025/05/27 20:28:09 🔔 Container is ready: cf711b2a8ec9
2025/05/27 20:28:09 🐳 Stopping container: cf711b2a8ec9
2025/05/27 20:28:19 ✅ Container stopped: cf711b2a8ec9
2025/05/27 20:28:19 ✅ Container started: cf711b2a8ec9
2025/05/27 20:28:19 All requested ports were not exposed: port 8080/tcp is not mapped yet
2025/05/27 20:28:19 All requested ports were not exposed: port 8080/tcp is not mapped yet
2025/05/27 20:28:19 All requested ports were not exposed: port 8080/tcp is not mapped yet
...
    reuse_test.go:39:
                Error Trace:    /ws/github/mabrarov/testcontainers-go/reuse_test.go:39
                Error:          Received unexpected error:
                                create container: all exposed ports, [8080/tcp], were not mapped in 5s: port 8080/tcp is not mapped yet
                Test:           TestGenericContainer_stop_start_withReuse
2025/05/27 20:28:24 🐳 Stopping container: cf711b2a8ec9
2025/05/27 20:28:24 ✅ Container stopped: cf711b2a8ec9
2025/05/27 20:28:24 🐳 Terminating container: cf711b2a8ec9
2025/05/27 20:28:24 🚫 Container terminated: cf711b2a8ec9
--- FAIL: TestGenericContainer_stop_start_withReuse (15.36s)
FAIL
FAIL    github.com/testcontainers/testcontainers-go     15.375s
FAIL 

Note that there is no "Starting container: cf711b2a8ec9" message before the last "Container started: cf711b2a8ec9" message.

The failed tests in CI behave the same way. The only difference is a larger timeout (if I get it right).

Thank you.

@mdelapenya (Member Author)

Here is a quick & dirty research for such code (code which inspects mapped ports after container started without making multiple tries): #2670 (comment).

Thanks @mabrarov, those modules in the comment don't have a proper wait strategy, and need to wait for the listening port, so we must fix them (and any other module in the same situation).

If we finally remove this check, we must make sure we inform client code that they must create a robust wait strategy, else container startup will be flaky, and to be fair, this should already be expected 😅 In fact, this was added in #2606 as a moat for users not defining consistent wait strategies.

@mdelapenya (Member Author) commented May 30, 2025

The failed tests in CI behave the same way. The only difference is a larger timeout (if I get it right).

I think the larger timeout happens because the CI does not have a testcontainers.properties file and our config code fails to define the new timeout (we need to fix it, as mentioned in the follow-ups section of the PR); as a result, there is no timeout and the retry hangs forever. This is what I observe from those builds.

In any case, after these conversations, I'm more interested in removing the code and enforcing best practices with the wait strategies. A validation warning, maybe?

@mabrarov (Contributor) commented May 30, 2025

Hi @mdelapenya,

If we finally remove this check, we must make sure we inform client code that they must create a robust wait strategy

Yes, and the users may need wait.ForMappedPort from #3165 to do that 😄 (note that wait.ForMappedPort waits for a single port while defaultReadinessHook waits for multiple ports, so wait.ForMappedPort is not a one-to-one replacement for defaultReadinessHook).

Update: fixed the linked PR, which should be #3165.

@mdelapenya (Member Author)

Hi @mdelapenya,

If we finally remove this check, we must make sure we inform client code that they must create a robust wait strategy

Yes, and the users may need wait.ForMappedPort from #3176 to do that 😄 (note that wait.ForMappedPort waits for a single port while defaultReadinessHook waits for multiple ports, so wait.ForMappedPort is not a one-to-one replacement for defaultReadinessHook).

Hmm, why not add a new wait strategy for all the mapped ports, using your code as a building block? It would satisfy the readinessCheck, and it could be optional or even required (added as a wait strategy in the testcontainers.Run function).

@stevenh (Contributor) commented May 30, 2025

Hi @mdelapenya,

If we finally remove this check, we must make sure we inform client code that they must create a robust wait strategy

Yes, and the users may need wait.ForMappedPort from #3176 to do that 😄 (note that wait.ForMappedPort waits for a single port while defaultReadinessHook waits for multiple ports, so wait.ForMappedPort is not a one-to-one replacement for defaultReadinessHook).

Hmm, why not add a new wait strategy for all the mapped ports, using your code as a building block? It would satisfy the readinessCheck, and it could be optional or even required (added as a wait strategy in the testcontainers.Run function).

On the face of it that sounds like a good idea, but it might not be the best solution. Why? Containers could expose lots of ports, and the user might not care about all of them. As we know, exposing a port and the port being usable are two different things, so I would suggest that being explicit about the ports you care about would result in better behaviour.

Sure, waiting for all of them might be convenient, but it could result in lazy usage and slower start-up times. Thoughts?

@mabrarov (Contributor) commented May 30, 2025

Hi @mdelapenya,

why not add a new wait strategy for all the mapped ports, using your code as a building block? It would satisfy the readinessCheck, and it could be optional or even required (added as a wait strategy in the testcontainers.Run function).

Do you plan to revisit this PR and, instead of making the defaultReadinessHook timeout configurable:

  1. remove defaultReadinessHook, and
  2. add a new wait strategy where the user can choose to wait either for the mapping of all exposed ports or for the mapping of explicitly provided ports (refer to feat: make port mapping checks timeout configurable #3176 (comment) from @stevenh), recommending this new wait strategy in the documentation (and in the release note about the removal of defaultReadinessHook)?

It is not clear to me what we are going to do with #3165 in that case (what the order of PRs will be, and what the future of #3165 is in general).

Thank you.

@mdelapenya (Member Author)

Sure, waiting for all of them might be convenient, but it could result in lazy usage and slower start-up times. Thoughts?

We have two options:

  1. create an option for waiting for all the mapped ports, document it, and let users decide, not applying it by default, as I suggested above. It's basically what we are doing in the readiness check at the moment (which is not consistent with the wait-strategies approach). We must explicitly mention in the release notes that the behaviour has changed (it is a breaking change)
  2. keep the code as is, making it configurable (proper naming included)

@stevenh (Contributor) commented May 30, 2025

#3165 should be merged when ready; that could be a good building block for containers that need to know the port details for container configuration. That's an edge case in my mind, but still a solid core change.

The more standard workflow is waiting for a listening port.

It might be useful to have a "for all exposed" option for port wait strategies, but I think we should explore the impact of that before adding it.

@stevenh (Contributor) commented May 30, 2025

Sure, waiting for all of them might be convenient, but it could result in lazy usage and slower start-up times. Thoughts?

We have two options:

  1. create an option for waiting for all the mapped ports, document it, and let users decide, not applying it by default, as I suggested above. It's basically what we are doing in the readiness check at the moment (which is not consistent with the wait-strategies approach). We must explicitly mention in the release notes that the behaviour has changed (it is a breaking change)
  2. keep the code as is, making it configurable (proper naming included)

I like option 1, as it removes the duplication and simplifies the configuration for all cases. It might be a breaking change, but it should only affect modules which aren't correctly waiting for their required ports, so in the long run it will result in more stability.

One could argue it's not a breaking change; it's just one which will expose bugs in other areas, so not something to shy away from, IMO.

@mabrarov (Contributor)

Hi,

If my opinion is needed / helpful, then I vote for option 1 from #3176 (comment). I was surprised by defaultReadinessHook. It would be nice to check whether Testcontainers for Java has such a feature (if yes, then maybe the maintainers of Testcontainers for Java can speak about this feature, or want to remove / replace it too).

Thank you.

@mabrarov (Contributor)

FYI @mdelapenya,

I opened #3177 for the issue described in #3176 (comment) (in #3165 (comment)).

Thank you.

@mdelapenya mdelapenya closed this Jun 3, 2025
@mdelapenya mdelapenya deleted the configurable-timeout-ports branch June 3, 2025 11:12
@mdelapenya (Member Author)

Closing, we are going to remove the internal check for the exposed ports:

  1. the library offers a robust set of wait strategies that allow users to configure how to wait for a good state
  2. given 1), we cannot over-protect our users with this check. If we recommend defining a proper wait strategy, it means exactly that, and waiting for ports must be defined by the user. We are adding a new wait.ForMappedPort strategy (see feat(kafka,redpanda): support for waiting for mapped ports without external checks #3165)

For that, I'm closing this one and will send a new one removing the check.
