Add readiness probe for NGINX on pod startup #3629


Open: wants to merge 4 commits into main

Conversation

@salonichf5 (Contributor) commented on Jul 15, 2025

Proposed changes


Problem: Users want to have a readiness probe for NGINX on startup to verify pod health.

Solution: Expose readiness probe configuration fields in the NginxProxy spec, allowing users to customize them as needed. By default, deploy NGINX with a readiness probe at the /readyz endpoint.
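As an illustrative sketch of the defaulting behavior described above (the type and function names `readinessProbeSpec` and `resolveProbe` are assumptions for illustration, not the actual NGF code):

```go
package main

import "fmt"

// readinessProbeSpec mirrors the two probe fields this PR exposes
// (port and initialDelaySeconds); names are hypothetical, not the real API.
type readinessProbeSpec struct {
	Port                *int32
	InitialDelaySeconds *int32
}

// resolveProbe applies the in-code defaults (the /readyz endpoint on
// port 8081, 3s initial delay) when the user leaves the fields unset.
func resolveProbe(spec readinessProbeSpec) (port, delay int32) {
	port, delay = 8081, 3
	if spec.Port != nil {
		port = *spec.Port
	}
	if spec.InitialDelaySeconds != nil {
		delay = *spec.InitialDelaySeconds
	}
	return port, delay
}

func main() {
	// Empty spec: defaults apply.
	p, d := resolveProbe(readinessProbeSpec{})
	fmt.Printf("default: port=%d delay=%ds\n", p, d)

	// Customized spec, as exercised in the manual tests.
	port, delay := int32(8083), int32(5)
	p, d = resolveProbe(readinessProbeSpec{Port: &port, InitialDelaySeconds: &delay})
	fmt.Printf("custom: port=%d delay=%ds\n", p, d)
}
```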

Testing: Manual testing

  1. Readiness probe empty: NGINX starts successfully, the NginxProxy config has default values for the readiness probe, the NGINX deployment reflects the same, and the health-check server listens on port 8081.

values.yaml

  readinessProbe: {}

NginxProxy config

Spec:
  Ip Family:  dual
  Kubernetes:
    Deployment:
      Container:
        Image:
          Pull Policy:  Never
          Repository:   nginx-gateway-fabric/nginx
          Tag:          sa.choudhary
      Replicas:         1
    Service:
      External Traffic Policy:  Local
      Type:                     LoadBalancer
Events:                         <none>


NGINX deployment

http-get http://:8081/readyz delay=3s timeout=1s period=10s #success=1 #failure=3
  2. Fields set in the readiness probe config: NGINX starts successfully, the NginxProxy config has the configured values for the readiness probe fields, and the NGINX deployment reflects the same.

values.yaml
readinessProbe:
    port: 8083
    initialDelaySeconds: 5

NGINX deployment

    SeccompProfile:  RuntimeDefault
    Readiness:       http-get http://:8083/readyz delay=5s timeout=1s period=10s #success=1 #failure=3


NginxProxy config

Spec:
    Deployment:
      Container:
        Image:
          Pull Policy:  Never
          Repository:   nginx-gateway-fabric/nginx
          Tag:          sa.choudhary
        Readiness Probe:
          Initial Delay Seconds:  5
          Port:                   8083
      Replicas:                   1


Server block

# Health check server block listening at 8083
server {
    listen 8083;

    location = /readyz { 
        access_log off;
        return 200;
    }
}
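The health-check server block varies only by the configured probe port. A minimal, hypothetical sketch of rendering it with Go's text/template (the actual NGF config generation is more involved):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// healthServer is an illustrative template for the health-check server
// block; it is not the real NGF template.
const healthServer = `server {
    listen {{.Port}};

    location = /readyz {
        access_log off;
        return 200;
    }
}
`

// renderHealthServer renders the health-check server block for the
// given probe port.
func renderHealthServer(port int32) string {
	t := template.Must(template.New("health").Parse(healthServer))
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ Port int32 }{Port: port}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Render with the custom probe port used in the test above.
	fmt.Print(renderHealthServer(8083))
}
```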



  3. Update the fields via the NginxProxy config: NGINX rolls out a new pod successfully, the NGINX deployment reflects the changes, and the server block port is updated.

Updated NginxProxy config

    Deployment:
      Container:
        Image:
          Pull Policy:  Never
          Repository:   nginx-gateway-fabric/nginx
          Tag:          sa.choudhary
        Readiness Probe:
          Initial Delay Seconds:  15
          Port:                   8085


NGINX deployment

    SeccompProfile:  RuntimeDefault
    Readiness:       http-get http://:8085/readyz delay=15s timeout=1s period=10s #success=1 #failure=3

Server block


# Health check server block
server {
    listen 8085;

    location = /readyz { 
        access_log off;
        return 200;
    }
}

  4. Verified the above steps for deployment type DaemonSet as well.


Closes #3594

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.

Readiness probe settings for NGINX are now configurable in the `NginxProxy` spec. 

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request helm-chart Relates to helm chart labels Jul 15, 2025
@salonichf5 salonichf5 force-pushed the feat/nginx-readinessProbe branch from 7cb82dc to 1b718a5 Compare July 15, 2025 22:16

codecov bot commented Jul 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.91%. Comparing base (487dbe6) to head (5dd1fd8).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3629      +/-   ##
==========================================
+ Coverage   86.89%   86.91%   +0.01%     
==========================================
  Files         127      127              
  Lines       15287    15333      +46     
  Branches       62       62              
==========================================
+ Hits        13284    13326      +42     
- Misses       1849     1853       +4     
  Partials      154      154              


@salonichf5 salonichf5 marked this pull request as ready for review July 16, 2025 14:05
@salonichf5 salonichf5 requested a review from a team as a code owner July 16, 2025 14:05
@salonichf5 salonichf5 force-pushed the feat/nginx-readinessProbe branch from fad90d4 to a48dbf0 Compare July 17, 2025 15:47
@salonichf5 salonichf5 requested review from sjberman and bjee19 July 17, 2025 15:49
Comment on lines +393 to +397
// Container defines container fields for the NGINX container.
//
// +optional
Container ContainerSpec `json:"container"`

Collaborator:

Looks like some unnecessary changes to the DeploymentSpec and DaemonSetSpec?

Contributor Author @salonichf5, Jul 17, 2025:

weird it comes out as a separate addition, i just re-arranged fields for the linter

// Deployment is the configuration for the NGINX Deployment.
type DeploymentSpec struct {
	// Container defines container fields for the NGINX container.
	//
	// +optional
	Container ContainerSpec `json:"container"`

	// Number of desired Pods.
	//
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`

	// Pod defines Pod-specific fields.
	//
	// +optional
	Pod PodSpec `json:"pod"`
}

// DaemonSet is the configuration for the NGINX DaemonSet.
type DaemonSetSpec struct {
	// Container defines container fields for the NGINX container.
	//
	// +optional
	Container ContainerSpec `json:"container"`

	// Pod defines Pod-specific fields.
	//
	// +optional
	Pod PodSpec `json:"pod"`
}

just have all the original fields re-arranged

Collaborator:

Wonder why the linter would care now? It's been in this order for a while with no issue.

Collaborator:

Not a huge deal if it's blocking you from using the old order.

Contributor Author:

Maybe because the containerSpec had additions and its space bytes changed, I'm assuming.

Collaborator:

Ah, good point.

// If not specified, the default port is 8081.
//
// +optional
// +kubebuilder:default:=8081
Collaborator:

Part of me thinks we don't actually set this default in the CRD, and just set it internally. Otherwise our default NginxProxy resource is going to render this, which I don't think it needs to since it's a niche field that people probably aren't going to update very often.

Comment on lines 496 to 507
## Defines the settings for the data plane readiness probe. This probe returns Ready when the NGINX data plane is ready to serve traffic.
readinessProbe: {}
# @schema
# type: integer
# minimum: 1
# maximum: 65535
# @schema
# -- Port in which the readiness endpoint is exposed.
# port: 8081

# The Kubernetes probe can also be configured by modifying the NginxProxy resource.

Collaborator:

Let's just put this under the container spec as well; it doesn't need to be a top-level field. Then in the yaml, you shouldn't need to add it, since it'll be added automatically in the container spec.

// Probe describes the Kubernetes Probe configuration.
//
// +optional
*corev1.Probe `json:",inline"`
Collaborator:

I'm actually reconsidering this now... because I'm not sure we want to allow a user to configure the entire probe, mainly because it would then require updates to the nginx endpoint that we expose. And what if they set a GRPCAction instead of an HTTPAction? We can't act on that.

I think until we get user feedback on the need for full probe configuration, we keep things limited. Let's just expose port and initialdelayseconds for now, like we did before. Sorry for the back and forth.

Contributor Author:

No worries, but just to clarify: they wouldn't be able to set GRPCAction in our NGINX deployment spec, because we only copy over specific fields from the probe:

	probe.InitialDelaySeconds = probeSpec.InitialDelaySeconds
	probe.TimeoutSeconds = probeSpec.TimeoutSeconds
	probe.PeriodSeconds = probeSpec.PeriodSeconds
	probe.SuccessThreshold = probeSpec.SuccessThreshold
	probe.FailureThreshold = probeSpec.FailureThreshold
	probe.TerminationGracePeriodSeconds = probeSpec.TerminationGracePeriodSeconds

So even if they set the ProbeHandler field to GRPCAction, it wouldn't get copied over. Maybe I should mention that in the comment for Probe instead of making changes?

Collaborator @sjberman, Jul 17, 2025:

Good to know, though it would be odd for a user to be able to set a bunch of fields and then none of them get propagated (that's a weird and non-deterministic API). My original idea was just take all of their fields and then set them. I think we just keep the simple case for now.

Contributor Author:

I agree, but my original thinking was that we set the ProbeHandler specific to our needs, that being the /readyz endpoint with a configurable port, and carry over the other probe settings along with it.

So, to ensure I got all the details right

  1. Add the readiness probe under container in values.yaml
  2. Expose only two fields, port and initialDelaySeconds, with defaults set in code rather than at the API spec level

Would we also run into these issues when we allow custom patches for Deployment, Service, and Container, or will the K8s API handle it during the merge calls?

Collaborator:

Yes, those details sound right. We can't expose ProbeHandler to users if we're not going to honor all the fields; that's a broken API.

For custom patches, there's certainly a grey area there, which we'll need to document (@ciarams87). Some fields may not work as expected because it could require internal changes (like with the probe endpoint and such). But that's the nature of allowing a raw JSON patch.

Contributor Author:

Okay, good to know.

@salonichf5 salonichf5 requested a review from sjberman July 17, 2025 18:02
@salonichf5 salonichf5 force-pushed the feat/nginx-readinessProbe branch from 572c5b4 to 5dd1fd8 Compare July 17, 2025 18:10
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), helm-chart (Relates to helm chart), release-notes
Projects
Status: 🆕 New
Development

Successfully merging this pull request may close these issues.

v 2.x No Readiness or Liveness Probes for the Data Plane
3 participants