
[cmd/opampsupervisor] Supervisor may leak collector process #32189

@BinaryFissionGames

Component(s)

cmd/opampsupervisor, extension/opamp

Describe the issue you're reporting

In its current form, the Supervisor may leak a collector process if it is unexpectedly killed and doesn't get a chance to stop the collector. You can reproduce this by sending kill -9 to the supervisor and observing that the collector process is still running. The leaked collector blocks subsequent startups of the supervisor, because the newly started collector tries to bind the same ports that the orphaned one still holds (8888, plus any ports opened by user-configured components).

Under normal circumstances we shouldn't leak the collector process, but I think we can make this more robust so that when the supervisor dies unexpectedly, the collector dies too.

I thought through a couple of ideas (a collector PID file, OS-specific black magic to have the OS automatically kill the children), but I think they all end up being far more complex than just having the collector monitor its parent PID (ppid) and exit if it changes.

One of our old managed agents used this approach to handle cases where the supervisor shut down without properly killing the child process.

Fully outlined, the proposal would look something like this:

  1. Add an optional field, supervisor_pid, to the OpAMP extension.
  2. When supervisor_pid is configured, the OpAMP extension will poll (maybe every ~5 seconds) and ensure that the value of os.Getppid() equals the value of supervisor_pid
  3. If these values are not equal, the OpAMP extension reports a fatal error to trigger a collector shutdown.

Note: I think the second step might need different logic on Windows, since an orphaned process may not be re-parented there the way it is on Unix-like systems.
