Description
Component(s)
cmd/opampsupervisor, extension/opamp
Describe the issue you're reporting
In its current form, the Supervisor may leak a collector process if the Supervisor is unexpectedly killed and doesn't get a chance to stop the collector. You can see this by issuing a `kill -9` to the supervisor and observing that the collector process is still running. This will block subsequent startups of the supervisor, as multiple collectors will try to occupy the same ports (8888, plus any user-configured components with a port).
Under normal circumstances, we shouldn't leak the collector process, but I think we can make this more robust so that when the supervisor unexpectedly dies, the collector dies too.
I thought through a couple of ideas (a collector PID file, or doing some per-OS black magic to have the OS auto-kill the children), but I think they all end up way more complex than just having the collector monitor its PPID and exit if it changes.
In one of our old managed agents, this is how we handled the cases where the supervisor might shut down without properly killing the child process.
To fully outline the proposal, it would be something like:
- Add an optional field to the OpAMP extension, `supervisor_pid`.
- When `supervisor_pid` is configured, the OpAMP extension will poll (maybe every ~5 seconds) and ensure that the value of `os.Getppid()` equals the value of `supervisor_pid`.
- If these values are not equal, the OpAMP extension reports a fatal error to trigger a collector shutdown.
Note: I think the second step here might need different logic on Windows, as the orphaned process may not be re-parented like it is on other systems.