Currently, VM Pool instances are tracked only in the VM Runner's in-memory state.
If the Runner does not shut down gracefully (e.g. hardware failure, forced termination,
or an ASG scale-in race condition), the pool VMs become orphaned, with no process left to
manage or terminate them.
As a workaround, we have implemented startup scripts on the delegate that scan for and
terminate orphaned pool instances on the next boot. However, orphans still persist and
incur cloud compute costs until the delegate is next replaced, and maintaining these
scripts adds operational overhead that should not be the customer's responsibility.
It would be a valuable enhancement for the Runner to persist pool state externally
(e.g. instance tags, DynamoDB, or a local file) and reconcile it on startup, or to provide
an official orphan detection and cleanup mechanism, so that the pool VM lifecycle is
managed reliably regardless of how the Runner terminates.
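
To make the request concrete, here is a rough sketch of the "persist and reconcile on startup" idea using the local-file option. It is not the Runner's actual code or API: `save_pool_state`, `load_pool_state`, `reconcile_on_startup`, and the `list_running` / `terminate` callables are all hypothetical names, and the cloud calls are injected as plain functions so the sketch stays provider-agnostic.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Set


def save_pool_state(path: Path, instance_ids: Iterable[str]) -> None:
    """Write the current pool instance IDs to disk (hypothetical helper).

    Writes to a temp file and renames it, so a crash mid-write does not
    leave a truncated state file behind.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(set(instance_ids)), f)
    os.replace(tmp, path)


def load_pool_state(path: Path) -> Set[str]:
    """Return the instance IDs recorded by a previous run, if any."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def reconcile_on_startup(
    path: Path,
    list_running: Callable[[], Iterable[str]],  # assumed wrapper around the cloud API
    terminate: Callable[[str], None],           # assumed wrapper around the cloud API
) -> Set[str]:
    """Terminate instances a previous Runner recorded that are still running.

    The state file is cleared only after every terminate call returns; if one
    raises, the file is left as-is so the next startup retries.
    """
    recorded = load_pool_state(path)
    running = set(list_running())
    orphans = recorded & running
    for instance_id in sorted(orphans):
        terminate(instance_id)
    save_pool_state(path, [])
    return orphans
```

In this sketch the Runner would call `save_pool_state` whenever the pool changes and `reconcile_on_startup` once at boot. A local file only helps if the Runner restarts on the same host; when the host itself is lost (hardware failure, ASG scale-in), state would need to live in tags or an external store such as DynamoDB, with the same reconcile logic applied to that source instead.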