We have encountered issues where our pipelines become stuck or fail when delegates disconnect unexpectedly during task assignment or task rebroadcasting. The system continues to attempt to assign tasks to these disconnected delegates, causing delays and hindering our deployment processes.
Problem Statement:
Pipelines get stuck when tasks are repeatedly broadcast to delegates that are no longer connected.
There is no immediate detection or handling mechanism for delegates that disconnect during task assignment or rebroadcasting.
This leads to increased pipeline execution times and requires manual intervention to abort and restart pipelines.
Proposed Solution:
Implement a mechanism to promptly detect when delegates have disconnected during task assignment.
Introduce logic to reassign tasks to available and connected delegates without causing the pipeline to hang.
Optimize the task rebroadcasting process to avoid repeatedly targeting disconnected delegates.
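To illustrate the three points above, here is a minimal sketch of how detection and reassignment could fit together. It assumes delegates send periodic heartbeats and that a delegate counts as disconnected once its last heartbeat is older than a timeout. Every name here (Delegate, TaskBroadcaster, heartbeat_timeout, reassign_orphaned) is hypothetical and is not taken from the Harness codebase or API; the 30-second timeout is also an assumed value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Delegate:
    """Hypothetical delegate record, not the actual Harness data model."""
    delegate_id: str
    last_heartbeat: float  # timestamp in seconds, same clock as `now` below


@dataclass
class TaskBroadcaster:
    """Hypothetical broadcaster that only targets recently seen delegates."""
    heartbeat_timeout: float = 30.0  # assumed threshold, in seconds
    delegates: Dict[str, Delegate] = field(default_factory=dict)
    assignments: Dict[str, str] = field(default_factory=dict)  # task -> delegate

    def record_heartbeat(self, delegate_id: str, now: float) -> None:
        self.delegates[delegate_id] = Delegate(delegate_id, now)

    def connected(self, now: float) -> List[str]:
        # A delegate is treated as connected if it was heard from recently.
        return [
            d.delegate_id
            for d in self.delegates.values()
            if now - d.last_heartbeat <= self.heartbeat_timeout
        ]

    def assign(self, task_id: str, now: float) -> Optional[str]:
        candidates = self.connected(now)
        if not candidates:
            # No connected delegate: report it instead of broadcasting forever.
            return None
        # Prefer the connected delegate with the fewest tasks (ties by id).
        load = {d: 0 for d in candidates}
        for owner in self.assignments.values():
            if owner in load:
                load[owner] += 1
        chosen = min(candidates, key=lambda d: (load[d], d))
        self.assignments[task_id] = chosen
        return chosen

    def reassign_orphaned(self, now: float) -> Dict[str, Optional[str]]:
        # Move tasks whose delegate stopped sending heartbeats.
        live = set(self.connected(now))
        orphaned = [t for t, d in self.assignments.items() if d not in live]
        moved: Dict[str, Optional[str]] = {}
        for task_id in orphaned:
            del self.assignments[task_id]
            moved[task_id] = self.assign(task_id, now)
        return moved
```

In this sketch, a None result means no connected delegate was available, so the caller could fail the step quickly or queue the task rather than leaving the pipeline hanging.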
Benefits:
Improved reliability and stability of pipeline executions.
Reduced manual intervention to manage stuck or failed pipelines.
Enhanced efficiency in environments with fluctuating delegate availability.
Use Case:
As a user, when I run a pipeline, I want the system to handle delegate disconnections gracefully so that my pipeline does not get stuck or fail, ensuring smooth and efficient deployments.
Created by diego.pereira@harness.io
October 7, 2024