Automatic retry of failed stage in pipeline execution
this fiscal quarter
Turquoise Planarian
The user would like the failure strategy to be able to retry a stage if a step or step group fails. Currently, the only option is to roll back the stage if a step fails. They would like an option to retry the entire stage if a step within the stage fails.
Harness Engineering
this fiscal quarter
Canny AI
Merged in a post:
Support Matrix strategy at stage level to run on all stages despite failure and have an option to retry on failed stages
Capable Leopard
As of now, a matrix strategy at the stage level does not execute all stages in the matrix if any one of them fails, and we cannot use a failure strategy to ignore the failure because we also need an option to retry the failed stages.
Rohan Gupta
long-term
This is feasible. We can add this to our long-term backlog.
Capable Leopard
I've left a comment, but the status still says Pending Feedback, so I'm copying it here:
If any one of the stages fails, do you want the remaining stages in the matrix to continue running, or should they restart? Continue running.
If I run 50 stages with a maxConcurrency of 10, it would run 5 batches of 10. If there is any failure in the first batch, the remaining batches are not run. I would like to run all batches even if there is a failure.
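The batching described above can be sketched in Python to make the difference between the observed and the requested behaviour concrete. This is only an illustrative model, not Harness code: `run_matrix`, `continue_on_failure`, and the stage names are hypothetical, and stages within a batch are run one after another here rather than in parallel.

```python
from typing import Callable, Dict, List


def run_matrix(
    stages: List[str],
    max_concurrency: int,
    run_stage: Callable[[str], bool],
    continue_on_failure: bool = True,
) -> Dict[str, str]:
    """Run stages in batches of max_concurrency and record a status per stage."""
    status: Dict[str, str] = {}
    for start in range(0, len(stages), max_concurrency):
        batch = stages[start:start + max_concurrency]
        # Every stage in a started batch runs to completion.
        for name in batch:
            status[name] = "Success" if run_stage(name) else "Failed"
        batch_failed = any(status[name] == "Failed" for name in batch)
        if batch_failed and not continue_on_failure:
            # Behaviour reported in this thread: later batches never start.
            break
    return status


stages = [f"stage_{i}" for i in range(50)]
failing = {"stage_3", "stage_7"}  # two failures in the first batch of 10

requested = run_matrix(stages, 10, lambda s: s not in failing)
observed = run_matrix(stages, 10, lambda s: s not in failing, continue_on_failure=False)

print(len(requested), len(observed))  # 50 10
```

With `continue_on_failure=True`, all 5 batches run and the two failed stages are still reported as failed, which is the behaviour being asked for.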
Sudarshan Purohit
Capable Leopard: Thanks for the clarification. It sounds like you want to set an "Ignore Failure" Failure Strategy for your matrix stages. When you do this, the failed stage will still be marked failed, but it won't affect the other ongoing stages (and the execution would continue).
Capable Leopard
Sudarshan Purohit, that's what I expected as well, but that's not what I have observed.
I've got two failed stages within the matrix, and the pipeline did not proceed to the other stages in the matrix.
Can you take a look?
Capable Leopard
Sudarshan Purohit could I get an update on this please?
Sudarshan Purohit
Capable Leopard: Apologies, this slipped by. Could you share the failure strategy setup of the stages in the matrix?
Based on your required behaviour, I'm expecting:
* Failure Strategy --> Retry Step
* Retry Count --> (As per your preference)
* Post Retry Failure Action --> Ignore Failure (or Mark as Success, if you want the final status to be Successful even if the step failed).
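The expected semantics of that setup can be sketched as follows. This is a rough model under assumptions, not the Harness implementation: `run_with_retry`, `Outcome`, and the action labels `"IgnoreFailure"` and `"MarkAsSuccess"` are hypothetical names chosen to mirror the options listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    status: str            # final status reported for the step or stage
    halts_execution: bool  # whether the failure should stop the rest of the run


def run_with_retry(
    attempt: Callable[[], bool],
    retry_count: int,
    post_retry_action: str = "IgnoreFailure",
) -> Outcome:
    """Try once, retry up to retry_count times, then apply the post-retry action."""
    for _ in range(1 + retry_count):
        if attempt():
            return Outcome("Success", False)
    # All attempts failed: decide what the failure means for the execution.
    if post_retry_action == "MarkAsSuccess":
        return Outcome("Success", False)
    if post_retry_action == "IgnoreFailure":
        # Still marked failed, but other stages are expected to continue.
        return Outcome("Failed", False)
    return Outcome("Failed", True)


flaky = iter([False, True])  # fails once, then succeeds on the retry
print(run_with_retry(lambda: next(flaky), retry_count=1))
print(run_with_retry(lambda: False, retry_count=1))
```

The second call exhausts its retries and comes back as failed without halting execution, which is the combination the list above is aiming for.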
Capable Leopard
Sudarshan Purohit It's actually simpler.
Here is my failure strategy (screenshot 1) and looping strategy (screenshot 2).
I see two failures in the first group of 10, and the pipeline does not process the remaining stages, i.e. 8/46 (screenshot 3).
Sudarshan Purohit
Capable Leopard: So if I understand this right, you were expecting the remaining stages in the matrix to continue executing, even though two had failed, because the failure strategy was set to Ignore Failure? And this is not happening: the execution is stopping after the first lot of 10 is done?
Just to confirm my expectations, could you try two experiments?
* Set Failure Strategy to Mark as Success, and see if the execution continues.
* Set Failure Strategy to Retry, with a count of 1 and a post-retry failure action of Ignore Failure, and see if this works.
In the meantime, I will work with the Customer Success team to access and analyze the pipeline execution you listed above.
Thanks for your patience.
Capable Leopard
Sudarshan Purohit thanks for getting back to me!
The execution is stopping after the first lot of 10 is done? Correct.
Experiment 1: Set Failure Strategy to Mark as Success
I see the same behaviour.
Experiment 2: Failure Strategy to Retry, with a count of 1, and a post failure strategy of ignore failure
Same behaviour here as well.
Capable Leopard
@Sudarshan Purohit could I get an update please?
Sudarshan Purohit
Capable Leopard: Hi, I did speak to the customer success team to bring this up in the regular Harness sync. Let me follow up and get the right people connected here.
Sudarshan Purohit
Hi Capable Leopard, could you expand on this? I'm not clear on what the ask is here.
* You're creating stages dynamically in a matrix.
* If any one of the stages fails, do you want the remaining stages in the matrix to continue running, or should they restart? Today, Stage Failure Strategies are applicable to the stage they're running in, and at most to rolling back the complete pipeline.
* For the stage that failed, you wanted to retry it? This should be doable just by setting a Retry Failure Strategy for the Matrix stage?
Capable Leopard
Sudarshan Purohit
If any one of the stages fails, do you want the remaining stages in the matrix to continue running, or should they restart? Continue running.
If I run 50 stages with a maxConcurrency of 10, it would run 5 batches of 10. If there is any failure in the first batch, the remaining batches are not run. I would like to run all batches even if there is a failure.
Capable Leopard
Raised as discussed on this ticket: https://support.harness.io/hc/en-us/requests/58351