Is it possible to restart a single failed block within a scattered WDL task?
I'm running a large-scale analysis on WGS pVCF files with a WDL workflow I built.
To handle the large number of input files, I'm using the scatter/gather functionality of WDL to speed up the process.
The problem is that, on some occasions, one of the sub-instances fails and causes the whole scatter task to halt and fail (even if the other instances are running just fine or have already finished).
I tried to tweak some runtime parameters in my WDL tasks, and specifically the "dx_restart" value so that a failed task could restart without impacting the other parallel instances.
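For illustration, a "dx_restart" runtime block of the kind described above might look like the following sketch. The task name, input, command and restart counts are hypothetical placeholders, and the key layout (default, max, errors) is assumed from the dxCompiler documentation rather than taken from the actual workflow.

```wdl
version 1.0

# Hypothetical task: the name, input and command are placeholders.
task process_shard {
  input {
    File shard_vcf
  }

  command <<<
    echo "processing ~{shard_vcf}"
  >>>

  runtime {
    # dx_restart is a dxCompiler-specific runtime key (assumed layout):
    #   default: restarts allowed for error types not listed under "errors"
    #   max:     upper bound on the total number of restarts
    #   errors:  restarts allowed per named DNAnexus error type
    dx_restart: object {
      default: 1,
      max: 3,
      errors: object {
        UnresponsiveWorker: 2,
        ExecutionError: 2
      }
    }
  }

  output {
    String log = read_string(stdout())
  }
}
```

Whether this policy is honoured for jobs launched from inside a scatter is exactly what is in question here, so the sketch only shows where the setting lives.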
When I run one instance of the applet on its own (that is, outside the workflow), it works: the job restarts after a failure with no issue.
But when I run the whole workflow, as soon as an error occurs, the whole workflow is stopped without even trying to restart the specific failed instance.
Is there any way to keep the workflow from breaking as soon as a single scattered task fails? Or maybe to save the results of the successful scattered instances, so they don't have to be run again?
Thanks a lot,
Antoine L.
Comments
Hi @Antoine Laine,
This seems like a bug. Please forward it to ukbiobank-support@dnanexus.com. The support team will be able to help you and file a bug report with the engineering team.
As a workaround, are you able to use job reuse on the whole workflow to pick up where it left off after one of these failures?