I'm still having the issue described in this earlier question: https://community.dnanexus.com/s/question/0D5t0000048tMbUCAU/how-to-automatically-restart-jobs-upon-spotinstanceinterruption
I am running WDL workflows and getting errors such as "The machine running the job was terminated by the cloud provider". I'm fairly confident I have the correct execution policy to restart jobs on any error, but it doesn't seem to have any effect:
dx describe job-GP3ybZQJg8Jkpj3z5GQyf91v # job that gives the error
executionPolicy {"maxRestarts": 5, "restartOn": {"*": 2}} # has the following executionpolicy
Any help appreciated, many thanks,
Barney
Comments
If you do dx describe job-GP3ybZQJg8Jkpj3z5GQyf91v, how many failureCounts do you see?
I'm replying for Barney because he's locked out of his account:
$ dx describe job-GP3ybZQJg8Jkpj3z5GQyf91v | grep "failureCounts"
failureCounts {}
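For anyone checking many jobs, the same two fields can be read programmatically. The following is only a rough sketch: it assumes the job description has been saved as JSON (for example with `dx describe job-... --json`), that `failureCounts` maps failure reasons to integer counts, and that an empty `failureCounts` means the job was never restarted. The function name `summarize_restarts` and the sample data are hypothetical; the sample is reduced to the two fields quoted in this thread.

```python
import json


def summarize_restarts(desc: dict) -> dict:
    """Summarise the restart-related fields of a job description dict."""
    policy = desc.get("executionPolicy") or {}
    failure_counts = desc.get("failureCounts") or {}
    return {
        "restart_on": policy.get("restartOn") or {},
        "max_restarts": policy.get("maxRestarts"),
        # Assumption: values are integer counts keyed by failure reason.
        "total_failures": sum(failure_counts.values()),
        # Assumption: an empty failureCounts means no restart was attempted.
        "ever_restarted": bool(failure_counts),
    }


# Hypothetical describe output containing only the fields shown in this thread.
sample = json.loads(
    '{"executionPolicy": {"maxRestarts": 5, "restartOn": {"*": 2}},'
    ' "failureCounts": {}}'
)
summary = summarize_restarts(sample)
print(summary)
```

A policy that is present while `ever_restarted` is false matches what is reported here: the policy is attached to the job, but no restart was triggered.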
Thanks, OK, the job has not been restarted. I am sharing more ideas.
As per the dxCompiler ExpertOptions documentation (https://github.com/dnanexus/dxCompiler/blob/develop/doc/ExpertOptions.md#setting-dnanexus-specific-attributes-in-extrasjson), the dxCompiler equivalent of setting runtime options through the dxapp.json file of a DNAnexus applet is the extras file, specified with the -extras command-line option.
You can study some examples in this section: https://github.com/dnanexus/dxCompiler/blob/develop/doc/ExpertOptions.md#default-and-per-task-attributes
Thanks, that second link was useful - I noticed "restartableEntryPoints", which I've set to "all" to make all entry points restartable. Unfortunately the same error still persists, with no restarts... Would I be able to get in contact with any DNAnexus staff familiar with scaling WDL workflows to >1K jobs? We're evaluating using large WDL workflows more broadly, but so far we have been unable to get them to work.
Many Thanks, Barney
current extras.json:
{
  "defaultTaskDxAttributes" : {
    "runSpec": {
      "restartableEntryPoints": "all",
      "timeoutPolicy": {
        "*": {
          "hours": 1
        }
      },
      "executionPolicy": {
        "restartOn": {
          "*": 3
        }
      },
      "systemRequirements": {
        "access" : {
          "project": "CONTRIBUTE",
          "network": [
            "*"
          ]
        }
      }
    }
  }
}
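Since nesting is easy to get wrong in a file like this, one way to double-check it is to print every setting with its full path and compare the result against the examples in the ExpertOptions document. This is only a sketch: the helper `leaf_paths` is hypothetical, and the embedded JSON is a copy of the file as posted above. It reports where each key sits and makes no claim about which locations dxCompiler accepts.

```python
import json


def leaf_paths(node, prefix=""):
    """Yield (dotted_path, value) for every leaf value in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Copy of the extras.json posted in this thread.
EXTRAS = """
{
  "defaultTaskDxAttributes": {
    "runSpec": {
      "restartableEntryPoints": "all",
      "timeoutPolicy": {"*": {"hours": 1}},
      "executionPolicy": {"restartOn": {"*": 3}},
      "systemRequirements": {
        "access": {"project": "CONTRIBUTE", "network": ["*"]}
      }
    }
  }
}
"""

paths = dict(leaf_paths(json.loads(EXTRAS)))
for path, value in paths.items():
    print(f"{path} = {value!r}")
```

For the file as posted, the listing shows `access` nested under `runSpec.systemRequirements`, which may be worth comparing with where the documented examples place it.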
You may contact our support team at ukbiobank-support@dnanexus.com; they can look into your project and see what might be preventing the job from restarting.