Set time limit policy on dx run jobs with 'cmd' input and how to check time limits
Hi!
I am trying to run some GWAS analyses with Hail on the UK Biobank RAP. To automate things, I am using the Spark JupyterLab app through the dx run command. I could not find much information on how to set time limits in this case (I would like to be in full control of what I am running).
From the manual page here: https://platform.dnanexus.com/app/app-dxjupyterlab_spark_cluster
It is specified that this is how to run a `dxjupyterlab` or `dxjupyterlab_spark_cluster` command:
my_cmd="papermill notebook.ipynb output_notebook.ipynb"
dx run dxjupyterlab -icmd="$my_cmd" -iin="notebook.ipynb"
It is also specified that:
the "duration" argument will be ignored when running the app with "cmd". The app can be run from commandline with the --extra-args flag to limit the runtime, e.g. "dx run dxjupyterlab --extra-args '{"timeoutPolicyByExecutable": {"app-xxxx":{"*": {"hours": 1}}}}'".
I tried using this command to run my notebook (including the `timeoutPolicyByExecutable`):
my_cmd="papermill notebook.ipynb output_notebook.ipynb"
dx run dxjupyterlab_spark_cluster -icmd="$my_cmd" -iin="notebook.ipynb" --instance-type "mem1_ssd1_v2_x16" --tag="test1" --priority "high" --name "test1" --extra-args '{"timeoutPolicyByExecutable": {"app-dxjupyterlab_spark_cluster":{"*": {"minutes": 5}}}}'
But the job kept running beyond the 5-minute limit I set. Moreover, when I look for the timeout policy in the JSON of the running job by typing:
dx describe job-xxxx --json | jq '.timeoutPolicyByExecutable'
I get `null` as output.
What am I doing wrong? Is there some clear information around on this?
Thanks
Comments
Hi Gabriele Maria Sgarlata
Here is a suggestion from my assistant. Let me know if it's any help or not. Please bear in mind that, while enthusiastic, she's a bit confused by the amount of documentation I give her to process ;-)
The flag itself is fine; the problem is the key you pass inside `timeoutPolicyByExecutable`.
`timeoutPolicyByExecutable` expects the executable ID (`app-xxxxxxxxxxxxxxxxxxxxxxxx`) as the first-level key, not the human-readable name dxjupyterlab_spark_cluster. The platform validates the key against the regex `^(app|applet)-[0-9A-Za-z]{24}$` and silently ignores anything that does not match, so the policy you sent is dropped and the job inherits the default 30-day timeout. Because no key matched, `dx describe … | jq '.timeoutPolicyByExecutable'` quite correctly prints `null`.
How to apply a timeout that actually works
… `JobTimeoutExceeded`.
… `dx describe <job-id> --json | jq '.timeoutPolicyByExecutable'`.
Alternative shortcuts
Limit just this one job instead of the whole tree:
Here you omit the executable ID completely; the policy is attached directly to the job's own `timeoutPolicy` field.
… `"days"`, `"hours"`, `"minutes"` is accepted.
Why the CLI `--duration` flag does not help
When you supply `-icmd=…`, the app switches to "headless" mode and the high-level `duration` parameter is explicitly ignored (see the app help page). `timeoutPolicyByExecutable` or `timeoutPolicy` are therefore the only supported ways to enforce a wall-clock limit for scripted JupyterLab/Spark launches.
TL;DR
Replace `app-dxjupyterlab_spark_cluster` with the real 24-character app ID (or use the simpler `timeoutPolicy` form) and the timeout will be honoured.
Thank you!
This helped solve my problem, and now the job stops at the set time limit.
However, when I type `dx describe <job-id> --json | jq '.timeoutPolicyByExecutable'` I still get `null`. Note that I type this command while the job is already running.
Thank you,
Gabriele
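For anyone scripting this, here is a minimal sketch of the fix described in the reply above: check that the key has the shape of an executable ID, then build the `--extra-args` JSON from it. The helper names `is_executable_id` and `build_timeout_args` are hypothetical (they are not part of the dx toolkit), the example ID is a placeholder, and the regex is simply the one quoted in the reply, not something verified against the platform.

```shell
# Sketch only: helper names are hypothetical, not part of the dx toolkit.

# Succeed if the argument looks like an executable ID, using the regex
# quoted in the reply above (an assumption about what the platform expects).
is_executable_id() {
  printf '%s\n' "$1" | grep -Eq '^(app|applet)-[0-9A-Za-z]{24}$'
}

# Print the --extra-args JSON that sets a timeout of N minutes on all
# entry points ("*") of the given executable ID.
build_timeout_args() {
  if ! is_executable_id "$1"; then
    echo "not an executable ID: $1" >&2
    return 1
  fi
  printf '{"timeoutPolicyByExecutable": {"%s": {"*": {"minutes": %d}}}}\n' "$1" "$2"
}

# Placeholder ID for illustration. On the platform the real ID could
# probably be looked up with something like:
#   app_id=$(dx describe app-dxjupyterlab_spark_cluster --json | jq -r '.id')
example_id="app-0123456789ABCDEFGHIJKLMN"
build_timeout_args "$example_id" 10
```

The printed string can then be passed to `dx run … --extra-args "$(build_timeout_args "$app_id" 10)"`. Passing the human-readable name `app-dxjupyterlab_spark_cluster` makes the helper fail early instead of launching a job with no effective limit.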
Glad this worked. Can you paste the whole of `dx describe <job-id> --json` (with anything sensitive redacted)? I'm not familiar with the output, but I wonder if timeoutPolicyByExecutable is the right key.
Yeah, sure! Here it is. This is an example of an analysis to which I gave a time limit of 10 minutes.
I hope it helps. Thank you!
"timeout": 600000,
← This looks like what you want?
Thank you a lot Dr. Mc. Ninja!!
Yes, this is what I wanted. It is good to know that the time is expressed in milliseconds. Also, I would suggest updating the documentation page to specify that the right key for reading the time limit is `timeout`, that is: `dx describe <job-id> --json | jq '.timeout'`
Best,
Gabriele
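Since the value comes back in milliseconds, a small conversion makes it easier to read. This is only a sketch: `timeout_ms_to_minutes` is a hypothetical helper name, and it uses integer division, so it assumes the limit is a whole number of minutes.

```shell
# Hypothetical helper: convert the millisecond "timeout" value reported by
# dx describe into whole minutes (integer division).
timeout_ms_to_minutes() {
  echo $(( $1 / 60000 ))
}

# 600000 ms is the value from the job above, which was given a 10-minute limit.
timeout_ms_to_minutes 600000

# On the platform the value could be fed in directly, for example:
#   timeout_ms_to_minutes "$(dx describe job-xxxx --json | jq '.timeout')"
```

The same arithmetic can also be done inside jq itself (`jq '.timeout / 60000'`) if you prefer a one-liner.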
Glad it helped.
Yeah, in theory the GitBook documentation is somewhere on GitHub and can be improved by the community. However, I've not found the right repo to make a PR against… I forget what I wanted to change. Perhaps someone from RAP can do it :-)