How to set a time limit policy on dx run jobs with 'cmd' input, and how to check the limit

Hi!

I am trying to run some GWAS analyses with Hail on the UK Biobank RAP. To automate things, I am using the Spark JupyterLab app through the dx run command. I could not find much information on how to set up time limits in this case (since I would like to be in full control of what I am running).

From the manual page here: https://platform.dnanexus.com/app/app-dxjupyterlab_spark_cluster

It is specified that this is how to run a dxjupyterlab or dxjupyterlab_spark_cluster command: 

my_cmd="papermill notebook.ipynb output_notebook.ipynb"
dx run dxjupyterlab -icmd="$my_cmd" -iin="notebook.ipynb"

It is also specified that:

the "duration" argument will be ignored when running the app with "cmd". The app can be run from commandline with the --extra-args flag to limit the runtime, e.g. "dx run dxjupyterlab --extra-args '{"timeoutPolicyByExecutable": {"app-xxxx":{"*": {"hours": 1}}}}'".

I tried using this command to run my notebook (including the timeoutPolicyByExecutable):

my_cmd="papermill notebook.ipynb output_notebook.ipynb"
dx run dxjupyterlab_spark_cluster -icmd="$my_cmd" -iin="notebook.ipynb" --instance-type "mem1_ssd1_v2_x16" --tag="test1" --priority "high" --name "test1" --extra-args '{"timeoutPolicyByExecutable": {"app-dxjupyterlab_spark_cluster":{"*": {"minutes": 5}}}}'

But the job kept running beyond the 5-minute limit I set. Moreover, when I look for information on the timeout policy in the JSON of the running job by typing:

dx describe job-xxxx --json | jq '.timeoutPolicyByExecutable'

I get null as output.

What am I doing wrong? Is there some clear information around on this?

Thanks

Comments

7 comments

  • Comment author
    Dr. Mc. Ninja

    Hi Gabriele Maria Sgarlata 

     

    Here is a suggestion from my assistant. Let me know if it's any help or not. Please bear in mind that, while enthusiastic, she's a bit confused by the amount of documentation I give her to process ;-)

     

    The flag itself is fine — the problem is the key you pass inside timeoutPolicyByExecutable.

    • timeoutPolicyByExecutable expects the executable ID (app-xxxxxxxxxxxxxxxxxxxxxxxx) as the first-level key, not the human-readable name dxjupyterlab_spark_cluster.
      The platform validates the key against the regex ^(app|applet)-[0-9A-Za-z]{24}$ and silently ignores anything that does not match, so the policy you sent is dropped and the job inherits the default 30-day timeout.
    • In the docs the working examples always use real IDs, e.g. the REGENIE snippet (note the long alphanumeric IDs).
    • At runtime, the policy is looked up by ID, then by entry-point name, exactly as described in the “Specifying Job Timeouts” section.
      Because no key matched, dx describe … | jq '.timeoutPolicyByExecutable' quite correctly prints null.

    How to apply a timeout that actually works

    # 1. Get the ID of the executable you are launching (note the app- prefix, which dx describe needs to resolve the app by name)
    APP_ID=$(dx describe app-dxjupyterlab_spark_cluster --json | jq -r .id)
    # returns something like app-G1PQbYj0f9Yk8Zy061p0Q123
    
    # 2. Build your JSON with that ID
    EXTRA='{"timeoutPolicyByExecutable": {"'"$APP_ID"'": {"*": {"minutes": 5}}}}'
    
    # 3. Launch
    my_cmd='papermill notebook.ipynb output_notebook.ipynb'
    dx run dxjupyterlab_spark_cluster \
      -icmd="$my_cmd" -iin=notebook.ipynb \
      --instance-type mem1_ssd1_v2_x16 \
      --tag test1 --priority high --name test1 \
      --extra-args "$EXTRA"
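
    Before launching, it may be worth piping the JSON you just built through jq to make sure it is well-formed (a quick sanity check, nothing DNAnexus-specific about it):

    # pretty-prints the policy on success; jq exits non-zero if the string is not valid JSON
    echo "$EXTRA" | jq .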
    
    • Five minutes after the container starts, the root JupyterLab job will be terminated with JobTimeoutExceeded.
    • Any sub-jobs it spawns inherit the same 5-minute limit unless they carry their own override.
    • You can confirm the policy was set with
      dx describe <job-id> --json | jq '.timeoutPolicyByExecutable' (a way to capture the job ID and run this check right at launch is sketched below).
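
    If you are scripting this, one way (just a sketch) to capture the job ID at launch time and run that check straight away is to launch with -y (skip the confirmation prompt) and --brief (print only the job ID):

    JOB_ID=$(dx run dxjupyterlab_spark_cluster \
      -icmd="$my_cmd" -iin=notebook.ipynb \
      --extra-args "$EXTRA" -y --brief)
    dx describe "$JOB_ID" --json | jq '.timeoutPolicyByExecutable'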

    Alternative shortcuts

    • Limit just this one job instead of the whole tree:

      --extra-args '{"timeoutPolicy":{"*":{"minutes":5}}}'
      

      Here you omit the executable ID completely; the policy is attached directly to the job’s own timeoutPolicy field (a full command using this form is sketched after this list).

    • Use hours or days – any combination of "days", "hours", "minutes" is accepted.
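
    Putting the simpler per-job form together with the launch command above could look like this (just a sketch; the two-hour limit is an arbitrary example):

    my_cmd='papermill notebook.ipynb output_notebook.ipynb'
    dx run dxjupyterlab_spark_cluster \
      -icmd="$my_cmd" -iin=notebook.ipynb \
      --instance-type mem1_ssd1_v2_x16 \
      --extra-args '{"timeoutPolicy": {"*": {"hours": 2}}}'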

    Why the duration input does not help

    When you supply -icmd=…, the app switches to “headless” mode and the high-level duration parameter is explicitly ignored (see the app help page).
    timeoutPolicyByExecutable or timeoutPolicy are therefore the only supported ways to enforce a wall-clock limit for scripted JupyterLab/Spark launches.

    TL;DR
    Replace app-dxjupyterlab_spark_cluster with the real 24-character app ID (or use the simpler timeoutPolicy form) and the timeout will be honoured.

  • Comment author
    Gabriele Maria Sgarlata

    Thank you! 

    This helped solve my problem and now the job stops at the set time limit.

    However, when I type dx describe <job-id> --json | jq '.timeoutPolicyByExecutable' I still get null. Note that I run this command while the job is already running.

    Thank you,

    Gabriele

  • Comment author
    Dr. Mc. Ninja

    Glad this worked. Can you paste the whole of `dx describe <job-id> --json` (with anything sensitive redacted)? I'm not familiar with the output, but I wonder if timeoutPolicyByExecutable is the right key.
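
    If the full output turns out to be very long, listing the top-level keys (or just the ones with "time" in the name) might already tell us which field holds the limit; something along these lines should do it:

    # list all top-level keys of the job description
    dx describe <job-id> --json | jq 'keys'

    # or keep only the entries whose key name mentions "time"
    dx describe <job-id> --json | jq 'to_entries | map(select(.key | ascii_downcase | contains("time"))) | from_entries'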

     

  • Comment author
    Gabriele Maria Sgarlata

    Yeah, sure! Here it is. This is an example of an analysis to which I gave a time limit of 10 minutes.

    I hope it helps. Thank you!

    {
        "id": "job-J0g0K08Jy7K0q8K0qQ976KqK",
        "region": "aws:eu-west-2",
        "name": "chr22_bgen2mt",
        "tags": [
            "chr22_bgen2mt"
        ],
        "properties": {},
        "executable": "app-GykQjbQ0k912pQjJ1bJ9z88J",
        "executableName": "dxjupyterlab_spark_cluster",
        "class": "job",
        "created": 1747846913718,
        "modified": 1747847060522,
        "project": "project-xxx",
        "billTo": "",
        "costLimit": null,
        "invoiceMetadata": null,
        "folder": "/",
        "parentJob": null,
        "originJob": "job-J0g0K08Jy7K0q8K0qQ976KqK",
        "parentAnalysis": null,
        "analysis": null,
        "stage": null,
        "rootExecution": "job-J0g0K08Jy7K0q8K0qQ976KqK",
        "state": "running",
        "function": "main",
        "workspace": "container-J0g0K10J2VV3fxX3X0j5GF38",
        "launchedBy": "user-xxxxx",
        "detachedFrom": null,
        "priority": "high",
        "workerReuseDeadlineRunTime": {
            "state": "reuse-off",
            "waitTime": -1,
            "at": -1
        },
        "dependsOn": [],
        "singleContext": false,
        "failureCounts": {},
        "stateTransitions": [
            {
                "newState": "runnable",
    "setAt": 1747846917146
            },
            {
                "newState": "running",
                "setAt": 1747847060098
            }
        ],
        "ignoreReuse": true,
        "httpsApp": {
            "ports": [
                443,
                8081
            ],
            "shared_access": "NONE",
            "dns": {
                "url": "https://job-J0g0K08Jy7K0q8K0qQ976KqK.dnanexus.cloud"
            },
            "enabled": true,
            "loginURL": "https://ukbiobank.dnanexus.com/login",
            "isolatedBrowsing": false
        },
        "rank": 0,
        "parentJobTry": null,
        "detachedFromTry": null,
        "tryCreated": 1747846913718,
        "details": {},
        "systemRequirements": {
            "*": {
                "instanceType": "mem1_ssd1_v2_x16",
                "nvidiaDriver": "R535",
                "clusterSpec": {
                    "type": "spark",
                    "version": "3.5.2",
                    "initialInstanceCount": 2,
    "bootstrapScript": "#!/usr/bin/env bash\n\nset -e -o pipefail\n\n# Requires bash version 4\n# Uncomment particular regions once the app is enabled in them\ndeclare -A TARGET_PROJECTS\ndeclare -r IMAGES_ENV_FILE='/home/dnanexus/images.env'\ndeclare -r OLDEST_SUPPORTED_SNAPSHOT_APP_VERSION=2.0.0\n\nTARGET_PROJECTS[azure:westus]=\"App and Applet Assets Azure\"\nTARGET_PROJECTS[aws:us-east-1]=\"App and Applet Assets\"\nTARGET_PROJECTS[aws:eu-central-1]=\"App and Applet Assets Germany\"\nTARGET_PROJECTS[aws:eu-west-2]=\"App and Applet Assets London\"\nTARGET_PROJECTS[aws:eu-west-2-g]=\"App and Applet Assets Europe (London)\"\nTARGET_PROJECTS[aws:me-south-1]=\"App and Applet Assets Bahrain\"\nTARGET_PROJECTS[azure:uksouth-ofh]=\"App and Applet Assets OFH-TRE (London)\"\n# Adding VEP involves two steps:\n# - downloading ensemblorg/ensembl-vep Docker image, loading it, and tagging. The loaded image\n#   will have the \"latest\" set by default, a tag with a descriptive version should be additionally set.\n# - downloading and unpacking the cache files used by vep and storing them in /cluster/vep\n# The downloaded tarballs have to be available in public DNAnexus projects \"App and Applet Assets\"\n\nfunction locate_image() {\n    # returns path to the image used by docker\n    # the flavor image paths are in variables sourced from images.env\n    # if a snapshot is provided it has priority\n\n    # use snapshot path if provided\n    if [[ -n $snapshot ]]; then\n        echo \"$snapshot\"\n    else\n        # selecting flavor\n        local DOCKER_IMAGE=$spark_cluster_image\n        # apps and applets have different env variables for resource location\n        if [[ \"$DX_RESOURCES_ID\" != \"\" ]]; then\n            local DX_ASSETS_ID=\"$DX_RESOURCES_ID\"\n        else\n            # in apps images are on the top level in resource container,\n            # in applets they are in a folder inside the project\n            local DX_ASSETS_ID=\"$DX_PROJECT_CONTEXT_ID\"\n            local DIR_NAME=${DOCKER_IMAGE//_*//} # dxjupyterlab-r_1.2.3 -> dxjupyterlab-r/\n            local DOCKER_IMAGE=$DIR_NAME$DOCKER_IMAGE\n        fi\n\n        echo \"$DX_ASSETS_ID:$DOCKER_IMAGE\"\n    fi\n}\n\nversion_comp() {\n    # converts semantic version with three numbers to format that can be compared with test -lt\n    # if it is not a three number semantic version return 000000000\n    version=$(echo \"$1\" | sed 's/\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*+build.*/\\1/' | awk -F. '{ printf \"%d.%d.%d\\n\", $1, $2, $3 }')\n    if [[ $version =~ ^[0-9]+\\.[0-9]+\\.[0-9]+$ ]]; then\n        # shellcheck disable=SC2046,SC2183\n        printf \"%03d%03d%03d\" $(echo \"$version\" | tr '.' ' ')\n    else\n        echo \"000000000\"\n    fi\n}\n\ncheck_snapshot(){\n    # if using snapshot, check that it is not from a too old app version\n    if [[ -n $snapshot ]] ; then\n        snapshot_app_version=$(dx describe \"$snapshot\" --json | jq -r '.details.app_version')\n        current_app_version=$(cat /home/dnanexus/dnanexus-executable.json | jq -r '.version')\n        if [[ 10#$(version_comp \"$snapshot_app_version\") -lt 10#$(version_comp $OLDEST_SUPPORTED_SNAPSHOT_APP_VERSION) ]]; then\n        echo \"AppError: Cannot use snapshot created by app version $snapshot_app_version with this version of the app ('$current_app_version'). 
Please use a different version of the app with this snapshot, or re-create the snapshot using this version of the app and use it with this version of the app.\"\n        dx-jobutil-report-error \"AppError: Cannot use snapshot created by app version $snapshot_app_version with this version of the app ('$current_app_version'). Please use a different version of the app with this snapshot, or re-create the snapshot using this version of the app and use it with this version of the app.\"\n        exit 1\n        fi\n    fi\n}\n\nload_docker(){\n    # load docker image\n    IMG_LOCATION=$(locate_image)\n    echo \"Loading the Docker image: $IMG_LOCATION\"\n    out=$(dx cat \"$IMG_LOCATION\" | docker load)\n    echo \"$out\"\n    imagename=${out#Loaded image: }\n    echo -e \"\\nexport imagename=$imagename\" >>/cluster/dx-cluster.environment\n}\n\nsetup_java_python(){\n    # take env from conda of image to host\n    docker run -v /opt:/scratch $imagename cp -r /opt/conda /scratch\n    source /opt/conda/bin/activate\n    export PYSPARK_PYTHON=/opt/conda/bin/python\n    export PYSPARK_DRIVER_PYTHON=/opt/conda/bin/python\n\n    # because we can not overwrite java in /cluster/hadoop/etc/hadoop/hadoop-env.sh\n    # which is generated by this script: https://github.com/dnanexus/nucleus/blob/6e6f607a3576c07eaa5dc1cd71f06506f1490360/tip/packages/ubuntu/custom/cluster-pkg/cluster/hadoop/etc/hadoop/hadoop-env.sh#L33\n    # so force to create new symlink\n    ln -sf /opt/conda/bin/java /usr/bin/java \n    echo -e \"\\nexport JAVA_HOME=/opt/conda\" >>/cluster/dx-cluster.environment\n}\n\n# restart java\nrestart_java_service(){\n    setup_java_python\n    restart_worker(){\n        echo \"Stopping Spark HDFS worker\"\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon stop datanode\n        $SPARK_HOME/sbin/stop-worker.sh spark://master:$SPARK_MASTER_PORT\n\n        echo \"Restarting Spark HDFS worker\"\n        source /cluster/dx-cluster.environment\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon start datanode\n        $SPARK_HOME/sbin/start-worker.sh spark://master:$SPARK_MASTER_PORT\n    }\n\n    restart_master(){\n        echo \"Stopping Spark HDFS head node\"\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon stop namenode\n        $SPARK_HOME/sbin/stop-master.sh --properties-file $SPARK_CONF_DIR/spark-standalone.conf\n\n        echo \"Restarting Spark HDFS head node\"\n        source /cluster/dx-cluster.environment\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon start namenode\n        $SPARK_HOME/sbin/start-master.sh --properties-file $SPARK_CONF_DIR/spark-standalone.conf\n        if [ $DNAX_INSTANCE_COUNT -eq 1 ]; then\n            restart_worker\n        fi\n    }\n  \n    if [ -z \"$DX_CLUSTER_MASTER_IP\" ]; then\n        echo \"Restarting Spark HDFS head node\"\n        restart_master\n    else\n        echo \"Restarting Spark HDFS worker...\"\n        restart_worker\n    fi\n}\n\ninstall_vep() {\n    # Both sha and tag need to be updated when updating VEP\n    VEP_DOCKER_SHA=bc0984bf18c78c968e5cfe59e819cedbeed41f4b564ec3750dd7843b51a63dfb\n    VEP_DOCKER_VERSION=1.0.9\n    echo \"VEP version: $VEP_DOCKER_VERSION\"\n\n    REGION=$(cat /home/dnanexus/dnanexus-job.json | jq -r .region)\n\n    echo \"Downloading and loading the VEP Docker image..\"\n    DOCKER_TARBALL=docker_dnanexus_vep_$VEP_DOCKER_VERSION.tar.gz\n    dx cat \"${TARGET_PROJECTS[$REGION]}:/jupyterlab/vep/$DOCKER_TARBALL\" | docker load &\n\n    
echo \"Downloading and unpacking VEP cache file..\"\n    dx cat \"${TARGET_PROJECTS[$REGION]}:/jupyterlab/vep/homo_sapiens_vep_GRCh38_${VEP_DOCKER_VERSION}.tar.gz\" | tar zxf - -C /cluster/\n\n    echo \"Downloading and unpacking LOFTEE plugin files.\"\n    gzip -d /cluster/vep/loftee.sql.gz\n    gzip -d /cluster/vep/human_ancestor.fa.gz\n    wait\n    docker tag dnanexus/dxjupyterlab-vep:$VEP_DOCKER_VERSION dnanexus/dxjupyterlab-vep:latest\n\n    chmod a+rwx /cluster/vep\n}\n\n# cluster-adv-pkg debian package will contain any third party libraries that needs to be installed\n# in all cluster nodes. Glow is one such example. The package has the following folder structure.\n#   /cluster-adv/third-party/<feature>/<feature-version>/*\n# Based on the feature, we might have to follow specific installation steps. In the case of glow,\n# it requires that all the glow related jars to be copied to spark classpath.\ninstall_glow_or_hail() {\n\n    JOB_INPUT_FILE=/home/dnanexus/job_input.json\n    sudo apt-get install cluster-adv-pkg\n\n    if [ -f \"$JOB_INPUT_FILE\" ]; then\n        feature=\"$(jq .feature --raw-output $JOB_INPUT_FILE)\"\n        echo \"Selected feature input is $feature. Installing..\"\n        if [ \"${feature}\" == 'GLOW' ]; then\n            /cluster-adv/third-party/glow/install.sh 2.0.0\n        elif [ \"${feature}\" == 'HAIL' ]; then\n            /cluster-adv/third-party/hail/install.sh 0.2.132\n        elif [ \"${feature}\" == 'HAIL-VEP' ]; then\n            /cluster-adv/third-party/hail/install.sh 0.2.132\n            install_vep\n        fi\n    fi\n}\n\n\nif [ \"$DX_JOB_ID\" != \"\" ]; then\n    # Attach log from prebootstrap script to the job log\n    if [[ -f $PRE_BOOTSTRAP_LOG ]]; then\n        cat $PRE_BOOTSTRAP_LOG\n    fi\n    source $IMAGES_ENV_FILE\n    load_docker\n    check_snapshot\n    setup_java_python\n    echo \"Executing bootstrap script on all nodes of the cluster\"\n    restart_java_service || echo \"Failed to restart Java service, continuing...\"\n    install_glow_or_hail\n    /cluster/dx-cluster.sh hdfs-enable-dnax\n    echo \"Done executing bootstrap script\"\nfi\n",
                    "ports": "9000, 40000-55000"
                }
            }
        },
        "executionPolicy": {
            "restartOn": {
                "UnresponsiveWorker": 2,
                "JMInternalError": 1,
                "ExecutionError": 1
            }
        },
        "instanceType": "mem1_ssd1_v2_x16",
        "finalPriority": "high",
        "networkAccess": [
            "*"
        ],
        "runInput": {
            "cmd": "",
            "in": [
                {
                    "$dnanexus_link": {
                        "project": "project-xxx",
    "id": "file-J0bxp2QJy7K82gKpy5bXkBfB"
                    }
                }
            ]
        },
        "originalInput": {
            "cmd": "",
            "in": [
                {
                    "$dnanexus_link": {
                        "project": "project-xxx",
                        "id": "file-J0bxp2QJy7K82gKpy5bXkBfB"
                    }
                }
            ],
            "duration": 240,
            "feature": "HAIL"
        },
        "input": {
            "cmd": "",
            "in": [
                {
                    "$dnanexus_link": "file-J0bxp2QJy7K82gKpy5bXkBfB"
                }
            ],
            "duration": 240,
            "feature": "HAIL"
        },
        "output": null,
        "clusterSlaves": [
            {
                "host": "ec2-18-170-99-229.eu-west-2.compute.amazonaws.com",
                "sshPort": 22,
                "internalIp": "10.60.33.38"
            }
        ],
        "host": "ec2-13-40-105-76.eu-west-2.compute.amazonaws.com",
        "debug": {},"app": "app-GykQjbQ0k912pQjJ1bJ9z88J",
        "resources": "container-GykQjbQJ3PgXpQjJ1bJ9z88j",
        "projectCache": "container-Gykvj90Jy7K5yQyqXx9FbZF4",
        "startedRunning": 1747846993000,
        "delayWorkspaceDestruction": false,
        "clusterID": "cluster-J0g0K08Jy7K0q8K0qQ976KqG",
        "clusterSpec": {
            "type": "spark",
            "version": "3.5.2",
            "initialInstanceCount": 2,
            "bootstrapScript": "#!/usr/bin/env bash\n\nset -e -o pipefail\n\n# Requires bash version 4\n# Uncomment particular regions once the app is enabled in them\ndeclare -A TARGET_PROJECTS\ndeclare -r IMAGES_ENV_FILE='/home/dnanexus/images.env'\ndeclare -r OLDEST_SUPPORTED_SNAPSHOT_APP_VERSION=2.0.0\n\nTARGET_PROJECTS[azure:westus]=\"App and Applet Assets Azure\"\nTARGET_PROJECTS[aws:us-east-1]=\"App and Applet Assets\"\nTARGET_PROJECTS[aws:eu-central-1]=\"App and Applet Assets Germany\"\nTARGET_PROJECTS[aws:eu-west-2]=\"App and Applet Assets London\"\nTARGET_PROJECTS[aws:eu-west-2-g]=\"App and Applet Assets Europe (London)\"\nTARGET_PROJECTS[aws:me-south-1]=\"App and Applet Assets Bahrain\"\nTARGET_PROJECTS[azure:uksouth-ofh]=\"App and Applet Assets OFH-TRE (London)\"\n# Adding VEP involves two steps:\n# - downloading ensemblorg/ensembl-vep Docker image, loading it, and tagging. The loaded image\n#   will have the \"latest\" set by default, a tag with a descriptive version should be additionally set.\n# - downloading and unpacking the cache files used by vep and storing them in /cluster/vep\n# The downloaded tarballs have to be available in public DNAnexus projects \"App and Applet Assets\"\n\nfunction locate_image() {\n    # returns path to the image used by docker\n    # the flavor image paths are in variables sourced from images.env\n    # if a snapshot is provided it has priority\n\n    # use snapshot path if provided\n    if [[ -n $snapshot ]]; then\n        echo \"$snapshot\"\n    else\n        # selecting flavor\n        local DOCKER_IMAGE=$spark_cluster_image\n        # apps and applets have different env variables for resource location\n        if [[ \"$DX_RESOURCES_ID\" != \"\" ]]; then\n            local DX_ASSETS_ID=\"$DX_RESOURCES_ID\"\n        else\n            # in apps images are on the top level in resource container,\n            # in applets they are in a folder inside the project\n            local DX_ASSETS_ID=\"$DX_PROJECT_CONTEXT_ID\"\n            local DIR_NAME=${DOCKER_IMAGE//_*//} # dxjupyterlab-r_1.2.3 -> dxjupyterlab-r/\n            local DOCKER_IMAGE=$DIR_NAME$DOCKER_IMAGE\n        fi\n\n        echo \"$DX_ASSETS_ID:$DOCKER_IMAGE\"\n    fi\n}\n\nversion_comp() {\n    # converts semantic version with three numbers to format that can be compared with test -lt\n    # if it is not a three number semantic version return 000000000\n    version=$(echo \"$1\" | sed 's/\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*+build.*/\\1/' | awk -F. '{ printf \"%d.%d.%d\\n\", $1, $2, $3 }')\n    if [[ $version =~ ^[0-9]+\\.[0-9]+\\.[0-9]+$ ]]; then\n        # shellcheck disable=SC2046,SC2183\n        printf \"%03d%03d%03d\" $(echo \"$version\" | tr '.' ' ')\n    else\n        echo \"000000000\"\n    fi\n}\n\ncheck_snapshot(){\n    # if using snapshot, check that it is not from a too old app version\n    if [[ -n $snapshot ]] ; then\n        snapshot_app_version=$(dx describe \"$snapshot\" --json | jq -r '.details.app_version')\n        current_app_version=$(cat /home/dnanexus/dnanexus-executable.json | jq -r '.version')\n        if [[ 10#$(version_comp \"$snapshot_app_version\") -lt 10#$(version_comp $OLDEST_SUPPORTED_SNAPSHOT_APP_VERSION) ]]; then\n        echo \"AppError: Cannot use snapshot created by app version $snapshot_app_version with this version of the app ('$current_app_version'). 
Please use a different version of the app with this snapshot, or re-create the snapshot using this version of the app and use it with this version of the app.\"\n        dx-jobutil-report-error \"AppError: Cannot use snapshot created by app version $snapshot_app_version with this version of the app ('$current_app_version'). Please use a different version of the app with this snapshot, or re-create the snapshot using this version of the app and use it with this version of the app.\"\n        exit 1\n        fi\n    fi\n}\n\nload_docker(){\n    # load docker image\n    IMG_LOCATION=$(locate_image)\n    echo \"Loading the Docker image: $IMG_LOCATION\"\n    out=$(dx cat \"$IMG_LOCATION\" | docker load)\n    echo \"$out\"\n    imagename=${out#Loaded image: }\n    echo -e \"\\nexport imagename=$imagename\" >>/cluster/dx-cluster.environment\n}\n\nsetup_java_python(){\n    # take env from conda of image to host\n    docker run -v /opt:/scratch $imagename cp -r /opt/conda /scratch\n    source /opt/conda/bin/activate\n    export PYSPARK_PYTHON=/opt/conda/bin/python\n    export PYSPARK_DRIVER_PYTHON=/opt/conda/bin/python\n\n    # because we can not overwrite java in /cluster/hadoop/etc/hadoop/hadoop-env.sh\n    # which is generated by this script: https://github.com/dnanexus/nucleus/blob/6e6f607a3576c07eaa5dc1cd71f06506f1490360/tip/packages/ubuntu/custom/cluster-pkg/cluster/hadoop/etc/hadoop/hadoop-env.sh#L33\n    # so force to create new symlink\n    ln -sf /opt/conda/bin/java /usr/bin/java \n    echo -e \"\\nexport JAVA_HOME=/opt/conda\" >>/cluster/dx-cluster.environment\n}\n\n# restart java\nrestart_java_service(){\n    setup_java_python\nrestart_worker(){\n        echo \"Stopping Spark HDFS worker\"\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon stop datanode\n        $SPARK_HOME/sbin/stop-worker.sh spark://master:$SPARK_MASTER_PORT\n\n        echo \"Restarting Spark HDFS worker\"\n        source /cluster/dx-cluster.environment\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon start datanode\n        $SPARK_HOME/sbin/start-worker.sh spark://master:$SPARK_MASTER_PORT\n    }\n\n    restart_master(){\n        echo \"Stopping Spark HDFS head node\"\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon stop namenode\n        $SPARK_HOME/sbin/stop-master.sh --properties-file $SPARK_CONF_DIR/spark-standalone.conf\n\n        echo \"Restarting Spark HDFS head node\"\n        source /cluster/dx-cluster.environment\n        $HADOOP_HOME/bin/hdfs --config \"${HADOOP_CONF_DIR}\" --daemon start namenode\n        $SPARK_HOME/sbin/start-master.sh --properties-file $SPARK_CONF_DIR/spark-standalone.conf\n        if [ $DNAX_INSTANCE_COUNT -eq 1 ]; then\n            restart_worker\n        fi\n    }\n  \n    if [ -z \"$DX_CLUSTER_MASTER_IP\" ]; then\n        echo \"Restarting Spark HDFS head node\"\n        restart_master\n    else\n        echo \"Restarting Spark HDFS worker...\"\n        restart_worker\n    fi\n}\n\ninstall_vep() {\n    # Both sha and tag need to be updated when updating VEP\n    VEP_DOCKER_SHA=bc0984bf18c78c968e5cfe59e819cedbeed41f4b564ec3750dd7843b51a63dfb\n    VEP_DOCKER_VERSION=1.0.9\n    echo \"VEP version: $VEP_DOCKER_VERSION\"\n\n    REGION=$(cat /home/dnanexus/dnanexus-job.json | jq -r .region)\n\n    echo \"Downloading and loading the VEP Docker image..\"\n    DOCKER_TARBALL=docker_dnanexus_vep_$VEP_DOCKER_VERSION.tar.gz\n    dx cat \"${TARGET_PROJECTS[$REGION]}:/jupyterlab/vep/$DOCKER_TARBALL\" | docker load &\n\n    echo 
\"Downloading and unpacking VEP cache file..\"\n    dx cat \"${TARGET_PROJECTS[$REGION]}:/jupyterlab/vep/homo_sapiens_vep_GRCh38_${VEP_DOCKER_VERSION}.tar.gz\" | tar zxf - -C /cluster/\n\n    echo \"Downloading and unpacking LOFTEE plugin files.\"\n    gzip -d /cluster/vep/loftee.sql.gz\n    gzip -d /cluster/vep/human_ancestor.fa.gz\n    wait\n    docker tag dnanexus/dxjupyterlab-vep:$VEP_DOCKER_VERSION dnanexus/dxjupyterlab-vep:latest\n\n    chmod a+rwx /cluster/vep\n}\n\n# cluster-adv-pkg debian package will contain any third party libraries that needs to be installed\n# in all cluster nodes. Glow is one such example. The package has the following folder structure.\n#   /cluster-adv/third-party/<feature>/<feature-version>/*\n# Based on the feature, we might have to follow specific installation steps. In the case of glow,\n# it requires that all the glow related jars to be copied to spark classpath.\ninstall_glow_or_hail() {\n\n    JOB_INPUT_FILE=/home/dnanexus/job_input.json\n    sudo apt-get install cluster-adv-pkg\n\n    if [ -f \"$JOB_INPUT_FILE\" ]; then\n        feature=\"$(jq .feature --raw-output $JOB_INPUT_FILE)\"\n        echo \"Selected feature input is $feature. Installing..\"\n        if [ \"${feature}\" == 'GLOW' ]; then\n            /cluster-adv/third-party/glow/install.sh 2.0.0\n        elif [ \"${feature}\" == 'HAIL' ]; then\n            /cluster-adv/third-party/hail/install.sh 0.2.132\n        elif [ \"${feature}\" == 'HAIL-VEP' ]; then\n            /cluster-adv/third-party/hail/install.sh 0.2.132\n            install_vep\n        fi\n    fi\n}\n\n\nif [ \"$DX_JOB_ID\" != \"\" ]; then\n    # Attach log from prebootstrap script to the job log\n    if [[ -f $PRE_BOOTSTRAP_LOG ]]; then\n        cat $PRE_BOOTSTRAP_LOG\n    fi\n    source $IMAGES_ENV_FILE\n    load_docker\n    check_snapshot\n    setup_java_python\n    echo \"Executing bootstrap script on all nodes of the cluster\"\n    restart_java_service || echo \"Failed to restart Java service, continuing...\"\n    install_glow_or_hail\n    /cluster/dx-cluster.sh hdfs-enable-dnax\n    echo \"Done executing bootstrap script\"\nfi\n",
            "ports": "9000, 40000-55000"
        },
        "nvidiaDriver": "R535",
        "preserveJobOutputs": null,
        "detailedJobMetrics": false,
        "try": 0,
        "egressReport": {},
        "timeout": 600000,
        "treeTurnaroundTime": 190
    }

     

  • Comment author
    Dr. Mc. Ninja

    "timeout": 600000, ← This looks like what you want?

  • Comment author
    Gabriele Maria Sgarlata

    Thank you so much, Dr. Mc. Ninja!!

    Yes, this is what I wanted. It is good to know that the time is expressed in milliseconds. I would also suggest updating the documentation page to specify that the right key for checking a job's time limit is timeout, that is:

    dx describe <job-id> --json | jq '.timeout'

    Best,

    Gabriele

  • Comment author
    Dr. Mc. Ninja

    Glad it helped. 

    Yeah, in theory the GitBook documentation lives somewhere on GitHub and can be improved by the community, but I haven't found the right repo to make a PR against… I forget what I wanted to change. Perhaps someone from RAP can do it :-)

