Advanced Workflow Description Language (WDL) Concepts and Docker

About this article

This article builds on the fundamentals covered in Introduction to WDL on the UK Biobank Research Analysis Platform.

Topics include:

specifying WDL workflow inputs and useful meta information
calling native DNAnexus apps
using subworkflows, WDL standard libraries, expressions and expression placeholders
dynamic resource allocation
WDL development best practices
effective project management
Docker integration.

Note: The majority of the example code and screenshots in this article are taken from this DNAnexus tutorial. We recommend watching the tutorial for additional context.

Specifying inputs for a workflow run

You can provide inputs for a workflow using the command line interface (CLI). The dx run command (part of the dx-toolkit) lets you enter each input directly from the command line.

Example CLI code:

dx run workflow-XXXX \
  -istage-common.plink_beds=/Bulk/…/ukb22418_c21_b0_v2.bed \
  -istage-common.plink_bims=/Bulk/…/ukb22418_c21_b0_v2.bim \
  -istage-common.plink_fams=/Bulk/…/ukb22418_c21_b0_v2.fam

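Alternatively, you can define the inputs in a JSON file and pass it to dx run with the -f (--input-json-file) option. dxCompiler can create this file for you: when you supply a standard WDL inputs JSON file at compile time with the -inputs option, it converts that file into the DNAnexus input format.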

Example JSON file:

{
  "stage-common.ref_fasta_index":
    "/reference_files/GRCh38_full_analysis_set_plus_decoy_hla.fa",
  "stage-common.ref_fasta":
    "/reference_files/GRCh38_full_analysis_set_plus_decoy_hla.fa",
  "stage-common.cram":
    "/Bulk/Exome sequences/Exome OQFE CRAM files/12/1234567.cram",
  "stage-common.cram_index":
    "/Bulk/Exome sequences/Exome OQFE CRAM files/12/1234567.cram.crai",
  "stage-common.num_chrom": 22
}

Meta and parameter meta

As mentioned in our introduction article, meta provides overall metadata about the task, while parameter meta gives extra detail about specific inputs. For example, on UKB-RAP, you can use meta to indicate that a task calls a native applet, and parameter meta to indicate which files to stream.

Streaming is useful when a task does not need the complete file to be downloaded before it starts processing. In these cases, the task can read the file as a stream instead.
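As a minimal sketch of the streaming case, the task below marks its input file as streamed through parameter_meta. The task name, input name and command are hypothetical. The "stream" value is the form described in the dxCompiler documentation, so check the documentation for the dxCompiler version you use.

```wdl
version 1.0

# Hypothetical task: counts lines in a file without downloading it first.
task count_lines {
  input {
    File in_file
  }

  parameter_meta {
    # Ask dxCompiler to stream this input instead of downloading it in full.
    in_file: "stream"
  }

  command <<<
    wc -l < "~{in_file}"
  >>>

  output {
    Int n_lines = read_int(stdout())
  }
}
```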

Reusing native apps

There may be cases where you want to reuse native apps or applets you have already developed. You can do this with the dxni subcommand of dxCompiler, which generates a WDL file containing a task wrapper for each native applet found in the specified project folder.

Example code:

java -jar dxCompiler.jar dxni --project project-ID --folder /output --output dx_extern.wdl

You can then use the generated task wrappers in a workflow by importing the generated WDL file, e.g. import "dx_extern.wdl" as lib. Alternatively, you can create task wrappers from scratch.
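If you write a wrapper from scratch, it could look roughly like the sketch below, which follows the shape of the wrappers that dxni generates. The task name, inputs, output and applet ID are all placeholders and must be changed to match the interface of your own applet.

```wdl
version 1.0

# Hand-written wrapper for a native applet (all names are hypothetical).
task native_sum {
  input {
    Int a
    Int b
  }

  # The command is empty because the native applet does the actual work.
  command {}

  output {
    # Dummy value: the real result is produced by the applet at run time.
    Int result = 0
  }

  meta {
    type: "native"
    id: "applet-XXXX"
  }
}

workflow add_numbers {
  input {
    Int x
    Int y
  }

  # The wrapper is called like any other WDL task.
  call native_sum { input: a = x, b = y }

  output {
    Int total = native_sum.result
  }
}
```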

Subworkflows

Subworkflows let you break up large workflows into smaller, more manageable parts, making your code easier to read and maintain. While each WDL file can only contain one workflow, it can include multiple tasks. You can use the import statement to call other WDL workflows (subworkflows) in a parent workflow. For example, you might have separate workflows for variant calling and post-processing, both run in a single parent workflow.

Example code:

# Example of importing and using a subworkflow
import "variant_calling.wdl" as variant_calling
import "post_processing.wdl" as post_processing
 
workflow MainWorkflow {
  input {
    File input_bam
  }
  call variant_calling.VariantCalling {
    input: bam = input_bam
  }
  call post_processing.PostProcess {
    input: vcf = VariantCalling.output_vcf
  }
}
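For context, here is a sketch of what the imported variant_calling.wdl might contain, so that the bam input and the output_vcf output used by the parent workflow have something to bind to. The task and its command are placeholders, not a real variant-calling step.

```wdl
version 1.0

# Hypothetical contents of variant_calling.wdl
workflow VariantCalling {
  input {
    File bam
  }

  call CallVariants { input: bam = bam }

  output {
    # Exposed to the parent workflow as an output of the VariantCalling call.
    File output_vcf = CallVariants.vcf
  }
}

task CallVariants {
  input {
    File bam
  }

  command <<<
    # Placeholder: a real task would run a variant caller on ~{bam} here.
    touch calls.vcf
  >>>

  output {
    File vcf = "calls.vcf"
  }
}
```

In the parent workflow, outputs of a called subworkflow are referenced through the call name (or its alias), in the same way as task outputs.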

Standard library

WDL provides built-in functions for reading and writing files (e.g. read_tsv() and write_tsv()), string manipulation (e.g. basename()), array manipulation (e.g. select_first() and length()), and numeric rounding (e.g. floor() and ceil()).
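The short workflow below is a sketch showing several of these functions together. The workflow name, input names and the idea of a tab-separated sample sheet are assumptions made for illustration only.

```wdl
version 1.0

workflow stdlib_demo {
  input {
    File sample_sheet   # hypothetical tab-separated file
    File cram
    String? label       # optional; falls back to the CRAM file name
  }

  # Read the sheet into rows of columns and count the rows.
  Array[Array[String]] rows = read_tsv(sample_sheet)
  Int n_rows = length(rows)

  # Strip the directory and the ".cram" suffix from the file path.
  String stem = basename(cram, ".cram")

  # Use the first defined value in the array.
  String chosen_label = select_first([label, stem])

  # Round half the row count up to a whole number.
  Int half_rows = ceil(n_rows / 2.0)

  output {
    Int row_count = n_rows
    String output_label = chosen_label
    Int batch_size = half_rows
  }
}
```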

Expressions and expression placeholders

WDL expressions let you customise tasks using operators and conditional logic. For example, you can change the output based on an input value:

Example code:

input {
    Boolean morning
}
String greeting = "Good " + (if morning then "morning" else "afternoon")
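Placed in a complete workflow, the same expression might be used as in the sketch below. The workflow name, the name input and the output are hypothetical additions.

```wdl
version 1.0

workflow greet {
  input {
    Boolean morning
    String name = "researcher"
  }

  # The parentheses make it explicit that the whole if/then/else
  # expression is the right-hand operand of "+".
  String greeting = "Good " + (if morning then "morning" else "afternoon")

  output {
    String message = greeting + ", " + name
  }
}
```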

Expression placeholders are used to refer to WDL variables within the command section of a task. The placeholder format depends on the command style:

With command <<< >>>, use ~{} for placeholders
With command { ... }, you can use either ~{} or ${}

It's best to use ~{} as it clearly separates WDL variables from Bash variables.
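The sketch below illustrates the distinction in a command <<< >>> block, where a Bash variable and WDL placeholders sit side by side. The task and input names are hypothetical, and the read count assumes a standard four-line-per-record FASTQ file.

```wdl
version 1.0

task count_reads {
  input {
    File reads_fastq
    String sample_name = "sample"
  }

  command <<<
    set -euo pipefail
    # ~{...} is filled in by WDL before Bash runs;
    # $lines and $(( ... )) are left untouched for Bash to evaluate.
    lines=$(wc -l < "~{reads_fastq}")
    echo "~{sample_name} $((lines / 4))"
  >>>

  output {
    String summary = read_string(stdout())
  }
}
```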

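Placeholders can also insert text conditionally. The true and false options emit one of two strings depending on a Boolean value, and concatenating a prefix with an optional variable emits nothing if that variable is undefined.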

Example code:

# From bam2fastx.wdl command
# Note: $bamFiles is a Bash variable, not a WDL placeholder
command {
    bam2fasta \
    --output ~{outputPrefix} \
    -c ~{compressionLevel} \
    ~{true="--split-barcodes" false="" splitByBarcode} \
    ~{"--seqid-prefix " + seqIdPrefix} \
    $bamFiles
}
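A smaller, self-contained sketch of the same two patterns is shown below. The task name, inputs and flags are hypothetical. The command only echoes the arguments it would pass, so you can see which flags are emitted.

```wdl
version 1.0

task show_flags {
  input {
    Boolean verbose = false
    String? prefix
  }

  command <<<
    # --verbose appears only when verbose is true;
    # --prefix appears only when prefix is defined.
    echo run ~{true="--verbose" false="" verbose} ~{"--prefix " + prefix}
  >>>

  output {
    String arguments = read_string(stdout())
  }
}
```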

Dynamic resource allocation

Resource requirements are declared in a task's runtime section as expressions, so they can be computed from the task's inputs rather than hard-coded. You can use functions like size() and ceil() to estimate memory and disk requirements based on input files. For example, you might allocate resources based on input file sizes:

String memory = "~{512 + ceil(size([inputBed, faidx], 'M'))}M"

This example sets the memory to 512 MB plus the combined size (in MB) of inputBed and faidx, rounded up.
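In a complete task, that calculation could feed the runtime section as in the sketch below. The task name, the bedtools command and the disk estimate are assumptions. The disks string uses the Cromwell-style "local-disk" format, so confirm how your dxCompiler version maps it to an instance type.

```wdl
version 1.0

task sort_bed {
  input {
    File inputBed
    File faidx
  }

  # 512 MB baseline plus the combined input size in MB, rounded up.
  Int memory_mb = 512 + ceil(size([inputBed, faidx], "M"))

  # Assumed rule of thumb: twice the BED size in GB, plus 1 GB of headroom.
  Int disk_gb = 1 + ceil(2 * size(inputBed, "G"))

  command <<<
    # Assumes bedtools is available in the task environment.
    bedtools sort -faidx ~{faidx} -i ~{inputBed} > sorted.bed
  >>>

  output {
    File sorted_bed = "sorted.bed"
  }

  runtime {
    memory: "~{memory_mb}M"
    disks: "local-disk ~{disk_gb} HDD"
  }
}
```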

WDL best practices

When writing WDL workflows, it’s helpful to follow a few simple best practices, many of which are similar to general coding principles:

Reuse tested tasks from existing repositories rather than starting from scratch.
Keep things simple, especially when using conditionals.
Avoid repetition by modularising your code.
Use clear naming conventions and keep your code well-documented.
Where possible, make parameters optional and set sensible defaults.
Organise your files and maintain a clean project structure.

Project management

A good project structure helps to reinforce best practices. Aim to organise workflows and tasks into separate folders, document workflows, and use small files for testing.

Example project structure:

Simple Project Structure/
   ├── README.md
   ├── applets/
   ├── docs/
   ├── scripts/
   ├── tasks/
   │   ├── README.md
   │   └── tasks.wdl
   └── workflows/
       ├── workflow_1/
       │   ├── workflow_1.1:input.dx.json
       │   ├── workflow_1.1:input.json
       │   └── workflow_1.wdl
       └── workflow_2/
           ├── workflow_2.1:input.default.json
           ├── workflow_2.1:input.dx.json
           ├── workflow_2.1:input.json
           └── workflow_2.wdl

Docker and WDL workflows

Docker lets you run software in a consistent, reproducible environment across different systems, which makes it ideal for remote execution in WDL workflows. A Docker image packages an operating system, software, and all dependencies into a single unit that runs as a container. Images are portable, reusable, and shareable (e.g. via BioContainers), and are the recommended way to manage dependencies on UKB-RAP.

To use Docker in a WDL task, specify a Docker image or snapshot in the task's runtime section. Images can be stored on the platform and reused across workflows.

Example code:

runtime {
    docker: "dx://project-XXXXXXX:/docker_images/gatk_image.tar.gz"
}
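For context, a complete task using this runtime setting might look like the sketch below. The task name and command are hypothetical, and the image path keeps the placeholder project ID from the example above. Exposing the image as an input with a default is one possible design that lets you swap images without editing the task.

```wdl
version 1.0

task gatk_version {
  input {
    # Placeholder path: replace with the location of your own saved image.
    String docker_image = "dx://project-XXXXXXX:/docker_images/gatk_image.tar.gz"
  }

  command <<<
    # Runs inside the container, so gatk comes from the image.
    gatk --version
  >>>

  output {
    String version_text = read_string(stdout())
  }

  runtime {
    docker: docker_image
  }
}
```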
