Introduction to Workflow Description Language (WDL)

Dan N

About this article

This article explains Workflow Description Language (WDL) and how to use it on UK Biobank’s Research Analysis Platform (UKB-RAP), including how to write WDL scripts and run WDL workflows.

Note: Example code and screenshots in this article are taken from this DNAnexus tutorial. We recommend watching the tutorial for additional context.

 

Why use WDL

WDL is a human-readable language for describing data processing workflows. It’s portable and can be executed in various environments, including local machines, High-Performance Computing (HPC) systems, and cloud platforms.

A WDL workflow is typically defined in a single script that contains both the individual tasks and the workflow that connects them. Once written, the script can be compiled (on UKB-RAP, with dxCompiler) and executed.

The key benefits of WDL:

  • Works across a range of environments, from local machines (for example using miniwdl) to cloud platforms
  • Handles routine tasks like file transfers and Docker management
  • Supports dynamic resource allocation based on task requirements
  • Makes it easy to share and reuse workflows across projects or teams

 

Writing WDL Scripts

WDL scripts are made up of tasks and workflows.

Tasks: these correspond to applets on UKB-RAP. Each task contains inputs, outputs, and a command section.

Workflows: declare their own inputs and outputs, call tasks, and can pass the outputs of one task as inputs to another.

Each WDL script contains at most one workflow, which can call multiple tasks. WDL also supports control-flow features such as scatter-gather (running a call in parallel over the elements of an array) and conditionals (if blocks).
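A WDL scatter block runs the same call once for each element of an array and gathers the results into an array in the original input order. As a rough analogy only, the bash sketch below does the same with background processes; on UKB-RAP the compiled workflow runs shards as platform jobs instead. The names process_shard and scatter_gather are hypothetical, and the stand-in "task" simply squares its input.

```bash
#!/usr/bin/env bash
# Analogy for a WDL scatter block, using bash background jobs.
# process_shard is a hypothetical stand-in for a task: it squares its input.
set -euo pipefail

process_shard() {
  echo $(( $1 * $1 ))
}

scatter_gather() {
  local tmpdir x i
  local n=0
  tmpdir=$(mktemp -d)
  # Scatter: start one background job per input element,
  # writing each result to a file named after the element's position
  for x in "$@"; do
    process_shard "$x" > "$tmpdir/$n.out" &
    n=$((n + 1))
  done
  # Wait for all shards (note: a bare wait does not report job failures)
  wait
  # Gather: collect results in the same order as the inputs
  local results=()
  for (( i = 0; i < n; i++ )); do
    results+=("$(< "$tmpdir/$i.out")")
  done
  rm -r "$tmpdir"
  echo "${results[*]}"
}

scatter_gather 1 2 3 4   # prints: 1 4 9 16
```

Unlike a WDL scatter, a bare wait does not surface failures of individual shards, which is one reason to let the workflow engine manage this pattern.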

In the example below, the WDL script:

  1. Converts a CRAM file into a BAM file.
  2. Slices the BAM file into one BAM file per chromosome.
  3. Counts the alignments in each per-chromosome BAM file, producing an array of integers in which each element is the number of alignments in the corresponding BAM file.
WDL example 1.png

 

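In the example above, the WDL version is declared at the start of the code with a version statement. If no version is declared, the WDL specification requires the file to be treated as the older draft-2 version (not the latest version), so it is good practice to always declare one.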
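The rule above can be checked mechanically. Per the WDL specification, the version statement must be the first non-comment statement in the file, and a file without one is treated as draft-2. The bash function below is a small sketch of that rule; the function name wdl_version is hypothetical, and it only inspects the first meaningful line rather than parsing WDL.

```bash
#!/usr/bin/env bash
# Print the WDL version a file declares, or "draft-2" if it declares none.
# Sketch only: looks at the first non-blank, non-comment line.
wdl_version() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line#"${line%%[![:space:]]*}"}       # strip leading whitespace
    [[ -z $line || $line == \#* ]] && continue   # skip blank and comment lines
    if [[ $line =~ ^version[[:space:]]+([^[:space:]]+) ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    else
      printf 'draft-2\n'                         # first statement is not a version
    fi
    return 0
  done < "$1"
  printf 'draft-2\n'                             # empty or comment-only file
}

# Usage with a hypothetical file name:
# wdl_version view_and_count.wdl
```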

The inputs for the workflow include:

  • a CRAM file
  • a CRAM index file
  • a reference fasta file
  • a reference fasta index file
  • the number of chromosomes

All inputs are required except the number of chromosomes, which is marked as optional by the “?” after its type (“Int?”). The type of each input must also be declared explicitly.

After the inputs, the workflow calls each task and supplies its inputs, often using the outputs of an earlier call. These dependencies determine the order in which the tasks run.

In the screenshot above you can also see each task's input and output specification and the commands it runs.

The ‘command’ section performs the actual computation, while the ‘runtime’ section specifies the minimum resources needed to run the task and the conditions under which the task should be treated as a success or a failure.

Finally, you can see that the output of the workflow is an array of integers, representing the count of alignments for each chromosome.
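To make the three steps concrete, the dry-run sketch below prints one plausible set of commands for them instead of running anything. It assumes samtools as the tool, hypothetical file names, and chromosome names chr1 to chrN (some references name them 1 to N instead); the commands in the tutorial's screenshot may differ. The BAM is indexed first because samtools needs an index to extract a single chromosome. In the WDL workflow, the per-chromosome slice and count would run as separate task calls.

```bash
#!/usr/bin/env bash
# Dry run: print (do not execute) one plausible command sequence for the
# example workflow. Tool choice, file names and chrN naming are assumptions.
set -euo pipefail

plan_view_and_count() {
  local cram=$1 ref=$2 num_chrom=$3 i
  # Step 1: CRAM -> BAM; the reference FASTA is needed to decode the CRAM
  echo "samtools view -b -T $ref -o input.bam $cram"
  # Index the BAM so that individual chromosomes can be extracted
  echo "samtools index input.bam"
  for (( i = 1; i <= num_chrom; i++ )); do
    # Step 2: slice one chromosome into its own BAM
    echo "samtools view -b -o chr$i.bam input.bam chr$i"
    # Step 3: count the alignments in that slice
    echo "samtools view -c chr$i.bam"
  done
}

plan_view_and_count sample.cram ref.fa 2
```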

 

WDL Syntax

WDL supports a range of data types:

  • Primitive types: including String, Boolean, Int, Float, File
  • Compound types: Array, Map, Pair
  • Custom types: Structs

Inputs and outputs can be:

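  • Optional, marked with a ? after the type (e.g. Int?)
  • Non-empty arrays, marked with a + (e.g. Array[File]+)
  • Given default values, either directly in the declaration (e.g. Int num_chrom = 22) or by falling back to a default with the select_first() function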
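In WDL, select_first([num_chrom, 22]) returns the first defined value in the array, and fails if none is defined. The bash function below mimics that behaviour as a sketch; it is a hypothetical helper, not part of any WDL tool, and because bash has no "undefined" value it treats an empty string as an unset optional input. The default of 22 is illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper mimicking WDL's select_first(): print the first
# argument that is defined. An empty string stands in for an unset value.
select_first() {
  local v
  for v in "$@"; do
    if [[ -n $v ]]; then
      printf '%s\n' "$v"
      return 0
    fi
  done
  # WDL's select_first() also fails when no value is defined
  echo "select_first: no defined value" >&2
  return 1
}

num_chrom=""                     # optional input not supplied
select_first "$num_chrom" 22     # prints 22
num_chrom=5                      # optional input supplied
select_first "$num_chrom" 22     # prints 5
```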

The command section is written with the keyword ‘command’ followed by <<< >>>. Its contents are bash, and WDL variables are inserted using a tilde and curly brackets (e.g. ~{variable}).

The runtime section of a task defines the resources the task needs. Commonly used keys include docker, cpu, memory and disks.
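The ~{...} syntax keeps WDL placeholders distinct from bash's own ${...} variables: the workflow engine fills in ~{...} before bash runs, and leaves ${...} for bash. The sketch below imitates that substitution step purely for illustration; real engines do this internally, and the render_command function and its name=value interface are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only: mimic how a WDL engine fills ~{...} placeholders
# before the command is handed to bash.
render_command() {
  local text=$1 pair name value
  shift
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    text=${text//"~{$name}"/"$value"}   # replace every ~{name} literally
  done
  printf '%s\n' "$text"
}

# ${n} is ordinary bash and is left alone; only ~{bam_file} is substituted.
template='n=$(samtools view -c ~{bam_file}); echo "${n} alignments in ~{bam_file}"'
render_command "$template" bam_file=chr1.bam
```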

You can also add metadata to make your code easier to understand:

  • meta provides general metadata about the task or workflow (e.g. who wrote it, contact details)
  • parameter_meta gives extra detail about specific inputs (e.g. usage tips or explanations)

These metadata sections do not change how a task or workflow runs; they mainly help you and your colleagues read and reuse the code.

 

Running WDL Workflows on UKB-RAP

You can run WDL scripts on UKB-RAP using dxCompiler, a Java-based tool that converts them into a format that runs natively on the platform.

This process involves:

  • Turning WDL tasks and workflows into DNAnexus applets and workflows
  • Managing software dependencies, for example with Docker images
  • Supporting debugging

Running WDL workflows on UKB-RAP is a two-step process:

1) Compile WDL file

The compilation step uses dxCompiler to translate a WDL script into native DNAnexus applets (typically one per task) and workflows.

Example code:

# $PROJECT should hold the ID of your UKB-RAP project (e.g. project-XXXX)
java -jar dxCompiler-2.8.3.jar compile view_and_count.wdl \
--inputs view_and_count.input.json \
--project "$PROJECT" \
--destination /compiled_workflow/

Example output:

WDL example 2.png
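The file passed to --inputs is a standard WDL inputs JSON file, whose keys take the form workflow_name.input_name. The sketch below writes one with a bash here-document. The input names (cram, cram_index, ref_fasta, ref_fasta_index, num_chrom) are hypothetical, since the real names are only visible in the screenshot, and the dx:// file references are placeholders to replace with your own project and file IDs. Because the number of chromosomes is optional, that entry could be left out.

```bash
#!/usr/bin/env bash
# Write a WDL inputs file for the compile step.
# Assumptions: workflow name view_and_count; hypothetical input names;
# placeholder dx:// file IDs to be replaced with real ones.
set -euo pipefail

cat > view_and_count.input.json <<'EOF'
{
  "view_and_count.cram": "dx://project-XXXX:file-AAAA",
  "view_and_count.cram_index": "dx://project-XXXX:file-BBBB",
  "view_and_count.ref_fasta": "dx://project-XXXX:file-CCCC",
  "view_and_count.ref_fasta_index": "dx://project-XXXX:file-DDDD",
  "view_and_count.num_chrom": 22
}
EOF
```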

 

2) Execute WDL workflow

Run the compiled workflow with the dx run command, supplying the inputs file with -f (short for --input-json-file) and an output folder with --destination.

Example code:

# workflow-XXXX is the ID of the compiled workflow reported by the compile step
dx run -f view_and_count.input.dx.json \
workflow-XXXX \
--destination /output_folder/

Example output:

WDL example 3.png
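When --inputs is given at compile time, dxCompiler also writes a DNAnexus-formatted copy of the inputs file, and that copy is what dx run -f consumes. The hypothetical helper below wraps the run step under the assumption that the copy is named by replacing the .json suffix with .dx.json; check the file name your compile step actually produced. With DRY_RUN set it only prints the command, so it can be tried without the dx CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper for launching a dxCompiler-compiled workflow.
# Assumes the DNAnexus-formatted inputs file is "<name>.dx.json".
set -euo pipefail

run_compiled() {
  local inputs=$1 workflow_id=$2 dest=$3
  local dx_inputs="${inputs%.json}.dx.json"
  local cmd=(dx run -f "$dx_inputs" "$workflow_id" --destination "$dest")
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"                 # run it for real (requires the dx CLI)
  fi
}

DRY_RUN=1 run_compiled view_and_count.input.json workflow-XXXX /output_folder/
```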

 

Helpful Resources

 
