About this article
This article explains Workflow Description Language (WDL) and how to use it on UK Biobank’s Research Analysis Platform (UKB-RAP), including how to write WDL scripts and run WDL workflows.
Note: Example code and screenshots in this article are taken from this DNAnexus tutorial. We recommend watching the tutorial for additional context.
Why use WDL
WDL is a human-readable language for describing data processing workflows. It’s portable and can be executed in various environments, including local machines, High-Performance Computing (HPC) systems, and cloud platforms.
WDL workflows are defined in a single script that includes both the individual tasks and the workflow itself. Once written, the script can be compiled and executed.
The key benefits of WDL:
- Works across a range of environments, from local machines (for example using miniWDL) to cloud platforms
- Handles routine tasks like file transfers and Docker management
- Supports dynamic resource allocation based on task requirements
- Makes it easy to share and reuse workflows across projects or teams
Writing WDL Scripts
WDL scripts are made up of tasks and workflows.
Tasks: these are equivalent to applets and contain inputs, outputs, and a command section.
Workflows: contain tasks, inputs, and outputs, and can call tasks and use their outputs as inputs for other tasks.
Each WDL script consists of a single workflow, which can call multiple tasks. WDL also supports control-flow features such as scatter-gather (running a task in parallel over the elements of an array and collecting the results) and conditionals.
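As a minimal sketch of scatter-gather and conditionals, the hypothetical workflow below (the names greet, greet_all and add_farewell are illustrative and do not come from the tutorial) calls one task once per array element and calls it once more only when a flag is set:

```wdl
version 1.0

# Hypothetical task: returns a greeting for one name.
task greet {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}"
  >>>
  output {
    String message = read_string(stdout())
  }
}

workflow greet_all {
  input {
    Array[String] names
    Boolean add_farewell = false
  }

  # Scatter: one greet call per element of names; the calls can run in parallel.
  scatter (n in names) {
    call greet { input: name = n }
  }

  # Conditional: this call only runs when add_farewell is true.
  if (add_farewell) {
    call greet as farewell { input: name = "and goodbye" }
  }

  output {
    # Gather: the outputs of a scattered call are collected into an array.
    Array[String] messages = greet.message
    # The output of a conditional call is optional, hence the "?".
    String? farewell_message = farewell.message
  }
}
```

Outside the scatter block, greet.message has type Array[String], which is the "gather" half of scatter-gather.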
In the example from the tutorial, the WDL script:
- Transforms a CRAM file into a BAM file.
- Slices the BAM file into multiple BAM files for each chromosome.
- Counts the number of alignments in each BAM file, creating an output array of integers in which each element is the number of alignments counted in one BAM file.
The WDL version is declared at the beginning of the script. If no version is declared, the WDL specification treats the file as draft-2 rather than as the latest version, so it is safest to state the version explicitly.
The inputs for the workflow include:
- a CRAM file
- a CRAM index file
- a reference fasta file
- a reference fasta index file
- the number of chromosomes
All inputs are mandatory except the number of chromosomes, which is marked as optional by a “?” after its type (“Int?”). The type of every input must be stated explicitly.
After the input specification, the workflow defines the order in which the tasks are called and the inputs and outputs passed between them.
Each task then declares its own inputs and outputs and the commands it runs.
The ‘command’ section is used to perform the actual computation, while the ‘runtime’ section represents the minimum requirements needed to run a task and the conditions under which a task should be interpreted as a failure or success.
Finally, the output of the workflow is an array of integers, representing the count of alignments for each chromosome.
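The structure described above could look roughly like the sketch below. This is not the tutorial's script: the task names, the samtools commands, the fallback of 22 chromosomes, the "chr1", "chr2", ... contig naming and the Docker image name are all assumptions made for illustration.

```wdl
version 1.0

# Sketch only: task names, commands and the Docker image are assumptions.
workflow view_and_count {
  input {
    File cram
    File cram_index
    File ref_fasta
    File ref_fasta_index
    Int? num_chrom                     # optional, marked by "?"
  }
  # Assumed fallback when num_chrom is not supplied.
  Int n_chrom = select_first([num_chrom, 22])

  call cram_to_bam {
    input:
      cram = cram,
      cram_index = cram_index,
      ref_fasta = ref_fasta,
      ref_fasta_index = ref_fasta_index
  }
  call slice_bam {
    input:
      bam = cram_to_bam.bam,
      bai = cram_to_bam.bai,
      n_chrom = n_chrom
  }
  # One count_bam call per per-chromosome BAM file.
  scatter (slice in slice_bam.slices) {
    call count_bam { input: bam = slice }
  }
  output {
    Array[Int] counts = count_bam.count
  }
}

task cram_to_bam {
  input {
    File cram
    File cram_index                    # declared so the index is staged with the CRAM
    File ref_fasta
    File ref_fasta_index
  }
  command <<<
    samtools view -b -T ~{ref_fasta} -o out.bam ~{cram}
    samtools index out.bam
  >>>
  runtime {
    docker: "example/samtools:latest"  # placeholder image name
  }
  output {
    File bam = "out.bam"
    File bai = "out.bam.bai"
  }
}

task slice_bam {
  input {
    File bam
    File bai
    Int n_chrom
  }
  command <<<
    # Link the BAM and its index side by side so samtools can find the index.
    ln -s ~{bam} in.bam
    ln -s ~{bai} in.bam.bai
    mkdir slices
    for i in $(seq 1 ~{n_chrom}); do
      samtools view -b -o "slices/chr${i}.bam" in.bam "chr${i}"
    done
  >>>
  runtime {
    docker: "example/samtools:latest"  # placeholder image name
  }
  output {
    Array[File] slices = glob("slices/*.bam")
  }
}

task count_bam {
  input {
    File bam
  }
  command <<<
    samtools view -c ~{bam}
  >>>
  runtime {
    docker: "example/samtools:latest"  # placeholder image name
  }
  output {
    Int count = read_int(stdout())
  }
}
```

Note that glob() returns files in name order, so in this sketch the counts follow the order chr1, chr10, chr11, ... rather than numeric chromosome order.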
WDL Syntax
WDL supports a range of data types:
- Primitive types: including String, Boolean, Int, Float, File
- Compound types: Array, Map, Pair
- Custom types: Structs
Inputs and outputs can be:
- Optional, marked with a ?
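- Required to be non-empty (arrays only), marked with a + after the array type
- Given default values, either by assigning a value in the declaration or, for optional values, by supplying a fallback with the select_first() function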
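The declarations below are a small sketch of these types and modifiers in one place. The struct Sample, the workflow name and all input names are hypothetical and chosen only to illustrate the syntax.

```wdl
version 1.0

# Hypothetical struct (custom type) used only for illustration.
struct Sample {
  String id
  File reads
}

workflow type_examples {
  input {
    String label                  # primitive, required
    Boolean verbose = false       # primitive with a default value
    Float? threshold              # optional: may be left undefined
    Array[File]+ bams             # "+": the array must hold at least one element
    Map[String, Int] lengths      # compound type: key/value pairs
    Pair[String, Int] best_hit    # compound type: two values
    Sample sample                 # custom struct type
  }

  # select_first() returns the first defined value in the array,
  # so 0.5 is used whenever threshold is not supplied.
  Float effective_threshold = select_first([threshold, 0.5])

  output {
    Float used_threshold = effective_threshold
    Int n_bams = length(bams)
  }
}
```

Assigning a value in the declaration (as for verbose) is usually the simplest way to set a default; select_first() is useful when the value must stay optional in the interface.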
The command section is written using the keyword ‘command’ with <<< >>>. It uses bash syntax, and variables are written with a tilde and curly brackets (e.g. ~{variable}).
The task runtime attribute defines the resources a task needs. Several keys can be used for this, including docker, cpu, memory and disks.
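A minimal task with a command section and a runtime section might look like the sketch below. The task name, the resource values and the "local-disk 10 HDD" disk string (a convention used by some WDL engines) are assumptions; the values your task needs, and the exact formats the platform accepts, may differ.

```wdl
version 1.0

# Hypothetical task: counts the lines in a text file.
task count_lines {
  input {
    File text_file
    Int threads = 1
  }

  command <<<
    # Plain bash; only ~{...} placeholders are substituted by the WDL engine.
    wc -l < ~{text_file}
  >>>

  output {
    Int n_lines = read_int(stdout())
  }

  runtime {
    docker: "ubuntu:22.04"        # container image the command runs in
    cpu: threads                  # number of cores requested
    memory: "2 GB"                # assumed value
    disks: "local-disk 10 HDD"    # assumed value and format
  }
}
```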
You can also add metadata to make your code easier to understand:
- Meta provides general metadata about the task or workflow (e.g. who wrote it, contact details)
- Parameter meta gives extra detail about specific inputs (e.g. usage tips or explanations)
These metadata sections are not interpreted by the compiler - they’re purely to support you and your colleagues in reading and reusing code.
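In a script, these sections are written with the keywords meta and parameter_meta. The sketch below uses a hypothetical task and made-up author details:

```wdl
version 1.0

# Hypothetical task used only to show the metadata sections.
task say_hello {
  meta {
    author: "A. Researcher"
    email: "a.researcher@example.org"
    description: "Writes a greeting to standard output."
  }

  parameter_meta {
    # Keys correspond to the names of the task's inputs.
    name: "Name of the person to greet."
  }

  input {
    String name
  }

  command <<<
    echo "Hello, ~{name}"
  >>>

  output {
    String greeting = read_string(stdout())
  }
}
```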
Running WDL Workflows on UKB-RAP
You can run WDL scripts on UKB-RAP using dxCompiler, a Java-based tool that converts them into a format that runs natively on the platform.
This process involves:
- Turning WDL into applets and workflows
- Managing software dependencies using e.g. Docker images
- Handling debugging
Running WDL workflows on the UKB-RAP is a two-step process:
1) Compile WDL file
The compilation process uses dxCompiler to translate the WDL script into native applets (one for each task) and a native workflow.
Example code:
java -jar dxCompiler-2.8.3.jar compile view_and_count.wdl \
--inputs view_and_count.input.json \
--project $PROJECT \
--destination /compiled_workflow/
Example output: [screenshot in the original tutorial]
2) Execute WDL workflow
This involves running the compiled workflow with inputs using the dx run command.
Example code:
dx run -f view_and_count.input.dxanexus.input.dx.json \
workflow-XXXX \
--destination /output_folder/
Example output: [screenshot in the original tutorial]
Helpful Resources
- Find sample WDL workflows on the UK Biobank GitHub
- Open WDL
- dxCompiler