DRAGEN WGS BGEN files use 16-bit probabilities that are incompatible with Hail
WGS DRAGEN BGEN files were recently released on UKBB.
Using Hail 2.4.1 on a Spark cluster, I generated Hail index files with the following Python command:
hl.index_bgen(path=file_url,
              index_file_map={file_url: f"hdfs:///{filename}.idx2"},
              reference_genome="GRCh38",
              skip_invalid_loci=False)
After the index creation completed successfully, I loaded the BGENs into a Hail MatrixTable using import_bgen:
mt = hl.import_bgen(path=bgen_file_map,
                    entry_fields=['GT'],
                    sample_file=f"file://{bgen_path}/ukb24309_c1_b0_v1.sample",
                    n_partitions=None,
                    block_size=None,
                    index_file_map=index_file_map,
                    variants=annotation_ht)
Finally, I computed an aggregate score using hl.agg.sum over all samples and a subset of variants.
When I tried to run this computation and write out a dataframe, I received the following error:
FatalError: HailException: Hail only supports 8-bit probabilities, found 16.
This occurs whether or not I load the ‘GP’ entry field, and it also occurs on Hail 2.3.1. From the Hail source code and documentation, this appears to be a fundamental limitation of Hail, which means the WGS DRAGEN BGEN files may not be usable with Hail.
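For context, here is a minimal sketch of what the bit depth means, based on my reading of the BGEN v1.2 specification: each probability is stored as a B-bit unsigned integer x that represents x / (2^B - 1), so 16-bit values are multiples of 1/65535 and 8-bit values are multiples of 1/255. The function below (`requantize` is a hypothetical name, not part of Hail or any BGEN library) rescales one sample's stored integers to fewer bits with largest-remainder rounding so they still sum to the maximum value. It only illustrates the change in precision and does not read or write BGEN files.

```python
def requantize(values, bits_in=16, bits_out=8):
    """Rescale one sample's stored probabilities from bits_in to bits_out.

    `values` are the unsigned integers for ALL genotypes of one sample,
    so they must sum to 2**bits_in - 1 (i.e. the probabilities sum to 1).
    """
    max_in = (1 << bits_in) - 1    # 65535 for 16-bit
    max_out = (1 << bits_out) - 1  # 255 for 8-bit
    if sum(values) != max_in:
        raise ValueError("values must sum to 2**bits_in - 1")

    # Exact integer arithmetic: v / max_in * max_out, split into floor and remainder.
    scaled = [v * max_out for v in values]
    floors = [s // max_in for s in scaled]
    remainders = [s % max_in for s in scaled]

    # Round up the entries with the largest remainders until the total is max_out.
    shortfall = max_out - sum(floors)
    order = sorted(range(len(values)), key=lambda i: remainders[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


# Roughly (0.25, 0.5, 0.25) at 16-bit precision.
print(requantize([16384, 32767, 16384]))  # [64, 127, 64]
```

After conversion, the smallest representable step is 1/255 (about 0.004) instead of 1/65535.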
I began converting the files to 8-bit BGENs with qctool, but the conversion is very slow and not feasible to run across all the UKBB WGS data.
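For reference, the kind of per-file conversion described above can be sketched as follows. This assumes QCTOOL v2 and its `-g`, `-s`, `-og` and `-bgen-bits` options; please check them against your installed version. The helper name `qctool_8bit_command` and the file names are hypothetical placeholders, not the actual UKBB paths. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def qctool_8bit_command(in_bgen, sample_file, out_bgen, qctool="qctool"):
    """Build a QCTOOL v2 command that rewrites a BGEN file with 8-bit probabilities.

    Assumption: `-bgen-bits` sets the number of bits per probability when
    writing BGEN v1.2 output. Verify this against your QCTOOL documentation.
    """
    return [
        qctool,
        "-g", in_bgen,        # input genotypes
        "-s", sample_file,    # matching sample file
        "-og", out_bgen,      # output genotypes (format inferred from .bgen)
        "-bgen-bits", "8",    # store each probability in 8 bits
    ]


# Hypothetical file names, for illustration only.
cmd = qctool_8bit_command("input_c1_b0.bgen", "input_c1_b0.sample", "output_c1_b0_8bit.bgen")
print(shlex.join(cmd))
```

Each command can be passed to `subprocess.run(cmd, check=True)`, and since every file is independent, the conversions can run in parallel across files, although that does not reduce the total compute required.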
Has anyone else observed this issue and found a workaround?
Alternatively, are there plans to release 8-bit BGEN files of the DRAGEN WGS samples?
Comments
Hi Andrew
We do not have plans to release more versions of the DRAGEN WGS BGENs.
Over to the community for advice on using Hail.
Hope this helps!
George