Query about BGEN to VCF conversion time on UKB RAP
I’ve been running a task on the UK Biobank Research Analysis Platform (RAP) to convert one chromosome from BGEN to VCF format. It has now been running for over a day. Is this a normal duration for such a task? Has anyone seen similar run times, or does anyone have advice on how long this process usually takes?
Any insights or suggestions on whether this duration seems reasonable would be greatly appreciated.
Thanks in advance!
Hi Baozhuo,
Many UK Biobank BGEN fields are also available as population-level VCFs (pVCFs), so a conversion may not be necessary.
To understand the duration, could you provide some details about what you are running (program, data-field, instance type and priority)?
Thank you for getting in touch
Hi George, I encountered the same issue. Could you please tell me how you converted files like ukb21008_c22_b0_v1.bgen into VCF format? My conversion also runs for a long time without producing any result. Here is my job log:
Logging initialized (priority)
Logging initialized (bulk)
Downloading bundled file resources.tar.gz
>>> Unpacking resources.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file qctool.tar.gz
>>> Unpacking qctool.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file plato.tar.gz
>>> Unpacking plato.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file bedtools.tar.gz
>>> Unpacking bedtools.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file htslib.tar.gz
>>> Unpacking htslib.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file java.tar.gz
>>> Unpacking java.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file plink.tar.gz
>>> Unpacking plink.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file r.tar.gz
>>> Unpacking r.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file sambamba.tar.gz
>>> Unpacking sambamba.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file seqtk.tar.gz
>>> Unpacking seqtk.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file vcflib.tar.gz
>>> Unpacking vcflib.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file vcftools.tar.gz
>>> Unpacking vcftools.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file plink2.tar.gz
>>> Unpacking plink2.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file regenie.tar.gz
>>> Unpacking regenie.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file bolt-lmm_asset.tar.gz
>>> Unpacking bolt-lmm_asset.tar.gz to /
tar: Removing leading `/' from member names
Downloading bundled file bgen.tar.gz
>>> Unpacking bgen.tar.gz to /
tar: Removing leading `/' from member names
dxpy/0.383.1 (Linux-5.15.0-1070-aws-x86_64-with-glibc2.29) Python/3.8.10
bash running (job ID job-Gv37XK0JYp1jq5vgKK30Vk4z)
downloading file: file-GQK69v0JykJfg6p39kBpBKGY to filesystem: /home/dnanexus/in/in/0/ukb21008_c22_b0_v1.bgen
Oct 09 2024, 8:42 PM
Using dxfuse version v1.4.0
The log file is located at /root/.dxfuse/dxfuse.log
starting fs daemon
wait for ready
Daemon started successfully
Downloading files using 4 threads
+ [[ '' == '' ]]
+ eval 'qctool -g ukb21008_c22_b0_v1.bgen -og myfile.vcf'
++ qctool -g ukb21008_c22_b0_v1.bgen -og myfile.vcf
Welcome to qctool
(version: 2.2.0, revision: unknown)
(C) 2009-2020 University of Oxford
Opening genotype files : [******************************] (1/1,0.0s,106.3/s)
========================================================================
Input SAMPLE file(s):
Output SAMPLE file: "(n/a)".
Sample exclusion output file: "(n/a)".
Input GEN file(s):
(4645893 snps) "ukb21008_c22_b0_v1.bgen (bgen v1.2; 488315 unnamed samples; zstd compression)"
(total 4645893 snps in 1 sources).
Number of samples: 488315
Output GEN file(s): "myfile.vcf"
Output SNP position file(s): (n/a)
Sample filter: .
# of samples in input files: 488315.
# of samples after filtering: 488315 (0 filtered out).
========================================================================
VCFFormatSNPDataSink::write_header(): FORMAT entries are:
##FORMAT=<ID=GP,Type=Float,Number=G,Description="Genotype call probabilities">
Oct 09 2024, 8:50 PM
CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 26/324GB * Net: 37↓/0↑MBps
Oct 09 2024, 9:00 PM
CPU: 26% (4 cores) * Memory: 1270/7731MB * Storage: 29/324GB * Net: 0↓/0↑MBps
Oct 09 2024, 9:10 PM
CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 31/324GB * Net: 0↓/0↑MBps
Oct 09 2024, 9:20 PM
CPU: 26% (4 cores) * Memory: 1265/7731MB * Storage: 34/324GB * Net: 0↓/0↑MBps
Oct 09 2024, 9:30 PM
CPU: 26% (4 cores) * Memory: 1283/7731MB * Storage: 37/324GB * Net: 0↓/0↑MBps
Oct 09 2024, 9:40 PM
CPU: 26% (4 cores) * Memory: 1266/7731MB * Storage: 40/324GB * Net: 0↓/0↑MBps
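As a rough sanity check on this log, the storage readings can be turned into a write rate and a disk-fill estimate, and compared with the size implied by 4,645,893 variants x 488,315 samples. This is only a back-of-envelope sketch under stated assumptions: disk usage grows linearly, the output is uncompressed text (as the myfile.vcf name suggests), and each sample column costs at least 2 bytes (one character plus a tab separator). The helper names are hypothetical and not part of qctool or the RAP.

```python
import re
from datetime import datetime

# Resource-monitor readings copied from the job log above (Net column omitted).
SAMPLES = [
    ("Oct 09 2024, 8:50 PM", "CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 26/324GB"),
    ("Oct 09 2024, 9:00 PM", "CPU: 26% (4 cores) * Memory: 1270/7731MB * Storage: 29/324GB"),
    ("Oct 09 2024, 9:10 PM", "CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 31/324GB"),
    ("Oct 09 2024, 9:20 PM", "CPU: 26% (4 cores) * Memory: 1265/7731MB * Storage: 34/324GB"),
    ("Oct 09 2024, 9:30 PM", "CPU: 26% (4 cores) * Memory: 1283/7731MB * Storage: 37/324GB"),
    ("Oct 09 2024, 9:40 PM", "CPU: 26% (4 cores) * Memory: 1266/7731MB * Storage: 40/324GB"),
]

STORAGE_RE = re.compile(r"Storage: (\d+)/(\d+)GB")
TIME_FMT = "%b %d %Y, %I:%M %p"


def storage_series(samples):
    """Return (minutes since first reading, used GB, total GB) for each reading."""
    series = []
    start = None
    for stamp, status in samples:
        t = datetime.strptime(stamp, TIME_FMT)
        if start is None:
            start = t
        m = STORAGE_RE.search(status)
        if m is None:
            continue  # skip lines without a storage reading
        used, total = int(m.group(1)), int(m.group(2))
        series.append(((t - start).total_seconds() / 60, used, total))
    return series


def write_rate_gb_per_min(series):
    """Average disk growth between the first and last readings (GB per minute)."""
    (t0, u0, _), (t1, u1, _) = series[0], series[-1]
    return (u1 - u0) / (t1 - t0)


def minutes_until_full(series):
    """Minutes until the disk fills, assuming the write rate stays constant."""
    _, used, total = series[-1]
    return (total - used) / write_rate_gb_per_min(series)


def projected_vcf_tb(n_variants, n_samples, bytes_per_entry):
    """Uncompressed VCF size in decimal TB if each sample column costs
    bytes_per_entry bytes. Ignores the header and the fixed per-variant columns."""
    return n_variants * n_samples * bytes_per_entry / 1e12


series = storage_series(SAMPLES)
rate = write_rate_gb_per_min(series)
print(f"write rate ~{rate:.2f} GB/min")
print(f"disk full in ~{minutes_until_full(series) / 60:.1f} h")
# 2 bytes per sample column is a floor: one character plus a tab separator.
print(f"uncompressed VCF >= {projected_vcf_tb(4_645_893, 488_315, 2):.1f} TB")
```

If these assumptions hold, the six readings give about 0.28 GB/min, so the 324 GB disk would fill in roughly 17 hours, while even the 2-byte floor puts the full uncompressed VCF at about 4.5 TB. The readings are rounded to whole GB, so treat the numbers as order-of-magnitude only.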