Query about bgen to vcf conversion time on UKB RAP

Baozhuo Ai

I’ve been running a job on the UK Biobank RAP to convert one chromosome from BGEN format to VCF format. It has been running for over a day now. Is this a normal duration for such a task? Has anyone experienced similar wait times, or can anyone advise on how long this process usually takes?

Any insights or suggestions on whether this duration seems reasonable would be greatly appreciated.

Thanks in advance!

Comments

  • George F (UKB Community team, Data Analyst)

    Hi Baozhuo, 

    Many UK Biobank BGEN fields are available as population VCFs (pVCF). 

    To understand the duration, could you provide some details about what you are running (program, data-field, instance type and priority)?

    Thank you for getting in touch.

  • Shangyou Zheng

    Hi George, I encountered the same issue: the process runs for a long time without producing any result. Could you please tell me how you converted files like ukb21008_c22_b0_v1.bgen into VCF format? My job log is below.

    Logging initialized (priority)

    Logging initialized (bulk)

    Downloading bundled file resources.tar.gz

    >>> Unpacking resources.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file qctool.tar.gz

    >>> Unpacking qctool.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plato.tar.gz

    >>> Unpacking plato.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bedtools.tar.gz

    >>> Unpacking bedtools.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file htslib.tar.gz

    >>> Unpacking htslib.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file java.tar.gz

    >>> Unpacking java.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plink.tar.gz

    >>> Unpacking plink.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file r.tar.gz

    >>> Unpacking r.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file sambamba.tar.gz

    >>> Unpacking sambamba.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file seqtk.tar.gz

    >>> Unpacking seqtk.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file vcflib.tar.gz

    >>> Unpacking vcflib.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file vcftools.tar.gz

    >>> Unpacking vcftools.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plink2.tar.gz

    >>> Unpacking plink2.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file regenie.tar.gz

    >>> Unpacking regenie.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bolt-lmm_asset.tar.gz

    >>> Unpacking bolt-lmm_asset.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bgen.tar.gz

    >>> Unpacking bgen.tar.gz to /

    tar: Removing leading `/' from member names

    dxpy/0.383.1 (Linux-5.15.0-1070-aws-x86_64-with-glibc2.29) Python/3.8.10

    bash running (job ID job-Gv37XK0JYp1jq5vgKK30Vk4z)

    downloading file: file-GQK69v0JykJfg6p39kBpBKGY to filesystem: /home/dnanexus/in/in/0/ukb21008_c22_b0_v1.bgen

    Oct 09 2024, 8:42 PM

    Using dxfuse version v1.4.0

    The log file is located at /root/.dxfuse/dxfuse.log

    starting fs daemon

    wait for ready

    Daemon started successfully

    Downloading files using 4 threads

    + [[ '' == '' ]]

    + eval 'qctool -g ukb21008_c22_b0_v1.bgen -og myfile.vcf'

    ++ qctool -g ukb21008_c22_b0_v1.bgen -og myfile.vcf

     

    Welcome to qctool

    (version: 2.2.0, revision: unknown)

     

    (C) 2009-2020 University of Oxford

     

    Opening genotype files : [ ] (0/1,0.0s,0.0/s)

    Opening genotype files : [******************************] (1/1,0.0s,116.6/s)

    Opening genotype files : [******************************] (1/1,0.0s,106.3/s)

    ========================================================================

     

    Input SAMPLE file(s): Output SAMPLE file: "(n/a)".

    Sample exclusion output file: "(n/a)".

     

    Input GEN file(s):

    (4645893 snps) "ukb21008_c22_b0_v1.bgen (bgen v1.2; 488315 unnamed samples; zstd compression)"

    (total 4645893 snps in 1 sources).

    Number of samples: 488315

    Output GEN file(s): "myfile.vcf"

    Output SNP position file(s): (n/a)

    Sample filter: .

    # of samples in input files: 488315.

    # of samples after filtering: 488315 (0 filtered out).

     

    ========================================================================

     

    VCFFormatSNPDataSink::write_header(): FORMAT entries are:

    ##FORMAT=<ID=GP,Type=Float,Number=G,Description="Genotype call probabilities">

     

    Oct 09 2024, 8:50 PM

    CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 26/324GB * Net: 37↓/0↑MBps

    Oct 09 2024, 9:00 PM

    CPU: 26% (4 cores) * Memory: 1270/7731MB * Storage: 29/324GB * Net: 0↓/0↑MBps

    Oct 09 2024, 9:10 PM

    CPU: 26% (4 cores) * Memory: 1262/7731MB * Storage: 31/324GB * Net: 0↓/0↑MBps

    Oct 09 2024, 9:20 PM

    CPU: 26% (4 cores) * Memory: 1265/7731MB * Storage: 34/324GB * Net: 0↓/0↑MBps

    Oct 09 2024, 9:30 PM

    CPU: 26% (4 cores) * Memory: 1283/7731MB * Storage: 37/324GB * Net: 0↓/0↑MBps

    Oct 09 2024, 9:40 PM

    CPU: 26% (4 cores) * Memory: 1266/7731MB * Storage: 40/324GB * Net: 0↓/0↑MBps
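
One way to judge whether a run like the one logged above can finish is to extrapolate from the numbers in the log itself. The sketch below takes the variant and sample counts from the qctool summary and the Storage readings from the monitor lines. It rests on two assumptions that are not in the log: that the storage growth is entirely VCF output, and that each sample's GP field takes at least `bytes_per_entry` bytes of text (a hypothetical lower bound for three single-digit probabilities, two commas and a tab).

```python
# Rough extrapolation from the job log; a sketch, not a measurement.

# Storage readings in GB from the monitor lines, one every 10 minutes
# (8:50 PM to 9:40 PM).
storage_gb = [26, 29, 31, 34, 37, 40]
interval_min = 10

n_variants = 4_645_893  # "total 4645893 snps" in the qctool summary
n_samples = 488_315     # "Number of samples: 488315"
disk_gb = 324           # total instance storage shown in the log

# ASSUMPTION (not from the log): minimum text size of one GP entry,
# e.g. "1,0,0" plus a tab separator. Real entries are likely longer.
bytes_per_entry = 6

# Observed write rate, assuming all storage growth is VCF output.
elapsed_min = interval_min * (len(storage_gb) - 1)
rate_gb_per_hour = (storage_gb[-1] - storage_gb[0]) / elapsed_min * 60

# Time left before the disk fills at that rate.
hours_until_disk_full = (disk_gb - storage_gb[-1]) / rate_gb_per_hour

# Lower bound on the size of the finished uncompressed VCF, and the
# time needed to write it at the observed rate.
projected_vcf_gb = n_variants * n_samples * bytes_per_entry / 1e9
projected_hours = projected_vcf_gb / rate_gb_per_hour

print(f"write rate:            {rate_gb_per_hour:.1f} GB/hour")
print(f"disk full in:          {hours_until_disk_full:.1f} hours")
print(f"projected VCF size:    {projected_vcf_gb:,.0f} GB (lower bound)")
print(f"projected time needed: {projected_hours / 24:.0f} days")
```

Under these assumptions the uncompressed VCF would be many times larger than the 324 GB disk, so the job would run out of storage long before completing. That points towards restricting the conversion to the variants or samples actually needed rather than simply waiting longer.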
