500K DRAGEN VCF file phasing

Yong Qian

I randomly picked an individual VCF file from the DRAGEN CRAM data and found that, of the 5 million+ variants in the file, 676,866 are phased (e.g. 1|1 in the GT field) and the rest are unphased (e.g. 1/1).

How was the phasing done for the variants that are already phased? Can I use SHAPEIT and its .b38.gmap.gz genetic maps to phase all the variants? Would that change the variants that are already phased very much?
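
For anyone who wants to reproduce the phased/unphased tally described above on their own file, here is a minimal Python sketch. It assumes a single-sample VCF (plain or gzip/bgzip compressed) with GT in the FORMAT column; the function names and the "other" bucket for haploid or missing calls are illustrative choices, not part of any DRAGEN or UKB tooling.

```python
import gzip
from collections import Counter


def count_phasing(lines):
    """Tally phased ('|') vs unphased ('/') GT calls for the first sample."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        if "|" in genotype:
            counts["phased"] += 1
        elif "/" in genotype:
            counts["unphased"] += 1
        else:
            counts["other"] += 1  # haploid or missing call
    return counts


def count_phasing_in_file(path):
    """Open a .vcf or .vcf.gz file and tally its genotype separators."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_phasing(handle)
```

Calling count_phasing_in_file("sample.vcf.gz") (a placeholder path) returns a Counter with the keys "phased", "unphased" and "other", which can be compared against the numbers quoted in the question.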

Comments

1 comment

  • Lucy BG, UKB Community team, Data Analyst

    Hi Yong,

    We have two articles on the released DRAGEN datasets that may help you understand how the data were prepared, which filters were applied, and how best to use the data in your current workflow.

    The first article covers the gVCF, pVCF, and CRAM format files which were released in November 2023: Initial DRAGEN whole genome sequencing (WGS) data release

    The second article covers the ML-corrected pVCF, BGEN and PLINK2 format files which were released in March 2025: ML-Corrected DRAGEN whole genome sequencing (WGS) release

    Each article describes the pipeline and quality control steps used to prepare the corresponding data for release.

    Hope this helps!

