500K DRAGEN VCF file phasing
I picked an individual VCF file at random from the DRAGEN CRAM data and found that, of the 5 million+ variants in the file, 676,866 are phased (e.g. 1|1 in the GT field) and the rest are unphased (e.g. 1/1).
How was the phasing done for the variants that are already phased? Can I use SHAPEIT and its .b38.gmap.gz genetic maps to phase all the variants? Would that change the variants that are already phased much?
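For anyone who wants to reproduce a phased/unphased count like the one above, here is a minimal sketch in Python. It assumes a single-sample VCF with a GT key in the FORMAT column and reads only the first sample. The function names `count_phasing` and `count_phasing_in_file` are hypothetical helpers written for this post, not part of DRAGEN, SHAPEIT, or any UK Biobank tooling.

```python
import gzip
from collections import Counter


def count_phasing(lines):
    """Count phased ('|') and unphased ('/') genotypes in the first sample column."""
    counts = Counter()
    for line in lines:
        # Skip meta/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Need at least the 8 fixed columns, FORMAT, and one sample column.
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        if "|" in genotype:
            counts["phased"] += 1
        elif "/" in genotype:
            counts["unphased"] += 1
        else:
            # No separator at all, e.g. a haploid call such as "1".
            counts["other"] += 1
    return counts


def count_phasing_in_file(path):
    """Open a plain or gzip/bgzip-compressed VCF and count genotype phasing."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_phasing(handle)
```

Calling `count_phasing_in_file` with the path to a `.vcf` or `.vcf.gz` file returns a `Counter` with the keys `phased`, `unphased`, and `other`. Records without a separator in GT are counted under `other` rather than guessed at.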
Comments
Hi Yong,
We have two articles on the released DRAGEN datasets that may help you understand the steps taken to prepare the data, the filters applied, and how best to use the data in your current workflow.
The first article covers the gVCF, pVCF, and CRAM format files which were released in November 2023: Initial DRAGEN whole genome sequencing (WGS) data release
The second article covers the ML-corrected pVCF, BGEN and PLINK2 format files which were released in March 2025: ML-Corrected DRAGEN whole genome sequencing (WGS) release
Each article covers the respective pipeline and the quality control steps used to prepare the data for release.
Hope this helps!