GWAS Analysis Using pVCF Format Data
I have experience conducting GWAS with population-level data in BGEN and PLINK formats. Recently, I encountered a 50K WGS dataset that is available only in pVCF format. Can this format be used for GWAS analysis directly, or is there a way to convert it to BGEN/PLINK? Thank you for your assistance!
It is possible to convert the pVCFs to BGEN and PLINK. There is a protocol here: https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/whole-exome-sequencing-oqfe-protocol/protocol-for-processing-ukb-whole-exome-sequencing-data-sets#conversion-of-pvcf-to-plink-and-bgen-files.
We are also planning to release a workflow on our GitHub that can perform this conversion.
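The linked protocol is the authoritative reference for the exact steps. As a rough sketch of what such a conversion usually involves, the Python helper below only builds (it does not run) a bcftools + PLINK 2 command sequence. It assumes both tools are installed, and that multi-allelic sites must be split into biallelic records first because PLINK 1 `.bed` files only store biallelic variants. The function name `conversion_commands` and all file names are hypothetical, and the linked protocol may use different or additional options (e.g. QC filters).

```python
import shlex


def conversion_commands(pvcf: str, out_prefix: str, ref_fasta: str) -> list[list[str]]:
    """Build (but do not run) a hypothetical pVCF -> PLINK/BGEN command sequence.

    Assumption: multi-allelic records are split into biallelic ones first,
    since PLINK 1 .bed/.bim/.fam files cannot hold multi-allelic variants.
    """
    norm_vcf = f"{out_prefix}.norm.vcf.gz"
    return [
        # Split multi-allelic records (-m -any) and left-normalise indels
        # against the reference FASTA; write bgzipped VCF output (-Oz).
        ["bcftools", "norm", "-m", "-any", "-f", ref_fasta,
         "-Oz", "-o", norm_vcf, pvcf],
        # Write PLINK 1 binary files: <out_prefix>.bed/.bim/.fam
        ["plink2", "--vcf", norm_vcf, "--make-bed", "--out", out_prefix],
        # Write an 8-bit BGEN v1.2 file plus a .sample file.
        ["plink2", "--vcf", norm_vcf, "--export", "bgen-1.2", "bits=8",
         "--out", out_prefix],
    ]


if __name__ == "__main__":
    # Hypothetical inputs: substitute the pVCF chunk and reference you actually use.
    for cmd in conversion_commands("chunk.vcf.gz", "chunk", "GRCh38.fa"):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect or pass to `subprocess.run` once you have checked them against the protocol.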
I'm not entirely sure which field you are referring to, but there are plans to release the 500K DRAGEN WGS (field 24310) and the 500K GATK/Graphtyper WGS in PLINK and BGEN format: https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=185 . PLINK and BGEN files are currently available for the 200K GATK/Graphtyper WGS (https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=271), as well as for the exome sequencing (https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=170 ).
George F, I can't find these data in PLINK/BGEN format when I check my bulk WGS folder (see the attached screenshot). Can you please let me know what I am missing?
Hi Mitja,
The 500K GATK/Graphtyper and DRAGEN datasets are currently only available in pVCF format. We are working on releasing them in PLINK and BGEN format.
The 200K version of the GATK/Graphtyper dataset (https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=271) is available in PLINK and BGEN format, as is the exome sequencing data (https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=170).