Query regarding the 200k WGS dataset: BGEN vs PLINK format files
Hello,
I have performed a GWAS analysis using the SAIGE mixed model on the 200k WGS dataset, and the results appear to deviate from what was previously observed in our analysis of the same phenotype on imputed datasets from the UK Biobank.
I would like to understand what might be causing this disparity. I have reviewed the documentation on how the BGEN files were generated but could not find any relevant information, so I am reaching out with a few questions.
- Could you please clarify the differences between the BGEN files and the PLINK files found in the 200k WGS folder in Bulk on DNAnexus? Do they contain identical data, or were any additional filters applied during the conversion? (I understand the difference in file format; I just want to know whether any additional filters were applied.)
- When using BGEN files in PLINK, it is necessary to specify which allele is the reference. Could you please confirm whether the variants in the BGEN files are in ref-first or ref-last format?
- I would like to know if the SAIGE applet exposes the "--AlleleOrder" parameter. If so, how can I access it? I attempted to include it alongside LOCO, but I was unable to run the analysis successfully.
Please provide me with the necessary information so that I can plan my analysis accordingly and repeat it if necessary.
~Akhil
Comments
Any comments or updates would be appreciated.
Regarding your SAIGE question and mixed-model work, I would actually like to ask how you managed to run SAIGE. I am also very interested in running it and have not yet had much success...
For questions about BGEN, I would recommend contacting the UKB Access team. I was not able to find resources about processing WGS data to generate BGEN. Here is just a general description: https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=12
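In the meantime, the ref-first vs ref-last question can probably be checked empirically rather than waiting on documentation. Below is a minimal Python sketch, not an official UKB or PLINK method. It assumes you have already exported a variant listing from the BGEN files as (chromosome, position, allele1, allele2) tuples in file order, and that you can look up bases in the matching reference genome. The names `classify_allele_order`, `toy_genome` and `toy_lookup` are hypothetical, and the toy data is made up purely to show the logic.

```python
from collections import Counter

def classify_allele_order(variants, ref_lookup):
    """Tally whether the first or second listed allele matches the reference.

    variants   : iterable of (chrom, pos, allele1, allele2), pos is 1-based
    ref_lookup : callable (chrom, pos, length) -> reference sequence string
    """
    counts = Counter()
    for chrom, pos, a1, a2 in variants:
        # Compare each allele with the reference sequence of the same length.
        first_matches = a1.upper() == ref_lookup(chrom, pos, len(a1)).upper()
        second_matches = a2.upper() == ref_lookup(chrom, pos, len(a2)).upper()
        if first_matches and not second_matches:
            counts["ref-first"] += 1
        elif second_matches and not first_matches:
            counts["ref-last"] += 1
        else:
            # Both or neither match (e.g. some indels): cannot decide.
            counts["ambiguous"] += 1
    return counts

# Toy reference and variants, invented for illustration only.
toy_genome = {"chr1": "ACGTACGT"}

def toy_lookup(chrom, pos, length):
    start = pos - 1  # convert 1-based position to 0-based index
    return toy_genome[chrom][start:start + length]

toy_variants = [
    ("chr1", 1, "A", "G"),  # reference base A listed first
    ("chr1", 2, "C", "T"),  # reference base C listed first
    ("chr1", 3, "T", "G"),  # reference base G listed second
]

counts = classify_allele_order(toy_variants, toy_lookup)
print(dict(counts))
```

If nearly all real variants fall into one bucket, that suggests which order to tell PLINK or SAIGE to assume. A large "ambiguous" or mixed count would point to a reference build mismatch or a problem in the listing instead.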