I have a question about the "Haplotypes" file (the 0.06 TB one, BGEN and BGI). Does this file contain only pruned SNPs, after eliminating those in high LD? I am unable to understand how the haplotypes are stored here. Please help!
Since this file is much smaller than the imputed file, I am assuming it only has the pruned SNPs. Can someone confirm? Or does the file have all haplotype combinations (and if so, how are they stored)? Could someone please lay out the format of the file for me, and say whether there's a way to easily visualize what it contains? (I am mainly using Python due to workplace constraints.)
PS: I do want to prune SNPs from relevant genes in the file (those which have high LD) so that I can run some statistical tests easily.
Comments
3 comments
I hope this was the right place to ask this ^^
These links might help.
https://enkre.net/cgi-bin/code/bgen/wiki/?name=BGEN+in+the+UK+Biobank
https://www.well.ox.ac.uk/~gav/bgen_format/spec/latest.html
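On the pruning part of the question, here is a rough sketch of greedy LD pruning in plain Python, in case it is useful as a starting point. It assumes the haplotypes have already been read out of the BGEN file into 0/1 allele lists (one value per haplotype, per SNP); the reading step is not shown. The names `r_squared`, `ld_prune`, the `toy` data, and the 0.8 threshold are all made up for illustration and do not come from the UK Biobank documentation. Dedicated tools use windowed versions of this idea, so treat this as a way to see the logic rather than something to run on the full file.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two equal-length 0/1 allele vectors."""
    n = len(a)
    pa = sum(a) / n                      # frequency of allele 1 at SNP a
    pb = sum(b) / n                      # frequency of allele 1 at SNP b
    pab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = pab - pa * pb                    # linkage disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0                       # a monomorphic SNP carries no LD signal
    return d * d / denom


def ld_prune(haplotypes, threshold=0.8):
    """Greedy pruning: keep a SNP only if its r^2 with every SNP kept so far
    is below `threshold`. `haplotypes` maps SNP id -> list of 0/1 alleles.
    The threshold is an assumed example value, not a recommended one."""
    kept = []
    for snp, alleles in haplotypes.items():
        if all(r_squared(alleles, haplotypes[k]) < threshold for k in kept):
            kept.append(snp)
    return kept


# Hypothetical toy data: 8 haplotypes (4 diploid samples), 3 SNPs.
toy = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp1, so in perfect LD
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly correlated with snp1
}

print(ld_prune(toy))
```

In the toy data, snp2 duplicates snp1 and is dropped, while snp3 has an r^2 of 0.25 with snp1 and is kept. The result depends on the order the SNPs are visited in, which is a known property of greedy pruning.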
Thanks! I've looked through them before, but I'll try looking again.