I have a question about the "Haplotypes" file (the 0.06 TB one, BGEN and BGI). Does this file contain only pruned SNPs, i.e. those left after SNPs in high LD have been removed? I don't understand how the haplotypes are stored in it. Please help!

Since this file is much smaller than the imputed file, I am assuming it contains only the pruned SNPs. Can someone confirm? Or does the file contain all haplotype combinations, and if so, how are they stored? Could someone please lay out the format of the file for me, and tell me whether there is an easy way to visualize its contents? (I am mainly using Python due to workplace constraints.)

PS: I do want to prune SNPs in high LD from the relevant genes in the file so that I can run some statistical tests more easily.
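To make the pruning goal concrete, here is a minimal sketch of what I have in mind. It assumes the haplotypes for a gene have already been loaded into a NumPy array of 0/1 alleles with one row per haplotype and one column per SNP; reading the BGEN file itself is not shown. The function name `ld_prune` and the r² threshold of 0.8 are my own placeholders, not anything taken from the file or its documentation.

```python
import numpy as np


def ld_prune(haps, r2_threshold=0.8):
    """Greedy LD pruning on an (n_haplotypes, n_snps) array of 0/1 alleles.

    A SNP is kept only if its r^2 with every SNP kept so far is below
    r2_threshold. Hypothetical helper for illustration, not a library API.
    """
    haps = np.asarray(haps, dtype=float)
    # r^2 is undefined for monomorphic SNPs (zero variance), so skip them.
    polymorphic = haps.std(axis=0) > 0
    kept = []
    for j in range(haps.shape[1]):
        if not polymorphic[j]:
            continue
        in_ld = False
        for k in kept:
            # Pearson correlation between the two allele vectors.
            r = np.corrcoef(haps[:, j], haps[:, k])[0, 1]
            if r * r >= r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept


# Toy data (made up): SNP 1 duplicates SNP 0, SNP 3 is SNP 0 with alleles
# flipped, SNP 2 is uncorrelated with SNP 0, and SNP 4 is monomorphic.
a = np.tile([0, 0, 1, 1], 50)
b = np.tile([0, 1, 0, 1], 50)
haps = np.column_stack([a, a, b, 1 - a, np.zeros(200, dtype=int)])

print(ld_prune(haps))  # [0, 2]
```

This compares every SNP against all SNPs kept so far, which is fine for a handful of genes but would probably need a sliding window to scale to whole chromosomes.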
