Missing homozygous-reference calls when querying DRAGEN WGS gVCF files
Hello,
I have a list of about 400 SNPs that I want to retrieve from individual-level DRAGEN WGS gVCF files. My vcftools query returns only about half of them. I had to use --positions because I could not retrieve them by rsID.
We noticed that the SNPs that were not returned happened to be homozygous reference in those samples. A colleague pointed out that the “original DRAGEN program was invoked by UKB with the --vc-emit-ref-confidence option set to GVCF, which groups the homozygous-ref calls into contiguous blocks based on quality. That means, if vcftools is looking for exact positions, it will only find homozygous SNPs when they happen to fall right at the beginning of a block.”
Is there a way to query these DRAGEN WGS gVCF files and get a row back for every input SNP, regardless of whether it falls inside a contiguous block of homozygous-reference calls?
I am running a command like this for a few thousand participant IDs:
vcftools --gzvcf "/mnt/project/Bulk/DRAGEN WGS/Whole genome variant call files (GVCFs) (DRAGEN) [500k release]/10/1094821_24051_0_0.dragen.hard-filtered.gvcf.gz" --out 1094821 --positions /mnt/project/Srikanth.dir/data/list400SnpsHg38.txt --recode;
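To make the block issue concrete, here is a minimal Python sketch of the lookup I am after. It is not a vcftools feature, just an illustration under some assumptions: reference blocks carry an INFO/END field (as in standard gVCF), the positions file has the same CHROM<TAB>POS layout that --positions uses, and its contig names match the gVCF (e.g. "chr10"). The idea is that a query position is covered by a record when POS <= position <= END, so a SNP inside a hom-ref block is reported together with the block record rather than being dropped. The function names are my own, hypothetical helpers.

```python
import gzip
from bisect import bisect_left, bisect_right
from collections import defaultdict


def read_positions(path):
    """Read a positions file with CHROM<TAB>POS per line; return sorted positions per contig."""
    wanted = defaultdict(list)
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.split()[:2]
            wanted[chrom].append(int(pos))
    for chrom in wanted:
        wanted[chrom].sort()
    return wanted


def record_end(pos, ref, info):
    """Last base covered by a record: INFO/END for reference blocks, else POS + len(REF) - 1."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[4:])
    return pos + len(ref) - 1


def lookup(lines, wanted):
    """Yield (chrom, query_pos, record_line) for every query position a record covers."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, info = cols[0], int(cols[1]), cols[3], cols[7]
        targets = wanted.get(chrom)
        if not targets:
            continue
        end = record_end(pos, ref, info)
        # Binary search for the query positions falling inside [pos, end].
        lo = bisect_left(targets, pos)
        hi = bisect_right(targets, end)
        for q in targets[lo:hi]:
            yield chrom, q, line.rstrip("\n")


def lookup_gvcf(gvcf_path, positions_path):
    """Stream a (b)gzipped gVCF and report the record covering each requested position."""
    wanted = read_positions(positions_path)
    with gzip.open(gvcf_path, "rt") as fh:
        yield from lookup(fh, wanted)
```

For a hit inside a block, the returned line is the block record itself, so its POS is the block start rather than the SNP position, and the genotype (e.g. 0/0) applies to the whole block. Positions not covered by any record are simply absent from the output, and scanning a full WGS gVCF this way in pure Python will be slow compared with an indexed tool.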
Thanks much in advance,
Srikanth Jammulapati