How can I identify which subjects have which alleles of rs56041637 in the WGS data? rs56041637 is an intronic variant that is not among the imputed SNPs.
https://www.ncbi.nlm.nih.gov/snp/rs56041637
Comments
Which WGS dataset are you referring to?
If a pVCF is available, I would filter it for the specific variant, e.g. using Hail. If there is no pVCF, that might be a reason to do joint calling (i.e. to make a pVCF): variant calls in single-sample VCFs are less reliable, and a variant call in one sample might turn out to be a false call. QC criteria should also be applied after joint calling.
I am not certain which WGS data set to use. How would I find the pVCF? If I made one, would it look like the one in the wiki link? What would you advise me to do?
https://en.wikipedia.org/wiki/Variant_Call_Format
I want to see whether the rs56041637 alleles described in this article are related to the grip strength phenotype in UKBB.
"rs56041637 is a CATC-repeat insertion. In the patient dataset, we observed that patients who are homozygous for the risk alleles at both rs12608932 and rs12973192 tend to have 3 to 5 CATC-repeats at rs56041637; patients who are homozygous for reference alleles at both rs12608932 and rs12973192 tend to have shorter (0 to 2) repeats at rs56041637 (Fig. 4d). Thus, in addition to the two lead GWAS SNPs (rs12608932 and rs12973192), we now nominate rs56041637, as potentially contributing to risk for disease by making UNC13A more vulnerable to cryptic exon inclusion when TDP-43 is depleted from the nucleus."
https://www.biorxiv.org/content/10.1101/2021.04.02.438213v1.full
A pVCF is a population VCF, which contains variant information from multiple samples in the form of one big VCF table. Depending on which UKB WGS dataset you have access to, you may find pVCFs in the dispensed project. I would check the Bulk > Whole genome sequences folder in your project.
I have the whole-genome GraphTyper joint-call pVCF files. How do I analyze them?
OK. I do not think there is a single correct way to analyze a pVCF or to answer your question about rs56041637, but the pVCFs are a good place to start. I would first confirm that rs56041637 is covered by reading the appropriate pVCF. I would also search for the variant by its genomic position rather than by its rsID. IMO, this could be done using Hail or PLINK.
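To make the "search by position" idea concrete, here is a minimal sketch in plain Python (standard library only) rather than Hail or PLINK. It assumes the pVCF is a standard tab-separated VCF with a `GT` field, and the function names `open_vcf` and `genotypes_at`, as well as the toy coordinate `chr19:1000`, are made up for illustration; the real position of rs56041637 should be taken from dbSNP for the genome build of your pVCF. A linear scan like this would be slow on full-size pVCF files, so treat it as a way to understand the layout rather than a production approach.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def genotypes_at(
    lines: Iterable[str], chrom: str, pos: int
) -> List[Dict[str, Tuple[str, ...]]]:
    """Return one {sample_id: (allele, allele)} dict per record at chrom:pos."""
    samples: List[str] = []
    hits: List[Dict[str, Tuple[str, ...]]] = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            continue
        if fields[0] != chrom or int(fields[1]) != pos:
            continue
        # Allele 0 is REF; alleles 1..n are the comma-separated ALT entries.
        alleles = [fields[3]] + fields[4].split(",")
        gt_index = fields[8].split(":").index("GT")
        calls: Dict[str, Tuple[str, ...]] = {}
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index].replace("|", "/").split("/")
            calls[sample] = tuple("." if a == "." else alleles[int(a)] for a in gt)
        hits.append(calls)
    return hits


# Toy two-sample VCF; the position and alleles are invented for illustration.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chr19\t1000\t.\tT\tTCATC,TCATCCATC\t.\tPASS\t.\tGT\t0/1\t1/2\n",
]
result = genotypes_at(toy_vcf, "chr19", 1000)
```

For a real file, the same function could be fed with `genotypes_at(open_vcf(path), chrom, pos)`. Because a repeat insertion like this one can be multi-allelic, the sketch returns the actual allele sequences per sample rather than 0/1/2 dosages, so the repeat length can be read off each allele afterwards.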
Can you give me the PLINK command line?
I do not have an example of such a command ready to use. I think you can develop some test commands based on this doc page: https://www.cog-genomics.org/plink/1.9/filter
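As a starting point for such a test command, the sketch below assembles a PLINK 1.9 call that extracts a base-pair window from a VCF, using the `--vcf`, `--chr`, `--from-bp`, `--to-bp`, `--recode vcf` and `--out` flags. It is not a tested recipe for the UKB files: the helper name `plink_region_command`, the file path, the output prefix and the coordinates are all placeholders, and the actual position of rs56041637 has to be looked up for your genome build. Note also that PLINK 1.9 tracks only the reference and the most common alternate allele at multi-allelic sites, so it may not preserve every repeat-length allele.

```python
import shlex
from typing import List


def plink_region_command(
    vcf_path: str, chrom: str, start_bp: int, end_bp: int, out_prefix: str
) -> List[str]:
    """Build (but do not run) a PLINK 1.9 command that subsets a VCF by position."""
    return [
        "plink",
        "--vcf", vcf_path,          # input pVCF (plain or gzipped)
        "--chr", chrom,             # restrict to one chromosome
        "--from-bp", str(start_bp), # window start, in base pairs
        "--to-bp", str(end_bp),     # window end, in base pairs
        "--recode", "vcf",          # write the subset back out as VCF
        "--out", out_prefix,        # prefix for output files
    ]


# Placeholder path, window and prefix; replace with real values before running.
cmd = plink_region_command("my_chunk.vcf.gz", "19", 1000, 2000, "region_subset")
command_line = shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once PLINK is installed, or `command_line` can be pasted into a shell.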