We want to run a regression analysis on a subpopulation of 10,000 UK Biobank participants for 12 imputed SNPs on the RAP.
It would be very helpful if you could guide us on how to proceed.
If you are interested in running a GWAS, the steps published in the following tutorial might be useful:
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex
Thank you. This is helpful, but I just want to run a GWAS for only 12 imputed SNPs and not the entire imputation data for all chromosomes. Since this is a time-sensitive project, could you let me know any specific commands to screen out specific SNPs for association analysis with the phenotype on your platform?
You can use the Swiss Army Knife tool on the platform, which has most basic bioinformatics tools installed (bcftools, plink, etc.), to filter the data, and then use the tutorial above to run the GWAS.
See this video on how to use Swiss Army Knife: https://youtu.be/8bcHeoEggBI?t=2110
Thank you, it helped and I was able to screen out the specific SNPs. I would like to know how to link my phenotype EIDs to the genotype FID or IID.
If you follow this tutorial, they are all set to the EID. See this answer:
https://community.dnanexus.com/s/question/0D5t00000414p2DCAQ/can-you-match-plink-output-to-eids
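Since FID and IID both carry the EID, a phenotype table keyed by eid can be matched to the genotype samples directly. Below is a minimal sketch of that join using awk on toy data; the file names, the bmi column, and the EID values are made up for illustration, and the resulting layout (FID, IID, phenotype) is only assumed to suit your plink --pheno input.

```shell
# Toy phenotype table keyed by eid (values are made up).
cat > pheno.csv <<'EOF'
eid,bmi
1000011,24.1
1000038,30.5
EOF

# Toy PLINK .fam file: FID IID father mother sex phenotype.
cat > toy.fam <<'EOF'
1000011 1000011 0 0 1 -9
1000025 1000025 0 0 2 -9
1000038 1000038 0 0 1 -9
EOF

# First pass reads the CSV (comma-separated) and stores bmi by eid.
# Second pass reads the .fam (whitespace-separated) and keeps only
# samples whose IID has a phenotype, writing FID IID bmi.
{
  echo "FID IID bmi"
  awk -F',' 'NR == FNR { if (FNR > 1) bmi[$1] = $2; next }
             ($2 in bmi) { print $1, $2, bmi[$2] }' pheno.csv FS=' ' toy.fam
} > pheno_plink.txt

cat pheno_plink.txt
```

Samples without a phenotype row (1000025 in the toy data) are dropped, which is also a simple way to restrict the analysis to a subpopulation.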
I used the bgenix tool to extract the specific SNPs.
This is the command that I used:
bgenix -g ukb_imp_chr${CHR}_v3.bgen \
-i ukb_imp_chr${CHR}_v3.bgen.bgi \
-vcf -incl-rsids ${RSID} | \
bcftools reheader \
-h bgen_to_vcf/new_header.txt | \
bcftools annotate \
--rename-chrs bgen_to_vcf/rename_contigs.txt | \
bgzip -c > new_file.vcf.gz && tabix -p vcf new_file.vcf.gz
In the -h bgen_to_vcf/new_header.txt argument, I replaced new_header.txt with my sample file.
But the genotype FIDs generated for the VCF file were named anonymous sample 1 and so forth, which do not match the EIDs or IDs present in the sample file.
This work is on a deadline, so I cannot work with the entire imputed data. Is there some way, using the above commands, to get the EIDs as the IDs in my VCF file for these specific filtered SNPs?
This is a pure bioinformatics question. I can recommend a few things to try, but I don't know every detail of these commands without going through the manuals.
1) Figure out which command in the chain removes your sample IDs. Break the pipe and inspect the output of each command. Once you know that, check that tool's manual to see whether you can modify your command to get a different result.
2) Do a quick check of whether the sample order has been changed. If you are confident that the order is preserved, reheader or some sort of ID mapping might help.
3) You may post a new question in this community so that other members can chime in. You could also try Biostars, which has a larger community of bioinformatics users.
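As an illustration of the ID-mapping idea in point 2, here is a rough sketch that builds an old-name/new-name map from an Oxford-format .sample file. It assumes the VCF columns are in the same order as the .sample rows and that the placeholder names follow the pattern anonymous_sample_1, anonymous_sample_2, and so on; check both against your own output first. The file names and EIDs are made up.

```shell
# Toy Oxford-format .sample file: two header lines, then one row per
# participant. The EIDs below are made up for illustration.
cat > toy.sample <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
1000011 1000011 0 1
1000025 1000025 0 2
1000038 1000038 0 1
EOF

# Skip the two header lines and pair each row with its position.
# ASSUMPTION: sample order is preserved and the VCF placeholder names
# are anonymous_sample_1, anonymous_sample_2, ...
awk 'NR > 2 { printf "anonymous_sample_%d %s\n", NR - 2, $1 }' toy.sample > rename_map.txt

cat rename_map.txt

# With bcftools available, the map could then be applied along these lines:
#   bcftools query -l new_file.vcf.gz | head   # inspect current sample names
#   bcftools reheader -s rename_map.txt -o renamed.vcf.gz new_file.vcf.gz
```

The -s option of bcftools reheader changes only the sample names, whereas -h replaces the whole header, so -h expects a complete VCF header file rather than a .sample file.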
I introduced the sample IDs into the VCF file, but when I try to convert the VCF file to plink files, it gives an error saying the sample files were not found.
How can I resolve this issue?
1) You likely only need to run this on at most 12 chromosomes (12 SNPs, so at most one chromosome per SNP).
2) Since you are converting the VCF file to plink format in the last stage, you should consider using plink for every step of this extraction.
You would need to do the following:
a) Take the bgen file and convert it to plink format.
b) Extract the SNPs you want.
c) List the output plink files in a file list and merge the individual chromosome plink files into a single plink file.
d) Delete the chromosome working files.
This is off the top of my head, so I do not guarantee that it will work perfectly. It is for illustrative purposes only.
" plink2 --bgen ukb_imp_chr${CHR}_v3.bgen ref-first --sample ukb48065_imp_chr1_v3_s487296.sample \
--make-pgen --out ukbi_ch${CHR}_v3 ; \
plink2 --pfile ukbi_ch${CHR}_v3 --extract your_12_snps.txt --make-pgen --out ukbi_ch${CHR}_12snps ; \
ls *_12snps.pgen | sed -e 's/\.pgen$//' > files_to_merge.txt ; \
plink2 --pmerge-list files_to_merge.txt pfile --make-bed --out ukb48065_12snps_merged ; \
rm files_to_merge.txt ; rm ukbi_ch${CHR}_v3* "
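To run the per-chromosome steps only where they are needed, a small loop can generate the commands for just the chromosomes that carry a target SNP. The sketch below is a dry run: it only prints the commands into a file and does not call plink2. The SNP IDs, the snps_by_chr.txt layout, and the SAMPLE placeholder are hypothetical.

```shell
# Hypothetical list of target SNPs with their chromosomes (made-up IDs).
cat > snps_by_chr.txt <<'EOF'
rs0000001 1
rs0000002 1
rs0000003 7
rs0000004 19
EOF

# Variant ID list, one ID per line, for use with --extract.
awk '{ print $1 }' snps_by_chr.txt > your_12_snps.txt

# Placeholder for the .sample file that matches your application.
SAMPLE=your_imputed.sample

# Print (not run) one convert command and one extract command per
# chromosome that actually carries a target SNP.
for CHR in $(awk '{ print $2 }' snps_by_chr.txt | sort -n -u); do
  echo "plink2 --bgen ukb_imp_chr${CHR}_v3.bgen ref-first --sample ${SAMPLE} --make-pgen --out ukbi_ch${CHR}_v3"
  echo "plink2 --pfile ukbi_ch${CHR}_v3 --extract your_12_snps.txt --make-pgen --out ukbi_ch${CHR}_12snps"
done > commands.txt

cat commands.txt
```

After reviewing commands.txt, the commands could be run one chromosome at a time, with the merge and cleanup steps applied once all per-chromosome files exist.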
Thank you, this works.