Using the PLINK GWAS tool on the RAP generates a text output file with many duplicates. How do I get rid of the duplicates?

3 comments

Chai Fungtammasan DNAnexus Team
- 11 October 2022 15:19
Can you select those files and just remove them? Or maybe not upload them to the platform to begin with.

Former User of DNAx Community_10
- 11 October 2022 15:29
PLINK GWAS on the RAP produces a single text output file:
plink_gwas.plink2.PHENO1.glm.logistic.assoc.txt
The file has 12 columns:
#CHROM POS ID REF ALT A1 TEST OBS_CT OR LOG(OR)_SE Z_STAT P
The file I got had 5.5 million rows, including 200,000 duplicates. To use LocusZoom and get a correct Manhattan plot and QQ plot, the duplicates must be removed. How do I do that?

Chai Fungtammasan DNAnexus Team
- 11 October 2022 15:50
I see. You would need to write a script to remove the duplicates. You can then use the swiss-army-knife app to run that Python script on your file.
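
A minimal sketch of such a script follows. It rests on assumptions the thread does not confirm: that the file is tab-delimited with the header shown above, and that a "duplicate" is a later row repeating the same #CHROM, POS, ID, REF, ALT, A1 and TEST values. The names dedupe_rows and KEY_COLUMNS are made up for illustration. Adjust the key columns if your duplicates are defined differently.

```python
import sys

# Assumption: rows sharing these column values count as duplicates.
KEY_COLUMNS = ("#CHROM", "POS", "ID", "REF", "ALT", "A1", "TEST")


def dedupe_rows(lines, key_columns=KEY_COLUMNS):
    """Yield the header, then the first row seen for each key, in input order."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to yield
        return
    # Assumption: the file is tab-delimited and the header names the key columns.
    names = header.rstrip("\n").split("\t")
    key_index = [names.index(column) for column in key_columns]
    yield header

    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Short or malformed rows fall back to whichever key fields are present.
        key = tuple(fields[i] for i in key_index if i < len(fields))
        if key in seen:
            continue  # a row with this key was already written
        seen.add(key)
        yield line


def main(in_path, out_path):
    # Stream line by line so only the set of keys is held in memory.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedupe_rows(src))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python3 dedupe_glm.py INPUT OUTPUT", file=sys.stderr)
```

Saved under a name such as dedupe_glm.py (hypothetical), it could be given to swiss-army-knife together with the results file and invoked as python3 dedupe_glm.py followed by the input and output file names. It keeps the first occurrence of each key, so check beforehand whether the duplicated rows carry identical statistics.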
