Using the PLINK GWAS tool on the RAP generates a text output file with many duplicates. How do I get rid of the duplicates?

Comments


  • Chai Fungtammasan (DNAnexus Team)

    Can you select those files and just remove them? Or maybe not upload them to the platform to begin with.

  • PLINK GWAS on the RAP produces a single text output file:

    plink_gwas.plink2.PHENO1.glm.logistic.assoc.txt

    The file has 12 columns:

    #CHROM POS ID REF ALT A1 TEST OBS_CT OR LOG(OR)_SE Z_STAT P

    The file I got had 5.5 million rows, of which 200,000 were duplicates.

    To use LocusZoom and get a correct Manhattan plot and QQ plot, the duplicates must be removed. How do I do that?

  • Chai Fungtammasan (DNAnexus Team)

    I see. You would need to write a script to remove the duplicates. Then you can use swiss-army-knife to run that Python script on your file.

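For anyone landing on this thread, here is a minimal sketch of the kind of script suggested above. It is not an official DNAnexus or PLINK tool. It assumes the output file is tab-delimited, and it assumes a "duplicate" means a later row that repeats the same #CHROM, POS, ID, REF, ALT, A1 and TEST values as an earlier one; the thread does not say which definition applies, so adjust KEY_COLUMNS (a name made up for this sketch) if your duplicates are defined differently, for example by ID alone. The first occurrence of each key is kept and the original row order is preserved.

```python
import sys

# Assumption: these columns together identify one association result.
# Change this tuple if your duplicates are defined differently.
KEY_COLUMNS = ("#CHROM", "POS", "ID", "REF", "ALT", "A1", "TEST")


def dedupe_rows(lines, key_columns=KEY_COLUMNS, delimiter="\t"):
    """Yield the header, then the first row seen for each key, in input order."""
    lines = iter(lines)
    header = next(lines).rstrip("\r\n").split(delimiter)
    key_idx = [header.index(name) for name in key_columns]
    yield header
    seen = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        row = line.split(delimiter)
        key = tuple(row[i] for i in key_idx)
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_file(in_path, out_path, key_columns=KEY_COLUMNS, delimiter="\t"):
    """Stream in_path to out_path without duplicates; return data rows kept."""
    kept = -1  # the header row is written but not counted
    with open(in_path) as src, open(out_path, "w") as dst:
        for row in dedupe_rows(src, key_columns, delimiter):
            dst.write(delimiter.join(row) + "\n")
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    n = dedupe_file(sys.argv[1], sys.argv[2])
    print(f"kept {n} rows", file=sys.stderr)
```

Saved under a hypothetical name such as dedupe_glm.py, the command to run would look like `python3 dedupe_glm.py plink_gwas.plink2.PHENO1.glm.logistic.assoc.txt deduped.txt`. Only the keys are held in memory, so a file of 5.5 million rows should be manageable, though that depends on the instance you choose.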
