Are sample IDs in joint-called VCF pseudonymized to match project's EIDs?

Taiki Yamaguchi

I am using the joint-called WGS 500k release (Field 24310).

Are sample IDs in joint-called VCF pseudonymized to match project's EIDs?

I think so, based on the results of grepping both IDs in the participant-level IDs, where I caught the EID but not the pVCF sample ID. However, I'd like to be sure that this is correct, ideally with an official manual page.
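The kind of check described above can be sketched in Python as follows. This is a minimal illustration only. It assumes the pVCF is bgzip-compressed (which Python's `gzip` module can read) and that the project's participant EIDs have been exported to a CSV file with a column named `eid`. The file paths and the `eid` column name are hypothetical. The script reads only the sample IDs from the `#CHROM` header line and counts how many of them appear among the project EIDs.

```python
import csv
import gzip
import sys
from typing import Iterable, List, Set, Tuple


def vcf_sample_ids(header_lines: Iterable[str]) -> List[str]:
    """Return the sample IDs from the '#CHROM' header line of a VCF."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM..FORMAT); sample IDs follow.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing the #CHROM line
    raise ValueError("no #CHROM header line found")


def project_eids(rows: Iterable[dict], column: str = "eid") -> Set[str]:
    """Collect EIDs from participant-table rows (column name is an assumption)."""
    return {row[column] for row in rows}


def compare_ids(samples: List[str], eids: Set[str]) -> Tuple[int, List[str]]:
    """Return (number of samples matching a project EID, unmatched samples)."""
    unmatched = [s for s in samples if s not in eids]
    return len(samples) - len(unmatched), unmatched


if __name__ == "__main__" and len(sys.argv) == 3:
    vcf_path, table_path = sys.argv[1], sys.argv[2]  # hypothetical inputs
    with gzip.open(vcf_path, "rt") as vcf, open(table_path, newline="") as fh:
        samples = vcf_sample_ids(vcf)  # stops reading after the header
        eids = project_eids(csv.DictReader(fh))
    matched, unmatched = compare_ids(samples, eids)
    print(f"{matched}/{len(samples)} pVCF sample IDs found among project EIDs")
    if unmatched:
        print("first unmatched:", unmatched[:5])
```

Usage, with hypothetical file names: `python check_ids.py block.vcf.gz participants.csv`. Because only the header is read, this stays fast even though the pVCF body is very large.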

I've read through the WGS FAQ page, which says:

The EID in the filename is pseudonymised to match your application EIDs. These EIDs are consistent
across your project space, for all bulk and tabular data. Please disregard any sample IDs within the
gVCF, VCF and CRAM files. Further information can be found on the UKB-RAP FAQ page.  

So it is unclear whether the pVCF's sample IDs are aligned (or rather, it can be read as NOT aligned, because the text includes “VCF”). The link leads to the FAQ page itself.

I've searched the forum, but I could not find any questions on this point.

Any information would be appreciated. Thanks in advance.

Comments

  • George F (UKB Community team, Data Analyst)
    • Official comment

    Dear Taiki, 

    You are correct. The EIDs in the joint-call pVCF match the project EIDs. For individual files, only the filename is renamed to match the EID.

    Hope this helps.

  • Taiki Yamaguchi

    Dear George,

    I'm glad to hear that. Thank you very much.

    Sincerely,

    Taiki
