mLOY Data Extraction for Individuals

Mohammad Waqas

Need Help with Extracting mLOY Information for Individual UKB Subjects

Hello all. Our lab has recently taken an interest in mosaic loss of Y chromosome (mLOY) research. We have previously extracted mLOY status for subjects from our institution's private biobank using the MADloy R package (more info: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03768-z).

To call mLOY status, MADloy uses genotyping intensity data in the form of Log R Ratio (LRR) and B-allele Frequency (BAF) values. Each individual's genotyping data is stored in the PennCNV format, essentially a 5-6 column text file with the following columns: SNP Name; Chromosome; Position; LRR; BAF; GType (this last column is optional). MADloy uses the PennCNV file for each subject to build a distribution of mean Log R Ratio in the Y chromosome (mLRR-Y) across all subjects, then applies an outlier detection method to call LOY status given that distribution.

For the data from our private biobank, we received each individual's genotyping data as .idat files, from which we extracted LRR and BAF values for each genotyped SNP using Illumina's GenomeStudio software. From there, it was relatively easy to construct the PennCNV files and run MADloy on them to get mLOY status calls.
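To make the PennCNV layout and the mLRR-Y quantity described above concrete, here is a minimal Python sketch. It is not MADloy's implementation. The header names (Name, Chr, Position, LRR, BAF), the tab delimiter, and the chromosome labels "Y"/"24" are assumptions for illustration, and the sketch averages over every chrY probe without restricting to a particular region or normalizing against the autosomes.

```python
import csv
import io
import math
from statistics import mean


def mean_lrr_y(penncnv_text, chrom_labels=("Y", "24")):
    """Mean LRR over chrY probes in one PennCNV-style, tab-delimited table.

    Column names and chromosome labels are assumed, not taken from MADloy.
    """
    reader = csv.DictReader(io.StringIO(penncnv_text), delimiter="\t")
    values = []
    for row in reader:
        if row["Chr"] not in chrom_labels:
            continue
        try:
            lrr = float(row["LRR"])
        except ValueError:
            continue  # skip non-numeric entries such as "NA"
        if not math.isnan(lrr):
            values.append(lrr)
    return mean(values) if values else float("nan")


# Toy single-subject table: one autosomal probe, three chrY probes (one missing).
example = (
    "Name\tChr\tPosition\tLRR\tBAF\n"
    "rs1\t1\t1000\t0.02\t0.51\n"
    "rsY1\tY\t5000\t-0.40\t0.00\n"
    "rsY2\tY\t6000\t-0.60\t1.00\n"
    "rsY3\tY\t7000\tNA\t0.00\n"
)
print(mean_lrr_y(example))
```

Running this once per subject file would give one mLRR-Y value per individual, which is the distribution the outlier step operates on.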

We wanted to do something similar on the UKB. On DNAnexus, I saw that the Genotype Results directory has LRR and BAF folders containing these data split by chromosome. But when I looked at the files, they contained only raw values, with no SNP or individual identifiers. My best guess is that these are the average values for each SNP across all subjects, likely in ascending SNP order, but I don't know for sure. Regardless, we need individual-level LRR and BAF values keyed by SNP position.
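One way to test that guess is to count the rows and columns of a file and compare them against the number of markers on that chromosome and the number of genotyped participants. The sketch below assumes the files are whitespace-delimited text with purely numeric entries, which I have not confirmed. If the column count matches the number of samples, each column would hold one individual's values and could be pulled out by index. The helper names are hypothetical.

```python
import io


def matrix_shape(handle):
    """Return (n_rows, n_cols) of a whitespace-delimited text matrix.

    Reads one line at a time so large files are never held in memory.
    """
    n_rows = 0
    col_counts = set()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        n_rows += 1
        col_counts.add(len(fields))
    if len(col_counts) > 1:
        raise ValueError(f"ragged rows: {sorted(col_counts)}")
    n_cols = col_counts.pop() if col_counts else 0
    return n_rows, n_cols


def column_for_sample(handle, sample_index):
    """Collect one 0-based column: one individual's values IF columns are samples."""
    return [float(line.split()[sample_index]) for line in handle if line.strip()]


# Toy stand-in for a per-chromosome file: 2 markers (rows) x 3 samples (columns).
toy = "0.01 -0.02 0.03\n-0.45 0.00 -0.10\n"
shape = matrix_shape(io.StringIO(toy))
second = column_for_sample(io.StringIO(toy), 1)
print(shape, second)
```

For a real file, the same functions could be given `open(path)` instead of the `io.StringIO` wrapper. The order of rows and columns would still need to be matched to marker and sample lists from the accompanying genotype files.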

Does anybody know how I can get this information? Or alternatively, if you know another way to extract individual mLOY status without using PennCNV files or the MADloy package, that would also be helpful.
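For reference, the kind of call I am after could be approximated without MADloy by a generic outlier rule on per-subject mLRR-Y values. The sketch below uses a median/MAD threshold as a stand-in. It is not MADloy's method, the cutoff `k=3.0` is an arbitrary placeholder, and the subject IDs and values are made up.

```python
from statistics import median


def flag_low_outliers(mlrr_y, k=3.0):
    """Flag subjects whose mLRR-Y lies more than k scaled MADs below the median.

    mlrr_y maps subject ID -> mean LRR on chrY. The rule is a generic
    robust z-score, used here only as an illustrative stand-in.
    """
    values = list(mlrr_y.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scaled_mad = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if scaled_mad == 0:
        return {sid: False for sid in mlrr_y}
    return {sid: (v - med) / scaled_mad < -k for sid, v in mlrr_y.items()}


# Made-up cohort: five subjects near zero, one with a strongly negative mLRR-Y.
cohort = {"s1": 0.01, "s2": -0.02, "s3": 0.00, "s4": 0.02, "s5": -0.01, "s6": -0.50}
flags = flag_low_outliers(cohort)
print([sid for sid, is_low in flags.items() if is_low])
```

Only the low tail is flagged because loss of Y shows up as reduced intensity on chrY. A real analysis would need a properly justified threshold and quality control.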

Thanks in advance.

Comments

2 comments

  • Comment author
    Lea K., Data Analyst, UKB Community team

    Hi Mohammad,

    You may find the SNP filtering notebook useful to filter individual SNPs from the genotype data. Hope this helps.

    Thank you for using the community forum.

     

  • Comment author
    Mohammad Waqas

    Ok, thank you! Will this allow me to see individual subject-level information too? As in, will I be able to see LRR and BAF values for each individual subject, not just each SNP?
