Hi, by whom and how were the helper files for the 450K exomes generated? Has any filtering been applied to the pVCF or the PLINK files? Is the release on the RAP from this pipeline? https://www.nature.com/articles/s41586-021-04103-z

Comments

7 comments

  • Comment author
    Chai Fungtammasan DNAnexus Team

    The main exome sequencing data released on RAP is processed with the OQFE pipeline. The page linked below lists the related publications, which name the research groups involved, and describes which processing step each data field came from.

    https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=170

     

    As for the Nature article you mentioned, that data has not been released yet.

  • Why is this being presented in such an opaque way? Can you just tell me where to find the exact information on how the helper files were generated? I'm missing lots of details.

    The articles linked from https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=170 are missing that information. Moreover, that page references the 200K exome release, which is just confusing. It needs to be updated with more details.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    The OQFE 450k whole-exome sequencing data is processed in the same way as the OQFE 200k WES. There is a note on the top saying that `Fields 23141-23146 will eventually contain information on all the participants for whom exome sequencing is possible.` We understand that this is not very clear, and we will look into updating the documentation.

    As for by whom and how, that page links to a publication:

    https://www.medrxiv.org/content/10.1101/2020.11.02.20222232v1

    I'm not sure I understand the need for details about the helper files, since those are reference genomes and coordinates. What I would normally do is make sure I use the same reference and coordinate versions if I want to do any comparison. However, I hope this points you to the correct groups of researchers.

  • Ah, OK, I see. By helper files I mean the following files, which are found in a subfolder called helper_files:

    ukb23149_450k_OQFE.90pct10dp_qc_variants.txt

    ukb23149_450k_OQFE.annotations.txt.gz

    ukb23149_450k_OQFE.sets.txt.gz

    ukb23149_450k_OQFE.variant_ID_mappings.txt

     

    It is not clear from reading https://www.medrxiv.org/content/10.1101/2020.11.02.20222232v1 how the methods described there relate to these files, especially since the preprint covers the 200K exomes, so the number of variants etc. won't match for the 450K.

    For instance, the preprint states: "individual and variant missingness <10%, Hardy Weinberg Equilibrium p-value>10^-15, minimum read coverage depth of 7 for SNPs and 10 for indels, at least one sample per site passed the allele balance threshold > 0.15 for SNPs and 0.20 for indels". Are those the variant filters behind ukb23149_450k_OQFE.90pct10dp_qc_variants.txt?

    I would guess so, but I don't like to just assume that is the case when I am working on a publication where I put my name on it.

    I was hoping someone could just give a straight answer. I find it strange that nobody seems to know.

     

     

  • Comment author
    Chai Fungtammasan DNAnexus Team

    @Gustav Ahlberg Thanks for clarifying which helper_files you are discussing.

    Here is some additional info we have.

    ukb23149_450k_OQFE.annotations.txt.gz and ukb23149_450k_OQFE.sets.txt.gz are annotation files for running Regenie.

     

    ukb23149_450k_OQFE.90pct10dp_qc_variants.txt contains the variants that failed the "90pct10dp" filter.

    You can find the details of the 90pct10dp filter here: https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/protocol-for-processing-ukb-whole-exome-sequencing-data-sets/details-on-processing-the-300k-exome-data-to-generate-the-quality-control-set
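
    Since the file lists variants that failed the filter, it can be treated as an exclusion list. Below is a minimal sketch of that idea, not the official procedure. It assumes the file holds one variant ID per line (the actual layout is not described in this thread and may differ), and the variant IDs and the function names `load_failed_ids` and `drop_failed` are made up for illustration. The only format relied on is the standard PLINK .bim layout, where the variant ID is the second column.

```python
from typing import Iterable, Iterator, Set


def load_failed_ids(lines: Iterable[str]) -> Set[str]:
    """Collect variant IDs from an exclusion list.

    Assumption: one variant ID per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def drop_failed(bim_lines: Iterable[str], failed: Set[str]) -> Iterator[str]:
    """Yield .bim rows whose variant ID (second column) is not in `failed`."""
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] not in failed:
            yield line.rstrip("\n")


# Illustrative data only; these IDs are not taken from the real files.
failed = load_failed_ids(["1:100:A:G\n", "\n", "1:300:C:T\n"])
bim = [
    "1\t1:100:A:G\t0\t100\tG\tA\n",
    "1\t1:200:G:C\t0\t200\tC\tG\n",
]
kept = list(drop_failed(bim, failed))
```

    If the file really is a plain list of IDs, PLINK's `--exclude` option should do the same job directly on the full dataset.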

     

     

    ukb23149_450k_OQFE.variant_ID_mappings.txt contains the mappings of variant IDs in the multiallelic pVCF to the biallelic variants in the PLINK 1.9 bim file.
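
    To make the multiallelic-to-biallelic relationship concrete, here is a small sketch of how one multiallelic site corresponds to several biallelic records. The `chrom:pos:ref:alt` ID scheme and the function names are assumptions for illustration only; the real column layout and ID format of the mappings file are not described in this thread, so check the file itself before relying on this.

```python
from typing import Dict, Iterable, List, Tuple

# (chromosome, position, reference allele, list of alternate alleles)
Site = Tuple[str, int, str, List[str]]


def split_multiallelic(chrom: str, pos: int, ref: str, alts: List[str]) -> List[str]:
    """Return one biallelic ID per alternate allele (hypothetical ID scheme)."""
    return [f"{chrom}:{pos}:{ref}:{alt}" for alt in alts]


def build_mapping(sites: Iterable[Site]) -> Dict[str, List[str]]:
    """Map a multiallelic-site ID to the biallelic IDs it splits into."""
    mapping: Dict[str, List[str]] = {}
    for chrom, pos, ref, alts in sites:
        # Hypothetical pVCF-side ID: all alternates joined with commas.
        multi_id = f"{chrom}:{pos}:{ref}:{','.join(alts)}"
        mapping[multi_id] = split_multiallelic(chrom, pos, ref, alts)
    return mapping


# Illustrative sites only; not taken from the real data.
mapping = build_mapping([
    ("1", 100, "A", ["G", "T"]),
    ("2", 500, "C", ["CT"]),
])
```

    A lookup table like this lets a result reported against a PLINK variant be traced back to its site in the pVCF, and the other way around.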

  • Comment author
    Former User of DNAx Community_24

    Are those helper files available for the 450k? And what is the path? I don't seem to find them. Thanks!

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Do you mean the final WES release? We discussed the 450k WES above.
