How can I get allele frequencies for a specific cohort from the WGS data on the UKB RAP?

Hi all,

I'm hoping to get allele frequencies for my cohort across the entire genome, using the WGS data on the UKB RAP.

I've tried piping the output of a `bcftools view` command (for cohort filtering) into a `bcftools query` command (for allele frequencies). This took approximately 20 minutes on a single population-level VCF block file on a default instance, and a `dx find data` search suggests there are >2,500 such block files. At that rate the whole genome would need more than 800 instance-hours (20 minutes × 2,500 blocks ≈ 833 hours), so I'm looking for a faster and more cost-effective way to get the allele frequencies.
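
For reference, the per-site value such a pipeline should report for a cohort is AF = AC / AN, where AC counts alternate alleles and AN counts all called (non-missing) alleles among the retained samples. One caveat worth checking: as far as I can tell from the bcftools documentation, `bcftools view -S` recalculates INFO/AC and INFO/AN for the sample subset by default but not INFO/AF, so if the query prints `%INFO/AF` it may be returning the value for the full callset; the `+fill-tags` plugin (`-t AF`) is one way to recompute it. The Python sketch below is a hypothetical, stdlib-only illustration of that arithmetic on uncompressed VCF text. It assumes biallelic records (for example after `bcftools norm -m -any`) with a GT entry in FORMAT, and the function name `cohort_allele_freqs` is made up for illustration.

```python
from typing import Iterable, Iterator, List, Set, Tuple


# Hypothetical helper: per-site allele frequency restricted to a cohort.
# Assumes biallelic records and a GT entry in the FORMAT column.
def cohort_allele_freqs(
    vcf_lines: Iterable[str], cohort: Set[str]
) -> Iterator[Tuple[str, int, str, str, int, int, float]]:
    sample_cols: List[int] = []
    for line in vcf_lines:
        if line.startswith("##"):  # meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Column indices (9 onwards) of the samples that belong to the cohort.
            sample_cols = [
                i for i, name in enumerate(fields[9:], start=9) if name in cohort
            ]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        ac = an = 0
        for col in sample_cols:
            gt = fields[col].split(":")[gt_index]
            # Treat phased (0|1) and unphased (0/1) genotypes the same way.
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":  # missing allele: excluded from AN
                    continue
                an += 1
                if allele != "0":  # any non-reference allele counts toward AC
                    ac += 1
        af = ac / an if an else float("nan")
        yield chrom, int(pos), ref, alt, ac, an, af
```

In principle this could consume decompressed VCF lines on stdin (e.g. `cohort_allele_freqs(sys.stdin, cohort)` fed from `bcftools view block.vcf.gz`), but pure Python will almost certainly be slower than bcftools itself across hundreds of thousands of samples, so it is meant to show what is being computed rather than to replace the compiled tools.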

I've also tried a PLINK (`--freq`) approach, as suggested in a similar query, but it runs into storage-space issues on a default instance.

I know that cohort-specific allele frequencies are available in the Cohort Browser under the Genomics tab, but from what I gather, these frequencies are derived only from the WES data and are limited to a maximum genomic range. Furthermore, the documentation suggests that the UI is inefficient for downloading this kind of data and recommends the SQL Runner app instead. Is it possible to use such an approach to get allele frequencies for the WGS data? If so, how can this be done?

Thanks!

Comments

  • Rachael W (UKB Community team, Data Analyst)

    You are correct that the Genomics search in the Cohort Browser is based on the WES data and contains derived annotations such as allele frequencies.

    There is not currently a similar set of annotations for the WGS data. Creating such a resource would require significant effort, and there is nothing on the Future Timelines to suggest that anyone is working on it at present. Once WGS data are available for the full cohort (expected very soon), researchers may publish results that would be helpful to you, so please keep checking back to see what has been added.

    The SQL Runner app would need to read derived annotation data for the WGS, which has not been created, so it will not be any use to you at present.

    I am not a genetics expert, and I do not know whether there is a better way to calculate allele frequencies. Can someone else advise?