Using rsIDs to apply a genomic filter to my RAP cohort

Hello, I have a question about applying a genomic filter to my UKB RAP cohort. I asked the same question at today's seminar and was told to post it on the community Q&A.

I have paid for tier 3 data, and I'm trying to apply a genomic filter to the predefined cohort.

However, I've found that applying the filter using 'Variant ID' doesn't work as expected.

For example, rsID 'rs11209948' can be found with the 'Genomic search' page at the following URL: 'https://biobank.ndph.ox.ac.uk/ukb/gsearch.cgi'.

But if I use rs11209948 in the genomic filter, it only returns an error message.

Here is another confusing example, the other way around: rsID 'rs2179744' cannot be found on the Genomic search page mentioned above, but using it in the genomic filter returns no error message and works as expected.

Why does this happen? Isn't the Cohort Browser showing results based on the UK Biobank database? If so, why does a variant ID that cannot be found in that database return a result, while a variant ID that can be found there only returns an error message?

I thought the problem might be one of two things: 1) the reference version of the human genome (GRCh37 vs GRCh38), or 2) the ID format, in which case it might work if I converted the rsID to the chr_pos_ref_alt format mentioned in the displayed message.

So I converted the rsIDs to that format (e.g. 5_88367793_C_T), but it still doesn't work, regardless of the GRCh version.
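As a minimal sketch of the conversion step, the hypothetical Python helper below only normalises the string form of an identifier (for example 'chr5:88367793:C:T') into the '5_88367793_C_T' style used above. It assumes the chromosome, position, REF and ALT are already known. It does not look up rsIDs or lift coordinates between GRCh versions, and the accepted separators (':', '-', '_') and optional 'chr' prefix are my own assumptions, not anything documented for the RAP.

```python
import re

# Hypothetical helper: the accepted input styles are assumptions,
# not a documented UKB RAP / DNAnexus format specification.
_VARIANT_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)[:_\-](\d+)[:_\-]([ACGTN]+)[:_\-]([ACGTN]+)$",
    re.IGNORECASE,
)


def to_chr_pos_ref_alt(variant: str) -> str:
    """Normalise e.g. 'chr5:88367793:C:T' to '5_88367793_C_T'.

    Only the string is reformatted: no rsID lookup and no
    GRCh37 <-> GRCh38 coordinate conversion is performed.
    """
    match = _VARIANT_RE.match(variant.strip())
    if match is None:
        # rsIDs such as 'rs11209948' end up here: they carry no position.
        raise ValueError(f"not a chr/pos/ref/alt identifier: {variant!r}")
    chrom, pos, ref, alt = match.groups()
    return f"{chrom.upper()}_{int(pos)}_{ref.upper()}_{alt.upper()}"
```

For example, to_chr_pos_ref_alt('chr5-88367793-c-t') gives '5_88367793_C_T'. If the string is well formed and the filter still rejects it, the variant is probably not in the data backing the filter, rather than being in the wrong format.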

I reported this issue to customer service, but the only reply I received said that I have to grant permission to ukb org-support. I replied as soon as I got that email, but there is still no answer. Is there anything wrong with my analysis?

------------

In addition, I'd like to build a logical matrix with a list of SNPs as columns and participant EIDs as rows. I would appreciate any ideas on how to do this.

Comments

  • Ondrej Klempir (DNAnexus Team)

    Hi,

    Here is my idea. I first checked this page: https://biobank.ctsu.ox.ac.uk/crystal/gsearch.cgi

    I am not sure which data back this service, but it does not seem to be based on the pVCF exome data.

    With that in mind, the UKB RAP documentation (https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/working-with-ukb-data#browsing-dataset-fields-using-the-cohort-browser) says the following:

    "If your access application has been approved for Data-field 23146, 23148 and/or 23157, the Cohort Browser will automatically include a "GENOMICS" section, where you can browse variants in your cohort. The data backing the section depends on the dataset version dispensed: 23157 for version 11 and later, 23148 for version 7 and later, 23146 for previous versions. These variants are sourced from the pVCF files of field #23146, 23148 and/or 23157, after annotating with snpEff GRCh38.92, dbSNP b154 and gnomAD r2.1.1. You can also use these variants to apply genomic filters."

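    As far as I can tell, the fields mentioned above all contain exome data. That may explain the mismatch: a variant can appear on the Genomic search page without being in the exome pVCFs that back the genomic filter, and vice versa.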

    ---------------------------

    Regarding your second question, about building a logical matrix with SNPs as columns and participant EIDs as rows: would the pVCF format (e.g. Data-fields 23146, 23148 and/or 23157) give you the file structure you need?
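To illustrate how a pVCF maps onto that matrix, here is a minimal Python sketch, assuming a small, already-extracted VCF text with one ALT allele per line, a GT subfield in FORMAT, and participant EIDs as the sample column names. The function name carrier_matrix is hypothetical. A participant is marked True for a variant if their GT contains at least one non-reference allele. Missing calls ('./.') are treated as False, which is a simplifying choice you may want to change.

```python
def carrier_matrix(vcf_text: str) -> tuple[list[str], list[str], list[list[bool]]]:
    """Return (eids, variant_ids, matrix); matrix[i][j] is True if
    participant i carries at least one ALT allele of variant j.

    Hypothetical sketch: assumes bi-allelic records and a GT subfield.
    """
    eids: list[str] = []
    variant_ids: list[str] = []
    columns: list[list[bool]] = []  # one column per variant

    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            eids = fields[9:]  # sample columns = participant EIDs
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        # Build IDs in the chr_pos_ref_alt style, dropping any 'chr' prefix.
        variant_ids.append(f"{chrom.removeprefix('chr')}_{pos}_{ref}_{alt}")
        column = []
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # Carrier if any allele is neither REF ('0') nor missing ('.').
            column.append(any(a not in ("0", ".") for a in alleles))
        columns.append(column)

    # Transpose: rows = participants, columns = variants.
    matrix = [[col[i] for col in columns] for i in range(len(eids))]
    return eids, variant_ids, matrix
```

This keeps everything in memory, so it only suits a small extract. For the full exome pVCFs, a dedicated tool such as bcftools or PLINK would be a more practical way to subset variants first.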
