I am trying to download proteomics data in the UK Biobank RAP. I was able to extract the variable that was noted in the release email, but I don't see any values in the file. I tried downloading through the data portal, but I am not sure whether it is downloading or not.
Is there a way to download these data on the UK Biobank RAP platform? I believe we paid for complete access to all datasets, so I am not sure why I am unable to download them. Is anyone else facing this issue?
The proteomics data are in the Cohort Browser. Whether or not you can download them depends on your tier of access. See Table 2: https://www.ukbiobank.ac.uk/enable-your-research/costs/transitional-arrangements-and-faqs#accordion-9-id
We covered this during the proteomics round table discussion today. We announce talks and events in the announcements section of this community portal.
You could join our training session on June 1, in which we will go into detail on the hands-on part.
Hello Chai,
Thank you for the information. I will go through the link mentioned above and register for the training session on June 1.
Regards
Akhil
Hi Akhil,
the data in your RAP project area is a copy of the main data held in the RAP. The main data was updated recently, on 12th April 2023, as Version 15.1, and that update includes the Olink Proteomics data portal table field 30900. See https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/data-release-versions. In order to get the new data into your RAP project area, you need to Re-Dispense your project. See https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/updating-dispensed-data .
The Showcase entry for field 30900 https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=30900 shows Cost Tier d2 o2 s2. This means that projects with tier 2 or above will be able to download the data via the portal on the RAP, and use the data portal online on the RAP, and download the data from Showcase data portal.
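As a rough illustration of the tier check described above, here is a minimal Python sketch. It assumes the Cost Tier string always has the shape shown for field 30900 (a letter followed by a number, separated by spaces) and that a project needs a tier at least as high as every number in it. The function names are made up for this example and are not part of any UK Biobank tool.

```python
def parse_cost_tier(text):
    """Split a Showcase cost tier string such as 'd2 o2 s2' into {'d': 2, 'o': 2, 's': 2}."""
    tiers = {}
    for token in text.split():
        # First character is the access-route letter, the rest is the required tier.
        tiers[token[0]] = int(token[1:])
    return tiers


def project_has_access(project_tier, cost_tier_text):
    """True if the project's numeric tier meets every requirement in the string."""
    required = parse_cost_tier(cost_tier_text)
    return all(project_tier >= level for level in required.values())


print(project_has_access(2, "d2 o2 s2"))
print(project_has_access(1, "d2 o2 s2"))
```

For the string shown on the field 30900 page, a tier 2 or tier 3 project passes the check and a tier 1 project does not.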
There is a forum post by Ondrej Klemper on data exporting which might be useful, see https://community.dnanexus.com/s/question/0D5t000004SBm0eCAD/query-of-the-week-1-export-phenotypic-data-to-a-file
There is a section on the proteomics data structure in this page https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/working-with-ukb-data#about-proteomics-data
I notice that Re-Dispense is likely to take a long time (days rather than hours) as there is currently a long queue. Once it has completed:
Field 30900 only indicates how many proteins have been measured for each participant. To find the actual data values for Instance 0 for each protein in the Cohort Browser, select Add column, then look in the folder:
Biological samples > Blood assays > Proteomics > Protein biomarkers > Olink Instance 0
Select each protein of interest and click Add column to table.
It is not possible to download the whole of the proteomics data via the Cohort Browser, which has a limit of 200 columns for saving and exporting and a limit of 30k rows. See https://documentation.dnanexus.com/user/cohort-browser#variant-browser .
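To check in advance whether a planned selection fits within those limits, a small Python sketch like the following may help. It assumes both limits are inclusive (exactly 200 columns or 30,000 rows is still allowed), which the thread does not confirm, and the function name is hypothetical.

```python
MAX_COLUMNS = 200     # Cohort Browser save/export column limit quoted above
MAX_ROWS = 30_000     # Cohort Browser row limit quoted above


def export_problems(n_columns, n_rows):
    """Return a list of reasons why a Cohort Browser export would not fit (empty if it fits)."""
    problems = []
    if n_columns > MAX_COLUMNS:
        problems.append(f"{n_columns} columns exceeds the limit of {MAX_COLUMNS}")
    if n_rows > MAX_ROWS:
        problems.append(f"{n_rows} rows exceeds the limit of {MAX_ROWS}")
    return problems


# Hypothetical selection sizes, for illustration only.
for reason in export_problems(n_columns=250, n_rows=45_000):
    print(reason)
```

If either limit is exceeded, the Spark route described next is the alternative.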
There is a Table Exporter app in the Tools tab that might be useful, but I think you would need to define a table using Spark in JupyterLab first, see
https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data . Also see video https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-jupyterlab-on-the-research-analysis-platform .
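A minimal sketch of what the Spark step might look like in a JupyterLab notebook on the RAP is shown below. It is an outline under assumptions, not a tested recipe: it assumes the `dxdata` package available in RAP Spark sessions, that dataset columns follow the `p<field>_i<instance>_a<array>` naming pattern, and that the participant ID column is called `eid`. The dataset ID must be supplied by you. The per-protein values sit in a separate Olink entity whose internal name is not given in this thread, so `entity_name` would need to be changed accordingly.

```python
def field_name_regex(field_id):
    """Regex matching RAP column names for one Showcase field, e.g. p30900_i0."""
    return r"^p{}(_i\d+)?(_a\d+)?$".format(field_id)


def retrieve_field(dataset_id, field_id, entity_name="participant"):
    """Return a Spark DataFrame with the participant ID plus every column of one field.

    Only works inside a Spark-enabled JupyterLab session on the RAP.
    """
    import dxdata  # assumption: provided by the RAP environment, not installed locally

    dataset = dxdata.load_dataset(id=dataset_id)
    entity = dataset[entity_name]
    fields = entity.find_fields(name_regex=field_name_regex(field_id))
    names = ["eid"] + [f.name for f in fields]  # "eid" is assumed to be the ID column
    return entity.retrieve_fields(names=names, engine=dxdata.connect())
```

Calling `retrieve_field(my_dataset_id, 30900)` would then return the protein counts per participant, where `my_dataset_id` stands for the ID of the dispensed dataset in your project.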
To access the proteomics data via table olink_data in the data portal on Showcase, it will be necessary to wait for the notification email for basket 4041758, which contains field 30900.
Hello Rachael, Thank you so much for the information.
1.) I tried to download the files using Spark as mentioned above and was only able to download field 30900. However, when I tried to get Olink Instance 0, as mentioned in the instructions, I couldn't see anything. Do you know what the column name for that instance is, so that we can use it to download the data from the Spark cluster?
2.) Regarding basket approvals and notifications, do you have any idea about the turnaround time for basket requests? Knowing this would definitely help me plan my project accordingly.
Thank you so much for helping me out. I was hoping to get access either through DNAnexus or through a basket so I can plan my project.
Akhil
Hi Akhil,
2.) Turnaround time for baskets is generally about a week, though this can vary when there are system problems or busy times such as a showcase refresh. The good news is that your latest basket is currently in the generation queue and I would expect it to be ready by tomorrow. If you haven't received the email by Friday morning, post again here.
1.) I don't know the answer to this. To find the specific column name for a particular protein, you could look at the column name as it appears in the cohort browser. For example, there is an assay called AARSD1. You could try guessing a column name of AARSD1_0. Note that the Entity is Olink Instance 0 NPX Result. There are several Resource files in the RAP in folder Bulk > Protein Biomarkers > Olink > helper_files.
The protein-IDs are in olink_assay.dat, which relates assay AARSD1 to UniProt Q9BTE6 etc. Olink_proteomics_data.pdf might also be useful.
Be aware that downloading the whole olink_data table from the Showcase data portal can take a very long time (minutes rather than seconds).
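For looking up UniProt IDs from the helper file, a short Python sketch is given below. The layout of olink_assay.dat is not described in this thread, so the tab delimiter and the column names `Assay` and `UniProt` are placeholders; check the real header before using it. The single sample row is the AARSD1 example mentioned above.

```python
import csv
import io


def load_assay_to_uniprot(handle, assay_col="Assay", uniprot_col="UniProt", delimiter="\t"):
    """Read a delimited assay table and return {assay name: UniProt ID}.

    The default column names and delimiter are assumptions, not the confirmed file layout.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[assay_col]: row[uniprot_col] for row in reader}


# In-memory stand-in for the real file, using the one pairing quoted in this thread.
sample = io.StringIO("Assay\tUniProt\nAARSD1\tQ9BTE6\n")
mapping = load_assay_to_uniprot(sample)
print(mapping["AARSD1"])  # prints Q9BTE6
```

With the real file, you would pass `open("olink_assay.dat", newline="")` instead of the in-memory sample and adjust the column names to match its header.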
Hello Rachael, Thank you for the information. My basket was approved and I was able to download the data as suggested. Thank you so much for your help.
Hello, Just checking to see whether the GWAS summary association data for this paper are available to us. I don't see any official publication, so I thought I would ask. https://www.biorxiv.org/content/10.1101/2022.06.17.496443v1.supplementary-material
I don't think it is available yet. If you "follow" the preprint, you can get a notification when it is published, and the published version should include a URL for the GWAS data.
Hi Chai,
I am not sure which category in Table 2 the proteomics data correspond to. Can you please point me to it?
Thanks
I assume you have a numeric tier (1-3). You can check the cost tier policy for each data item in the UKB Showcase here:
https://biobank.ndph.ox.ac.uk/showcase/rectab.cgi?id=1072
https://biobank.ndph.ox.ac.uk/showcase/help.cgi?cd=tier