Why is data missing from my UKB-RAP project?

#CheadleDA UK Biobank Data Analysts

There are various reasons for this:

  • The data dispensed depends on the Access Management System (AMS) Application Tier of the project.

Cost Tier 3 data, such as the Whole Genome Sequencing data, will not appear in Tier 1 or Tier 2 projects. Check the Tier of your project, and check the Cost Tier of the data on the field's Showcase page. If you have an older-style (Tier 0) application and need additional data to be available on UKB-RAP, please contact the Access Team.
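The tier check above amounts to comparing two numbers. The sketch below is illustrative only: it assumes, as the Whole Genome Sequencing example suggests, that data is dispensed only when its Cost Tier is no higher than the project's Application Tier. The function name and the numeric encoding of tiers are hypothetical, and Tier 0 applications are deliberately left out because they go through the Access Team.

```python
# Hypothetical helper. The tier values are read manually: the project's
# Application Tier from the AMS, the Cost Tier from the field's Showcase page.
# Assumption: data appears in a project only if its Cost Tier <= project Tier.
VALID_TIERS = {1, 2, 3}


def is_dispensed_for_tier(project_tier: int, data_cost_tier: int) -> bool:
    """Return True if data of this Cost Tier should appear in the project."""
    if project_tier not in VALID_TIERS or data_cost_tier not in VALID_TIERS:
        # Older-style (Tier 0) applications are handled by the Access Team.
        raise ValueError("tiers must be 1, 2 or 3")
    return data_cost_tier <= project_tier


# Cost Tier 3 data (e.g. Whole Genome Sequencing) is absent from a Tier 2 project.
print(is_dispensed_for_tier(project_tier=2, data_cost_tier=3))  # False
print(is_dispensed_for_tier(project_tier=3, data_cost_tier=3))  # True
```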

  • New data is being added to the UK Biobank database.

Many UK Biobank fields have data for only a subset of participants. The field's Showcase page gives an indication of how many participants have data for that field.

  • Data may be listed on Showcase but not yet in the main UKB-RAP copy.

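Compare the Showcase field version date with the latest UKB-RAP version date. If the field was updated on Showcase after the latest UKB-RAP version, the new data will not yet be on UKB-RAP.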

  • Data may be in the main UKB-RAP copy but not in your dispensed UKB-RAP project copy.

Compare the latest UKB-RAP version date with the project dispense date. The project dispense date is part of the filename of the project's Dataset Record. If your project was dispensed before the latest UKB-RAP version, dispense a new project to receive the data.
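The two date comparisons described above can be written as a small decision function. This is a sketch under stated assumptions: the filename layout (a `YYYYMMDD` date immediately after an underscore, as in the made-up name `app12345_20230115093000.dataset`) is a hypothetical example rather than a documented format, and the dates you pass in must be copied by hand from Showcase and from the UKB-RAP release information.

```python
import re
from datetime import date, datetime


def dispense_date_from_filename(filename: str) -> date:
    """Extract the dispense date, ASSUMING a YYYYMMDD run after an underscore."""
    match = re.search(r"_(\d{8})", filename)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def diagnose(showcase_date: date, rap_version_date: date, dispense_date: date) -> str:
    """Apply the Showcase vs UKB-RAP check, then the UKB-RAP vs project check."""
    if showcase_date > rap_version_date:
        return "not yet in the main UKB-RAP copy"
    if rap_version_date > dispense_date:
        return "in the main UKB-RAP copy but newer than this project; dispense a new project"
    return "dates do not explain the gap; check the other causes"


# All names and dates below are invented for illustration.
dispensed = dispense_date_from_filename("app12345_20230115093000.dataset")
print(dispensed)  # 2023-01-15
print(diagnose(date(2023, 3, 1), date(2023, 2, 1), dispensed))
```

Checking the Showcase date first mirrors the order in this article: if the data has not reached UKB-RAP yet, dispensing a new project will not help.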

  • Tabular data and Bulk data are stored differently.

Check whether the field containing the missing data is listed in Showcase as Bulk or as Data (Data means tabular data). Look for Bulk fields in the Bulk folder and for tabular fields in the Parquet database. Data in the Parquet database can be viewed using the Cohort Browser.

  • A small amount of data is removed from the UK Biobank dataset over time.

UKB participants can withdraw consent to the use of their data. Withdrawn participants are regularly removed from the main UKB-RAP copy of the data. Researchers should remove withdrawn participants from their dispensed project copy whenever they receive an email about withdrawals, and may choose to do so more often.
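Removing withdrawn participants from an extracted table is essentially a set-membership filter on the participant ID. The sketch below assumes a hypothetical withdrawal list with one participant ID per line and a tabular extract with a participant ID column named `eid`; adjust both to match your actual files. It only filters rows in memory and does not modify anything stored in the UKB-RAP project.

```python
import csv
import io


def read_withdrawn_ids(lines) -> set[str]:
    """Collect participant IDs, assuming one ID per line (hypothetical format)."""
    return {line.strip() for line in lines if line.strip()}


def drop_withdrawn(rows, withdrawn: set[str], id_column: str = "eid"):
    """Yield only the rows whose participant ID is not in the withdrawn set."""
    for row in rows:
        if row[id_column] not in withdrawn:
            yield row


# Made-up example data: three participants, one of whom has withdrawn.
table = io.StringIO("eid,value\n1000001,5\n1000002,7\n1000003,9\n")
withdrawn = read_withdrawn_ids(["1000002\n", "\n"])
kept = list(drop_withdrawn(csv.DictReader(table), withdrawn))
print([row["eid"] for row in kept])  # ['1000001', '1000003']
```

IDs are compared as strings so that the values read from the withdrawal list and from the CSV match without type conversion.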

  • If the dispense process did not complete successfully, the data may be corrupt or some of it may be missing.

This is unlikely, but not impossible. If you have checked all other possibilities, you could consider dispensing a whole new UKB-RAP project.

  • Data might be Restricted.

Some fields require special authorisation, and some Restricted fields are not currently available to any researchers. Check whether the data field is marked as Restricted in Showcase.

  • Data might be hidden by the file limit in the UKB-RAP viewer.

When browsing UKB-RAP, only the 1,000 most recently updated files are displayed in each folder. The total number of items in a folder is displayed at the bottom of the page (see the screenshot example below).

To see a different set of files, use the sort and filter options in the viewer. Alternatively, you can use the dx-toolkit to navigate through files from the command-line interface (CLI).

For example, you can use the dx find data command to search for a field, or dx ls to list all objects in a folder. For more information about the CLI, see the DNAnexus video tutorials.

 

[Screenshot: example showing the total number of files in a folder]
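The dx find data and dx ls commands mentioned above can also be called from a Python script. The sketch below only builds the argument lists and runs them with subprocess; it assumes the dx-toolkit is installed and you are logged in with dx login. The project ID, folder and name pattern are placeholders, and the `--name` and `--path PROJECT:FOLDER` options are used as we understand them from the dx-toolkit help, so check `dx find data --help` on your own installation.

```python
import subprocess


def find_data_cmd(name_pattern: str, path: str) -> list[str]:
    """Build a `dx find data` call searching under `path` (PROJECT:FOLDER)."""
    return ["dx", "find", "data", "--name", name_pattern, "--path", path]


def ls_cmd(folder: str) -> list[str]:
    """Build a `dx ls` call that lists the objects in one folder."""
    return ["dx", "ls", folder]


def run(cmd: list[str]) -> list[str]:
    """Run a dx command and return its output lines (requires dx-toolkit and dx login)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Placeholder project ID and folder; replace with your own before running.
    for line in run(find_data_cmd("*.vcf.gz", "project-xxxx:/Bulk")):
        print(line)
```

Passing the arguments as a list rather than a single shell string means glob patterns such as `*.vcf.gz` reach dx unchanged instead of being expanded by the local shell.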
