Integrating External Data Sources with UK Biobank on the RAP

Lea K.
Data Analyst, UKB Community team

The UK Biobank Research Analysis Platform (UKB-RAP) gives researchers secure, cloud-based access to one of the world’s largest and most detailed health datasets. It includes information from over 500,000 participants, covering genetics, health records, imaging, and lifestyle data. While this is already a powerful resource, researchers can get even more value by combining UK Biobank data with external sources.

Why Integrate External Data?

While UK Biobank offers depth and breadth, integrating external data sources can enhance analytical scope in several ways:

  • Cross-validation: External datasets can be used to validate findings in UK Biobank data, and vice versa.
  • Generalisation: Comparing UK Biobank findings with other cohorts helps assess reproducibility and population-level relevance.
  • Larger Training Sets: Machine learning and AI models thrive on data volume and variety. Integration helps overcome limitations of single-source datasets.
  • Enhanced Modelling: Combining UK Biobank data with other sources (e.g., hospital EHRs or public health datasets) may increase sample diversity and improve predictive model accuracy.

 

Types of External Data Suitable for Integration

Researchers may consider integrating:

  • Public open datasets: Public health statistics, census data, environmental statistics (e.g., air/water quality).
  • Institutional data: Local hospital EHRs, imaging repositories, or regional biobanks.
  • Omics data: Proteomics, metabolomics, transcriptomics.

 

Integrating External Data on UKB-RAP

The UKB-RAP provides:

  • Secure environment: Data within a UKB-RAP project space is visible and available only to users who have been added to the application. This makes the RAP a suitable environment for hosting external data that has security requirements.
  • Scalable tools: RAP supports genomic analysis, imaging pipelines, and machine learning.

Uploading External Data

Bringing external data into the UKB-RAP is straightforward, and several data import options are available.

Data Format

One important aspect of successful integration is ensuring that data is in a format compatible with RAP tools. 

Limited selections of UK Biobank data can be exported to a tabular .csv format in UKB-RAP project space using Table-exporter.

For larger selections, users may want to work with Parquet, the format the UKB-RAP uses for tabular data. Parquet is a columnar storage format that is highly efficient for large-scale data processing with tools like Apache Spark.

If the data will be accessed with Spark, researchers might need to convert external CSV or TSV files to Parquet before performing joins or transformations. Doing so improves both performance and compatibility.

Imaging and genetic files are stored in compressed formats within the Bulk folder on RAP.

Harmonising Data

  • Participant linkage: UK Biobank uses pseudonymised IDs (encoded_ids/eids) that are unique to each application, so the same participant has a different eid in different applications. Any cross-referencing must be planned within the constraints of data privacy.
  • Variable alignment: Standardise units, formats, and coding schemes across datasets.
  • Metadata mapping: Use UK Biobank’s Showcase field IDs and dictionaries to align variables.
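
The harmonisation steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed method. The external column names (`ext_id`, `sex`, `height_in`), the `id_bridge` mapping, and the eids are all made up. The sex codes (0 = Female, 1 = Male) are assumed to match the coding of the UK Biobank Sex field and should be checked against Showcase. Whether an ID bridge may exist at all depends on the governance of both datasets.

```python
import csv
import io

# Assumption: the external dataset records sex as "F"/"M" and height in inches.
# Assumption: target codes follow the UK Biobank Sex field (0 = Female, 1 = Male);
# confirm against the Showcase data dictionary before relying on them.
SEX_TO_UKB_CODE = {"F": 0, "M": 1}
INCHES_TO_CM = 2.54


def harmonise_rows(external_rows, id_bridge):
    """Yield external records re-keyed to eids, with aligned codes and units.

    `id_bridge` is a hypothetical mapping from external IDs to the eids of
    your own application. Rows without an entry in it are skipped.
    """
    for row in external_rows:
        eid = id_bridge.get(row["ext_id"])
        if eid is None:
            continue  # no permitted linkage for this record
        yield {
            "eid": eid,
            # Coding scheme alignment: text label -> integer code.
            "sex": SEX_TO_UKB_CODE[row["sex"].strip().upper()],
            # Unit alignment: inches -> centimetres, one decimal place.
            "height_cm": round(float(row["height_in"]) * INCHES_TO_CM, 1),
        }


# Tiny in-memory example standing in for an uploaded external file.
external_csv = io.StringIO("ext_id,sex,height_in\nA1,f,65\nA2,M,70\nA3,F,62\n")
bridge = {"A1": 1000001, "A2": 1000002}  # made-up eids for illustration

harmonised = list(harmonise_rows(csv.DictReader(external_csv), bridge))
```

Record `A3` is dropped because it has no entry in the bridge. In a real project, you would want to count and report such unlinked rows rather than discard them silently.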

Considerations

  1. Storage and Compute Costs
    • Storage: Data dispensed into UKB-RAP projects is stored free of charge on the RAP. However, any external data uploaded, analysis outputs, and newly created datasets will incur storage costs.
    • Compute: Activities such as data extraction, transformation, and analysis also generate compute charges.
      For detailed pricing, refer to the UKB-RAP Rate Card.
  2. Policy and Compliance
    • Before integrating external datasets, researchers must review UK Biobank’s data sharing policy to ensure compliance with ethical and legal requirements.
    • Imported data should align with governance standards and consent agreements.
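
Returning to the storage and compute costs described above, a rough budget can be sketched before uploading external data. The rates in this example are placeholders, not real prices; substitute the current figures from the UKB-RAP Rate Card. The function name and parameters are hypothetical.

```python
def estimate_monthly_cost(
    storage_gb: float,
    storage_rate_per_gb_month: float,
    compute_hours: float,
    compute_rate_per_hour: float,
) -> float:
    """Return an approximate monthly cost for uploaded data plus compute.

    All rates are supplied by the caller; nothing here reflects actual
    UKB-RAP pricing.
    """
    storage_cost = storage_gb * storage_rate_per_gb_month
    compute_cost = compute_hours * compute_rate_per_hour
    return round(storage_cost + compute_cost, 2)


# Placeholder figures: 250 GB of external data and 12 instance-hours.
example_total = estimate_monthly_cost(
    storage_gb=250,
    storage_rate_per_gb_month=0.02,  # hypothetical rate
    compute_hours=12,
    compute_rate_per_hour=0.50,  # hypothetical rate
)
```

Data dispensed by UK Biobank is left out of `storage_gb`, since it is stored free of charge; only uploaded data, analysis outputs, and newly created datasets count.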
