Use of third-party code-sharing repositories

This policy sets out how researchers are expected to act when using third-party code-sharing repositories such as GitHub, to maintain the security of participant data.

It is the responsibility of the researcher, and ultimately their Principal Investigator (PI) and Institute, to ensure compliance with the policy, which will be monitored regularly by UK Biobank.

Failure to comply with this policy will lead to serious penalties, including permanent bans from UK Biobank, and legal proceedings with institutions as necessary.

Researcher requirements

UK Biobank researchers must meet the following requirements when using code repositories:

  1. Researchers must thoroughly review their historic code repositories to ensure no UK Biobank data are present. This must have been completed by 30 April 2026.
  2. From 30 April 2026, researchers must use our Git Audit Tool to check the files in each new commit before posting code on GitHub.
  3. All researchers are required to add details of any public code repository accounts they hold to their UK Biobank registration within AMS by 30 April 2026. Details required include username and repository URL, across all platforms (e.g. GitHub, Gitee, Zenodo).

Using code repositories

UK Biobank is built on the principles of open science, and we encourage researchers to share their findings publicly and transparently to support further research that improves human health.

Part of modern open science involves uploading software tools and/or development and analysis scripts to online code repositories, which is required by some journals and which we support. However, any scripts built for use in UK Biobank’s Research Analysis Platform (UKB-RAP) may have been run on underlying UK Biobank data, and it is possible for these data to be carried through into code repositories.

The publication of deidentified participant-level data to code repositories is not acceptable and UK Biobank takes the necessary steps to ensure this does not happen. In the rare event that it does happen, cases are identified within 24 hours and the data are removed rapidly to minimise impact.

This policy document describes the high standards we expect of researchers, the means by which we will monitor compliance, and the consequences for incorrect use of code repositories.

Proper use of UK Biobank data

UK Biobank participant data must always be treated in a manner consistent with the provisions of the Access Procedures and the Material Transfer Agreement (MTA) signed as a condition of access, as well as within our Exemptions Policy.

Specifically:

  • Participant-level data must not be shared (directly or indirectly) with unauthorised individuals or unauthorised third parties;
  • Participant-level data must not be shared, stored or uploaded (directly or indirectly) to web-based or other repositories accessible by unauthorised individuals or unauthorised third parties.

Guidance and support

UK Biobank has created resources to support researchers in the proper use of code repositories. These include:

A bespoke training module  

Our training module provides practical guidance on using code repositories when working with UK Biobank data. It covers version control fundamentals, repository management, and best practices for contributing code in a collaborative setting. This training became mandatory on 31st March 2026 and can be accessed in AMS (login required).

Community Guidance  

There are several pieces of documentation outlining how to collaborate with fellow researchers, share code responsibly, and contribute to a productive and inclusive research ecosystem. Community guidance can be found here.

Self-checking tool

We have published a Python-based tool which generates an audit report to help researchers identify potentially sensitive files within a Git repository or its commit history. We will continue to iterate on these self-checking tools and appreciate user feedback.
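
As a rough illustration of the kind of check a self-audit might include, the sketch below flags tabular files whose header row names a participant identifier column. It is not the UK Biobank Git Audit Tool and does not describe how that tool works. The column name `eid`, the file extensions, and the function names `header_has_id_column` and `audit_directory` are all assumptions made for this example.

```python
import csv
import io
from pathlib import Path

# Assumptions for illustration only: the identifier column name and the
# file extensions treated as tabular data are not taken from the policy.
ID_COLUMNS = {"eid"}
TABULAR_SUFFIXES = {".csv", ".tsv", ".txt"}


def header_has_id_column(text: str) -> bool:
    """Return True if the first non-empty line names an identifier column."""
    first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
    if not first_line:
        return False
    # Guess the delimiter from the header itself: tab if present, else comma.
    delimiter = "\t" if "\t" in first_line else ","
    header = next(csv.reader(io.StringIO(first_line), delimiter=delimiter), [])
    return any(cell.strip().lower() in ID_COLUMNS for cell in header)


def audit_directory(root: str) -> list[str]:
    """List tabular files under root whose header suggests participant-level rows."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        if path.suffix.lower() not in TABULAR_SUFFIXES:
            continue
        # Read only the start of the file; the header is all we inspect.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            head = handle.read(4096)
        if header_has_id_column(head):
            flagged.append(str(path.relative_to(root)))
    return flagged
```

Calling `audit_directory(".")` from the top of a working copy returns the relative paths that deserve a manual look before committing. A header check like this only catches the most obvious case, so it would complement rather than replace the published tool and the mandatory training.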

Monitoring and enforcing the policy

The UK Biobank team conducts daily searches of GitHub to identify code repositories that contain UK Biobank data. These searches will soon be expanded to wider platforms in priority order.

The daily checks identify problematic repositories through automated review and manual validation. To date, any data identified have been removed either voluntarily by the researcher or through legal mechanisms such as a DMCA takedown notice.

When your work goes public on a code repository, our automated tools search it within 24 hours to check for the presence of pseudonymised participant data. Each day they search the major places where researchers publish code and data (including GitHub, GitLab, Gitee, Hugging Face, and Zenodo) for anything related to UK Biobank.

When the tools find a repository requiring review, the review follows a two-stage process:

  • First, a fast scan looks through the entire history of every file - not just the current version - for tell-tale patterns like external ID (EID) numbers.
  • Then, our multi-agent AI solution reads the repository the way a human reviewer would: it understands the README, figures out what the project is actually doing, and brings in specialist tools to identify different file types (spreadsheets, notebooks, genetic and imaging files, code, archives, and so on) to judge whether real participant data is genuinely exposed.

Each repository is then given a clear rating so that anything sensitive can be flagged and acted on quickly by our internal teams through standardised takedown processes.
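
To show why scanning the full history matters, here is a minimal sketch of a first-stage pattern scan over every commit of a local clone. It is not UK Biobank's scanner. It assumes, purely for illustration, that a data row starts with a seven-digit identifier followed by a comma or tab, and `scan_patch_text` and `scan_history` are hypothetical names. The only external command used is `git log` with its standard `--all`, `-p`, `--no-color` and `--format` options.

```python
import re
import subprocess

# Assumption for illustration: a data row starts with a seven-digit
# identifier followed by a comma or tab. Real detection rules may differ.
ID_ROW = re.compile(r"^\d{7}[,\t]")


def scan_patch_text(patch: str) -> list[tuple[str, str, str]]:
    """Return (commit, path, line) for every added line that looks like a data row."""
    hits = []
    commit, path = "", ""
    for line in patch.splitlines():
        if line.startswith("commit "):
            # Emitted by --format="commit %H"; marks the start of a commit.
            commit = line.split()[1]
        elif line.startswith("+++ "):
            # "+++ b/data.csv" names the file as it exists after the commit.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else target
        elif line.startswith("+") and ID_ROW.match(line[1:]):
            hits.append((commit, path, line[1:]))
    return hits


def scan_history(repo: str) -> list[tuple[str, str, str]]:
    """Scan the lines added by every commit reachable from any ref of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all", "-p", "--no-color", "--format=commit %H"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return scan_patch_text(result.stdout)
```

A file that was committed and later deleted still appears in the output of `scan_history`, because the commit that added it remains in the history. This is why removing a file in a new commit does not remove the data from a public repository.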

In addition to these actions, from 1st May 2026 researchers who breach this policy will face an escalating series of penalties:

  • First offence - a formal warning.
  • Second offence - a permanent ban from using UK Biobank resources.
  • Third offence - permanent bans at a PI or an institutional level.

A limited number of extenuating circumstances for non-compliance by 1st May will be considered by exception, for instance where a researcher has been on parental leave during this period.
