Use of Artificial Intelligence (AI) applications and models

Huma B (Communications Team, UKB Community team)

UK Biobank is well aware of the many recent AI-related developments in clinical research. It considers the use of AI applications and models to be a reasonable (and indeed inevitable) approach, and it supports their responsible use on the basis set out in this note. Having considered this carefully, UK Biobank has identified certain practices that it asks researchers:

  • to adopt generically: namely, when researchers use AI in their research using UK Biobank data, they should do so in full compliance with prevailing good practice and principles;
  • to adopt specifically: namely, when researchers develop or train AI models based on UK Biobank participant level data. This elaborates on what is already contained within UK Biobank’s Access Procedures and the standard MTA; and
  • to avoid: namely, there are certain practices that UK Biobank requires researchers not to use, facilitate or promote in their research, so that UK Biobank data (or its equivalent) does not become available generally or to other researchers who have not gone through the UK Biobank Access Process.

1. What UK Biobank does want researchers to do proactively in terms of generic standards

1.1 UK Biobank strongly encourages researchers who undertake research using AI to do so in accordance with the relevant prevailing standards on AI usage. These generic standards are intended to ensure:

  • Safety, security and robustness;
  • Transparency and explainability;
  • Fairness;
  • Accountability and governance; and
  • Contestability and redress.

1.2 The UK Government’s version of these standards (which is based on work prepared by the Turing Institute) is set out in the attached link.

1.3  There are some more specific research-relevant standards which UK Biobank would also ask researchers to take into account:

2. Development of proprietary AI models using UK Biobank participant level data

The following guidance is consistent with the provisions of UK Biobank’s MTA.

2.1 If a researcher develops a proprietary AI model, which has been trained on UK Biobank participant level data (whether in whole or part), then the following applies:

  • The researcher is entitled to retain the run file that establishes the functioning of the model (the model architecture) and can use and license this model as they see reasonably fit. The software for the model and the model architecture are notionally the property of the researcher.
  • Any parameters derived from the UK Biobank data for such a model (which are also the property of the researcher) must be returned to UK Biobank as derived variables (part of the Results Data), which can then be made available to other researchers in the normal way.
  • The researcher can use the model and the UK Biobank derived parameters (the trained model) themselves for their research. UK Biobank considers that it is reasonable practice for a trained model to be able to generate aggregate or summary outputs. Thus researchers can also make the trained model available to third parties (on such terms as they see reasonably fit, taking into account UK Biobank’s de facto open access approach) with the following provisos:
  1. The researcher must determine whether the trained model could, when used by the researcher or the third party, either a) retrieve copies (in whole or part) of actual participant-level data; b) generate the equivalent (actual or synthetic) participant-level data; or c) otherwise serve to create a research environment which is equivalent or comparable to using participant-level data;
  2. If this could be the case, then UK Biobank considers that this is the equivalent of providing direct access to UK Biobank participant level data and as such is not permitted under the Access Procedures (and the terms of the MTA).
  3. Further, any outputs generated by the trained model must comply with UK Biobank’s guidance on publication of minimum numbers (so as to protect participant confidentiality), including https://www.ukbiobank.ac.uk/media/cukhqxtp/uk-biobank-policy-on-protecting-confidentiality-in-public-browsers-final_.pdf .
  • For further clarity in terminology and outputs:
  1. Parameters are variables that the model learns during the training process: for example, neural network weights in a deep learning model.  UK Biobank would expect to see these parameters described, as part of the description of the model architecture, in the researcher’s publication of its Findings.
  2. Generative AI refers to any models that are designed to generate data similar to what the model was trained upon.
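
As a rough illustration of the terminology above, the sketch below separates a model's architecture (the code) from its parameters (the values learned in training). It is not a UK Biobank requirement or return format: the `LinearModel` class, its method names and the JSON export are hypothetical, and the training values are made-up numbers rather than participant-level data.

```python
import json


class LinearModel:
    """Hypothetical toy model. This class definition is the 'architecture'."""

    def __init__(self, n_features):
        # Parameters start at zero and are only given values by training.
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def fit(self, rows, targets, lr=0.05, epochs=500):
        # Plain per-sample gradient descent on squared error.
        for _ in range(epochs):
            for x, y in zip(rows, targets):
                err = self.predict(x) - y
                self.weights = [w - lr * err * xi
                                for w, xi in zip(self.weights, x)]
                self.bias -= lr * err

    def export_parameters(self):
        # The learned values: the 'parameters' in the sense used above.
        return {"weights": list(self.weights), "bias": self.bias}

    def load_parameters(self, params):
        # Architecture + parameters together form the 'trained model'.
        self.weights = list(params["weights"])
        self.bias = params["bias"]


# Made-up training data following y = 2x + 1 (not real data of any kind).
model = LinearModel(n_features=1)
model.fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])

# Serialise only the learned parameters, separately from the code.
parameters_json = json.dumps(model.export_parameters())

# A fresh copy of the architecture becomes a trained model once loaded.
restored = LinearModel(n_features=1)
restored.load_parameters(json.loads(parameters_json))
```

In this sketch the class definition corresponds to the architecture a researcher retains, the dictionary returned by `export_parameters` corresponds to the derived parameters, and loading that dictionary into a fresh instance reconstitutes the trained model.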

3. What UK Biobank does not want researchers to do in relation to publicly accessible Generative AI models (which include Multi Modal Generative Models and Large Language Models)

3.1   These restrictions are all contained within UK Biobank’s standard MTA, but to be explicit, UK Biobank does not permit any researcher:

  • to (directly or indirectly) feed or enable UK Biobank participant level data to be incorporated into a publicly available Generative AI or similar model; and/or
  • to enable (however inadvertently) UK Biobank participant level data to be incorporated into a publicly available Generative AI or similar model. This includes making it publicly available on any searchable website by (for example) posting it to GitHub or a similar repository.

3.2 It is the responsibility (and legal obligation) of the researcher to ensure that the use of any publicly accessible Generative AI or similar model for the analysis of UK Biobank data is compliant with this guidance and the provisions of the MTA.
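
One way a researcher might guard against inadvertently posting data to a public repository is a simple check run before publishing. The sketch below is only an illustration under stated assumptions, not a UK Biobank tool or rule: the function names are hypothetical, and it assumes (purely for the example) that participant-level tables are CSV or TSV files whose header includes an identifier column named `eid`. A check like this can miss files and does not replace the researcher's own review.

```python
import csv
from pathlib import Path

# Assumptions for illustration only; adjust to the project's own conventions.
SUSPECT_COLUMNS = {"eid"}
DATA_SUFFIXES = {".csv", ".tsv"}


def looks_participant_level(path):
    """Return True if the file's header row contains a suspect column name."""
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", errors="replace") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return any(col.strip().lower() in SUSPECT_COLUMNS for col in header)


def find_suspect_files(root):
    """List tabular files under root that may hold participant-level rows."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in DATA_SUFFIXES
        and looks_participant_level(p)
    )
```

A researcher could call `find_suspect_files(".")` on a working directory and decline to publish until the returned list is empty.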
