Metabolomics data webinar: summary, resources, and Q&A

Our webinar, "Inside the world’s largest metabolomics study: insights from UK Biobank" (21 January 2026), explored the full NMR metabolomics dataset, now available for 500,000 participants as part of our November 2025 data release.

The session covered how the data were generated, what is available to researchers, and how these measures are already supporting research into ageing, cardiometabolic disease, and risk prediction. It also included practical guidance on accessing and analysing the data through the UK Biobank Research Analysis Platform (UKB-RAP).


Watch now

Watch the full event here:

 


Opening remarks: why metabolomics matters

Martin Rutter (Deputy Chief Scientist, UK Biobank) opened the webinar by framing metabolomic profiling as a powerful, objective readout of the combined effects of genetics, lifestyle, and environment. He highlighted the potential of large-scale metabolomics to improve disease prediction, provide mechanistic insight to support drug development and prevention, and inform population-level health strategies. The session aimed to encourage uptake of the newly released NMR data and to support researchers in using them effectively.


About the data and how they were generated

Luke Jostins‑Dean (Principal Scientist, Nightingale Health) introduced the Nightingale Health high‑throughput NMR platform underpinning the dataset. Using proton (1H) NMR spectroscopy, the platform quantifies ~250 metabolic biomarkers, primarily in absolute units (for example mmol/L or µg/L), enabling direct biological interpretation and comparison.

Key features of the biomarker panel include:

  • Lipoproteins and lipids: detailed measurements of total lipids, cholesterol, triglycerides, phospholipids, and fatty‑acid composition (including omega‑3, omega‑6, saturated and unsaturated fats).
  • Lipoprotein subclasses: fine‑grained breakdown by particle size (for example very large to small HDL and LDL particles), including lipid composition within each subclass.
  • Other metabolites: amino acids (including branched‑chain amino acids), ketone bodies, glycolysis‑related metabolites (glucose, lactate, pyruvate), creatinine, albumin, and GlycA as a marker of chronic inflammation.

Luke also outlined the phases in which these were made available to researchers via UK Biobank:

  • Phase 1 (2021): ~120,000 samples
  • Phase 2 (2023): ~half of the cohort
  • Phase 3 (2025): ~500,000 participants, including ~490,000 baseline samples, ~20,000 repeat samples (around five years apart), and ~10,000 blinded duplicates for quality control

Quality assurance showed high reproducibility across spectrometers and batches. Early batch effects identified for a small number of biomarkers (for example alanine) have since been corrected, with updated fields now available on Showcase.
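
The summary does not state which reproducibility metric was used. As a minimal sketch, assuming each blinded duplicate yields a pair of measurements of the same sample, one common choice is the within-pair coefficient of variation (CV). The function name and the example values are hypothetical and are not taken from UK Biobank data.

```python
import math


def duplicate_cv(pairs):
    """Within-pair coefficient of variation for duplicate measurements.

    pairs: list of (first, second) measurements of the same sample.
    The within-pair variance is estimated as sum(d^2) / (2n), where d is
    the difference inside each pair, and is scaled by the grand mean.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one duplicate pair is required")
    within_var = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    return math.sqrt(within_var) / grand_mean


# Hypothetical duplicate pairs for one biomarker (illustrative values only).
example_pairs = [(4.9, 5.1), (6.0, 6.0), (3.8, 4.0)]
print(f"CV = {duplicate_cv(example_pairs):.3f}")
```

A lower CV indicates closer agreement between duplicates. Comparing this value across spectrometers or batches is one way to check for the kind of batch effect described above.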

Luke also highlighted existing publications based on early access to the data, large‑scale GWAS and meta‑analyses (including ancestry‑specific analyses), and a range of supporting resources, available below.

Read more:


What data are available and how to access them

Lora Boteva (Senior Data Analyst, UK Biobank) provided a practical overview of the dataset and how it can support research.

  • Coverage: NMR metabolomics data for ~98% of the ~500,000 UK Biobank participants
  • Variables: 249 metabolic measures, including 168 absolute concentrations and 81 ratios
  • Longitudinal data: repeat measures for ~17,500 participants (baseline 2006–2010; repeat 2012–2013)
  • QC and metadata: processing times, spectrometer identifiers, batch information, and QC flags included for transparency and user‑defined filtering

All individual-level data are available through UK Biobank’s Research Analysis Platform (UKB-RAP), which provides a secure, scalable environment with up-to-date releases.

Lora outlined the application process and mandatory training requirements, the use of project-based workspaces for collaboration, and the range of tools available for different experience levels. These range from the cohort browser and table exporter to RStudio, Jupyter Notebooks, Spark, command-line tools, and custom applets. Researchers may download summary statistics, plots, and code, while individual-level data must remain on the platform.

She encouraged researchers to consult Showcase documentation, UKB-RAP guides, the Community forum, and UK Biobank’s GitHub repositories, with an NMR-focused notebook due shortly (now available via the resources section below).

Read more:


Case Study: Biological ageing

Note: Shiyu's connection was unfortunately cut off toward the end. You can watch her closing remarks here.

Shiyu Zhang (PhD Candidate, Clinical Medicine at Xiangya Hospital, Central South University) presented work published in Nature Communications (2024) that used UK Biobank NMR data to characterise biological ageing at scale.

The study identified 54 metabolites that predict all-cause mortality and developed a metabolomic ageing score that outperformed chronological age, telomere length, and other ageing metrics in predicting mortality and disease risk. Using repeat samples, the analysis also showed that faster rates of metabolomic ageing are associated with substantially higher mortality risk.

Together, these findings show how large-scale metabolomics can capture systemic ageing processes and support causal inference when combined with genetic data.

Read more:


Case Study: Cardiovascular risk prediction

Mike Inouye (Professor of Systems Genomics and Population Health, University of Cambridge) presented work published in the European Heart Journal (2025) assessing whether metabolomic scores improve cardiovascular disease (CVD) risk prediction beyond established tools such as SCORE2.

Using data from nearly 300,000 eligible UK Biobank participants, the team showed that individual NMR biomarkers deliver modest improvements in risk discrimination, while multi-biomarker metabolomic scores provide larger, additive gains. They also demonstrated that combining metabolomic scores with polygenic risk scores improves risk reclassification while maintaining similar numbers needed to treat.

Population-level modelling suggested that integrating metabolomic scores could prevent additional CVD events without reducing prescribing efficiency, highlighting their translational potential for health systems.

Read more:


Q&A

In the closing discussion, Luke quickly addressed questions on:

  • Kaggle resources: derived from an in‑person training workshop, providing hands‑on guidance for analysis using simulated data
  • Quality control: generally light‑touch QC is sufficient; many extreme values represent true biological outliers rather than technical artefacts
  • Units: concentration measures are reported in absolute units, with full details available in Showcase metadata

Other questions raised in the text chat during the session:

How can we access the simulated data for training purposes?

We are currently working on developing synthetic data for training. The mean and standard deviation for each field are available on Showcase, so data for specific fields can be generated from these metrics.
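
As a minimal sketch of that idea, the example below draws values for each field from a normal distribution with a given mean and standard deviation. The field names and summary statistics are hypothetical placeholders, not real Showcase values. The sketch also assumes fields are independent and roughly normal, which real biomarkers often are not (many are skewed and correlated), so the output suits testing code rather than drawing inference.

```python
import random
import statistics


def simulate_field(mean, sd, n, seed=0):
    """Draw n synthetic values from Normal(mean, sd), clipped at zero
    because concentrations cannot be negative."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n)]


# Hypothetical (mean, sd) pairs; replace with the values listed on Showcase.
summary_stats = {
    "biomarker_a": (0.05, 0.01),
    "biomarker_b": (4.5, 0.9),
}

# One independent synthetic column per field, each with its own seed.
synthetic = {
    name: simulate_field(mean, sd, n=1000, seed=i)
    for i, (name, (mean, sd)) in enumerate(summary_stats.items())
}

for name, values in synthetic.items():
    print(name, round(statistics.mean(values), 3), round(statistics.stdev(values), 3))
```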

How does one assess metabolites given they are in different measuring units? Is the data already transformed or are the metabolites assessed individually comparing healthy vs disease?

The unit of measurement can be found under the data field information on UK Biobank's Showcase. For example, data field 23475 (acetate) has units of mmol/l.
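
The answer above covers where to find units, not how to compare measures. One common approach, offered here as an assumption and not as UK Biobank guidance, is to standardise each biomarker to z-scores, so that effect sizes are expressed per standard deviation whatever the original unit. The values below are hypothetical.

```python
import statistics


def standardise(values):
    """Convert raw measurements to z-scores: (x - mean) / sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(x - mean) / sd for x in values]


# Hypothetical raw values for two biomarkers recorded in different units.
raw_mmol_l = [0.02, 0.03, 0.04, 0.05]
raw_g_l = [38.0, 40.0, 42.0, 44.0]

# After standardisation both are unitless and directly comparable.
print([round(z, 2) for z in standardise(raw_mmol_l)])
print([round(z, 2) for z in standardise(raw_g_l)])
```

Skewed biomarkers are often log-transformed before standardising. Whether that is appropriate depends on the analysis.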

I was wondering if the UKB NMR package developed by Dr. Ritchie and Prof Inouye is updated to the latest UKB NMR data release?

No, the data on UKB-RAP has not been run through the UKB NMR package.


Summary points

  • The full-cohort UK Biobank NMR metabolomics dataset is one of the largest and most detailed population-scale metabolomics resources globally.
  • Absolute quantification, repeat measurements, and deep linkage to genetic, clinical, and imaging data make the dataset particularly powerful for longitudinal and integrative analyses.
  • Early case studies in ageing and cardiovascular disease demonstrate both discovery science and clear translational potential.
  • Extensive documentation, training materials, and example workflows support researchers at all stages.

Researchers are encouraged to explore the data through UKB-RAP and to engage with the UK Biobank Community as the resource continues to grow.

 
