Clarification on Proteomic Data Analysis (Olink NPX)

Manal Alharbi

I am currently analyzing proteomic (Olink NPX) data provided by UK Biobank. So far I have imputed missing values using KNN, run a PCA, and performed differential expression analysis using limma. However, the PCA does not clearly separate the case and control groups, and the fold-change values from limma are lower than I expected.

Could you please advise on:

  1. Recommended methods for identifying and removing outliers specifically in proteomic (Olink NPX) data.
  2. Whether the proteomic NPX data provided by UK Biobank is already z-scored (standardized), or if additional scaling (e.g., z-scoring) is recommended before PCA or limma differential expression analysis.

Comments

    Michael Turner

    Hi,

    I am just getting to grips with the data myself but:

    1. Missing values can arise in more than one way. Some may mean that the protein wasn't detected above the limit of detection, in which case they reflect biologically plausible low protein levels, while others may reflect assay failure. I have not seen a clear consensus in the literature on how to handle missing values (and would love to hear from anyone who knows of one), so I am not sure there is a straightforward answer to this question.

    2. The data are not standardised. NPX values are effectively on a log2 scale: in Olink's qPCR-based assays they are derived from the number of PCR cycles taken to reach a given threshold, and the amount of DNA doubles with each cycle. A difference of 1 NPX therefore corresponds to roughly a two-fold difference in protein level, which may be why the fold changes look small. Some researchers using Olink data have normalised the data, e.g.: https://www.sciencedirect.com/science/article/pii/S0735109723075654?via%3Dihub
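
    To make the log2 point concrete, here is a minimal Python sketch (standard library only) of converting an NPX difference into a linear fold change and z-scoring one protein. The function names and the example NPX values are made up for illustration; they are not from UK Biobank or Olink, and this is not a recommended pipeline.

```python
import statistics


def npx_diff_to_fold_change(delta_npx):
    """Convert a difference in NPX (log2 units) into a linear fold change."""
    return 2.0 ** delta_npx


def zscore(values):
    """Standardise one protein's NPX values; missing entries (None) stay None."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD; needs at least 2 observed values
    return [None if v is None else (v - mean) / sd for v in values]


# Hypothetical NPX values for a single protein; None marks a missing measurement.
npx = [1.0, 2.0, None, 3.0]

print(npx_diff_to_fold_change(1.0))  # a 1 NPX difference is a two-fold change
print(zscore(npx))
```

    Z-scoring each protein like this rescales the effect estimate into standard-deviation units; it does not change the underlying group difference, so on its own it would not be expected to make cases and controls separate better.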

    I hope this is helpful!

    Mike 
