Clarification on Proteomic Data Analysis (Olink NPX)
I am currently analyzing proteomic (Olink NPX) data provided by UK Biobank. I've imputed missing values using KNN, then run PCA and a differential expression analysis using limma. However, my PCA does not clearly separate the case and control groups, and the fold-change values obtained from limma appear lower than expected.
Could you please advise on:
- Recommended methods for identifying and removing outliers specifically in proteomic (Olink NPX) data.
- Whether the proteomic NPX data provided by UK Biobank is already z-scored (standardized), or if additional scaling (e.g., z-scoring) is recommended before PCA or limma differential expression analysis.
Comments
1 comment
Hi,
I am just getting to grips with the data myself but:
1. The missing values mean that the protein wasn't detected above the limit of detection. Some of these “missing” values will be true low levels of a protein. I have not seen a clear consensus in the literature on how to handle missing values (but would love to hear if someone knows of one). I am not sure there is a straightforward answer, as a missing value may sometimes reflect assay failure and sometimes a biologically plausible low protein level.
2. The data are not standardised. The NPX values are effectively on the log2 scale: in Olink's qPCR-based assays they derive from the number of PCR cycles taken to reach a given threshold, and because the amount of DNA doubles with each cycle, a difference of 1 NPX corresponds to roughly a doubling of protein level. Some researchers using Olink data have normalised the data, e.g.: https://www.sciencedirect.com/science/article/pii/S0735109723075654?via%3Dihub
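To make the log2 point concrete, here is a minimal sketch in plain Python. It assumes NPX behaves as a log2 quantity, so a difference in NPX (which is what a limma logFC on NPX values would be) converts to a linear fold change as 2 raised to that difference. It also shows per-protein z-scoring of the kind one might apply before PCA. The function names and toy values are hypothetical and not part of any UK Biobank or Olink tooling.

```python
import math
import statistics


def npx_diff_to_fold_change(delta_npx):
    """Convert a difference in NPX (assumed log2 scale) to a linear fold change."""
    return 2.0 ** delta_npx


def zscore_protein(values):
    """Standardise one protein's NPX values to mean 0 and SD 1.

    Missing values are represented as NaN and are left as NaN;
    the mean and sample SD are computed from the observed values only.
    """
    observed = [v for v in values if not math.isnan(v)]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD; needs at least two observed values
    if sd == 0:
        # A constant protein carries no variance to scale by.
        return [float("nan")] * len(values)
    return [(v - mean) / sd if not math.isnan(v) else float("nan") for v in values]


# Toy example (made-up numbers): one protein across three participants.
complete = zscore_protein([1.0, 2.0, 3.0])
with_missing = zscore_protein([5.0, float("nan"), 7.0])

# A 0.5 NPX difference is a smaller-looking number than the fold change it implies.
fold_change = npx_diff_to_fold_change(0.5)
```

Under that assumption, a logFC of 0.5 corresponds to about a 1.41-fold difference, which may be part of why the limma values look small. Note also that z-scoring before limma changes the interpretation of logFC from log2 units to standard-deviation units, so scaling is probably more relevant for PCA than for the differential expression step.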
I hope this is helpful!
Mike