UKB Proteomics imputation
Hi all - I think a similar question about missing data has been posted before, but I was wondering what suggestions people have for handling values that didn't pass QC (and are therefore ‘NA’ in the UKB proteomics release). Are there any recommendations for imputing these values (e.g. what thresholds to use)?
Thank you!
Comments
3 comments
Hi everyone,
I'm dealing with a similar situation concerning missing data in proteomics research. I've considered imputation strategies like using mean or median values, but I'm not entirely convinced these are the best solutions. Do you think an approach like bootstrapping would be effective?
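To make the median option concrete, here is a minimal sketch of what it could look like. It assumes the data is held as a plain dict mapping each protein to a list of per-participant values, with None standing in for 'NA'. The function name, the toy data and the 20% missingness cut-off are all hypothetical and purely illustrative, not a UKB recommendation.

```python
from statistics import median


def impute_median(data, max_missing=0.2):
    """Drop proteins whose NA fraction exceeds max_missing, then fill the
    remaining NAs with that protein's median over the observed values.

    The 0.2 default is an arbitrary illustrative threshold.
    """
    imputed = {}
    for protein, values in data.items():
        observed = [v for v in values if v is not None]
        if not values or not observed:
            continue  # nothing to impute from
        missing_fraction = (len(values) - len(observed)) / len(values)
        if missing_fraction > max_missing:
            continue  # too sparse to impute sensibly under this threshold
        fill = median(observed)
        imputed[protein] = [fill if v is None else v for v in values]
    return imputed


# Hypothetical toy data: protein -> values per participant (None = NA)
toy = {
    "PROT_A": [1.0, None, 3.0, 2.0, 4.0],     # 20% missing, kept and imputed
    "PROT_B": [None, None, None, 0.5, 0.7],   # 60% missing, dropped
}
result = impute_median(toy, max_missing=0.2)
```

Filling with a single value like this shrinks the variance of each protein, which is part of why I'm not convinced it is the best solution.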
Also, how do you handle 'NA' data during your statistical tests? This is particularly challenging since proteins with undetermined expression values might not be reliably linked to the disease condition, which complicates the analyses.
I'm interested in learning more about how you manage these challenges and whether you've found robust methods for addressing biases introduced by missing data. Your insights and suggestions would be greatly appreciated.
Thank you!
Hi Nihakira - seems like we are facing the same problems! I have come up with some solutions to the missing values, though not sure how ‘robust’ these are… I'm happy to have a chat about this if you want and we can share some ideas - I haven't considered the bootstrapping approach but it sounds interesting! Feel free to email me silvia.shen[at]ed.ac.uk :)
Hi Silvia,
I've already reached out via email to discuss this matter further. Please check your inbox (including the spam/junk folder) to ensure you received it. Looking forward to your response. :)