Efficient Processing of Bulk Imaging Data

10 March 2025 20:28
2 comments

Dear colleagues,

I have been trying to access carotid ultrasound imaging data using the example code provided here: https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/JupyterNotebook_Python/A109_Find-participant-bulk-files.ipynb

I only need to unzip the files, perform some simple image processing, and save the processed images on the platform. However, the provided code takes an extremely long time to process the files. Is there a more efficient way to handle this data? For example, would it be possible to execute the jobs in parallel?

Thank you in advance.

Best wishes,
Murad

Comments


Lea K. Data Analyst UKB Community team
- 04 April 2025 14:10
Hi Murad,
Thank you for reaching out. You might find the xargs command useful for launching one job per file, so that the jobs run in parallel on the platform.
For example:
```
# One `dx run` submission per line of the file list; -y skips the confirmation prompt
cat <list_of_files>.txt | xargs -I {} dx run <app> -iinput_file={} --destination="$DX_PROJECT_CONTEXT_ID:/<path/to/folder>" -y
```
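The same one-job-per-file pattern can also be driven from Python, which may be more convenient inside a notebook. The following is a minimal sketch, not an official recipe: the helper names `build_dx_run_command` and `submit_all` are hypothetical, the input name `input_file` is copied from the command above and depends on the app you run, and the `runner` parameter exists only so the logic can be exercised without the `dx` CLI installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_dx_run_command(app, file_path, destination):
    """Return the argv list for one `dx run` submission (hypothetical helper)."""
    return [
        "dx", "run", app,
        f"-iinput_file={file_path}",     # input name taken from the forum example
        f"--destination={destination}",  # e.g. "project-xxxx:/path/to/folder"
        "-y",       # skip the interactive confirmation prompt
        "--brief",  # print only the ID of the launched job
    ]


def submit_all(app, file_paths, destination, max_workers=4, runner=subprocess.run):
    """Submit one job per file, keeping up to `max_workers` submissions in flight.

    Returns a list of (file_path, job_id) pairs in the same order as the input.
    """
    def submit(path):
        cmd = build_dx_run_command(app, path, destination)
        result = runner(cmd, capture_output=True, text=True, check=True)
        return path, result.stdout.strip()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, file_paths))
```

To use it, read the file list into a Python list (one path per line) and call `submit_all("<app>", paths, destination)`, where `destination` is built from the `DX_PROJECT_CONTEXT_ID` environment variable. Threads are used here only to overlap the submission calls; the jobs themselves run on the platform.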
Please keep checking the UKB GitHub as we are planning to publish a guide on running jobs for bulk files.
Hope this helps. Thank you for using the Community forum.
Murad Omarov
- 08 April 2025 12:51
Thank you very much!
Am I right to understand that I can provide my Python script instead of <app>?
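For context on this question: `dx run` expects an app, applet, or workflow, so a bare Python script cannot simply be substituted for <app>. One route that is often used, and that is an assumption here rather than something confirmed in this thread, is a general-purpose app such as Swiss Army Knife, which takes input files through `-iin` and a shell command through `-icmd`. The sketch below only builds such a command; the helper name `build_script_job_command` is hypothetical, and the input names should be checked with `dx run swiss-army-knife -h` before relying on them.

```python
import shlex


def build_script_job_command(script_path, input_file, destination,
                             app="swiss-army-knife"):
    """Return the argv list for running a Python script on one file via an app.

    Assumes the app accepts files via `-iin` and a shell command via `-icmd`.
    """
    script_name = script_path.rsplit("/", 1)[-1]
    input_name = input_file.rsplit("/", 1)[-1]
    # Files passed with -iin are assumed to land in the job's working
    # directory, so the command refers to them by file name only.
    inner_cmd = f"python3 {shlex.quote(script_name)} {shlex.quote(input_name)}"
    return [
        "dx", "run", app,
        f"-iin={script_path}",
        f"-iin={input_file}",
        f"-icmd={inner_cmd}",
        f"--destination={destination}",
        "-y",  # skip the interactive confirmation prompt
    ]
```

Printing `shlex.join(build_script_job_command(...))` gives a command line that can be inspected before anything is submitted.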
