Efficient Processing of Bulk Imaging Data

Murad Omarov

Dear colleagues,

I have been trying to access the carotid ultrasound imaging data using the example code provided here: https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/JupyterNotebook_Python/A109_Find-participant-bulk-files.ipynb

I only need to unzip the files, perform some simple image processing, and save the processed images on the platform. However, the provided code takes an extremely long time to process the files. Is there a more efficient way to handle this data? For example, would it be possible to execute the jobs in parallel?

Thank you in advance.

Best wishes,
Murad

Comments


  • Comment author
    Lea K., Data Analyst, UKB Community team

    Hi Murad,

    Thank you for reaching out. You might find the xargs command useful: it can submit one job per file, so that the jobs run in parallel.

    For example:

    cat <list_of_files>.txt | xargs -I {} dx run <app> -iinput_file={} --destination="${DX_PROJECT_CONTEXT_ID}:/<path/to/folder>"
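
For anyone who prefers to drive the submissions from a Python notebook, the one-liner above could be sketched roughly as follows. This is only an illustration under several assumptions: the function names `build_dx_run_command` and `submit_all` are hypothetical, the input name `input_file` is copied from the one-liner and depends on the app you run, and the `-y` (skip the confirmation prompt) and `--brief` (print only the job ID) flags are additions that were not part of the original command. The thread pool only speeds up the submission step; once submitted, each job runs independently on the platform.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_dx_run_command(app, input_file, destination):
    """Build one `dx run` call mirroring the xargs one-liner (hypothetical helper)."""
    return [
        "dx", "run", app,
        f"-iinput_file={input_file}",    # input name is an assumption; check your app
        f"--destination={destination}",  # e.g. "<project-id>:/<path/to/folder>"
        "-y",                            # assumed addition: skip the confirmation prompt
        "--brief",                       # assumed addition: print only the job ID
    ]


def submit_all(list_path, app, destination, max_workers=4, dry_run=True):
    """Submit one job per non-empty line of `list_path`.

    With dry_run=True (the default) nothing is submitted; the commands are
    returned as strings so they can be inspected first.
    """
    with open(list_path) as handle:
        input_files = [line.strip() for line in handle if line.strip()]

    commands = [build_dx_run_command(app, f, destination) for f in input_files]
    if dry_run:
        return [" ".join(cmd) for cmd in commands]

    def submit(cmd):
        # Raises CalledProcessError if dx reports a failure for this file.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()  # the job ID, because of --brief

    # Several submissions in flight at once; results keep the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, commands))
```

Calling `submit_all("<list_of_files>.txt", "<app>", "<project-id>:/<path/to/folder>")` first with the default `dry_run=True` lets you check the generated commands; passing `dry_run=False` then performs the actual submissions and returns the job IDs.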

    Please keep checking the UKB GitHub as we are planning to publish a guide on running jobs for bulk files.

    Hope this helps. Thank you for using the Community forum.
     

  • Comment author
    Murad Omarov

    Thank you very much! 

    Am I right in understanding that I can supply my own Python script in place of <app>?
