Uploading many small files very slow

Hi all, I'm trying to extract zipped images and re-upload them to the platform. Although the files are small (each slice is around 250 kB), the upload times are prohibitively slow. What is your approach?

Former User of DNAx Community_64

21 March 2023 00:00
5 comments

Comments


Ondrej Klempir DNAnexus Team
- 21 March 2023 10:12
Hi,

I worked on something similar a couple of months ago, and I would not recommend unzipping the archives and uploading that many files back to the platform. It may cause a high load, and navigating the files afterwards is neither quick nor convenient. If you decide to keep the imaging data zipped on RAP, I recommend the following post, which shows how to process zipped bulk imaging files: https://community.dnanexus.com/s/question/0D5t000004EtXLYCA3/is-there-a-way-to-extract-the-bulk-imaging-data-using-the-spark-jupyter-notebook

And here is another one about storing imaging files on RAP: https://community.dnanexus.com/s/question/0D5t000004DClaWCAT/where-can-i-save-processed-images-from-the-ukb-bulk-data-and-later-use-them-for-training-the-network-do-we-have-any-example-for-such-task-looking-specifically-in-liver-mri-images

Ondrej Klempir DNAnexus Team
- 21 March 2023 10:13
And I am really interested to hear more about your use case.

Former User of DNAx Community_64
- 21 March 2023 11:01
I'm using unsupervised learning to train representation models on OCT b-slices. This requires the models to access individual slices of many patients multiple (potentially hundreds of) times, so extracting the zip file every time slows this process down significantly. Perhaps I'll try the approach mentioned in the linked post, thanks @Ondrej Klempir!
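
For illustration only, here is a minimal sketch of reading a single slice straight out of a zip archive with Python's standard `zipfile` module, so the whole archive does not have to be unpacked on every access. This is not necessarily what the linked post does; the function names `list_slices` and `read_slice` and the `.png` suffix are assumptions made for the example.

```python
import zipfile


def list_slices(zip_path, suffix=".png"):
    """Return the names of archive members ending in `suffix` (assumed image extension)."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(suffix)]


def read_slice(zip_path, member_name):
    """Return the raw bytes of one member without extracting the rest of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member_name)
```

The returned bytes can then be handed to whatever image decoder the training pipeline already uses. Each call still reopens the archive and decompresses the member, so for hundreds of passes over the same data a one-time extraction may still be faster.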

Ondrej Klempir DNAnexus Team
- 27 March 2023 13:22
Perfect! It would be great to hear about your experience then!

Former User of DNAx Community_64
- 29 March 2023 07:57
@Ondrej Klempir, as the models (ideally) only need to be trained once, I have opted to unpack my training subset on a sufficiently large compute node and run the experiment there. Downloading and extracting take around 3 h, which is negligible in the bigger picture of the experiment.
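
A rough sketch of the one-time unpacking step, assuming the zip archives have already been downloaded to a local directory on the compute node. The function name `extract_archives` and the one-subfolder-per-archive layout are hypothetical choices for the example, not the exact procedure used here.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, out_dir):
    """Extract every .zip in archive_dir into out_dir/<archive name>/.

    Returns the number of files extracted in this call.
    """
    archive_dir = Path(archive_dir)
    out_dir = Path(out_dir)
    extracted = 0
    for archive in sorted(archive_dir.glob("*.zip")):
        target = out_dir / archive.stem
        if target.exists():
            # Assumption: an existing folder means this archive was already unpacked.
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted += sum(1 for info in zf.infolist() if not info.is_dir())
    return extracted
```

Calling `extract_archives("downloads", "slices")` once before training leaves the individual slices on local disk, so repeated epochs read plain files instead of reopening archives. Note that the skip check does not detect a partially extracted folder from an interrupted run.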

