Uploading many small files is very slow
Hi all, I'm trying to extract zipped images and re-upload them to the platform. Although the files are small (each slice is around 250 kB), the upload times are prohibitively slow. What is your approach?
Comments
Hi,
I worked on something similar a couple of months ago, and I would not recommend unzipping and uploading that many files back to the platform. It may cause a high load, and navigating the files afterwards is neither quick nor convenient. If you decide to use zipped imaging data on RAP, I recommend the following post, which shows how to process zipped bulk imaging files: https://community.dnanexus.com/s/question/0D5t000004EtXLYCA3/is-there-a-way-to-extract-the-bulk-imaging-data-using-the-spark-jupyter-notebook
And here is another one about storing imaging files on RAP: https://community.dnanexus.com/s/question/0D5t000004DClaWCAT/where-can-i-save-processed-images-from-the-ukb-bulk-data-and-later-use-them-for-training-the-network-do-we-have-any-example-for-such-task-looking-specifically-in-liver-mri-images
And I am really interested to hear more about your use case.
I'm using unsupervised learning to train representation models on OCT b-slices. This requires the models to access individual slices from many patients multiple (potentially hundreds of) times, so extracting the zip file every time slows the process down significantly. Perhaps I'll try the approach from the linked post. Thanks, @Ondrej Klempir!
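For anyone reading along, here is a minimal sketch of one way to avoid re-extracting a whole archive on every access: Python's standard `zipfile` module can read a single member directly. This is only an illustration, not the method from the linked post. The helper names `list_slices` and `read_slice` are hypothetical, and the archive is assumed to be available on local disk already.

```python
import zipfile


def list_slices(zip_path):
    """Return the names of all file (non-directory) members in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def read_slice(zip_path, member_name):
    """Read one member's raw bytes without extracting the rest of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member_name)
```

The returned bytes can be passed to whatever image decoder the training pipeline uses. The archive is still reopened and the member decompressed on each call, so for hundreds of passes over the same data, unpacking once may still be faster.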
Perfect! It would be great to hear about your experience then!
@Ondrej Klempir, since the models (ideally) only need to be trained once, I have opted to unpack my training subset on a sufficiently large compute node and run the experiment there. Downloading and extracting take around 3 h, which is negligible in the bigger picture of the experiment.
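A rough sketch of the "unpack once on the compute node" step, assuming the zip archives have already been downloaded to local disk. The function names `unpack_one` and `unpack_subset`, the per-archive output layout, and the default worker count are illustrative assumptions, not details from the original setup.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unpack_one(zip_path, out_root):
    """Extract one archive into its own sub-directory named after the archive."""
    zip_path = Path(zip_path)
    target = Path(out_root) / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def unpack_subset(zip_paths, out_root, workers=8):
    """Extract several archives concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: unpack_one(p, out_root), zip_paths))
```

Giving each archive its own directory keeps slices from different patients apart even when member names repeat across archives.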