Where can I save processed images from the UKB bulk data so that I can later use them for training a network? Is there an example for such a task? I am looking specifically at liver MRI images.
Which file format are your processed images stored in? Did you extract image-derived phenotypes, so that each image is described by a series of numbers? Or does your pipeline output an image (e.g. a segmentation mask)? Which image processing task do you apply?
Also, which network/method do you plan to train on the processed data?
Thanks, Ondrej, for the quick response. I want to do some post-processing on the liver images (normalize, crop, resize, etc.) and save them in .png format to train a deep learning model. I am just confused about how to do that on the DNAnexus platform. I want to train Barlow Twins on the IDEAL and ShMOLLI protocols, and I will also need the segmented images in the training process. Can I just work as I do on my local system, i.e. save the images to DNAnexus storage and load them during training?
Here are my ideas:
A) You can create a folder structure in the DNAnexus UKB RAP project and save the images there (dx upload your files back to the parent project once the images are processed). This could be sufficient when working with a small number of files, which I assume is not the case here, since you aim to train your models on many training samples (thousands?), maybe even more if you do image augmentation and create 1:N images. If that is the case, I would avoid this option: uploading many files to the platform might also affect performance, so I would not recommend it for big data.
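For a small set of files, option A could look roughly like the sketch below. It only wraps the `dx upload` CLI call from Python; the local directory, the remote folder name and both helper functions are hypothetical, and it assumes the dx-toolkit is installed and you are already logged in with the RAP project selected. By default it just prints the command instead of running it.

```python
import shlex
import subprocess
from pathlib import Path


def build_dx_upload_command(local_dir, remote_folder):
    """Return the `dx upload` argument list for a recursive folder upload."""
    return [
        "dx", "upload",
        "--recursive",            # upload the whole local directory
        "--parents",              # create the remote folder if it is missing
        "--path", remote_folder,  # destination folder in the selected project
        str(Path(local_dir)),
    ]


def upload_processed_images(local_dir, remote_folder, dry_run=True):
    """Print the command (dry run) or execute it; return the argument list."""
    cmd = build_dx_upload_command(local_dir, remote_folder)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Assumption: `dx` is on PATH and a project is already selected.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    upload_processed_images("processed/liver_png", "/processed/liver_png/")
```

Again, this is fine for tens or hundreds of files; for thousands of small PNGs I would rather look at options B and C.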
B) If you do all the processing as one pipeline running on one worker, you could zip the output and store it on the platform, then later download it to the worker and filter the relevant images for training. Alternatively, you can choose a level of zip granularity, e.g. one archive of images for IDEAL, one for the ShMOLLI protocol and one for the segmentations, to decrease the total number of files uploaded to the platform. At the same time, let's aim for a solution that is convenient for your use case and skills.
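A minimal sketch of option B, using only the Python standard library. The function names, the directory layout (one folder of PNGs per protocol) and the file-name filter are my assumptions, not something prescribed by RAP; uploading and downloading the archive itself is left out.

```python
import zipfile
from pathlib import Path


def zip_image_group(image_dir, archive_path, pattern="*.png"):
    """Pack all files matching `pattern` in `image_dir` into one zip archive."""
    files = sorted(Path(image_dir).glob(pattern))
    # PNG is already compressed, so ZIP_STORED avoids spending CPU for no gain.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
    return len(files)


def extract_selected(archive_path, out_dir, keep):
    """Extract only the members for which `keep(name)` is true."""
    with zipfile.ZipFile(archive_path) as zf:
        selected = [name for name in zf.namelist() if keep(name)]
        zf.extractall(out_dir, members=selected)
    return selected
```

On the processing worker you would call `zip_image_group` once per protocol (IDEAL, ShMOLLI, segmentations) and upload the three archives; on the training worker, `extract_selected` lets you unpack only the images you need, for example those whose file name starts with a participant ID from your training split.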
C) Data serialization. I am assuming you work with Python. I would give e.g. HDF5 (Hierarchical Data Format) a try. I found this article very useful for reading more about image databases and serialization: https://realpython.com/storing-images-in-python/#storing-with-hdf5 To be honest, I do not have hands-on experience with HDF5 on RAP, but I consider it very promising for image storage.
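To make option C more concrete, here is a rough sketch with `h5py` and NumPy. It is untested on RAP; the group name, the dataset names and the assumption that all images in a group were resized to the same shape are mine.

```python
import h5py
import numpy as np


def write_images_hdf5(path, images, ids, group="ideal"):
    """Store a stack of equally sized images and their IDs under one group."""
    images = np.asarray(images)
    # Mode "a" keeps existing groups, so each protocol can be added in turn.
    with h5py.File(path, "a") as f:
        grp = f.require_group(group)
        grp.create_dataset(
            "images",
            data=images,
            chunks=(1,) + images.shape[1:],  # one image per chunk
            compression="gzip",
        )
        # Fixed-length byte strings; assumes ASCII identifiers.
        grp.create_dataset("ids", data=np.asarray(ids, dtype="S"))


def read_image(path, index, group="ideal"):
    """Load a single image and its ID without reading the whole stack."""
    with h5py.File(path, "r") as f:
        grp = f[group]
        return grp["images"][index], grp["ids"][index].decode()
```

With one image per chunk, a training data loader can read single samples by index, and you end up with one file on the platform (with e.g. `ideal`, `shmolli` and `segmentation` groups) instead of thousands of PNGs.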
Thanks a lot, @Ondrej Klempir, for the detailed reply. I will try out your suggestions.