I'd ideally like to be able to back up results of downstream analyses as well as code/scripts stored on RAP.
On our local HPC, the whole filesystem is automatically backed up periodically, which we have found to be a useful feature (providing a safeguard against accidental file deletion and allowing rollback of errant changes).
I understand that the "original" UKB data will always be available, which is helpful. I'm mostly hoping to avoid a scenario in which we perform a series of analyses over the next 3 years and then one day someone accidentally deletes a directory and we have to recover a long pipeline.
5 comments
Could you please tell me and the Community more about what you understand by backup, what data you want to back up, and where you want to back it up?
Actually, the "original" UKB data will always be available to be dispensed: https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/creating-a-project
And the data is refreshed: https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/updating-dispensed-data
It is possible to periodically copy the files from your downstream analyses to backup projects. This should not be expensive, because copies (clones) do not duplicate the underlying data; each copy is a new DNAnexus metadata object (similar to a hardlink) that points at the same stored data. That overhead is fine as long as the number of files is relatively small.
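A periodic backup step along these lines could be sketched as below. This is a hedged illustration, not an official recipe: it assumes a separate backup project already exists, the project and folder names are hypothetical, and the commands are printed as a dry run (remove the `echo`s to actually execute them with the dx-toolkit).

```shell
# Hypothetical source folder and backup project; adjust to your own setup.
SRC="project-MyAnalysis:/outputs"
DEST="project-MyBackup:/snapshots/$(date +%Y-%m-%d)"

# `dx cp` clones file objects into another project without duplicating
# the underlying stored data, so repeated snapshots stay cheap.
# Printed as a dry run here; drop the `echo`s to run for real.
echo dx mkdir -p "$DEST"
echo dx cp "$SRC" "$DEST"
```

Running this on a schedule (e.g. monthly) gives dated snapshot folders you can restore from if a directory is accidentally deleted.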
And where do you keep your code as you work on long pipelines? I would do a "release" to git.
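Tagging a "release" in git can look like the following self-contained sketch. The repository here is a throwaway temp directory purely for illustration; in a real project you would tag your existing pipeline repo and push the tag to a remote with `git push origin --tags`.

```shell
# Throwaway demo repo (illustration only; use your real pipeline repo).
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "pipeline scripts"

# Tag the current state so this exact version of the code can be
# recovered later, even after the scripts change.
git tag -a v1.0-qc -m "snapshot before re-running QC"
git tag   # prints: v1.0-qc
```

Each tag then acts as a named, immutable reference point for a stage of the long pipeline.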
This is helpful to know; thanks.
Given that copies (clones) do not duplicate data, what would happen to the cloned file in the backup project if the original file was deleted?
Po-Ru
I did a quick testing experiment: