Importing functions from one Jupyter notebook (or Python script) to another within a DNAnexus JupyterLab session
Hi! I have a number of Python functions that I've written and wish to import into multiple Jupyter notebooks to avoid copy-and-pasting those function definitions repeatedly. Normally, I would just save these functions in a Python script and import them at the top of my notebook. However, this has proven somewhat trickier on DNAnexus and I was looking for advice on how to proceed!
Within a JupyterLab session, you generally need to specify the full mounted file path of any file you wish to access (e.g. pd.read_csv("/mnt/project/my_dir/my_file.csv") as opposed to pd.read_csv("my_dir/my_file.csv")). I'm having trouble getting this file path convention to work with import statements. This is especially an issue for DNAnexus notebooks (as opposed to "local" notebooks that are created within the JupyterLab execution environment). Has anyone found a workaround for this? One alternative is running "dx download my_function_notebook" at the start of each notebook, but this is not optimal, as I'd then have to re-upload and re-download that notebook each time I make any changes to it.
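For a plain Python script, one possible workaround for the path issue described above is to put the mounted directory on sys.path before importing. This is only a sketch: the helper name import_from_dir, the module name my_functions and the directory /mnt/project/my_dir are hypothetical examples, and it assumes the project really is mounted under /mnt/project in the session.

```python
import importlib
import sys
from pathlib import Path


def import_from_dir(module_name, directory):
    """Import `module_name` from `directory` by adding the directory to sys.path.

    `directory` is assumed to contain a file named `<module_name>.py`.
    """
    directory = str(Path(directory))
    if directory not in sys.path:
        # Put the directory first so it takes priority over other locations.
        sys.path.insert(0, directory)
    # Make sure the import system notices files that appeared after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Hypothetical usage at the top of a notebook (names are examples only):
# my_functions = import_from_dir("my_functions", "/mnt/project/my_dir")
# my_functions.some_function()
```

The same helper works for a script fetched into the local working directory with dx download; only the directory argument changes.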
On a similar note, is there a way to edit Python scripts either within a JupyterLab session or elsewhere on DNAnexus, without having to download and re-upload them with each set of changes? (I'm currently storing my functions in a Jupyter notebook file instead of a script to make them easier to access, but having them in a script may remove one layer of complication as well.)
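If the functions stay in a notebook file rather than a script, a rough way to reuse them is to read the .ipynb file (which is JSON) and execute its code cells into a module object. The sketch below uses only the standard library; load_notebook_as_module is a hypothetical helper, not a DNAnexus or Jupyter API. It runs every code cell, so it assumes the notebook contains only definitions, and its handling of IPython magics is crude (lines starting with % or ! are simply dropped).

```python
import json
import types
from pathlib import Path


def load_notebook_as_module(notebook_path, module_name="notebook_functions"):
    """Execute the code cells of a notebook and return them as a module object."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    module = types.ModuleType(module_name)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Crude filter: drop IPython magics and shell escapes, which are not Python.
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        exec("\n".join(lines), module.__dict__)
    return module


# Hypothetical usage (the path is an example only):
# funcs = load_notebook_as_module("/mnt/project/my_dir/my_function_notebook.ipynb")
```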
Comments
Lara says: Normally, I would just save these functions in a Python script and import them at the top of my notebook.
Ondrej replies: I think this is what the DNAnexus JupyterLab snapshot feature offers. You only need to install and set up your Python package once, save a snapshot, and then reuse that snapshot as the base environment for all your future notebooks. I would give it a try!
[https://documentation.dnanexus.com/user/jupyter-notebooks#environment-snapshots]
Lara says: One alternative is running "dx download my_function_notebook" at the start of each notebook, but this is not optimal, as then I'd have to re-upload and download that notebook each time I make any changes to it.
Ondrej replies: What you describe as not optimal is actually my primary way of working with JupyterLab notebooks on RAP. I rarely use the notebooks visible from the parent project (accessible from the DNAnexus tab in the left sidebar, mounted via dxfuse). Since DNAnexus files are immutable, whenever you save the notebook, the current version should be uploaded to the project and replaces the previous version, i.e. the file of the same name.
[https://documentation.dnanexus.com/user/jupyter-notebooks/quickstart#3.-edit-and-save-the-notebook-in-the-project]
Lara says: On a similar note, is there a way to edit Python scripts either within a Jupyter Lab session or elsewhere on DNAnexus, without having to download and reupload them with each set of changes?
Ondrej replies:
As per doc,
https://documentation.dnanexus.com/user/jupyter-notebooks#accessing-data
https://documentation.dnanexus.com/user/jupyter-notebooks#uploading-data
I believe the download/upload process is the recommended way to interact, in a non-read-only fashion, with permanent versus temporary storage in the cloud.
Maybe some other folks will know better.
Hmm, when I've accessed notebooks from the DNAnexus tab, I've found this to be the case. But when I download and upload, I just end up with a new copy of the file, and the old one is left behind instead of being replaced. (This is why I tend to work with the mounted notebooks instead: dealing with the buildup of notebook versions from repeated download/upload became a bit of a headache.) Not sure why we seem to have opposite experiences here, though.
Thanks for all this helpful info! This is something I definitely will try in the future. For now, my functions are still in flux, which is why I was hoping for a way to keep them easily editable while also importing them in dynamic notebooks.
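While the functions are still in flux, a running notebook kernel will keep using the version of a module it imported first. A minimal sketch of re-reading an edited module without restarting the kernel is shown here; fresh_import is a hypothetical helper name, and it assumes the script sits in a directory that is already on sys.path (for example a local, writable copy in the session).

```python
import importlib
import sys


def fresh_import(module_name):
    """Import a module, or reload it if this kernel has already imported it."""
    importlib.invalidate_caches()
    if module_name in sys.modules:
        # Re-execute the module source so edited function bodies take effect.
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


# Hypothetical usage after each edit of my_functions.py:
# my_functions = fresh_import("my_functions")
```

Names bound earlier with "from my_functions import some_function" still point at the old objects after a reload, so calling functions through the module object is the safer pattern.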
Thanks! Similar to my question above, with respect to notebooks, is there a way for the upload to replace the previous version of the file of the same name, instead of building up numerous copies?
re: Is there a way to replace previous versions of notebooks instead of building numerous copies?
Let's say you saved your Jupyter notebook in your project on the platform (permanent storage). Now you want to edit this notebook and save the changes. When you start a JupyterLab session, you can navigate to your project via the "DNAnexus" tab in the left-hand sidebar (see documentation).
Then you can open your saved notebook and edit it. When you click save, your new edits are saved to the existing notebook in your project. You won't need to re-upload it or end up with multiple copies of the notebook.
https://documentation.dnanexus.com/user/jupyter-notebooks#the-project-on-the-dnanexus-platform
You can try combining dx rm and dx upload into a one-liner, alias, or shell script (ideally set up as part of the JupyterLab snapshot), i.e. first delete the file(s) with the given name and then upload. There does not seem to be a dx upload -f option that overwrites a file if it already exists, but you should be able to combine the two commands to get the desired logic.
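The remove-then-upload logic described above could be sketched from inside a notebook roughly as follows. The helper names are hypothetical; the dx rm flags -a and -f are the ones listed in the help output in this thread, and the --path option of dx upload is assumed to behave as in current dx-toolkit releases, so it is worth checking dx upload -h first.

```python
import subprocess


def build_replace_commands(local_path, remote_path):
    """Return the two dx commands: remove the old copies, then upload the new file."""
    return [
        # -a applies to all objects with that name, -f forces removal.
        ["dx", "rm", "-a", "-f", remote_path],
        ["dx", "upload", local_path, "--path", remote_path],
    ]


def replace_upload(local_path, remote_path, run=subprocess.run):
    """Replace `remote_path` in the project with the contents of `local_path`."""
    rm_cmd, upload_cmd = build_replace_commands(local_path, remote_path)
    # The remove step may fail if nothing exists yet, so its exit code is ignored.
    run(rm_cmd, check=False)
    # The upload must succeed; a non-zero exit code raises CalledProcessError.
    run(upload_cmd, check=True)


# Hypothetical usage (paths are examples only):
# replace_upload("my_notebook.ipynb", "/my_dir/my_notebook.ipynb")
```

Because the old file is deleted before the new one is uploaded, a failed upload leaves no copy in the project, which is one reason to keep a separate backup.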
From time to time, I would also dx upload the same notebook to a separate backup directory (e.g. keeping the creation timestamp in the name) to have a full archive of the notebooks.
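A timestamped backup destination like the one mentioned above can be built with a small helper. This is a sketch under assumptions: the function name backup_destination and the folder /notebook_backups are hypothetical, and it stamps the name with the current time rather than the file's original creation time.

```python
from datetime import datetime
from pathlib import PurePosixPath


def backup_destination(remote_path, backup_dir="/notebook_backups", now=None):
    """Return a backup path such as /notebook_backups/<name>_<timestamp><suffix>."""
    now = now or datetime.now()
    path = PurePosixPath(remote_path)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Keep the original extension so the backup still opens as a notebook.
    return str(PurePosixPath(backup_dir) / f"{path.stem}_{stamp}{path.suffix}")


# Hypothetical usage: pass the result as the destination of a second dx upload.
# destination = backup_destination("/my_dir/my_notebook.ipynb")
```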
----------------------------
dx rm -h
usage: dx rm [-h] [--env-help] [-a] [-r] [-f] path [path ...]

Remove data objects and folders.

positional arguments:
  path             Paths to remove

optional arguments:
  -h, --help       show this help message and exit
  --env-help       Display help message for overriding environment variables
  -a, --all        Apply to all results with the same name without prompting
  -r, --recursive  Recurse into a directory
  -f, --force      Force removal of files
Sorry for not being clear. The sentence "Since DNAnexus files are immutable, whenever you save the notebook, the current version should be uploaded to the project and replaces the previous version, i.e. the file of the same name." was taken from the documentation, and it actually refers to the DNAnexus tab (mounted by dxfuse). This is not the case for the approach I prefer, i.e. download and upload. See below what I would do.