dx download not transferring .dataset file
I'm trying to move the main dataset from my DNAnexus project storage to a JupyterLab instance with "dx download", as shown in this tutorial video, so that I can interact with the data.
I was able to copy another file (.csv) to the instance, but not app103105_20250603231112.dataset, which I assume is the main dataset. When I run "dx download" on that file name, the terminal prints "Skipping non-file data object app103105_20250603231112.dataset (record-J0zkP8jJxgZ12q1ypp0y0886)" and then stops.
Has anyone seen and resolved this issue? Or is there a different approach for accessing the data via JupyterLab?
Comments
Hi Janet,
The dataset record is not a file, which is why dx download skips it. It points to the Parquet database that holds all the tabular data. To work with the tabular data, you first need to extract a copy of the sections you need into a file (or several files), and then use dx download to copy that file into the JupyterLab instance.
This article has more information: https://community.ukbiobank.ac.uk/hc/en-gb/articles/26224573928349-Working-with-Jupyter-Notebooks
This thread might be useful: https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/18637026979485-How-to-extract-selected-cohort-in-csv-format
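As a rough illustration of the "extract first" step, here is a minimal Python sketch that builds a `dx extract_dataset` command for the dataset record. It is one possible route, not necessarily the one the linked article describes. It assumes the dx-toolkit in your instance provides `dx extract_dataset` with the `--fields` and `-o` options; the helper names (`build_extract_command`, `run_extract`), the field names `participant.eid` and `participant.p31`, and the output file name are illustrative placeholders, so swap in the fields you actually need.

```python
import shlex
import subprocess


def build_extract_command(dataset, fields, output_csv):
    """Build the argument list for `dx extract_dataset` (hypothetical helper).

    dataset    -- name or ID of the dataset record, not a file
    fields     -- list of "entity.field" names to extract (illustrative)
    output_csv -- local path for the extracted CSV
    """
    if not fields:
        raise ValueError("at least one field is required")
    return [
        "dx", "extract_dataset", dataset,
        "--fields", ",".join(fields),  # comma-separated field list
        "-o", output_csv,
    ]


def run_extract(dataset, fields, output_csv):
    """Run the extraction; needs the dx CLI installed and logged in."""
    subprocess.run(build_extract_command(dataset, fields, output_csv), check=True)


# Show the command that would be run, without calling dx.
cmd = build_extract_command(
    "app103105_20250603231112.dataset",
    ["participant.eid", "participant.p31"],  # assumed example fields
    "cohort_subset.csv",                      # assumed output name
)
print(shlex.join(cmd))
```

Calling `run_extract(...)` with the same arguments from a notebook cell or the terminal should write the CSV into the current working directory of the instance, in which case no separate dx download is needed for that file. If the command is not available in your environment, the approach in the thread linked above (export to a file in project storage, then dx download) still applies.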
Thank you for using the forum.