Trying to export the entire participant entity using Table Exporter
Hello,
I am trying to export the entire participant dataset using Table Exporter on DNAnexus, within a single project that I plan to share with most of the collaborators on my application. I am using high priority and the largest instance type available at that priority. Is there an estimate of how long the export will take? I chose the UKB format and the REPLACE option because I don't want to end up with string labels where the variables hold actual numerical values.
Thanks!
May
Comments
We don't recommend exporting the entire participant entity as a csv since it is so large. The csv that table exporter creates in your project will incur a storage cost - please see the rate card for cost information. Instead, we recommend selecting the fields you are interested in, either to export as a smaller csv into your project using Table Exporter or by querying using Spark in Jupyter Lab.
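As a rough illustration of the "select only the fields you need" approach, the sketch below builds a short list of column names that could be saved to a text file, one name per line, and supplied to Table Exporter. It assumes the RAP column naming pattern p<field>_i<instance>_a<array>, and the field IDs are placeholders chosen for illustration; the helper names field_name and field_list_text are hypothetical, and the real column names should be checked against your own dataset dictionary.

```python
from typing import Iterable, Optional


def field_name(field_id: int,
               instance: Optional[int] = None,
               array: Optional[int] = None) -> str:
    """Build a column name, assuming the p<field>_i<instance>_a<array> pattern."""
    name = f"p{field_id}"
    if instance is not None:
        name += f"_i{instance}"
    if array is not None:
        name += f"_a{array}"
    return name


def field_list_text(names: Iterable[str]) -> str:
    """Return one field name per line, with the participant ID (eid) first."""
    ordered = ["eid"]
    for n in names:
        if n not in ordered:  # skip duplicates, keep first occurrence
            ordered.append(n)
    return "\n".join(ordered) + "\n"


# Illustrative field IDs only; replace with the fields you actually need.
example = field_list_text([
    field_name(31),
    field_name(21001, instance=0),
    field_name(93, instance=0, array=1),
])
print(example, end="")
```

Writing the returned text to a file and uploading it to the project keeps the export limited to a handful of columns, which also keeps the resulting csv, and its storage cost, small.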
Thank you very much! Is the full list of field labels available on the UKB showcase as well or is there another source that I am not aware of? I would like to have the full list of field labels as a dataset that I can use to label the variables in Stata. Thanks!
Hi - yes, you can get a dataset dictionary using the dx extract_dataset command. Open a JupyterLab instance (or RStudio or ttyd) and enter the command in the terminal.
You can find your dataset ID by looking at the metadata in the Manage section of your project. The ID should start with ‘record-’. Here is an example (dataset ID highlighted in yellow): [screenshot]
There is also an example of finding the dataset ID programmatically in Example Notebook 105, as well as an example of getting the dataset dictionary.
This will result in 3 files in the workspace with information on entities, fields and encodings, including the names of all the columns in the dataset. You can upload these to your project using dx upload.
There is also a folder called Showcase metadata, which contains the relevant Showcase schema (the files on RAP will be up to date with your project, so it is recommended to use those instead of the schema downloaded from the website).
Hope this helps!
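For the Stata labelling question, one possible approach is to turn the field dictionary file into a list of label variable commands. The sketch below is only an illustration: it assumes the dictionary is a csv with columns called entity, name and title, which should be checked against the file actually produced, and the function name stata_label_commands and the sample rows are made up for the example. Titles are cut to 80 characters, the maximum length of a Stata variable label.

```python
import csv
import io


def stata_label_commands(dictionary_csv, entity="participant"):
    """Yield Stata 'label variable' commands from a data dictionary csv.

    Assumes columns named 'entity', 'name' and 'title' (an assumption;
    check the header of your own dictionary file).
    """
    commands = []
    for row in csv.DictReader(dictionary_csv):
        if row.get("entity") != entity:
            continue  # keep only fields from the requested entity
        # Stata labels are double-quoted and limited to 80 characters.
        title = (row.get("title") or "").replace('"', "'")[:80]
        if title:
            commands.append(f'label variable {row["name"]} "{title}"')
    return commands


# Made-up sample rows standing in for the real dictionary file.
sample = io.StringIO(
    "entity,name,title\n"
    "participant,p31,Sex\n"
    'participant,p21001_i0,"Body mass index (BMI) | Instance 0"\n'
    "other_entity,eid,Participant ID\n"
)
commands = stata_label_commands(sample)
print("\n".join(commands))
```

With a real file, the StringIO sample would be replaced by an open file handle, and the resulting lines could be saved as a .do file and run in Stata after importing the exported csv.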
Hi, when I run the command below:
dx extract_dataset project-xxxx:record-xxxx -ddd --delimiter ","
I get the following error message. Can you help?
dxpy.exceptions.PermissionDenied: The file download is not allowed. The file is in a restricted project., code 401. Request Time=1721404869.0652313, Request ID=1721404869127-703203
Please open a JupyterLab terminal within your project on the RAP to enter dx commands.
Please try
dx extract_dataset record-xxxx -ddd --delimiter ","
but replace the xxxx with the id of your dataset record (see the images above).
Can this command also be used within a virtual machine set up by Swiss Army Knife? So that the output would end up in the project space without being downloaded?
I.e. I would run Swiss Army Knife with a script containing the above command?