Trying to export the entire participant entity using Table Exporter

Hello,

I am trying to export the entire participant dataset using Table Exporter on DNAnexus, within a single project that I plan to share with most of the collaborators on my application. I am running the job at high priority, on the largest instance type available at that priority. Is there a time estimate for the process to complete? I chose the UKB format and the REPLACE option because I don't want to end up with string labels where the variables have actual numerical values.

 

Thanks!

 

May

 

Comments

6 comments

  • Comment author
    Daisy V (UKB Community team, Data Analyst)

    We don't recommend exporting the entire participant entity as a CSV, since it is so large. The CSV that Table Exporter creates in your project will incur a storage cost; please see the rate card for cost information. Instead, we recommend selecting only the fields you are interested in, either by exporting them as a smaller CSV into your project with Table Exporter or by querying them with Spark in JupyterLab.
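As a rough illustration of selecting only the fields of interest, the Python sketch below filters a dataset dictionary down to the columns for a few Showcase field IDs. It assumes the dictionary has `entity` and `name` columns and that participant columns are named `p<field ID>` with optional `_i<instance>` and `_a<array>` suffixes. `select_field_names` and the sample rows are hypothetical, so check them against your own dataset.

```python
import csv
import io
import re

# Assumed column naming: p<field_id>, optionally followed by _i<instance> and/or _a<array>.
FIELD_PATTERN = re.compile(r"^p(\d+)(?:_i\d+)?(?:_a\d+)?$")


def select_field_names(dictionary_rows, field_ids, entity="participant"):
    """Return the column names in `entity` that belong to the given field IDs."""
    wanted = {str(f) for f in field_ids}
    names = ["eid"]  # always keep the participant ID column
    for row in dictionary_rows:
        if row["entity"] != entity:
            continue
        match = FIELD_PATTERN.match(row["name"])
        if match and match.group(1) in wanted:
            names.append(row["name"])
    return names


# Illustrative rows only; a real data dictionary has many more columns and rows.
SAMPLE = """entity,name,title
participant,eid,Participant ID
participant,p31,Sex
participant,p21001_i0,Body mass index (BMI) | Instance 0
participant,p21001_i1,Body mass index (BMI) | Instance 1
participant,p310,Made-up row to show that p31 does not match p310
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
names = select_field_names(rows, [31, 21001])
print("\n".join(names))
```

The resulting names, written one per line to a text file, could then serve as the field list for a much smaller export.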

    For more information please see:

     

  • Comment author
    May A. Baydoun

    Thank you very much! Is the full list of field labels available on the UKB showcase as well or is there another source that I am not aware of? I would like to have the full list of field labels as a dataset that I can use to label the variables in Stata. Thanks!

  • Comment author
    Daisy V (UKB Community team, Data Analyst)

    Hi - yes, you can get a dataset dictionary using the dx extract_dataset command. Open a JupyterLab instance (or RStudio or ttyd) and enter in the terminal:

    dx extract_dataset <your_dataset_id> -ddd

    You can find your dataset id by looking at the metadata in the manage section of your project. The ID should start with ‘record-’. Here is an example (dataset ID highlighted in yellow):

    There is also an example of finding the dataset ID programmatically in Example Notebook 105, as well as an example of getting the dataset dictionary.

    This will result in 3 files in the workspace with information on entities, fields and encodings, including the names of all the columns in the dataset. You can upload these to your project using dx upload.
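For the Stata labelling use case raised above, a minimal sketch of turning the fields file into `label variable` commands might look like the following. It assumes the data dictionary CSV has `entity`, `name` and `title` columns (check the header of your own file). `stata_label_commands` is a hypothetical helper and the sample rows are illustrative only.

```python
import csv
import io


def stata_label_commands(dictionary_rows, entity="participant"):
    """Build one Stata `label variable` command per titled field in `entity`."""
    commands = []
    for row in dictionary_rows:
        if row["entity"] != entity:
            continue
        # Stata variable labels are limited to 80 characters; double quotes
        # are swapped for single quotes so the label stays one quoted string.
        title = (row.get("title") or "").replace('"', "'")[:80]
        if title:
            commands.append(f'label variable {row["name"]} "{title}"')
    return commands


# Illustrative rows only; read the real file with csv.DictReader(open(path)).
SAMPLE = """entity,name,title
participant,eid,Participant ID
participant,p31,Sex
participant,p21022,Age at recruitment
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
commands = stata_label_commands(rows)
print("\n".join(commands))
```

Saving the printed lines to a `.do` file and running it after loading the exported data is one way to apply the labels. Titles containing characters that Stata treats specially may need extra handling.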

    There is also a folder called Showcase metadata which contains the relevant Showcase Schema (the files on RAP will be up to date with your project, so it is recommended to use those instead of schema downloaded from the website). 

    Some more information that may be relevant:

    Hope this helps!

  • Comment author
    Kaarina Kowalec

    Hi, When I run the below -

    dx extract_dataset project-xxxx:record-xxxx -ddd --delimiter ","

     I get the following error message. Can you help?

    dxpy.exceptions.PermissionDenied: The file download is not allowed. The file is in a restricted project., code 401. Request Time=1721404869.0652313, Request ID=1721404869127-703203

  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Please open a JupyterLab terminal within your project on the RAP to enter dx commands.

    Please try

    dx extract_dataset record-xxxx -ddd --delimiter ","

    but replace the xxxx with the id of your dataset record (see the images above).
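For those working in a JupyterLab notebook rather than the terminal, one possible way to issue the same command from Python is sketched below. Passing the arguments as a list avoids shell quoting altogether, so the delimiter is a plain comma. `build_extract_cmd` and `dump_dictionary` are hypothetical helper names, and `record-xxxx` remains a placeholder for your own dataset record ID.

```python
import shutil
import subprocess


def build_extract_cmd(record_id, delimiter=","):
    """Return the argument list for dumping the dataset dictionary."""
    if "record-" not in record_id:
        raise ValueError("expected a dataset ID containing 'record-'")
    return ["dx", "extract_dataset", record_id, "-ddd", "--delimiter", delimiter]


def dump_dictionary(record_id, delimiter=","):
    """Run the command; intended for a session inside the project on the RAP."""
    if shutil.which("dx") is None:
        raise RuntimeError("the dx command was not found on PATH")
    # check=True raises CalledProcessError if dx exits with an error.
    subprocess.run(build_extract_cmd(record_id, delimiter), check=True)


# Only prints the command; call dump_dictionary(...) to actually run it.
print(build_extract_cmd("record-xxxx"))
```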

  • Comment author
    David Curtis

    Can this command also be used within a virtual machine set up by Swiss Army Knife? So that the output would end up in the project space without being downloaded?

    I.e. I would run Swiss Army Knife with a script containing the above command?

