How to use dx extract_dataset without downloading to the local machine
This seems like it should be obvious, but I haven't figured it out yet: is there a way to use dx extract_dataset to create a file with a subset of data fields in the user's RAP project directory instead of on the user's local machine? It doesn't make sense for the extracted dataset to be stored locally when we are no longer allowed to keep individual-level data and all such files are supposed to be stored and analyzed on the RAP. This CLI tool seems more efficient than building a cohort and using the Table Exporter, but I have not found an option in dx extract_dataset to redirect the output to the RAP.
Comments
Hi Jeanne, one way to deal with this would be to start a JupyterLab instance, open a terminal (the $_ icon), and run the dx command there. This extracts the dataset into the JupyterLab instance's storage. You can then copy it from the JupyterLab storage to your main UKB-RAP project directory using dx upload. See https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/23300100792221-How-do-I-extract-the-entire-Proteomic-data-without-being-linked-to-a-specific-Phenotype-cohort-from-the-browser
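The JupyterLab route above can be sketched in Python, run from a notebook or terminal inside the instance, so the extracted file never touches your local machine. This is a minimal sketch, not an official recipe: the dataset record ID, the field names (participant.p31, participant.p21022) and the destination folder /extracted/ are hypothetical placeholders. Running dx extract_dataset <dataset> --list-fields should show the real field names. It assumes the --fields and -o options of dx extract_dataset, the --path option of dx upload, and that the destination folder already exists in the current project.

```python
from __future__ import annotations

import shlex
import subprocess
from typing import Sequence

# Hypothetical placeholders: replace with your own dataset record, fields and folder.
DATASET = "project-XXXX:record-YYYY"
FIELDS = ["participant.eid", "participant.p31", "participant.p21022"]
OUTPUT = "subset.csv"
DEST_FOLDER = "/extracted/"  # assumed to already exist in the project


def build_extract_cmd(dataset: str, fields: Sequence[str], output: str) -> list[str]:
    """Command that writes the chosen fields to a CSV on the instance's local disk."""
    return ["dx", "extract_dataset", dataset,
            "--fields", ",".join(fields), "-o", output]


def build_upload_cmd(local_file: str, dest_folder: str) -> list[str]:
    """Command that copies the local CSV into a folder of the current project."""
    return ["dx", "upload", local_file, "--path", dest_folder]


def extract_and_upload(dataset: str, fields: Sequence[str], output: str,
                       dest_folder: str, dry_run: bool = True) -> list[list[str]]:
    """Extract on the instance, then upload to the project (or just print if dry_run)."""
    cmds = [build_extract_cmd(dataset, fields, output),
            build_upload_cmd(output, dest_folder)]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises on failure, so a failed extraction skips the upload
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    extract_and_upload(DATASET, FIELDS, OUTPUT, DEST_FOLDER, dry_run=True)
```

With dry_run=True it only prints the two commands so you can check them; set dry_run=False inside the JupyterLab instance to actually run them.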
Another possibility might be the ttyd app, but I haven’t used that so I can’t say for sure.
The dx extract_dataset command works well for small amounts of data, but some researchers have found it limiting for large exports, so the Table Exporter or Spark commands might be more convenient.