Running Table Exporter inside a Posit Workbench instance
Can we please have the ability to run the Table Exporter tool locally within a Posit Workbench instance?
Currently, working with phenotype data carries quite a bit of cognitive overhead, because we have to:
(1) Spin up a Posit Workbench instance to curate a list of field IDs we want to extract.
(2) Run the Table Exporter tool on this list of field IDs to extract the corresponding field data into the project's persistent storage. This can be done either by uploading the list of field IDs to the project storage and then running the Table Exporter tool, or by launching the Table Exporter tool with `dx run`.
(3) In both cases, Table Exporter saves the extracted dataset to the project storage, and it then needs to be downloaded into the Posit Workbench instance.
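As a rough sketch of steps 2 and 3 driven from a script inside the Workbench session, the Python below writes the field list to a text file, uploads it with `dx upload`, and submits a `dx run table-exporter` job. The app input names (`dataset_or_cohort_or_dashboard`, `field_names_file_txt`, `output`, `entity`) are assumptions based on UK Biobank's example notebooks and should be checked against the app's input spec (e.g. `dx run table-exporter -h`); the helper function names, dataset record ID, and file names are hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def write_field_list(field_names, path):
    """Write one field name per line (assumed format for the field-names file)."""
    out = Path(path)
    out.write_text("\n".join(field_names) + "\n")
    return out


def build_table_exporter_cmd(dataset, field_file, output_prefix, entity="participant"):
    """Assemble a `dx run table-exporter` command as an argument list.

    Input names are assumptions; verify them with `dx run table-exporter -h`.
    `-y` skips the confirmation prompt and `--brief` makes dx print only the job ID.
    """
    return [
        "dx", "run", "table-exporter",
        f"-idataset_or_cohort_or_dashboard={dataset}",
        f"-ifield_names_file_txt={field_file}",
        f"-ioutput={output_prefix}",
        f"-ientity={entity}",
        "-y", "--brief",
    ]


def launch_table_exporter(field_names, dataset, output_prefix, local_file="field_names.txt"):
    """Upload the field list to project storage, submit Table Exporter, return the job ID."""
    write_field_list(field_names, local_file)
    # `dx upload --brief` prints only the new file's ID.
    file_id = subprocess.run(
        ["dx", "upload", local_file, "--brief"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    cmd = build_table_exporter_cmd(dataset, file_id, output_prefix)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()
```

A hypothetical call would be `job_id = launch_table_exporter(["eid", "p31"], "record-XXXX", "pheno_extract")`, where the field names and record ID are placeholders. Once the job finishes, the exported file still has to be fetched into the Workbench with `dx download`, which is the step 3 round trip through project storage described above.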
One can either leave the Posit Workbench instance running between steps 2 and 3, or terminate the session to save a few pence and then wait for a new instance to spin up once Table Exporter has finished.
Given that Table Exporter jobs typically use the same instance type as the Posit Workbench, it seems it would be more straightforward to run Table Exporter locally within the Posit Workbench rather than spinning up a separate instance for the job. This would also reduce the cognitive overhead of having to interact with the project storage in the middle of an analysis pipeline.
Comments
Hi Scott,
Have you considered using a JupyterLab R notebook instead of a Posit R session? You would then be able to use Spark commands to interact with the tabular data in the Parquet database. There are some example notebooks available at https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access to help with that.
Thank you for using the forum.
Hello Scott,
You might be particularly interested in the A110 notebook on the UK Biobank GitHub, which walks you through how to use Table Exporter in RStudio:
https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/RStudio/A110_Export_participant_data.Rmd
Hi Bethan,
Thanks, this is essentially the solution I ended up going with after learning the platform in more detail. The one addition is that after submitting the Table Exporter job with `dx run`, I have a while loop that checks the job status every 60 seconds, then continues with the rest of the script once the job has completed (or throws an error if the job fails).
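The polling approach described above might look roughly like the following Python sketch. It assumes the dx-toolkit CLI is installed and logged in inside the Workbench session, and that `dx describe <job-id> --json` returns a JSON object with a `state` field ("done" on success; "failed" or "terminated" treated here as failures). The function names are hypothetical, and the state lookup and sleep are passed in as parameters so the loop can be exercised without a live job.

```python
import json
import subprocess
import time

# Terminal states treated as failures (assumption); "done" is the success state.
FAILED_STATES = {"failed", "terminated"}


def dx_job_state(job_id):
    """Return the job's current state by parsing `dx describe <job-id> --json`."""
    out = subprocess.run(
        ["dx", "describe", job_id, "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["state"]


def wait_for_job(job_id, get_state=dx_job_state, interval=60, sleep=time.sleep):
    """Poll every `interval` seconds until the job is done; raise if it fails."""
    while True:
        state = get_state(job_id)
        if state == "done":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"Job {job_id} finished in state {state!r}")
        sleep(interval)
```

In a script this would be called as `wait_for_job(job_id)` right after submission, before downloading the output. dx-toolkit also provides `dx wait <job-id>` and a `--wait` option on `dx run`, which block until the job finishes and may make a hand-written loop unnecessary; the explicit loop mainly gives control over the polling interval and error handling.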