Does dxdata still work?

Edited 28 October 2024 17:30
6 comments

I have pip-installed dxdata and tried using the functions shown in a few GitHub notebooks, like this one:

https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb

But when I try to run dxdata.connect or dxdata.load_dataset it tells me these functions cannot be found in the package. I get an error like: AttributeError: module 'dxdata' has no attribute 'connect'

I am trying to filter to ~100,000 participants in the UKBB cohort, then get their columns of baseline data. How best can I do this if I can't use dxdata? I've extracted data for the whole cohort before using either Table Exporter or dx extract_dataset; this time, however, I'm stuck because I need to filter the participants first.

I've tried:

dx create_cohort --from project-[ID]:record-[ID] \
    --cohort-ids-file ./Cohort_IDs.csv \
    /New_Phenotype_Cohort

But this fails with an error saying that some of the IDs are not found. When I remove the IDs it reports, it just finds more that it says are missing: ValueError: The following supplied IDs do not match IDs in the main entity of dataset,

My IDs are plain numbers, like 1234, 5678, etc.
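One thing that can be ruled out before looking further is the formatting of the IDs file itself. Below is a minimal Python sketch, not a confirmed fix, that rewrites a CSV of IDs as one plain numeric ID per line with no header, no stray whitespace and no duplicates. It assumes that dx create_cohort wants the file in that shape (check dx create_cohort --help to confirm); the function name clean_cohort_ids and the output file name are hypothetical.

```python
import csv
from pathlib import Path


def clean_cohort_ids(src, dst):
    """Write one numeric ID per line to dst and return (kept, rejected).

    Assumption: the cohort IDs file should hold one ID per line, no header.
    """
    kept, rejected, seen = [], [], set()
    # utf-8-sig silently drops a byte-order mark if a spreadsheet added one
    with open(src, newline="", encoding="utf-8-sig") as handle:
        for row in csv.reader(handle):
            for cell in row:
                value = cell.strip()
                if not value:
                    continue
                if value.isascii() and value.isdigit():
                    if value not in seen:  # skip duplicates, keep first-seen order
                        seen.add(value)
                        kept.append(value)
                else:
                    rejected.append(value)  # e.g. a header such as "eid"
    Path(dst).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept, rejected
```

Calling clean_cohort_ids("Cohort_IDs.csv", "Cohort_IDs_clean.txt") returns the kept and rejected values, so anything unexpected in the original file shows up in the second list. If the cleaned file still triggers the ValueError, the problem is more likely the IDs themselves than their formatting.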

Comments

Daisy V UKB Community team Data Analyst
- 30 October 2024 16:52
Hi Hannah,
dxdata should be preinstalled on all JupyterLab with Spark Cluster instances on the UKB-RAP (a Spark cluster is required for interacting with the dataset), so you don't need to pip install the package. It only works on the UKB-RAP; it won't work locally on your own computer.
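To check which dxdata a given Python session is actually picking up, a small diagnostic along these lines may help. It is only a sketch: describe_module is a hypothetical helper, and the two attribute names are simply the ones mentioned in the question.

```python
import importlib


def describe_module(name, required=("connect", "load_dataset")):
    """Return where `name` was imported from and which required attributes it lacks."""
    module = importlib.import_module(name)
    return {
        "file": getattr(module, "__file__", None),
        "missing": [attr for attr in required if not hasattr(module, attr)],
    }


if __name__ == "__main__":
    try:
        print(describe_module("dxdata"))
    except ImportError as exc:
        print(f"dxdata is not importable here: {exc}")
```

If "missing" lists connect or load_dataset, the "file" path shows which installation is being imported, which should make it clearer whether it is the preinstalled copy or a separately pip-installed one.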
In addition to the DNAnexus notebooks you are using, you may find the UK Biobank Introductory Notebooks helpful.
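For the original goal of pulling baseline columns for a chosen set of participants, the pattern in those notebooks can be sketched roughly as below. This is not a tested recipe: it only runs on a UKB-RAP JupyterLab with Spark Cluster instance, it assumes dxdata.load_dataset, retrieve_fields and dxdata.connect behave as they do in the notebooks, and it assumes eid is stored as a string. The helper names, the dataset_id argument and any field names passed in (for example "p31") are placeholders.

```python
def normalise_ids(ids):
    """Return IDs as de-duplicated, whitespace-stripped strings, preserving order."""
    return list(dict.fromkeys(str(i).strip() for i in ids))


def retrieve_fields_for_ids(ids, field_names, dataset_id):
    """Return a Spark DataFrame of `field_names` restricted to participants in `ids`.

    Only works on a UKB-RAP JupyterLab with Spark Cluster instance.
    """
    # Imported here so that this module can still be loaded outside the RAP
    import dxdata
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    dataset = dxdata.load_dataset(id=dataset_id)
    participant = dataset["participant"]

    # Always retrieve eid so there is a key to filter on
    names = ["eid"] + [n for n in field_names if n != "eid"]
    df = participant.retrieve_fields(names=names, engine=dxdata.connect())

    # Assumption: eid is a string column; join against the wanted IDs
    id_df = spark.createDataFrame([(i,) for i in normalise_ids(ids)], ["eid"])
    return df.join(id_df, on="eid", how="inner")
```

A join is used rather than a long isin(...) list because the ID list here runs to around 100,000 entries. The result is still a Spark DataFrame, so it can be written out or converted with toPandas() once it is small enough.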
Hope this resolves the issue,
Daisy

0
Janaki Velmurugan
- 22 April 2025 10:47
Hi there,
I am having a similar issue. I am using JupyterLab with Spark Cluster and following the tutorial below, but I cannot get past the import("dxdata") step.
https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/JupyterNotebook_R/A106_Hypertension-data_R.ipynb
Could you please advise? My aim is to create a cohort programmatically using R.
Thanks,
Janaki

-1
Rachael W UKB Community team Data Analyst
- 22 April 2025 15:10
Hi Janaki,
I've asked DNAnexus support about this. In the meantime, I suggest you try the equivalent Python notebook, A103, as import dxdata seems to work there.

0
Rachael W UKB Community team Data Analyst
- Edited 23 April 2025 07:50
Hi Janaki,
DNAnexus support have explained that this is due to an issue with the version of the reticulate package, and they have suggested this workaround:

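# Install pacman if it is missing, then use it to load the other packages
if (!require(pacman)) install.packages("pacman")
pacman::p_load(reticulate, dplyr, parallel, skimr, VennDiagram, grid, scales, ggplot2, readr, arrow)
# Point reticulate at the Python under /opt/conda before importing dxdata
use_python("/opt/conda/bin/python")
dxdata <- import("dxdata")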

There is a related issue in the notebooks' GitHub repository (see https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/issues), which suggests an alternative fix, and we will update the affected notebooks in due course.
Thank you for using the forum.

0
Janaki Velmurugan
- 23 April 2025 10:22
Many thanks, Rachael, for helping with this.
There seems to be a problem installing the dependency ‘assertthat’, so I am not able to install the arrow package and go through the tutorial. Could you please advise?

0
Janaki Velmurugan
- 23 April 2025 13:29
Just to update: the issue with installing the arrow package was resolved by using a new JupyterLab Spark instance. I am now able to follow the tutorial.
Thanks,
Janaki

2
