Does dxdata still work?
I have pip-installed dxdata and tried the functions shown in a few GitHub notebooks, such as this one:
https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb
But when I try to run dxdata.connect or dxdata.load_dataset, I'm told these functions cannot be found in the package. The error looks like this: AttributeError: module 'dxdata' has no attribute 'connect'
I am trying to filter the UKBB cohort down to ~100,000 participants and then get their baseline data columns. How best can I do this if I can't use dxdata? I've previously extracted data for the whole cohort using either Table Exporter or dx extract_dataset, but this time I'm stuck because I need to filter the participants first.
I've tried:
dx create_cohort --from project-[ID]:record-[ID] \
--cohort-ids-file ./Cohort_IDs.csv \
/New_Phenotype_Cohort
But this fails with an error saying that some of the IDs are not found. When I remove the IDs it reports, it just reports more of them: ValueError: The following supplied IDs do not match IDs in the main entity of dataset,
My IDs are just numbers, like 1234, 5678, etc.
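One possible cause of this error, which the thread does not confirm, is that the ID file contains something other than bare IDs: a header row, quotes, stray whitespace, a byte-order mark, or several IDs per line. The sketch below is a minimal check under that assumption; it reduces CSV text to unique, digit-only IDs and reports anything it skipped. The function name clean_cohort_ids is hypothetical and not part of dx or dxdata. If the IDs are already clean, the mismatch may instead mean they come from a different application than the dataset.

```python
import csv
import io
import re


def clean_cohort_ids(text):
    """Return (ids, skipped) from CSV text.

    ids     -- unique digit-only IDs, in first-seen order
    skipped -- non-empty cells that were not digit-only (e.g. a header)
    """
    # Drop a leading byte-order mark, which some spreadsheet exports add.
    text = text.lstrip("\ufeff")
    seen = set()
    ids = []
    skipped = []
    for row in csv.reader(io.StringIO(text)):
        for cell in row:
            # Remove surrounding whitespace and any leftover quote characters.
            value = cell.strip().strip("\"'").strip()
            if not value:
                continue
            if re.fullmatch(r"\d+", value):
                if value not in seen:
                    seen.add(value)
                    ids.append(value)
            else:
                skipped.append(value)
    return ids, skipped
```

Writing the result back out with "\n".join(ids) gives a file with one ID per line, which could then be passed to --cohort-ids-file.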
Comments
Hi Hannah,
dxdata should be available preinstalled on all JupyterLab with Spark Cluster instances on the UKB-RAP (a Spark cluster is required for interacting with the dataset). You don't need to pip install the package. The package only works on the UKB-RAP; it won't work locally on your own computer.
In addition to the DNAnexus notebooks you are using, you may find the UK Biobank Introductory Notebooks helpful.
Hope this resolves the issue,
Daisy
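For the original question (restricting to a list of participants and then pulling baseline columns), a rough sketch of how this might look in a Python notebook on a UKB-RAP Spark cluster follows. It is based on the pattern used in the linked notebooks rather than on anything confirmed in this thread. It assumes that retrieve_fields returns a Spark DataFrame, that eid is stored as a string, and that dataset_id has the project-[ID]:record-[ID] form. The helper names with_eid and load_cohort_fields are hypothetical.

```python
def with_eid(field_names):
    """Return field_names with 'eid' first and duplicates removed."""
    ordered = ["eid"]
    for name in field_names:
        if name not in ordered:
            ordered.append(name)
    return ordered


def load_cohort_fields(dataset_id, field_names, cohort_ids):
    """Retrieve field_names for participants whose eid is in cohort_ids.

    Only runs on a UKB-RAP JupyterLab Spark cluster, where dxdata is
    preinstalled; the imports are local so the helper above stays usable
    elsewhere.
    """
    import dxdata
    from pyspark.sql import functions as F

    engine = dxdata.connect()
    dataset = dxdata.load_dataset(id=dataset_id)
    participant = dataset["participant"]

    # Assumption: this returns a Spark DataFrame with one column per field.
    df = participant.retrieve_fields(names=with_eid(field_names), engine=engine)

    # Assumption: eid is a string column, so the IDs are compared as strings.
    wanted = [str(i) for i in cohort_ids]
    return df.filter(F.col("eid").isin(wanted))
```

On the platform, a call such as load_cohort_fields("project-[ID]:record-[ID]", ["p31", "p21022"], ids) would return the filtered DataFrame; the two field names are illustrative only. For a list of around 100,000 IDs, joining against a Spark DataFrame of IDs may perform better than isin.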
Hi there,
I am having a similar issue. I am using JupyterLab with Spark Cluster and following the tutorial below, but I cannot get past the import("dxdata") step.
https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/JupyterNotebook_R/A106_Hypertension-data_R.ipynb
Could you please advise? My aim is to create a cohort programmatically using R.
Thanks,
Janaki
Hi Janaki,
I've asked DNAnexus support about this. In the meantime, I suggest you try the equivalent Python notebook, A103, as import dxdata seems to work there.
Hi Janaki,
DNAnexus support have explained that this is due to an issue with the version of the reticulate package, and they have suggested this workaround:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(reticulate, dplyr, parallel, skimr, VennDiagram, grid, scales, ggplot2, readr, arrow)
use_python("/opt/conda/bin/python")
dxdata <- import("dxdata")
There is a related issue in the notebook GitHub repository (see https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/issues), which suggests an alternative fix, and we will update the affected notebooks in due course.
Thank you for using the forum.
Many thanks Rachel for helping with this.
There seems to be a problem installing the dependency ‘assertthat’, and I am not able to install the arrow package to go through the tutorial. Could you please advise?
Just to update: the above issue with installing the arrow package was resolved by using a new JupyterLab Spark instance. I am now able to follow the tutorial.
Thanks,
Janaki