Does dxdata still work?

Hannah Louise Nicholls

I have pip-installed dxdata and tried using the functions shown in a few GitHub notebooks, like this one:

https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb

But when I try to run dxdata.connect or dxdata.load_dataset, I get an error saying these functions cannot be found in the package, e.g.: AttributeError: module 'dxdata' has no attribute 'connect'
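To check which dxdata a Python session is actually importing, a small diagnostic like the one below may help. It is a sketch that uses only the standard library; `describe_module` is a hypothetical helper, not part of dxdata. On a UKB-RAP JupyterLab with Spark Cluster instance you would expect `has_connect` and `has_load_dataset` to both be True. If they are False, the module being imported is probably not the preinstalled dxdata, and `origin` shows the file it is being loaded from.

```python
import importlib
import importlib.util
from typing import Dict, Optional


def describe_module(name: str = "dxdata") -> Dict[str, Optional[object]]:
    """Report where `name` would be imported from and whether it exposes the
    connect/load_dataset functions used in the DNAnexus dxdata notebooks."""
    report: Dict[str, Optional[object]] = {
        "found": False,
        "origin": None,
        "has_connect": False,
        "has_load_dataset": False,
        "import_error": None,
    }
    spec = importlib.util.find_spec(name)
    if spec is None:  # not installed in this Python environment
        return report
    report["found"] = True
    report["origin"] = spec.origin  # file the module would be loaded from
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # e.g. a dependency the package needs is missing
        report["import_error"] = repr(exc)
        return report
    report["has_connect"] = hasattr(module, "connect")
    report["has_load_dataset"] = hasattr(module, "load_dataset")
    return report


print(describe_module())
```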

I am trying to filter to ~100,000 participants in the UKBB cohort and then get their baseline data columns. How best can I do this if I can't use dxdata? I've extracted data for the whole cohort before using either Table Exporter or dx extract_dataset, but this time I need to filter to a subset of participants first.
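Since whole-cohort extraction already works, one possible fallback is to extract the baseline fields for everyone as before and then filter the resulting CSV down to the ~100,000 IDs. A minimal standard-library sketch, assuming the export has a header row with an ID column named `participant.eid` (adjust `id_col` to match your file) and that `keep` holds the IDs as strings; `filter_rows` is a hypothetical helper, not part of any DNAnexus tool.

```python
import csv
from typing import Iterable, Iterator, List, Set


def filter_rows(
    lines: Iterable[str], keep: Set[str], id_col: str = "participant.eid"
) -> Iterator[List[str]]:
    """Yield the header, then only the rows whose ID column is in `keep`.

    Assumes the export has a header row containing `id_col`.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty file: nothing to yield
        return
    idx = header.index(id_col)  # raises ValueError if the column is missing
    yield header
    for row in reader:
        # csv.reader yields [] for blank lines; skip those.
        if row and row[idx].strip() in keep:
            yield row
```

For example, open the export with `newline=""` and stream it straight into `csv.writer(dst).writerows(filter_rows(src, keep))`, so the full file never has to be held in memory. If the file starts with a byte-order mark, open it with `encoding="utf-8-sig"` so the first header name matches.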

I've tried:

dx create_cohort --from project-[ID]:record-[ID] \
--cohort-ids-file ./Cohort_IDs.csv \
/New_Phenotype_Cohort
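One thing worth ruling out before re-running dx create_cohort is formatting noise in Cohort_IDs.csv: a header row, quotes, stray whitespace, a byte-order mark, or IDs saved as floats (1234.0). The sketch below normalises the input to unique, digit-only IDs. It assumes that one ID per line is an acceptable format for `--cohort-ids-file` (check `dx create_cohort --help` for your dx-toolkit version); `clean_ids` is a hypothetical helper. Cleaning only fixes formatting: IDs that genuinely aren't in the dataset will still be rejected.

```python
import re
from typing import Iterable, List

# Only ASCII digits count as a valid participant ID here.
_ID = re.compile(r"[0-9]+")


def clean_ids(lines: Iterable[str]) -> List[str]:
    """Return unique digit-only IDs in first-seen order.

    Handles a byte-order mark, surrounding quotes or whitespace, several
    comma-separated IDs on one line, and a trailing ".0" left behind when
    IDs were saved as floats. Anything else (e.g. a header such as "eid")
    is skipped.
    """
    seen = set()
    out: List[str] = []
    for line in lines:
        for token in line.lstrip("\ufeff").split(","):
            token = token.strip().strip('"').strip()
            if token.endswith(".0"):
                token = token[:-2]
            if _ID.fullmatch(token) and token not in seen:
                seen.add(token)
                out.append(token)
    return out
```

For example, pass the open Cohort_IDs.csv file object to `clean_ids` and write `"\n".join(ids) + "\n"` to a new file to use with `--cohort-ids-file`.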

But this fails with an error saying some of the IDs are not found. When I remove the IDs it reports, it just reports more: ValueError: The following supplied IDs do not match IDs in the main entity of dataset,

My IDs are plain numbers, e.g. 1234, 5678, etc.
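Since removing the reported IDs only surfaced more, it may be quicker to find every mismatch in one go by comparing the list against the IDs that actually exist in the dataset (for example, from a whole-cohort extraction of just the participant ID field). A minimal sketch; `split_ids` is a hypothetical helper and both inputs are assumed to be cleaned strings. If almost nothing matches, one possibility worth checking is that the IDs come from a different UK Biobank application: participant IDs are pseudonymised per application, so they will not match another application's dataset.

```python
from typing import Iterable, List, Set, Tuple


def split_ids(requested: Iterable[str], dataset_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition requested IDs into (matched, unmatched), keeping input order.

    Every unmatched ID is returned in a single pass, so they can all be
    removed (or investigated) at once rather than one error at a time.
    """
    matched: List[str] = []
    unmatched: List[str] = []
    for rid in requested:
        # Set membership keeps this fast even for ~100,000 IDs.
        (matched if rid in dataset_ids else unmatched).append(rid)
    return matched, unmatched
```

`matched` can then be written out for `--cohort-ids-file`, and `len(unmatched)` shows how many IDs were dropped.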

Comments

  • Daisy V (UKB Community team, Data Analyst)

    Hi Hannah,

    dxdata comes preinstalled on all JupyterLab with Spark Cluster instances on the UKB-RAP (a Spark cluster is required to interact with the dataset), so you don't need to pip install it. The package only works on the UKB-RAP; it won't work locally on your own computer.

    In addition to the DNAnexus notebooks you are using, you may find the UK Biobank Introductory Notebooks helpful.

    Hope this resolves the issue,

    Daisy

  • Janaki Velmurugan

    Hi there,

    I am having a similar issue. I am using JupyterLab with Spark Cluster and following the tutorial below, but I can't get past the import("dxdata") step.

    https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/blob/main/JupyterNotebook_R/A106_Hypertension-data_R.ipynb

    Could you please advise? My aim is to create a cohort programmatically using R.

    Thanks,

    Janaki

  • Rachael W (UKB Community team, Data Analyst)

    Hi Janaki,

    I've asked DNAnexus support about this. In the meantime, I suggest trying the equivalent Python notebook, A103, where import dxdata seems to work.

  • Rachael W (UKB Community team, Data Analyst)

    Hi Janaki,

    DNAnexus support has explained that this is caused by an issue with the version of the reticulate package, and suggested this workaround:

    if (!require(pacman)) install.packages("pacman")
    pacman::p_load(reticulate, dplyr, parallel, skimr, VennDiagram, grid, scales, ggplot2, readr, arrow)

    # Point reticulate at the conda Python that provides dxdata
    use_python("/opt/conda/bin/python")

    dxdata <- import("dxdata")

    There is a related issue in the notebooks' GitHub repository (https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access/issues) that suggests an alternative fix. We will update the affected notebooks in due course.

    Thank you for using the forum.

  • Janaki Velmurugan

    Many thanks, Rachael, for helping with this.

    There seems to be a problem installing the dependency ‘assertthat’, and I can't install the arrow package needed for the tutorial. Could you please advise?

  • Janaki Velmurugan

    Just to update: the arrow installation issue above was resolved by starting a new JupyterLab Spark instance, and I can now follow the tutorial.

    Thanks,

    Janaki