I would like to access phenotype data from within a WDL task using a Python script. How do I localize the database file as an input to the task? I'm getting a wrong-type error.
I'd like to do something similar to this Jupyter notebook https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb , but in a Python script that is executed in a WDL task instead of as a Jupyter notebook. To access the appNNN_YYY.dataset file using dxpy from that Python script, I presume I need to localize that dataset file to the WDL task. So in my WDL I have
task ... {
  input {
    File dataset = "dx://ProjectName:/appNNN_YYY.dataset"
  }
  ...
}
However, I'm getting the following error:
failure executing Task action 'run'
java.lang.Exception: Found dx:object of the wrong type DxRecord(record-NNN,Some(DxProject(project-NNN)))
It seems that DNAnexus considers this dataset to be a DxRecord rather than a file, so it fails to map it to the File type that the WDL input requires. But WDL doesn't have a corresponding record type.
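To illustrate the mismatch: the error message suggests the dataset resolves to a record-NNN ID, while a WDL File input presumably expects a file-... ID. Below is a minimal Python sketch of that check. The helper names dx_class and can_be_wdl_file are hypothetical (they are not part of dxpy), and the rule "only file-... IDs map to File" is an assumption read off the error above, not documented behavior.

```python
def dx_class(object_id):
    """Return the class prefix of a DNAnexus ID, e.g. 'record' for 'record-NNN'.

    Accepts a bare ID or a 'project-...:ID' pair. Hypothetical helper.
    """
    # Drop an optional "project-...:" qualifier in front of the object ID.
    bare_id = object_id.split(":")[-1]
    cls, sep, rest = bare_id.partition("-")
    if not sep or not cls or not rest:
        raise ValueError(f"not a DNAnexus-style ID: {object_id!r}")
    return cls


def can_be_wdl_file(object_id):
    """Assumption: only 'file-...' objects can back a WDL File input."""
    return dx_class(object_id) == "file"
```

With the ID from the error, can_be_wdl_file("project-NNN:record-NNN") is False, which matches the failure I'm seeing.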
What's the solution here? I fundamentally just want to access phenotype data programmatically from my script in a WDL task.
Comments
In my opinion, this is not possible in WDL. I believe that interacting with a dnax database object from WDL is not supported (the reason, I think, is that Spark WDL workflows are not supported).
I wanted to do something similar some time ago, but decided to go with non-interactive Spark JupyterLab instead, which is a Spark app and supports working with dnax databases.
https://documentation.dnanexus.com/user/jupyter-notebooks#non-interactive-execution-of-notebooks
Alternatively, for phenotype data extraction, you may want to try running dx extract_dataset directly from the WDL task (I have not tested this yet, but it might work as a workaround). You could also build an applet that implements the dx extract_dataset logic.
https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities#extract_dataset
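As a rough, untested sketch of the extract_dataset idea: a Python script called from the task could build the command and shell out to the dx CLI. This assumes dx is on the PATH and already authenticated inside the task, and that the dataset is passed to the task as a String (e.g. "project-NNN:record-NNN") rather than a File, so the File type mapping is never attempted. The function names, the field names, and the output filename are hypothetical placeholders.

```python
import shlex
import subprocess


def build_extract_command(dataset, fields, output_path):
    """Return the argv list for a `dx extract_dataset` call.

    dataset: dataset record, e.g. "project-NNN:record-NNN" (passed as a string).
    fields: names in "entity.field" form, e.g. "participant.eid" (placeholders).
    """
    if not fields:
        raise ValueError("at least one field name is required")
    return [
        "dx", "extract_dataset", dataset,
        "--fields", ",".join(fields),
        "-o", output_path,
    ]


def extract_phenotypes(dataset, fields, output_path):
    """Run the extraction; raises CalledProcessError if dx exits non-zero."""
    cmd = build_extract_command(dataset, fields, output_path)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
    return output_path
```

For example, extract_phenotypes("project-NNN:record-NNN", ["participant.eid", "participant.p31"], "pheno.csv") would write the selected fields to pheno.csv, provided the extraction works from a non-Spark worker at all.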
Another tip is explained here: https://github.com/dnanexus/dxWDL/blob/v1/doc/ExpertOptions.md#calling-existing-applets
This describes how to call an applet from WDL (so you can combine dxni applets, e.g. the pheno extraction described above, into a larger WDL workflow).
Maybe other folks here will know a better answer.
If a native DNAnexus applet is able to access the database file, another solution is a hybrid approach: do the database access in native DNAnexus applets, and chain them together into a workflow using WDL.
Thank you both. Per both of your suggestions, I'm trying to use the native DNAnexus applet table_exporter to do the work for me and chain it together with WDL. But I'm running into some issues, which I posted as another question here: Question Detail (dnanexus.com)