I'm trying to find reports of a specific disease across a range of datasets. I used the Cohort Browser and then ran a Jupyter notebook with PySpark to get the data. ICD-10, ICD-9, etc. work, but "Non-cancer illness code, self-reported" gives a Py4JJavaError.

2 comments

Anastazie Sedlakova DNAnexus Team
- 26 July 2023 11:53
I am not sure which exact command you were using. In the Cohort Browser, field 20002 has multiple instances (see screenshot). If you are using dx extract_dataset, you should specify each instance separately, e.g.

dx extract_dataset record-GVPjffjJy8JvPXg9gkQKbz6b --fields "participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2"

Note that there must be no spaces after the commas.

Running this command gives me output similar to this example data:

participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2
1234567,,"[1309]",
1234568,,,
1234569,"[1156,99999,1286,1287]",,
1234570,"[1111,1330]",,

We have the whole example in a Jupyter Notebook.
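
To search output like the example above for one illness code across all instances, something like the following Python sketch may help. It is only an illustration: the function name participants_with_code and the SAMPLE string are made up for this example, and it assumes each non-empty cell is a JSON-style list of integers, as in the example data. If your export stores the codes as strings, the comparison would need adjusting.

```python
import csv
import io
import json

# Sample rows copied from the example output above. A real run would read the
# CSV file produced by dx extract_dataset instead of this inline string.
SAMPLE = '''participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2
1234567,,"[1309]",
1234568,,,
1234569,"[1156,99999,1286,1287]",,
1234570,"[1111,1330]",,
'''


def participants_with_code(csv_text, code):
    """Return the eids whose p20002 columns (any instance) contain `code`."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            # Skip the ID column and empty cells (no codes for that instance).
            if column == "participant.eid" or not cell:
                continue
            # Assumption: the cell is a JSON-style list such as "[1111,1330]".
            if code in json.loads(cell):
                matches.append(row["participant.eid"])
                break  # one match per participant is enough
    return matches


print(participants_with_code(SAMPLE, 1111))  # prints ['1234570']
```

Checking every instance column matters here because, as in the example data, a participant may have reported the illness at only one of the instances.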

Former User of DNAx Community_20
- 27 July 2023 18:52
I was filtering in the cohort browser, which means the filter looks like this:
