I'm trying to look for reports of a specific disease across a range of datasets. I used the Cohort Browser and then ran a Jupyter notebook with PySpark to get the data. ICD-10, ICD-9, etc. work, but "Non-cancer illness code, self-reported" gives a Py4JJavaError.
Comments
I am not sure which exact command you were using. In the Cohort Browser, field 20002 has multiple instances (see the screenshot). If you use dx extract_dataset, you should specify each instance separately, e.g.:
dx extract_dataset record-GVPjffjJy8JvPXg9gkQKbz6b --fields "participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2"
Note: there must be no space after the commas in the --fields list.
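If you would rather not type the list by hand, a minimal Python sketch like the one below can build the --fields value. The helper name build_fields and its parameters are hypothetical, and the instance count of 3 simply mirrors the example above; check the Cohort Browser for how many instances your dataset actually has.

```python
def build_fields(field_id, n_instances, entity="participant"):
    """Build a comma-separated field list with no spaces after the commas.

    Hypothetical helper: field names follow the pattern used in the
    example command (participant.p<field>_i<instance>).
    """
    names = [f"{entity}.eid"]
    names += [f"{entity}.p{field_id}_i{i}" for i in range(n_instances)]
    return ",".join(names)


# Assumption: three instances (i0..i2), as in the example command.
fields = build_fields(20002, 3)
print(fields)
```

The printed string can be pasted after --fields in the dx extract_dataset call.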
Running the dx extract_dataset command gives me output similar to this example data:
participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2
1234567,,"[1309]",
1234568,,,
1234569,"[1156,99999,1286,1287]",,
1234570,"[1111,1330]",,
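To look for one specific illness code in output of this shape, something along these lines may help. It is only a sketch: it assumes each non-empty cell is a JSON-style array as in the example rows, the function name participants_with_code is made up for illustration, and the code 1286 is just a value taken from the sample data.

```python
import csv
import io
import json


def participants_with_code(csv_text, code):
    """Return eids whose field 20002 lists `code` in any instance column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        for column, cell in row.items():
            # Skip the ID column and instances with no reported codes.
            if column == "participant.eid" or not cell:
                continue
            # Assumption: the cell is a JSON-style array such as "[1111,1330]".
            # Compare as strings so integer and string codes both match.
            if str(code) in {str(value) for value in json.loads(cell)}:
                matches.append(row["participant.eid"])
                break
    return matches


sample = """participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2
1234567,,"[1309]",
1234568,,,
1234569,"[1156,99999,1286,1287]",,
1234570,"[1111,1330]",,
"""

print(participants_with_code(sample, 1286))  # prints ['1234569']
```

For a real extract, read the CSV file written by dx extract_dataset and pass its contents in place of sample.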
We have the whole example in a Jupyter Notebook.
I was filtering in the cohort browser, which means the filter looks like this: