I'm trying to find reports of a specific disease across a range of datasets. I used the Cohort Browser and then ran a Jupyter notebook with PySpark to get the data. ICD-10, ICD-9, etc. work, but "Non-cancer illness code, self-reported" gives a Py4JJavaError.

Comments

  • Anastazie Sedlakova DNAnexus Team

    I am not sure exactly which command you were using. In the Cohort Browser, field 20002 has multiple instances (see screenshot). If you are using dx extract_dataset, you should specify each instance separately, e.g.

     

    dx extract_dataset record-GVPjffjJy8JvPXg9gkQKbz6b --fields "participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2"

     

    Note that there must be no spaces after the commas.
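
    When there are many instances, the field list could also be built programmatically so that no stray spaces slip in. This is only a minimal sketch: the helper name build_fields_arg is hypothetical, and the instance count of 3 simply mirrors the command above, so check the actual number of instances in the Cohort Browser.

```python
def build_fields_arg(field_id, n_instances, entity="participant"):
    """Build a comma-separated field list (no spaces) for the --fields option.

    Hypothetical helper: field names follow the pattern used in the
    command above, e.g. participant.p20002_i0.
    """
    fields = [f"{entity}.eid"]
    fields += [f"{entity}.p{field_id}_i{i}" for i in range(n_instances)]
    # Join with a bare comma so the result contains no spaces.
    return ",".join(fields)


fields_arg = build_fields_arg(20002, 3)
print(fields_arg)
```

    The printed string can be pasted between the quotes after --fields.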

     

    Running this command returns data similar to this example:

     

    participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2

    1234567,,"[1309]",

    1234568,,,

    1234569,"[1156,99999,1286,1287]",,

    1234570,"[1111,1330]",,
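
    To find the participants who reported a specific illness code in any instance, output like the above could be scanned with plain Python. This is a rough sketch under assumptions: the function name participants_with_code is made up for illustration, the sample rows are copied from the example above, and each non-empty cell is assumed to be a JSON-style array of integer codes.

```python
import csv
import io
import json

# Sample rows copied from the example output above.
sample_csv = '''participant.eid,participant.p20002_i0,participant.p20002_i1,participant.p20002_i2
1234567,,"[1309]",
1234568,,,
1234569,"[1156,99999,1286,1287]",,
1234570,"[1111,1330]",,
'''


def participants_with_code(csv_text, code):
    """Return the eids of participants reporting `code` in any instance column."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        eid = row.pop("participant.eid")
        # The remaining columns are the instances (p20002_i0, p20002_i1, ...).
        for cell in row.values():
            # Empty cells mean nothing was reported in that instance.
            if cell and code in json.loads(cell):
                matches.append(eid)
                break
    return matches


print(participants_with_code(sample_csv, 1286))
```

    For a real extract, the text of the CSV file written by dx extract_dataset would take the place of sample_csv.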

     

    The whole example is available in a Jupyter Notebook.

     

    Screenshot

  • I was filtering in the Cohort Browser, which means the filter looks like this: [Screenshot 2023-07-27 at 2.52.03 PM]
