Issue with exporting 'eid' using Table Exporter

I noticed that Table Exporter always runs into problems when exporting 'eid'. For example, the job below looks fine in terms of syntax, but it ran indefinitely, and I had to terminate it after 14 hours. However, when -ifield_names="eid" was removed, the job completed in less than 10 minutes. What is the problem?

dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"
dx run table-exporter -idataset_or_cohort_or_dashboard=${dataset} \
  -ioutput="test7" \
  -ioutput_format="TSV" \
  -iheader_style="UKB-FORMAT" \
  -ifield_names="p22189" \
  -ifield_names="p21022" \
  -ifield_names="eid" \
  --brief \
  --yes

Comments


  • Chai Fungtammasan DNAnexus Team

    Thanks for reporting this. It's a known bug affecting the eid field, and fixing it requires a substantial rework by the engineering team. As a temporary workaround, you can set the app's "entity" input to "participant", which should get around the issue.

    https://community.dnanexus.com/s/question/0D5t0000048t9hUCAQ/i-want-to-run-a-phewas-analysis-using-phesant-and-currently-trying-to-generate-the-phenotype-file-i-am-using-table-exporter-to-extract-the-phenotype-data-for-my-variables-of-interest
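
As a rough sketch of that workaround, here is the command from the original post with the entity input added. The input name `entity` is taken from the app's input listing later in this thread; the `DX` variable is a hypothetical dry-run switch introduced only for this example, so the command is printed rather than submitted unless `DX=dx` is set.

```shell
# Dry run by default: DX expands to "echo dx", so the command is only printed.
# Set DX=dx in the environment (with a logged-in dx-toolkit) to submit the job.
DX="${DX:-echo dx}"

# Dataset reference from the original post, in project-ID:record-ID form.
dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"

# Same inputs as the original job, plus entity="participant" (the workaround).
# With --brief, a real run prints only the job ID, which lands in $result.
result=$($DX run table-exporter \
  -idataset_or_cohort_or_dashboard="${dataset}" \
  -ioutput="test7" \
  -ioutput_format="TSV" \
  -iheader_style="UKB-FORMAT" \
  -ientity="participant" \
  -ifield_names="p22189" \
  -ifield_names="p21022" \
  -ifield_names="eid" \
  --brief \
  --yes)

echo "$result"
```

Whether this avoids the hang for every dataset is not something the thread establishes; the later comments report that it worked for several users.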

  • Former User of DNAx Community_10

    I would like to use this code to get the same table. I have the project ID:

    project-GJB3GpQJBJkYK0j74jp4vJZ9

    How do I get the record ID? Will the rest of the code run?

  • Peter Joshi

    Sorry for the long post, but I found the documentation and errors so opaque that I think this might help someone.

    I imagine most users want to include "eid" when exporting data using table-exporter. While I understand that fixing the root issue might be complex, the documentation could be updated to prevent confusion. Currently, the failure that occurs when "entity": "participant" is not set is very opaque: my job just seemed to wait endlessly for something that never appeared. It took me several days to isolate the cause and then frame the right question to put to the community. Improving the docs or default parameters would save users a lot of time.

    ✅ Suggested Fixes:

    • Add a clear note in the documentation about the importance of setting "entity": "participant" when working with participant data.
    • Consider making "participant" the default value for entity.


    Anyway, here's what I found worked.

    🧪 Working Input Set (for reference):

    [0] Output Prefix (output)                  = "test_entity_participant"
    [1] Output File Format (output_format)     = "TSV"
    [2] Coding Option (coding_option)          = "REPLACE" (default)
    [3] Header Style (header_style)            = "FIELD-NAME" (default)
    [4] Entity (entity)                        = "participant"
    [5] File containing Field Names (field_names_file_txt)
    [6] Field Names (field_names)              = ["eid", "p31", "p30620_i0"]
    [7] Field Titles (field_titles)
    [8] Cohort Table Entity Names (cohort_table_entity_names)
    [9] Cohort Table Entity Titles (cohort_table_entity_titles)
    

    🖥️ Here's how I got it to work end to end:

    dx cd /
    dx run table-exporter
    

    This launches interactive mode:

    Input:   Dataset or Cohort or Dashboard (dataset_or_cohort_or_dashboard)
    Class:   record
    

    I typed app and pressed TAB twice, which got dx autocompletion to fill in the dataset name:

    dataset_or_cohort_or_dashboard: appXXXXX_20250104090009.dataset
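
For scripted use, where tab completion is not available, the dataset record can probably be looked up by name instead. The sketch below assumes the dataset record's name ends in `.dataset`, as in the autocompleted name above, and that the right project is already selected. The `DX` variable is a hypothetical dry-run switch for this example only.

```shell
# Dry run by default: DX expands to "echo dx", so the command is only printed.
# Set DX=dx in the environment (with a logged-in dx-toolkit) to run the search.
DX="${DX:-echo dx}"

# Search the current project for record objects whose name ends in ".dataset".
# With --brief, a real run prints each match as project-ID:record-ID.
found=$($DX find data --class record --name "*.dataset" --brief)

echo "$found"
```

The `project-...:record-...` value printed by a real run has the same form as the `dataset` variable in the original post, so it can be passed to `-idataset_or_cohort_or_dashboard=`.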
    

    Then I completed the other fields as above. Pressing ENTER on its own submitted the job.

    After submission, dx prints a reusable JSON block of the inputs. As far as I have seen this is undocumented, but it can be reused with:

    dx run table-exporter --input-json '...'
    
    where '...' is the following JSON:
    
    {
        "dataset_or_cohort_or_dashboard": {
            "$dnanexus_link": {
                "project": "project-qqqq",
                "id": "record-qqqqq"
            }
        },
        "output_format": "TSV",
        "output": "test_entity_participant",
        "field_names": [
            "eid",
            "p31",
            "p30620_i0"
        ],
        "entity": "participant"
    }
    
    


    If you save the JSON to input.json (for example with vim or nano):

    dx run table-exporter --input-json-file input.json
    
    

    Note: The --input-json-file path refers to a local filesystem path, not a DNAnexus path. I imagine you can also use /mnt/project/... via FUSE, though FUSE has high latency.
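
The JSON file route can also be scripted. This is a minimal sketch: it writes the JSON shown above to a local `input.json`, checks that it parses, and then runs the same `dx run` command. The project and record IDs are the placeholders from the post and must be replaced; the `python3` check and the `DX` dry-run variable are assumptions added for this example.

```shell
# Write the input JSON from the post to a local file.
# "project-qqqq" and "record-qqqqq" are placeholders; substitute your own IDs.
cat > input.json <<'EOF'
{
    "dataset_or_cohort_or_dashboard": {
        "$dnanexus_link": {
            "project": "project-qqqq",
            "id": "record-qqqqq"
        }
    },
    "output_format": "TSV",
    "output": "test_entity_participant",
    "field_names": [
        "eid",
        "p31",
        "p30620_i0"
    ],
    "entity": "participant"
}
EOF

# Optional sanity check that the file parses as JSON (assumes python3 exists).
python3 -m json.tool input.json > /dev/null && echo "input.json parses as JSON"

# Dry run by default: DX expands to "echo dx", so the command is only printed.
# Set DX=dx in the environment (with a logged-in dx-toolkit) to submit the job.
DX="${DX:-echo dx}"
$DX run table-exporter --input-json-file input.json
```

The quoted heredoc delimiter (`'EOF'`) matters here: it stops the shell from trying to expand `$dnanexus_link` as a variable.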

    The run takes a few minutes (less than 10), most of which is spent starting the server, instead of the endless hours of waiting for nothing when entity is not specified.

    Here's what the output looks like:

    | eid   | p31    | p30620_i0 |
    |-------|--------|------------|
    | AAAAA | Male   | xx.xx      |
    | CCCCC | Female | xx.xx      |
    | EEEEE | Female | xx.xx      |
    | BBBBB | Male   |            |
    | DDDDD | Female | xx.xx      |
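
A quick way to confirm that an export really contains the eid column is to inspect the header row of the TSV. The sketch below builds a small stand-in file from the placeholder rows above, since the real output file name is not given in this thread; point `out` at your downloaded file instead.

```shell
# Stand-in for the exported TSV, using placeholder rows from the table above.
# Replace this with the path to your real export.
out="sample_export.tsv"
printf 'eid\tp31\tp30620_i0\n' > "$out"
printf 'AAAAA\tMale\txx.xx\n' >> "$out"
printf 'BBBBB\tMale\t\n' >> "$out"

# Split the header into one column name per line and look for an exact "eid".
if head -n 1 "$out" | tr '\t' '\n' | grep -qx 'eid'; then
  eid_present=yes
else
  eid_present=no
fi
echo "eid column present: $eid_present"

# Count data rows (all lines after the header).
rows=$(($(wc -l < "$out") - 1))
echo "data rows: $rows"
```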
    

    Hope this helps someone. Let me know if anything is unclear or could be improved.

  • Joseph Paillard

    I had the same problem; the solution from Peter Joshi also worked for me. Thank you!

  • Andrew Silberfeld

    Same here. Thank you Peter!! Knowing that you have to specify the ‘participant’ entity made a big difference.

