Issue with exporting 'eid' using Table Exporter

24 May 2023 00:00
5 comments

I noticed that Table Exporter always runs into problems when exporting 'eid'. For example, the job below looks fine in terms of syntax, but it never finished, and I had to terminate it after 14 hours. However, when -ifield_names="eid" was removed, the job completed in less than 10 minutes. What is the problem?

    dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"
    dx run table-exporter -idataset_or_cohort_or_dashboard=${dataset} \
      -ioutput="test7" \
      -ioutput_format="TSV" \
      -iheader_style="UKB-FORMAT" \
      -ifield_names="p22189" \
      -ifield_names="p21022" \
      -ifield_names="eid" \
      --brief \
      --yes

Comments


Chai Fungtammasan DNAnexus Team
- 25 May 2023 02:45
Thanks for reporting this. It's a known bug with the eid field, but fixing it requires a substantial revamp by the engineering team. As a temporary workaround, you can set the app's "entity" input to "participant", which could get around the issue.

https://community.dnanexus.com/s/question/0D5t0000048t9hUCAQ/i-want-to-run-a-phewas-analysis-using-phesant-and-currently-trying-to-generate-the-phenotype-file-i-am-using-table-exporter-to-extract-the-phenotype-data-for-my-variables-of-interest
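For illustration, here is a sketch of the workaround applied to the command from the original post. The only change to the job inputs is the added -ientity="participant" line; the input name "entity" is taken from the app's interactive prompt quoted further down this thread. The DRY_RUN variable and the run helper are hypothetical conveniences, not part of dx, so that the command can be previewed before anything is submitted.

```shell
#!/bin/sh
# Dataset ID copied from the original post; substitute your own.
dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"

# Hypothetical switch: 1 (default) only prints the command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Same inputs as the original job, plus the entity workaround.
run dx run table-exporter \
    -idataset_or_cohort_or_dashboard="${dataset}" \
    -ioutput="test7" \
    -ioutput_format="TSV" \
    -iheader_style="UKB-FORMAT" \
    -ientity="participant" \
    -ifield_names="p22189" \
    -ifield_names="p21022" \
    -ifield_names="eid" \
    --brief \
    --yes
```

Setting DRY_RUN=0 in the environment submits the job for real, which requires the dx CLI to be installed and logged in.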

1
Former User of DNAx Community_10
- 26 November 2023 17:08
I would like to use this code to get the same table. I have the project ID:
project-GJB3GpQJBJkYK0j74jp4vJZ9
How do I get the record ID?
Will the rest of the code run?

0
Peter Joshi
- 17 April 2025 14:28
Sorry for the long post, but I found the documentation and errors so opaque, I think this might help someone.
I imagine most users want to include "eid" when exporting data using table-exporter. While I understand that fixing the root issue might be complex, the documentation could be updated to prevent confusion. Currently, the failure that occurs when "entity": "participant" is not set is very opaque: my system just seemed to wait endlessly for something that never appeared. It took me several days to isolate the cause and then frame the right question to ask the community. Improving the docs or default parameters would save users a lot of time.
✅ Suggested Fixes:
- Add a clear note in the documentation about the importance of setting "entity": "participant" when working with participant data.
- Consider making "participant" the default value for entity.
Anyway, here's what worked for me.

🧪 Working Input Set (for reference):
```
[0] Output Prefix (output)                  = "test_entity_participant"
[1] Output File Format (output_format)     = "TSV"
[2] Coding Option (coding_option)          = "REPLACE" (default)
[3] Header Style (header_style)            = "FIELD-NAME" (default)
[4] Entity (entity)                        = "participant"
[5] File containing Field Names (field_names_file_txt)
[6] Field Names (field_names)              = ["eid", "p31", "p30620_i0"]
[7] Field Titles (field_titles)
[8] Cohort Table Entity Names (cohort_table_entity_names)
[9] Cohort Table Entity Titles (cohort_table_entity_titles)
```
🖥️ Here's how I got the thing to work end to end
```
dx cd /
dx run table-exporter
```
This launches interactive mode:
```
Input:   Dataset or Cohort or Dashboard (dataset_or_cohort_or_dashboard)
Class:   record
```
I typed app and pressed TAB twice, and dx autocompletion filled in the dataset name:
```
dataset_or_cohort_or_dashboard: appXXXXX_20250104090009.dataset
```
Then I completed the other fields as above. Hitting ENTER alone submitted the job.

After submission, dx prints a usable JSON block. Although this is undocumented as far as I have seen, it can be reused with:
```
dx run table-exporter --input-json '...'
```
where ... is
```
{
    "dataset_or_cohort_or_dashboard": {
        "$dnanexus_link": {
            "project": "project-qqqq",
            "id": "record-qqqqq"
        }
    },
    "output_format": "TSV",
    "output": "test_entity_participant",
    "field_names": [
        "eid",
        "p31",
        "p30620_i0"
    ],
    "entity": "participant"
}
```
If you vim/nano/save the JSON to input.json:
```
dx run table-exporter --input-json-file input.json
```
Note: The --input-json-file path refers to a local filesystem path, not a DNAnexus path. I imagine you can also use /mnt/project/... via FUSE, though FUSE has high latency.
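If you would rather not open an editor, a here-document is one way to create the local input.json. This is only a sketch: the project and record IDs are the placeholders from the JSON above and must be replaced with your own, and the validity check assumes python3 happens to be installed locally (it is skipped otherwise).

```shell
#!/bin/sh
# Write the job inputs to a local file. The quoted 'EOF' keeps the shell
# from expanding $dnanexus_link as a variable.
cat > input.json <<'EOF'
{
    "dataset_or_cohort_or_dashboard": {
        "$dnanexus_link": {
            "project": "project-qqqq",
            "id": "record-qqqqq"
        }
    },
    "output_format": "TSV",
    "output": "test_entity_participant",
    "field_names": [
        "eid",
        "p31",
        "p30620_i0"
    ],
    "entity": "participant"
}
EOF

# Optional sanity check: confirm the file parses as JSON before submitting.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool input.json >/dev/null && echo "input.json is valid JSON"
fi

# The job itself is not submitted here; this only prints the next step.
echo "Next: dx run table-exporter --input-json-file input.json"
```

The quoted delimiter matters: with an unquoted EOF the shell would try to expand $dnanexus_link and the key would be written out empty.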

The run takes a few minutes (less than 10), mostly spent getting the server started, rather than the endless hours of waiting for nothing when entity is not specified.
Here's what the output looks like:
```
| eid   | p31    | p30620_i0 |
|-------|--------|-----------|
| AAAAA | Male   | xx.xx     |
| CCCCC | Female | xx.xx     |
| EEEEE | Female | xx.xx     |
| BBBBB | Male   |           |
| DDDDD | Female | xx.xx     |
```
```
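Once the export is downloaded, a quick check that the eid column really made it into the file could look like the sketch below. The has_eid_column helper and the sample_export.tsv file are hypothetical: the sample is a stand-in built from the placeholder values above, and the name of the real output file is not shown in this post, so point the helper at whatever file you actually downloaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the tab-separated header row of FILE
# contains a column named exactly "eid".
has_eid_column() {
    head -n 1 "$1" | tr '\t' '\n' | grep -qx 'eid'
}

# Stand-in file using the placeholder values from the post; replace this
# with the path to your downloaded export.
sample="sample_export.tsv"
printf 'eid\tp31\tp30620_i0\n'  > "$sample"
printf 'AAAAA\tMale\txx.xx\n'  >> "$sample"
printf 'BBBBB\tMale\t\n'       >> "$sample"

if has_eid_column "$sample"; then
    # Data rows = total lines minus the header line.
    rows=$(( $(wc -l < "$sample") - 1 ))
    echo "eid column present, ${rows} data rows"
else
    echo "eid column missing" >&2
fi
```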
Hope this helps someone. Let me know if anything is unclear or could be improved.
5
Joseph Paillard
- 15 October 2025 14:39
I had the same problem; the solution from Peter Joshi also worked for me. Thank you!

2
Andrew Silberfeld
- 05 February 2026 20:12
Same here. Thank you Peter!! Knowing that you have to specify the ‘participant’ entity made a big difference.

0
