I noticed that Table Exporter always runs into problems when exporting 'eid'.
For example, the job below looks fine syntactically, but it ran indefinitely and I had to terminate it after 14 hours. However, when -ifield_names="eid" was removed, the job completed in less than 10 minutes. What is the problem?
dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"
dx run table-exporter -idataset_or_cohort_or_dashboard="${dataset}" \
-ioutput="test7" \
-ioutput_format="TSV" \
-iheader_style="UKB-FORMAT" \
-ifield_names="p22189" \
-ifield_names="p21022" \
-ifield_names="eid" \
--brief \
--yes
Thanks for reporting this. It's a known bug for this eid field, but the engineering team needs to make quite a revamp to fix it. As a temporary workaround, you can specify "entity" of the app as "participant" which could get around the issue.
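The workaround above can be sketched as a full command. This is only a sketch: it assumes the app input is named entity, as the reply indicates, and it reuses the dataset and field names from the original question. RUN_EXPORT is a hypothetical switch added here so that the command is printed for review and only launched if you opt in.

```shell
#!/bin/sh
set -eu

# Dataset reference copied from the original question.
dataset="project-GQzBFBjJkYqB4zzbFyGPbb21:record-GQzYQGQJ0GJqKx4xX9PYGkVk"

# Build the argument list step by step so each option stays a single word.
set -- dx run table-exporter
set -- "$@" -idataset_or_cohort_or_dashboard="${dataset}"
set -- "$@" -ioutput="test7"
set -- "$@" -ioutput_format="TSV"
set -- "$@" -iheader_style="UKB-FORMAT"
set -- "$@" -ientity="participant"   # the workaround suggested in the reply
set -- "$@" -ifield_names="eid"
set -- "$@" -ifield_names="p22189"
set -- "$@" -ifield_names="p21022"
set -- "$@" --brief --yes

# Print the command for review.
printf '%s ' "$@"
printf '\n'

# RUN_EXPORT is a hypothetical opt-in switch; launching needs a logged-in dx.
if [ "${RUN_EXPORT:-0}" = "1" ]; then
  "$@"
fi
```

Run it once without RUN_EXPORT to check the printed command, then again with RUN_EXPORT=1 to submit the job.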
https://community.dnanexus.com/s/question/0D5t0000048t9hUCAQ/i-want-to-run-a-phewas-analysis-using-phesant-and-currently-trying-to-generate-the-phenotype-file-i-am-using-table-exporter-to-extract-the-phenotype-data-for-my-variables-of-interest
I would like to use this code to get the same table. I have the project ID:
project-GJB3GpQJBJkYK0j74jp4vJZ9
How do I get the record ID?
Will the rest of the code run?
Sorry for the long post, but I found the documentation and errors so opaque that I think this might help someone.
I imagine most users want to include "eid" when exporting data using table-exporter. While I understand that fixing the root issue might be complex, the documentation could be updated to prevent confusion. Currently, the failure that occurs when "entity": "participant" is not set is very opaque: my system just seemed to wait endlessly for something that never appeared. It took me several days to isolate the cause and then frame the right question for the community. Improving the docs or default parameters would save users a lot of time.
✅ Suggested Fixes:
Add a clear note in the documentation about the importance of setting "entity": "participant" when working with participant data.
Consider making "participant" the default value for entity.
Anyway, here's what I found worked.
🧪 Working Input Set (for reference):
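For illustration, an input set of this kind might look like the sketch below. The values are assembled from the flags in the original question plus the entity workaround, so they are an assumption and not necessarily the poster's exact input. The "$dnanexus_link" form is the usual way to reference a data object in dx JSON input.

```shell
#!/bin/sh
set -eu

# Write a hypothetical input.json to the current directory.
# Keys mirror the -i<name> options used in the original question.
cat > input.json <<'EOF'
{
  "dataset_or_cohort_or_dashboard": {
    "$dnanexus_link": {
      "project": "project-GQzBFBjJkYqB4zzbFyGPbb21",
      "id": "record-GQzYQGQJ0GJqKx4xX9PYGkVk"
    }
  },
  "output": "test7",
  "output_format": "TSV",
  "header_style": "UKB-FORMAT",
  "entity": "participant",
  "field_names": ["eid", "p22189", "p21022"]
}
EOF

# Quick text check that the workaround key is present.
grep -q '"entity": "participant"' input.json && echo "input.json sets entity"
```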
🖥️ Here's how I got the thing to work end to end
This launches interactive mode:
I started typing app, pressed TAB TAB, and got dx autofill to work.
Then I completed the other fields as above. Hitting ENTER alone submitted the job.
After submission, dx prints a usable JSON block. Although undocumented as far as I have seen, this block can be reused. If you save the JSON to input.json (with vim, nano, or similar):
dx run table-exporter --input-json-file input.json
Note: The --input-json-file path refers to a local filesystem path, not a DNAnexus path. I imagine you can also use /mnt/project/... via FUSE, though FUSE has high latency.
The run takes a few minutes (less than 10), mainly to get the server started, rather than the endless hours of waiting for nothing when entity is not specified.
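Since a missing entity leads to a silent hang rather than an error, a small pre-flight check before launching may save time. The sketch below is an assumption-laden illustration: has_entity and safe_export are hypothetical helper names, and the check is a plain text match on the file, not a full JSON parse.

```shell
#!/bin/sh
set -eu

# has_entity FILE: succeed only if FILE contains a non-empty "entity" value.
# This is a simple text match, not a real JSON parser.
has_entity() {
  grep -Eq '"entity"[[:space:]]*:[[:space:]]*"[^"]+"' "$1"
}

# safe_export FILE: hypothetical wrapper that refuses to launch the app
# when the input file does not set "entity".
safe_export() {
  if ! has_entity "$1"; then
    echo "refusing to launch: $1 does not set \"entity\"" >&2
    return 1
  fi
  # --input-json-file takes a local filesystem path, as noted above.
  dx run table-exporter --input-json-file "$1" --brief --yes
}
```

After sourcing these functions, `safe_export input.json` would submit the job only when the file sets entity.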
Here's what the output looks like
Hope this helps someone. Let me know if anything is unclear or could be improved.
I had the same problem; the solution from Peter Joshi also worked for me. Thank you!
Same here. Thank you Peter!! Knowing that you have to specify the ‘participant’ entity made a big difference.