How can I get data from a DNAnexus database into another format?

07 October 2022 00:00
4 comments

I can create a DNAnexus database and confirm that it's present with dx describe. E.g.:

    import dxpy

    db_name = "mydb"
    mt_name = "my_table"

    # Create the database if it does not exist yet
    stmt = f"CREATE DATABASE IF NOT EXISTS {db_name} LOCATION 'dnax://'"
    spark.sql(stmt).show()

    # Look up the database object ID and build the table URL
    db_uri = dxpy.find_one_data_object(name=f"{db_name}", classname="database")['id']
    mt_url = f"dnax://{db_uri}/{mt_name}"

    # Save Hail MatrixTable (defined elsewhere) to database table
    mt.write(mt_url)

How can I make this data accessible in another format (e.g. TSV) in my project?

Comments

Ondrej Klempir DNAnexus Team
- 08 October 2022 06:42
I think that saving it to CSV and manipulating the data on HDFS might resolve this:
https://community.dnanexus.com/s/question/0D5t0000045I12HCAS/where-does-saved-data-go-on-a-jupyter-spark-cluster

Ondrej Klempir DNAnexus Team
- 08 October 2022 06:44
https://discuss.hail.is/t/exporting-data-from-matrixtable-into-tsv/2406
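
A minimal sketch of the export approach the linked Hail thread discusses, assuming `mt` is the Hail MatrixTable from the question. The function name `export_entries_tsv` and the output path are hypothetical, chosen for illustration; `entries()`, `flatten()` and `export()` are Hail MatrixTable/Table methods. On a Spark cluster, a path without a scheme will most likely resolve to HDFS rather than to the project.

```python
def export_entries_tsv(mt, out_path, parallel=None):
    """Write a Hail MatrixTable as TSV, one line per (row, column) entry.

    Assumption: `mt` is a hail.MatrixTable. `out_path` is a hypothetical
    destination, e.g. a path on the cluster's HDFS.
    """
    entries = mt.entries()    # Table with one row per matrix entry
    flat = entries.flatten()  # expand struct fields into top-level columns
    # parallel=None produces a single file; "header_per_shard" or
    # "separate_header" produces a folder of file shards instead.
    flat.export(out_path, delimiter="\t", parallel=parallel)
    return out_path
```

For very large tables, passing `parallel="header_per_shard"` avoids funnelling everything into one file, at the cost of getting a folder of shards back.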

Former User of DNAx Community_47
- 08 October 2022 22:38
Thanks for your reply.
I guess this means that if I produce a large file (many terabytes) then the local HD of a single node would need to be large enough to hold that file?

It seems like there ought to be a way for a cluster to produce data that can be directly stored to the project / cloud buckets...

Chai Fungtammasan DNAnexus Team
- 11 October 2022 15:08
Currently, no. dxfuse is mounted read-only in this case, so the only way to write objects is to write them to the instance's local storage and then upload them to the platform.
The largest instance can hold up to 60 TB, which should be sufficient.
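
The write-locally-then-upload workflow described above could be sketched roughly as follows. This assumes the export was written to HDFS on the cluster and that the `hdfs` and `dx` command-line tools are available on the driver node. The function names and all three paths are hypothetical.

```python
import subprocess


def build_transfer_commands(hdfs_path, local_path, project_folder):
    """Return the two commands: HDFS -> local disk, then local disk -> project."""
    return [
        # Copy the exported file or shard folder out of HDFS
        ["hdfs", "dfs", "-get", hdfs_path, local_path],
        # Upload to the project; -r allows directories to be uploaded
        ["dx", "upload", "-r", local_path, "--path", project_folder],
    ]


def run_transfer(hdfs_path, local_path, project_folder):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_transfer_commands(hdfs_path, local_path, project_folder):
        subprocess.run(cmd, check=True)
```

For example, `run_transfer("/tmp/entries.tsv", "/home/dnanexus/entries.tsv", "/exports/")` would stage the file on local disk first, so the instance still needs enough local storage to hold it.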
