Where does saved data go on a Jupyter Spark cluster?

I can process some test data with Hail on a JupyterLab Spark cluster and, for example, export a filtered VCF. How can I get this data back into my DNAnexus project? Example:

    import hail as hl
    from pyspark.sql import SparkSession

    builder = (
        SparkSession
        .builder
        .enableHiveSupport()
    )
    spark = builder.getOrCreate()
    hl.init(sc=spark.sparkContext)

    hl.utils.get_1kg('data/')
    hl.import_vcf('data/1kg.vcf.bgz').write('data/1kg.mt', overwrite=True)
    mt = hl.read_matrix_table('data/1kg.mt')

    # apply some filters / processing
    # [...]

    mt.rows().export('test/mydata.tsv.gz', delimiter='\t')
    # Where does this go?? Can I put it somewhere and then use dx-upload-all-outputs?
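One way to reason about "where does this go": assuming Hail hands paths without a scheme (like `data/1kg.mt` or `test/mydata.tsv.gz`) to Hadoop's default filesystem, they are resolved against the Hadoop working directory rather than the notebook's local folder. The hypothetical helper below sketches that resolution, assuming the default filesystem is HDFS and the working directory is Hadoop's usual default of `/user/<username>`. The namenode address and user name are made up; on a real cluster the default filesystem could be checked with `hdfs getconf -confKey fs.defaultFS`.

```python
from urllib.parse import urlparse
import posixpath


def resolve_hadoop_path(path: str, default_fs: str, user: str) -> str:
    """Return the fully qualified URI a Hadoop-style path would map to.

    Hypothetical helper for illustration only: assumes relative paths are
    resolved against /user/<user> on the default filesystem (Hadoop's usual
    default working directory). Paths that already carry a scheme are
    returned unchanged.
    """
    if urlparse(path).scheme:
        # Already qualified, e.g. 'file:///tmp/x' or 'hdfs://host/x'
        return path
    if not path.startswith("/"):
        # Relative path: prefix the assumed Hadoop working directory
        path = posixpath.join("/user", user, path)
    # Collapse '..' and duplicate slashes, then attach the default filesystem
    return default_fs.rstrip("/") + posixpath.normpath(path)


# Hypothetical namenode address and user name, for illustration only
print(resolve_hadoop_path("test/mydata.tsv.gz", "hdfs://namenode:8020", "root"))
# → hdfs://namenode:8020/user/root/test/mydata.tsv.gz
```

Under these assumptions the exported TSV ends up in HDFS, not next to the notebook, which is why it does not show up in the JupyterLab file browser.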

Comments


    Ondrej Klempir DNAnexus Team

    My understanding is that relative paths such as test/mydata.tsv.gz are saved to HDFS (the cluster's distributed Hadoop file system), not to the notebook's local disk. You could try copying the file from HDFS to local notebook storage (/opt/...) with "hdfs dfs -get <file>".
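A minimal sketch of that copy step from inside the notebook, assuming the `hdfs` command-line client is on the notebook's PATH. The function names are hypothetical; the only real command used is `hdfs dfs -get <src> <localdst>`. The runner is injectable so the command can be checked without a cluster.

```python
import subprocess
from typing import Callable, List


def hdfs_get_command(hdfs_path: str, local_dir: str) -> List[str]:
    """Build the argv for `hdfs dfs -get <src> <localdst>`."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_dir]


def copy_from_hdfs(hdfs_path: str, local_dir: str = ".",
                   run: Callable = subprocess.run) -> List[str]:
    """Copy a file (or directory) from HDFS to local notebook storage.

    Hypothetical helper. With the default `run`, check=True makes
    subprocess.run raise CalledProcessError if the hdfs command fails,
    for example when the source path does not exist.
    """
    cmd = hdfs_get_command(hdfs_path, local_dir)
    run(cmd, check=True)
    return cmd


# Usage on the cluster (copies into the notebook's current directory):
# copy_from_hdfs("test/mydata.tsv.gz")
```

A relative source path here is resolved by HDFS the same way as the path passed to Hail's export, so the same string can be reused.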


    https://www.geeksforgeeks.org/hdfs-commands/


    Once the file is in local notebook storage (the same place as your notebook), you should be able to upload it to your project.
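For the upload itself, one option is the dx toolkit's `dx upload` command. The sketch below assumes `dx` is installed in the notebook environment and already points at your project; the helper names and the `project-xxxx:/results/` destination are placeholders, not real identifiers. Without `--path`, `dx upload` uses the current project and folder.

```python
import subprocess
from typing import Callable, List, Optional


def dx_upload_command(local_path: str, dest: Optional[str] = None) -> List[str]:
    """Build the argv for `dx upload`, optionally with a destination path."""
    cmd = ["dx", "upload", local_path]
    if dest is not None:
        # e.g. "project-xxxx:/results/" (placeholder, not a real project ID)
        cmd += ["--path", dest]
    return cmd


def upload_to_project(local_path: str, dest: Optional[str] = None,
                      run: Callable = subprocess.run) -> List[str]:
    """Upload a local file to a DNAnexus project (hypothetical helper)."""
    cmd = dx_upload_command(local_path, dest)
    # check=True raises CalledProcessError if `dx upload` exits non-zero
    run(cmd, check=True)
    return cmd


# Usage after copying the file out of HDFS:
# upload_to_project("mydata.tsv.gz")
# upload_to_project("mydata.tsv.gz", "project-xxxx:/results/")
```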

