Is there a way to mount Project/Bucket folders on VM workers directly, like NFS mount?

Permanently deleted user
I ask because it seems to take 12 minutes to download a 50 GB CRAM from a bucket folder to a worker, and we have a few hundred thousand WGS CRAM files. In addition, samtools view by default fetches a 3 GB reference cache from EBI each time. It would be nice if I could access a pre-built FASTA cache in the bucket directly from a worker. If I do have to download the files, I assume it would be more efficient to download the cache from the bucket than from http://www.ebi.ac.uk/ over the internet? Thanks for your help.

Comments

14 comments

  • Comment author
    Ted Laderas DNAnexus Team

    Hi Yong,

     

    If you prepend /mnt/project/ to your file paths (such as /mnt/project/Bulk Files/...), you can use the dxFUSE file system to access files in project storage without downloading them first. Note that it is currently read-only.

     

    This works in bash, Python, and R code.

     

    Hope that helps.

    Ted
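
As a minimal sketch of the approach described above, the Python below maps a project-relative path onto the mount and reads only the first bytes of a file. The helper names `project_path` and `read_head` are hypothetical, and the code assumes dxfuse is already mounted at /mnt/project on the worker.

```python
from pathlib import Path

# Assumption: dxfuse exposes the project under this directory, as described above.
DEFAULT_MOUNT = Path("/mnt/project")


def project_path(relative: str, mount: Path = DEFAULT_MOUNT) -> Path:
    """Map a project-relative path (e.g. 'Bulk Files/x.txt') onto the mount."""
    return mount / relative.lstrip("/")


def read_head(relative: str, n_bytes: int = 1024, mount: Path = DEFAULT_MOUNT) -> bytes:
    """Read the first n_bytes of a project file through the mount."""
    with open(project_path(relative, mount), "rb") as handle:
        return handle.read(n_bytes)
```

The `mount` parameter is there only so the helpers can be pointed at an ordinary directory when trying them out off the platform.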

  • Comment author
    Permanently deleted user

    Hi Ted,

    Thank you! That's really good to know. According to the GitHub page https://github.com/dnanexus/dxfuse, the performance of downloading and of direct access through dxFUSE seems to be similar. What is the preferred, or more common, method of accessing project data: download or /mnt/project?

    Yong

  • Comment author
    Permanently deleted user

    Actually, if I do this

    subprocess.check_call('ls -l /mnt/project/', shell=True)

    in the python code, I am getting

     File "/usr/lib/python3.8/subprocess.py", line 364, in check_call

    STDERR    raise CalledProcessError(retcode, cmd)

    STDERR subprocess.CalledProcessError: Command 'ls -l /mnt/project/' returned non-zero exit status 2

     

    What did I miss?

  • Comment author
    Ted Laderas DNAnexus Team

    Hi Yong,

     

    I believe that is because of how the platform is designed. Folders themselves are not data objects on the platform; they are represented in the metadata of each data object.

     

    Hence, any usage of /mnt/project/ must refer to a particular data object, so I think just calling /mnt/project/ will return an error.

     

    We are now recommending that users use /mnt/project/ because it is more convenient for them.

     

    Best,

    Ted

  • Comment author
    Ted Laderas DNAnexus Team

    Just keep in mind that it is currently read-only. You won't be able to do something like write.csv(my_file, "/mnt/project/my_folder/myfile.csv"); you'll have to use dx upload to save results from the worker to the project.
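
A rough Python equivalent of that workflow: write the result to the worker's local disk, then hand it to `dx upload`. The function names are hypothetical, and the folder /my_folder is only an example. The sketch assumes the dx command-line client is installed and logged in on the worker.

```python
import subprocess
from typing import Callable, List


def dx_upload_cmd(local_path: str, project_folder: str) -> List[str]:
    """Build a `dx upload` command that places local_path in project_folder."""
    # A trailing slash makes it explicit that the destination is a folder.
    folder = project_folder if project_folder.endswith("/") else project_folder + "/"
    return ["dx", "upload", local_path, "--path", folder]


def save_and_upload(
    text: str,
    local_path: str,
    project_folder: str,
    runner: Callable[[List[str]], object] = subprocess.check_call,
) -> List[str]:
    """Write text to local disk (not to /mnt/project), then upload it."""
    with open(local_path, "w") as handle:
        handle.write(text)
    cmd = dx_upload_cmd(local_path, project_folder)
    runner(cmd)  # Raises CalledProcessError if dx upload fails.
    return cmd
```

For example, `save_and_upload("a,b\n1,2\n", "myfile.csv", "/my_folder")` writes myfile.csv locally and then runs `dx upload myfile.csv --path /my_folder/`.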

  • Comment author
    Permanently deleted user

    Hi Ted,

    Thank you, Ted for the prompt answer.

    So we can open the file to read. But can't do something like

    subprocess.check_call('cp /mnt/project/bucketfolder/myfile myfile.copy', shell=True)

    or

    subprocess.check_call('samtools view /mnt/project/MyBam/test.cram', shell=True)

     

    assuming samtools is specified in the runSpec in dxapp.json?

    Yong

  • Comment author
    Ondrej Klempir DNAnexus Team

    dxfuse is only performant for sequential (streaming) reads, so I think "samtools view /mnt/project/MyBam/test.cram" might be a good use case for it. On the other hand, "cp /mnt/project/bucketfolder/myfile myfile.copy" needs to read (download/stream) the entire file, so there will not be much difference between dxfuse and dx download.
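
A small sketch of the streaming case from Python, without `shell=True`. It also shows one possible way to address the reference-cache concern from the original question: passing an explicit reference FASTA with samtools' `-T` option so that the reference is read from the mount. The helper name and the reference path are hypothetical. The sketch assumes samtools is installed, and that a .fai index already sits next to the FASTA, since the mount is read-only and samtools cannot create one there.

```python
import subprocess
from typing import List, Optional


def samtools_view_cmd(cram: str, reference: Optional[str] = None) -> List[str]:
    """Build a `samtools view` command that reads a CRAM in one sequential pass."""
    cmd = ["samtools", "view"]
    if reference is not None:
        # -T names the reference FASTA explicitly.
        cmd += ["-T", reference]
    cmd.append(cram)
    return cmd


def stream_cram(cram: str, reference: Optional[str] = None) -> None:
    """Run samtools view; records go to this process's stdout."""
    subprocess.run(samtools_view_cmd(cram, reference), check=True)
```

Usage might look like `stream_cram("/mnt/project/MyBam/test.cram", "/mnt/project/refs/genome.fa")`, where the refs/genome.fa location is made up for illustration.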

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hello, as far as I know, dxfuse (i.e. "/mnt/project/") is not available or preinstalled everywhere. It is part of Swiss Army Knife, JupyterLab and ttyd, but it is not available in applets by default (though it can be installed there).

  • Comment author
    Ondrej Klempir DNAnexus Team

    Are you trying 'ls -l /mnt/project/' from JupyterLab or from your custom-made applet? If the latter, I would guess that dxfuse is not available.
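
One way to test that guess from inside an applet is to check whether the mount directory can be listed before relying on it. GNU ls exits with status 2 when it cannot access a path given on the command line, which would match the error reported earlier if /mnt/project does not exist on the worker. The function name below is hypothetical.

```python
import os


def dxfuse_available(mount: str = "/mnt/project") -> bool:
    """Return True if the mount directory exists and can be listed."""
    try:
        os.listdir(mount)
    except OSError:
        # Covers a missing directory, a non-directory, and permission errors.
        return False
    return True
```

An applet could call `dxfuse_available()` once at startup and fall back to dx download when it returns False.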

  • Comment author
    Permanently deleted user

    I am trying to access the project files from my applet. I did notice that the JupyterLab terminal allows /mnt/project access. Thanks.

  • Comment author
    Permanently deleted user

    Sorry for the naive question: can you include Swiss Army Knife in your custom applet? Is there an applet example for that?

    Thank you.

  • Comment author
    Ondrej Klempir DNAnexus Team

    a) samtools is part of Swiss Army Knife (the samtools command is passed via the -icmd parameter)

     

    https://ukbiobank.dnanexus.com/app/swiss-army-knife

     

    b) if needed, you can run SAK from your custom made bash applet via

     

    dx run app-swiss-army-knife
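
From a Python applet, the same call could be assembled as below. This is only a sketch: the helper name is hypothetical, `-icmd` is the parameter mentioned in (a), and `-y` is the dx run flag that skips the interactive confirmation. Note that dx run launches Swiss Army Knife as a separate job rather than running the command inside the calling applet.

```python
import shlex
import subprocess
from typing import List


def sak_run_cmd(shell_cmd: str) -> List[str]:
    """Build a `dx run` call that passes shell_cmd to Swiss Army Knife via -icmd."""
    return ["dx", "run", "app-swiss-army-knife", f"-icmd={shell_cmd}", "-y"]


def launch_sak(shell_cmd: str) -> None:
    """Print and submit the job; assumes the dx client is available and logged in."""
    cmd = sak_run_cmd(shell_cmd)
    print("Launching:", shlex.join(cmd))
    subprocess.check_call(cmd)
```

For example, `launch_sak("samtools view /mnt/project/MyBam/test.cram | head")` would submit that pipeline as the job's command.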

     

    For more options and details:

     

     

     

     

     

  • Comment author
    Permanently deleted user

    Thank you! This really helps.

  • For those, like me, who tried this in their own apps based on this response: this doesn't seem to be true in general; it only seems to hold for specific apps that DNAnexus has created.

     

    I've opened up a question to see if there is a way to enable this in our own apps: https://community.dnanexus.com/s/question/0D582000000L513CAC/dxfuse-automatically-mount-mntproject-on-custom-docker-images

