I have found a public resource hosted on Google Cloud and would like to access it from DNAnexus. Is it possible to access datasets stored in gs:// buckets from DNAnexus?
Comments
A) If the file in gs:// is not too big, you can download it to your local machine and then upload it to your UKB-RAP (DNAnexus) project.
B) You could run the ttyd app and access the public data from there. I followed the instructions here: https://cloud.google.com/storage/docs/access-public-data#api-link
There are four ways to interact with Google Cloud Storage: API Link, Console, Command line, and Client libraries.
I tested the API Link option, and the following command worked:
wget https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/001/003/LC08_L1GT_001003_20140812_20170420_01_T2/LC08_L1GT_001003_20140812_20170420_01_T2_B3.TIF
I also tested the Command line option (I first needed to install gsutil on ttyd):
gsutil ls -r gs://gcp-public-data-landsat/LC08/01/001/003/LC*
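For reference, one way to get gsutil onto a ttyd worker is sketched below. It assumes python3 with pip is installed on the worker and that the worker can reach PyPI, where gsutil is published as a standalone package; `ensure_gsutil` is a hypothetical helper name, and the Google Cloud SDK installer is an alternative route.

```shell
#!/usr/bin/env bash
# Sketch: make gsutil available on a ttyd worker.
# Assumptions: python3 with pip is installed and PyPI is reachable.
# ensure_gsutil is a hypothetical helper, not part of any toolkit.

ensure_gsutil() {
  # Nothing to do if gsutil is already on PATH.
  if command -v gsutil >/dev/null 2>&1; then
    return 0
  fi
  # gsutil is distributed on PyPI as a standalone package.
  python3 -m pip install --user gsutil || return 1
  # pip --user puts console scripts in ~/.local/bin, which may not be on PATH.
  export PATH="$HOME/.local/bin:$PATH"
  # Succeed only if gsutil is now resolvable.
  command -v gsutil >/dev/null 2>&1
}
```

Usage: `ensure_gsutil && gsutil ls gs://genomics-public-data/references`.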
Similarly, as a bioinformatics example, I was able to access gs://genomics-public-data/references (https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/reference_genomes.html):
gsutil ls -r gs://genomics-public-data/references
gsutil cp gs://genomics-public-data/references/GRCh37/chr1.fa.gz .
Copying gs://genomics-public-data/references/GRCh37/chr1.fa.gz...
\ [1 files][ 64.7 MiB/ 64.7 MiB]
Operation completed over 1 objects/64.7 MiB.
Once you have downloaded the desired files onto the worker, you can upload them to your UKB-RAP (DNAnexus) project.
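The upload step could look like the sketch below. It assumes the dx-toolkit `dx` command is installed and authenticated (on a DNAnexus worker such as ttyd this is usually already the case); `upload_to_project` and `run` are hypothetical helper names, `project-xxxx` is a placeholder, and `DRY_RUN=1` only prints the command that would run.

```shell
#!/usr/bin/env bash
# Sketch: upload files downloaded on the worker into a DNAnexus project.
# Assumes dx-toolkit is installed and logged in. upload_to_project and run are
# hypothetical helpers; set DRY_RUN=1 to print commands instead of running them.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_to_project LOCAL_PATH [DX_DEST]
# DX_DEST is a DNAnexus path such as "project-xxxx:/references/" (placeholder).
# When it is omitted, dx upload targets the current project and folder.
upload_to_project() {
  local src=$1
  local dest=${2:-}
  local args=(dx upload)
  # A directory (for example a Hail .mt folder) needs a recursive upload.
  if [[ -d "$src" ]]; then
    args+=(-r)
  fi
  args+=("$src")
  if [[ -n "$dest" ]]; then
    args+=(--path "$dest")
  fi
  run "${args[@]}"
}
```

For the reference genome example, `upload_to_project chr1.fa.gz` would upload the downloaded file into the current project and folder.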
Thank you so much for your help. Is there a way to access requester-pays Google Cloud Platform buckets from DNAnexus? Also, the files are large and in Hail format, so I cannot use option A.
I'm not sure I can help with this follow-up question, as I have no experience with requester-pays Google Cloud Platform buckets. It seems that gsutil should work for them: https://cloud.google.com/storage/docs/using-requester-pays. In theory, you would first need to download the data onto the worker and then upload it to your DNAnexus project.
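Following the Google page linked above, a download-then-upload flow for a requester-pays bucket might look like the sketch below. It assumes gsutil is already authenticated (for example with `gsutil config`) as a Google account allowed to bill the project passed with `-u`, that dx-toolkit is logged in, and that the worker disk has room for the data. `fetch_and_upload`, `run`, the billing project and the bucket path are all hypothetical placeholders. `-m` parallelises the transfer and `-r` copies a whole directory, which is how a Hail MatrixTable or Table is stored.

```shell
#!/usr/bin/env bash
# Sketch: copy a Hail directory out of a requester-pays bucket onto the worker,
# then upload it to the current DNAnexus project.
# Assumes gsutil is authenticated and may bill BILLING_PROJECT, and dx-toolkit
# is logged in. All names are placeholders; DRY_RUN=1 prints commands instead.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_and_upload BILLING_PROJECT GS_SRC [LOCAL_DIR]
fetch_and_upload() {
  local billing=$1 src=$2 workdir=${3:-.}
  # Name of the copied object, e.g. "data.mt" for gs://bucket/data.mt.
  local name=${src%/}
  name=${name##*/}
  # -u names the Google Cloud project billed for the requester-pays charges.
  run gsutil -u "$billing" -m cp -r "$src" "$workdir" || return 1
  # Upload the copied directory into the current project and folder.
  run dx upload -r "$workdir/$name"
}
```

Usage: `fetch_and_upload my-billing-project gs://example-bucket/data.mt /tmp/work`. Keep in mind that the billing project is charged for the download, which can be significant for large Hail datasets.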
For direct access in Hail, have you seen this Hail Forum thread https://discuss.hail.is/t/im-encountering-bucket-is-a-requester-pays-bucket-but-no-user-project-provided/2536?
Thank you so much for the response. I have another question: is it possible to create an applet that uses tools I install later? If so, how would I do it?
@Akhil Pampana, please create a new post so other Community members can read and contribute to your question. It seems unrelated to the title of this thread.
Sure
Hello, I have tried copying files from the gs:// file system to DNAnexus, but it didn't work; the error cited read-only file access permissions. Is there a way to read those files into Hail directly without copying them to our file system? Also, is there a file system available to us (like gs:// for Google)?
I also tried creating a database and copying the files into it, but that didn't work either. Please let me know how to proceed.
In that case, I'm afraid the only solution is to first export the data to BGEN or pVCF and then import it into UKB-RAP. I will let our solution owner know that there is interest in ingesting data from other clouds.