How can I create a list of all files available in a project in the RAP? Thanks!
Former User of DNAx Community_93, 19 January 2022 00:00
1 comment
Anastazie Sedlakova (DNAnexus Team), 24 January 2022 20:09
Hello, you can use the following code in a JupyterLab session launched in the project of interest. I am using the folder Bulk as an example.
import csv
import dxpy
from dxpy.utils.resolver import resolve_existing_path

# Resolve the folder name to a project ID and an absolute folder path.
project, folder, _ = resolve_existing_path("Bulk")

def list_folder(folder, project):
    """Yield [id, name, folder, size] for every file in folder and its subfolders."""
    print("Listing {}".format(folder))
    output = dxpy.api.project_list_folder(
        project,
        input_params={
            "folder": folder,
            "describe": {"fields": {"id": True, "name": True, "size": True}},
        },
    )
    for obj in output['objects']:
        # Keep only files; skip other object classes.
        if obj['id'].startswith('file-'):
            desc = obj["describe"]
            yield [desc['id'], desc['name'], folder, str(desc['size'])]
    # Recurse into each subfolder.
    for subfolder in output['folders']:
        yield from list_folder(subfolder, project)

# Write the listing as a tab-separated file.
with open("files.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["id", "name", "folder", "size"])
    for row in list_folder(folder, project):
        writer.writerow(row)
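If you want to check the recursion before running it against a large project, here is a minimal offline sketch of the same traversal. It assumes the listing has the shape the script above reads (`objects` with `id` and `describe`, `folders` as full paths). `FAKE_PROJECT`, `fake_list_folder`, and `walk_files` are hypothetical names used only for illustration, and the entries are made up.

```python
# Hypothetical in-memory stand-in for the folder listing; not real project data.
FAKE_PROJECT = {
    "/Bulk": {
        "objects": [
            {"id": "file-AAA", "describe": {"id": "file-AAA", "name": "a.zip", "size": 10}},
            {"id": "record-BBB", "describe": {"id": "record-BBB", "name": "r"}},
        ],
        "folders": ["/Bulk/Sub"],
    },
    "/Bulk/Sub": {
        "objects": [
            {"id": "file-CCC", "describe": {"id": "file-CCC", "name": "c.zip", "size": 20}},
        ],
        "folders": [],
    },
}


def fake_list_folder(folder):
    """Return the canned listing for one folder (stands in for the API call)."""
    return FAKE_PROJECT[folder]


def walk_files(folder, list_one=fake_list_folder):
    """Yield [id, name, folder, size] for every file under folder, depth-first."""
    output = list_one(folder)
    for obj in output["objects"]:
        if obj["id"].startswith("file-"):  # non-file objects are skipped
            desc = obj["describe"]
            yield [desc["id"], desc["name"], folder, str(desc["size"])]
    for subfolder in output["folders"]:
        yield from walk_files(subfolder, list_one)


rows = list(walk_files("/Bulk"))
for row in rows:
    print("\t".join(row))
```

Against the real project, the script above writes the same four columns to files.txt.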
The resulting table will have the following format:
id        name                   folder                       size
file-ID1  1234567_20253_2_0.zip  /Bulk/Brain MRI/T2 FLAIR/10  36181516
The Bulk folder in my project has about 900,000 files, and writing the list took about 15 minutes on a mem1_hdd1_v2_x16 instance.
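Once the listing exists, it can be read back with the standard `csv` module. The sketch below is one possible way to count files and total bytes per folder prefix. It assumes the four-column, tab-separated layout shown above. `summarize` and its `depth` parameter are hypothetical helpers, not part of dxpy, and the second sample row is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize(lines, depth=2):
    """Return {folder prefix: (file count, total bytes)} from a tab-separated listing."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(lines, delimiter="\t"):
        # "/Bulk/Brain MRI/T2 FLAIR/10" becomes "/Bulk/Brain MRI" when depth is 2
        parts = row["folder"].strip("/").split("/")
        prefix = "/" + "/".join(parts[:depth])
        totals[prefix][0] += 1
        totals[prefix][1] += int(row["size"])
    return {prefix: tuple(v) for prefix, v in totals.items()}


# Sample rows in the same layout as files.txt (second row is invented).
sample = io.StringIO(
    "id\tname\tfolder\tsize\n"
    "file-ID1\t1234567_20253_2_0.zip\t/Bulk/Brain MRI/T2 FLAIR/10\t36181516\n"
    "file-ID2\texample.zip\t/Bulk/Brain MRI/T2 FLAIR/11\t1000\n"
)
summary = summarize(sample)
for prefix, (count, size) in sorted(summary.items()):
    print(f"{prefix}\t{count} files\t{size} bytes")
```

To run it on the real listing, pass an open file instead of the sample, for example `with open("files.txt", newline="") as f: summary = summarize(f)`.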