Error Reading Public Datasets

06 February 2023 00:00
8 comments

Hello,

I am trying to access the publicly available gnomAD dataset from AWS S3 (s3:// paths), but I am getting a "Timeout waiting for connection from pool" error. How should I proceed with this error?

Code:

```
mt = hl.read_matrix_table("s3://gnomad-public-us-east-1/release/3.1.2/mt/genomes/gnomad.genomes.v3.1.2.hgdp_1kg_subset_dense.mt")
var_qc_mt = hl.variant_qc(mt)

# Filter to variants with AF between 0.05 & 0.95, and call rate greater than 0.999
filtered_mt = var_qc_mt.filter_rows(((var_qc_mt.variant_qc.AF[0] > 0.05) & (var_qc_mt.variant_qc.AF[1] > 0.05)) &
                                    ((var_qc_mt.variant_qc.AF[0] < 0.95) & (var_qc_mt.variant_qc.AF[1] < 0.95)) &
                                    (var_qc_mt.variant_qc.call_rate > 0.999))

prunned_mt = hl.ld_prune(mt_hgdp_tgp_clean.GT, r2=0.1, bp_window_size=500000)
```

Error:

```
FatalError: ConnectionPoolTimeoutException: Timeout waiting for connection from pool

Java stack trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost task 55.3 in stage 1.0 (TID 522, ip-10-60-49-94.eu-west-2.compute.internal, executor 0): com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
```

Please do let me know how to fix this issue.

Regards

Akhil

Comments

Alexandra Lee DNAnexus Team
- 06 February 2023 18:27
I'm not able to reproduce the timeout error based on the code provided. Do you know which command call within this code block is throwing the error?

Here is a previous post about troubleshooting Hail that you might find helpful: https://community.dnanexus.com/s/question/0D5t000004AflSiCAJ/hail-troubleshooting-for-ukb-data

Former User of DNAx Community_6
- 06 February 2023 20:33
This line is throwing the error, although it was running fine on Friday:

prunned_mt = hl.ld_prune(mt_hgdp_tgp_clean.GT, r2=0.1, bp_window_size=500000)

Chai Fungtammasan DNAnexus Team
- 07 February 2023 20:04
@Akhil Could you reach out to ukbiobank-support@dnanexus.com with your error? We may need our team to look into it in detail.

Former User of DNAx Community_6
- 07 February 2023 20:18
Sure, I will reach out to ukbiobank support. Thank you so much.

Former User of DNAx Community_6
- 16 February 2023 17:09
Thank you for the help. I haven't received any response from the ukbiobank support team yet. I searched online and found this StackOverflow solution (https://stackoverflow.com/questions/56259853/why-aws-is-rejecting-my-connections-when-i-am-using-wholetextfiles-with-pyspar).

They suggested starting Spark with this builder:

builder = (
  SparkSession
  .builder
  .config("fs.s3a.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  .config("fs.s3a.awsAccessKeyId", aws_access_key)
  .config("fs.s3a.awsSecretAccessKey", aws_secret_key)
  .config("fs.s3a.fast.upload", "true")
  .config("fs.s3a.multipart.size", "1G")
  .config("fs.s3a.fast.upload.buffer", "disk")
  .config("fs.s3a.connection.maximum", 200)
  .config("fs.s3a.attempts.maximum", 20)
  .config("fs.s3a.connection.timeout", 30)
  .config("fs.s3a.threads.max", 10)
  .config("fs.s3a.buffer.dir", "hdfs:///user/hadoop/temporary/s3a")
)

I tried running this but got an error saying aws_access_key is not defined. Since DNAnexus is AWS-based, is there a way to get an AWS access key ID and secret access key in DNAnexus?

Regards
Akhil
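
For reference, the pool-related settings from the builder above can be expressed without any access keys, since the gnomAD bucket is public. This is only a minimal sketch under assumptions, not a confirmed fix: the helper name `s3_pool_conf` is hypothetical, and it assumes the cluster resolves s3:// paths through either the Hadoop S3A connector (`fs.s3a.connection.maximum`) or EMRFS (`fs.s3.maxConnections`). Which connector DNAnexus actually uses is not stated in this thread.

```python
from typing import Dict


def s3_pool_conf(max_connections: int = 200, anonymous: bool = True) -> Dict[str, str]:
    """Build Spark settings that enlarge the S3 HTTP connection pool.

    Assumption: s3:// paths are served by the Hadoop S3A connector or by
    EMRFS, so the pool-size key is set for both of them.
    """
    if max_connections < 1:
        raise ValueError("max_connections must be a positive integer")

    conf = {
        # Hadoop S3A connector: maximum number of simultaneous HTTP connections.
        "spark.hadoop.fs.s3a.connection.maximum": str(max_connections),
        # EMRFS counterpart of the same limit.
        "spark.hadoop.fs.s3.maxConnections": str(max_connections),
    }
    if anonymous:
        # Public buckets can be read without credentials through S3A.
        conf["spark.hadoop.fs.s3a.aws.credentials.provider"] = (
            "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
        )
    return conf


if __name__ == "__main__":
    # Print the settings so they can be copied into a Spark configuration.
    for key, value in sorted(s3_pool_conf().items()):
        print(f"{key}={value}")
```

The resulting dictionary could be passed as `hl.init(spark_conf=s3_pool_conf())`, provided Hail is initialized before any Spark context exists. If the environment has already started a context, the same keys would have to go into the cluster's Spark configuration instead.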

Chai Fungtammasan DNAnexus Team
- 16 February 2023 18:00
We cannot share the secret key, and I'm afraid it's best to let the support team handle this, since you can grant them access to the project.

Former User of DNAx Community_6
- 16 February 2023 19:07
Thank you so much for the information. I reached out to the help desk last week but have not had a response so far.

Chai Fungtammasan DNAnexus Team
- 16 February 2023 20:29
The response can be slow from time to time, depending on ticket volume. We will discuss this with them and see how we can resolve the issue. It would be hard for the public community to help, since most of us are focusing on UKB data.
