Anyone else seeing Py4JJavaError today when using toPandas()?

I have never had any issue using toPandas() before and have completed multiple analyses with it in the past. I've changed the set of field names to be extracted several times in an attempt to diagnose the issue, but I keep seeing the error. Any suggestions on how to fix this? The error is below.

Comments

  • Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-10-54eb4a4bd748> in <module>
          1 #convert spark dataframe into pandas dataframe
    ----> 2 pandas_df = df_1.toPandas()
          3 len(pandas_df)

    /cluster/spark/python/pyspark/sql/dataframe.py in toPandas(self)
       2141
       2142 # Below is toPandas without Arrow optimization.
    -> 2143 pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
       2144
       2145 dtype = {}

    /cluster/spark/python/pyspark/sql/dataframe.py in collect(self)
        532 """
        533 with SCCallSiteSync(self._sc) as css:
    --> 534 sock_info = self._jdf.collectToPython()
        535 return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
        536

    /cluster/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1255 answer = self.gateway_client.send_command(command)
       1256 return_value = get_return_value(
    -> 1257 answer, self.gateway_client, self.target_id, self.name)
       1258
       1259 for temp_arg in temp_args:

    /cluster/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
         61 def deco(*a, **kw):
         62 try:
    ---> 63 return f(*a, **kw)
         64 except py4j.protocol.Py4JJavaError as e:
         65 s = e.java_exception.toString()

    /cluster/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        326 raise Py4JJavaError(
        327 "An error occurred while calling {0}{1}{2}.\n".
    --> 328 format(target_id, ".", name), value)
        329 else:
        330 raise Py4JError(

    Py4JJavaError: An error occurred while calling o897.collectToPython.
    : org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226) ... (too long to post)
  • I've also tried changing the number of Spark cluster nodes and the instance types available; nothing changes.
  • I switched from pandas to Koalas and found a workaround. Not sure what changed. Thanks either way.

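For anyone hitting the same wall with a trace that is too long to post: the first frame (ThreadUtils.awaitResult) usually only says that something failed, and the more specific reason tends to sit in the "Caused by:" lines further down. Below is a minimal sketch for pulling out just those lines, assuming the full error text is available as a string (for example from str(e) on the caught Py4JJavaError, which is an assumption about how you capture it). The names root_causes and sample are hypothetical, and the "Caused by" line inside sample is an invented placeholder, not part of the actual error in this thread.

```python
from typing import List


def root_causes(java_trace: str) -> List[str]:
    """Return the top-level Java exception line plus every 'Caused by:' line."""
    causes = []
    for line in java_trace.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            # Nested cause reported by the JVM
            causes.append(stripped[len("Caused by:"):].strip())
        elif stripped.startswith(": "):
            # Py4J prints the first Java exception on a line starting with ": "
            causes.append(stripped[2:].strip())
    return causes


# The first three lines are copied from this thread. The "Caused by" line and
# the frame under it are made up purely to show the output shape.
sample = """An error occurred while calling o897.collectToPython.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
\tat org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
Caused by: java.lang.IllegalStateException: placeholder cause for illustration
\tat example.Placeholder.run(Placeholder.java:1)
"""

for cause in root_causes(sample):
    print(cause)
```

Posting only the lines this returns is usually short enough for a comment and gives others something concrete to search for.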
