Product: TIBCO Spotfire®
Spark operators fail
Problem
When running one of the currently supported Spark operators (Summary Statistics, Linear Regression, Logistic Regression, K-Means, Alpine Forest), the job starts on the Spotfire Data Science side and the Hadoop job is submitted to the linked cluster (see the attached screenshot).
The Spotfire Data Science user can see some progress on the submitted job, and then the job fails on the Spotfire Data Science side, even though the submitted Hadoop job finishes successfully. The AlpineAgent log file shows the following:
...
2016-01-25 14:44:14.774 GMT-0800 ERROR [pool-11-thread-2] com.alpine.datamining.workflow.AnalyticNodeThread.run(AnalyticNodeThread.java:88) - Summary Statistics-4 failed, please check yarn cluster log and Alpine agent log for details
java.lang.RuntimeException: Summary Statistics-4 failed, please check yarn cluster log and Alpine agent log for details
at com.alpine.spark.client.CoreSparkSubmitter$.complete(CoreSparkSubmitter.scala:127)
at com.alpine.spark.explore.SparkSummaryStatisticsTrainRunner.run(SparkSummaryStatistics.scala:858)
at com.alpine.spark.explore.SparkSummaryStatisticsTrainRunner.run(SparkSummaryStatistics.scala:789)
at com.alpine.datamining.api.impl.hadoop.util.SparkUtil$$anonfun$runSparkJob$1.apply(SparkUtil.scala:91)
at com.alpine.utility.hadoop.SecurityUtil$$anonfun$8$$anon$1.run(SecurityUtil.scala:496)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at com.alpine.utility.hadoop.SecurityUtil$$anonfun$8.apply(SecurityUtil.scala:499)
...
Cause
Spotfire Data Science uses Akka for the underlying communication between the Spotfire Data Science agent and the cluster. When Spotfire Data Science submits a Spark job, it also passes along its own IP address and port so that the cluster can report back when the job finishes (or fails). Unlike regular MapReduce jobs, submitted Spark jobs are not polled for status. As a result, if the provided IP address or port number is incorrect, Spotfire Data Science never receives an update: the Spark job finishes successfully on the cluster, but Spotfire Data Science times out and considers the job failed.
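The following Scala sketch illustrates this callback pattern; it is not Alpine's actual code, and the actor system name, actor name, message type, hostname, and port are all hypothetical placeholders. It only shows how an Akka listener advertises a hostname/port that remote cluster nodes must be able to reach (these correspond to the alpine.conf parameters in the Resolution below):

    import akka.actor.{Actor, ActorSystem, Props}
    import com.typesafe.config.ConfigFactory

    // Hypothetical completion message sent back by the cluster-side job.
    case class JobFinished(jobName: String, succeeded: Boolean)

    class CompletionListener extends Actor {
      def receive = {
        case JobFinished(name, ok) => println(s"$name finished, success = $ok")
      }
    }

    object CallbackSketch extends App {
      // The advertised hostname/port must be reachable from every cluster node.
      val config = ConfigFactory.parseString("""
        akka.actor.provider = "akka.remote.RemoteActorRefProvider"
        akka.remote.netty.tcp.hostname = "10.0.0.15"   // placeholder IP
        akka.remote.netty.tcp.port = 3549
      """)
      val system = ActorSystem("spark-callback", config)
      system.actorOf(Props[CompletionListener], "completion-listener")
      // The remote job would resolve
      // akka.tcp://spark-callback@10.0.0.15:3549/user/completion-listener
      // and send JobFinished on completion. If the advertised address is wrong,
      // that message never arrives and the submitter times out.
    }

Nothing fails immediately on the submitter side when the advertised address is wrong; the failure only surfaces as the timeout described above.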
Resolution
Make sure that the following parameters are set correctly:
1. Configure the correct IP address and port number of the Spotfire Data Science server (ones that all cluster nodes can reach) at the bottom of the $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/configuration/alpine.conf file by setting the following parameters (replace IP_address_of_Alpine with the actual IP address of the Spotfire Data Science server, and 3549 with whatever port number is open):
alpine.spark.sparkAkka.akka.remote.netty.tcp.hostname=IP_address_of_Alpine
alpine.spark.sparkAkka.akka.remote.netty.tcp.port=3549
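For example, if the Spotfire Data Science server is reachable from all cluster nodes at the hypothetical address 10.0.0.15 and port 3549 is open, the two lines would read:
alpine.spark.sparkAkka.akka.remote.netty.tcp.hostname=10.0.0.15
alpine.spark.sparkAkka.akka.remote.netty.tcp.port=3549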
2. Based on the Spotfire Data Science Default Ports page, configure the alpine.agent.baseAkkaPort parameter with a base port number that all cluster nodes can reach when accessing Spotfire Data Science. Not only the configured base port, but also the next consecutive port numbers, must be open to all cluster nodes. For instance, if the value of alpine.agent.baseAkkaPort is set to 2570, then make sure that all ports from 2570 to 2580 inclusive are available as well (see the reachability sketch below).
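As a rough illustration, the following Scala sketch checks from a cluster node that each port in that range can be reached on the Spotfire Data Science server; the host and port range are hypothetical placeholders, and a connection only succeeds while something is actually listening on the port, so treat a failure as a prompt to check firewall rules rather than as proof of a problem:

    import java.net.{InetSocketAddress, Socket}

    object PortCheck extends App {
      val host = "10.0.0.15"  // IP_address_of_Alpine (placeholder)
      // Check the base Akka port and the consecutive ports after it.
      for (port <- 2570 to 2580) {
        val socket = new Socket()
        try {
          socket.connect(new InetSocketAddress(host, port), 2000) // 2-second timeout
          println(s"port $port reachable")
        } catch {
          case _: Exception => println(s"port $port NOT reachable")
        } finally socket.close()
      }
    }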
After making the changes in steps 1 and 2, restart Spotfire Data Science and try running the Spark operators again.