Product: TIBCO Spotfire®
Problem:
TIBCO Spotfire Server in a cluster crashes due to long garbage collection (GC) pauses
Solution:
In a cluster, the TIBCO Spotfire Server may go offline due to long garbage collection (GC) pauses. In the catalina.log you may see a warning like:
WARNING [jvm-pause-detector-worker] org.apache.ignite.logger.java.JavaLogger.warning Possible too long JVM pause: 816 milliseconds.
In the server.log you would see the following errors:
WARN 2019-03-22T18:08:56,678-0400 [] discovery.tcp.TcpDiscoverySpi: Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx, rmtPort=xxxx]
WARN 2019-03-22T18:08:56,691-0400 [] discovery.tcp.TcpDiscoverySpi: Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[xx.xx.xx.xx], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=5702, order=6, intOrder=4, lastExchangeTime=1552305360552, loc=false, ver=2.5.0#20180523-sha1:86e110c7, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=160cc2f6961-299208f8-08f8-44d8-a99d-a4a3a5df7537, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, order=6, addr=[10.209.129.158], daemon=false]]]
WARN 2019-03-22T18:08:56,693-0400 [] discovery.tcp.TcpDiscoverySpi: Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
...
WARN 2019-03-22T18:08:56,977-0400 [] discovery.tcp.TcpDiscoverySpi: Node is out of topology (probably, due to short-time network problems).
INFO 2019-03-22T18:08:56,982-0400 [] discovery.tcp.TcpDiscoverySpi: Finished serving remote node connection [rmtAddr=/xx.xx.xx.xx:xxxx, rmtPort=xxxx
WARN 2019-03-22T18:08:56,994-0400 [] managers.discovery.GridDiscoveryManager: Local node SEGMENTED: TcpDiscoveryNode [id=b80724c4-1a62-4149-b616-56284fe4a6f8, addrs=[10.209.129.158], sockAddrs=[TIBCO.Spotfire.Server/xx.xx.xx.xx:xxxx], discPort=xxxx, order=6, intOrder=4, lastExchangeTime=1553292536994, loc=true, ver=2.5.0#20180523-sha1:86e110c7, isClient=false]
ERROR 2019-03-22T18:08:57,014-0400 [] : Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686) ~[ignite-core.jar:2.5.0]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) ~[ignite-core.jar:2.5.0]
ERROR 2019-03-22T18:08:57,014-0400 [] : JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%TIBCO-Spotfire% is terminated unexpectedly.]]
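To check whether a server shows the same symptoms, the two log files can be scanned for the messages quoted above. The following is a minimal Python sketch, not a TIBCO-provided tool. The function names jvm_pauses_ms and segmentation_lines are hypothetical, and the marker strings are simply copied from the excerpts above, so other Ignite versions may word these messages differently.

```python
import re

# Matches the Ignite warning quoted from catalina.log, for example:
# "Possible too long JVM pause: 816 milliseconds."
PAUSE_RE = re.compile(r"Possible too long JVM pause: (\d+) milliseconds")

# Substrings copied from the server.log excerpt above (assumed to be stable).
SEGMENTATION_MARKERS = (
    "Node is out of topology",
    "Local node SEGMENTED",
    "JVM will be halted immediately",
)


def jvm_pauses_ms(lines):
    """Return the pause length in milliseconds for every JVM pause warning."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(int(match.group(1)))
    return pauses


def segmentation_lines(lines):
    """Return the lines that contain one of the segmentation markers."""
    return [
        line
        for line in lines
        if any(marker in line for marker in SEGMENTATION_MARKERS)
    ]
```

Both functions take any iterable of strings, so they can be fed the result of reading catalina.log or server.log line by line. Pauses of several seconds that appear shortly before a segmentation line would point to the situation described in this article.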
Apache Ignite is sensitive to long GC pauses (that is, pauses of a few seconds). Long GC pauses, high CPU utilization, high memory utilization, or network communication issues can cause cluster segmentation and cluster disconnects, which in turn cause the TIBCO Spotfire Server to shut down.
To help avoid this, the following two changes are recommended:
To avoid the GC overhead of resizing the heap, set the minimum and maximum heap size to the same value in the CATALINA_OPTS settings. Instructions:
For TIBCO Spotfire Server 10.5 and lower versions running as a Windows service:
- Stop the TIBCO Spotfire Server service.
- On the command line, go to the <TIBCO Spotfire Server installation directory>/tomcat/bin directory.
- Enter the following command:
service.bat remove
- Open the <TIBCO Spotfire Server installation directory>/tomcat/bin/service.bat file in a text editor.
- Locate the '--JvmMs' and '--JvmMx' entries and change the '--JvmMs' value to match the current '--JvmMx' value. For example, if --JvmMx is 4096, change --JvmMs to 4096:
if "%JvmMs%" == "" set JvmMs=4096
if "%JvmMx%" == "" set JvmMx=4096
- Save and close the file.
- Enter the following command at the command prompt:
service.bat install
- Start the TIBCO Spotfire Server service.
For TIBCO Spotfire Server 10.6 and higher versions running as a Windows service:
- Stop the Spotfire Server service.
- On the command line, go to the <installation dir>/tomcat/bin directory.
- Enter the following command:
service.bat remove
- Open the <installation dir>/tomcat/bin/setenv.bat file in a text editor.
- Locate the following entries and change the numbers to suitable memory values (in MB), setting JvmMs to the same value as JvmMx:
JvmMs=512
JvmMx=4096
- Save and close the file.
- Enter the following command:
service.bat install
- Start the Spotfire Server service.
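Before reinstalling the service, it can help to confirm that the edited file really carries identical JvmMs and JvmMx values. The following is a minimal Python sketch under the assumption that the values appear as JvmMs=<number> and JvmMx=<number> (in MB), as in the examples above. The function names read_jvm_heap_mb and heap_is_fixed are hypothetical.

```python
import re


def read_jvm_heap_mb(text):
    """Return (JvmMs, JvmMx) in MB from service.bat or setenv.bat text.

    A value is None when no uncommented assignment is found. When a name is
    assigned more than once, only the first assignment is reported.
    """
    values = {}
    for name in ("JvmMs", "JvmMx"):
        # Skip lines commented out with "rem"; accept both
        # 'set JvmMs=4096' and 'if "%JvmMs%" == "" set JvmMs=4096'.
        pattern = rf"^(?!\s*rem\b).*\b{name}=(\d+)"
        match = re.search(pattern, text, re.IGNORECASE | re.MULTILINE)
        values[name] = int(match.group(1)) if match else None
    return values["JvmMs"], values["JvmMx"]


def heap_is_fixed(text):
    """Return True when JvmMs is set and equals JvmMx."""
    jvm_ms, jvm_mx = read_jvm_heap_mb(text)
    return jvm_ms is not None and jvm_ms == jvm_mx
```

For example, passing the content of setenv.bat to heap_is_fixed returns False for the default JvmMs=512 and JvmMx=4096 pair, and True once both are set to 4096.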
For TIBCO Spotfire Server not running as a Windows service:
- Go to the <TIBCO Spotfire Server installation Directory>\tomcat\bin folder.
- Open the setenv.sh file if TIBCO Spotfire Server is installed on a Linux machine, or the setenv.bat file if it is installed on a Windows machine.
- Change the heap size values:
- For 10.2 and lower versions: At the end of the CATALINA_OPTS attribute, add -Xms and -Xmx and set them both to the same value, matching the current JAVA_OPTS -Xmx value. For example, if JAVA_OPTS contains -Xmx4096M, add -Xms4096M -Xmx4096M to CATALINA_OPTS:
set JAVA_HOME=C:\tibco\tss\7.11.0\jdk
set JRE_HOME=C:\tibco\tss\7.11.0\jdk\jre
set JAVA_OPTS=-server -XX:+DisableExplicitGC -Xms4096M -Xmx4096M
set CATALINA_OPTS=-Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Xms4096M -Xmx4096M
- For 10.3, 10.4 and 10.5 versions: In the CATALINA_OPTS attribute, change the -Xms value to match the current -Xmx value. For example, if -Xmx is 4096M, change -Xms to 4096M:
set JAVA_HOME=C:\tibco\tss\10.5.0\jdk
set JRE_HOME=C:\tibco\tss\10.5.0\jdk\jre
rem Uncomment the line below to enable GC logging
set GC_LOG=-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=25M -Xloggc:%CATALINA_HOME%\logs\gc-%%t.log
set JAVA_OPTS=-server -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC %GC_LOG%
set CATALINA_OPTS=-Xms4096M -Xmx4096M -Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Djava.library.path="%PATH%;C:\tibco\tss\10.5.0\tomcat\spotfire-lib;C:\tibco\tss\10.5.0\tomcat\custom-ext"
- For 10.6 and higher versions: Change the JvmMs value to match the current JvmMx value. For example, if JvmMx is 4096, change JvmMs to 4096:
set JAVA_HOME=C:\tibco\tss\10.7.0\jdk
set JRE_HOME=C:\tibco\tss\10.7.0\jdk\jre
set JvmMs=4096
set JvmMx=4096
rem Uncomment the line below to enable GC logging
rem set GC_LOG=-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=25M -Xloggc:%CATALINA_HOME%\logs\gc-%%t.log
set JAVA_OPTS=-server -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC %GC_LOG%
set CATALINA_OPTS=-Xms%JvmMs%M -Xmx%JvmMx%M -Dcom.sun.management.jmxremote -Dorg.apache.catalina.session.StandardSession.ACTIVITY_CHECK=true -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Djava.library.path="%PATH%;C:\tibco\tss\10.7.0\tomcat\spotfire-lib;C:\tibco\tss\10.7.0\tomcat\custom-ext"
- Restart the TIBCO Spotfire Server service.
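For the versions where -Xms and -Xmx are written literally in CATALINA_OPTS (10.5 and lower), the edited value can be checked with a short script. The following is a minimal Python sketch, not part of the product. The function names heap_flags_bytes and same_min_max are hypothetical, and the sketch assumes that the last occurrence of a repeated flag is the one the JVM uses, which is the usual HotSpot behavior.

```python
import re

# Multipliers for the size suffixes accepted by -Xms and -Xmx.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def heap_flags_bytes(catalina_opts):
    """Return the -Xms and -Xmx sizes found in an option string, in bytes.

    Only literal values such as -Xms4096M are recognized. Variable-based
    values such as -Xms%JvmMs%M are ignored. When a flag is repeated, the
    last occurrence is kept (assumed to match JVM behavior).
    """
    found = {}
    pattern = r"-(Xm[sx])(\d+)([kKmMgG]?)\b"
    for flag, number, unit in re.findall(pattern, catalina_opts):
        found[flag] = int(number) * _UNITS[unit.upper()]
    return found


def same_min_max(catalina_opts):
    """Return True when both flags are present and describe the same size."""
    flags = heap_flags_bytes(catalina_opts)
    return "Xms" in flags and "Xmx" in flags and flags["Xms"] == flags["Xmx"]
```

Because sizes are compared in bytes, -Xms4G and -Xmx4096M count as equal. For 10.6 and higher, where CATALINA_OPTS refers to %JvmMs% and %JvmMx%, this check finds no literal flags, so the JvmMs and JvmMx assignments need to be compared instead.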
To resolve cluster segmentation issues caused by short-term communication issues, increase the value of the "clustering.apacheignite.timeouts.failure-detection-timeout" server configuration property to 60000. Instructions:
For TIBCO Spotfire Server 10.3.0 and higher:
- Open a command prompt and go to the <TIBCO Spotfire Server installation Directory>\tomcat\spotfire-bin directory.
- Export the TIBCO Spotfire Server configuration using the command "config export-config":
<TIBCO Spotfire Server installation Directory>\tomcat\spotfire-bin> config export-config
- Increase the failure detection timeout using the command "config set-config-prop":
<TIBCO Spotfire Server installation Directory>\tomcat\spotfire-bin> config set-config-prop --name="clustering.apacheignite.timeouts.failure-detection-timeout" --value=60000
- Import the configuration using the command "config import-config":
<TIBCO Spotfire Server installation Directory>\tomcat\spotfire-bin> config import-config -c "increased cluster failure detection timeout"
- Restart the TIBCO Spotfire Server service.
For TIBCO Spotfire Server versions 7.11.0, 7.12.0, 7.13.0 and 7.14.0:
- Open a command prompt and go to the <TIBCO Spotfire Server installation Directory>\tomcat\bin directory.
- Export the TIBCO Spotfire Server configuration using the command "config export-config":
<TIBCO Spotfire Server installation Directory>\tomcat\bin> config export-config
- Increase the failure detection timeout using the command "config set-config-prop":
<TIBCO Spotfire Server installation Directory>\tomcat\bin> config set-config-prop --name="clustering.apacheignite.timeouts.failure-detection-timeout" --value=60000
- Import the configuration using the command "config import-config":
<TIBCO Spotfire Server installation Directory>\tomcat\bin> config import-config -c "increased cluster failure detection timeout"
- Restart the TIBCO Spotfire Server service.
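The two procedures above run the same three commands and differ only in the directory they are run from. When the change has to be repeated on several environments, the command sequence can be generated rather than retyped. The following is a minimal Python sketch that only builds the argument lists shown in the steps above. The function name failure_detection_commands is hypothetical, and the timeout value is assumed to be in milliseconds. How the commands are launched (for example from tomcat\spotfire-bin or tomcat\bin) is left to the reader.

```python
PROPERTY = "clustering.apacheignite.timeouts.failure-detection-timeout"


def failure_detection_commands(
    timeout=60000,
    comment="increased cluster failure detection timeout",
):
    """Return the export, set and import commands as argument lists.

    timeout is passed through unchanged as the --value argument
    (assumed to be milliseconds).
    """
    if timeout <= 0:
        raise ValueError("timeout must be a positive number")
    return [
        ["config", "export-config"],
        ["config", "set-config-prop", f"--name={PROPERTY}", f"--value={timeout}"],
        ["config", "import-config", "-c", comment],
    ]
```

Printing each list joined by spaces gives a checklist that matches the manual steps. The server still has to be restarted afterwards for the new timeout to take effect, as described in the last step of each procedure.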