How to disable WildFly clustering at runtime

This tutorial shows how to perform maintenance on one or more nodes of a WildFly cluster. It covers approaches for both Standalone mode and Domain mode that do not interrupt the services provided by the remaining nodes.

WildFly cluster maintenance

I recently came across an interesting question on Stack Overflow asking how to temporarily remove a server node from the cluster so that maintenance can be done on it.

The simplest option is to shut down the server node and then use the WildFly CLI in embedded mode to perform management operations while the node is down. Then you restart the server node. This article discusses the approach in more detail: How to configure WildFly from the CLI in embedded mode
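As a minimal sketch of the embedded-mode approach, the session below assumes the node is already stopped and that it uses the standalone-ha.xml profile. The logging change is only an illustrative placeholder for whatever maintenance operation you need to run; neither the file name nor the operation comes from the original question.

```
# Start the CLI without connecting to a running server, then boot an
# embedded server against the (assumed) standalone-ha.xml configuration
embed-server --server-config=standalone-ha.xml

# Run management operations while the node is offline
# (illustrative example: raise the root logger level)
/subsystem=logging/root-logger=ROOT:write-attribute(name=level,value=DEBUG)

# Stop the embedded server and leave the CLI, then start the node normally
stop-embedded-server
quit
```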

On the other hand, if you don’t want to stop and restart the server node, you need to operate at the network level to remove it from the cluster. You can do that by setting a different multicast address for the transport you are using (UDP by default), so that your server node temporarily leaves the cluster.

First, check with your system administrator for an available alternative multicast address. The following reference will help: http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml

Supposing you have decided to use the 230.0.0.11 address (instead of the default 230.0.0.4), you can issue the following CLI command on your server node:

/socket-binding-group=standard-sockets/socket-binding=jgroups-udp/:write-attribute(name=multicast-address,value=230.0.0.11)

Now reload your server configuration and check the new cluster view in your logs:

22:32:16,032 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 55) ISPN000094: Received new cluster view: [nodeB/web, nodeA/web]
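Put together, the steps above can be sketched as one CLI session on the node that should leave the cluster. The 230.0.0.11 address is the example value used in this tutorial; the server-state check and the final read-attribute are optional additions for illustration.

```
# Point the JGroups UDP socket binding at the alternative multicast address
/socket-binding-group=standard-sockets/socket-binding=jgroups-udp:write-attribute(name=multicast-address,value=230.0.0.11)

# Optional: the server should now report that a reload is pending
:read-attribute(name=server-state)

# Reload so the new address takes effect
reload

# Confirm the address that is now configured on the binding
/socket-binding-group=standard-sockets/socket-binding=jgroups-udp:read-attribute(name=multicast-address)
```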

In our example, nodeC has just left the cluster. Later, you can bring it back into the cluster by restoring the original address and reloading again:

/socket-binding-group=standard-sockets/socket-binding=jgroups-udp/:write-attribute(name=multicast-address,value=230.0.0.4)

Side note: for about a minute you might find the following warning in your server logs:

22:27:16,094 WARN  [org.jgroups.protocols.TP$ProtocolAdapter] (INT-1,shared=udp)  JGRP000031: nodeA/web: dropping unicast message to wrong destination nodeC/web

This happens because the UNICAST3 protocol still tries to send messages to the cluster node that left. The warning disappears after about 1 minute, but you can tune the timeout by setting the conn_close_timeout property (expressed in milliseconds) in your UNICAST3 configuration:

<subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
   <stack name="udp">
          . . . .
          <protocol type="UNICAST3">
                    <property name="conn_close_timeout">5</property>
          </protocol>
          . . . .
   </stack>
</subsystem>
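Before editing the XML, it can help to look at what the running server currently has configured for the protocol. The read-only query below is a sketch that assumes the stack is named udp, as in the snippet above; the exact layout of the result depends on your WildFly version.

```
# Inspect the UNICAST3 protocol of the udp stack, including any properties set on it
/subsystem=jgroups/stack=udp/protocol=UNICAST3:read-resource(recursive=true)
```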

Domain configuration

In Domain mode you can achieve the same effect by managing the socket-binding-group of a server group or of individual servers.

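First, create a new socket binding group whose jgroups-udp binding uses a custom multicast address (or adjust jgroups-tcp if you are using a TCP cluster). Note that in the example the address is still written as an expression, so the 230.0.0.11 default only takes effect when the jboss.default.multicast.address system property is not set: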

<socket-binding-group name="ha-sockets.maintenance" default-interface="public">
    <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>
    <socket-binding name="http" port="${jboss.http.port:8080}"/>
    <socket-binding name="https" port="${jboss.https.port:8443}"/>
    <socket-binding name="jgroups-mping" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" port="57600"/>
    <socket-binding name="jgroups-udp" port="55200" multicast-address="${jboss.default.multicast.address:230.0.0.11}" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" port="54200"/>
    <socket-binding name="modcluster" port="0" multicast-address="224.0.1.105" multicast-port="23364"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>

Now, when you need to temporarily remove a server group from the cluster, just issue the following command from your CLI (you can also use the Admin Console):

/server-group=other-server-group/:write-attribute(name=socket-binding-group,value=ha-sockets.maintenance)

You will also need a host reload after the above operation:

reload --host=master

Now the servers in other-server-group will leave the cluster, so you can perform maintenance. Later on, you can let the server group rejoin the cluster by restoring the standard ha-sockets bindings and reloading the host again:

/server-group=other-server-group/:write-attribute(name=socket-binding-group,value=ha-sockets)
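The whole leave-and-rejoin cycle for a server group can be sketched as follows. The read-attribute check is an addition for illustration, and the group and host names are the ones used in this tutorial, so they will likely differ in your domain.

```
# Leave: switch the server group to the maintenance socket bindings
/server-group=other-server-group:write-attribute(name=socket-binding-group,value=ha-sockets.maintenance)
reload --host=master

# Optional check: which socket binding group is the server group using now?
/server-group=other-server-group:read-attribute(name=socket-binding-group)

# ... perform maintenance ...

# Rejoin: restore the standard bindings and reload the host again
/server-group=other-server-group:write-attribute(name=socket-binding-group,value=ha-sockets)
reload --host=master
```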

As a side note, you can also set the socket bindings at the server level with:

/host=master/server-config=server-one/:write-attribute(name=socket-binding-group,value=ha-sockets.maintenance)

I don’t advise it, though, as you will end up with an unsynchronized configuration between the servers that belong to the same server group.
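If you do use a server-level override, a sketch for checking it and removing it afterwards is shown below. It assumes the host master and the server server-one from the command above; undefine-attribute clears the override so that the server falls back to the socket binding group of its server group.

```
# Check whether server-one overrides the socket binding group
/host=master/server-config=server-one:read-attribute(name=socket-binding-group)

# Remove the override so the server group setting applies again
/host=master/server-config=server-one:undefine-attribute(name=socket-binding-group)
```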

Here is the original thread on Stack Overflow. Feel free to add a “thumbs up” if you have found this solution useful 🙂