JMS Clustering in WildFly and JBoss EAP

In this tutorial we will learn how to achieve High Availability in JMS clustering, comparing the configuration available with HornetQ messaging (JBoss AS 7, WildFly 8, JBoss EAP 6) and ActiveMQ Artemis (WildFly 10 or newer and JBoss EAP 7).

In order to achieve high availability, you have the choice between using a shared store and data replication:

  • When using a shared store, both live and backup servers share the same data directory on a shared file system: the paging directory, the journal directory, the large messages directory and the bindings journal. When failover occurs and the backup server takes over, it loads the persistent storage from the shared file system and clients can connect to it.

  • When using data replication, the live and the backup servers do not share the same storage and all data synchronization is done over the network. Therefore, all (persistent) data received by the live server is duplicated to the backup.

ArtemisMQ clustering (WildFly)

Let’s see some practical examples of JMS clustering using the WildFly messaging broker (ActiveMQ Artemis). First we will demonstrate how to create a JMS cluster using data replication, then we will show how to build a JMS cluster that uses a shared store as its ha-policy.

Example 1: JMS with Data replication

In order to run this example, we will use two standalone server installations, unpacked into two different folders:

$ mkdir nodeA
$ mkdir nodeB

$ unzip wildfly-18.0.0.Final.zip -d nodeA
$ unzip wildfly-18.0.0.Final.zip -d nodeB

Now start the first server using the full-ha profile:

$ ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=server1  

Connect to the server using the CLI:

$ ./jboss-cli.sh -c

The first change you should apply to your configuration is the cluster password, in order to prevent unwanted remote clients from connecting to the server:

/subsystem=messaging-activemq/server=default/:write-attribute(name=cluster-password,value=secretpassword)

Then, in order to connect to the cluster, you need to set the password in the System Property jboss.messaging.cluster.password accordingly:

/system-property=jboss.messaging.cluster.password:add(value=secretpassword)

Next, we need to configure the ha-policy, which determines the High Availability policy used by your cluster. It can be one of the following:

  • live-only: live only server with no HA capabilities apart from scale down.
  • replication: configuration for a replicated server, either master, slave or co-located.
  • shared-store: configuration for a shared store server, either master, slave or co-located.

Co-located backup servers are messaging servers which run in the same JVM as another live server. The colocated backup server inherits its configuration from the live server it originated from, except for its name, which is set to colocated_backup_n (where n is the number of backups the server has created). When configuring colocated backup servers it is important to keep two things in mind:

  1. First, each server element in the configuration will need its own remote-connector and remote-acceptor that listen on unique ports. For example, a live server can be configured to listen on port 5445, while its backup uses 5446. The ports are defined in socket-binding elements that must be added to the default socket-binding-group. Cluster-related configuration elements in each server configuration will use the new remote-connector; a minimal sketch of this wiring follows this list.
  2. Secondly, you need to properly configure the paths for journal-related directories. For example, in a shared store colocated topology, both the live server and its backup, colocated on another live server, must be configured to share directory locations for the bindings and message journals, for large messages, and for paging.
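
Here is a minimal CLI sketch of the port wiring only; the server name backup and the connector, acceptor and socket-binding names are hypothetical, and the ports 5445/5446 are simply the ones mentioned above:

# socket bindings on unique ports for the live server and its colocated backup
/socket-binding-group=standard-sockets/socket-binding=messaging-live:add(port=5445)
/socket-binding-group=standard-sockets/socket-binding=messaging-backup:add(port=5446)
# remote acceptor/connector for the live (default) server
/subsystem=messaging-activemq/server=default/remote-acceptor=netty-live:add(socket-binding=messaging-live)
/subsystem=messaging-activemq/server=default/remote-connector=netty-live:add(socket-binding=messaging-live)
# a second Artemis server colocated in the same JVM, with its own acceptor/connector
/subsystem=messaging-activemq/server=backup:add
/subsystem=messaging-activemq/server=backup/remote-acceptor=netty-backup:add(socket-binding=messaging-backup)
/subsystem=messaging-activemq/server=backup/remote-connector=netty-backup:add(socket-binding=messaging-backup)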

Here is how to elect one messaging server as “replication-master” by setting the ha-policy accordingly:

/subsystem=messaging-activemq/server=default/ha-policy=replication-master:add

You need to reload your configuration for changes to take effect:

reload

Now let’s query the ha-policy to verify that it uses replication, with the current server set as master:

 /subsystem=messaging-activemq/server=default:read-children-resources(child-type=ha-policy)
{
    "outcome" => "success",
    "result" => {"replication-master" => {   
        "check-for-live-server" => true,
        "cluster-name" => undefined,
        "group-name" => undefined,
        "initial-replication-sync-timeout" => 30000L
    }}
}

When the attribute check-for-live-server is set to true, the server checks at startup whether a live server is already running with its own server ID. This option is only necessary for performing ‘fail-back’ on replicating servers.

The cluster-name attribute can be used to configure multiple cluster connections by setting a unique cluster name. This setting is used by replicating backups and by live servers that may attempt fail-back.

Finally, when group-name is set, remote backup servers will only pair with live servers that have a matching group-name.
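
For example, if you run more than one live/backup pair in the same cluster, you can pin a pair together by writing the group-name attribute on both ha-policies. A sketch (the value pair-A is just an illustration), run on the master now and on the backup once its replication-slave policy has been created:

/subsystem=messaging-activemq/server=default/ha-policy=replication-master:write-attribute(name=group-name,value=pair-A)
/subsystem=messaging-activemq/server=default/ha-policy=replication-slave:write-attribute(name=group-name,value=pair-A)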

Now let’s move to the second WildFly server. We will start it using a port offset to avoid a conflict with the live server:

$ ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=server2 -Djboss.socket.binding.port-offset=100 

Next, connect to the server using the CLI and the target management port (with an offset of 100, the management port becomes 10090):

$ ./jboss-cli.sh -c --controller=localhost:10090

In order to connect to the cluster, we also need to set the cluster-password, consistently with the first server:

/subsystem=messaging-activemq/server=default/:write-attribute(name=cluster-password,value=secretpassword)

Then, in order to connect to the cluster, you need to set the password in the System Property jboss.messaging.cluster.password accordingly:

/system-property=jboss.messaging.cluster.password:add(value=secretpassword)

Now, in order to configure this server as a backup using the replication-slave ha-policy, we will execute the following command:

[standalone@localhost:10090 /] /subsystem=messaging-activemq/server=default/ha-policy=replication-slave:add

You need to reload your configuration for changes to take effect:

reload

Finally, verify that the ha-policy has been updated accordingly:

 /subsystem=messaging-activemq/server=default:read-children-resources(child-type=ha-policy)
{
    "outcome" => "success",
    "result" => {"replication-slave" => {
        "allow-failback" => true,
        "cluster-name" => undefined,
        "group-name" => undefined,
        "initial-replication-sync-timeout" => 30000L,
        "max-saved-replicated-journal-size" => 2,
        "restart-backup" => true,
        "scale-down" => undefined,
        "scale-down-cluster-name" => undefined,
        "scale-down-connectors" => undefined,
        "scale-down-discovery-group" => undefined,
        "scale-down-group-name" => undefined
    }}
}
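
Any of these attributes can be tuned individually on the backup. For example, to keep the backup active even when the original live server comes back online (that is, to disable fail-back), you could set allow-failback to false:

/subsystem=messaging-activemq/server=default/ha-policy=replication-slave:write-attribute(name=allow-failback,value=false)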

To confirm that the backup server has started and is waiting for the live server to fail before it becomes active, look for the following messages in the console log:

10:11:08,494 INFO  [org.apache.activemq.artemis.core.server] (AMQ229000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.10.1 [null] started, waiting live to fail before it gets active
10:11:09,054 INFO  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-client-netty-threads)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=8d2ccc85-2e0c-11ea-86ee-dc8b283ae3be is synchronized with live-server.
10:11:12,020 INFO  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1859d2bc)) AMQ221031: backup announced

As a proof of concept, if you crash the live server of your cluster, the backup will then take over as the new live server:

10:15:38,831 INFO  [org.apache.activemq.artemis.core.server] (AMQ229000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221007: Server is now live
10:15:38,861 INFO  [org.jboss.as.connector.deployment] (MSC service thread 1-5) WFLYJCA0007: Registered connection factory java:/JmsXA
10:15:38,888 INFO  [org.apache.activemq.artemis.ra] (MSC service thread 1-5) AMQ151007: Resource adaptor started

Example 2: JMS with Shared Store

In the second example, we will be configuring a JMS Cluster using a shared store. Start from a clean WildFly installation of two standalone servers as in the first example.

Start the first server using the full-ha profile:

$ ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=server1  

Then, connect to the server using the CLI:

$ ./jboss-cli.sh -c

The first change you should apply to your configuration is the cluster password, in order to prevent unwanted remote clients from connecting to the server:

/subsystem=messaging-activemq/server=default/:write-attribute(name=cluster-password,value=secretpassword)

Then, in order to connect to the cluster, you need to set the password in the System Property jboss.messaging.cluster.password accordingly:

/system-property=jboss.messaging.cluster.password:add(value=secretpassword)

Next, set the ha-policy to shared-store-master, enabling failover on server shutdown through the failover-on-server-shutdown attribute:

/subsystem=messaging-activemq/server=default/ha-policy=shared-store-master:add(failover-on-server-shutdown=true)

TIP: When the attribute failover-on-server-shutdown is set to true, the server will fail over when it is shut down normally. The default is false.
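
If you change your mind later, the attribute can be updated in place (followed by a reload), for example to turn it off again:

/subsystem=messaging-activemq/server=default/ha-policy=shared-store-master:write-attribute(name=failover-on-server-shutdown,value=false)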

Then, we need to choose a shared storage location for the set of directories used by Artemis, so we will create the following folders:

$ mkdir -p /home/artemis/shared/bindings
$ mkdir -p /home/artemis/shared/journal
$ mkdir -p /home/artemis/shared/largemessages
$ mkdir -p /home/artemis/shared/paging

Once the directory structure is ready, let’s switch the default Artemis paths to the shared storage folders:

/subsystem=messaging-activemq/server=default/path=bindings-directory:write-attribute(name=path,value=/home/artemis/shared/bindings)
/subsystem=messaging-activemq/server=default/path=journal-directory:write-attribute(name=path,value=/home/artemis/shared/journal)
/subsystem=messaging-activemq/server=default/path=large-messages-directory:write-attribute(name=path,value=/home/artemis/shared/largemessages)
/subsystem=messaging-activemq/server=default/path=paging-directory:write-attribute(name=path,value=/home/artemis/shared/paging)
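
To double-check that the new locations have been picked up in the model, you can read one of the path resources back, for example:

/subsystem=messaging-activemq/server=default/path=journal-directory:read-resource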

You need to reload your configuration for changes to take effect:

reload

Now query the ha-policy attribute to verify that it’s consistent with our changes:

/subsystem=messaging-activemq/server=default:read-children-resources(child-type=ha-policy)
{
    "outcome" => "success",
    "result" => {"shared-store-master" => {"failover-on-server-shutdown" => true}}
}

Now let’s move to the second WildFly server. We will start it using a port offset to avoid a conflict with the live server:

$ ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=server2 -Djboss.socket.binding.port-offset=100 

Next, connect to the server using the CLI and the target management port (with an offset of 100, the management port becomes 10090):

$ ./jboss-cli.sh -c --controller=localhost:10090

In order to connect to the cluster, we also need to set the cluster-password, consistently with the first server:

/subsystem=messaging-activemq/server=default/:write-attribute(name=cluster-password,value=secretpassword)

Then, in order to connect to the cluster, you need to set the password in the System Property jboss.messaging.cluster.password accordingly:

/system-property=jboss.messaging.cluster.password:add(value=secretpassword)

Now, configure the server to be a backup server, using a shared store:

/subsystem=messaging-activemq/server=default/ha-policy=shared-store-slave:add(failover-on-server-shutdown=true)

The backup server must use the same set of directories as the live server:

/subsystem=messaging-activemq/server=default/path=bindings-directory:write-attribute(name=path,value=/home/artemis/shared/bindings)
/subsystem=messaging-activemq/server=default/path=journal-directory:write-attribute(name=path,value=/home/artemis/shared/journal)
/subsystem=messaging-activemq/server=default/path=large-messages-directory:write-attribute(name=path,value=/home/artemis/shared/largemessages)
/subsystem=messaging-activemq/server=default/path=paging-directory:write-attribute(name=path,value=/home/artemis/shared/paging)

You need to reload your configuration for changes to take effect:

reload

Verify that the configuration of the ha-policy is consistent with the changes we have applied:

/subsystem=messaging-activemq/server=default:read-children-resources(child-type=ha-policy)
{
    "outcome" => "success",
    "result" => {"shared-store-slave" => {
        "allow-failback" => true,
        "failover-on-server-shutdown" => true,
        "restart-backup" => true,
        "scale-down" => undefined,
        "scale-down-cluster-name" => undefined,
        "scale-down-connectors" => undefined,
        "scale-down-discovery-group" => undefined,
        "scale-down-group-name" => undefined
    }}
}

HornetQ clustering (JBoss AS 7/JBoss EAP 6)

As most cluster configurations are done in Domain mode, we will show both examples using a Managed Domain. The catch is that, in order to distinguish between active nodes and backup nodes while using the same Profile, we need to use System Properties which are overridden at Server level.

Shared Storage

Here is a sample Shared Storage configuration:

<paths>
        <path name="hornetq.shared.dir" path="/home/journal-A"/>
</paths>
. . .
    <subsystem xmlns="urn:jboss:domain:messaging:1.4">
        <hornetq-server>
            <persistence-enabled>true</persistence-enabled>
            <cluster-password>password</cluster-password>
            <backup>${jboss.messaging.hornetq.backup:false}</backup>
            <failover-on-shutdown>true</failover-on-shutdown>
            <shared-store>true</shared-store>
            <create-bindings-dir>true</create-bindings-dir>
            <create-journal-dir>true</create-journal-dir>
            <journal-type>NIO</journal-type>
            <journal-min-files>2</journal-min-files>
            <paging-directory path="paging" relative-to="hornetq.shared.dir"/>
            <bindings-directory path="bindings" relative-to="hornetq.shared.dir"/>
            <journal-directory path="journal" relative-to="hornetq.shared.dir"/>
            <large-messages-directory path="large-messages" relative-to="hornetq.shared.dir"/>
          ....
         </hornetq-server>
     </subsystem>

The key properties of this configuration are:

  • backup: when true, marks the node as a backup node; when false, marks it as an active node.
  • shared-store: when true, enables the shared store; when false, data replication is used.

Other relevant properties are the following (see also the CLI sketch after this list):

  • failover-on-shutdown: when true, allows failover to occur during a “normal” shutdown; when false, failover will only occur on an unexpected loss of the node.
  • allow-failback: when true, a backup that has been promoted to active will give control back when the original active node comes back online; when false, the backup node will remain active even if the original active node comes back.
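
The same flags can also be set from the CLI against the profile, rather than by editing the XML. A sketch, assuming the full-ha profile and a hornetq-server named default:

/profile=full-ha/subsystem=messaging/hornetq-server=default:write-attribute(name=shared-store,value=true)
/profile=full-ha/subsystem=messaging/hornetq-server=default:write-attribute(name=failover-on-shutdown,value=true)
/profile=full-ha/subsystem=messaging/hornetq-server=default:write-attribute(name=allow-failback,value=true)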

In order to distinguish between the active and the backup node, we will set the jboss.messaging.hornetq.backup property in the host.xml file. Here is the first server:

<server name="server-1" group="main-server-group" auto-start="true">
 	<system-properties>
      		<property name="jboss.messaging.hornetq.backup" value="false"/>
   	</system-properties>
       <socket-bindings port-offset="0"/>
</server>

Then, on the other server, jboss.messaging.hornetq.backup should be set to “true”:

<server name="server-2" group="main-server-group" auto-start="true">
 	<system-properties>
      		<property name="jboss.messaging.hornetq.backup" value="true"/>
   	</system-properties>
         <socket-bindings port-offset="100"/>
</server>
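
The same per-server override can also be applied from the CLI against the host controller. A sketch, assuming the host is named master:

/host=master/server-config=server-2/system-property=jboss.messaging.hornetq.backup:add(value=true)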

Please notice that, for high availability purposes, the live server and the backup server must be installed on two separate physical (or virtual) hosts, provisioned in such a way as to minimize the probability of both hosts failing at the same time. Also, a highly available HornetQ setup requires access to reliable shared file system storage, so a file system such as GFS2 or a SAN must be made available to both hosts. HornetQ instances will store, among other things, their bindings and journal files in the shared directory. An appropriately configured NFS v4 mount is also an option.

Data Replication

Here is a sample Data Replication configuration:

<subsystem xmlns="urn:jboss:domain:messaging:1.4">

    <hornetq-server>
	    <persistence-enabled>true</persistence-enabled>
	    <cluster-password>password</cluster-password>
	    <backup>${jboss.messaging.hornetq.backup:false}</backup>
	    <failover-on-shutdown>true</failover-on-shutdown>
	    <shared-store>false</shared-store>
	    <create-bindings-dir>true</create-bindings-dir>
	    <create-journal-dir>true</create-journal-dir>
	    <journal-type>NIO</journal-type>
	    <journal-min-files>2</journal-min-files>
	    <check-for-live-server>true</check-for-live-server>
	    <backup-group-name>NameOfLiveBackupPair</backup-group-name>
	   ....
  </hornetq-server>
</subsystem>

As you can see, the main difference is that we have set shared-store to false. Another property, backup-group-name, links the active and backup servers together for replication. You still need to configure the jboss.messaging.hornetq.backup property on the individual server nodes, as shown in the shared storage example.
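
If you need to set or change backup-group-name without editing the profile XML, it can also be written through the CLI. A sketch, again assuming the full-ha profile and the default hornetq-server:

/profile=full-ha/subsystem=messaging/hornetq-server=default:write-attribute(name=backup-group-name,value=NameOfLiveBackupPair)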

That’s all! Enjoy High Availability on your JBoss EAP or WildFly messaging subsystem!