Monitoring Enterprise applications with OpenShift and Prometheus

This article covers how to monitor Java Enterprise applications using OpenShift Container Platform 4.6. For the purpose of this example, we will be using JBoss Enterprise Application platform Expansion pack which empowers JBoss EAP with Microprofile API, such as the Metrics API.

The monitoring solution we will use is Prometheus which is available out of the box in OpenShift 4.6 and later. The advantage of using the OpenShift’s embedded Prometheus is that you will use a single Console (the Development Console of OpenShift) to view and manage your metrics.

Older versions of OpenShift can still use Prometheus, however you will need to install it separately and access metric through its Console.

For those who are new to it, Prometheus, is an increasingly popular toolkit that provides monitoring and alerts across applications and servers.

The primary components of Prometheus include

  • Prometheus server which scrapes and stores time series data
  • A push gateway for supporting short-lived jobs
  • Special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
  • An Alertmanager to handle alerts
  • Client libraries for instrumenting application code

To run our demo, we will use Code Ready Containers (CRC), which provides simple startup to an OpenShift single node cluster. If you want to learn how to get started with CRC, check this tutorial:

Enabling application monitoring on OpenShift

Before starting CRC, you need to set the property “enable-cluster-monitoring” to true as the monitoring, alerting, and telemetry are disable by default on CRC.

$ crc config set enable-cluster-monitoring true

Next start CRC:

$ crc start
   . . . . .

Started the OpenShift cluster.

The server is accessible via web console at:
  https://console-openshift-console.apps-crc.testing

Log in as administrator:
  Username: kubeadmin
  Password: T3sJD-jjueE-2BnHe-ftNBw

Log in as user:
  Username: developer
  Password: developer

Use the 'oc' command line interface:
  $ eval $(crc oc-env)
  $ oc login -u developer https://api.crc.testing:6443

Now you can login as kubeadmin user:

$ oc login -u kubeadmin https://api.crc.testing:6443

Now, in order to enable the embedded Prometheus, we will edit the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. We need to set the attribute “techPreviewUserWorkload” to true:

$ oc -n openshift-monitoring edit configmap cluster-monitoring-config

And set the techPreviewUserWorkload setting to true under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    techPreviewUserWorkload:
      enabled: true

Great. You will see that in a minute the Prometheus, Thanos and Alert Manager Pods will start:

$ oc get pods -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          58d
alertmanager-main-1                            5/5     Running   0          58d
alertmanager-main-2                            5/5     Running   0          58d
cluster-monitoring-operator-686555c948-7lq8z   2/2     Running   0          58d
grafana-6f4d96d7fd-5zz9p                       2/2     Running   0          58d
kube-state-metrics-749954d685-h9nfx            3/3     Running   0          58d
node-exporter-5tk5t                            2/2     Running   0          58d
openshift-state-metrics-587d97bb47-rqljs       3/3     Running   0          58d
prometheus-adapter-85488fffd6-8qpdz            1/1     Running   0          2d8h
prometheus-adapter-85488fffd6-nw9kz            1/1     Running   0          2d8h
prometheus-k8s-0                               7/7     Running   0          58d
prometheus-k8s-1                               7/7     Running   0          58d
prometheus-operator-658ccb589c-qbgwq           2/2     Running   0          58d
telemeter-client-66c98fb87-7qfv5               3/3     Running   0          58d
thanos-querier-7cddb86d6c-9vwmc                5/5     Running   0          58d
thanos-querier-7cddb86d6c-hlmb7                5/5     Running   0          58d

Deploying an Enterprise application which produces Metrics

The application we will monitor is a WildFly quickstart which contains MicroProfile Metrics in a REST Endpoint: https://github.com/wildfly/quickstart/tree/master/microprofile-metrics

The main method in the REST Endpoint calculates if a number is a prime number, using the following annotations from the Metrics API:

@GET
@Path("/prime/{number}")
@Produces(MediaType.TEXT_PLAIN)
@Counted(name = "performedChecks", displayName="Performed Checks", description = "How many prime checks have been performed.")
@Timed(name = "checksTimer", absolute = true, description = "A measure of how long it takes to perform the primality test.", unit = MetricUnits.MILLISECONDS)
@Metered(name = "checkIfPrimeFrequency", absolute = true)
public String checkIfPrime(@PathParam("number") long number) {
   // code here
}

To get started, we will create a project named “xp-demo”:

$ oc new-project xp-demo

Then, in order to run the EAP XP application, we need at at first deploy the required template files for JBoss EAP Expansion Pack:

oc replace --force -n openshift -f https://raw.githubusercontent.com/jboss-container-images/jboss-eap-openshift-templates/eap-xp2/jboss-eap-xp2-openjdk11-openshift.json

oc replace --force -n openshift -f https://raw.githubusercontent.com/jboss-container-images/jboss-eap-openshift-templates/eap-xp2/templates/eap-xp2-basic-s2i.json

Now we can deploy the actual application, using as template parameter the EAP image jboss-eap-xp2-openjdk11-openshift:2.0:

 oc new-app --template=eap-xp2-basic-s2i -p APPLICATION_NAME=eap-demo -p EAP_IMAGE_NAME=jboss-eap-xp2-openjdk11-openshift:2.0 -p EAP_RUNTIME_IMAGE_NAME=jboss-eap-xp2-openjdk11-runtime-openshift:2.0 -p IMAGE_STREAM_NAMESPACE=openshift -p SOURCE_REPOSITORY_URL=https://github.com/wildfly/quickstart -p SOURCE_REPOSITORY_REF=23.0.0.Final -p CONTEXT_DIR="microprofile-metrics"

Watch for the EAP Pods to come up:

$ oc get pods
NAME                               READY   STATUS      RESTARTS   AGE
eap-demo-1-6dw28                   1/1     Running     0          22h
eap-demo-1-deploy                  0/1     Completed   0          22h
eap-demo-2-build                   0/1     Completed   0          22h
eap-demo-build-artifacts-1-build   0/1     Completed   0          22h

Since the EAP metrics are available on port 9990, we will create an additional service named “eap-xp2-basic-app-admin” which targets the port 9990 of the “eap-demo” application:

apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: eap-xp2-basic-s2i-admin
    app.kubernetes.io/component: eap-xp2-basic-s2i-admin
    app.kubernetes.io/instance: eap-xp2-basic-s2i-admin
    application: eap-xp2-basic-app-admin
    template: eap-xp2-basic-s2i-admin
    xpaas: "1.0"
  name: eap-xp2-basic-app-admin
  namespace: xp-demo
spec:
  ports:
    - name: admin
      port: 9990
      protocol: TCP
      targetPort: 9990
  selector:
    deploymentConfig: eap-demo
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Save it in a file, for example service.yml and create it with:

$ oc create -f service.yml

Check that the list of services now include also the “eap-xp2-basic-app-admin” bound to port 9990:

$ oc get services
NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
eap-demo                  ClusterIP   10.217.4.139   <none>        8080/TCP   22h
eap-demo-ping             ClusterIP   None           <none>        8888/TCP   22h
eap-xp2-basic-app-admin   ClusterIP   10.217.5.217   <none>        9990/TCP   22h

As optional step, we have exposed also a Route for this service, on the path “/metrics” so that we can check metrics also from outside the cluster:

$ oc expose svc/eap-xp2-basic-app-admin --path=/metrics

Here’s the available Routes:

$ oc get routes
NAME          HOST/PORT                              PATH       SERVICES                  PORT    TERMINATION     WILDCARD
admin-route   admin-route-xp-demo.apps-crc.testing   /metrics   eap-xp2-basic-app-admin   admin                   None
eap-demo      eap-demo-xp-demo.apps-crc.testing                 eap-demo                  <all>   edge/Redirect   None

The last resource we need to create, is a ServiceMonitor which will be used by Prometheus to select the service to monitor. For this purpose, we will include a selector matching the “eap-xp2-basic-s2i-admin”:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-example-monitor
  name: prometheus-example-monitor
  namespace: xp-demo
spec:
  endpoints:
  - interval: 30s
    port: admin
    scheme: http
  selector:
    matchLabels:
      app: eap-xp2-basic-s2i-admin

Save it in a file, for example servicemonitor.yml and create it with:

$ oc create -f servicemonitor.yml

As Prometheus by default will scrape metrics on the “/metrics” URI, nothing else but the service binding is required.

Now let’s turn to the Web admin console.

Accessing Metrics from the Enterprise application

Log into the OpenShift console and Switch to the Developer tab. You will see that a Monitoring tab is available:

Select the Montoring Tab. You will enter in the Monitoring Dashboard which contains a set of line charts (CPU, Memory, Bandwidth, Packets transmitted) for the EAP Pods:

Now switch to the “Metrics” tab. From there you can either select one of the metrics seen in the Dashboards or you can opt for Custom Metrics. We will choose a Custom Metric. Which one? to know the available metrics you can query the admin-route-xp-demo.apps-crc.testing which targets EAP on port 9990:

Here is an excerpt from it:

application_duplicatedCounter_total{type="original"} 0.0
application_duplicatedCounter_total{type="copy"} 0.0
# HELP application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_current Number of parallel accesses
# TYPE application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_current gauge
application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_current 0.0
# TYPE application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_max gauge
application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_max 0.0
# TYPE application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_min gauge
application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_parallelAccess_min 0.0
# HELP application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_highestPrimeNumberSoFar Highest prime number so far.
# TYPE application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_highestPrimeNumberSoFar gauge
application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_highestPrimeNumberSoFar 13.0
# HELP application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_performedChecks_total How many prime checks have been performed.
# TYPE application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_performedChecks_total counter
application_org_wildfly_quickstarts_microprofile_metrics_PrimeNumberChecker_performedChecks_total 7.0
# TYPE application_injectedCounter_total counter
application_injectedCounter_total 0.0
# TYPE application_checkIfPrimeFrequency_total counter
application_checkIfPrimeFrequency_total 7.0
# TYPE application_checkIfPrimeFrequency_rate_per_second gauge
application_checkIfPrimeFrequency_rate_per_second 0.009764570078023867
# TYPE application_checkIfPrimeFrequency_one_min_rate_per_second gauge
application_checkIfPrimeFrequency_one_min_rate_per_second 0.01514093009622217
# TYPE application_checkIfPrimeFrequency_five_min_rate_per_second gauge
application_checkIfPrimeFrequency_five_min_rate_per_second 0.015280392552156003
# TYPE application_checkIfPrimeFrequency_fifteen_min_rate_per_second gauge
application_checkIfPrimeFrequency_fifteen_min_rate_per_second 0.0067481340497595925
# TYPE application_checksTimer_rate_per_second gauge
application_checksTimer_rate_per_second 0.009764593427401155
# TYPE application_checksTimer_one_min_rate_per_second gauge
application_checksTimer_one_min_rate_per_second 0.01514093009622217
# TYPE application_checksTimer_five_min_rate_per_second gauge
application_checksTimer_five_min_rate_per_second 0.015280392552156003
# TYPE application_checksTimer_fifteen_min_rate_per_second gauge
application_checksTimer_fifteen_min_rate_per_second 0.0067481340497595925
# TYPE application_checksTimer_min_seconds gauge
application_checksTimer_min_seconds 1.301E-5
# TYPE application_checksTimer_max_seconds gauge
application_checksTimer_max_seconds 1.08862E-4
# TYPE application_checksTimer_mean_seconds gauge
application_checksTimer_mean_seconds 2.3666899949653865E-5
# TYPE application_checksTimer_stddev_seconds gauge
application_checksTimer_stddev_seconds 2.4731858133453515E-5
# HELP application_checksTimer_seconds A measure of how long it takes to perform the primality test.
# TYPE application_checksTimer_seconds summary
application_checksTimer_seconds_count 7.0

For example, we will be checking a Performance metric, the “application_checksTimer_max_seconds” metric, which is the maximum number (in seconds) required to check for prime number:

Now let’s start requesting the “/prime” endpoint, passing a number to be checked, for example: https://eap-demo-xp-demo.apps-crc.testing/prime/13

You will see that the Custom Metric will track our metric:

 

Adding Alerts for relevant metrics

Alerting with Prometheus is separated into two parts. Alerting rules in Prometheus servers send alerts to an Alertmanager. The Alertmanager then manages those alerts, including silencing, inhibition, aggregation and sending out notifications via methods such as email, on-call notification systems, and chat platforms.

As an example, we will add a sample alert to capture the value of “application_checksTimer_max_seconds” when it goes over a certain threshold (3 ms). Here is our sample metric file:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: xp-demo
spec:
  groups:
  - name: performance
    rules:
    - alert: CheckTimerAlert
      expr: version{job="application_checksTimer_max_seconds"} > 0.003

Save it in a file, for example alert.yml and create it with:

$ oc create -f alert.yml

Let’s request some time-consuming prime numbers calculations such as: https://eap-demo-xp-demo.apps-crc.testing/prime/10001891

You will now see that as the count for that metric goes above the defined threshold (5), an alert will be available in the “Alerts” tab:

Clicking on it, you will see the condition which activated the Alert:

That’s all.

We have just covered how to enable application monitoring on OpenShift 4.6 (or newer). We have therefore enabled the embedded Prometheus service and deployed an Enterprise application which emits metrics that can be scraped by Prometheus.

Next, we have showed how this metrics can be captured by OpenShift developer Console and how to bind alerts for critical metrics.