Sentinel (GermainUX’s Self-Monitoring System)

Sentinel is a monitoring system, spun off from GermainUX, designed to help ensure the availability, stability, and performance of GermainUX environments.

The Sentinel script runs on a schedule (commonly via cron) and checks critical infrastructure, services, logs, and application endpoints, then summarizes all findings into a centralized health report.

Monitoring Capabilities

Sentinel supports monitoring of the following components and behaviors:

  • Connects to GermainUX infrastructure services (such as ZooKeeper and ActiveMQ) and records:

    • Availability and outages

    • Queue usage and related metrics

  • Monitors a configurable list of operating system services and reports:

    • Availability status

    • CPU utilization

    • Memory usage

  • Monitors configurable log files and checks:

    • Last write/update time

    • Error counts and warning thresholds

    • Stale or inactive logs

  • Monitors configurable HTTP endpoints and reports:

    • Availability status

    • HTTP response validation

    • Response performance

  • Generates and sends consolidated health reports via email, with configurable conditions controlling when reports should be sent
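The service and endpoint checks described above can be approximated with standard Linux tools. The Bash sketch below is not Sentinel's implementation: it assumes a Linux host with /proc, curl, and awk, and the function names, the 2-second slowness threshold, and the output wording (loosely modeled on Sentinel's report rows) are hypothetical. CPU sampling is left out to keep it short.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two Sentinel-style checks; not Sentinel's own code.

# check_service NAME: report availability and resident memory (VmRSS) of every
# process whose /proc/<pid>/comm equals NAME. Linux only; the kernel truncates
# comm to 15 characters, so longer process names must be shortened.
check_service() {
  local name=$1 dir comm pid key val rss pids=() mem=()
  for dir in /proc/[0-9]*; do
    read -r comm 2>/dev/null < "$dir/comm" || continue   # process may have exited
    [[ $comm == "$name" ]] || continue
    pid=${dir#/proc/}
    rss=
    # /proc/<pid>/status contains a line such as "VmRSS:   1234 kB".
    while read -r key val _; do
      if [[ $key == VmRSS: ]]; then rss=$val; break; fi
    done 2>/dev/null < "$dir/status"
    pids+=("$pid")
    mem+=("$pid=${rss:-0}kB")
  done
  if (( ${#pids[@]} > 0 )); then
    echo "Status: Running, PID(${pids[*]}) RSS(${mem[*]})"
  else
    echo "Status: Not running"
    return 1
  fi
}

# classify_endpoint HTTP_CODE SECONDS [SLOW_SECONDS]: map one probe result to a
# status level. The 2-second default slowness threshold is an assumption.
classify_endpoint() {
  local code=$1 secs=$2 slow=${3:-2}
  if [[ ! $code =~ ^2[0-9][0-9]$ ]]; then
    echo RED      # no HTTP response (curl reports 000) or a non-2xx code
  elif awk -v t="$secs" -v s="$slow" 'BEGIN { exit !(t + 0 > s + 0) }'; then
    echo ORANGE   # reachable but slower than the threshold
  else
    echo GREEN
  fi
}

# check_endpoint URL: probe URL once with curl (10-second limit) and classify it.
check_endpoint() {
  local url=$1 code secs
  # -w prints the status code and total time even when the request fails.
  read -r code secs < <(curl -s -o /dev/null --max-time 10 \
    -w '%{http_code} %{time_total}\n' "$url")
  echo "Rest Endpoint Response Code: $code ($(classify_endpoint "$code" "${secs:-0}"))"
}
```

Each function prints one report-style line; the process name and URL you pass in (for example a service process name and a health URL on localhost) depend on your environment.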

Health Status Classification

Sentinel categorizes monitored software features using the following status levels:

  • RED — Indicates a software feature that is failing, unavailable, or critically impacted

  • ORANGE — Indicates a software feature experiencing degraded performance, slowness, warnings, or intermittent errors

  • GREEN — Indicates a software feature that is healthy, available, and performing normally
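When a report combines many checks, it helps to derive a single overall level. One simple rule, assumed here for illustration rather than taken from Sentinel, is that the most severe result wins: any RED makes the whole report RED, otherwise any ORANGE makes it ORANGE. A Bash sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# worst_status STATUS...: print the most severe of GREEN, ORANGE, and RED.
# "Worst result wins" is an assumed aggregation rule, not Sentinel's documented one.
# An empty argument list yields GREEN.
worst_status() {
  local worst=0 s level
  for s in "$@"; do
    case $s in
      GREEN)  level=0 ;;
      ORANGE) level=1 ;;
      *)      level=2 ;;   # RED, or an unrecognized value treated as a failure
    esac
    (( level > worst )) && worst=$level
  done
  local names=(GREEN ORANGE RED)
  echo "${names[worst]}"
}
```

For example, worst_status GREEN ORANGE GREEN prints ORANGE. Treating unknown values as RED is a deliberate choice: a check that returns garbage should surface in the report rather than be hidden.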

Reporting

Sentinel summarizes all monitoring findings into a single consolidated report, giving operators one view of environment health and helping them detect issues sooner.
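Because email delivery is governed by configurable conditions, a wrapper around Sentinel needs a rule for when a report is worth sending. The sketch below assumes a hypothetical plain-text report whose result lines start with GREEN, ORANGE, or RED, and two illustrative policies named always and on_problem; neither the format nor the policy names come from Sentinel.

```shell
#!/usr/bin/env bash
# should_send_report REPORT_FILE POLICY: exit 0 if the report should be emailed.
#   always     -> send every report
#   on_problem -> send only if some line starts with ORANGE or RED
# The report layout and the policy names are hypothetical.
should_send_report() {
  local report=$1 policy=$2
  case $policy in
    always)     return 0 ;;
    on_problem) grep -Eq '^(ORANGE|RED)([[:space:]]|$)' "$report" ;;
    *)          echo "unknown policy: $policy" >&2; return 2 ;;
  esac
}

# Example (assumes a mailx-compatible "mail" command and a placeholder address):
#   should_send_report report.txt on_problem &&
#     mail -s "Sentinel health report" ops@example.com < report.txt
```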

The example below shows a sample report generated by Sentinel.

Please note that the exact content, thresholds, formatting, and delivery behavior of Sentinel reports may vary depending on the specific implementation and configuration of the environment.

Example of a report sent by the Sentinel script:

Germain Service (Check)
    Info
------------------------------------------------------------

GermainEngineManager-apsep03050 (LogActivity)
    Path:     '\\serverABC\e$\Software\Germain\GermainEngineProd\logs\EngineManager.log'
    Modified: 2/20/2020, 9:59:45AM (0.52 minutes ago)
    Errors:   7
    Warn:     Errors present in recent log.

ActiveMQ (AvailabilityCheck)
    Status: Running, PID(1296 9708)

ActiveMQ (BrokerStats | localhost)
    Temp Percent: 0 | MemoryPercent: 0 | StorePercent: 0

ActiveMQ (QueueStats | apm.action)
    QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 28656 | DequeueCount: 28656

ActiveMQ (QueueStats | apm.analytics)
    QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 15729162 | DequeueCount: 15729162

ActiveMQ (QueueStats | apm.session)
    QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 0 | DequeueCount: 0

ActiveMQ (QueueStats | apm.storage)
    QueueSize: 0 | ConsumerCount: 2 | EnqueueCount: 7480961 | DequeueCount: 7480961

ActiveMQ (QueueStats | apm.storage.analytics)
    QueueSize: 0 | ConsumerCount: 2 | EnqueueCount: 189809 | DequeueCount: 189809

GermainActionServices (AvailabilityCheck)
    Status: Running, PID(1636)

GermainActionServices (LogActivity)
    Path:     'E:\Software\GermainService\var\logs\action-services.log'
    Modified: 2/20/2020, 10:00:04AM (0.19 minutes ago)
    Errors:   0

GermainAggregatorServices (AvailabilityCheck)
    Status: Running, PID(6720)

GermainAggregatorServices (LogActivity)
    Path:     'E:\Software\GermainAPMService\var\logs\aggregator-services.log'
    Modified: 2/20/2020, 10:00:15AM (0.01 minutes ago)
    Errors:   0

GermainAnalyticsServices (AvailabilityCheck)
    Status: Running, PID(1624)

GermainAnalyticsServices (LogActivity)
    Path:     '\\serverABC\e$\Software\GermainAPMService\var\logs\analytics-services.log'
    Modified: 2/20/2020, 10:00:09AM (0.11 minutes ago)
    Errors:   0

GermainAPMConfigServices (EndpointAvailability)
    Rest Endpoint Response Code: 200

GermainAPMConfigServices (EndpointAvailability)
    Rest Endpoint Response Code: 200

GermainAPMConfigServices-apsep02522 (LogActivity)
    Path:     'E:\Software\apache-tomcat-8\logs\config-services.log'
    Modified: 2/20/2020, 10:00:07AM (0.09 minutes ago)
    Errors:   0

GermainAPMConfigServices-apsep02523 (LogActivity)
    Path:     '\\serverABC\e$\Software\apache-tomcat-8\logs\config-services.log'
    Modified: 2/20/2020, 9:59:45AM (0.44 minutes ago)
    Errors:   0

GermainAPMEnginesProd (AvailabilityCheck)
    Status: Running, PID(16580 4704)

GermainAPMIngestionServices-apsep02522 (LogActivity)
    Path:     'E:\Software\apache-tomcat-8\logs\ingestion-services.log'
    Modified: 2/20/2020, 10:00:11AM (0.02 minutes ago)
    Errors:   0

GermainAPMIngestionServices-apsep02523 (LogActivity)
    Path:     '\\serverABC\e$\Software\apache-tomcat-8\logs\ingestion-services.log'
    Modified: 2/20/2020, 10:00:09AM (0.04 minutes ago)
    Errors:   0

GermainAPMQueryServices (EndpointAvailability)
    Rest Endpoint Response Code: 200

GermainAPMQueryServices (EndpointAvailability)
    Rest Endpoint Response Code: 200

GermainAPMQueryServices-apsep02522 (LogActivity)
    Path:     'E:\Software\apache-tomcat-8\logs\query-services.log'
    Modified: 2/20/2020, 9:59:54AM (0.29 minutes ago)
    Errors:   0

GermainAPMQueryServices-apsep02523 (LogActivity)
    Path:     '\\apsep02523\e$\Software\apache-tomcat-8\logs\query-services.log'
    Modified: 2/20/2020, 9:59:52AM (0.32 minutes ago)
    Errors:   0

GermainEngineManager-apsep03069 (LogActivity)
    Path:     '\\apsep03069\e$\Software\Germain\GermainAPMEngineProd\logs\EngineManager.log'
    Modified: 2/20/2020, 9:59:58AM (0.32 minutes ago)
    Errors:   0

GermainEngineManagerProd (AvailabilityCheck)
    Status: Running, PID(121228)

GermainSessionTrackingServices (AvailabilityCheck)
    Status: Running, PID(1060)

GermainSessionTrackingServices (LogActivity)
    Path:     '\\serverABC\e$\Software\GermainAPMService\var\logs\session-tracking.log'
    Modified: 2/20/2020, 10:00:01AM (0.26 minutes ago)
    Errors:   0

GermainStorageServices (AvailabilityCheck)
    Status: Running, PID(1648)

GermainStorageServices (LogActivity)
    Path:     'E:\Software\GermainAPMService\var\logs\storage-services.log'
    Modified: 2/20/2020, 10:00:11AM (0.08 minutes ago)
    Errors:   0
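The LogActivity rows above record when a log was last written and how many errors it contains. The Bash sketch below prints similar fields for a local log file. It assumes GNU coreutils (stat -c, date -d); the function name, the ERROR match pattern, the 10-minute staleness limit, and the mapping of stale logs to RED and logs with errors to ORANGE are illustrative choices, not Sentinel's documented rules.

```shell
#!/usr/bin/env bash
# Hypothetical LogActivity-style check: last write time, whole minutes since
# that write, and a count of lines matching an error pattern.
# check_log PATH [STALE_MINUTES] [PATTERN]
check_log() {
  local path=$1 stale=${2:-10} pattern=${3:-ERROR}
  local now mtime age_s errors status
  if [[ ! -r $path ]]; then
    echo "Path:     '$path'"
    echo "Status:   RED (log file missing or unreadable)"
    return 1
  fi
  now=$(date +%s)
  mtime=$(stat -c %Y "$path")                 # last modification, epoch seconds
  age_s=$(( now - mtime ))
  errors=$(grep -c -- "$pattern" "$path")     # prints 0 when nothing matches
  if (( age_s > stale * 60 )); then
    status=RED        # the log has stopped being written
  elif (( errors > 0 )); then
    status=ORANGE     # active, but errors are present
  else
    status=GREEN
  fi
  echo "Path:     '$path'"
  echo "Modified: $(date -d "@$mtime" '+%-m/%-d/%Y, %-I:%M:%S%p') ($(( age_s / 60 )) minutes ago)"
  echo "Errors:   $errors"
  echo "Status:   $status"
}
```

Counting every matching line in the whole file is the simplest option; a production check would more likely look only at lines written since the previous run.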


Sentinel in Kubernetes Environments

Potential Issue with kubectl exec and Unresponsive Pods

When running Sentinel monitoring jobs through a scheduled cron task that relies on kubectl exec, it is important to understand the behavior of Kubernetes when the target pod becomes hung or unresponsive.

kubectl exec is a synchronous operation that depends entirely on:

  • Responsiveness of the target pod

  • Availability of the Kubernetes API server

  • Network connectivity between the API server and the kubelet on the node running the pod

If the target pod enters a hung or unhealthy state, the kubectl exec command may block indefinitely (or until the Kubernetes API timeout is reached).

Because Sentinel is commonly triggered using a cron scheduler, each scheduled execution starts a new kubectl exec process. If a previous execution remains blocked, subsequent executions may overlap or appear to “queue up.” Once the pod becomes responsive again, multiple delayed executions may run nearly simultaneously.

This behavior is expected and is a limitation of using kubectl exec for scheduled remote execution. kubectl exec is not queue-based or asynchronous.

Recommended Approach

Run Sentinel directly as a cron job inside the target pod.

This removes the dependency on external kubectl exec calls, so a slow or hung pod can no longer leave blocked or stacked-up executions on the scheduling host.

Please coordinate with your Kubernetes cluster administrators to configure an internal cron scheduler within the pod or container environment.
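Inside the pod, the schedule can be an ordinary crontab entry, provided the container image ships a cron daemon and flock (many minimal images ship neither). The sketch below adds one Sentinel entry to a crontab read on standard input, replacing any earlier one; the 5-minute schedule, the /tmp lock and log files, and the function name are assumptions. flock -n is kept so that a slow run cannot overlap the next one.

```shell
#!/usr/bin/env bash
# Hypothetical crontab entry for running Sentinel inside the pod.
# Schedule, lock file, and log file locations are assumptions.
SENTINEL_ENTRY='*/5 * * * * flock -n /tmp/sentinel.lock sh -c "cd /persistent/germain/sentinel && ./sentinel.sh" >> /tmp/sentinel.log 2>&1'

# with_sentinel_entry: read a crontab on stdin, print it with exactly one
# Sentinel entry (any previous line mentioning ./sentinel.sh is dropped).
with_sentinel_entry() {
  grep -vF './sentinel.sh' || true   # grep exits 1 when no lines remain
  printf '%s\n' "$SENTINEL_ENTRY"
}

# Usage inside the pod:
#   crontab -l 2>/dev/null | with_sentinel_entry | crontab -
```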

Alternative Approach

If Sentinel must be executed externally using kubectl exec, combine a timeout mechanism with a lock mechanism that prevents overlapping executions:

  • Remove the -it flags from kubectl exec; a cron job has no terminal, and Sentinel does not need interactive input

  • Use timeout to terminate stuck executions

  • Use flock to ensure only one Sentinel execution runs at a time

Example Cron Wrapper Script

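#!/bin/bash
# Run Sentinel in each pod, one pod at a time, never overlapping a previous run.
(
  # Non-blocking exclusive lock on file descriptor 9; skip if the previous run still holds it.
  flock -n 9 || { echo "Previous run still active, skipping."; exit 1; }

  for pod in stage-cgw-0 stage-sai-0 stage-ses-0; do
    # No -it flags (cron has no terminal). timeout sends SIGTERM after 120 s, SIGKILL 10 s later.
    # "&&" keeps sentinel.sh from running in the wrong directory if cd fails.
    timeout -k 10 120 kubectl exec "$pod" -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh" \
      || echo "Sentinel in $pod failed or timed out (exit code $?)."
  done

) 9>/var/lock/sentinel-cron.lock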
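GNU timeout exits with status 124 when it has to stop the command, or 137 (128 + 9) if it escalates to SIGKILL through -k, so a wrapper can tell a hung pod apart from a Sentinel run that failed on its own. The helper below is a hypothetical sketch, not part of Sentinel; the 5-second kill grace period is an assumption, and a command that itself exits with 124 would be misread as a timeout.

```shell
#!/usr/bin/env bash
# run_limited SECONDS COMMAND [ARGS...]
# Run COMMAND under GNU timeout and print OK, TIMEOUT, or FAILED(<exit code>).
# Hypothetical helper; the 5-second SIGKILL grace period (-k 5) is an assumption.
run_limited() {
  local limit=$1 rc=0
  shift
  timeout -k 5 "$limit" "$@" || rc=$?
  case $rc in
    0)       echo "OK" ;;
    124|137) echo "TIMEOUT" ;;   # stopped by SIGTERM (124) or SIGKILL (128 + 9)
    *)       echo "FAILED($rc)" ;;
  esac
  return "$rc"
}

# Example with one of the pods from the cron wrapper:
#   run_limited 120 kubectl exec stage-cgw-0 -- bash -c \
#     "cd /persistent/germain/sentinel && ./sentinel.sh"
```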

Additional Recommendations

  • Monitor pod health proactively to reduce hung-state occurrences

  • Configure liveness and readiness probes so that a hung container is restarted (liveness) and stops receiving Service traffic (readiness)

  • Review Kubernetes API timeout settings if long-running executions are expected

  • Consider moving Sentinel execution to Kubernetes-native CronJob resources when possible; setting concurrencyPolicy: Forbid on a CronJob prevents overlapping runs

Service: Enterprise

Feature Availability: 2023.1