
Sentinel (GermainUX’s Self-Monitoring System)

Sentinel is a monitoring system, spun off from GermainUX, designed to help ensure the availability, stability, and performance of GermainUX environments.

The Sentinel script monitors critical infrastructure, services, logs, and application endpoints on a recurring schedule (typically triggered by cron), then summarizes all findings into a centralized health report.

Monitoring Capabilities

Sentinel supports monitoring of the following components and behaviors:

  • Connects to GermainUX infrastructure services (such as ZooKeeper and ActiveMQ) and records:

    • Availability and outages

    • Queue usage and related metrics

  • Monitors a configurable list of operating system services and reports:

    • Availability status

    • CPU utilization

    • Memory usage

  • Monitors configurable log files and checks:

    • Last write/update time

    • Errors and warnings, compared against configured thresholds

    • Stale or inactive logs

  • Monitors configurable HTTP endpoints and reports:

    • Availability status

    • HTTP response validation

    • Response performance

  • Generates and sends consolidated health reports via email, with configurable conditions controlling when reports should be sent
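The log checks listed above can be illustrated with a small shell function. It is a minimal sketch, not Sentinel's actual implementation. It assumes a Linux host with GNU coreutils (stat -c), treats any line containing "ERROR" as an error, and uses a hypothetical function name, a hypothetical staleness threshold, and a hypothetical scan window. The output loosely mirrors the LogActivity rows in the sample report further down.

```bash
#!/bin/bash
# Minimal sketch of a log-activity check (hypothetical; not Sentinel's actual code).
# Assumes GNU coreutils (stat -c) and that error lines contain the word "ERROR".
# Return codes (hypothetical convention): 0 = OK, 1 = warning, 2 = log missing.
check_log_activity() {
  local path=$1
  local stale_min=${2:-15}      # hypothetical staleness threshold, in minutes
  local tail_lines=${3:-1000}   # only the most recent lines are scanned for errors

  if [[ ! -r $path ]]; then
    echo "Path:     '$path'"
    echo "Warn:     Log file missing or unreadable."
    return 2
  fi

  local now mtime age_sec errors
  now=$(date +%s)
  mtime=$(stat -c %Y "$path")          # last modification time, epoch seconds
  age_sec=$(( now - mtime ))
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  errors=$(tail -n "$tail_lines" "$path" | grep -c 'ERROR' || true)

  echo "Path:     '$path'"
  awk -v s="$age_sec" 'BEGIN { printf "Modified: %.2f minutes ago\n", s / 60 }'
  echo "Errors:   $errors"

  if (( age_sec > stale_min * 60 )); then
    echo "Warn:     Log has not been updated in over ${stale_min} minutes."
    return 1
  elif (( errors > 0 )); then
    echo "Warn:     Errors present in recent log."
    return 1
  fi
  return 0
}
```

The return code separates a missing log (2) from a stale or error-bearing log (1). A caller can therefore map each case to a different health status.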

Health Status Classification

Sentinel categorizes monitored software features using the following status levels:

  • RED — Indicates a software feature that is failing, unavailable, or critically impacted

  • ORANGE — Indicates a software feature experiencing degraded performance, slowness, warnings, or intermittent errors

  • GREEN — Indicates a software feature that is healthy, available, and performing normally
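One way to map a single check's results onto these three levels is sketched below. The function name, its inputs, and the 2000 ms slowness threshold (overridable through a hypothetical SENTINEL_SLOW_MS variable) are illustrative assumptions, not Sentinel defaults.

```bash
#!/bin/bash
# Hypothetical mapping of one check's results onto RED / ORANGE / GREEN.
# Thresholds are illustrative assumptions, not Sentinel defaults.
classify_status() {
  local available=$1    # 1 if the service, endpoint, or log was reachable, else 0
  local errors=$2       # error or warning count found by the check
  local latency_ms=$3   # response time in milliseconds (0 if not applicable)
  local slow_ms=${SENTINEL_SLOW_MS:-2000}

  if (( available != 1 )); then
    echo RED        # failing or unavailable
  elif (( errors > 0 || latency_ms > slow_ms )); then
    echo ORANGE     # degraded: errors, warnings, or slowness
  else
    echo GREEN      # healthy
  fi
}
```

For example, classify_status 1 7 0 prints ORANGE. That corresponds to a log that is being written but contains 7 errors, like the EngineManager row in the sample report.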

Reporting

Sentinel summarizes all monitoring findings into a single consolidated report to simplify operational visibility and accelerate issue detection.

The example below shows a sample report generated by the Sentinel system.

Please note that the exact content, thresholds, formatting, and delivery behavior of Sentinel reports may vary depending on the specific implementation and configuration of the environment.
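The consolidation step can be sketched in shell. Per-check rows are ordered worst-first, an overall status is derived, and a send condition is applied. The row format (STATUS|SERVICE|CHECK|INFO), the function names, and the SENTINEL_SEND_ALWAYS setting are hypothetical. The default rule of sending only when something is not GREEN is one possible configurable condition, not Sentinel's documented behavior.

```bash
#!/bin/bash
# Sketch: consolidate per-check rows into one report, worst status first.
# Input rows (stdin) use a hypothetical format: STATUS|SERVICE|CHECK|INFO
# (INFO may itself contain "|"; only the first field is inspected).

# Print rows ordered RED, then ORANGE, then GREEN (stable within each group).
build_report() {
  awk -F'|' '{ rank = ($1 == "RED") ? 0 : (($1 == "ORANGE") ? 1 : 2); print rank "|" $0 }' \
    | sort -t'|' -s -k1,1n \
    | cut -d'|' -f2-
}

# Print the worst status found across all rows (GREEN if there are none).
overall_status() {
  awk -F'|' '
    BEGIN { worst = 2 }
    { rank = ($1 == "RED") ? 0 : (($1 == "ORANGE") ? 1 : 2); if (rank < worst) worst = rank }
    END { s = (worst == 0) ? "RED" : ((worst == 1) ? "ORANGE" : "GREEN"); print s }'
}

# Succeed (send) when the overall status is not GREEN,
# or always when SENTINEL_SEND_ALWAYS=1 (hypothetical setting).
should_send_report() {
  [[ ${SENTINEL_SEND_ALWAYS:-0} == 1 || $1 != GREEN ]]
}
```

If a mail command such as mailx is installed, the three functions could be combined like this: status=$(overall_status < rows.txt); should_send_report "$status" && build_report < rows.txt | mail -s "Sentinel: $status" ops@example.com (the file name and address are placeholders).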

Example of a report sent by the Sentinel script:

Status | Germain Service | Check | Info

GermainEngineManager-apsep03050

LogActivity

CODE
Path:     '\\serverABC\e$\Software\Germain\GermainEngineProd\logs\EngineManager.log'
Modified: 2/20/2020, 9:59:45AM (0.52 minutes ago)
Errors:   7
Warn:     Errors present in recent log.

ActiveMQ

AvailabilityCheck

Status: Running, PID(1296 9708)

ActiveMQ

BrokerStats | localhost

Temp Percent: 0 | MemoryPercent: 0 | StorePercent: 0

ActiveMQ

QueueStats | apm.action

QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 28656 | DequeueCount: 28656

ActiveMQ

QueueStats | apm.analytics

QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 15729162 | DequeueCount: 15729162

ActiveMQ

QueueStats | apm.session

QueueSize: 0 | ConsumerCount: 1 | EnqueueCount: 0 | DequeueCount: 0

ActiveMQ

QueueStats | apm.storage

QueueSize: 0 | ConsumerCount: 2 | EnqueueCount: 7480961 | DequeueCount: 7480961

ActiveMQ

QueueStats | apm.storage.analytics

QueueSize: 0 | ConsumerCount: 2 | EnqueueCount: 189809 | DequeueCount: 189809

GermainActionServices

AvailabilityCheck

Status: Running, PID(1636)

GermainActionServices

LogActivity

CODE
Path:     'E:\Software\GermainService\var\logs\action-services.log'
Modified: 2/20/2020, 10:00:04AM (0.19 minutes ago)
Errors:   0

GermainAggregatorServices

AvailabilityCheck

Status: Running, PID(6720)

GermainAggregatorServices

LogActivity

CODE
Path:     'E:\Software\GermainAPMService\var\logs\aggregator-services.log'
Modified: 2/20/2020, 10:00:15AM (0.01 minutes ago)
Errors:   0

GermainAnalyticsServices

AvailabilityCheck

Status: Running, PID(1624)

GermainAnalyticsServices

LogActivity

CODE
Path:     '\\serverABC\e$\Software\GermainAPMService\var\logs\analytics-services.log'
Modified: 2/20/2020, 10:00:09AM (0.11 minutes ago)
Errors:   0

GermainAPMConfigServices

EndpointAvailability

Rest Endpoint Response Code: 200

GermainAPMConfigServices

EndpointAvailability

Rest Endpoint Response Code: 200

GermainAPMConfigServices-apsep02522

LogActivity

CODE
Path:     'E:\Software\apache-tomcat-8\logs\config-services.log'
Modified: 2/20/2020, 10:00:07AM (0.09 minutes ago)
Errors:   0

GermainAPMConfigServices-apsep02523

LogActivity

CODE
Path:     '\\serverABC\e$\Software\apache-tomcat-8\logs\config-services.log'
Modified: 2/20/2020, 9:59:45AM (0.44 minutes ago)
Errors:   0

GermainAPMEnginesProd

AvailabilityCheck

Status: Running, PID(16580 4704)

GermainAPMIngestionServices-apsep02522

LogActivity

CODE
Path:     'E:\Software\apache-tomcat-8\logs\ingestion-services.log'
Modified: 2/20/2020, 10:00:11AM (0.02 minutes ago)
Errors:   0

GermainAPMIngestionServices-apsep02523

LogActivity

CODE
Path:     '\\serverABC\e$\Software\apache-tomcat-8\logs\ingestion-services.log'
Modified: 2/20/2020, 10:00:09AM (0.04 minutes ago)
Errors:   0

GermainAPMQueryServices

EndpointAvailability

Rest Endpoint Response Code: 200

GermainAPMQueryServices

EndpointAvailability

Rest Endpoint Response Code: 200

GermainAPMQueryServices-apsep02522

LogActivity

CODE
Path:     'E:\Software\apache-tomcat-8\logs\query-services.log'
Modified: 2/20/2020, 9:59:54AM (0.29 minutes ago)
Errors:   0

GermainAPMQueryServices-apsep02523

LogActivity

CODE
Path:     '\\apsep02523\e$\Software\apache-tomcat-8\logs\query-services.log'
Modified: 2/20/2020, 9:59:52AM (0.32 minutes ago)
Errors:   0

GermainEngineManager-apsep03069

LogActivity

CODE
Path:     '\\apsep03069\e$\Software\Germain\GermainAPMEngineProd\logs\EngineManager.log'
Modified: 2/20/2020, 9:59:58AM (0.32 minutes ago)
Errors:   0

GermainEngineManagerProd

AvailabilityCheck

Status: Running, PID(121228)

GermainSessionTrackingServices

AvailabilityCheck

Status: Running, PID(1060)

GermainSessionTrackingServices

LogActivity

CODE
Path:     '\\serverABC\e$\Software\GermainAPMService\var\logs\session-tracking.log'
Modified: 2/20/2020, 10:00:01AM (0.26 minutes ago)
Errors:   0

GermainStorageServices

AvailabilityCheck

Status: Running, PID(1648)

GermainStorageServices

LogActivity

CODE
Path:     'E:\Software\GermainAPMService\var\logs\storage-services.log'
Modified: 2/20/2020, 10:00:11AM (0.08 minutes ago)
Errors:   0
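The EndpointAvailability rows in the report above record the HTTP status code returned by each REST endpoint. A check of this kind can be sketched with curl. The function name, the 10-second request timeout, and the rule that any 2xx code counts as available are assumptions, not Sentinel's documented behavior.

```bash
#!/bin/bash
# Sketch of an endpoint availability check using curl (hypothetical; not Sentinel's code).
check_endpoint() {
  local url=$1
  local max_secs=${2:-10}   # hypothetical request timeout, in seconds
  local result code secs

  # -s: silent; -o /dev/null: discard the body; -w: print status code and total time.
  # curl prints 000 as the code when no HTTP response was received.
  result=$(curl -s -o /dev/null --max-time "$max_secs" -w '%{http_code} %{time_total}' "$url")
  read -r code secs <<< "$result"

  echo "Rest Endpoint Response Code: $code (${secs}s)"
  # Succeed only for 2xx responses.
  [[ $code == 2* ]]
}
```

The reported time can feed a slowness threshold, which covers the "Response performance" item in the capabilities list.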

Sentinel in Kubernetes Environments

Potential Issue with kubectl exec and Unresponsive Pods

When running Sentinel monitoring jobs through a scheduled cron task that relies on kubectl exec, it is important to understand the behavior of Kubernetes when the target pod becomes hung or unresponsive.

kubectl exec is a synchronous operation that depends entirely on:

  • Pod responsiveness

  • Kubernetes API availability

  • Network connectivity between the Kubernetes API server and the kubelet on the node running the pod

If the target pod enters a hung or unhealthy state, the kubectl exec command may block indefinitely (or until the Kubernetes API timeout is reached).

Because Sentinel is commonly triggered using a cron scheduler, each scheduled execution starts a new kubectl exec process. If a previous execution remains blocked, subsequent executions may overlap or appear to “queue up.” Once the pod becomes responsive again, multiple delayed executions may run nearly simultaneously.

This behavior is expected and is a limitation of using kubectl exec for scheduled remote execution. kubectl exec is not queue-based or asynchronous.

Recommended Solutions

Recommended Approach (Preferred)

Run Sentinel directly as a cron job inside the target pod.

This avoids dependency on external kubectl exec calls and eliminates blocking behavior caused by pod responsiveness issues.

Please coordinate with your Kubernetes cluster administrators to configure an internal cron scheduler within the pod or container environment.
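An in-pod schedule could use a crontab line like the one generated below. This is a sketch under several assumptions: cron and flock (from util-linux) must be available in the container image, and the 5-minute interval, lock file, and log file are hypothetical defaults. The directory matches the one used in the kubectl exec example.

```bash
#!/bin/bash
# Sketch: print a crontab line that runs Sentinel inside the pod on a fixed interval.
# Assumes cron and flock (util-linux) exist in the container image.
# The interval, lock file, and log file below are hypothetical defaults.
sentinel_cron_entry() {
  local interval=${1:-5}                          # minutes between runs
  local dir=${2:-/persistent/germain/sentinel}    # Sentinel directory from the kubectl exec example
  # flock -n skips a run if the previous one still holds the lock.
  printf '*/%s * * * * cd %s && flock -n /tmp/sentinel.lock ./sentinel.sh >> /tmp/sentinel-cron.log 2>&1\n' \
    "$interval" "$dir"
}
```

Running ( crontab -l 2>/dev/null; sentinel_cron_entry 5 ) | crontab - inside the pod appends the entry to the current user's crontab. Running it a second time adds a duplicate line.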

Alternative Approach

If Sentinel must be executed externally using kubectl exec, implement both:

  • A timeout mechanism

  • A lock mechanism to prevent overlapping executions

Additional recommendations:

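  • Remove the -it flags from kubectl exec; -i attaches stdin and -t allocates a TTY, both intended for interactive sessions, and a cron job has no terminal

  • Use the timeout command to terminate stuck executions

  • Use flock to ensure only one Sentinel execution runs at a time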

Example Cron Wrapper Script

CODE
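#!/bin/bash
# Everything runs in a subshell whose file descriptor 9 is the lock file
# (opened by the redirection on the last line); the lock is released when the subshell exits.
(
  # Take an exclusive, non-blocking lock; if a previous run still holds it, skip this run.
  flock -n 9 || { echo "Previous run still active, skipping."; exit 1; }

  # timeout stops any kubectl exec still running after 120 seconds.
  # Pods are processed one after another, so a full run can take up to 3 x 120 seconds.
  # "&&" ensures sentinel.sh runs only if the cd succeeded.
  timeout 120 kubectl exec stage-cgw-0 -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh"

  timeout 120 kubectl exec stage-sai-0 -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh"

  timeout 120 kubectl exec stage-ses-0 -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh"
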
) 9>/var/lock/sentinel-cron.lock
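The wrapper script above does not record why a kubectl exec call ended. A small helper can log timeouts using GNU timeout's exit status: 124 when the command was stopped with SIGTERM, or 137 (128 + 9) when SIGKILL was needed. The helper name and the 10-second kill grace period are hypothetical choices.

```bash
#!/bin/bash
# Sketch (hypothetical helper, not part of Sentinel): run a command under GNU timeout
# and log why it ended, so hung kubectl exec calls are visible in cron output.
#   Usage: run_with_timeout SECONDS LABEL COMMAND [ARGS...]
run_with_timeout() {
  local secs=$1 label=$2
  shift 2

  # -k 10: if the command ignores SIGTERM, send SIGKILL 10 seconds later.
  timeout -k 10 "$secs" "$@"
  local rc=$?

  # GNU timeout exits 124 after a timeout, or 137 (128 + 9) if SIGKILL was needed.
  if (( rc == 124 || rc == 137 )); then
    echo "$(date '+%F %T') $label: timed out after ${secs}s (exit $rc)" >&2
  elif (( rc != 0 )); then
    echo "$(date '+%F %T') $label: failed (exit $rc)" >&2
  fi
  return "$rc"
}
```

Inside the locked subshell, each line of the wrapper could then become, for example: run_with_timeout 120 stage-cgw-0 kubectl exec stage-cgw-0 -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh"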

Additional Recommendations

  • Monitor pod health proactively to reduce hung-state occurrences

  • Configure Kubernetes liveness/readiness probes appropriately

  • Review Kubernetes API timeout settings if long-running executions are expected

  • Consider moving Sentinel execution to Kubernetes-native CronJob resources when possible
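As one form of proactive pod-health monitoring, the external wrapper could skip pods whose Ready condition (driven by their readiness probes) is not "True". The sketch below reads that condition with a kubectl jsonpath query. The function names are hypothetical. This reduces, but does not eliminate, hung executions, because a pod can report Ready and still stop responding.

```bash
#!/bin/bash
# Sketch: skip kubectl exec for pods whose Ready condition is not "True".
pod_is_ready() {
  local pod=$1
  local ready
  # --request-timeout bounds this API call so the check itself cannot hang for long.
  ready=$(kubectl get pod "$pod" --request-timeout=10s \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  [[ $ready == True ]]
}

# Hypothetical driver: run Sentinel only in pods that currently report Ready.
run_sentinel_on_ready_pods() {
  local pod
  for pod in "$@"; do
    if pod_is_ready "$pod"; then
      timeout 120 kubectl exec "$pod" -- bash -c "cd /persistent/germain/sentinel && ./sentinel.sh"
    else
      echo "$pod is not Ready; skipping this run." >&2
    fi
  done
}
```

For example, run_sentinel_on_ready_pods stage-cgw-0 stage-sai-0 stage-ses-0 could replace the three kubectl exec lines inside the flock-protected subshell.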

Service: Enterprise

Feature Availability: 2023.1
