
Setting up Disaster Recovery

The following steps are a generic Disaster Recovery plan, intended as a template when planning environment migrations from a Production environment (PROD) to a Recovery/Failover environment (RECOVERY). As with any template, review your specific and special cases before proceeding.

1. Stop All Germain Engines on Production Nodes

Linux example

First, log in to each Germain PROD engine server and run the following commands:

BASH
sudo su
ps -ef | grep manager   # Find the PID of the engine manager process
kill <PID from the previous command>
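The PID lookup above can also be scripted. The following is a minimal sketch, not part of Germain: `pids_matching` is a hypothetical helper that reads `ps -ef` style output on standard input and prints the PID column of every matching line, skipping the `grep` process itself.

```shell
# pids_matching PATTERN
# Hypothetical helper: reads `ps -ef` style output on stdin and prints the
# PID (second column) of each line matching PATTERN, ignoring grep itself.
pids_matching() {
  awk -v pat="$1" '$0 ~ pat && $0 !~ /grep/ { print $2 }'
}

# Example usage (review the PIDs before killing anything):
#   ps -ef | pids_matching manager
```

Printing the PIDs first, rather than piping them straight into `kill`, keeps a manual review step before any process is stopped.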

Then wait for all engine processes to stop. You can verify this with the following command:

BASH
ps -ef | grep engine
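Waiting for the engine processes to stop can be automated with a small polling loop. This is only a sketch under assumptions: `wait_until_stopped` is a hypothetical function, and the check command you pass in (for example `pgrep -f engine`) must exit 0 for as long as matching processes still exist.

```shell
# wait_until_stopped TIMEOUT_SECONDS CHECK_COMMAND [ARGS...]
# Hypothetical helper: polls once per second until CHECK_COMMAND fails
# (meaning no matching process is left) or the timeout is reached.
wait_until_stopped() {
  timeout=$1
  shift
  elapsed=0
  while "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "processes still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example usage (assumes pgrep is installed):
#   wait_until_stopped 120 pgrep -f engine
```

A non-zero return code signals that the timeout expired, so a wrapper script can stop before moving on to the next step.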

2. Update Kafka URL to Point to Recovery Kafka

  • Log in to Germain and navigate to the following URL: https://<PROD domain>/germainapm/console/s/#germain.apm.monitoringClient.queueConnectors(Kafka)

  • Update the URL field from your PROD Kafka instance to your RECOVERY instance.

    Figure: Kafka connector configuration - Germain UX

3. Stop All Germain Services on Prod Nodes

Linux example

Log in to your Germain PROD server, then run the following commands to stop the Germain services:

BASH
sudo su
ps -ef | grep germain | grep <service>  # Example service names: [ aggregation, action, session, analytics, storage ]
kill <PID from the previous command>

For each service, wait and confirm the service is down.
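To confirm every service is down in one pass, the check can be looped over the service names. The sketch below is an assumption-laden illustration: `still_running` is a hypothetical helper, and the service names are the example names from the command comment above, which may differ in your installation.

```shell
# Example service names from the step above; adjust to your installation.
SERVICES="aggregation action session analytics storage"

# still_running
# Hypothetical helper: reads `ps -ef` style output on stdin and prints the
# name of every service that still has a germain process.
still_running() {
  ps_output=$(cat)
  for svc in $SERVICES; do
    if printf '%s\n' "$ps_output" |
      awk -v svc="$svc" '/germain/ && !/grep/ && index($0, svc) { found = 1 } END { exit !found }'; then
      echo "$svc"
    fi
  done
}

# Example usage: an empty result means all listed services are down.
#   ps -ef | still_running
```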

Once the services are down, shut down Tomcat:

BASH
cd /germain/apache-tomcat-8/bin   # Navigate to Tomcat's bin folder
./shutdown.sh

You can confirm that Tomcat is down by running the following command:

BASH
ps -ef | grep tomcat

4. Validate All the Infrastructure Services Are Running As Expected on Recovery

  • Kafka

  • Zookeeper

  • Hazelcast

  • ElasticSearch
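One quick way to validate these services is to check that their TCP ports accept connections. The following is a rough sketch only: `port_open` and `check_infra` are hypothetical helpers, and the ports used (9092, 2181, 5701, 9200) are the usual product defaults, which are assumptions here and may not match your RECOVERY configuration.

```shell
# port_open HOST PORT
# Hypothetical helper: exits 0 if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_infra HOST
# Checks the assumed default port of each infrastructure service on HOST.
check_infra() {
  host=$1
  failed=0
  for entry in kafka:9092 zookeeper:2181 hazelcast:5701 elasticsearch:9200; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open "$host" "$port"; then
      echo "$name ($port): reachable"
    else
      echo "$name ($port): NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage, with a placeholder host name:
#   check_infra <RECOVERY host>
```

An open port only shows that something is listening. It does not prove the service is healthy, so treat this as a first check before looking at each service's own status output.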

5. Start All Germain Services on Recovery Nodes

Linux example

The service startup order is the following:

  1. Kafka

  2. Tomcat

  3. Zookeeper

    BASH
    cd /germain/solr/apache-zookeeper-3.7.1
    nohup bin/zkServer.sh start  # confirm the service is up: ps -ef | grep zookeeper
  4. ElasticSearch

    BASH
    cd /germain/elasticsearch-7.17.7
    nohup bin/elasticsearch &  # confirm the service is up: ps -ef | grep elasticsearch
  5. Hazelcast

    BASH
    cd /germain/hazelcast-5.3.1
    nohup bin/hz start &  # confirm the service is up: ps -ef | grep hazelcast
  6. Storage

  7. Session Tracking

    BASH
    cd /germain/services
    nohup bin/sessiontracking-services &
    cd /germain/services/var/logs  # verify the status in <service name>.log
  8. Analytics

    BASH
    cd /germain/services
    nohup bin/analytics-services &
    cd /germain/services/var/logs # verify the status in <service name>.log
  9. Aggregate

    BASH
    cd /germain/services
    nohup bin/aggregate-services &
    cd /germain/services/var/logs  # verify the status in <service name>.log
  10. Action

    BASH
    cd /germain/services
    nohup bin/action-services &
    cd /germain/services/var/logs  # verify the status in <service name>.log
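The Germain service part of this startup order could be wrapped in a small script. The sketch below is illustrative only: `start_service` and `start_all` are hypothetical helpers, the launcher names follow the item labels in the list above, Storage is left out because no command is given for it, and the script defaults to a dry run that only prints the commands.

```shell
SERVICES_HOME=${SERVICES_HOME:-/germain/services}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually start

# start_service LAUNCHER
# Hypothetical helper: starts (or prints) one launcher under bin/.
start_service() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "cd $SERVICES_HOME && nohup bin/$1 &"
  else
    (cd "$SERVICES_HOME" && nohup "bin/$1" &)
  fi
}

# start_all
# Starts the services in the documented order (items 7 to 10).
start_all() {
  for svc in sessiontracking-services analytics-services aggregate-services action-services; do
    start_service "$svc"
  done
}

# Example usage:
#   start_all             # dry run, prints the four commands
#   DRY_RUN=0 start_all   # really start the services
```

Defaulting to a dry run is a deliberate choice here: the printed commands can be reviewed against the environment before anything is started. Check each service log after starting, as described in the steps above.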

6. Start All Germain Engines on Recovery Nodes

IMPORTANT steps to take BEFORE starting the engines

  • Remove the session.txt file from each engine node before starting.

  • Change the hostname of every node on the Germain state screen before starting (from PROD to RECOVERY, e.g. PROD_***** to RECV_*****).
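Removing session.txt can be scripted per engine node. This is a minimal sketch with a loud assumption: the location of session.txt is not stated here, so the hypothetical `remove_session_file` helper takes the directory that contains it as an argument.

```shell
# remove_session_file DIRECTORY
# Hypothetical helper: deletes DIRECTORY/session.txt if it exists and
# reports what it did. DIRECTORY is an assumption; pass the directory
# where your engine keeps session.txt.
remove_session_file() {
  target="$1/session.txt"
  if [ -f "$target" ]; then
    rm -f -- "$target" && echo "removed $target"
  else
    echo "no session.txt in $1"
  fi
}

# Example usage, with a placeholder directory:
#   remove_session_file <engine directory>
```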

Linux example

Log in to your Germain RECOVERY server, then run the following commands to start the Germain engines:

BASH
sudo su
ps -ef | grep engine  # Confirm no old engines are running
cd /ebay/germain/engine  # Engine install directory; adjust to your environment
nohup bin/startEngineManager.sh &

You can check the engine manager status in the log file EngineManager.log, located under germain/engine/logs.
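A quick scan of the log can flag obvious startup problems. The sketch below is an assumption, not a Germain tool: `log_errors` is a hypothetical helper, and it guesses that problems appear in the log as lines containing "error" or "exception", which may not match the actual log format.

```shell
# log_errors LOGFILE
# Hypothetical helper: prints the last 5 lines containing "error" or
# "exception" (case-insensitive) and returns 1 if any were found.
log_errors() {
  matches=$(grep -i -E 'error|exception' "$1" | tail -n 5 || true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}

# Example usage, with a placeholder path to the log directory:
#   log_errors <engine log directory>/EngineManager.log
```

A clean result only means no matching lines were found. Reading the tail of the log directly is still the more reliable check.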
