The following steps are a generic Disaster Recovery plan, to be taken as a template when planning environment migrations from a Production environment (PROD) to a Recovery/Failover (RECOVERY) environment. As with any template, make sure to review specific and special cases before proceeding with it.
1. Stop All Germain Engines on Production Nodes
Linux example
First, log in to each Germain PROD engine server and run the following commands:
BASH
sudo su
ps -ef | grep manager
kill < PID from step #2 >
After this, wait for all engine processes to stop. You can verify this with the following command:
BASH
ps -ef | grep engine
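The wait described above can also be scripted. The following is a minimal sketch and not part of the Germain tooling: the function name wait_for_exit, the 60-second default timeout, and the use of pgrep -f in place of "ps -ef | grep" are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until no process command line matches a pattern.
# Usage: wait_for_exit <pattern> [timeout_seconds]
wait_for_exit() {
  local pattern="$1"
  local timeout="${2:-60}"   # assumed default, adjust to your environment
  local waited=0
  # pgrep -f matches against the full command line, like "ps -ef | grep"
  while pgrep -f "$pattern" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Processes matching '$pattern' still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "No processes matching '$pattern' are running"
}

# Example: wait_for_exit engine 120
```

Because pgrep -f looks at full command lines, a script whose own file name contains the pattern (for example "engine") would match itself, so pick the script name accordingly.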
2. Update Kafka URL to Point to Recovery Kafka
Log in to Germain and navigate to the following URL: https://<PROD domain>/germainapm/console/s/#germain.apm.monitoringClient.queueConnectors(Kafka)
Update the URL field from your PROD Kafka instance to your RECOVERY instance.
Kafka connector configuration - Germain UX
3. Stop All Germain Services on Prod Nodes
Linux example
Log in to your Germain PROD server, then run the following commands to kill the Germain services:
BASH
sudo su
ps -ef | grep germain | grep <service> # Example service names: [ aggregation, action, session, analytics, storage ]
kill < PID from step #2 >
For each service, wait and confirm the service is down.
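The per-service confirmation can be looped over the service names. This is a sketch under assumptions: the function name running_services is hypothetical, and the service names are the example names listed above, which may differ in your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each given service name that still has a
# matching Germain process, mirroring "ps -ef | grep germain | grep <service>".
running_services() {
  local svc
  for svc in "$@"; do
    # "grep -v grep" drops the grep processes themselves from the listing
    if ps -ef | grep germain | grep -- "$svc" | grep -v grep >/dev/null; then
      echo "$svc"
    fi
  done
}

# Example service names taken from the list above.
services=(aggregation action session analytics storage)

# Example: running_services "${services[@]}"
```

The function prints nothing once every listed service is down, so an empty result is the signal to continue with the Tomcat shutdown.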
Once the services are down, proceed by shutting down Tomcat:
BASH
cd /germain/apache-tomcat-8/bin # Navigate to Tomcat's bin folder
./shutdown.sh
You can confirm that Tomcat is down by running the following command:
BASH
ps -ef | grep tomcat
4. Validate All the Infrastructure Services Are Running As Expected on Recovery
Kafka
Zookeeper
Hazelcast
ElasticSearch
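One way to sanity-check these four services is to probe their listening ports. The sketch below is an illustration only: the helper names port_open and check_infra are hypothetical, and the ports in the example line (9092, 2181, 5701, 9200) are each product's stock defaults, not values taken from this plan. An open port shows only that something is listening, not that the service is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection and on coreutils "timeout".
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one status line per "name:port" pair; return 1 if any port is closed.
# Usage: check_infra <host> <name:port>...
check_infra() {
  local host="$1"; shift
  local entry name port failed=0
  for entry in "$@"; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$host" "$port"; then
      echo "OK   $name ($host:$port)"
    else
      echo "DOWN $name ($host:$port)"
      failed=1
    fi
  done
  return "$failed"
}

# Example (ports are assumed defaults, replace with your own):
# check_infra localhost kafka:9092 zookeeper:2181 hazelcast:5701 elasticsearch:9200
```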
5. Start All Germain Services on Recovery Nodes
Linux example
The service startup order is the following:
Kafka
Tomcat
Zookeeper
BASH
cd /germain/solr/apache-zookeeper-3.7.1
nohup bin/zkServer.sh start # confirm the service is up: "ps -ef | grep zookeeper"
ElasticSearch
BASH
cd /germain/elasticsearch-7.17.7
nohup bin/elasticsearch & # confirm the service is up: "ps -ef | grep elasticsearch"
Hazelcast
BASH
cd /germain/hazelcast-5.3.1
nohup bin/hz start & # confirm the service is up: "ps -ef | grep hazelcast"
Storage
Session Tracking
BASH
cd /germain/services
nohup bin/sessiontracking-services &
cd /germain/services/var/logs # verify the status in <service name>.log
Analytics
BASH
cd /germain/services
nohup bin/analytics-services &
cd /germain/services/var/logs # verify the status in <service name>.log
Aggregate
BASH
cd /germain/services
nohup bin/aggregate-services &
cd /germain/services/var/logs # verify the status in <service name>.log
Action
BASH
cd /germain/services
nohup bin/action-services &
cd /germain/services/var/logs # verify the status in <service name>.log
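The log check mentioned for each service can be partly automated. This is a rough sketch, assuming that failures show up as lines containing "ERROR" or "Exception"; the actual log format is not described here, and the function name log_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show suspicious lines from the tail of a service log.
# Assumes failures are logged with "ERROR" or "Exception" in the line.
# Usage: log_errors <logfile> [lines_to_inspect]
log_errors() {
  local file="$1"
  local lines="${2:-200}"
  if [ ! -r "$file" ]; then
    echo "Cannot read $file" >&2
    return 2
  fi
  # grep exits 0 when it finds a match, so turn a match into a failure status
  if tail -n "$lines" "$file" | grep -E 'ERROR|Exception'; then
    return 1
  fi
  return 0
}

# Example: log_errors "/germain/services/var/logs/<service name>.log"
```

A return status of 0 means no matching lines were found in the inspected tail; it does not by itself prove that the service started correctly, so a manual look at the log is still worthwhile.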
6. Start All Germain Engines on Recovery Nodes
IMPORTANT steps to take BEFORE starting the engines
Remove the session.txt file from each engine node before starting.
Modify the hostname of all nodes on the Germain state screen before starting (from PROD to RECOVERY; e.g., PROD_***** to RECV_*****).
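The session.txt cleanup can be scripted per engine node. The location of session.txt is not stated here, so the sketch below takes the directory to search as an argument; the function name remove_session_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every session.txt below an engine directory
# and print the paths that were removed.
# Usage: remove_session_files <engine_dir>
remove_session_files() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 1
  fi
  # -print lists each match, then rm removes the matched files
  find "$dir" -type f -name session.txt -print -exec rm -f {} +
}

# Example (directory is an assumption): remove_session_files /germain/engine
```

Run it on each engine node before starting the engine manager; no output means no session.txt was found under the given directory.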
Linux example
Log in to your Germain RECOVERY server, then run the following commands to start the Germain engines:
BASH
sudo su
ps -ef | grep engine # Confirm no old engines are running
cd /germain/engine # adjust if your engine is installed elsewhere
nohup bin/startEngineManager.sh &
You can check the engine manager status in the log file EngineManager.log, located in germain/engine/logs.